Get Logical processor count correctly whether NUMA is enabled or disabled

Srinivasan, Arunkumar
Dear R-devel list,

R's detectCores() function internally calls the "ncpus" function to get the total number of logical processors. However, this does not seem to take NUMA into account on Windows machines.

On a machine with 48 processors (24 cores) in total running Windows Server 2012, if NUMA is enabled with 2 nodes (node 0 and node 1, each having 24 CPUs), R's detectCores() detects only 24 instead of the total 48. If NUMA is disabled, detectCores() returns 48.

Similarly, on a machine with 88 cores (176 processors) running Windows Server 2012, detectCores() with NUMA disabled returns only the maximum value of 64. If NUMA is enabled with 4 nodes (44 processors each), detectCores() returns only 44. This is particularly limiting since, in this case, we cannot use all processors whether NUMA is enabled or disabled.

We think this is because R's ncpus.c file uses "PSYSTEM_LOGICAL_PROCESSOR_INFORMATION" (https://msdn.microsoft.com/en-us/library/windows/desktop/ms683194(v=vs.85).aspx) instead of "PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX" (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx). Specifically, quoting from the first link:

"On systems with more than 64 logical processors, the GetLogicalProcessorInformation function retrieves logical processor information about processors in the processor group (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405503(v=vs.85).aspx) to which the calling thread is currently assigned. Use the GetLogicalProcessorInformationEx (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx) function to retrieve information about processors in all processor groups on the system."

Therefore, it might be possible to get the right count of total processors even with NUMA enabled by using "GetLogicalProcessorInformationEx". It'd be nice to know what you think.
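
To make the counting problem concrete, here is a minimal R sketch. It assumes (this is not confirmed in the report) that each NUMA node in the 48-processor example corresponds to one Windows processor group; the helper name count_processors and its arguments are hypothetical and only illustrate the difference between querying one group and summing over all groups.

```r
# Hypothetical helper: compare what a per-group query would report with the
# system-wide total, given the sizes of the processor groups.
count_processors <- function(group_sizes, current_group = 1L) {
  stopifnot(length(group_sizes) >= 1L,
            current_group >= 1L, current_group <= length(group_sizes))
  c(per_group  = group_sizes[current_group],  # only the calling thread's group
    all_groups = sum(group_sizes))            # every group added together
}

# Group sizes taken from the report above: two nodes of 24 CPUs each.
result <- count_processors(c(24, 24))
print(result)
```

With four groups of 44 processors, the same helper would give 44 for the per-group count and 176 for the total, matching the second machine described above.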

Thank you very much,
Arun.

--
Arun Srinivasan
Analyst, Millennium Management LLC
50 Berkeley Street | London, W1J 8HD


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Get Logical processor count correctly whether NUMA is enabled or disabled

Tomas Kalibera
Dear Arun,

thank you for the report. I agree with the analysis: detectCores() will
only report logical processors in the NUMA group in which R is running.
I don't have a system to test on; could you please check these
workarounds for me on your systems?

# number of logical processors - what detectCores() should return

out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1",
                    grep("[0-9]+[ \t]*", out, value=TRUE))))

# number of cores - what detectCores(FALSE) should return

out <- system("wmic cpu get numberofcores", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1",
                    grep("[0-9]+[ \t]*", out, value=TRUE))))

# number of physical processors - as a sanity check

system("wmic computersystem get numberofprocessors")
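
The parsing step in these workarounds can be tried without a Windows machine. The sketch below is only an illustration: sum_wmic_counts is a hypothetical helper name, and the sample text is made up in the shape wmic is assumed to print (a header line, one padded number per physical processor, and a trailing empty line).

```r
# Hypothetical helper: sum the numeric rows of `wmic cpu get <property>` output.
sum_wmic_counts <- function(out) {
  out <- trimws(out)                      # drop padding and the trailing "\r"
  values <- out[grepl("^[0-9]+$", out)]   # keep rows that are a bare number
  if (length(values) == 0L) return(NA_integer_)
  sum(as.integer(values))
}

# Made-up sample in the assumed wmic layout: header, two sockets, empty line.
sample_out <- c("NumberOfLogicalProcessors  \r", "24  \r", "24  \r", "\r")
total <- sum_wmic_counts(sample_out)
print(total)
```

Returning NA when no numeric row is found is a design choice for this sketch, so that a caller can tell "no usable output" apart from a real count.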

Thanks,
Tomas

Re: Get Logical processor count correctly whether NUMA is enabled or disabled

Srinivasan, Arunkumar
Dear Tomas, thank you for looking into this. Here's the output:

# number of logical processors - what detectCores() should return
out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
out
# [1] "NumberOfLogicalProcessors  \r" "22                         \r" "22                         \r"
# [4] "20                         \r" "22                         \r" "\r"
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, value=TRUE))))
# [1] 86

[I've asked the IT team to understand why one of the values is 20 instead of 22].

# number of cores - what detectCores(FALSE) should return
out <- system("wmic cpu get numberofcores", intern=TRUE)
out
# [1] "NumberOfCores  \r" "22             \r" "22             \r" "20             \r" "22             \r"
# [6] "\r"
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, value=TRUE))))
# [1] 86

[Currently hyperthreading is disabled. So this output being identical to the previous output makes sense].

system("wmic computersystem get numberofprocessors")
# NumberOfProcessors
# 4
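
The three outputs can be cross-checked against each other. The following sketch simply re-enters the per-socket values reported above as plain vectors (the variable names are made up for illustration) and derives the totals and the threads-per-core ratio.

```r
# Per-socket values copied from the wmic output above.
logical_per_socket <- c(22, 22, 20, 22)  # numberoflogicalprocessors
cores_per_socket   <- c(22, 22, 20, 22)  # numberofcores

total_logical <- sum(logical_per_socket)
total_cores   <- sum(cores_per_socket)

# A ratio of 1 is what one would expect with hyperthreading disabled.
threads_per_core <- total_logical / total_cores

print(c(sockets = length(cores_per_socket),
        logical = total_logical,
        cores = total_cores,
        threads_per_core = threads_per_core))
```

The number of entries (4) agrees with the NumberOfProcessors value, and both totals come to 86.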

In addition, I'd also like to bring to your attention this documentation on processor groups: https://docs.microsoft.com/en-us/windows/desktop/ProcThread/processor-groups which explains how one should go about getting a process to run on multiple groups (which seems to be different from NUMA). All this seems overly complicated just to allow a process to use all cores by default, TBH.

Here's a project on GitHub, 'fio', where the issue of running a process on more than 1 processor group has come up - https://github.com/axboe/fio/issues/527 - and been addressed - https://github.com/axboe/fio/blob/c479640d6208236744f0562b1e79535eec290e2b/os/os-windows-7.h . I am not sure though if this is entirely relevant, since we would be forking new processes in R instead of allowing a single process to use all cores. Apologies if this is utterly irrelevant.

Thank you,
Arun.

Re: Get Logical processor count correctly whether NUMA is enabled or disabled

Tomas Kalibera
Dear Arun,

thank you for checking the workaround scripts.

I've modified detectCores() to use GetLogicalProcessorInformationEx. The
change is in revision 75198 of R-devel; could you please test it on your
machines? For a binary, you can wait until the R-devel snapshot build
reaches at least this svn revision.

Thanks for the link to the processor groups documentation. I don't have
a machine to test this on, but I would hope that snow clusters (e.g.
PSOCK) should work fine on systems with >64 logical processors as they
spawn new processes (not just threads). Note that FORK clusters are not
supported on Windows.
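
The point that PSOCK clusters use separate processes rather than threads can be checked with a small sketch using the parallel package. This is only an illustration on two workers, not a test of the >64 processor case; the variable names are made up.

```r
library(parallel)

# Start two PSOCK workers; each one is a separate R process.
cl <- makeCluster(2L, type = "PSOCK")
worker_pids <- unlist(clusterCall(cl, Sys.getpid))
stopCluster(cl)

master_pid <- Sys.getpid()

# Each worker reports its own process id, different from the master's.
print(worker_pids)
print(master_pid %in% worker_pids)
```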

Thanks
Tomas

Re: Get Logical processor count correctly whether NUMA is enabled or disabled

Srinivasan, Arunkumar
Dear Tomas, thank you very much. I installed r-devel r75201 and tested.

The machine with 88 cores has NUMA disabled. It therefore has 2 processor groups, with 64 and 24 processors respectively.

require(parallel)
detectCores()
# [1] 88

This is great!

Then I went on to test with a simple 'foreach()' loop. I started with 64 processors (the maximum for 1 processor group), with each iteration running a simple 0.5s sleep.

require(snow)
require(doSNOW)
require(foreach)

cl <- makeCluster(64L, "SOCK")
registerDoSNOW(cl)
system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
# user  system elapsed
# 0.06    0.00    0.64
system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
#    user  system elapsed
#    0.03    0.01    1.04
stopCluster(cl)

With a cluster of 64 processors and a loop of 64 iterations, it completed in ~0.5s (0.64), and with 65 iterations it took ~1s, as expected.

cl <- makeCluster(65L, "SOCK")
registerDoSNOW(cl)
system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
#    user  system elapsed
#    0.03    0.02    0.61
system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
# Timing stopped at: 0.08 0 293
stopCluster(cl)

However, when I increased the cluster to 65 processors, a loop with 64 iterations seemed to complete as expected, but using all 65 processors to loop over 65 iterations didn't complete; I stopped it after ~5 mins. The same happens with a cluster started with any number between 65 and 88. It seems that we still cannot use >64 processors all at the same time, even though detectCores() now returns the right count.
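
The timings expected in this test follow from simple arithmetic, sketched below. The model is an assumption made for illustration: tasks are taken to run in rounds of at most one task per worker, and scheduling overhead is ignored. The function name expected_elapsed is hypothetical.

```r
# Hypothetical model: each round runs at most `workers` tasks in parallel.
expected_elapsed <- function(tasks, workers, task_seconds = 0.5) {
  ceiling(tasks / workers) * task_seconds
}

t_64_on_64 <- expected_elapsed(64, 64)  # one round
t_65_on_64 <- expected_elapsed(65, 64)  # two rounds
t_65_on_65 <- expected_elapsed(65, 65)  # one round, if all 65 workers can run

print(c(t_64_on_64, t_65_on_64, t_65_on_65))
```

Under this model the 65-worker, 65-iteration case should also finish in about half a second, which is why a run that is still going after ~5 minutes points to a problem rather than to slow scheduling.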

I'd appreciate your thoughts on this.

Best,
Arun.

Re: Get Logical processor count correctly whether NUMA is enabled or disabled

Tomas Kalibera
A summary for reference: the new detectCores() for Windows in R-devel
seems to be working for both logical and physical cores on systems with
more than 64 logical processors (thanks to Arun for testing!). If the
feature is important to you, particularly if you use an older version of
Windows and/or a system with >64 logical processors, it would be nice if
you could test it and report any problems.

As I mentioned earlier, in older versions of R one can, as a workaround,
use "wmic" to detect the number of processors on systems with >64
logical processors (with appropriate error handling added as needed):

# detectCores()
out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
value=TRUE))))

# detectCores(logical=FALSE)
out <- system("wmic cpu get numberofcores", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
value=TRUE))))
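
One possible shape for the error handling mentioned above is sketched here. The wrapper name detect_cores_wmic and the choice to fall back to parallel::detectCores() are assumptions made for illustration; they are not part of R or of the workaround as posted.

```r
# Hypothetical wrapper around the wmic workaround. It falls back to
# parallel::detectCores() when not on Windows, when wmic cannot be run,
# or when its output contains no numeric rows.
detect_cores_wmic <- function(logical = TRUE) {
  property <- if (logical) "numberoflogicalprocessors" else "numberofcores"
  fallback <- function(...) parallel::detectCores(logical = logical)
  if (.Platform$OS.type != "windows") return(fallback())
  tryCatch({
    out <- system(paste("wmic cpu get", property), intern = TRUE)
    values <- grep("^[0-9]+$", trimws(out), value = TRUE)
    if (length(values) == 0L) stop("no numeric rows in wmic output")
    sum(as.integer(values))
  }, error = fallback, warning = fallback)
}

n_logical  <- detect_cores_wmic()
n_physical <- detect_cores_wmic(logical = FALSE)
print(c(logical = n_logical, physical = n_physical))
```

Treating warnings like errors is deliberate in this sketch: system() signals a warning when the command exits with a non-zero status, and in that case the parsed output should not be trusted.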

The remaining problem with running using >64 processors on Windows
turned out to be due to a bug in sockets communication, debugged and
fixed in R-devel by Luke Tierney.

Tomas

On 08/29/2018 12:42 PM, Srinivasan, Arunkumar wrote:

> Dear Tomas, thank you very much. I installed r-devel r75201 and tested.
>
> The machine with 88 cores has NUMA disabled. It therefore has 2 processor groups with 64 and 24 processors each.
>
> require(parallel)
> detectCores()
> # [1] 88
>
> This is great!
>
> Then I went on to test with a simple 'foreach()' loop. I started with 64 processors (max limit of 1 processor group). I ran with a simple function of 0.5s sleep.
>
> require(snow)
> require(doSNOW)
> require(foreach)
>
> cl <- makeCluster(64L, "SOCK")
> registerDoSNOW(cl)
> system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
> # user  system elapsed
> # 0.06    0.00    0.64
> system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
> #    user  system elapsed
> #    0.03    0.01    1.04
> stopCluster(cl)
>
> With a cluster of 64 processors and loop running with 64 iterations, it completed in ~.5s (0.64), and with 65 iterations, it took ~1s as expected.
>  
> cl <- makeCluster(65L, "SOCK")
> registerDoSNOW(cl)
> system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
>     user  system elapsed
>     0.03    0.02    0.61
> system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
> # Timing stopped at: 0.08 0 293
> stopCluster(cl)
>
> However, when I increased the cluster to have 65 processors, a loop with 64 iterations seem to complete as expected, but using all 65 processors to loop over 65 iterations didn't seem to complete. I stopped it after ~5mins. The same happens with the cluster started with any number between 65 and 88. It seems to me like we are still not being able to use >64 processors all at the same time even if detectCores() returns the right count now.
>
> I'd appreciate your thoughts on this.
>
> Best,
> Arun.
>
> -----Original Message-----
> From: Tomas Kalibera <[hidden email]>
> Sent: 27 August 2018 19:43
> To: Srinivasan, Arunkumar <[hidden email]>; [hidden email]
> Subject: Re: [Rd] Get Logical processor count correctly whether NUMA is enabled or disabled
>
> Dear Arun,
>
> thank you for checking the workaround scripts.
>
> I've modified detectCores() to use GetLogicalProcessorInformationEx. It is in revision 75198 of R-devel, could you please test it on your machines? For a binary, you can wait until the R-devel snapshot build gets to at least this svn revision.
>
> Thanks for the link to the processor groups documentation. I don't have a machine to test this on, but I would hope that snow clusters (e.g.
> PSOCK) should work fine on systems with >64 logical processors as they spawn new processes (not just threads). Note that FORK clusters are not supported on Windows.
>
> Thanks
> Tomas
>
> On 08/21/2018 02:53 PM, Srinivasan, Arunkumar wrote:
>> Dear Tomas, thank you for looking into this. Here's the output:
>>
>> # number of logical processors - what detectCores() should return out
>> <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
>> [1] "NumberOfLogicalProcessors  \r" "22                         \r" "22                         \r"
>> [4] "20                         \r" "22                         \r" "\r"
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE)))) # [1] 86
>>
>> [I've asked the IT team to understand why one of the values is 20 instead of 22].
>>
>> # number of cores - what detectCores(FALSE) should return
>> out <- system("wmic cpu get numberofcores", intern=TRUE)
>> # [1] "NumberOfCores  \r" "22             \r" "22             \r" "20             \r" "22             \r"
>> # [6] "\r"
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, value=TRUE)))) # [1] 86
>>
>> [Currently hyperthreading is disabled. So this output being identical to the previous output makes sense].
>>
>> system("wmic computersystem get numberofprocessors")
>> NumberOfProcessors
>> 4
>>
>> In addition, I'd also bring to your attention this documentation on processor groups: https://docs.microsoft.com/en-us/windows/desktop/ProcThread/processor-groups which explains how one should go about getting a process to run on multiple groups (which seems to be different from NUMA). All this seems overly complicated just to allow a process to use all cores by default, TBH.
>>
>> Here's a project on Github 'fio' where the issue of running a process on more than 1 processor group has come up -  https://github.com/axboe/fio/issues/527 and is addressed - https://github.com/axboe/fio/blob/c479640d6208236744f0562b1e79535eec290e2b/os/os-windows-7.h . I am not sure though if this is entirely relevant since we would be forking new processes in R instead of allowing a single process to use all cores. Apologies if this is utterly irrelevant.
>>
>> Thank you,
>> Arun.
>>
>> From: Tomas Kalibera <[hidden email]>
>> Sent: 21 August 2018 11:50
>> To: Srinivasan, Arunkumar <[hidden email]>; [hidden email]
>> Subject: Re: [Rd] Get Logical processor count correctly whether NUMA is enabled or disabled
>>
>> Dear Arun,
>>
>> thank you for the report. I agree with the analysis, detectCores() will only report logical processors in the NUMA group in which R is running. I don't have a system to test on, could you please check these workarounds for me on your systems?
>>
>> # number of logical processors - what detectCores() should return
>> out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, value=TRUE))))
>>
>> # number of cores - what detectCores(FALSE) should return
>> out <- system("wmic cpu get numberofcores", intern=TRUE)
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, value=TRUE))))
>>
>> # number of physical processors - as a sanity check
>>
>> system("wmic computersystem get numberofprocessors")
>>
>> Thanks,
>> Tomas
>>
>> On 08/17/2018 05:11 PM, Srinivasan, Arunkumar wrote:
>> Dear R-devel list,
>>
>> R's detectCores() function internally calls the "ncpus" function to get the total number of logical processors. However, this does not seem to take NUMA into account on Windows machines.
>>
>> On a machine having 48 processors (24 cores) in total with Windows Server 2012 installed, if NUMA is enabled with 2 nodes (node 0 and node 1, each having 24 CPUs), then R's detectCores() only detects 24 instead of the total 48. If NUMA is disabled, detectCores() returns 48.
>>
>> Similarly, on a machine with 88 cores (176 processors) and Windows Server 2012, detectCores() with NUMA disabled only returns the maximum value of 64. If NUMA is enabled with 4 nodes (44 processors each), then detectCores() will only return 44. This is particularly limiting since we cannot get to use all processors by enabling/disabling NUMA in this case.
>>
>> We think this is because R's ncpus.c file uses "PSYSTEM_LOGICAL_PROCESSOR_INFORMATION" (https://msdn.microsoft.com/en-us/library/windows/desktop/ms683194(v=vs.85).aspx) instead of "PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX" (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx). Specifically, quoting from the first link:
>>
>> "On systems with more than 64 logical processors, the GetLogicalProcessorInformation function retrieves logical processor information about processors in the processor group (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405503(v=vs.85).aspx) to which the calling thread is currently assigned. Use the GetLogicalProcessorInformationEx function (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx) to retrieve information about processors in all processor groups on the system."
>>
>> Therefore, it might be possible to get the right count of total processors even with NUMA enabled by using "GetLogicalProcessorInformationEx". It'd be nice to know what you think.
>>
>> Thank you very much,
>> Arun.
>>
>> --
>> Arun Srinivasan
>> Analyst, Millennium Management LLC
>> 50 Berkeley Street | London, W1J 8HD
>>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Get Logical processor count correctly whether NUMA is enabled or disabled

Srinivasan, Arunkumar
Tomas, Luke, thank you very much once again for patching both issues swiftly. This’ll be incredibly valuable to us once we move to 3.6.0.

From: Tomas Kalibera <[hidden email]>
Sent: 03 September 2018 13:07
To: [hidden email]
Cc: Srinivasan, Arunkumar <[hidden email]>
Subject: Re: [Rd] Get Logical processor count correctly whether NUMA is enabled or disabled

A summary for reference: the new detectCores() for Windows in R-devel seems to be working both for logical and physical cores on systems with >64 logical processors (thanks to Arun for testing!). If the feature is important to anyone, particularly anyone using an older version of Windows and/or a system with >64 logical processors, it would be nice if you could test it and report any problems.

As I mentioned earlier, in older versions of R one can as a workaround use "wmic" to detect the number of processors on systems with >64 logical processors (with appropriate error handling added as needed):

# detectCores()
out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, value=TRUE))))

# detectCores(logical=FALSE)
out <- system("wmic cpu get numberofcores", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, value=TRUE))))
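
The "appropriate error handling" could look roughly like the sketch below. The helper names parse_wmic_count() and count_cpus() are made up for illustration and are not part of R. The sketch also assumes that detectCores() can be trusted from R 3.6.0 onwards (the release expected to carry the fix), so it only falls back to wmic on older Windows builds.

```r
library(parallel)

# Sum the first run of digits found on each line of wmic output.
# Lines without digits (the header and blank lines) are ignored.
parse_wmic_count <- function(out) {
  counts <- as.integer(regmatches(out, regexpr("[0-9]+", out)))
  if (length(counts) == 0L) NA_integer_ else sum(counts)
}

# Hypothetical wrapper: use detectCores() where it is expected to be
# correct, otherwise ask wmic and return NA if the call fails.
count_cpus <- function(logical = TRUE) {
  # Assumption: the fix is available from R 3.6.0 onwards.
  old_windows_r <- .Platform$OS.type == "windows" && getRversion() < "3.6.0"
  if (!old_windows_r) return(detectCores(logical = logical))
  field <- if (logical) "numberoflogicalprocessors" else "numberofcores"
  out <- tryCatch(
    system(paste("wmic cpu get", field), intern = TRUE),
    error = function(e) character(0),
    warning = function(w) character(0)
  )
  parse_wmic_count(out)
}
```

Applied to the wmic output Arun posted earlier in the thread (four lines with 22, 22, 20 and 22), parse_wmic_count() gives 86, matching the result of the one-liner.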

The remaining problem with running using >64 processors on Windows turned out to be due to a bug in sockets communication, debugged and fixed in R-devel by Luke Tierney.
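
One way to check whether a given R build can drive more than 64 workers is to repeat Arun's experiment with the base parallel package alone. The following is only a sketch: check_cluster() and worker_task() are hypothetical helper names, and the worker count should be chosen to suit the machine (e.g. 65 or more on a system with several processor groups).

```r
library(parallel)

# Task run on each worker: sleep briefly, then report the worker's PID.
# Defined at top level so that only the function itself is shipped.
worker_task <- function(i, s) {
  Sys.sleep(s)
  Sys.getpid()
}

# Start n PSOCK workers, give each exactly one task, and report how many
# distinct worker processes answered and how long the round trip took.
check_cluster <- function(n, sleep = 0.5) {
  cl <- makeCluster(n, type = "PSOCK")
  on.exit(stopCluster(cl))
  elapsed <- system.time(
    pids <- clusterApply(cl, seq_len(n), worker_task, s = sleep)
  )[["elapsed"]]
  list(workers = length(unique(unlist(pids))), elapsed = elapsed)
}
```

On an affected build, a call such as check_cluster(65L) would be expected to hang in the way Arun described rather than return; on a fixed build it should report 65 distinct workers.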

Tomas

On 08/29/2018 12:42 PM, Srinivasan, Arunkumar wrote:

Dear Tomas, thank you very much. I installed r-devel r75201 and tested.

The machine with 88 cores has NUMA disabled. It therefore has 2 processor groups with 64 and 24 processors each.

require(parallel)
detectCores()
# [1] 88

This is great!

Then I went on to test with a simple 'foreach()' loop. I started with 64 processors (the maximum for one processor group) and ran a simple function that sleeps for 0.5s.

require(snow)
require(doSNOW)
require(foreach)

cl <- makeCluster(64L, "SOCK")
registerDoSNOW(cl)
system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
#    user  system elapsed
#    0.06    0.00    0.64
system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
#    user  system elapsed
#    0.03    0.01    1.04
stopCluster(cl)

With a cluster of 64 processors and a loop of 64 iterations, it completed in ~0.5s (0.64), and with 65 iterations it took ~1s, as expected.

cl <- makeCluster(65L, "SOCK")
registerDoSNOW(cl)
system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
#    user  system elapsed
#    0.03    0.02    0.61
system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
# Timing stopped at: 0.08 0 293
stopCluster(cl)

However, when I increased the cluster to 65 processors, a loop with 64 iterations seemed to complete as expected, but using all 65 processors to loop over 65 iterations did not complete. I stopped it after ~5 mins. The same happens with a cluster started with any number between 65 and 88. It seems that we still cannot use >64 processors at the same time, even though detectCores() now returns the right count.

I'd appreciate your thoughts on this.

Best,
Arun.
