parallel PSOCK connection latency is greater on Linux?


Jeff-2
I'm exploring latency overhead of parallel PSOCK workers and noticed
that serializing/unserializing data back to the main R session is
significantly slower on Linux than it is on Windows/MacOS with similar
hardware. Is there a reason for this difference and is there a way to
avoid the apparent additional Linux overhead?

I attempted to isolate the behavior with a test that simply returns an
existing object from the worker back to the main R session.

library(parallel)
library(microbenchmark)
gcinfo(TRUE)
cl <- makeCluster(1)
(x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))
plot(x$time, ylab = "microseconds")
head(x$time, n = 10)

On Windows/MacOS, the test runs in 300-500 microseconds depending on
hardware. A few of the 1000 runs are an order of magnitude slower but
this can probably be attributed to garbage collection on the worker.

On Linux, the first 5 or so executions run at comparable speeds but all
subsequent executions are two orders of magnitude slower (~40
milliseconds).

I see this behavior across various platforms and hardware combinations:

Ubuntu 18.04 (Intel Xeon Platinum 8259CL)
Linux Mint 19.3 (AMD Ryzen 7 1800X)
Linux Mint 20 (AMD Ryzen 7 3700X)
Windows 10 (AMD Ryzen 7 4800H)
MacOS 10.15.7 (Intel Core i7-8850H)


Re: parallel PSOCK connection latency is greater on Linux?

Simon Urbanek
It looks like R sockets on Linux could do with TCP_NODELAY -- without it (the status quo):

Unit: microseconds
                   expr      min       lq     mean  median       uq      max neval
 clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83  1000

exactly the same machine + R but with TCP_NODELAY enabled in R_SockConnect():

Unit: microseconds
                   expr     min     lq     mean  median      uq      max neval
 clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000
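
For reference, the change boils down to a single setsockopt() call on the connected socket. A minimal sketch of the idea (error handling and the exact insertion point in R's connection code are omitted; this is not the actual patch):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm so that small writes are sent immediately
   instead of being held back while an ACK is still outstanding. */
static void set_tcp_nodelay(int sockfd)
{
    int one = 1;
    setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}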

Cheers,
Simon


> On 2/11/2020, at 3:39 AM, Jeff <[hidden email]> wrote:
>
> I'm exploring latency overhead of parallel PSOCK workers and noticed that serializing/unserializing data back to the main R session is significantly slower on Linux than it is on Windows/MacOS with similar hardware. Is there a reason for this difference and is there a way to avoid the apparent additional Linux overhead?
> [...]


Re: parallel PSOCK connection latency is greater on Linux?

Iñaki Ucar
On Mon, 2 Nov 2020 at 02:22, Simon Urbanek <[hidden email]> wrote:
>
> It looks like R sockets on Linux could do with TCP_NODELAY -- without (status quo):

How many network packets are generated with and without it? If there
are many small writes and thus setting TCP_NODELAY causes many small
packets to be sent, it might make more sense to set TCP_QUICKACK
instead.
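
For comparison, TCP_QUICKACK is set the same way, but it is Linux-specific and therefore needs an #ifdef. A minimal sketch, not the actual R code:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask the kernel to acknowledge incoming data immediately instead of
   delaying the ACK (which on Linux can add up to ~40 ms). */
static void set_tcp_quickack(int sockfd)
{
#ifdef TCP_QUICKACK
    int one = 1;
    setsockopt(sockfd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
#endif
}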

Iñaki

> [...]

--
Iñaki Úcar


Re: parallel PSOCK connection latency is greater on Linux?

Jeff-2
Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that
they might determine what is best for their potentially latency- or
throughput-sensitive application?

Best,
Jeff

On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar <[hidden email]> wrote:

> How many network packets are generated with and without it? If there
> are many small writes and thus setting TCP_NODELAY causes many small
> packets to be sent, it might make more sense to set TCP_QUICKACK
> instead.
> [...]

Re: parallel PSOCK connection latency is greater on Linux?

Iñaki Ucar
On Mon, 2 Nov 2020 at 14:29, Jeff <[hidden email]> wrote:
>
> Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that
> they might determine what is best for their potentially latency- or
> throughput-sensitive application?

I think it makes sense (with a sensible default). E.g., Julia does this [1-2].

[1] https://docs.julialang.org/en/v1/stdlib/Sockets/#Sockets.nagle
[2] https://docs.julialang.org/en/v1/stdlib/Sockets/#Sockets.quickack

--
Iñaki Úcar


Re: parallel PSOCK connection latency is greater on Linux?

Simon Urbanek
I'm not sure the user would know ;). This is a very system-specific issue, simply because the Linux network stack behaves so differently from other OSes (for purely historical reasons). That makes it hard to abstract as a "feature" for the R sockets, which are supposed to be platform-independent. At least TCP_NODELAY is actually part of POSIX, so it is on better footing; disabling delayed ACKs is practically only useful to work around the other side having Nagle on, so I would expect it to be rarely used.

This is essentially an RFC, since we don't have a mechanism for socket options (well, almost: there are timeout and blocking already...) and I don't think we want to expose low-level details. So perhaps one idea would be to add something like delay=NA to socketConnection() in order to not touch (NA), enable (TRUE) or disable (FALSE) TCP_NODELAY. I wonder if there is any other way we could infer the intention of the user so as to choose the right approach...
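
To make the mapping concrete, at the C level it would be little more than the following; a rough sketch only, where the encoding of NA/TRUE/FALSE as an int and the function name are assumptions:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical handling of a three-valued delay argument after connect():
   NA (encoded here as -1) leaves the OS default untouched,
   TRUE (1) sets TCP_NODELAY and FALSE (0) clears it. */
static void apply_nodelay_option(int sockfd, int delay)
{
    if (delay < 0) return;  /* NA: don't touch the socket */
    setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &delay, sizeof(delay));
}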

Cheers,
Simon


> On Nov 3, 2020, at 02:28, Jeff <[hidden email]> wrote:
>
> Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that they might determine what is best for their potentially latency- or throughput-sensitive application?
> [...]


Re: parallel PSOCK connection latency is greater on Linux?

Iñaki Ucar
Please, check a tcpdump session on localhost while running the following script:

library(parallel)
library(tictoc)
cl <- makeCluster(1)
Sys.sleep(1)

for (i in 1:10) {
  tic()
  x <- clusterEvalQ(cl, iris)
  toc()
}

The initialization phase comprises 7 packets. Then, the 1-second sleep
will help you see where the evaluation starts. Each clusterEvalQ
generates 6 packets:

1. main -> worker PSH, ACK 1026 bytes
2. worker -> main ACK 66 bytes
3. worker -> main PSH, ACK 3758 bytes
4. main -> worker ACK 66 bytes
5. worker -> main PSH, ACK 2484 bytes
6. main -> worker ACK 66 bytes

The first two are the command and its ACK; the remaining four are the
result coming back and its ACKs. In the first 4-5 iterations, I see no
delay at all. Then, in the following iterations, a 40 ms delay starts
to appear between packets 3 and 4, i.e. the main process delays the
ACK for the first packet of the incoming result.

So I'd say Nagle is hardly to blame for this. It would be interesting
to see how many packets are generated with TCP_NODELAY on. If there
are still 6 packets, then we are fine. If we suddenly see a gazillion
packets, then TCP_NODELAY does more harm than good. On the other hand,
TCP_QUICKACK would surely solve the issue without any drawback. As
Nagle himself put it once, "set TCP_QUICKACK. If you find a case where
that makes things worse, let me know."
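
One practical note if TCP_QUICKACK is adopted (this is an assumption about how it would be wired in, not existing R code): on Linux the flag is not permanent, so the receive path would have to re-arm it, roughly like this:

#include <sys/types.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustrative read loop that re-enables TCP_QUICKACK before each read,
   since the kernel clears the flag on its own (see tcp(7)). */
static ssize_t read_all(int fd, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
#ifdef TCP_QUICKACK
        int one = 1;
        setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
#endif
        ssize_t n = read(fd, buf + got, len - got);
        if (n < 0 && errno == EINTR) continue;   /* retry on signal */
        if (n <= 0) return n;                    /* error or EOF */
        got += (size_t)n;
    }
    return (ssize_t)got;
}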

Iñaki

On Wed, 4 Nov 2020 at 04:34, Simon Urbanek <[hidden email]> wrote:

> [...]


--
Iñaki Úcar


Re: parallel PSOCK connection latency is greater on Linux?

Jeff-2
I do enjoy free lunch solutions if they exist.

That said, I think the abstraction proposed by Simon is reasonable.
Whether it should be applied to TCP_NODELAY or TCP_QUICKACK is
unfortunately beyond my Linux/networking knowledge.

Jeff Keller

On Wed, Nov 4, 2020 at 11:41, Iñaki Ucar <[hidden email]> wrote:

> Please, check a tcpdump session on localhost while running the
> following script:
> [...]
