Memory error in the libcurl connection code

Gábor Csárdi
Hi All,

I think there is a memory error in the libcurl connection code that
typically happens when libcurl reads big chunks of data. This
potentially affects all code that uses url() with the libcurl download
method, which is the default in most builds. In practice it tends to
happen more often with HTTP/2 and when the connection is wrapped in a
gzcon(). macOS Catalina has a libcurl build with HTTP/2 support, so many
users who upgraded macOS are starting to see this.

The workaround is to avoid using url(), if you can. If you need an
HTTP stream, you can use curl::curl(), which is a drop-in replacement.

The easiest way to reproduce this is with a libcurl build that has
HTTP/2 support and a server that speaks HTTP/2 as well, e.g. the cloud
mirror:

------------------------------------------------
~ # R --slave -e 'options(internet.info = 0); foo <-
readRDS(gzcon(url("https://cran.rstudio.com/src/contrib/Meta/archive.rds")))'
*   Trying 13.33.54.118:443...
* TCP_NODELAY set
* Connected to cran.rstudio.com (13.33.54.118) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=cran.rstudio.com
*  start date: Jul 24 00:00:00 2019 GMT
*  expire date: Aug 24 12:00:00 2020 GMT
*  subjectAltName: host "cran.rstudio.com" matched cert's "cran.rstudio.com"
*  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x56303c2910e0)
> GET /src/contrib/Meta/archive.rds HTTP/2
Host: cran.rstudio.com
User-Agent: R (3.4.4 x86_64-pc-linux-gnu x86_64 linux-gnu)
Accept: */*

* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200
< content-length: 2483432
< date: Wed, 22 Jan 2020 21:22:04 GMT
< server: Apache/2.4.39 (Unix)
< last-modified: Wed, 22 Jan 2020 17:10:22 GMT
< etag: "25e4e8-59cbd998a0360"
< accept-ranges: bytes
< cache-control: max-age=1800
< expires: Wed, 22 Jan 2020 21:52:04 GMT
< x-cache: Hit from cloudfront
< via: 1.1 6cbe48f9f9ff0c768f29d83804f75d4c.cloudfront.net (CloudFront)
< x-amz-cf-pop: MAN50-C1
< x-amz-cf-id: WwCQVQz9g8ZP6Az4m4n__h7aUW6vwlg0-AkiCv_DnVfGe10bzaFtfg==
< age: 960
<
* 85 data bytes written
Error in readRDS(gzcon(url("https://cran.rstudio.com/src/contrib/Meta/archive.rds")))
:
  reference index out of range
* stopped the pause stream!
* Connection #0 to host cran.rstudio.com left intact
Execution halted
------------------------------------------------

Sometimes you get a crash, sometimes a corrupt stream, etc. Sometimes
it actually works.

It seems that the fix is simply this:

------------------------------------
--- src/modules/internet/libcurl.c~
+++ src/modules/internet/libcurl.c
@@ -762,6 +762,7 @@
      void *newbuf = realloc(ctxt->buf, newbufsize);
      if (!newbuf) error("Failure in re-allocation in rcvData");
      ctxt->buf = newbuf; ctxt->bufsize = newbufsize;
+    ctxt->current = ctxt->buf;
  }

  memcpy(ctxt->buf + ctxt->filled, ptr, add);
------------------------------------
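To illustrate why the extra assignment matters, here is a minimal, self-contained sketch. It is not the R source: RecvBuf, recv_append() and recv_consume() are hypothetical names, and only the fields buf, current, filled and bufsize mirror the patch. The point is that realloc() may move the block, so a read pointer derived from the old buf has to be re-derived after the buffer grows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the connection context. */
typedef struct {
    char  *buf;      /* start of the heap buffer */
    char  *current;  /* next unread byte; always points into buf */
    size_t filled;   /* number of unread bytes starting at current */
    size_t bufsize;  /* allocated size of buf */
} RecvBuf;

/* Append `add` bytes from `ptr`, growing the buffer if needed.
   Returns 0 on success, -1 if realloc() fails. */
int recv_append(RecvBuf *ctxt, const void *ptr, size_t add)
{
    assert(ctxt->bufsize > 0);

    /* Move unread data to the front; the regions can overlap. */
    if (ctxt->filled) memmove(ctxt->buf, ctxt->current, ctxt->filled);
    ctxt->current = ctxt->buf;

    if (ctxt->filled + add > ctxt->bufsize) {
        size_t newbufsize = ctxt->bufsize;
        while (newbufsize < ctxt->filled + add) newbufsize *= 2;
        char *newbuf = realloc(ctxt->buf, newbufsize);
        if (!newbuf) return -1;  /* old block is still valid */
        ctxt->buf = newbuf; ctxt->bufsize = newbufsize;
        /* realloc() may have moved the block; without this line,
           current could still point into the freed old block. */
        ctxt->current = ctxt->buf;
    }

    memcpy(ctxt->buf + ctxt->filled, ptr, add);
    ctxt->filled += add;
    return 0;
}

/* Hand up to n unread bytes to the reader and advance current. */
size_t recv_consume(RecvBuf *ctxt, void *out, size_t n)
{
    if (n > ctxt->filled) n = ctxt->filled;
    memcpy(out, ctxt->current, n);
    ctxt->current += n;
    ctxt->filled -= n;
    return n;
}
```

Whether realloc() actually moves the block depends on the allocator and on the sizes involved, which would be consistent with the failure being intermittent.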

Best,
Gabor

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Memory error in the libcurl connection code

Martin Maechler
>>>>> Gábor Csárdi
>>>>>     on Wed, 22 Jan 2020 22:56:17 +0000 writes:

    > Hi All,
    > I think there is a memory error in the libcurl connection code that
    > typically happens when libcurl reads big chunks of data. This
    > potentially affects all code that uses url() with the libcurl download
    > method, which is the default in most builds. In practice it tends to
    > happen more often with HTTP/2 and when the connection is wrapped in a
    > gzcon(). macOS Catalina has a libcurl build with HTTP/2 support, so many
    > users who upgraded macOS are starting to see this.

    > The workaround is to avoid using url(), if you can. If you need an
    > HTTP stream, you can use curl::curl(), which is a drop-in replacement.

    > To reproduce, the easiest is a libcurl build that has HTTP/2 support
    > and a server with HTTP/2 as well, e.g. the cloud mirror:

    > ------------------------------------------------
    > ~ # R --slave -e 'options(internet.info = 0); foo <-
    > readRDS(gzcon(url("https://cran.rstudio.com/src/contrib/Meta/archive.rds")))'
    > *   Trying 13.33.54.118:443...
    > * TCP_NODELAY set
    > * Connected to cran.rstudio.com (13.33.54.118) port 443 (#0)
    > * ALPN, offering h2
    > * ALPN, offering http/1.1
    > * successfully set certificate verify locations:
    > *   CAfile: /etc/ssl/certs/ca-certificates.crt
    > CApath: none
    > * SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
    > * ALPN, server accepted to use h2
    > * Server certificate:
    > *  subject: CN=cran.rstudio.com
    > *  start date: Jul 24 00:00:00 2019 GMT
    > *  expire date: Aug 24 12:00:00 2020 GMT
    > *  subjectAltName: host "cran.rstudio.com" matched cert's "cran.rstudio.com"
    > *  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
    > *  SSL certificate verify ok.
    > * Using HTTP2, server supports multi-use
    > * Connection state changed (HTTP/2 confirmed)
    > * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
    > * Using Stream ID: 1 (easy handle 0x56303c2910e0)
    > > GET /src/contrib/Meta/archive.rds HTTP/2
    > Host: cran.rstudio.com
    > User-Agent: R (3.4.4 x86_64-pc-linux-gnu x86_64 linux-gnu)
    > Accept: */*

    > * Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
    > < HTTP/2 200
    > < content-length: 2483432
    > < date: Wed, 22 Jan 2020 21:22:04 GMT
    > < server: Apache/2.4.39 (Unix)
    > < last-modified: Wed, 22 Jan 2020 17:10:22 GMT
    > < etag: "25e4e8-59cbd998a0360"
    > < accept-ranges: bytes
    > < cache-control: max-age=1800
    > < expires: Wed, 22 Jan 2020 21:52:04 GMT
    > < x-cache: Hit from cloudfront
    > < via: 1.1 6cbe48f9f9ff0c768f29d83804f75d4c.cloudfront.net (CloudFront)
    > < x-amz-cf-pop: MAN50-C1
    > < x-amz-cf-id: WwCQVQz9g8ZP6Az4m4n__h7aUW6vwlg0-AkiCv_DnVfGe10bzaFtfg==
    > < age: 960
    > <
    > * 85 data bytes written
    > Error in readRDS(gzcon(url("https://cran.rstudio.com/src/contrib/Meta/archive.rds")))
    > :
    > reference index out of range
    > * stopped the pause stream!
    > * Connection #0 to host cran.rstudio.com left intact
    > Execution halted
    > ------------------------------------------------

    > Sometimes you get a crash, sometimes a corrupt stream, etc. Sometimes
    > it actually works.

    > It seems that the fix is simply this:

    > ------------------------------------
    > --- src/modules/internet/libcurl.c~
    > +++ src/modules/internet/libcurl.c
    > @@ -762,6 +762,7 @@
    > void *newbuf = realloc(ctxt->buf, newbufsize);
    > if (!newbuf) error("Failure in re-allocation in rcvData");
    > ctxt->buf = newbuf; ctxt->bufsize = newbufsize;
    > +    ctxt->current = ctxt->buf;
    > }

    > memcpy(ctxt->buf + ctxt->filled, ptr, add);
    > ------------------------------------

    > Best,
    > Gabor

Thanks a lot, Gábor!

I can reproduce the problem (on Linux Fedora 30) and confirm
that your patch works.

Moreover, the patch looks "almost obvious", because

        ctxt->current = ctxt->buf

already happens earlier in rcvData(), after a change to ctxt->buf, and so
current should be updated whenever buf is.

A slightly "better" patch just moves that statement down
to after the  if(add) { .. }  clause.
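
As a rough sketch of that variant (again not the actual R source: RecvBuf and recv_append() are hypothetical names, with only the field names taken from the patch), the pointer is reset exactly once, after any reallocation has already happened:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the connection context. */
typedef struct {
    char  *buf;      /* start of the heap buffer */
    char  *current;  /* next unread byte; always points into buf */
    size_t filled;   /* number of unread bytes starting at current */
    size_t bufsize;  /* allocated size of buf */
} RecvBuf;

/* Append `add` bytes from `ptr`. Returns 0 on success, -1 if
   realloc() fails. current is assigned in one place only, after
   the if (add) clause, so it always refers to the final buf. */
int recv_append(RecvBuf *ctxt, const void *ptr, size_t add)
{
    int status = 0;
    assert(ctxt->bufsize > 0);

    /* Move unread data to the front; the regions can overlap. */
    if (ctxt->filled) memmove(ctxt->buf, ctxt->current, ctxt->filled);

    if (add) {
        if (ctxt->filled + add > ctxt->bufsize) {
            size_t newbufsize = ctxt->bufsize;
            while (newbufsize < ctxt->filled + add) newbufsize *= 2;
            char *newbuf = realloc(ctxt->buf, newbufsize);
            if (newbuf) {
                ctxt->buf = newbuf; ctxt->bufsize = newbufsize;
            } else {
                status = -1;  /* old block is untouched and still valid */
            }
        }
        if (status == 0) {
            memcpy(ctxt->buf + ctxt->filled, ptr, add);
            ctxt->filled += add;
        }
    }

    /* The single reset: buf can no longer change below this point. */
    ctxt->current = ctxt->buf;
    return status;
}
```

With only one assignment, there is no second copy of the statement that could be forgotten if the growth logic changes later.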

I'll patch the sources, and will port to 'R 3.6.2 patched'.

Martin
