Reading large files with R

Martin Møller Skarbiniks Pedersen
Hi,

  I am trying to read a YAML file which is not that large (7 GB), and I have
plenty of memory.
However, I get this error:

$  R --version
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

library(yaml)
keys <- read_yaml("/data/gpg/gpg-keys.yaml")

Error in paste(readLines(file), collapse = "\n") :
  result would exceed 2^31-1 bytes

2^31-1 bytes is only 2 GB.

Please advise,

Regards
Martin

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Reading large files with R

Duncan Murdoch-2
On 01/09/2019 3:06 p.m., Martin Møller Skarbiniks Pedersen wrote:

> Hi,
>
>    I am trying to read yaml-file which is not so large (7 GB) and I have
> plenty of memory.
> However I get this error:
>
> $  R --version
> R version 3.6.1 (2019-07-05) -- "Action of the Toes"
> Copyright (C) 2019 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> library(yaml)
> keys <- read_yaml("/data/gpg/gpg-keys.yaml")
>
> Error in paste(readLines(file), collapse = "\n") :
>    result would exceed 2^31-1 bytes
>
> 2^31-1 is only 2GB.
>
> Please advise,
>
> Regards
> Martin

Individual elements in character vectors have a size limit of 2^31-1 bytes.
The read_yaml() function is putting the whole file into one element, and
that's failing.
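
To make the limit concrete, here is a minimal sketch using a tiny temporary file as a stand-in for the real one (the file contents and the variable names are made up for illustration). It shows that readLines() yields one small element per line, while paste(..., collapse = "\n") builds a single element whose byte count can be estimated beforehand and compared with the limit.

```r
# Tiny stand-in for the large file: three lines in a temporary file.
tmp <- tempfile(fileext = ".yaml")
writeLines(c("a: 1", "b: 2", "c: 3"), tmp)

lines <- readLines(tmp)                 # one element per line
whole <- paste(lines, collapse = "\n")  # one element holding the whole file

length(lines)                 # 3
length(whole)                 # 1
nchar(whole, type = "bytes")  # 14 (12 bytes of text plus 2 newlines)

# Bytes the collapsed string would need: all line bytes plus the separators.
# as.numeric() avoids integer overflow when the total is large.
needed <- sum(as.numeric(nchar(lines, type = "bytes"))) + length(lines) - 1

# The per-element limit is the largest R integer, 2^31 - 1.
fits <- needed <= .Machine$integer.max
fits
```

For a 7 GB file the same check would come out FALSE, even though the character vector returned by readLines() is itself within R's limits.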

You probably have a couple of choices:

  - Rewrite read_yaml() so it doesn't try to do that.  This is likely
hard, because most of the work is being done by a C routine, but it's
conceivable you could use the stringi::stri_read_raw function to do the
reading, and convince the C routine to handle the raw value instead of a
character value.

  - Find a way to split up your file into smaller pieces.
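
A rough sketch of the second option, under a strong assumption about the file's layout: that it consists of many top-level entries, each starting on an unindented line, so the text can be cut before any such line. The function name process_yaml_in_pieces, its arguments, and the sample data are hypothetical, not part of the yaml package. Files that are one huge nested entry, or that use multi-line constructs starting in column one, would need a different splitting rule.

```r
# Hypothetical helper: stream a file through a connection and pass it to
# `handle_piece` in pieces that each begin at a top-level (unindented) line.
process_yaml_in_pieces <- function(path, handle_piece, chunk_lines = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  pending <- character(0)
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    pending <- c(pending, lines)
    # Assumed top-level lines: not empty, not indented, not a comment.
    starts <- grep("^[^[:space:]#]", pending)
    # Everything before the last top-level line is complete; emit it and
    # keep the (possibly unfinished) last entry for the next round.
    if (length(starts) > 0L && starts[length(starts)] > 1L) {
      cut <- starts[length(starts)] - 1L
      handle_piece(paste(pending[seq_len(cut)], collapse = "\n"))
      pending <- pending[-seq_len(cut)]
    }
  }
  # Whatever remains at end of file is the final piece.
  if (length(pending) > 0L) handle_piece(paste(pending, collapse = "\n"))
  invisible(NULL)
}

# Small demonstration on made-up data, with a deliberately tiny chunk size.
tmp <- tempfile(fileext = ".yaml")
writeLines(c("a: 1", "b:", "  - x", "  - y", "c: 3"), tmp)

pieces <- list()
collect <- function(txt) pieces[[length(pieces) + 1L]] <<- txt
process_yaml_in_pieces(tmp, collect, chunk_lines = 2L)

length(pieces)  # 3
```

Each piece is an ordinary string well under the size limit, so it could be handed to yaml::yaml.load() one at a time (not run here, since it needs the yaml package) and the results combined afterwards.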

Duncan Murdoch

Re: Reading large files with R

Martin Møller Skarbiniks Pedersen
On Sun, 1 Sep 2019 at 21:53, Duncan Murdoch <[hidden email]>
wrote:

> On 01/09/2019 3:06 p.m., Martin Møller Skarbiniks Pedersen wrote:
> > Hi,
> >
> >    I am trying to read yaml-file which is not so large (7 GB) and I have
> > plenty of memory.
>
> Individual elements in character vectors have a size limit of 2^31-1 bytes.
> The read_yaml() function is putting the whole file into one element, and
> that's failing.
>
Oh, I didn't know that. But OK, why would anyone create a single
character string that big ...

> You probably have a couple of choices:
>
>   - Rewrite read_yaml() so it doesn't try to do that.  This is likely
> hard, because most of the work is being done by a C routine, but it's
> conceivable you could use the stringi::stri_read_raw function to do the
> reading, and convince the C routine to handle the raw value instead of a
> character value.
>

I actually might do that in the future.

>   - Find a way to split up your file into smaller pieces.
>

Yes, that will be my first approach. Most YAML is easier to parse without
pasting all the lines together (crazy!).


> Duncan Murdoch
>

Thanks for pointing me in the right direction.

/Martin
