How to save very large matrix?


Petar Milin
Hello!
I have a very large matrix of results: 50000x100000. I saved it as RDS, but I also need to save it as a .txt or .csv file. Is there a way to do this? With write.table() I get the following error:
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol,  :
  long vectors not supported yet: io.c:1116

Please help! Many thanks!

PM
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to save very large matrix?

Adams, Jean
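Have you tried write.csv() or write.matrix() (from the MASS package)? I really
don't know, but they may be more efficient than write.table() with large
matrices. Note that write.csv() calls write.table() internally, so it will
probably hit the same limit.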

Jean



Re: How to save very large matrix?

Rui Barradas
Hello,

You can use write.table() with the argument append = TRUE to write the
matrix in chunks. (write.csv() ignores attempts to change append, so use
write.table() with sep = "," to get a CSV file.) Something like the following.



# Write a large matrix to a text file in chunks of 'rows' rows, so that
# no single write.table() call has to handle a long vector.
bigwrite <- function(x, file, rows = 1000L, ...){
        n <- NROW(x)
        # first row of each chunk (empty if x has no rows)
        starts <- seq(1L, by = rows, length.out = ceiling(n / rows))
        for(k in starts){
                idx <- k:min(k + rows - 1L, n)   # rows in this chunk
                first <- k == 1L
                # drop = FALSE keeps a one-row chunk as a row, not a column
                write.table(x[idx, , drop = FALSE], file,
                            append = !first,     # overwrite on the first chunk, then append
                            row.names = FALSE,
                            col.names = first,   # write the header only once
                            ...)
        }
        invisible(file)
}

f <- "temp"
m <- matrix(0, 50012, 10)

bigwrite(m, f, sep = ",")  # Use 'sep' to get a csv file



Hope this helps,

Rui Barradas



Re: How to save very large matrix?

Prof Brian Ripley
On 29/10/2013 20:42, Rui Barradas wrote:
> Hello,
>
> You can use the argument to write.csv or write.table  append = TRUE to
> write the matrix in chunks. Something like the following.

That was going to be my suggestion. But the reason long vectors have not
been implemented is that it is rather implausible for them to be useful. A text
file with the values of such a numeric matrix is likely to be 100GB.
What are you going to do with such a file?  For transfer to another
program I would seriously consider a binary format (e.g. use writeBin),
as it is the conversion to and from text that is time consuming.
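A minimal sketch of the binary route, assuming the receiving program can read raw 8-byte doubles in column-major order. The helper names write_matrix_bin() and read_matrix_bin() are hypothetical; only file(), writeBin() and readBin() are base R. Values are written in the platform's native byte order (writeBin()'s default), and the dimensions are not stored in the file, so the reader has to know them.

```r
# Hypothetical helper: write a double matrix as raw 8-byte values in
# column-major order, one block of columns per writeBin() call.
write_matrix_bin <- function(x, path, cols = 1000L) {
  stopifnot(is.matrix(x), is.double(x))
  con <- file(path, open = "wb")
  on.exit(close(con))
  nc <- ncol(x)
  for (k in seq(1L, by = cols, length.out = ceiling(nc / cols))) {
    idx <- k:min(k + cols - 1L, nc)  # columns in this block
    writeBin(as.vector(x[, idx, drop = FALSE]), con, size = 8L)
  }
  invisible(path)
}

# Hypothetical helper: read the values back; the dimensions must be
# supplied because the file holds only the numbers.
read_matrix_bin <- function(path, nrow, ncol) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  v <- readBin(con, what = "double", n = nrow * ncol, size = 8L)
  matrix(v, nrow = nrow, ncol = ncol)
}

# Small round-trip check
set.seed(1)
m <- matrix(runif(20 * 7), nrow = 20)
f <- tempfile(fileext = ".bin")
write_matrix_bin(m, f, cols = 3L)
m2 <- read_matrix_bin(f, nrow(m), ncol(m))
identical(m, m2)  # TRUE
```

For the full 50000 x 100000 matrix such a file would be about 40 GB (5e9 values x 8 bytes). Reading it back in a single readBin() call may not be practical; repeated readBin() calls on the same open connection would follow the same block-wise pattern.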

Some experiments suggest that it would take hours to write and at least
an hour to read such a file[*], on a very fast machine with a
state-of-the-art SSD.

[*] a file with reasonable-precision real numbers, not zeroes.
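A back-of-the-envelope check of the sizes involved, as a rough sketch. The matrix has 5e9 cells, more than .Machine$integer.max, which is why write.table() reports a long-vector error. The text size is estimated by formatting a sample of values with 15 significant digits (sprintf("%.15g")) plus one separator character each; the runif() sample is an assumption standing in for "reasonable-precision real numbers", and zeroes are shown for comparison.

```r
# Number of cells in the 50000 x 100000 matrix
n_values <- 50000 * 100000
n_values > .Machine$integer.max   # TRUE: a "long vector"

# Size as raw 8-byte doubles, in GB
binary_gb <- n_values * 8 / 1e9   # 40

# Estimated text size for real numbers: characters per value at 15
# significant digits, plus 1 for the separator
set.seed(1)
sample_vals <- runif(1e5)
chars_per_value <- mean(nchar(sprintf("%.15g", sample_vals))) + 1
text_gb <- n_values * chars_per_value / 1e9

# Text size for a matrix of zeroes: "0" plus a separator per value
zero_gb <- n_values * (nchar(sprintf("%.15g", 0)) + 1) / 1e9   # 10

c(binary = binary_gb, text = text_gb, zeroes = zero_gb)
```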



--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: How to save very large matrix?

Petar Milin
Hello,

On Oct 29, 2013, at 10:16 PM, Prof Brian Ripley <[hidden email]> wrote:

> On 29/10/2013 20:42, Rui Barradas wrote:
>> Hello,
>>
>> You can use the argument to write.csv or write.table  append = TRUE to
>> write the matrix in chunks. Something like the following.
>
> That was going to be my suggestion. But the reason long vectors have not been implemented is that is rather implausible to be useful.   A text file with the values of such a numeric matrix is likely to be 100GB. What are you going to do with such a file?  For transfer to another program I would seriously consider a binary format (e.g. use writeBin), as it is the conversion to and from text that is time consuming.

I need to run a cluster analysis (k-means) on it. An independent source advised me to use a k-means implementation written in C, which is very fast and efficient. It requires a .txt file as input.

I tried a few options in R, where I am more comfortable, but none of them produced a solution, even after many hours.

Thanks!
Best,
PM

Re: How to save very large matrix?

Hervé Pagès
Hi Petar,

If you're going to share this matrix across R sessions, save()/load() is
probably one of your best options.

Otherwise, you could try the rhdf5 package from Bioconductor:

1. Install the package with:

      install.packages("BiocManager")
      BiocManager::install("rhdf5")

2. Then:

      library(rhdf5)

      h5createFile("my_big_matrix.h5")

      # write a matrix
      my_big_matrix <- matrix(runif(5000*10000), nrow=5000)
      attr(my_big_matrix, "scale") <- "liter"
      h5write(my_big_matrix, "my_big_matrix.h5", "my_big_matrix")  # takes 1 min.
      # file size on disk is 248M

      # read a matrix
      my_big_matrix <- h5read("my_big_matrix.h5", "my_big_matrix")  # takes 7.4 sec.

Multiply the above numbers (obtained on a laptop with a traditional
hard drive) by 100 for your monster matrix, or less if you have super
fast I/O.

Two advantages of using the HDF5 format: (1) it should not be too hard to use
the HDF5 C library in the C code you're going to use to read the matrix,
and (2) my understanding is that HDF5 is good at letting you access
arbitrary slices of the data, so chunk processing should be easy and
efficient:

   http://www.hdfgroup.org/HDF5/

Cheers,
H.



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319
