

Hello!
I have a very large matrix of results: 50000 x 100000. I saved it as RDS, but I would also need to save it as txt or csv. Is there a way to do it? Right now, with write.table, I am receiving an error:
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
long vectors not supported yet: io.c:1116
Please, help! Many thanks!
PM
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Have you tried write.csv() or write.matrix() (the latter from the MASS package)? I really don't know, but
they may be more efficient than write.table() with large matrices.
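For what it's worth, MASS::write.matrix also takes a blocksize argument that writes the matrix a block of rows at a time rather than all at once; a minimal sketch (the chunk size and temp file are just for illustration):

```r
library(MASS)  # write.matrix() lives in the MASS package

m <- matrix(runif(100 * 10), nrow = 100)
f <- tempfile(fileext = ".txt")

# blocksize = 25 formats and writes the matrix 25 rows at a time
write.matrix(m, file = f, sep = "\t", blocksize = 25)

length(readLines(f))  # 100 lines: one per row (no header for an unnamed matrix)
```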
Jean
On Tue, Oct 29, 2013 at 2:27 PM, Petar Milin <[hidden email]> wrote:
> Hello!
> I have a very large matrix of results: 50000x100000. I saved it as RDS,
> but I would also need to save it as txt or csv. Is there a way to do it?
> Now, with write.table I am receiving an error:
> Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
> long vectors not supported yet: io.c:1116
>
> Please, help! Many thanks!
>
> PM


Hello,
You can use the append = TRUE argument to write.csv or write.table to
write the matrix in chunks. Something like the following:
bigwrite <- function(x, file, rows = 1000L, ...) {
    passes <- NROW(x) %/% rows
    remaining <- NROW(x) %% rows
    k <- 1L
    # first chunk: let write.table emit the column names
    write.table(x[k:rows, ], file, row.names = FALSE, ...)
    k <- k + rows
    for (i in seq_len(passes)[-1]) {
        write.table(x[k:(rows * i), ], file, append = TRUE,
                    row.names = FALSE, col.names = FALSE, ...)
        k <- k + rows
    }
    if (remaining > 0)
        write.table(x[k:NROW(x), ], file, append = TRUE,
                    row.names = FALSE, col.names = FALSE, ...)
}

f <- "temp"
m <- matrix(0, 50012, 10)

bigwrite(m, f, sep = ",")  # Use 'sep' to get a csv file
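The chunked-append idea can be verified on a small matrix first; in this sketch the chunk size and temp file are arbitrary:

```r
# Write a 25 x 4 matrix in chunks of 10 rows with append = TRUE,
# then check that the file has one header line plus 25 data rows.
m <- matrix(seq_len(25 * 4), nrow = 25)
f <- tempfile(fileext = ".csv")
chunks <- split(seq_len(nrow(m)), ceiling(seq_len(nrow(m)) / 10))

# first chunk writes the header; later chunks append data rows only
write.table(m[chunks[[1]], ], f, sep = ",", row.names = FALSE)
for (idx in chunks[-1])
    write.table(m[idx, , drop = FALSE], f, sep = ",", append = TRUE,
                row.names = FALSE, col.names = FALSE)

length(readLines(f))  # 26
```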
Hope this helps,
Rui Barradas
On 29-10-2013 19:27, Petar Milin wrote:
> Hello!
> I have a very large matrix of results: 50000x100000. I saved it as RDS, but I would also need to save it as txt or csv. Is there a way to do it? Now, with write.table I am receiving an error:
> Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
> long vectors not supported yet: io.c:1116
>
> Please, help! Many thanks!
>
> PM


On 29/10/2013 20:42, Rui Barradas wrote:
> Hello,
>
> You can use the argument to write.csv or write.table append = TRUE to
> write the matrix in chunks. Something like the following.
That was going to be my suggestion. But the reason long vectors have not
been implemented there is that it is rather implausible to be useful. A text
file with the values of such a numeric matrix is likely to be 100GB.
What are you going to do with such a file? For transfer to another
program I would seriously consider a binary format (e.g. use writeBin),
as it is the conversion to and from text that is time-consuming.
Some experiments suggest that it would take hours to write and at least
an hour to read such a file [*], on a very fast machine with a
state-of-the-art SSD.
[*] a file with reasonable-precision real numbers, not zeroes.
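A round trip through writeBin/readBin might look like the sketch below. Note the caveats: the dimensions and byte order are not stored in the file and must be agreed with the consuming program, and a matrix as large as the poster's would itself have to be written a block of rows at a time.

```r
# Round-trip a small numeric matrix through a raw binary file.
m <- matrix(rnorm(4 * 5), nrow = 4)

fname <- tempfile(fileext = ".bin")
con <- file(fname, "wb")
writeBin(as.vector(m), con)  # column-major doubles, 8 bytes each
close(con)

con <- file(fname, "rb")
m2 <- matrix(readBin(con, what = "double", n = 4 * 5), nrow = 4)
close(con)

all.equal(m, m2)  # TRUE (the reader must already know the dimensions)
```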

Brian D. Ripley, [hidden email]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595


Hello,
On Oct 29, 2013, at 10:16 PM, Prof Brian Ripley <[hidden email]> wrote:
> On 29/10/2013 20:42, Rui Barradas wrote:
>> Hello,
>>
>> You can use the argument to write.csv or write.table append = TRUE to
>> write the matrix in chunks. Something like the following.
>
> That was going to be my suggestion. But the reason long vectors have not been implemented there is that it is rather implausible to be useful. A text file with the values of such a numeric matrix is likely to be 100GB. What are you going to do with such a file? For transfer to another program I would seriously consider a binary format (e.g. use writeBin), as it is the conversion to and from text that is time-consuming.
I need to submit it to a cluster analysis (k-means). An independent source advised me to use a k-means algorithm written in C, which is very fast and efficient. It asks for a txt file as input.
I tried a few options in R, where I am more comfortable, but a solution never came, even after many hours.
Thanks!
Best,
PM


Hi Petar,
If you're going to share this matrix across R sessions, save()/load() is
probably one of your best options.
Otherwise, you could try the rhdf5 package from Bioconductor:
1. Install the package with:
source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")
2. Then:
library(rhdf5)
h5createFile("my_big_matrix.h5")

# write a matrix
my_big_matrix <- matrix(runif(5000*10000), nrow=5000)
attr(my_big_matrix, "scale") <- "liter"
h5write(my_big_matrix, "my_big_matrix.h5", "my_big_matrix")  # takes 1 min.
# file size on disk is 248M

# read a matrix
my_big_matrix <- h5read("my_big_matrix.h5", "my_big_matrix")  # takes 7.4 sec.
Multiply the above numbers (obtained on a laptop with a traditional
hard drive) by 100 for your monster matrix, or by less if you have super-fast
I/O.
Two advantages of using the HDF5 format: (1) it should not be too hard to use
the HDF5 C library in the C code you're going to use to read the matrix,
and (2) my understanding is that HDF5 is good at letting you access
arbitrary slices of the data, so chunk-processing should be easy and
efficient: http://www.hdfgroup.org/HDF5/
Cheers,
H.
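The slice access mentioned above can be sketched with h5read's index argument (assuming rhdf5 is installed; the dataset name "m" is just for illustration):

```r
library(rhdf5)

f <- tempfile(fileext = ".h5")
h5createFile(f)
m <- matrix(1:20, nrow = 4)
h5write(m, f, "m")

# Read only rows 2:3 of column 5, without loading the whole dataset:
slice <- h5read(f, "m", index = list(2:3, 5))
H5close()
slice  # same values as m[2:3, 5]
```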

Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
Email: [hidden email]
Phone: (206) 667-5791
Fax: (206) 667-1319

