Incremental ReadLines

Gene Leynes
I've been trying to figure out how to read in a large file for a few days
now, and after extensive research I'm still not sure what to do.

I have a large comma-delimited text file that contains 59 fields in each
record. There is also a header line every 121 records.

This function works well for smallish files:

getcsv <- function(fname){
    ff <- file(description = fname)
    x <- readLines(ff)
    close(ff)                   # close this connection (rather than all of them)
    x <- x[x != ""]             # REMOVE BLANKS
    x <- x[grep("^[-0-9]", x)]  # REMOVE ALL TEXT

    spl <- strsplit(x, ",")     # THIS PART IS SLOW, BUT MANAGEABLE

    xx <- t(sapply(spl, function(s) as.vector(na.omit(as.numeric(s)))))
    return(xx)
}
It's not elegant, but it works.
For 121,000 records it completes in 2.3 seconds
For 121,000*5 records it completes in 63 seconds
For 121,000*10 records it doesn't complete

When I try other methods to read the file in chunks (using scan), the
process breaks down because I have to start at the beginning of the file on
every iteration.
For example:
fnn <- function(n, col){
    a <- 122*(n-1) + 2
    xx <- scan(fname, skip = a-1, nlines = 121, sep = ",",
               quiet = TRUE, what = character(0))
    xx <- xx[xx != ""]
    xx <- matrix(xx, ncol = 49, byrow = TRUE)
    xx[, col]
}
system.time(sapply(1:10, fnn, col = 26))     # 0.31 seconds
system.time(sapply(91:100, fnn, col = 26))   # 1.09 seconds
system.time(sapply(901:910, fnn, col = 26))  # 5.78 seconds

Even though I'm only getting the 26th column for 10 sets of records, it
takes a lot longer the further into the file I go.

How can I tell scan to pick up where it left off, without starting at the
beginning again? There must be a good example somewhere.

I have done a lot of research (in fact, thank you to Michael J. Crawley and
others for your help thus far).

Thanks,

Gene

Re: Incremental ReadLines

Duncan Murdoch
On 11/2/2009 2:03 PM, Gene Leynes wrote:
> I've been trying to figure out how to read in a large file for a few days
> now, and after extensive research I'm still not sure what to do.
>
> I have a large comma delimited text file that contains 59 fields in each
> record.
> There is also a header every 121 records

You can open the connection before reading, then read in blocks of lines
and process those.  You don't need to reopen it every time.  For example,

ff <- file(fname, open="rt")  # rt is read text
for (block in 1:nblocks) {
   x <- readLines(ff, n=121)
   # process this block
}
close(ff)
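
The same idea extends to processing the whole file in one pass. A sketch
applied to the file from the original post (the 122-line blocking, the
numeric-line filter, and fname come from that post; the chunk assembly and
the assumption that blank lines don't break the 122-line rhythm are
illustrative):

ff <- file(fname, open="rt")
chunks <- list()
repeat {
   x <- readLines(ff, n=122)        # one header plus 121 records
   if (length(x) == 0) break        # stop at end of file
   x <- x[x != ""]                  # remove blanks
   x <- x[grep("^[-0-9]", x)]       # remove the header text
   spl <- strsplit(x, ",")
   chunks[[length(chunks) + 1]] <-
      t(sapply(spl, function(s) as.vector(na.omit(as.numeric(s)))))
}
close(ff)
xx <- do.call(rbind, chunks)        # one numeric matrix for the whole file

Because the connection stays open, each readLines() call resumes where the
previous one stopped, so the cost per block stays constant instead of
growing with the offset into the file.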

Duncan Murdoch

Re: Incremental ReadLines

James W. MacDonald
In reply to this post by Gene Leynes
Hi Gene,

Rather than using R to parse this file, have you considered using either
grep or sed to pre-process the file and then read it in?

It looks like you just want lines starting with numbers, so something like

grep '^[0-9]\+' thefile.csv > otherfile.csv

should be much faster, and then you can just read in otherfile.csv using
read.csv().
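
If grep is available, the same filter can also be driven from inside R
through a pipe() connection, skipping the intermediate file (a sketch under
that assumption; the file name is the one from the shell command above):

dat <- read.csv(pipe("grep '^[0-9]' thefile.csv"), header=FALSE)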

Best,

Jim




--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

Re: Incremental ReadLines

Gene Leynes
James,

I think those are Unix commands?  I'm on Windows, so that's not an option
(for now).

Also, the suggestions posed by Duncan and Phil seem to be working. Thank you
so much; such a simple thing, adding the "r" or "rt" to the file connection.

I had read about blocking, but I didn't imagine that it meant "chunks".  I
was thinking of something more like "blocking out", or guarding (perhaps for
security).

Re: Incremental ReadLines

Jens Oehlschlägel
In reply to this post by Gene Leynes
Gene,

You might want to look at the function read.csv.ffdf from package ff, which can read large csv files into an ffdf object. That is a kind of data.frame that is stored on disk (or rather in the file-system cache). Once you subscript part of it, you get a regular data.frame.
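
A minimal sketch, assuming the interspersed header lines have been stripped
out beforehand (read.csv.ffdf expects a regular csv file) and that fname is
the file from the original post:

library(ff)
big <- read.csv.ffdf(file=fname, header=FALSE)  # data stay on disk
chunk <- big[1:121, ]   # subscripting a part returns a regular data.frame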


Jens Oehlschlägel

Re: Incremental ReadLines

Gabor Grothendieck
In reply to this post by Gene Leynes
If the header lines all start with the same letter, say "A", and the data
lines contain only numbers, then just use

read.table(..., comment.char = "A")
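
A sketch of a complete call under those assumptions (fname is the
comma-delimited file from the original post):

dat <- read.table(fname, sep=",", comment.char="A",
                  header=FALSE, blank.lines.skip=TRUE)

Lines beginning with "A" then never reach the parser, and blank lines are
skipped by default.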



Re: Incremental ReadLines

Freds
Hi there,

I am having a similar problem reading in a large text file with around
550,000 observations, each with 10 to 100 lines of description. I am trying
to parse it in R, but I have trouble with the size of the file: it seems to
slow down dramatically at some point. I would be happy for any suggestions.
Here is my code, which works fine when I run it on a subsample of my dataset.

#Defining datasource
file <- "filename.txt"

#Creating placeholder for data and assigning column names
data <- data.frame(Id=NA)

#Starting by case = 0
case <- 0

#Opening a connection to data
input <- file(file, "rt")

#Going through cases
repeat {
  line <- readLines(input, n=1)
  if (length(line)==0) break
  if (length(grep("Id:",line)) != 0) {
    case <- case + 1 ; data[case,] <-NA
    split_line <- strsplit(line,"Id:")
    data[case,1] <- as.numeric(split_line[[1]][2])
    }
}

#Closing connection
close(input)

#Saving dataframe
write.csv(data,'data.csv')


Kind regards,


Frederik

Re: Incremental ReadLines

Mike Marchywka

On Wed, 13 Apr 2011, Freds wrote:

> I am having a similar problem reading in a large text file with around
> 550,000 observations, each with 10 to 100 lines of description. I am
> trying to parse it in R, but I have trouble with the size of the file: it
> seems to slow down dramatically at some point.

This probably occurs when you run out of physical memory, but you can
verify that by looking at the task manager. A one-line-at-a-time
readLines() approach doesn't fit R well: you want to hand R blocks of data
so that inner loops, implemented largely in native code, can operate
efficiently. The thing you want is a data structure that can use disk more
effectively and hide these details from you and your algorithm. This works
best if the algorithm works with the data structure to avoid lots of disk
thrashing. You could imagine a "read" that does nothing until each item is
needed, but often people want the whole file validated before processing,
and lots of details come up with exception handling as you get fancy here.
Note of course that your parse output could be stored in a hash or
something representing a DOM, and this could get arbitrarily large. Since
such a structure is designed for random access, it may cause lots of
thrashing if it is partially on disk. Anything you can do to make access
patterns more regular, for example sorting your data, would help.



Re: Incremental ReadLines

Freds
Hi Mike,

Thanks for your comment.

I must admit that I am very new to R, and although what you write sounds
interesting, I have no idea where to start. Can you give some functions or
examples that show how it can be done?

I was under the impression that I had to use a loop, since my blocks of
observations are of varying length.

Thanks again,

Frederik


Re: Incremental ReadLines

William Dunlap
In reply to this post by Freds
I have two suggestions to speed up your code, if you
must use a loop.

First, don't grow your output dataset at each iteration.
Instead of
     cases <- 0
     output <- numeric(cases)
     while(length(line <- readLines(input, n=1))==1) {
        cases <- cases + 1
        output[cases] <- as.numeric(line)
     }
preallocate the output vector to be about the size of
its eventual length (slightly bigger is better), replacing
     output <- numeric(0)
with the likes of
     output <- numeric(500000)
and when you are done with the loop trim down the length
if it is too big
     if (cases < length(output)) length(output) <- cases
Growing your dataset in a loop can cause quadratic or worse
growth in time with problem size and the above sort of
code should make the time grow linearly with problem size.

Second, don't do data.frame subscripting inside your loop.
Instead of
     data <- data.frame(Id=numeric(cases))
     while(...) {
         data[cases, 1] <- newValue
     }
do
     Id <- numeric(cases)
     while(...) {
         Id[cases] <- newValue
     }
     data <- data.frame(Id = Id)
This is just the general principle that you don't want to
repeat the same operation over and over in a loop.
dataFrame[i,j] first extracts column j, then extracts element
i from that column.  Since the column is the same on every iteration,
you may as well extract the column outside of the loop.

Avoiding the loop altogether is the fastest.  E.g., the code
you showed does the same thing as
   idLines <- grep(value=TRUE, "Id:", readLines(file))
   data.frame(Id = as.numeric(sub("^.*Id:[[:space:]]*", "", idLines)))
You can also use an external process (perl or grep) to filter
out the lines that are not of interest.
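
Applied to the loop from the earlier post, the two suggestions combined
look roughly like this (a sketch; 500000 is only a guessed upper bound on
the number of cases, and the file name is the placeholder from that post):

input <- file("filename.txt", "rt")
Id <- numeric(500000)                # preallocate; trim afterwards
case <- 0
repeat {
    line <- readLines(input, n=1)
    if (length(line) == 0) break
    if (grepl("Id:", line)) {
        case <- case + 1
        # plain vector assignment; no data.frame subscripting in the loop
        Id[case] <- as.numeric(strsplit(line, "Id:")[[1]][2])
    }
}
close(input)
if (case < length(Id)) length(Id) <- case   # drop the unused tail
data <- data.frame(Id = Id)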


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  


Re: Incremental ReadLines

Freds
Hi Bill,

Thank you so much for your suggestions. I will try to alter my code.

Regarding the even shorter solution without the loop: it looks good, but my
problem is that not all observations have the same variables, so three
different observations might look like this:


Id: 1
Var1: false
Var2: 6
Var3: 8

Id: 2
missing

Id: 3
Var1: true
3 4 5
Var2: 7
Var3: 3


To do it without looping through, I thought my data had to be quite
systematic, which it is not. I might be wrong though.


Thanks again,


Frederik



Re: Incremental ReadLines

Mike Marchywka
In reply to this post by Freds

On Thu, 14 Apr 2011, Frederik wrote:

> I must admit that I am very new to R, and although what you write sounds
> interesting, I have no idea where to start. Can you give some functions
> or examples that show how it can be done?

I'm not sure I have a good R answer; I'm simply pointing out the likely
issue, and maybe the rest belongs on the r-developer list or something. If
you can determine that you are running out of physical memory, then you
either need to partition something or make accesses more regular. My
favorite example from personal experience is sorting a data set prior to
piping it into a C++ program, which changed the execution time
substantially by avoiding VM thrashing. R either needs a swapping buffer
or has an equivalent that someone else could mention.



Re: Incremental ReadLines

William Dunlap
In reply to this post by Freds
Frederik Lang wrote:

> Regarding the even shorter solution without the loop: it looks good, but
> my problem is that not all observations have the same variables.
> Doing it without looping through, I thought my data had to be quite
> systematic, which it is not.

Doing the simple preallocation that I describe should speed it up
a lot with very little effort.  It is more work to manipulate the
columns one at a time instead of using data.frame subscripting and
it may not be worth it if you have lots of columns.

If you have a lot of this sort of file and feel that it will be worth
the programming time to do something fancier, here is some code that
reads lines of the form

> cat(lines, sep="\n")
Id: First
  Var1: false
  Var2: 6
  Var3: 8

Id: Second
Id: Last
  Var1: true
  Var3: 8

and produces a matrix with the Id's along the rows and the Var's
along the columns:

> f(lines)
       Var1    Var2 Var3
First  "false" "6"  "8"
Second NA      NA   NA
Last   "true"  NA   "8"

The function f is:

f <- function (lines)
{
    # keep only lines with colons
    lines <- grep(value = TRUE, "^.+:", lines)
    lines <- gsub("^[[:space:]]+|[[:space:]]+$", "", lines)
    isIdLine <- grepl("^Id:", lines)
    group <- cumsum(isIdLine)
    rownames <- sub("^Id:[[:space:]]*", "", lines[isIdLine])
    lines <- lines[!isIdLine]
    group <- group[!isIdLine]
    varname <- sub("[[:space:]]*:.*$", "", lines)
    value <- sub(".*:[[:space:]]*", "", lines)
    colnames <- unique(varname)
    col <- match(varname, colnames)
    retval <- array(NA_character_,
                    dim = c(length(rownames), length(colnames)),
                    dimnames = list(rownames, colnames))
    retval[cbind(group, col)] <- value
    retval
}

The main trick is the matrix subscript given to retval on the
penultimate line.
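
To run it on the file from the earlier posts (a sketch; the file name is
the placeholder used there):

lines <- readLines("filename.txt")
result <- f(lines)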


Re: Incremental ReadLines

Freds
Hi again,

Changing my code by defining vectors outside the loop and combining them
afterwards helped a lot: the code no longer slows down, and I was able to
parse the file in less than 2 hours. Not fantastic, but it works.

Next time I have to parse a large file I will use William's last
suggestion and parse it without looping through.

Many thanks for your help!

Frederik
