Reading large data matrix in R

Reading large data matrix in R

Madhavan BL
Dear All,

Greetings. I am a new user of R programming.

I have a large ASCII data file with 14 columns and 45000 rows. I tried to
load the file using read.table(), scan(), and similar functions, but could
not load it properly. Missing values in my data are denoted by "NaN", and I
want to exclude the rows that contain them. The file has no column header,
but I would like to use column labels in plots. All values are floats.

Can anyone give me some short examples of how to open my file?

Looking forward to your support,

Thanking you in advance,
Regards,
Madhavan

​-----------------------------------------------​

Bomidi Lakshmi Madhavan, Ph.D
Remote Sensing of Atmospheric Processes
Leibniz Institute for Tropospheric Research (TROPOS)
Permoserstraße 15, D-04318 Leipzig, Germany
Phone: +49 (0)341 2717-7187 (Office), +49 (0)1578 8467548 (Mobile)
E-Mail: [hidden email]
Skype ID: blmadhavan, Web: http://sat.tropos.de

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Reading large data matrix in R

jholtman
Can you at least attach a sample of what your data file looks like?  That
is not that large a file, and how to read it depends on exactly how it is
formatted.  Is it a CSV, tab, space, semicolon, etc. separated file?  What
exactly have you tried and what were the results?  A sample of the data
would help to determine how you might want to read it.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: Reading large data matrix in R

David Winsemius
In reply to this post by Madhavan BL

On Jan 9, 2015, at 6:44 AM, Madhavan BL wrote:

> Dear All,
>
> Greetings. I am a new user of R programming.
>
> I have a large ASCII data file with 14 columns and 45000 rows. I tried to
> load the file using read.table(), scan(), etc. functions, but failed to
> load the data file properly. Missing values in my data are denoted by
> "NaN". I also want to exclude these lines. I don't have a column header but
> I would like to include the labels in plotting.

What are these "labels" you are referring to?

> All the data is in floats.
>
> Can anyone give me some short examples to open my file?

The default settings for `read.table` assume that there is no header row and that `NA` is the missing value indicator. See ?read.table for more on the default settings. If `NaN` marks your missing values, include an na.strings="NaN" argument in the `read.table` call.

You would need to post-process the result to exclude rows. There is no interval-skipping mechanism that I know about.

`scan` assumes all lines have the same structure so answering the question about "labels" would be necessary for any advice about its use here.
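
The advice above can be sketched roughly as follows. This is only an illustration: the three-column temporary file and the names in `labels` are made up for the example (the real file has 14 columns and would need 14 labels), and a whitespace-separated layout is assumed because the thread never states the separator.

```r
# Write a tiny whitespace-separated sample file.
# This is a hypothetical stand-in for the real 14-column data file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("0.0 1.5 2.5",
             "0.1 NaN 2.7",
             "0.2 1.7 2.9"), tmp)

# Hypothetical column labels; the file itself has no header row.
labels <- c("Time", "raw", "d5s")

# Read the file, treating the string "NaN" as missing and attaching labels.
dat <- read.table(tmp, header = FALSE, na.strings = "NaN",
                  col.names = labels)

# Post-process: keep only the rows that have no missing values.
clean <- dat[complete.cases(dat), ]
print(clean)
```

Supplying `col.names` at read time means the labels are available later for plotting without a separate `colnames<-` step.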



> [[alternative HTML version deleted]]

I suggest you investigate how to configure your mail-client to respond in plain text.


David Winsemius
Alameda, CA, USA


Re: Reading large data matrix in R

Bert Gunter
You really need to read ?xyplot and probably also an R tutorial such as

cran.r-project.org/doc/manuals/R-intro.pdf

(which also ships with R) to learn about S3 methods.
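
As a rough illustration of why S3 dispatch matters for the error quoted further down in this message: a generic such as xyplot() chooses its method from the class of its first argument, and lattice provides a method for "ts" objects (see ?xyplot.ts) but none for a plain matrix. The sketch below uses only base R and a made-up 4 x 2 matrix; it shows that selecting rows of a "ts" object returns a plain matrix, which is one plausible reading of how the object in the quoted code lost its "ts" class.

```r
# Made-up data: 4 observations of 2 series, one missing value.
m <- matrix(c(1, NaN, 3, 4,
              5, 6,   7, 8), ncol = 2)

x <- ts(m)
inherits(x, "ts")            # TRUE: x is a (multivariate) time series

# Row subsetting with an index drops the time-series attributes.
y <- x[complete.cases(x), ]
inherits(y, "ts")            # FALSE: y is now a plain numeric matrix
class(y)

# Wrapping the result in ts() again restores the class.
# Note that the time index restarts at 1 with unit spacing.
z <- ts(y)
inherits(z, "ts")            # TRUE
```

Calling class() on an object before passing it to a generic is a quick way to check which method, if any, will be selected.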

R is a language that requires an investment in time and effort to
learn. If you are unwilling to make that investment, use other
software.


Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Mon, Jan 12, 2015 at 3:17 AM, Madhavan BL <[hidden email]> wrote:

> Hello,
>
> While I could read my data with read.table(), I was unable to produce the
> simple line plots and the horizon plot using my code below. I get the
> following error messages:
>
> > xyplot(wdj, scales = list(y= "same"))
> Error in UseMethod("xyplot") :
>   no applicable method for 'xyplot' applied to an object of class
>   "c('matrix', 'double', 'numeric')"
> > horizonplot(wdj, layout = c(1,12), colorkey = TRUE)
> Error in UseMethod("xyplot") :
>   no applicable method for 'xyplot' applied to an object of class
>   "c('matrix', 'double', 'numeric')"
>
> I tried to check my data type by typeof(wdj) which indicates that my
> data is "double". Can anyone help me where I am going wrong? I also
> attach my data file for your convenience along with the code I am
> using below.
>
>
> Below is my R-code for plotting Horizon plot for wavelet details at
> different averaging scales
>
> # Clear out the workspace
> rm(list=ls())
>
> # store the current directory
> initial.dir<-getwd()
>
> # load the necessary libraries
> require(lattice)
> require(latticeExtra)
>
> # load the dataset
> infile <- "pyr43_rsds_wdj_20130413.txt"
> datasets <- ts(read.table(infile, quote="\"", na.strings="NaN"))
>
> # Remove rows with NaN values
> datasets <- datasets[complete.cases(datasets), ]
> time <- datasets[,1]  # Time in UTC hours
> ws0 <- datasets[,2]  # raw irradiance signal at 1 second resolution
> wdj <- datasets[,3:14]  # wavelet details at different averaging scales
>
> # Column names of the dataset
> # colnames(datasets) <-
> #   c("Time","1s","5s","10s","20s","40s","1m20s","2m40s","5m20s","10m40s","21m20s","42m40s","1h25m20s","2h50m40s")
> colnames(wdj) <-
>   c("5s","10s","20s","40s","1m20s","2m40s","5m20s","10m40s","21m20s","42m40s","1h25m20s","2h50m40s")
>
> # Simple line plot
> xyplot(wdj, scales = list(y= "same"))
>
> # panel with different origin and scale
> horizonplot(wdj, layout = c(1,12), colorkey = TRUE)
>
>
> Thanking you in advance,
> With regards,
> Madhavan


Re: Reading large data matrix in R

Sarah Goslee
In reply to this post by David Winsemius
I'm not at all clear on what you want your plots to look like, but I
would recommend reading the help for xyplot and trying out the
examples. In general, you need to pass xyplot() a formula that
describes what you want your plots to contain, and then a data
argument with the data for doing so.

?xyplot is very long, but do take a look. Then if you can't get your
code working, a detailed description of what you're trying to achieve
would be useful.
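
A minimal sketch of the formula-plus-data form described above, assuming the lattice package is installed. The data frame and its column names (`time`, `s5`, `s10`) are invented for illustration and are not the poster's data.

```r
library(lattice)

# Hypothetical data: a time axis and two signal columns.
df <- data.frame(time = 1:10,
                 s5   = sin(1:10),
                 s10  = cos(1:10))

# The formula says: plot s5 and s10 against time.
# outer = TRUE gives each series its own panel;
# scales = list(y = "same") makes the panels share one y-axis range.
p <- xyplot(s5 + s10 ~ time, data = df, type = "l",
            outer = TRUE, layout = c(1, 2),
            scales = list(y = "same"))

# xyplot() returns a "trellis" object; print(p) draws it.
class(p)
```

With more columns, the left-hand side of the formula grows accordingly, or the data can be reshaped to long format and conditioned on a grouping variable.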

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
