Subsetting data frame problem....

Milicic B. Marko
Dear R users,

I'm a new but already fascinated R user, so please forgive my
ignorance. I have a problem; I have read most of the help pages but
couldn't find the solution. The problem follows.

I have a large data set with 10,000 rows and more than 100 columns. Say
something like

var1,var2,var3,var4.......var120
-------------------------------------------
12,12,345,657,67,8.....
12,12,345,657,0,8.....
NA,12,345,657,NA,8.....
12,12,NA,657,67,8.....
12,12,345,657,NA,8.....

I would like to select only the rows where no variable is NA, so
I could do something like


df <- subset(df,
             !is.na(var1) & !is.na(var2) & !is.na(var3) &
             !is.na(var4) & !is.na(var5) ......................)


But that would be a very bad solution because I have more than 100
variables, and it would be lengthy code to maintain. It might also
be an error-prone programming style. Am I right?

My question is whether there is some smarter way of doing this that
would work even if I had 1000 variables.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Subsetting data frame problem....

jholtman
?complete.cases
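
For readers following the pointer above, here is a minimal sketch of what complete.cases() returns. The toy data frame is an assumption: it copies the five sample rows from the question, and the column names var1 to var6 are illustrative rather than the poster's real names.

```r
# Toy data frame modelled on the sample rows in the question
# (column names are illustrative, not the poster's real data).
df <- data.frame(
  var1 = c(12, 12, NA, 12, 12),
  var2 = c(12, 12, 12, 12, 12),
  var3 = c(345, 345, 345, NA, 345),
  var4 = c(657, 657, 657, 657, 657),
  var5 = c(67, 0, NA, 67, NA),
  var6 = c(8, 8, 8, 8, 8)
)

# One logical value per row: TRUE when the row contains no NA.
# A 0 is an ordinary value, so the second row still counts as complete.
ok <- complete.cases(df)

# Count the rows that have at least one missing value.
n_incomplete <- sum(!ok)
```

The result is a plain logical vector, so it can be used for counting, as here, or for row indexing, and it does not depend on how many columns the data frame has.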

On Jan 1, 2008 8:50 PM, Marko Milicic <[hidden email]> wrote:

> [...]



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: Subsetting data frame problem....

Ross Darnell
In reply to this post by Milicic B. Marko
You could try


> complete.case.df <- na.omit(df)


Ross Darnell
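
A small sketch of the na.omit() suggestion above, under the assumption of a toy data frame copied from the sample rows in the question (the names var1 to var6 are illustrative). It also shows that na.omit() records which rows it dropped in the "na.action" attribute of its result.

```r
# Toy data frame modelled on the sample rows in the question
# (column names are illustrative, not the poster's real data).
df <- data.frame(
  var1 = c(12, 12, NA, 12, 12),
  var2 = c(12, 12, 12, 12, 12),
  var3 = c(345, 345, 345, NA, 345),
  var4 = c(657, 657, 657, 657, 657),
  var5 = c(67, 0, NA, 67, NA),
  var6 = c(8, 8, 8, 8, 8)
)

# Drop every row that contains at least one NA, in any column.
clean <- na.omit(df)

# The indices of the removed rows are kept as an attribute on the result.
dropped <- as.integer(attr(clean, "na.action"))
```

The surviving rows keep their original row names, which can help when tracing results back to the raw data.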


-----Original Message-----
From: [hidden email] [mailto:[hidden email]]
On Behalf Of Marko Milicic
Sent: Wednesday, 2 January 2008 11:50 AM
To: [hidden email]
Subject: [R] Subsetting data frame problem....

[...]

Re: Subsetting data frame problem....

Simon Blomberg-4
Or use complete.cases

df.complete <- df[complete.cases(df),]

Simon.
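
As a sketch of where the indexing form above may be handy: complete.cases() can also be given a subset of columns, so that only those columns decide which rows are kept. The toy data frame and the choice of var1 to var3 as the "key" columns are assumptions made for illustration.

```r
# Toy data frame modelled on the sample rows in the question
# (column names are illustrative, not the poster's real data).
df <- data.frame(
  var1 = c(12, 12, NA, 12, 12),
  var2 = c(12, 12, 12, 12, 12),
  var3 = c(345, 345, 345, NA, 345),
  var4 = c(657, 657, 657, 657, 657),
  var5 = c(67, 0, NA, 67, NA),
  var6 = c(8, 8, 8, 8, 8)
)

# Keep rows with no NA in any column (same rows as na.omit(df)).
all_cols <- df[complete.cases(df), ]

# Keep rows with no NA in a chosen set of columns only;
# an NA in var5 no longer causes a row to be dropped.
key_cols <- df[complete.cases(df[, c("var1", "var2", "var3")]), ]

# The same row mask written out by hand: count the NAs in each row.
same_mask <- all((rowSums(is.na(df)) == 0) == complete.cases(df))
```

The trailing comma inside the square brackets matters: without it, the logical vector would be used to select columns rather than rows.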

On Wed, 2008-01-02 at 13:21 +1000, Ross Darnell wrote:

> [...]
--
Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
Faculty of Biological and Chemical Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
Room 320 Goddard Building (8)
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

The combination of some data and an aching desire for
an answer does not ensure that a reasonable answer can
be extracted from a given body of data. - John Tukey.
