dataframe subset

Bernhard Baumgartner
I have a dataframe with a column, say "x", consisting of values, each
value appearing a different number of times, e.g.
x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
and a vector containing, e.g.:
y: 2,9,10,...
I need a subset of the dataframe: all rows where x is equal to one of
the values in y. Currently I use a loop for this, but because x and y
are large, this is very slow.
Is there a faster way to solve this problem?
Thank you,
Bernhard
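For concreteness, here is a minimal sketch on made-up data in the shape described above. The extra column `z` and the `subset_loop()` helper are hypothetical; the helper is only a guess at the kind of row-by-row loop being used. It is contrasted with the single vectorized `%in%` test that the replies suggest, and both select the same rows.

```r
# Hypothetical data in the shape described in the question
df <- data.frame(x = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10),
                 z = 1:15)  # z is an arbitrary extra column
y <- c(2, 9, 10)

# Hypothetical slow approach: compare every row with every value of y
subset_loop <- function(df, y) {
  keep <- logical(nrow(df))
  for (i in seq_len(nrow(df))) {
    for (v in y) {
      if (df$x[i] == v) {
        keep[i] <- TRUE
        break  # leaves the inner loop once a match is found
      }
    }
  }
  df[keep, , drop = FALSE]
}

# Vectorized: build one logical vector, then index once
fast <- df[df$x %in% y, , drop = FALSE]
slow <- subset_loop(df, y)

identical(fast, slow)  # TRUE
```

The vectorized version does the comparison in a single call instead of nested R-level loops, which is where the speed-up comes from.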

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: dataframe subset

Duncan Murdoch
On 2/8/2006 9:21 AM, Bernhard Baumgartner wrote:
> I have a dataframe with a column, say "x" consisting of values, each
> value appearing different times, e.g.
> x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
> and a vector, including e.g.:
> y: 2,9,10,...
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow.
> Is there any idea how to solve this problem faster?

It's actually very easy.  Assume your dataframe is df, then

subset(df, x %in% y)

will give you what you want (assuming there is no column y in the
dataframe).
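A minimal sketch of this one-liner on made-up data (the `id` column and the values are assumptions for illustration). `x %in% y` yields one `TRUE`/`FALSE` per row, so every repeated value of `x` that occurs in `y` is kept and row order is preserved. An `NA` in `x` gives `FALSE` rather than `NA`, so such rows are simply dropped.

```r
df <- data.frame(x = c(1, 1, 2, 2, 4, 9, 10, 10, NA),
                 id = 1:9)
y <- c(2, 9, 10)

# x is looked up in df; y is found in the calling environment
# because df has no column named y
res <- subset(df, x %in% y)
res$id  # 3 4 6 7 8
```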

Duncan Murdoch

> Thank you,
> Bernhard

Re: dataframe subset

Chuck Cleland
In reply to this post by Bernhard Baumgartner
Bernhard Baumgartner wrote:
> I have a dataframe with a column, say "x" consisting of values, each
> value appearing different times, e.g.
> x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
> and a vector, including e.g.:
> y: 2,9,10,...
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow.
> Is there any idea how to solve this problem faster?

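# Simulated example: X holds the values, Y the values to look for
mydata <- data.frame(X = sample(1:10, 10000, replace=TRUE),
                      Y = sample(c(2,9,10), 10000, replace=TRUE))

# Keep the rows whose X value occurs anywhere in Y
newdata <- mydata[mydata$X %in% unique(mydata$Y),]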

?"%in%"
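The help page opened by `?"%in%"` defines `%in%` as `match(x, table, nomatch = 0) > 0`. The sketch below uses made-up data and an arbitrary seed, with column names borrowed from the example above. It checks that definition and shows that wrapping the lookup values in `unique()` does not change which rows are selected.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
mydata <- data.frame(X = sample(1:10, 1000, replace = TRUE),
                     Y = sample(c(2, 9, 10), 1000, replace = TRUE))

# %in% spelled out via its documented definition
in_by_match <- match(mydata$X, mydata$Y, nomatch = 0) > 0

with_unique    <- mydata$X %in% unique(mydata$Y)
without_unique <- mydata$X %in% mydata$Y

identical(in_by_match, without_unique)  # TRUE
identical(with_unique, without_unique)  # TRUE
```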

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894

Re: dataframe subset

Guazzetti Stefano
In reply to this post by Bernhard Baumgartner
Dear Bernhard,
if I understand your question correctly,
maybe you want something like

 # Note: y is a column of df here, so subset() compares x with df$y
 df <- data.frame(x = sample(1:10, 100, replace = TRUE),
                  y = sample(1:5, 100, replace = TRUE))
 subset(df, x %in% y)

Regards,

Stefano
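In this example `y` is a column of `df`, so `subset()` compares `x` with `df$y` and ignores any separate vector named `y`. That is the situation Duncan Murdoch's caveat rules out. The sketch below uses made-up data and a hypothetical outside vector `vals` to show the difference and two ways around it.

```r
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(5, 5, 5, 5, 5))
y <- c(2, 3)  # outside vector that happens to share a column name

# Inside subset(), y resolves to df$y (all 5s), not the outside y
subset(df, x %in% y)$x  # 5

# Plain indexing does not look inside df, so the outside y is used
df[df$x %in% y, , drop = FALSE]$x  # 2 3

# Alternatively, give the lookup vector a name that is not a column
vals <- c(2, 3)
subset(df, x %in% vals)$x  # 2 3
```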

   >-----Original Message-----
   >From: [hidden email]
   >[mailto:[hidden email]] On Behalf Of Bernhard Baumgartner
   >Sent: 08 February 2006 15:22
   >To: [hidden email]
   >Subject: [R] dataframe subset
   >
   >I have a dataframe with a column, say "x" consisting of values, each
   >value appearing different times, e.g.
   >x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
   >and a vector, including e.g.:
   >y: 2,9,10,...
   >I need a subset of the dataframe: all rows where x is equal to one of
   >the values in y. Currently I use a loop for this, but because x and y
   >are large this is very slow.
   >Is there any idea how to solve this problem faster?
   >Thank you,
   >Bernhard

Re: dataframe subset

Adaikalavan Ramasamy
In reply to this post by Bernhard Baumgartner
Sounds like you may need to use match().
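`match(x, table)` returns, for each element of `x`, the position of its first match in `table`, or `NA` when there is none. The sketch below works on made-up data and uses that result to keep the wanted rows, which selects the same rows as `%in%`. It then reuses the positions to look up a value aligned with `y`. The `label` vector is hypothetical and only illustrates the lookup.

```r
df <- data.frame(x = c(1, 1, 2, 2, 4, 9, 10, 10))
y <- c(2, 9, 10)
label <- c("two", "nine", "ten")  # hypothetical values aligned with y

pos  <- match(df$x, y)   # NA NA 1 1 NA 2 3 3
keep <- !is.na(pos)      # same as df$x %in% y

out <- df[keep, , drop = FALSE]
out$label <- label[pos[keep]]  # value looked up for each kept row
out
```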

On Wed, 2006-02-08 at 15:21 +0100, Bernhard Baumgartner wrote:

> I have a dataframe with a column, say "x" consisting of values, each
> value appearing different times, e.g.
> x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
> and a vector, including e.g.:
> y: 2,9,10,...
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow.
> Is there any idea how to solve this problem faster?
> Thank you,
> Bernhard

Re: dataframe subset

PIKAL Petr
In reply to this post by Bernhard Baumgartner
Hi

something like

xx<-data.frame(x=sample(1:10,100,replace=TRUE))
y<-c(2,5,8)
# drop=FALSE keeps the one-column result a data frame
xx[xx$x %in% y, , drop=FALSE]

HTH
Petr
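With only one column in `xx`, the plain form `xx[rows, ]` is simplified to a bare vector under the default `drop` behaviour of `[.data.frame`. Adding `drop = FALSE` keeps a data frame. The short check below uses fixed, made-up values.

```r
xx <- data.frame(x = c(1, 2, 5, 7, 8, 8))
y <- c(2, 5, 8)

a <- xx[xx$x %in% y, ]                # one column: dropped to a vector
b <- xx[xx$x %in% y, , drop = FALSE]  # still a data frame

class(a)  # "numeric"
class(b)  # "data.frame"
```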



On 8 Feb 2006 at 15:21, Bernhard Baumgartner wrote:

From:           "Bernhard Baumgartner" <[hidden email]>
Organization:   Universitaet Regensburg
To:             [hidden email]
Date sent:       Wed, 08 Feb 2006 15:21:46 +0100
Priority:       normal
Subject:         [R] dataframe subset

> I have a dataframe with a column, say "x" consisting of values, each
> value appearing different times, e.g.
> x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
> and a vector, including e.g.:
> y: 2,9,10,...
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow.
> Is there any idea how to solve this problem faster?
> Thank you,
> Bernhard

Petr Pikal
[hidden email]
