

I have a dataframe with a column, say "x", consisting of values, each
value appearing a different number of times, e.g.
x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
and a vector, including e.g.:
y: 2,9,10,...
I need a subset of the dataframe: all rows where x is equal to one of
the values in y. Currently I use a loop for this, but because x and y
are large this is very slow.
Does anyone have an idea how to solve this problem faster?
Thank you,
Bernhard
On 2/8/2006 9:21 AM, Bernhard Baumgartner wrote:
> I have a dataframe with a column, say "x" consisting of values, each
> value appearing different times, e.g.
> x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
> and a vector, including e.g.:
> y: 2,9,10,...
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow.
> Is there any idea how to solve this problem faster?
It's actually very easy. Assume your dataframe is df, then
subset(df, x %in% y)
will give you what you want (assuming there is no column y in the
dataframe).
Duncan Murdoch
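
A minimal sketch of the suggestion above, using the sample values from the question. The names `df`, `keep` and `res` are illustrative assumptions, not from the original post; the lookup vector is called `keep` rather than `y` so that it cannot clash with a column of the same name in the dataframe.

```r
# Example values copied from the question; all object names are hypothetical.
df <- data.frame(x = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10))
keep <- c(2, 9, 10)

# %in% is vectorised: it returns one TRUE/FALSE per row, so no loop is needed.
res <- subset(df, x %in% keep)

res$x
# 2 2 9 10 10 10 10 10
```

Inside subset(), names are looked up in the dataframe first, which is why a column named like the lookup vector would silently be used instead of the vector.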
Bernhard Baumgartner wrote:
> I have a dataframe with a column, say "x" consisting of values, each
> value appearing different times, e.g.
> x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
> and a vector, including e.g.:
> y: 2,9,10,...
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow.
> Is there any idea how to solve this problem faster?
mydata <- data.frame(X = sample(1:10, 10000, replace=TRUE),
                     Y = sample(c(2,9,10), 10000, replace=TRUE))
newdata <- mydata[mydata$X %in% unique(mydata$Y),]
?"%in%"

Dear Bernhard,
if I understand your question correctly,
maybe you want something like
df <- data.frame(x=sample(1:10, 100, replace=TRUE),
                 y=sample(1:5, 100, replace=TRUE))
subset(df, x%in%y)
Regards,
Stefano
Sounds like you may need to use match().
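
For illustration, here is roughly how the match() route could look on the sample values from the question. This is only a sketch; the names `df`, `keep`, `pos` and `res` are assumptions made for the example. It gives the same rows as %in%, which R itself defines in terms of match().

```r
# Example values copied from the question; all object names are hypothetical.
df <- data.frame(x = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10))
keep <- c(2, 9, 10)

# match() gives the position of each x value in keep, or NA if it is absent.
pos <- match(df$x, keep)

# Keep the rows that were found; drop = FALSE keeps the result a dataframe
# even though there is only one column.
res <- df[!is.na(pos), , drop = FALSE]
```

match() is mainly worth the extra step when the positions themselves are needed, e.g. to look up a value associated with each element of the lookup vector.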
On Wed, 2006-02-08 at 15:21 +0100, Bernhard Baumgartner wrote:
> I have a dataframe with a column, say "x" consisting of values, each
> value appearing different times, e.g.
> x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
> and a vector, including e.g.:
> y: 2,9,10,...
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow.
> Is there any idea how to solve this problem faster?
Hi
something like
xx <- data.frame(x=sample(1:10,100,replace=TRUE))
y <- c(2,5,8)
xx[xx$x %in% y,]
HTH
Petr
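
One detail worth knowing with bracket indexing: when the dataframe has a single column, as in the example above, the result is simplified to a plain vector unless drop = FALSE is given. A small sketch, where the fixed seed and the names `v` and `d` are assumptions added only for illustration:

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
xx <- data.frame(x = sample(1:10, 100, replace = TRUE))
y <- c(2, 5, 8)

v <- xx[xx$x %in% y, ]                 # single column is dropped: a vector
d <- xx[xx$x %in% y, , drop = FALSE]   # stays a dataframe with column x

is.data.frame(v)  # FALSE
is.data.frame(d)  # TRUE
```

With two or more columns the result is a dataframe either way, so this only matters for single-column cases.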
