[R] Subset by using multiple values


Farrel Buchinsky
I have a vector containing about 20 unique values. It is called rejectrs$rs.
It is a factor.
I have a data frame with about 100000 rows.
I want to exclude all rows in which the variable (ASSAY_ID in the code below)
takes one of the 20 values on the exclude list. I thought one of the following
would work, but neither did.

RawSeqBig<-subset(RawSeqBig,ASSAY_ID!=rejectrs$rs)

RawSeqBig<-subset(RawSeqBig,ASSAY_ID!=list(rejectrs$rs))



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Subset by using multiple values

Farrel Buchinsky
I found a solution to my problem and thought I would post it here. That will
help me in 3 months when I have forgotten it, and it may help some other poor
soul who stumbles across the same problem.

RawSeqBig <- RawSeqBig[!(RawSeqBig$ASSAY_ID %in% rejectrs$rs), ]
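
For anyone wondering why the `!=` attempts did not work: `!=` compares two vectors position by position (recycling the shorter one), whereas `%in%` tests each value against the whole exclude list. The following is only a small sketch of that difference. The toy values are made up for illustration, and character columns are used so that the `!=` line runs at all; the real data in this thread are factors, which `%in%` handles by matching on their labels.

```r
# Hypothetical toy data standing in for RawSeqBig and rejectrs
RawSeqBig <- data.frame(
  ASSAY_ID = c("rs1", "rs2", "rs3", "rs4", "rs2", "rs5"),
  value    = 1:6,
  stringsAsFactors = FALSE
)
rejectrs <- data.frame(rs = c("rs2", "rs5"), stringsAsFactors = FALSE)

# != is elementwise: the 2-value exclude list is recycled along the 6 rows,
# so this is not a membership test
wrong <- RawSeqBig$ASSAY_ID != rejectrs$rs

# %in% checks every ASSAY_ID against the complete exclude list
keep <- !(RawSeqBig$ASSAY_ID %in% rejectrs$rs)
kept <- RawSeqBig[keep, ]

print(wrong)
print(keep)
print(kept)
```

In this toy case the second row (an excluded ID) survives the `!=` comparison only because it happens to line up with a different entry of the recycled exclude list.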


Re: [R] Subset by using multiple values

clangkamp
Hi
I would like to extend this item to the following:
I have the following table

      X1  X2    X3  value
1   BVEq AGR 11412 954.75
2 CA_Tot AGR 11412 970.59
...
> str(DC2_m)
'data.frame': 104160 obs. of  4 variables:
 $ X1   : Factor w/ 62 levels "BVEq","CA_Tot",..: 1 2 3 4 5 6 45 46 47 48 ...
  ..- attr(*, "names")= chr  "Figure.1" "Figure.995" "Figure.17873" "Figure.17874" ...
 $ X2   : Factor w/ 48 levels "AGR","AKZ","ALB",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr  "1" "1" "1" "1" ...
 $ X3   : int  11412 11412 11412 11412 11412 11412 11412 11412 11412 11412 ...
 $ value: num  955 971 NA NA NA ...


And I have a second (manual) table with entries of combinations of X2 and X3 which I want to exclude:
> str(Exclude_Data)
'data.frame': 8 obs. of  2 variables:
 $ Code : Factor w/ 5 levels "ALB","ALQ","BAY",..: 3 3 2 4 5 3 1 2
 $ Dates: int  12052 12233 12508 11960 13056 12142 12691 12783

subset(DC2_m, cbind(X2,X3) %in% Exclude_Data[])

Now the trick is to exclude precisely the chosen combinations, and not all combinations of Exclude_Data[1] and Exclude_Data[2], which is what happens when combining the two conditions "X2 in ED[1]" AND "X3 in ED[2]".

Any takers? Thanks in advance
Christian
Christian Langkamp
christian.langkamp-at-gmxpro.de

Re: Subset by using multiple values

Phil Spector
One possibility would be to paste together the values before
subsetting:

subset(DC2_m,
       !paste(as.character(X2), X3, sep = '\\0') %in%
         paste(as.character(Exclude_Data$Code), Exclude_Data$Dates, sep = '\\0'))

(untested due to lack of a reproducible example).
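
A small check of the paste approach on made-up data may help here. The column names follow the thread, but the values are hypothetical and the real tables are much larger; this is a sketch under those assumptions, not a test on the poster's data. It also shows how two separate `%in%` conditions remove a row whose code and date each appear in the exclude table but never together.

```r
# Hypothetical toy data; names follow the thread, values are invented
DC2_m <- data.frame(
  X2    = factor(c("BAY", "BAY", "ALQ", "ALQ", "AGR")),
  X3    = c(12052L, 11412L, 12508L, 12052L, 12052L),
  value = c(1.5, 2.5, 3.5, 4.5, 5.5)
)
Exclude_Data <- data.frame(
  Code  = factor(c("BAY", "ALQ")),
  Dates = c(12052L, 12508L)
)

# One key per row, built from the (code, date) pair
key_data    <- paste(DC2_m$X2, DC2_m$X3, sep = "\\0")
key_exclude <- paste(Exclude_Data$Code, Exclude_Data$Dates, sep = "\\0")

# Drops only the listed pairs (rows 1 and 3)
by_pair <- DC2_m[!(key_data %in% key_exclude), ]

# Two independent column tests also drop row 4 (ALQ, 12052),
# which is not a listed pair
by_column <- DC2_m[!(DC2_m$X2 %in% Exclude_Data$Code &
                     DC2_m$X3 %in% Exclude_Data$Dates), ]

print(by_pair)
print(by_column)
```

The separator only needs to be a string that cannot occur inside the codes or dates themselves, so that two different pairs can never produce the same key.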

  - Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  [hidden email]



Re: Subset by using multiple values

clangkamp
Dear Phil,
thanks a lot, it worked perfectly!
Christian
Christian Langkamp
christian.langkamp-at-gmxpro.de