Subset based on items in a list.


Kyle.-3
R-help:

I have a variable ("ID_list") containing about 1800 unique numbers, and a
143066x29 data frame.  One of the columns ("ID") in my data frame contains a
list of ids, many of which appear more than once.  I'd like to find the
subset of my data frame for which "ID" matches one of the numbers in
"ID_list." I'm pretty sure I could write a function to do this--something
like:

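dataSubset <- function(df, id_list) {
    # Start with zero rows but the same columns as df
    tmp <- df[0, ]
    for (i in id_list) {
        for (j in seq_len(nrow(df))) {
            if (i == df$ID[j]) {
                # Append the matching row rather than overwriting tmp
                tmp <- rbind(tmp, df[j, ])
            }
        }
    }
    tmp
}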

but this seems inefficient. As I understand it, the subset function won't
really solve my problem, but there must be an existing function that does
this and that I am forgetting. Does anyone know an efficient way to solve
this problem?  Thanks!


Kyle H. Ambert
Graduate Student, Department of Medical Informatics & Clinical Epidemiology
Oregon Health & Science University
[hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Subset based on items in a list.

Erik Iverson
In reply to this post by Kyle.-3
I don't know if I understand (a small example with R commands would help),
but, assuming your data.frame is called 'df':

subset(df, ID %in% ID_list)

Question: is ID_list a "list" or a vector, and are the IDs really "numbers"
or "factors"?
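
For illustration, here is a minimal sketch of the %in% approach on a toy data frame. The values and the names keep, by_index, by_subset, ID_list_as_list, and df_factor are made up for this example and are not from the original data. It also shows one cautious way to handle the two cases raised above (ID_list stored as a list, or ID stored as a factor), assuming the IDs print the same way in both objects.

```r
# Toy data standing in for the 143066 x 29 data frame (values are invented)
df <- data.frame(ID    = c(101, 102, 103, 102, 104, 101),
                 value = c("a", "b", "c", "d", "e", "f"))
ID_list <- c(102, 104)

# Logical vector: TRUE wherever df$ID appears anywhere in ID_list
keep <- df$ID %in% ID_list

# Two equivalent ways to pull out the matching rows
by_index  <- df[keep, ]
by_subset <- subset(df, ID %in% ID_list)

# If ID_list is an R list rather than a vector, flattening it first
# keeps the comparison explicit
ID_list_as_list <- list(102, 104)
by_unlist <- df[df$ID %in% unlist(ID_list_as_list), ]

# If ID is a factor whose labels are numbers, compare on the labels
df_factor <- transform(df, ID = factor(ID))
by_factor <- df_factor[as.character(df_factor$ID) %in% as.character(ID_list), ]
```

Because %in% is vectorized, this makes a single pass over the ID column instead of looping over every (id, row) pair. Rows come back in their original data-frame order, including repeated IDs.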
