subtract a list of vectors from a list of data.frames in an elegant way

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

subtract a list of vectors from a list of data.frames in an elegant way

Eric Fail-3
Dear R experts,

I've realized that it might not be possible to define a negative SELCET statement in a SQL call so now I'm looking for the smoothest way to generate a list of what I would like from my large database by first pulling all the names with a query like this "SELECT top 1 * FROM your_table" (thank you Bart Joosen for the idea) and then subtract the variables I am not allow to pull manually ending up with a 'positive' definition of what I want, something I can use in a SQL SELCT statement (see my email on this list from yesterday for more on that).

When I query the database for the variable names I get something similar to 'DBquery' in my working example below, but considerable longer with over 2400 hundred variables. As I only need to remove two or three variables I would like to define a lookup table (like the list 'lookup' in my example) and subtract that from my data base query. Now to my question. Is there a way I can subtract one list from another? Like setoff or alike?

I would like to end up with a list like the one shown in my example called 'result.' In short, I would like to subtract 'lookup' from 'DBquery' and end up with 'result,' please note that 'result' is a list fo vecktors and not a list of dataframes. In my real life example DBquery is considerable longer so defining that by hand would make a really really long syntax.

Hope someone know some smart function that I can use to solve my problem in an elegant way.

Thanks for reading.

Erick

###### begin R code ######

DBquery     <- list(tableA=data.frame(id = numeric(0), atwin = numeric(0), atrout = numeric(0)),
                    tableB=data.frame(id = numeric(0), mq   = numeric(0), z = numeric(0), m = numeric(0)),
                    tableC=data.frame(V1 = numeric(0), mfn   = numeric(0), iiff = numeric(0)),
                    tableD=data.frame(id    = numeric(0), msf   = numeric(0), oom  = numeric(0)))

lookup     <- list(tableA=   c('atwin', 'atrout'),
                   tableB=    c('m', 'z'),
                   tableC=    'ALL')

### ...

result     <- list(tableA= c('id'),
                    tableB= c('id', 'mq'),
                    tableC= c('V1', 'mfn', 'iiff'))

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: subtract a list of vectors from a list of data.frames in an elegant way

jholtman
?setdiff

Sent from my iPad

On Mar 29, 2012, at 4:28, "Eric Fail" <[hidden email]> wrote:

> Dear R experts,
>
> I've realized that it might not be possible to define a negative SELCET statement in a SQL call so now I'm looking for the smoothest way to generate a list of what I would like from my large database by first pulling all the names with a query like this "SELECT top 1 * FROM your_table" (thank you Bart Joosen for the idea) and then subtract the variables I am not allow to pull manually ending up with a 'positive' definition of what I want, something I can use in a SQL SELCT statement (see my email on this list from yesterday for more on that).
>
> When I query the database for the variable names I get something similar to 'DBquery' in my working example below, but considerable longer with over 2400 hundred variables. As I only need to remove two or three variables I would like to define a lookup table (like the list 'lookup' in my example) and subtract that from my data base query. Now to my question. Is there a way I can subtract one list from another? Like setoff or alike?
>
> I would like to end up with a list like the one shown in my example called 'result.' In short, I would like to subtract 'lookup' from 'DBquery' and end up with 'result,' please note that 'result' is a list fo vecktors and not a list of dataframes. In my real life example DBquery is considerable longer so defining that by hand would make a really really long syntax.
>
> Hope someone know some smart function that I can use to solve my problem in an elegant way.
>
> Thanks for reading.
>
> Erick
>
> ###### begin R code ######
>
> DBquery     <- list(tableA=data.frame(id = numeric(0), atwin = numeric(0), atrout = numeric(0)),
>                    tableB=data.frame(id = numeric(0), mq   = numeric(0), z = numeric(0), m = numeric(0)),
>                    tableC=data.frame(V1 = numeric(0), mfn   = numeric(0), iiff = numeric(0)),
>                    tableD=data.frame(id    = numeric(0), msf   = numeric(0), oom  = numeric(0)))
>
> lookup     <- list(tableA=   c('atwin', 'atrout'),
>                   tableB=    c('m', 'z'),
>                   tableC=    'ALL')
>
> ### ...
>
> result     <- list(tableA= c('id'),
>                    tableB= c('id', 'mq'),
>                    tableC= c('V1', 'mfn', 'iiff'))
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: subtract a list of vectors from a list of data.frames in an elegant way

Eric Fail-3
a working solution to the problem,

 a <- DBquery[names(lookup)]

mother.of.lookup <- list()
for(string in names(a)) {
   a[[string]] <- names(a[[string]])
   mother.of.lookup[[string]] <- setdiff(a[[string]], lookup[[string]])
}

identical(mother.of.lookup, result)

It might not be the most elegant solution, but it works.

Best,
Eric

On Thu, Mar 29, 2012 at 4:07 AM, Jim Holtman <[hidden email]> wrote:

>
> ?setdiff
>
> Sent from my iPad
>
> On Mar 29, 2012, at 4:28, "Eric Fail" <[hidden email]> wrote:
>
> > Dear R experts,
> >
> > I've realized that it might not be possible to define a negative SELCET statement in a SQL call so now I'm looking for the smoothest way to generate a list of what I would like from my large database by first pulling all the names with a query like this "SELECT top 1 * FROM your_table" (thank you Bart Joosen for the idea) and then subtract the variables I am not allow to pull manually ending up with a 'positive' definition of what I want, something I can use in a SQL SELCT statement (see my email on this list from yesterday for more on that).
> >
> > When I query the database for the variable names I get something similar to 'DBquery' in my working example below, but considerable longer with over 2400 hundred variables. As I only need to remove two or three variables I would like to define a lookup table (like the list 'lookup' in my example) and subtract that from my data base query. Now to my question. Is there a way I can subtract one list from another? Like setoff or alike?
> >
> > I would like to end up with a list like the one shown in my example called 'result.' In short, I would like to subtract 'lookup' from 'DBquery' and end up with 'result,' please note that 'result' is a list fo vecktors and not a list of dataframes. In my real life example DBquery is considerable longer so defining that by hand would make a really really long syntax.
> >
> > Hope someone know some smart function that I can use to solve my problem in an elegant way.
> >
> > Thanks for reading.
> >
> > Erick
> >
> > ###### begin R code ######
> >
> > DBquery     <- list(tableA=data.frame(id = numeric(0), atwin = numeric(0), atrout = numeric(0)),
> >                    tableB=data.frame(id = numeric(0), mq   = numeric(0), z = numeric(0), m = numeric(0)),
> >                    tableC=data.frame(V1 = numeric(0), mfn   = numeric(0), iiff = numeric(0)),
> >                    tableD=data.frame(id    = numeric(0), msf   = numeric(0), oom  = numeric(0)))
> >
> > lookup     <- list(tableA=   c('atwin', 'atrout'),
> >                   tableB=    c('m', 'z'),
> >                   tableC=    'ALL')
> >
> > ### ...
> >
> > result     <- list(tableA= c('id'),
> >                    tableB= c('id', 'mq'),
> >                    tableC= c('V1', 'mfn', 'iiff'))
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: subtract a list of vectors from a list of data.frames in an elegant way

Henrique Dallazuanna
Try this:

mapply('[', DBquery, mapply(setdiff, lapply(DBquery, names), lookup))

On Sat, Mar 31, 2012 at 5:50 PM, Eric Fail <[hidden email]> wrote:

> a working solution to the problem,
>
>  a <- DBquery[names(lookup)]
>
> mother.of.lookup <- list()
> for(string in names(a)) {
>   a[[string]] <- names(a[[string]])
>   mother.of.lookup[[string]] <- setdiff(a[[string]], lookup[[string]])
> }
>
> identical(mother.of.lookup, result)
>
> It might not be the most elegant solution, but it works.
>
> Best,
> Eric
>
> On Thu, Mar 29, 2012 at 4:07 AM, Jim Holtman <[hidden email]> wrote:
>>
>> ?setdiff
>>
>> Sent from my iPad
>>
>> On Mar 29, 2012, at 4:28, "Eric Fail" <[hidden email]> wrote:
>>
>> > Dear R experts,
>> >
>> > I've realized that it might not be possible to define a negative SELCET statement in a SQL call so now I'm looking for the smoothest way to generate a list of what I would like from my large database by first pulling all the names with a query like this "SELECT top 1 * FROM your_table" (thank you Bart Joosen for the idea) and then subtract the variables I am not allow to pull manually ending up with a 'positive' definition of what I want, something I can use in a SQL SELCT statement (see my email on this list from yesterday for more on that).
>> >
>> > When I query the database for the variable names I get something similar to 'DBquery' in my working example below, but considerable longer with over 2400 hundred variables. As I only need to remove two or three variables I would like to define a lookup table (like the list 'lookup' in my example) and subtract that from my data base query. Now to my question. Is there a way I can subtract one list from another? Like setoff or alike?
>> >
>> > I would like to end up with a list like the one shown in my example called 'result.' In short, I would like to subtract 'lookup' from 'DBquery' and end up with 'result,' please note that 'result' is a list fo vecktors and not a list of dataframes. In my real life example DBquery is considerable longer so defining that by hand would make a really really long syntax.
>> >
>> > Hope someone know some smart function that I can use to solve my problem in an elegant way.
>> >
>> > Thanks for reading.
>> >
>> > Erick
>> >
>> > ###### begin R code ######
>> >
>> > DBquery     <- list(tableA=data.frame(id = numeric(0), atwin = numeric(0), atrout = numeric(0)),
>> >                    tableB=data.frame(id = numeric(0), mq   = numeric(0), z = numeric(0), m = numeric(0)),
>> >                    tableC=data.frame(V1 = numeric(0), mfn   = numeric(0), iiff = numeric(0)),
>> >                    tableD=data.frame(id    = numeric(0), msf   = numeric(0), oom  = numeric(0)))
>> >
>> > lookup     <- list(tableA=   c('atwin', 'atrout'),
>> >                   tableB=    c('m', 'z'),
>> >                   tableC=    'ALL')
>> >
>> > ### ...
>> >
>> > result     <- list(tableA= c('id'),
>> >                    tableB= c('id', 'mq'),
>> >                    tableC= c('V1', 'mfn', 'iiff'))
>> >
>> > ______________________________________________
>> > [hidden email] mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: subtract a list of vectors from a list of data.frames in an elegant way

Gabor Grothendieck
In reply to this post by Eric Fail-3
On Thu, Mar 29, 2012 at 4:28 AM, Eric Fail <[hidden email]> wrote:
> Dear R experts,
>
> I've realized that it might not be possible to define a negative SELCET statement in a SQL call so now I'm looking for the smoothest way to generate a list of what I would like from my large database by first pulling all the names with a query like this "SELECT top 1 * FROM your_table" (thank you Bart Joosen for the idea) and then subtract the variables I am not allow to pull manually ending up with a 'positive' definition of what I want, something I can use in a SQL SELCT statement (see my email on this list from yesterday for more on that).
>
> When I query the database for the variable names I get something similar to 'DBquery' in my working example below, but considerable longer with over 2400 hundred variables. As I only need to remove two or three variables I would like to define a lookup table (like the list 'lookup' in my example) and subtract that from my data base query. Now to my question. Is there a way I can subtract one list from another? Like setoff or alike?
>
> I would like to end up with a list like the one shown in my example called 'result.' In short, I would like to subtract 'lookup' from 'DBquery' and end up with 'result,' please note that 'result' is a list fo vecktors and not a list of dataframes. In my real life example DBquery is considerable longer so defining that by hand would make a really really long syntax.
>

One can use  except  in sqlite and in a number of other database systems:

library(sqldf)

# first two rows of BOD
BOD12 <- BOD[1:2, ]

# all rows of BOD except those in BOD12
sqldf("select * from BOD except (select * from BOD12)")



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.