subset using noncontiguous variables by name (not index)

Muenchen, Robert A (Bob)
Hi All,

I'm using the subset function to select a list of variables, some of
which are contiguous in the data frame, and others of which are not. It
works fine when I use the form:

subset(mydata,select=c(x1,x3:x5,x7) )

In reality, my list is far more complex. So I would like to store it in
a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to
work. That use of the c function seems to violate R rules, so I'm not
sure how it works at all. A small simulation of the problem is below.

If the variable names & orders were really this simple, I could use
indices like

summary( mydata[ ,c(1,3:5,7) ] )

but alas, they are not.

How does the c function work this way in the first place, and how can I
make this substitution?

Thanks,
Bob

mydata <- data.frame(
  x1=c(1,2,3,4,5),
  x2=c(1,2,3,4,5),
  x3=c(1,2,3,4,5),
  x4=c(1,2,3,4,5),
  x5=c(1,2,3,4,5),
  x6=c(1,2,3,4,5),
  x7=c(1,2,3,4,5)
)
mydata

# This does what I want.
summary(
  subset(mydata,select=c(x1,x3:x5,x7) )
)

# Can I substitute myVars?
attach(mydata)
myVars1 <- c(x1,x3:x5,x7)

# Not looking good!
myVars1

# This doesn't do the right thing.
summary(
  subset(mydata,select=myVars1 )
)

# Total desperation on this attempt:
myVars2 <- "x1,x3:x5,x7"
myVars2

# This doesn't work either.
summary(
  subset(mydata,select=myVars2 )
)



=========================================================
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230
FAX: (865) 974-4810
Email: [hidden email]
Web: http://oit.utk.edu/scc
News: http://listserv.utk.edu/archives/statnews.html

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: subset using noncontiguous variables by name (not index)

Gabor Grothendieck
Using the built-in data frame anscombe, try this. First we set up a data
frame, anscombe.seq, which has a single row holding the column numbers
1, 2, 3, and so on. Then we select from that data frame and unlist the
result to get the desired index vector.

> anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
> idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
> anscombe[idx]
   x1 x3 x4   y2
1  10 10  8 9.14
2   8  8  8 8.14
3  13 13  8 8.74
4   9  9  8 8.77
5  11 11  8 9.26
6  14 14  8 8.10
7   6  6  8 6.13
8   4  4 19 3.10
9  12 12  8 9.13
10  7  7  8 7.26
11  5  5  8 4.74
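Applied to the mydata example from the original question, the same trick might look like the sketch below (mydata.seq is a hypothetical name mirroring anscombe.seq). The point is that idx ends up as an ordinary vector of column positions, so it can be stored in a variable and reused, which is what the original post asked for.

```r
# Same shape as the mydata example in the original post.
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# One-row data frame whose cells hold the column positions 1, 2, ..., 7.
mydata.seq <- replace(mydata[1, ], TRUE, seq_along(mydata))

# subset() resolves the names; unlist() flattens the one-row result
# into a named vector of column positions (x1 = 1, x3 = 3, ...).
idx <- unlist(subset(mydata.seq, select = c(x1, x3:x5, x7)))

# idx is an ordinary vector, so it can be kept and reused later.
summary(mydata[idx])
```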


Re: subset using noncontiguous variables by name (not index)

François Pinard
In reply to this post by Muenchen, Robert A (Bob)
[Muenchen, Robert A (Bob)]

># This does what I want.
>summary(subset(mydata, select=c(x1, x3:x5, x7)))

Maybe:

  variables <- expression(c(x1, x3:x5, x7))

and later:

  summary(subset(mydata, select=eval(variables)))

However, I do not know how one computes the expression piecemeal, that
is, better than by building a string and parsing the result.
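One way to assemble the selection piece by piece without building and parsing a string is to keep each piece as unevaluated R code (quote()) and glue the pieces together with as.call(). This is an alternative technique, not one proposed in the thread, and the pieces list is hypothetical. do.call() then splices the finished call into the subset() call, so subset() sees the code itself rather than its value.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Hypothetical pieces of a longer selection, kept as unevaluated code.
pieces <- list(quote(x1), quote(x3:x5), quote(x7))

# Glue the pieces into the call c(x1, x3:x5, x7) without any string parsing.
sel <- as.call(c(as.name("c"), pieces))
print(sel)

# do.call() places sel into the call literally, so subset() receives the
# expression c(x1, x3:x5, x7) just as if it had been typed.
res <- do.call(subset, list(x = mydata, select = sel))
summary(res)
```

The same idea scales to a long list: build pieces with lapply() or by appending to the list, and the final call is assembled once.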

--
François Pinard   http://pinard.progiciels-bpi.ca

Re: subset using noncontiguous variables by name (not index)

Bert Gunter
In reply to this post by Gabor Grothendieck
The problem is that "x3:x5" does not mean what you think it means. The only
reason it does the right thing in subset() is that a clever trick is used
there to ensure that it does: each column name is bound to its column number
before the select expression is evaluated (read the code -- it's not hard to
understand).
Gabor has essentially mimicked that trick in his solution.
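A simplified sketch of that trick follows. It mirrors, in reduced form, the name-to-position mapping that subset.data.frame applies to its select argument; it is not a copy of that source, and the list nl is named here only for illustration.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Bind each column name to its position: list(x1 = 1, x2 = 2, ..., x7 = 7).
nl <- as.list(seq_along(mydata))
names(nl) <- names(mydata)

# Evaluated against nl, x3:x5 is just 3:5 and c() combines the positions.
sel <- quote(c(x1, x3:x5, x7))
idx <- eval(sel, nl)
print(idx)   # the positions 1, 3, 4, 5, 7

summary(mydata[idx])
```

With attach(mydata), by contrast, x1, x3 and so on refer to the data columns themselves, which is why myVars1 in the original post came out as data values rather than column positions.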

However, it is not necessary to do this. You can construct the call directly,
as you tried to do. Using the anscombe example, here's how:

chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
do.call (subset, list( x = anscombe, select = parse(text = chooz)))

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

Re: subset using noncontiguous variables by name (not index)

Muenchen, Robert A (Bob)
Thanks Bert & Gabor for two very interesting solutions!

It would be very handy if, in R, string1:stringN generated
"string1", "string2", ..., "stringN"; it would make selections like this
much more obvious. I know it's easy to do with the colon operator and the
paste function, but that's quite a step up in complexity compared to SAS'
"x1 x3-x4 y2" or SPSS' "x1, x3 to x4, y2". And it's complexity that
beginners face early in learning R.

While on the subject of the colon operator, why doesn't anscombe[[1:4]]
select the x variables in list form as anscombe[,1:4] or anscombe[1:4]
do in data frame form?
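For reference on the [[ question: [[ extracts a single element, and when given a vector index it indexes recursively, so anscombe[[c(1, 2)]] means anscombe[[1]][[2]]. anscombe[[1:4]] therefore tries to go four levels deep and fails. A short sketch (anscombe is the built-in data set used above; second.x1, res and x.list are hypothetical names):

```r
# anscombe is a built-in data set (package datasets).
# With a vector index, [[ recurses: this is anscombe[[1]][[2]],
# i.e. the second value of x1.
second.x1 <- anscombe[[c(1, 2)]]

# anscombe[[1:4]] would need four levels of nesting, so it fails.
res <- try(anscombe[[1:4]], silent = TRUE)
inherits(res, "try-error")   # TRUE

# To get the x columns as a plain list instead of a data frame:
x.list <- as.list(anscombe[1:4])
class(x.list)   # "list"
names(x.list)   # "x1" "x2" "x3" "x4"
```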

Thanks,

Bob

Re: subset using noncontiguous variables by name (not index)

Gabor Grothendieck
Try this:

> "%:%" <- function(x, y) {
+    prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
+    prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
+    stopifnot(prex == prey)
+    paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
+ }
> "x2" %:% "x4"
[1] "x2" "x3" "x4"
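With this operator the variable list becomes an ordinary character vector, which can be stored and reused. A sketch on the mydata example (the operator is copied from above so the block runs on its own; myVars is a hypothetical name). It assumes both ends share the same non-digit prefix and that the numeric suffixes form a plain integer range.

```r
"%:%" <- function(x, y) {
  # Split e.g. "x3" into the prefix "x" and the number "3".
  prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
  prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
  stopifnot(prex == prey)  # both ends must share the same prefix
  paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
}

mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# The selection is now a plain character vector that can be kept in a variable.
myVars <- c("x1", "x3" %:% "x5", "x7")
print(myVars)   # "x1" "x3" "x4" "x5" "x7"

summary(mydata[myVars])
```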


Re: subset using noncontiguous variables by name (not index)

Muenchen, Robert A (Bob)
Gabor, that works great!

I think this would be a very helpful addition to the main R
distribution, perhaps with a single colon representing numerical order
(exactly as you have written it) and two colons representing the order
of the variables as they appear in the data frame (your first example).
That's analogous to SAS' x1-xN, which you know gets those N variables,
and a--z, which selects an unknown number of variables a through z. How
many that is depends upon their order in the data frame. That would not
only be very useful in general, but it would also make transitioning to
R from SAS or SPSS less confusing.
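The "order in the data frame" variant can be sketched with match() on the column names. vars_between() is a hypothetical helper, not part of R; it assumes both names exist and returns the names in data-frame order (reversed if to comes before from).

```r
# Hypothetical helper, not part of base R: all column names from `from`
# through `to`, in the order they appear in df (in the spirit of SAS a--z).
vars_between <- function(df, from, to) {
  nms <- names(df)
  i <- match(from, nms)
  j <- match(to, nms)
  if (is.na(i) || is.na(j)) stop("column name not found")
  nms[i:j]
}

mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Shuffle the columns so that position order differs from numeric order.
mydata2 <- mydata[c("x1", "x5", "x2", "x7", "x3", "x4", "x6")]

vars_between(mydata2, "x5", "x7")   # "x5" "x2" "x7"
summary(mydata2[vars_between(mydata2, "x5", "x7")])
```

Unlike the %:% operator, this selects by position, so it also works for names without a shared prefix or numeric suffix.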

Is R still being extended in such basic ways, or does that muck up
existing programs too much?

Thanks,
Bob

> -----Original Message-----
> From: Gabor Grothendieck [mailto:[hidden email]]
> Sent: Sunday, August 26, 2007 8:52 PM
> To: Muenchen, Robert A (Bob)
> Cc: [hidden email]
> Subject: Re: [R] subset using noncontiguous variables by name (not
> index)
>
> Try this:
>
> > "%:%" <- function(x, y) {
> +    prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
> +    prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
> +    stopifnot(prex == prey)
> +    paste(prex, seq(from = as.numeric(postx), to =
> as.numeric(posty)), sep = "")
> + }
> > "x2" %:% "x4"
> [1] "x2" "x3" "x4"
>
>
> On 8/26/07, Muenchen, Robert A (Bob) <[hidden email]> wrote:
> > Thanks Bert & Gabor for two very interesting solutions!
> >
> > It would be very handy in R if string1:stringN generated
> > "string1","string2"..."stringN" it would make selections like this
> much
> > more obvious. I know it's easy to with the colon operator and paste
> > function but that's quite a step up in complexity compared to SAS'
x1
> > x3-x4 y2 or SPSS' x1,x3 to x4, y2. And it's complexity that
beginners

> > face early in learning R.
> >
> > While on the subject of the colon operator, why doesn't
> anscombe[[1:4]]
> > select the x variables in list form as anscombe[,1:4] or
> anscombe[1:4]
> > do in data frame form?
> >
> > Thanks,
> >
> > Bob
> >
> > =========================================================
> > Bob Muenchen (pronounced Min'-chen), Manager
> > Statistical Consulting Center
> > U of TN Office of Information Technology
> > 200 Stokely Management Center, Knoxville, TN 37996-0520
> > Voice: (865) 974-5230
> > FAX: (865) 974-4810
> > Email: [hidden email]
> > Web: http://oit.utk.edu/scc,
> > News: http://listserv.utk.edu/archives/statnews.html
> > =========================================================
> >
> >
> > > -----Original Message-----
> > > From: Bert Gunter [mailto:[hidden email]]
> > > Sent: Sunday, August 26, 2007 6:50 PM
> > > To: 'Gabor Grothendieck'; Muenchen, Robert A (Bob)
> > > Cc: [hidden email]
> > > Subject: RE: [R] subset using noncontiguous variables by name (not
> > > index)
> > >
> > > The problem is that "x3:x5" does not mean what you think it means.
> The
> > > only
> > > reason it does the right thing in subset() is because a clever
> trick
> > is
> > > used
> > > there (read the code -- it's not hard to understand) to ensure
that

> it
> > > does.
> > > Gabor has essentially mimicked that trick in his solution.
> > >
> > > However, it is not necessary do this. You can construct the call
> > > directly as
> > > you tried to do. Using the anscombe example, here's how:
> > >
> > > chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in
> quotes
> > > do.call (subset, list( x = anscombe, select = parse(text =
chooz)))

> > >
> > > -- Bert Gunter
> > > Genentech Non-Clinical Statistics
> > > South San Francisco, CA
> > >
> > > "The business of the statistician is to catalyze the scientific
> > > learning
> > > process."  - George E. P. Box
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: [hidden email]
> > > > [mailto:[hidden email]] On Behalf Of Gabor
> > > > Grothendieck
> > > > Sent: Sunday, August 26, 2007 2:10 PM
> > > > To: Muenchen, Robert A (Bob)
> > > > Cc: [hidden email]
> > > > Subject: Re: [R] subset using noncontiguous variables by name
> > > > (not index)
> > > >
> > > > Using builtin data frame anscombe try this. First we set up a
> > > > data frame
> > > > anscombe.seq which has one row containing 1, 2, 3, ... .  Then
> > select
> > > > out from that data frame and unlist it to get the desired
> > > > index vector.
> > > >
> > > > > anscombe.seq <- replace(anscombe[1,], TRUE,
> seq_along(anscombe))
> > > > > idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
> > > > > anscombe[idx]
> > > >    x1 x3 x4   y2
> > > > 1  10 10  8 9.14
> > > > 2   8  8  8 8.14
> > > > 3  13 13  8 8.74
> > > > 4   9  9  8 8.77
> > > > 5  11 11  8 9.26
> > > > 6  14 14  8 8.10
> > > > 7   6  6  8 6.13
> > > > 8   4  4 19 3.10
> > > > 9  12 12  8 9.13
> > > > 10  7  7  8 7.26
> > > > 11  5  5  8 4.74
> > > >
> > > >
> > > > On 8/26/07, Muenchen, Robert A (Bob) <[hidden email]> wrote:
> > > > > Hi All,
> > > > >
> > > > > I'm using the subset function to select a list of variables,
> some
> > > of
> > > > > which are contiguous in the data frame, and others of which
> > > > are not. It
> > > > > works fine when I use the form:
> > > > >
> > > > > subset(mydata,select=c(x1,x3:x5,x7) )
> > > > >
> > > > > In reality, my list is far more complex. So I would like to
> > > > store it in
> > > > > a variable to substitute in for c(x1,x3:x5,x7) but cannot get
> it
> > to
> > > > > work. That use of the c function seems to violate R rules,
> > > > so I'm not
> > > > > sure how it works at all. A small simulation of the problem
> > > > is below.
> > > > >
> > > > > If the variable names & orders were really this simple, I
could

> > use
> > > > > indices like
> > > > >
> > > > > summary( mydata[ ,c(1,3:5,7) ] )
> > > > >
> > > > > but alas, they are not.
> > > > >
> > > > > How does the c function work this way in the first place,
> > > > and how can I
> > > > > make this substitution?
> > > > >
> > > > > Thanks,
> > > > > Bob
> > > > >
> > > > > mydata <- data.frame(
> > > > >  x1=c(1,2,3,4,5),
> > > > >  x2=c(1,2,3,4,5),
> > > > >  x3=c(1,2,3,4,5),
> > > > >  x4=c(1,2,3,4,5),
> > > > >  x5=c(1,2,3,4,5),
> > > > >  x6=c(1,2,3,4,5),
> > > > >  x7=c(1,2,3,4,5)
> > > > > )
> > > > > mydata
> > > > >
> > > > > # This does what I want.
> > > > > summary(
> > > > >  subset(mydata,select=c(x1,x3:x5,x7) )
> > > > > )
> > > > >
> > > > > # Can I substitute myVars?
> > > > > attach(mydata)
> > > > > myVars1 <- c(x1,x3:x5,x7)
> > > > >
> > > > > # Not looking good!
> > > > > myVars1
> > > > >
> > > > > # This doesn't do the right thing.
> > > > > summary(
> > > > >  subset(mydata,select=myVars1 )
> > > > > )
> > > > >
> > > > > # Total desperation on this attempt:
> > > > > myVars2 <- "x1,x3:x5,x7"
> > > > > myVars2
> > > > >
> > > > > # This doesn't work either.
> > > > > summary(
> > > > >  subset(mydata,select=myVars2 )
> > > > > )
> > > > >
> > > > >
> > > > >
> > > > > =========================================================
> > > > > Bob Muenchen (pronounced Min'-chen), Manager
> > > > > Statistical Consulting Center
> > > > > U of TN Office of Information Technology
> > > > > 200 Stokely Management Center, Knoxville, TN 37996-0520
> > > > > Voice: (865) 974-5230
> > > > > FAX: (865) 974-4810
> > > > > Email: [hidden email]
> > > > > Web: http://oit.utk.edu/scc,
> > > > > News: http://listserv.utk.edu/archives/statnews.html
> > > > >

Re: subset using noncontiguous variables by name (not index)

Thomas Lumley
On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:

> Gabor, That works great!
>
> I think this would be a very helpful addition to the main R
> distribution. Perhaps with a single colon representing numerical order
> (exactly as you have written it) and two colons representing the order
> of the variables as they appear in the data frame (your first example).
> That's analogous to SAS' x1-xN, which you know gets those N variables,
> and a--z, which selects an unknown number of variables a through z. How
> many that is depends upon their order in the data frame. That would not
> only be very useful in general, but it would also make transitioning to
> R from SAS or SPSS less confusing.
>
> Is R still being extended in such basic ways, or does that muck up
> existing programs too much?
>

In principle base R can be extended like that, but a strong case is needed
for non-standard evaluation rules and for depleting the restricted supply
of short binary operator names.
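Bob's original question was how c(x1,x3:x5,x7) can work inside subset() at all. The ?subset help page describes the `select` argument as working by first replacing column names in the expression with the corresponding column numbers, then indexing the columns with the result. By contrast, evaluating c(x1,x3:x5,x7) at top level after attach() just concatenates the columns' data values, which is why myVars1 looked wrong.

The sketch below mimics that idea so that a selection can be stored with quote() and reused. It assumes a hypothetical helper `select_cols()`, which is not part of base R.

```r
# Hypothetical helper (not in base R) mimicking how subset()'s `select`
# argument is documented to work: column names stand for column positions.
select_cols <- function(df, expr) {
  positions <- as.list(seq_along(df))   # 1, 2, ..., ncol(df)
  names(positions) <- names(df)         # x1 = 1, x2 = 2, ...
  # Evaluate the stored expression with names bound to positions; names
  # not found there are looked up in the caller's environment.
  idx <- eval(expr, positions, parent.frame())
  df[, idx, drop = FALSE]
}

mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Store the selection unevaluated, then reuse it as often as needed
myVars <- quote(c(x1, x3:x5, x7))
summary(select_cols(mydata, myVars))
names(select_cols(mydata, myVars))    # "x1" "x3" "x4" "x5" "x7"
```

The key design choice is quote(). It keeps c(x1, x3:x5, x7) as an unevaluated expression, so the names are only resolved once a data frame is supplied.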

The reason for subset() and its behaviour is that 'variables as they
appear in the data frame' is typically ambiguous -- which data frame?  In
SPSS you have only one and in SAS there is a default one, so there is no
ambiguity in X1--Y2, but in R it needs another argument specifying the
data frame, so it can't really be a binary operator.
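A range that follows the order of variables in a data frame depends on which data frame is meant. A function-based version of SAS's a--z therefore has to take that data frame as an explicit argument, which a binary operator cannot do.

Here is a minimal sketch. `col_range()` is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: every column from `from` through `to`, in the order
# the columns appear in `df` (SAS-style a--z). The data frame must be given.
col_range <- function(df, from, to) {
  nm <- names(df)
  i <- match(from, nm)
  j <- match(to, nm)
  if (is.na(i) || is.na(j)) stop("column name not found in data frame")
  nm[i:j]   # if `to` comes before `from`, the names come back reversed
}

mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# The variable list is now an ordinary character vector that can be stored
myVars <- c("x1", col_range(mydata, "x3", "x5"), "x7")
myVars                         # "x1" "x3" "x4" "x5" "x7"
summary(mydata[, myVars])
```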

The double colon :: and triple colon ::: are already used for namespaces,
and a search of r-help reveals two previous, different, suggestions for
%:%.


  -thomas

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle

FW: subset using noncontiguous variables by name (not index)

Muenchen, Robert A (Bob)
In reply to this post by Muenchen, Robert A (Bob)
Thomas, that's a good point. I was thinking of anscombe[x1::y1] making
it clear which one, but you would then want just x1::y1 to have
unambiguous meaning on its own, which is impossible.

As for x1:xN, it's unambiguous on its own. I thought one of the great
advantages of R was that it could use different methods so that a new
operator would not be needed. The colon operator would just need a new
method for arguments of the form stringN, one that would be very useful
and have an obvious meaning.
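For the numbered-suffix case (SAS's x1-xN), base R can already build the names without a new operator or method. paste0() attaches a prefix to an integer sequence, and subset()'s `select` also accepts a character vector of names, so the list can be stored in a variable.

The sketch below reuses the example data from the original post. Bob's myVars2 failed because it was a single string, "x1,x3:x5,x7", rather than a vector of separate names.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# SAS-style x3-x5: a prefix pasted onto an integer range
paste0("x", 3:5)                          # "x3" "x4" "x5"

# Mix ranges and single names, and store the result as a character vector
myVars <- c("x1", paste0("x", 3:5), "x7")
summary(subset(mydata, select = myVars))
```

paste0() was added in R 2.15.0. In older versions, paste("x", 3:5, sep = "") gives the same result.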

Thanks,
Bob

> -----Original Message-----
> From: Thomas Lumley [mailto:[hidden email]]
> Sent: Monday, August 27, 2007 10:25 AM
> To: Muenchen, Robert A (Bob)
> Cc: [hidden email]
> Subject: Re: [R] subset using noncontiguous variables by name (not index)
>
> On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:
>
> > Gabor, That works great!
> >
> > I think this would be a very helpful addition to the main R
> > distribution. Perhaps with a single colon representing numerical order
> > (exactly as you have written it) and two colons representing the order
> > of the variables as they appear in the data frame (your first example).
> > That's analogous to SAS' x1-xN, which you know gets those N variables,
> > and a--z, which selects an unknown number of variables a through z. How
> > many that is depends upon their order in the data frame. That would not
> > only be very useful in general, but it would also make transitioning to
> > R from SAS or SPSS less confusing.
> >
> > Is R still being extended in such basic ways, or does that muck up
> > existing programs too much?
> >
>
> In principle base R can be extended like that, but a strong case is needed
> for non-standard evaluation rules and for depleting the restricted supply
> of short binary operator names.
>
> The reason for subset() and its behaviour is that 'variables as they
> appear in the data frame' is typically ambiguous -- which data frame?  In
> SPSS you have only one and in SAS there is a default one, so there is no
> ambiguity in X1--Y2, but in R it needs another argument specifying the
> data frame, so it can't really be a binary operator.
>
> The double colon :: and triple colon ::: are already used for namespaces,
> and a search of r-help reveals two previous, different, suggestions for
> %:%.
>
>
>   -thomas
>
> Thomas Lumley Assoc. Professor, Biostatistics
> [hidden email] University of Washington, Seattle

Re: subset using noncontiguous variables by name (not index)

Muenchen, Robert A (Bob)
In reply to this post by Muenchen, Robert A (Bob)
Thanks for helping me see why R doesn't have the "obvious"! -Bob

> -----Original Message-----
> From: Thomas Lumley [mailto:[hidden email]]
> Sent: Monday, August 27, 2007 2:12 PM
> To: Muenchen, Robert A (Bob)
> Subject: RE: [R] subset using noncontiguous variables by name (not index)
>
> On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:
>
> > Thomas, that's a good point. I was thinking of anscombe[x1::y1] making
> > it clear which one, but you would then want just x1::y1 to have
> > unambiguous meaning on its own, which is impossible.
> >
> > As for x1:xN, it's unambiguous on its own.
>
>
> It actually isn't. We already have a meaning. Consider
>    x1<-4
>    xN<-6
>    x1:xN
> It also breaks R's argument passing rules by treating x1 as string
> rather than a name.
>
> What would be unambiguous at the moment is "x1":"x4", provided there
> was a sufficiently precise set of rules on what was allowed. Consider
>   "x1":"x-1"    (negative?)
>   "x1":"x3.14"  (non-integer?)
>   "x3.12":"x3.14" (is the prefix x or x3.?)
>   "x1":"X4"     (the prefix changes)
>   "01":"14"     (is the prefix empty or 0?)
>   "x09":"xA2"     (is this illegal decimal or legal hexadecimal?)
>   "IL23R1":"IL23R4" (what is the prefix?)
>   "x1a":"x4a"    (infix numbering?)
>
>
>
>       -thomas
>
> Thomas Lumley Assoc. Professor, Biostatistics
> [hidden email] University of Washington, Seattle
>
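Lumley's first example can be run directly. It shows that x1:xN already has an ordinary meaning at top level. Inside subset()'s `select`, however, the same expression is read as column positions, as the ?subset help page describes.

In the sketch below, the data frame `demo` is made up for illustration.

```r
x1 <- 4
xN <- 6
x1:xN                                  # 4 5 6: ordinary `:` on the values

# Inside subset(), `select` is evaluated with each column name standing for
# its column number, so here x1:xN means columns 1:2, not the globals above.
demo <- data.frame(x1 = c(10, 20), xN = c(30, 40), z = c(50, 60))
names(subset(demo, select = x1:xN))    # "x1" "xN"
```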

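The sketch below illustrates what "a sufficiently precise set of rules" might look like. The helper `name_seq()` is hypothetical, not a base R function. Its rules are one arbitrary choice:

- a letters-only prefix;
- the same prefix, case-sensitively, at both ends;
- a non-negative integer suffix without leading zeros.

Under these rules, every ambiguous case in Lumley's list stops with an error instead of being guessed at.

```r
# Hypothetical helper (not part of base R): expand a name range such as
# "x1" to "x4" under deliberately strict rules.
name_seq <- function(from, to) {
  # letters-only prefix, then a non-negative integer without leading zeros
  pattern <- "^([A-Za-z]+)(0|[1-9][0-9]*)$"
  if (!grepl(pattern, from) || !grepl(pattern, to))
    stop("each end must be letters followed by an integer")
  prefix <- sub(pattern, "\\1", from)
  if (prefix != sub(pattern, "\\1", to))
    stop("the prefixes differ (comparison is case-sensitive)")
  n_from <- as.integer(sub(pattern, "\\2", from))
  n_to   <- as.integer(sub(pattern, "\\2", to))
  paste0(prefix, n_from:n_to)
}

name_seq("x1", "x4")        # "x1" "x2" "x3" "x4"

# Lumley's ambiguous cases are rejected, for example:
try(name_seq("x1", "X4"))   # the prefixes differ
try(name_seq("01", "14"))   # no letter prefix
```

Rejecting leading zeros avoids silently turning "x09" into "x9". A looser rule set would have to decide how to handle zero padding explicitly.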