|
Today, I was looking for an elegant (and efficient) way
to get a named (atomic) vector by selecting one column of a data frame.
Of course, the vector names must be the rownames of the data frame.

Ok, here is the quiz: I know one quite "cute"/"slick" answer, but was
wondering if there are obvious better ones, and
also if this should not become more idiomatic (hence "R-devel"):

Consider this toy example, where the dataframe already has only
one column :

> nv <- c(a=1, d=17, e=101); nv
  a   d   e
  1  17 101

> df <- as.data.frame(cbind(VAR = nv)); df
  VAR
a   1
d  17
e 101

Now, how can I get 'nv' back from 'df' ?  I.e., how to get

> identical(nv, .......)
[1] TRUE

where ...... only uses 'df' (and no non-standard R packages)?

As said, I know a simple solution (*), but I'm sure it is not
obvious to most R users and probably not even to the majority of
R-devel readers... OTOH, people like Bill Dunlap will not take
long to provide it or a better one.

(*) In my solution, the above '.......' consists of 17 letters.
I'll post it later today (CEST time) ... or confirm
that someone else has done so.

Martin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel |
|
I don't know if this is better, but it's the most obvious/shortest I
could come up with.  Transpose the data.frame column to a 'row' vector
and drop the dimensions.

R> identical(nv, drop(t(df)))
[1] TRUE

Best,
--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com

On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler <[hidden email]> wrote:
> Now, how can I get 'nv' back from 'df' ?  I.e., how to get
>
>> identical(nv, .......)
> [1] TRUE
>
> where ...... only uses 'df' (and no non-standard R packages)? |
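To see why the transpose trick works, it can help to look at the intermediate objects. The following is a minimal sketch in base R using the toy data from the original post; only the final expression comes from the message above, the intermediate names `m` and `res` are chosen here for illustration.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

# t() first coerces the data frame to a matrix, so the row names
# survive as the column dimnames of the transposed 1 x 3 matrix.
m <- t(df)
dim(m)
dimnames(m)

# drop() removes the extent-1 dimension and keeps the dimnames of the
# remaining dimension as the names of the resulting vector.
res <- drop(m)
identical(nv, res)
```

The names are never looked up explicitly; they ride along as dimnames until drop() turns them back into vector names.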
|
Hello,

A bit more general

nv <- c(a=1, d=17, e=101); nv
nv2 <- c(a="a", d="d", e="e")
df2 <- data.frame(VAR = nv, CHAR = nv2); df2

identical( nv, drop(t( df2[1] )) )    # TRUE
identical( nv, drop(t( df2[[1]] )) )  # FALSE

Rui Barradas

On 18-08-2012 16:16, Joshua Ulrich wrote:
> I don't know if this is better, but it's the most obvious/shortest I
> could come up with.  Transpose the data.frame column to a 'row' vector
> and drop the dimensions.
>
> R> identical(nv, drop(t(df)))
> [1] TRUE |
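The difference between the two calls above comes down to what t() receives. As a rough illustration in base R (the names `one_col` and `bare` are introduced here and are not part of the message):

```r
nv  <- c(a = 1, d = 17, e = 101)
nv2 <- c(a = "a", d = "d", e = "e")
df2 <- data.frame(VAR = nv, CHAR = nv2)

# df2[1] is still a data frame, so its row names travel with it.
one_col <- df2[1]
class(one_col)
rownames(one_col)

# df2[[1]] is the bare column, which carries no names of its own.
bare <- df2[[1]]
names(bare)

# Transposing the bare vector gives a matrix without dimnames,
# so drop() has no names to recover.
dimnames(t(bare))
```

Selecting with single brackets also keeps the numeric column away from the character column, so the coercion to a matrix does not change its type.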
|
In reply to this post by Joshua Ulrich
Or to expand just a hair on Joshua's suggestion, is the following what you want:

> x <- 1:10
> names(x) <- letters[1:10]
> x
 a  b  c  d  e  f  g  h  i  j
 1  2  3  4  5  6  7  8  9 10
> df <- data.frame(x=x,y=LETTERS[1:10],row.names=names(x))
> df
   x y
a  1 A
b  2 B
c  3 C
d  4 D
e  5 E
f  6 F
g  7 G
h  8 H
i  9 I
j 10 J
> y <- t(df[,1,drop=FALSE])[1,]
> y
 a  b  c  d  e  f  g  h  i  j
 1  2  3  4  5  6  7  8  9 10
> identical(x,y)
[1] TRUE

Cheers,
Bert

On Sat, Aug 18, 2012 at 8:16 AM, Joshua Ulrich <[hidden email]> wrote:
> R> identical(nv, drop(t(df)))
> [1] TRUE

--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm |
|
In reply to this post by Martin Maechler
On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler .... wrote:
> On Sat, Aug 18, 2012 at 11:03 AM, Martin Maechler
> <[hidden email]> wrote:
>> Now, how can I get 'nv' back from 'df' ?  I.e., how to get
>>
>> > identical(nv, .......)
>> [1] TRUE
>>
>> where ...... only uses 'df' (and no non-standard R packages)?
>
>> identical(nv, df[,1])
> [1] TRUE
>
>> In my solution, the above '.......' consists of 17 letters.
>
> I count 6 in mine

But it is not a solution in a current version of R!
though it's still interesting that df[,1] worked in some incantation of R.

What's your sessionInfo()?

Martin

> /Christian |
|
In reply to this post by Joshua Ulrich
>>>>> Joshua Ulrich <[hidden email]>
>>>>>     on Sat, 18 Aug 2012 10:16:09 -0500 writes:

    > I don't know if this is better, but it's the most obvious/shortest I
    > could come up with.  Transpose the data.frame column to a 'row' vector
    > and drop the dimensions.

    R> identical(nv, drop(t(df)))
    > [1] TRUE

Yes, that's definitely shorter,
congratulations!

One gotcha is that I'd want a solution that also works when the
df has more columns than just one...

Your idea to use  t(.)  is nice and "perfect" insofar as it
coerces the data frame to a matrix, and that's really the clue:

Whereas df[,1] is losing the names,
the matrix indexing is not.
So your solution can be changed into

     t(df)[1,]

which is even shorter...
and slightly less efficient, at least conceptually, than mine, which has
been

   as.matrix(df)[,1]

Now, the remaining question is:  Shouldn't there be something
more natural to achieve that?
(There is not, currently, AFAIK).

Martin |
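Martin asks for something that also works when the data frame has more than one column. One caveat worth checking, sketched here with a hypothetical mixed-type data frame `df2` that is not part of his message: as.matrix() coerces all columns to a common type, so it may be safer to select the column with drop = FALSE before the coercion.

```r
nv  <- c(a = 1, d = 17, e = 101)
# Hypothetical two-column example with one numeric and one character column.
df2 <- data.frame(VAR = nv, CHAR = c("x", "y", "z"))

# Coercing the whole data frame yields a character matrix,
# so the extracted column is no longer numeric.
whole <- as.matrix(df2)[, "VAR"]
is.character(whole)

# Restricting to one column before the coercion keeps the numeric type
# and still carries the row names over as vector names.
single <- as.matrix(df2[, "VAR", drop = FALSE])[, 1]
identical(nv, single)
```

For an all-numeric data frame the two forms give the same result; the difference only shows up once the columns have different types.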
|
In reply to this post by Rui Barradas
Yes, but either

drop(t(df[,1,drop=TRUE]))

or

t(df[,1,drop=TRUE])[1,]

does work. My minimal effort to check timings found that the first
version was a hair faster.

-- Bert

On Sat, Aug 18, 2012 at 9:01 AM, Rui Barradas <[hidden email]> wrote:
> identical( nv, drop(t( df2[1] )) ) # TRUE
> identical( nv, drop(t( df2[[1]] )) ) # FALSE |
|
Sorry! -- Change that to drop = FALSE !
drop(t(df[,1,drop=FALSE]))

t(df[,1,drop=FALSE])[1,]

-- Bert

On Sat, Aug 18, 2012 at 9:37 AM, Bert Gunter <[hidden email]> wrote:
> Yes, but either
>
> drop(t(df[,1,drop=TRUE]))
>
> or
>
> t(df[,1,drop=TRUE])[1,]
>
> does work. |
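The correction matters because drop = TRUE hands t() a bare vector. A small check in base R, using the toy data from the original post (the variable names `with_drop`, `no_drop_a` and `no_drop_b` are made up for this sketch):

```r
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

# drop = TRUE collapses the column to an unnamed vector before t() sees it,
# so the row names are already gone.
with_drop <- drop(t(df[, 1, drop = TRUE]))
identical(nv, with_drop)

# drop = FALSE keeps a one-column data frame, so the row names survive
# the coercion to a matrix.
no_drop_a <- drop(t(df[, 1, drop = FALSE]))
no_drop_b <- t(df[, 1, drop = FALSE])[1, ]
identical(nv, no_drop_a)
identical(nv, no_drop_b)
```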
|
In reply to this post by Martin Maechler
On Sat, Aug 18, 2012 at 9:33 AM, Martin Maechler
<[hidden email]> wrote:
> Whereas df[,1] is losing the names,
> the matrix indexing is not.
> So your solution can be changed into
>
>      t(df)[1,]
>
> which is even shorter...
> and slightly less efficient, at least conceptually, than mine, which has
> been
>
>    as.matrix(df)[,1]
>
> Now, the remaining question is:  Shouldn't there be something
> more natural to achieve that?
> (There is not, currently, AFAIK).

Perhaps a data frame method for as.vector?

as.vector.data.frame <- function(x, ...) as.matrix(x)[,1]
as.vector(df[1])

or an additional argument to `[.data.frame` like keep.names, which
defaults to FALSE to maintain current behavior but can optionally be
TRUE.

Cheers,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/ |
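One way to try out the keep.names idea without touching `[.data.frame` or defining an as.vector method is a small standalone helper. The function name `named_col` and its arguments are hypothetical, chosen here for illustration; this is a sketch of the proposal under that assumption, not an existing R API.

```r
# Hypothetical helper: extract column `j` of data frame `x` and,
# if requested, attach the row names as names of the result.
named_col <- function(x, j, keep.names = TRUE) {
  stopifnot(is.data.frame(x), length(j) == 1L)
  out <- x[[j]]
  # Only plain vectors get names; matrix-like columns are left alone.
  if (keep.names && is.null(dim(out))) {
    names(out) <- rownames(x)
  }
  out
}

nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

identical(nv, named_col(df, "VAR"))
names(named_col(df, 1, keep.names = FALSE))
```

Unlike the suggestion in the message, the default here is keep.names = TRUE, since a helper with this name would mostly be called for that purpose; inside `[.data.frame` itself a FALSE default would be needed to keep current behavior.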
|
In reply to this post by Martin Maechler
On 8/18/12, Martin Maechler <[hidden email]> wrote:
> On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler .... wrote:
>>> identical(nv, df[,1])
>> [1] TRUE

> But it is not a solution in a current version of R!
> though it's still interesting that df[,1] worked in some incantation of
> R.

My mistake!  We disliked some quirks of indexing, so we've long had
our own patch for "[.data.frame" in place, which I used inadvertently.
In essence, it does this:

result <- base::"[.data.frame"(df,,1, drop=FALSE)
if (drop && length(ncol(result)) > 0 && ncol(result)==1) {
    save.names <- dimnames(result)[[1]]
    result <- result[[1]]
    names(result) <- save.names
}

That obviously violated your constraint "no non-standard R packages".
I apologize.

Still, maybe the behavior of getting the named column would be
desirable in general?

/Christian |
|
In reply to this post by Martin Maechler
This isn't super-concise, but has the virtue of being clear:
nv <- c(a=1, d=17, e=101)
df <- as.data.frame(cbind(VAR = nv))

identical(nv, setNames(df$VAR, rownames(df)))
# TRUE


It seems to be more efficient than the other methods as well:

library(microbenchmark)

f1 <- function() setNames(df$VAR, rownames(df))
f2 <- function() t(df)[1,]
f3 <- function() as.matrix(df)[,1]

r <- microbenchmark(f1(), f2(), f3(), times=1000)
r
# Unit: microseconds
#   expr    min      lq median      uq      max
# 1 f1() 14.589 17.0315 18.608 19.3220   89.388
# 2 f2() 68.057 70.8735 72.240 75.8065 3707.012
# 3 f3() 58.153 61.2600 62.521 65.0380  238.483

-Winston |
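Since microbenchmark is a contributed package, a rough comparison can also be made with base R only. The sketch below reuses f1 to f3 from the message and times them with system.time(); the repetition count `n` is an arbitrary choice, and the resolution is far coarser than microbenchmark's, so treat the numbers as indicative at best.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

f1 <- function() setNames(df$VAR, rownames(df))
f2 <- function() t(df)[1, ]
f3 <- function() as.matrix(df)[, 1]

# All three candidates should agree before their speed is compared.
stopifnot(identical(f1(), nv), identical(f2(), nv), identical(f3(), nv))

# Crude timing: repeat each call n times and read the elapsed seconds.
n <- 2000
timings <- sapply(list(f1 = f1, f2 = f2, f3 = f3), function(f) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
})
timings
```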
|
In reply to this post by Martin Maechler
On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler
<[hidden email]> wrote:
> Today, I was looking for an elegant (and efficient) way
> to get a named (atomic) vector by selecting one column of a data frame.
> Of course, the vector names must be the rownames of the data frame.

But aren't you making life difficult for yourself by not using I ?

df <- data.frame(VAR = I(nv))
str(df[[1]])

(which isn't quite identical because it now has the AsIs class)

Hadley

--
Assistant Professor
Department of Statistics / Rice University
http://had.co.nz/ |
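A short sketch of what the I() approach stores, assuming the names are kept on the column as the message implies; the variable names `col` and `plain` are introduced here for illustration.

```r
nv <- c(a = 1, d = 17, e = 101)

# I() marks the vector "as is", so data.frame() stores it unchanged.
df <- data.frame(VAR = I(nv))
col <- df[[1]]
class(col)
names(col)

# unclass() removes the AsIs marker; the names stay attached.
plain <- unclass(col)
names(plain)
```

This only helps when the data frame is built by the caller; for a data frame received from elsewhere the names have usually been stripped already.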
|
In reply to this post by Martin Maechler
On 2012-08-18 11:03, Martin Maechler wrote:
> Today, I was looking for an elegant (and efficient) way
> to get a named (atomic) vector by selecting one column of a data frame.
> Of course, the vector names must be the rownames of the data frame.

For this purpose my private function library has a function withnames():

withnames():  Extract from data frame as a named vector

Description:

     Extracts data from a data frame; if the result is a vector (i.e.
     we extracted a single column and did not specify 'drop=FALSE') it
     is assigned names derived from the row names of the data frame.

Usage:

     withnames(expr)

Arguments:

    expr: R expression.

Details:

     'expr' is evaluated in an environment in which the extractor
     functions '$.data.frame', '[.data.frame', and '[[.data.frame' are
     replaced by versions that attach the data frame's row names to an
     extracted vector.

Value:

     'expr', evaluated as described above.

## Code

withnames<-function(expr) {
   eval(substitute(expr),
        list(
          `[.data.frame` = function(x,i,...) {
             out<-x[i,...]
             if (is.null(dim(out))) names(out)<-row.names(x)[i]
             return(out)},
          `[[.data.frame` = function(x,...) {
             out<-x[[...]]
             if (is.null(dim(out))) names(out)<-row.names(x)
             return(out)},
          `$.data.frame` = function(x,name) {
             out<-x[[name, exact=FALSE]]
             if (is.null(dim(out))) names(out)<-row.names(x)
             return(out)}
        ),
        enclos=parent.frame())
}

## Examples

dd <- data.frame(aa=1:6, bb=letters[c(1,3,2,3,3,1)], row.names=LETTERS[1:6])
dd
dd$aa                           # Unnamed vector
withnames(dd$aa)                # Named vector
withnames(dd[["aa"]])           # Named vector
withnames(dd[2:4,"aa"])         # Named vector
withnames(dd$bb)                # Factor with names
withnames(outer(dd$a,dd$a))     # Both dimensions have names

## But now I am looking for a version that will play nicely with with():

withnames(with(dd, aa))  # No names!
with(dd, withnames(aa))  # No names! |
|
In reply to this post by Winston Chang
That would have been essentially my suggestion as well.  I prefer its clarity
(and speed).  I didn't know if you wanted your solution to also apply to matrices
embedded in data.frames.  In S+ rownames<-() works on vectors (because it calls
the generic rowId<-()) so the following works:

  > f4 <- function(df, column) { tmp <- df[[column]] ; rownames(tmp) <- rownames(df) ; tmp}
  > nv <- c(a=1,d=17,e=101)
  > df <- data.frame(VAR=nv, Two=3^(1:3))
  > f4(df, 2)
   a  d  e
   3  9 27
  > df$Matrix <- matrix(1001:1006, ncol=2, nrow=3)
  > f4(df, "Matrix")
    [,1] [,2]
  a 1001 1004
  d 1002 1005
  e 1003 1006

I forget if R has something like rowIds() (it is to names and rownames as NROW is to
length and nrow).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf
> Of Winston Chang
> Sent: Saturday, August 18, 2012 11:54 AM
> To: Martin Maechler
> Cc: R. Devel List
> Subject: Re: [Rd] Quiz: How to get a "named column" from a data frame
>
> This isn't super-concise, but has the virtue of being clear:
>
> identical(nv, setNames(df$VAR, rownames(df)))
> # TRUE |
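In R, rownames<-() needs an object with dimensions, so f4() as written for S+ would fail on a plain vector column. A possible R counterpart, sketched under the assumption that branching on dim() is acceptable, is shown below; the name `f4_r` is hypothetical and not from the message.

```r
# Hypothetical R counterpart of f4(): use names<- for plain vectors and
# rownames<- for anything with a dim attribute (e.g. an embedded matrix).
f4_r <- function(df, column) {
  tmp <- df[[column]]
  if (is.null(dim(tmp))) {
    names(tmp) <- rownames(df)
  } else {
    rownames(tmp) <- rownames(df)
  }
  tmp
}

nv <- c(a = 1, d = 17, e = 101)
df <- data.frame(VAR = nv, Two = 3^(1:3))
f4_r(df, 2)

# A matrix stored as a single data frame column gets row names instead.
df$Matrix <- matrix(1001:1006, ncol = 2, nrow = 3)
f4_r(df, "Matrix")
```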
|
In reply to this post by Quigi
On Sat, Aug 18, 2012 at 02:13:20PM -0400, Christian Brechbühler wrote:
> On 8/18/12, Martin Maechler <[hidden email]> wrote:
> > On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler .... wrote:
> >> On Sat, Aug 18, 2012 at 11:03 AM, Martin Maechler
> >> <[hidden email]> wrote:
>
> >>> Consider this toy example, where the dataframe already has only
> >>> one column :
> >>>
> >>> > nv <- c(a=1, d=17, e=101); nv
> >>>   a   d   e
> >>>   1  17 101
> >>>
> >>> > df <- as.data.frame(cbind(VAR = nv)); df
> >>>   VAR
> >>> a   1
> >>> d  17
> >>> e 101
> >>>
> >>> Now how, can I get 'nv' back from 'df' ? I.e., how to get
> >>> identical(nv, df[,1])
> >> [1] TRUE
>
> > But it is not a solution in a current version of R!
> > though it's still interesting that df[,1] worked in some incantation of
> > R.
>
> My mistake! We disliked some quirks of indexing, so we've long had
> our own patch for "[.data.frame" in place, which I used inadvertently.

As I understand it, when doing 'df[,1]' on a data frame, Bell Labs S
and all versions of S-Plus prior to 3.4 always retained the data
frame's row names as the names on the result vector. 'df[,1]' gave
you a named vector identical to your 'nv' above.

Then in 1996 with S-Plus 3.4, Insightful broke that behavior, after
which 'df[,1]' returned a vector without any names. I believe R
copied that late-1990s S-Plus behavior, but I don't know why exactly.

When subscripting objects, R sometimes retains the object's dimnames
as names in the result, and sometimes not, which I find frustrating.
Personally, I think it would make much more sense if subscripting
ALWAYS retained any names it could, and worked as similarly as
possible across data frames, matrices, arrays, vectors, etc. After
all, explicitly dropping names afterwards is trivial, while adding
them back on is not.

Back on 2005-10-19 with R 2.2.0, I gave a simple test of 15 cases; 4
of them dropped names during subscripting, the other 11 preserved them.
That's towards the end of the discussion here:

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=8192

Contrary to the initial tone of my old 2005 "bug" report, current R
subscripting behavior is of course NOT a bug, as AFAIK it's working as
the R Core Team intended. However, I definitely consider the current
behavior a design infelicity.

Just now on stock R 2.15.1 (with --vanilla), I ran an updated version
of those same simple tests. Of 22 subscripting test cases, 7 lose
names and 15 preserve them. (If anyone's interested in the specific
tests, I can send them, or try to append them to that old 8192 feature
request.)

For what it's worth, at work, for years we ran various versions of
pre-namespace R using some ugly patches of "[" and "[.data.frame" to
force name retention during subscripting. Since we were not using
namespaces at all, those "keep names" subscripting hacks were
affecting ALL R code we ran, not just our own custom code which needed
and expected the names to be retained.

Yet perhaps surprisingly, I don't think I ever ran into a single case
where the forced retention of names broke any code. We of course ran
only a tiny sample of the huge amount of code on CRAN, but that
experience suggests that most R code which expects un-named objects
doesn't mind at all if names are present.

If anyone would genuinely like to add an option for name-preserving
subscripting to R, I'm willing to work on it, so please do let me know
your thoughts. So far though, I've never dug into the guts of the
.Primitive("[") and "[.data.frame" functions to see how/why they
sometimes keep and sometimes discard names during subscripting.

--
Andrew Piskorski <[hidden email]>
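The contrast described above can be seen on the toy data from the quiz. The handful of cases below are only an illustration chosen for this sketch; they are not the 15 or 22 test cases mentioned in the message, which were not posted in the thread.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- data.frame(VAR = nv)
m  <- as.matrix(df)

# Each element records the names left on the result of one subscripting form.
kept <- list(
  vector_range  = names(nv[2:3]),   # subsetting a named vector
  matrix_column = names(m[, 1]),    # matrix column: row names survive as names
  df_column     = names(df[, 1]),   # data frame column via [ , ]
  df_element    = names(df[[1]])    # data frame column via [[ ]]
)

# TRUE where the names were lost during subscripting
sapply(kept, is.null)
```

On the cases discussed in the thread, the two data frame forms lose the names, while the matrix and vector forms keep them.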
|
In reply to this post by Martin Maechler
On 12-08-18 12:33 PM, Martin Maechler wrote:
>>>>>> Joshua Ulrich <[hidden email]>
>>>>>> on Sat, 18 Aug 2012 10:16:09 -0500 writes:
>
> > I don't know if this is better, but it's the most obvious/shortest I
> > could come up with. Transpose the data.frame column to a 'row' vector
> > and drop the dimensions.
>
> R> identical(nv, drop(t(df)))
> > [1] TRUE
>
> Yes, that's definitely shorter,
> congratulations!
>
> One gotta is that I'd want a solution that also works when the
> df has more columns than just one...
>
> Your idea to use t(.) is nice and "perfect" insofar as it
> coerces the data frame to a matrix, and that's really the clue:
>
> Where as df[,1] is losing the names,
> the matrix indexing is not.
> So your solution can be changed into
>
> t(df)[1,]
>
> which is even shorter...
> and slightly less efficient, at least conceptually, than mine, which has
> been
>
> as.matrix(df)[,1]
>
> Now, the remaining question is: Shouldn't there be something
> more natural to achieve that?
> (There is not, currently, AFAIK).

I've been offline, so I'm a bit late to this game, but the examples
above fail when df contains a character column as well as the desired
one, because everything gets coerced to a character matrix. You need
to select the column first, then convert to a matrix, e.g.

drop(t(df[,1,drop=FALSE]))

Duncan Murdoch

>
> Martin
>
>
> > Best,
> > --
> > Joshua Ulrich | about.me/joshuaulrich
> > FOSS Trading | www.fosstrading.com
>
>
> > On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler
> > <[hidden email]> wrote:
> >> Today, I was looking for an elegant (and efficient) way to get a named
> >> (atomic) vector by selecting one column of a data frame. Of course,
> >> the vector names must be the rownames of the data frame.
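The coercion problem described above can be reproduced with a two-column variant of the toy data. The variable names whole and first are made up for this sketch, and the data frame is the mixed-type example used elsewhere in the thread.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- data.frame(VAR = nv, CHAR = letters[1:3], stringsAsFactors = FALSE)

# Converting the whole data frame first: the character column forces
# every column, including VAR, into a character matrix.
whole <- as.matrix(df)[, 1]

# Selecting the column first keeps it numeric; the transpose and drop()
# then turn the row names into element names.
first <- drop(t(df[, 1, drop = FALSE]))

is.character(whole)
identical(first, nv)
```

The same caveat applies to t(df)[1,], since t() on a data frame also goes through a matrix conversion of all columns.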
|
On Tue, Aug 21, 2012 at 2:34 PM, Duncan Murdoch <[hidden email]> wrote:
> On 12-08-18 12:33 PM, Martin Maechler wrote:
>>>>>>>
>>>>>>> Joshua Ulrich <[hidden email]>
>>>>>>> on Sat, 18 Aug 2012 10:16:09 -0500 writes:
>>
>>
>> > I don't know if this is better, but it's the most obvious/shortest I
>> > could come up with. Transpose the data.frame column to a 'row' vector
>> > and drop the dimensions.
>> R> identical(nv, drop(t(df)))
>> > [1] TRUE
>>
>> Yes, that's definitely shorter,
>> congratulations!
>>
>> One gotta is that I'd want a solution that also works when the
>> df has more columns than just one...
>>
>> Your idea to use t(.) is nice and "perfect" insofar as it
>> coerces the data frame to a matrix, and that's really the clue:
>>
>> Where as df[,1] is losing the names,
>> the matrix indexing is not.
>> So your solution can be changed into
>>
>> t(df)[1,]
>>
>> which is even shorter...
>> and slightly less efficient, at least conceptually, than mine, which has
>> been
>>
>> as.matrix(df)[,1]
>>
>> Now, the remaining question is: Shouldn't there be something
>> more natural to achieve that?
>> (There is not, currently, AFAIK).
>
>
> I've been offline, so I'm a bit late to this game, but the examples above
> fail when df contains a character column as well as the desired one, because
> everything gets coerced to a character matrix. You need to select the
> column first, then convert to a matrix, e.g.
>
> drop(t(df[,1,drop=FALSE]))
>

That's true, but I was assuming a one-column data.frame, which can be
achieved via:

df <- data.frame(VAR=nv,CHAR=letters[1:3],stringsAsFactors=FALSE)
drop(t(df[1]))

That said, I prefer the setNames() solution for its efficiency.
Best,
Josh
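For anyone wanting to check the efficiency remark without the microbenchmark package used earlier in the thread, a rough base-R comparison could be sketched as follows. The function names f_setnames and f_transpose and the repetition count n are arbitrary choices for this sketch, and system.time() is far coarser than microbenchmark, so the timings will vary by machine and R version and are only indicative.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- data.frame(VAR = nv, CHAR = letters[1:3], stringsAsFactors = FALSE)

# The two approaches from the message above, wrapped for timing.
f_setnames  <- function() setNames(df$VAR, rownames(df))
f_transpose <- function() drop(t(df[1]))

# Both should return the same named vector.
same <- identical(f_setnames(), f_transpose())

# Crude timing: elapsed seconds for n repeated calls of each function.
n <- 1000
t_setnames  <- system.time(for (i in seq_len(n)) f_setnames())[["elapsed"]]
t_transpose <- system.time(for (i in seq_len(n)) f_transpose())[["elapsed"]]

c(setNames = t_setnames, transpose = t_transpose)
```

If the two elapsed times are too small to tell apart, increasing n is the simplest adjustment.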
