Hi all,
Let's say I have a data.frame and want to turn each of its columns into a factor.
My instinct would be to use as.factor with apply. But this won't work, and results in a data.frame of characters.
I found another solution for how to achieve this, but I would also like to understand - *WHY* does it work this way?

Here is an example script:

a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a","b"), 100, replace = TRUE),
                x3 = factor(c(rep("a",50), rep("b",50))))
apply(a, 2, class)  # why is column 3 not a factor?
a[,3]  # since it IS a factor.
a2 <- apply(a, 2, as.factor)  # won't work - why not?
a2[,3]  # Why was this just turned into a character???
# A solution
a2 <- lapply(a, as.factor)
a3 <- as.data.frame(a2)
str(a3)

Thanks,
Tal

----------------Contact Details:----------------
Contact me: [hidden email] | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
------------------------------------------------

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
The basic reason is that apply works with matrices: it first turns
the input into a matrix, processes each column, and then returns a
matrix. See colwise in the plyr package for a function that works
column-wise on a data frame, returning a data frame.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
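The coercion described above can be seen directly in a small base-R sketch. The data frame and the names `df`, `m`, and `res` are made up for illustration and are not from the original post; the only assumption about apply is the documented one, that it converts a data frame with as.matrix before doing anything else.

```r
# Illustrative data frame with one numeric and one factor column
# (hypothetical names, not the poster's data).
df <- data.frame(
  x1 = c(1.5, 2.5, 3.5, 4.5),
  x2 = factor(c("a", "b", "a", "b"))
)

# A matrix can hold only one type, so mixing numeric and factor
# columns forces everything to character.
m <- as.matrix(df)
print(typeof(m))

# apply() hands each column of that character matrix to the function,
# so class() never sees the original factor.
print(apply(df, 2, class))

# as.factor() does build a factor for each column, but apply() then
# packs the results back into a matrix, which cannot store factors.
res <- apply(df, 2, as.factor)
print(is.matrix(res))
print(is.factor(res[, 2]))
```

So the factor is lost twice: once on the way in (as.matrix) and once on the way out (results are reassembled into a matrix).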
Thanks for the explanation (and the function) Hadley.
Tal
In reply to this post by Tal Galili
And just a small followup. To find out what class each column is, you wanted

> lapply(a,class)
$x1
[1] "numeric"

$x2
[1] "factor"

$x3
[1] "factor"

With regard to your solution, and why it works, it is my understanding that data frames are in some sense actually lists, each column corresponding to one element in a list.

Hence, lapply() works column-wise on data frames.

Also for this reason it's pretty easy to convert back and forth between data frames and lists. Provided, of course, that each element of the list has an appropriate structure; see this example:

> data.frame( list(a=1:2, b=3:4) )
  a b
1 1 3
2 2 4

> data.frame( list(a=1:2, b=3:7) )
Error in data.frame(a = 1:2, b = 3:7, check.names = FALSE, stringsAsFactors = TRUE) :
  arguments imply differing number of rows: 2, 5

No doubt there are subtle details, but don't ask me to provide details on what exactly the "some sense" is!

-Don

--
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
[hidden email]
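The list view of a data frame also gives a compact way to convert every column while keeping the result a data frame. This is only a sketch on made-up data (the names `df` and `col_classes` are illustrative); it relies on the common base-R idiom of assigning into `df[]`, which replaces the columns but leaves the data.frame container in place.

```r
# Illustrative data frame (hypothetical, not the poster's data).
df <- data.frame(
  x1 = c(1.5, 2.5, 3.5, 4.5),
  x2 = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# A data frame is a list with one element per column.
print(is.list(df))
print(length(df) == ncol(df))

# sapply()/lapply() visit the columns one by one and see each
# column's real class, with no matrix coercion in between.
col_classes <- sapply(df, class)
print(col_classes)

# Assigning the list returned by lapply() into df[] swaps in the new
# columns without dropping the data.frame class.
df[] <- lapply(df, as.factor)
str(df)
```

This does in one step what the lapply() plus as.data.frame() solution in the question does in two.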
On Mar 7, 2010, at 3:20 PM, Don MacQueen wrote:

> With regard to your solution, and why it works, it is my
> understanding that data frames are in some sense actually lists,
> each column corresponding to one element in a list.
> [...]
> No doubt there are subtle details, but don't ask me to provide
> details on what exactly the "some sense" is!

It's not that complicated:

> class(dfrm)
[1] "data.frame"
> is.list(dfrm)
[1] TRUE
> dput(dfrm)
structure(list(a = 1:2, b = 3:4), .Names = c("a", "b"), row.names = c(NA, -2L), class = "data.frame")

# Let's do some violence to this dataframe ...
> class(dfrm) <- "list"
> dfrm
$a
[1] 1 2

$b
[1] 3 4

attr(,"row.names")
[1] 1 2

> is.data.frame(dfrm)
[1] FALSE
> is.data.frame(as.data.frame(dfrm))
[1] TRUE
> dput(dfrm)
structure(list(a = 1:2, b = 3:4), .Names = c("a", "b"), row.names = c(NA, -2L))

# Now let's restore it to its original data.frame-ish state:
> class(dfrm) <- "data.frame"
> dput(dfrm)
structure(list(a = 1:2, b = 3:4), .Names = c("a", "b"), row.names = c(NA, -2L), class = "data.frame")

David Winsemius, MD
West Hartford, CT
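The same point can be made in the other direction: starting from a plain list and adding the attributes shown in the dput() output. This is a sketch for illustration only; the names `cols` and `hand_made` are made up, and in real code data.frame() or as.data.frame() is the normal way to do this.

```r
# A plain list of equal-length vectors (illustrative names).
cols <- list(a = 1:2, b = 3:4)

# Attach the row.names and class attributes seen in the dput() output;
# c(NA, -2L) is R's compact encoding for "two automatic row names".
hand_made <- structure(cols, row.names = c(NA, -2L), class = "data.frame")

print(is.data.frame(hand_made))
print(hand_made)

# unclass() drops only the class attribute; the underlying list and
# its other attributes are still there.
print(names(attributes(unclass(hand_made))))
```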