

Hi all,
Let's say I have a data.frame and want to turn each of its columns into a
factor.
My instinct would be to use as.factor with apply. But this doesn't work, and
results in characters rather than factors.
I found another way to achieve this, but I would also like to
understand *WHY* it works this way.
Here is an example script:
a <- data.frame(x1 = rnorm(100), x2 = sample(c("a","b"), 100, replace = TRUE),
                x3 = factor(c(rep("a",50), rep("b",50))))
apply(a, 2, class) # why is column 3 not a factor?
a[,3] # since it IS a factor.
a2 <- apply(a, 2, as.factor) # won't work - why not?
a2[,3] # Why was this just turned into a character???
# A solution
a2 <- lapply(a, as.factor)
a3 <- as.data.frame(a2)
str(a3)
Thanks,
Tal
Contact
Details:
Contact me: [hidden email]  972527275845
Read me: www.talgalili.com (Hebrew)  www.biostatistics.co.il (Hebrew) 
www.rstatistics.com (English)

[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


The basic reason is that apply works with matrices: it first turns
the input into a matrix, processes each column, and then returns a
matrix. See colwise in the plyr package for a function that works
column-wise on a data frame, returning a data frame.
Hadley
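
A minimal sketch of the coercion described above, using base R only. The data frame mirrors the one in the original question; the seed and the helper variable names (m, matrix_type, classes_via_apply, result_class) are illustrative assumptions, not part of the thread.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable
a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))))

# apply() starts by coercing a data frame with as.matrix(). A matrix has a
# single type, so mixed numeric/factor columns all become character.
m <- as.matrix(a)
matrix_type <- typeof(m)

# Each column handed to the function is therefore a character vector.
classes_via_apply <- apply(a, 2, class)

# as.factor() does build a factor for each column, but apply() then packs
# the results back into a matrix, which cannot hold factors.
a2 <- apply(a, 2, as.factor)
result_class <- class(a2[, 3])
```

Inspecting matrix_type, classes_via_apply and result_class should show "character" in every case, which is the behaviour the question asks about.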
On Sun, Mar 7, 2010 at 11:07 AM, Tal Galili < [hidden email]> wrote:
> [...]

Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Thanks for the explanation (and the function) Hadley.
Tal

On Sun, Mar 7, 2010 at 2:05 PM, hadley wickham < [hidden email]> wrote:
> [...]


And just a small follow-up. To find out the class of each column, you wanted:
> lapply(a,class)
$x1
[1] "numeric"
$x2
[1] "factor"
$x3
[1] "factor"
With regard to your solution, and why it works, it is my
understanding that data frames are in some sense actually lists, with each
column corresponding to one element of the list.
Hence, lapply() works column-wise on data frames.
Also for this reason it's pretty easy to convert back and forth
between data frames and lists, provided, of course, that each
element of the list has an appropriate structure; see this example:
> data.frame( list(a=1:2, b=3:4) )
  a b
1 1 3
2 2 4
> data.frame( list(a=1:2, b=3:7) )
Error in data.frame(a = 1:2, b = 3:7, check.names = FALSE, stringsAsFactors = TRUE) :
  arguments imply differing number of rows: 2, 5
No doubt there are subtle details, but don't ask me to provide
details on what exactly the "some sense" is!
Don
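
Because lapply() treats the data frame as a list of columns, one possible variation on the lapply()/as.data.frame() solution is to assign the result back into the existing data frame. This is a sketch under assumptions: the small data frame and the name column_classes are made up for illustration, and stringsAsFactors is set explicitly so the behaviour does not depend on the R version.

```r
a <- data.frame(x1 = c(0.5, 1.5, 0.5, 2.5),
                x2 = c("a", "b", "a", "b"),
                stringsAsFactors = FALSE)

# Assigning into a[] replaces the columns but keeps the data.frame wrapper
# (class and row names), so no as.data.frame() call is needed afterwards.
a[] <- lapply(a, as.factor)

# sapply() also walks the columns as list elements, one class per column.
column_classes <- sapply(a, class)
```

The two-step version in the original post gives the same columns; this form only saves the conversion back to a data frame.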


Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
9254231062
[hidden email]


On Mar 7, 2010, at 3:20 PM, Don MacQueen wrote:
> [...]
> No doubt there are subtle details, but don't ask me to provide
> details on what exactly the "some sense" is!
It's not that complicated:
> dfrm <- data.frame( list(a=1:2, b=3:4) )  # as in Don's example
> class(dfrm)
[1] "data.frame"
> is.list(dfrm)
[1] TRUE
> dput(dfrm)
structure(list(a = 1:2, b = 3:4), .Names = c("a", "b"), row.names = c(NA,
2L), class = "data.frame")
# Let's do some violence to this data frame ...
> class(dfrm) <- "list"
> dfrm
$a
[1] 1 2

$b
[1] 3 4

attr(,"row.names")
[1] 1 2
> is.data.frame(dfrm)
[1] FALSE
> is.data.frame(as.data.frame(dfrm))
[1] TRUE
> dput(dfrm)
structure(list(a = 1:2, b = 3:4), .Names = c("a", "b"), row.names = c(NA,
2L))
# Now let's restore it to its original data.frame-ish state:
> class(dfrm) <- "data.frame"
> dput(dfrm)
structure(list(a = 1:2, b = 3:4), .Names = c("a", "b"), row.names = c(NA,
2L), class = "data.frame")
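
The same point can be checked without modifying the object. The following is a small sketch, assuming the same two-column data frame as above; the variable names (attr_names, bare, bare_is_list, second_column, n_columns) are illustrative only.

```r
dfrm <- data.frame(a = 1:2, b = 3:4)

# The attributes are what distinguish a data frame from a bare list:
# column names, the class, and the row names.
attr_names <- names(attributes(dfrm))

# unclass() returns the underlying list without altering dfrm itself.
bare <- unclass(dfrm)
bare_is_list <- is.list(bare) && !is.data.frame(bare)

# List-style access works directly on the data frame.
second_column <- dfrm[["b"]]
n_columns <- length(dfrm)   # number of list elements, i.e. columns
```

Here length() counts columns rather than rows, which is another consequence of the list structure.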
David Winsemius, MD
West Hartford, CT

