Why can't "apply" be used with "as.factor" on a data.frame ?

Tal Galili
Hi all,

Let's say I have a data.frame and want to turn each of its columns into a
factor.
My instinct would be to use as.factor with apply. But this doesn't work: the
result is a character matrix rather than a data.frame of factors.
I found another way to achieve this, but I would also like to
understand - *WHY* does it work this way?

Here is an example script:
a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))))
apply(a, 2, class)            # why is column 3 not a factor ?
a[, 3]                        # since it IS a factor.
a2 <- apply(a, 2, as.factor)  # won't work - why not ?
a2[, 3]                       # Why was this just turned into a character ???
# A solution
a2 <- lapply(a, as.factor)
a3 <- as.data.frame(a2)
str(a3)


Thanks,
Tal



----------------Contact
Details:-------------------------------------------------------
Contact me: [hidden email] |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Why can't "apply" be used with "as.factor" on a data.frame ?

hadley wickham
The basic reason is that apply works with matrices: it first turns
the input into a matrix, processes each column and then returns a
matrix.  A matrix can hold only one type, so a data frame with any
non-numeric column becomes a character matrix.  See colwise in the plyr
package for a function that works column-wise on a data frame,
returning a data frame.
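
The coercion described here can be checked directly. The following is only a minimal sketch: the data frame `df` and its column names are made up for illustration and are not from the original post.

```r
# Hypothetical data frame, made up for illustration.
df <- data.frame(num = c(1.5, 2.5, 3.5),
                 chr = c("a", "b", "a"),
                 fac = factor(c("x", "y", "x")))

# apply() converts a data frame with as.matrix() first; a matrix holds a
# single type, so mixed columns are all coerced to character.
m <- as.matrix(df)
is.character(m)

# Every column therefore looks like "character" to the applied function.
col_classes <- apply(df, 2, class)
col_classes

# The factors returned by as.factor() are collected back into a matrix,
# which drops the factor class again on the way out.
res <- apply(df, 2, as.factor)
is.matrix(res)
is.character(res)
```

So the function passed to apply never sees the original factor column, and whatever it returns is flattened into a matrix again.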

Hadley



--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Re: Why can't "apply" be used with "as.factor" on a data.frame ?

Tal Galili
Thanks for the explanation (and the function) Hadley.

Tal




Re: Why can't "apply" be used with "as.factor" on a data.frame ?

Don MacQueen
In reply to this post by Tal Galili
And just a small followup. To find out what class each column is, you wanted

>  lapply(a,class)
$x1
[1] "numeric"

$x2
[1] "factor"

$x3
[1] "factor"

With regard to your solution, and why it works, it is my
understanding that data frames are in some sense actually lists, each
column corresponding to one element in a list.

Hence, lapply() works column-wise on data frames.
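
A minimal sketch of that column-wise behaviour, assuming a small made-up data frame `df` (not the one from the question). It also shows the `df[] <- lapply(...)` idiom, one common way to convert every column while keeping the object a data frame:

```r
# Hypothetical data frame, made up for illustration.
df <- data.frame(num = c(1, 2, 1),
                 chr = c("a", "b", "a"),
                 stringsAsFactors = FALSE)

# A data frame is a list of columns, so lapply() visits one column at a time
# and returns a plain list of factors.
cols <- lapply(df, as.factor)
is.list(cols)

# Rebuild a data frame from that list ...
df2 <- as.data.frame(cols)

# ... or assign into df3[] to replace the columns in place, which keeps
# the data.frame class and the row names.
df3 <- df
df3[] <- lapply(df3, as.factor)
sapply(df3, class)
```

Both routes avoid the intermediate matrix that apply creates, so each column keeps the class the function returns.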

Also for this reason it's pretty easy to convert back and forth
between data frames and lists, provided, of course, that each
element of the list has an appropriate structure; see this example:

>  data.frame( list(a=1:2, b=3:4) )
  a b
1 1 3
2 2 4

>  data.frame( list(a=1:2, b=3:7) )
Error in data.frame(a = 1:2, b = 3:7, check.names = FALSE,
stringsAsFactors = TRUE) :
   arguments imply differing number of rows: 2, 5


No doubt there are subtle details, but don't ask me to provide
details on what exactly the "some sense" is!

-Don



--
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
[hidden email]

Re: Why can't "apply" be used with "as.factor" on a data.frame ?

David Winsemius

On Mar 7, 2010, at 3:20 PM, Don MacQueen wrote:

> No doubt there are subtle details, but don't ask me to provide  
> details on what exactly the "some sense" is!

It's not that complicated:

 > dfrm <- data.frame(list(a = 1:2, b = 3:4))  # assumed definition; not shown in the original post
 > class(dfrm)
[1] "data.frame"
 > is.list(dfrm)
[1] TRUE
 > dput(dfrm)
structure(list(a = 1:2, b = 3:4), .Names = c("a", "b"), row.names = c(NA, -2L), class = "data.frame")
# Let's do some violence to this data frame ...
 > class(dfrm) <- "list"
 > dfrm
$a
[1] 1 2

$b
[1] 3 4

attr(,"row.names")
[1] 1 2
 > is.data.frame(dfrm)
[1] FALSE
 > is.data.frame(as.data.frame(dfrm))
[1] TRUE
 > dput(dfrm)
structure(list(a = 1:2, b = 3:4), .Names = c("a", "b"), row.names = c(NA, -2L))
# Now let's restore it to its original data.frame-ish state:
 > class(dfrm) <- "data.frame"
 > dput(dfrm)
structure(list(a = 1:2, b = 3:4), .Names = c("a", "b"), row.names = c(NA, -2L), class = "data.frame")
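
The same point can be checked without modifying the object. This is only a sketch; `dfrm` is rebuilt here on the assumption that it matches the two-column data frame in the transcript above:

```r
# Assumed to mirror the data frame in the transcript above.
dfrm <- data.frame(a = 1:2, b = 3:4)

typeof(dfrm)   # the underlying storage type is a list
length(dfrm)   # number of list elements, i.e. columns
dfrm[["b"]]    # list-style extraction of one column

# unclass() returns a copy without the class attribute and leaves
# dfrm itself untouched.
plain <- unclass(dfrm)
is.data.frame(plain)
is.data.frame(dfrm)
```

In other words, a data frame is a list with a "data.frame" class attribute and row names attached, which is why list tools such as lapply work on it column by column.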


David Winsemius, MD
West Hartford, CT
