cannot turn some columns in a data frame into factors

Sam Steingold-2
Hi,
I have a data frame df and a list of names of columns that I want to
turn into factors:

  df.names <- attr(df,"names")
  sapply(factors, function (name) {
    pos <- match(name,df.names)
    if (is.na(pos)) stop(paste(name,": no such column\n"))
    df[[pos]] <- factor(df[[pos]])
    cat(name,"(",pos,"):",is.factor(df[[pos]]),"\n")
  })
  cat("factors:",sapply(df,is.factor),"\n")

the output is:


Month ( 1 ): TRUE
factors: FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE


i.e., there is a column named "Month" (the 1st column), and it is indeed
turned into a factor inside sapply(), but after that it is numerical
again!

what am I doing wrong?

--
Sam Steingold (http://www.podval.org/~sds) on Fedora Core release 5 (Bordeaux)
http://honestreporting.com http://truepeace.org http://openvotingconsortium.org
http://thereligionofpeace.com http://memri.org http://palestinefacts.org
UNIX, car: hard to learn/easy to use; Windows, bike: hard to learn/hard to use.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: cannot turn some columns in a data frame into factors

jholtman
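Try '<<-' as the assignment operator: plain '<-' inside the function only
changes a local copy of df, whereas '<<-' assigns to the df in the
enclosing (here global) environment.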

     df[[pos]] <<- factor(df[[pos]])



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?

Re: cannot turn some columns in a data frame into factors

Sam Steingold-2
> * jim holtman <[hidden email]> [2006-05-11 12:27:39 -0400]:
>
> try '<<-' as the assignment to make it global.
>
>      df[[pos]] <<- factor(df[[pos]])

nothing changed -- I observe the exact same behaviour:

Month ( 1 ): TRUE
factors: FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE


Re: cannot turn some columns in a data frame into factors

Douglas Grove
You need to create a new object and assign it to 'df'

so you'd do something like this:

df <- sapply(factors, function (name) {
             pos <- match(name,df.names)
             factor(df[[pos]])
             })
             

Doug






Re: cannot turn some columns in a data frame into factors

Sam Steingold-2
> * Douglas Grove <[hidden email]> [2006-05-11 09:41:07 -0700]:
>
> You need to create a new object and assign it to 'df'

why can't I modify an existing object?!

> so you'd do something like this:
>
> df <- sapply(factors, function (name) {
>              pos <- match(name,df.names)
>              factor(df[[pos]])
>              })

this cannot be right:
this will make df have the same length as factors.
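
A quick check of that objection, as a sketch on a made-up three-column data frame (`toy`, `toy.names` and `cols` are hypothetical names, not my real data): the result of sapply() has one column per entry of the name vector and is no longer a data frame.

```r
# hypothetical toy data; only the shape of the result matters here
toy <- data.frame(Month = c(1, 2, 3), Temp = c(10, 12, 15), Wind = c(5, 7, 6))
cols <- "Month"                       # names of the columns to convert
toy.names <- names(toy)

res <- sapply(cols, function(name) {
  pos <- match(name, toy.names)
  factor(toy[[pos]])
})

is.data.frame(res)   # the data frame structure is gone
ncol(res)            # one column per name in cols, not per column of toy
```

So assigning this result to the original name would throw away the other columns.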


Re: cannot turn some columns in a data frame into factors

jholtman
This seems to have worked for me:


> df <- data.frame(a=1:10, b=1:10, c=1:10)
> str(df)
`data.frame':   10 obs. of  3 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10
 $ b: int  1 2 3 4 5 6 7 8 9 10
 $ c: int  1 2 3 4 5 6 7 8 9 10
> df.names <- attr(df,"names")
> factors <- 'c'
> sapply(factors, function (name) {
+    pos <- match(name,df.names)
+    if (is.na(pos)) stop(paste(name,": no such column\n"))
+    df[[pos]] <<- factor(df[[pos]])
+    cat(name,"(",pos,"):",is.factor(df[[pos]]),"\n")
+ })
c ( 3 ): TRUE
$c
NULL

> cat("factors:",sapply(df,is.factor),"\n")
factors: FALSE FALSE TRUE
> str(df)
`data.frame':   10 obs. of  3 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10
 $ b: int  1 2 3 4 5 6 7 8 9 10
 $ c: Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10
>





Re: cannot turn some columns in a data frame into factors

Sam Steingold-2
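Thanks to everyone who took the time to respond, both here on the list and
via private e-mail (I do read the list on gmane, so there is no reason
to CC me).

It turned out that R treats _structured_ objects such as data frames as
values: the assignment to df[[pos]] inside the function only creates and
modifies a local copy of df, and the df outside is left untouched.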
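
A minimal sketch of that behaviour on a made-up data frame (`toy` and `to.factor` are hypothetical names used only for illustration): the function modifies its own copy, so the caller sees a change only if the result is returned and assigned back.

```r
toy <- data.frame(Month = c(1, 2, 3), Temp = c(10, 12, 15))

# hypothetical helper: converts one column to a factor and returns the copy
to.factor <- function(d, name) {
  d[[name]] <- factor(d[[name]])   # changes the local copy only
  d
}

invisible(to.factor(toy, "Month")) # result discarded
before <- is.factor(toy$Month)     # toy itself has not been changed

toy <- to.factor(toy, "Month")     # assign the modified copy back
after <- is.factor(toy$Month)      # now the column is a factor
```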

the solution I use now is:

  df[factors] <- lapply(df[factors], factor)

  # verify that exactly the requested columns are now factors
  is.fac <- names(df)[sapply(df, is.factor)]
  if (!setequal(is.fac, factors))
    stop("bad factors: ", paste(sort(is.fac), collapse = " "), " != ",
         paste(sort(factors), collapse = " "))

it is based on a private e-mail reply by Phil Spector.
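
For the archives, a small self-contained run of the same idea, assuming a made-up data frame (`toy` and `cols` are hypothetical; only the lapply() line corresponds to the actual solution):

```r
toy <- data.frame(Month = c(1, 2, 2), Day = c(1, 1, 2), Temp = c(10, 12, 15))
cols <- c("Month", "Day")

# indexing with a character vector selects those columns;
# lapply() converts each one and the results are assigned back in place
toy[cols] <- lapply(toy[cols], factor)

sapply(toy, is.factor)   # Month and Day are now factors, Temp is not
levels(toy$Month)
```

Because the assignment happens at the level where the data frame lives, no function-local copy is involved.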
