convert columns of dataframe to same factor levels

Luigi
Dear all,
I have a data frame of character values in which each distinct value is a
level. However, not all columns contain the same set of values, so when I
generate the data frame with stringsAsFactors = TRUE, the levels differ
from column to column.
Is there a way to provide a single vector of levels and assign the
values so that they match that vector?
Is there a way to do that not only when creating the data frame but
also when reading data from a file with read.table()?

For instance, I have:
column_1 = c("A", "B", "C", "D", "E")
column_2 = c("B", "B", "C", "E", "E")
column_3 = c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)
> str(my.data)
'data.frame': 5 obs. of  3 variables:
 $ column_1: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
 $ column_2: Factor w/ 3 levels "B","C","E": 1 1 2 3 3
 $ column_3: Factor w/ 2 levels "C","D": 1 1 2 2 1

Thank you
--
Best regards,
Luigi

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: convert columns of dataframe to same factor levels

Duncan Murdoch-2
On 19/12/2018 5:58 AM, Luigi Marongiu wrote:


I don't think read.table() can do it for you automatically.  To do it
yourself, you need to get a vector of the levels.  If you know this,
just assign it to a variable; if you don't know it, compute it as

   thelevels <- unique(unlist(lapply(my.data, levels)))

Then set the levels of each column to thelevels:

   my.data.new <- as.data.frame(lapply(my.data, function(x) {levels(x) <- thelevels; x}))

Duncan Murdoch


Re: convert columns of dataframe to same factor levels

Luigi
Thank you,
that worked fine for me.
Best wishes of merry Christmas and happy new year,
Luigi

On Wed, Dec 19, 2018 at 12:19 PM Duncan Murdoch
<[hidden email]> wrote:

--
Best regards,
Luigi


Re: convert columns of dataframe to same factor levels

Duncan Murdoch-2
On 19/12/2018 6:48 AM, Luigi Marongiu wrote:
> Thank you,
> that worked fine for me.
> Best wishes of merry Christmas and happy new year,
> Luigi
>

Actually it's wrong!  Sorry about that.

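If you look at my.data.new$column_2, you'll see that the values
themselves have changed, because levels(x) <- thelevels relabels the
existing levels by position rather than matching them by name: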

 > my.data
   column_1 column_2 column_3
1        A        B        C
2        B        B        C
3        C        C        D
4        D        E        D
5        E        E        C


 > my.data.new
   column_1 column_2 column_3
1        A        A        A
2        B        A        A
3        C        B        B
4        D        C        B
5        E        C        A

What you want is this instead:

my.data.new <- as.data.frame(lapply(my.data, function(x) {factor(x, levels = thelevels)}))

The last example in the ?levels help page does the same relabelling.  I
wonder if that is intentional?

levels> ## we can add levels this way:
levels> f <- factor(c("a","b"))

levels> levels(f) <- c("c", "a", "b")

levels> f
[1] c a
Levels: c a b

levels> f <- factor(c("a","b"))

levels> levels(f) <- list(C = "C", A = "a", B = "b")

levels> f
[1] A B
Levels: C A B
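
To make the difference concrete, here is a minimal sketch using the data from the original post. It contrasts levels<- (which relabels by position) with factor(x, levels = ...) (which matches by value). The names relabelled and recoded are only illustrative, and assigning through my.data.new[] is just one way to keep the data frame structure; it is not necessarily how you would have to do it.

```r
# Example data from the original post.
column_1 <- c("A", "B", "C", "D", "E")
column_2 <- c("B", "B", "C", "E", "E")
column_3 <- c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)

# Union of the levels that occur in any column.
thelevels <- unique(unlist(lapply(my.data, levels)))

# levels<- renames the existing levels by position, so the data change.
relabelled <- my.data$column_2
levels(relabelled) <- thelevels

# factor() matches the existing values against the new level set,
# so the data stay the same and only the set of levels grows.
recoded <- factor(my.data$column_2, levels = thelevels)

as.character(relabelled)  # "A" "A" "B" "C" "C"
as.character(recoded)     # "B" "B" "C" "E" "E"

# Apply the safe version to every column while keeping the data frame.
my.data.new <- my.data
my.data.new[] <- lapply(my.data, factor, levels = thelevels)
str(my.data.new)
```

Note that any value not present in thelevels would become NA with factor(), so the level vector should cover everything that occurs in the data.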

Duncan Murdoch


Re: convert columns of dataframe to same factor levels

R help mailing list-2
In reply to this post by Duncan Murdoch-2
You can abuse the S4 class system to do this.

setClass("Size") # no representation, no prototype
setAs(from="character", to="Size", # nothing but a coercion method
  function(from){
    ret <- factor(from, levels=c("Small","Medium","Large"), ordered=TRUE)
    class(ret) <- c("Size", class(ret))
    ret
  })
z <- read.table(colClasses=c("integer", "Size"), text="7 Medium\n5 Large\n3 Large")
dput(z)
#structure(list(V1 = c(7L, 5L, 3L), V2 = structure(c(2L, 3L, 3L
#), .Label = c("Small", "Medium", "Large"), class = c("Size",
#"ordered", "factor"))), class = "data.frame", row.names = c(NA,
#-3L))

I wonder if this behavior is intended or if there is a more sanctioned way
to get read.table(colClasses=...) to make a factor with a specified set of
levels.
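
A plainer alternative, if a two-step approach is acceptable, is to read the columns as character and convert them afterwards. The sketch below assumes the levels are known in advance and uses made-up inline text in place of a real file; txt and thelevels are illustrative names, not anything read.table() requires.

```r
# Hypothetical file contents, passed via text= so the sketch is self-contained.
txt <- "column_1 column_2 column_3
A B C
B B C
C C D
D E D
E E C"

# Levels assumed to be known beforehand.
thelevels <- c("A", "B", "C", "D", "E")

# Read every column as character so no per-column levels are created.
my.data <- read.table(text = txt, header = TRUE, colClasses = "character")

# Convert each column to a factor that shares the same level set.
my.data[] <- lapply(my.data, factor, levels = thelevels)

str(my.data)
```

This does not make read.table() itself produce the factors, but it avoids defining a class just for the coercion. Values missing from thelevels would turn into NA.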

Bill Dunlap
TIBCO Software
wdunlap tibco.com

