

Dear all,
I have a data frame of character values in which each distinct character
is a level; however, not all columns of the data frame contain the same
characters, so when I generate the data frame with stringsAsFactors = TRUE
the levels are different for each column.
Is there a way to provide a single vector of levels and assign the
characters so that they match that vector?
Is there a way to do this not only when creating the data frame but
also when reading data from a file with read.table()?
For instance, I have:
column_1 = c("A", "B", "C", "D", "E")
column_2 = c("B", "B", "C", "E", "E")
column_3 = c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)
> str(my.data)
'data.frame': 5 obs. of 3 variables:
$ column_1: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
$ column_2: Factor w/ 3 levels "B","C","E": 1 1 2 3 3
$ column_3: Factor w/ 2 levels "C","D": 1 1 2 2 1
Thank you

Best regards,
Luigi
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


On 19/12/2018 5:58 AM, Luigi Marongiu wrote:
> [...]

I don't think read.table() can do it for you automatically. To do it
yourself, you need to get a vector of the levels. If you know this,
just assign it to a variable; if you don't know it, compute it as

thelevels <- unique(unlist(lapply(my.data, levels)))

Then set the levels of each column to thelevels:

my.data.new <- as.data.frame(lapply(my.data, function(x) {levels(x) <- thelevels; x}))

Duncan Murdoch


Thank you,
that worked fine for me.
Best wishes for a merry Christmas and a happy new year,
Luigi
On Wed, Dec 19, 2018 at 12:19 PM Duncan Murdoch <[hidden email]> wrote:
> [...]

Best regards,
Luigi


On 19/12/2018 6:48 AM, Luigi Marongiu wrote:
> Thank you,
> that worked fine for me.
>
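Actually it's wrong! Sorry about that.
If you look at my.data.new$column_2, you'll see that the values have
changed, because levels(x) <- thelevels relabels the existing levels by
position instead of matching them by name: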
> my.data
  column_1 column_2 column_3
1        A        B        A
2        B        B        A
3        C        C        B
4        D        E        B
5        E        E        A
> my.data.new
  column_1 column_2 column_3
1        A        A        A
2        B        A        A
3        C        B        B
4        D        C        B
5        E        C        A

What you want is this instead:

my.data.new <- as.data.frame(lapply(my.data, function(x) {factor(x, levels = thelevels)}))

The last example in the ?levels help page does this too. I wonder if
that is intentional?

levels> ## we can add levels this way:
levels> f <- factor(c("a","b"))
levels> levels(f) <- c("c", "a", "b")
levels> f
[1] c a
Levels: c a b

levels> f <- factor(c("a","b"))
levels> levels(f) <- list(C = "C", A = "a", B = "b")
levels> f
[1] A B
Levels: C A B

Duncan Murdoch
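
A minimal sketch of the difference discussed above, using the data from the original post. It is an illustration rather than part of the thread: the name relabelled is introduced here, and the comparison is limited to column_2.

```r
# Data from the original post
column_1 <- c("A", "B", "C", "D", "E")
column_2 <- c("B", "B", "C", "E", "E")
column_3 <- c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)

# Union of the levels found in all columns
thelevels <- unique(unlist(lapply(my.data, levels)))

# factor(x, levels = ...) matches values by label, so the data are preserved
my.data.new <- as.data.frame(lapply(my.data, function(x) factor(x, levels = thelevels)))

# levels(x) <- ... renames the existing levels by position, so the data change
relabelled <- my.data$column_2
levels(relabelled) <- thelevels

str(my.data.new)
as.character(relabelled)
```

With factor(), every column ends up with the same five levels and its original values; with the levels<- assignment, the three existing levels of column_2 are simply renamed to the first three entries of thelevels.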


You can abuse the S4 class system to do this.

setClass("Size") # no representation, no prototype
setAs(from="character", to="Size", # nothing but a coercion method
      function(from){
          ret <- factor(from, levels=c("Small","Medium","Large"), ordered=TRUE)
          class(ret) <- c("Size", class(ret))
          ret
      })
z <- read.table(colClasses=c("integer", "Size"), text="7 Medium\n5 Large\n3 Large")
dput(z)
#structure(list(V1 = c(7L, 5L, 3L), V2 = structure(c(2L, 3L, 3L
#), .Label = c("Small", "Medium", "Large"), class = c("Size",
#"ordered", "factor"))), class = "data.frame", row.names = c(NA,
#3L))

I wonder if this behavior is intended or if there is a more sanctioned way
to get read.table(colClasses=...) to make a factor with a specified set of
levels.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
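
For comparison, a plainer two-step sketch that does not rely on a coercion method: read the columns as character, then convert them against a known level vector. This is not proposed in the thread as a sanctioned read.table() feature; the inline text and the column names are assumptions made up to mirror the original example.

```r
# Levels assumed to be known in advance
thelevels <- c("A", "B", "C", "D", "E")

# Stand-in for a file on disk (hypothetical contents mirroring the original post)
txt <- "A B C\nB B C\nC C D\nD E D\nE E C"

# Step 1: read everything as character so no per-column levels are created
my.data <- read.table(text = txt, colClasses = "character",
                      col.names = c("column_1", "column_2", "column_3"))

# Step 2: convert every column to a factor with the shared levels;
# my.data[] keeps the data frame structure while replacing its columns
my.data[] <- lapply(my.data, factor, levels = thelevels)

str(my.data)
```

Any value that is not in thelevels would become NA in this sketch, which may or may not be what is wanted for real data.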

