

Dear all,
I have a data frame of character values in which each distinct character
is a level. However, not all columns contain the same characters, so when
I create the data frame with stringsAsFactors = TRUE, the levels differ
from column to column.
Is there a way to provide a single vector of levels and assign the
characters so that they match that vector?
Is there a way to do this not only when creating the data frame but
also when reading data from a file with read.table()?
For instance, I have:
column_1 = c("A", "B", "C", "D", "E")
column_2 = c("B", "B", "C", "E", "E")
column_3 = c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)
> str(my.data)
'data.frame': 5 obs. of 3 variables:
$ column_1: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
$ column_2: Factor w/ 3 levels "B","C","E": 1 1 2 3 3
$ column_3: Factor w/ 2 levels "C","D": 1 1 2 2 1
Thank you

Best regards,
Luigi
I don't think read.table() can do it for you automatically. To do it
yourself, you need to get a vector of the levels. If you know this,
just assign it to a variable; if you don't know it, compute it as
thelevels <- unique(unlist(lapply(my.data, levels)))
Then set the levels of each column to thelevels:
my.data.new <- as.data.frame(lapply(my.data, function(x) {
  levels(x) <- thelevels
  x
}))
Duncan Murdoch
Thank you,
that worked fine for me.
Best wishes for a merry Christmas and a happy new year,
Luigi
Actually it's wrong! Sorry about that.
If you look at my.data.new$column_2, you'll see that the levels have
changed:
> my.data
  column_1 column_2 column_3
1        A        B        A
2        B        B        A
3        C        C        B
4        D        E        B
5        E        E        A
> my.data.new
  column_1 column_2 column_3
1        A        A        A
2        B        A        A
3        C        B        B
4        D        C        B
5        E        C        A
What you want is this instead:
my.data.new <- as.data.frame(lapply(my.data, function(x) {
  factor(x, levels = thelevels)
}))
The last example in the ?levels help page does this too. I wonder if
that is intentional?
levels> ## we can add levels this way:
levels> f <- factor(c("a","b"))
levels> levels(f) <- c("c", "a", "b")
levels> f
[1] c a
Levels: c a b
levels> f <- factor(c("a","b"))
levels> levels(f) <- list(C = "C", A = "a", B = "b")
levels> f
[1] A B
Levels: C A B
Duncan Murdoch
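
To make the difference concrete, here is a minimal sketch using the example
data from the original question. It contrasts levels(x) <- thelevels, which
keeps the integer codes and renames the levels by position, with
factor(x, levels = thelevels), which matches each value by its label. The
names relabelled and rematched are illustrative only, and sort() is added
here just to give the levels a stable order.

```r
column_1 <- c("A", "B", "C", "D", "E")
column_2 <- c("B", "B", "C", "E", "E")
column_3 <- c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)

# Union of all levels found in any column
thelevels <- sort(unique(unlist(lapply(my.data, levels))))

# Relabelling: the integer codes are kept and the levels are renamed by
# position, so "B" in column_2 (code 1) silently becomes "A"
relabelled <- my.data$column_2
levels(relabelled) <- thelevels

# Rematching: each value is looked up by label, so the values are preserved
rematched <- factor(my.data$column_2, levels = thelevels)

as.character(relabelled)  # "A" "A" "B" "C" "C"
as.character(rematched)   # "B" "B" "C" "E" "E"
levels(rematched)         # "A" "B" "C" "D" "E"
```

Only the second form leaves the data unchanged while giving the column the
full set of levels.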
You can abuse the S4 class system to do this.
setClass("Size") # no representation, no prototype
setAs(from="character", to="Size", # nothing but a coercion method
    function(from){
        ret <- factor(from, levels=c("Small","Medium","Large"), ordered=TRUE)
        class(ret) <- c("Size", class(ret))
        ret
    })
z <- read.table(colClasses=c("integer", "Size"),
                text="7 Medium\n5 Large\n3 Large")
dput(z)
#structure(list(V1 = c(7L, 5L, 3L), V2 = structure(c(2L, 3L, 3L
#), .Label = c("Small", "Medium", "Large"), class = c("Size",
#"ordered", "factor"))), class = "data.frame", row.names = c(NA,
#3L))
I wonder if this behavior is intended or if there is a more sanctioned way
to get read.table(colClasses=...) to make a factor with a specified set of
levels.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
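
For the read.table() part of the original question, one plain alternative
to the S4 approach is to read every column as character and convert
afterwards. The sketch below assumes the full set of levels is known in
advance; the inline text stands in for a real file, and the data and the
name raw are illustrative only.

```r
thelevels <- c("A", "B", "C", "D", "E")

# Inline text stands in for a file; with a real file, pass its path instead
raw <- read.table(text = "A B C\nB B C\nC C D\nD E D\nE E C",
                  col.names = c("column_1", "column_2", "column_3"),
                  colClasses = "character")

# Convert every column to a factor sharing the same levels; assigning
# into my.data[] keeps the result a data frame
my.data <- raw
my.data[] <- lapply(raw, factor, levels = thelevels)

str(my.data)
```

Reading as character first avoids creating per-column levels at all, so
there is nothing to relabel afterwards.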
