unused factor levels in reshape

Daniel Farewell
When reshaping a dataframe in which there are unused factor levels in the id variable, I get the following error:

Error in if (!all(really.constant)) warning(gettextf("some constant variables (%s) are really varying",  :
        missing value where TRUE/FALSE needed

For example,

> df <- data.frame(i = factor(rep(1:5, each = 2)), t = factor(rep(1:2, 5)), x = rep(rbinom(5, 1, 0.5), each = 2), y = rpois(10, 10))

> subdf <- subset(df, i %in% 1:3)

defines a data frame, and a subset of it in which some levels of the factor i (4 and 5) are unused. Then

> reshape(df, v.names = "y", timevar = "t", idvar = "i", direction = "wide")
  i x y.1 y.2
1 1 0  13   6
3 2 0  12   5
5 3 0  10   9
7 4 1   9  11
9 5 1  12   8

works fine but

> reshape(subdf, v.names = "y", timevar = "t", idvar = "i", direction = "wide")
Error in if (!all(really.constant)) warning(gettextf("some constant variables (%s) are really varying",  :
        missing value where TRUE/FALSE needed

produces the error. It occurs during the check that the variables assumed constant within each id really are constant. The problem is that reshape looks at all the levels of the id variable (i in this case) to see whether the other variables (here x) are constant. But the smaller data frame has no x values for i = 4 or 5, so

> tapply(subdf$x, subdf$i, function(x) length(unique(x)) == 1)
   1    2    3    4    5
TRUE TRUE TRUE   NA   NA

produces some NAs. A slight change to the reshape code to work around this problem would be to use (the equivalent of)

> tapply(subdf$x, subdf$i[, drop = TRUE], function(x) length(unique(x)) == 1)
   1    2    3
TRUE TRUE TRUE

in the reshapeWide function within reshape, but perhaps there is a good reason not to do this?
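A short R sketch (an editorial addition, not part of the original message) of why the NAs turn into an error, and of a user-level workaround until reshape itself is changed. all() returns NA, not FALSE, when a logical vector contains NAs and no FALSE, so if (!all(...)) fails with "missing value where TRUE/FALSE needed". Dropping the unused levels of the id variable before calling reshape removes those NA entries. The sketch assumes droplevels(), which exists in R 2.12.0 and later; on older versions factor(subdf$i) has the same effect. The names really.constant and wide are only illustrative.

```r
## Why the check fails: all() over TRUE/NA values gives NA, and if (NA) is an error.
really.constant <- c(TRUE, TRUE, TRUE, NA, NA)  # what the tapply() call above yields
all(really.constant)                            # NA
try(if (!all(really.constant)) message("some variables are really varying"))

## Workaround sketch: drop unused levels of the id variable before reshaping.
df <- data.frame(i = factor(rep(1:5, each = 2)), t = factor(rep(1:2, 5)),
                 x = rep(rbinom(5, 1, 0.5), each = 2), y = rpois(10, 10))
subdf <- subset(df, i %in% 1:3)
subdf$i <- droplevels(subdf$i)   # assumes R >= 2.12.0; factor(subdf$i) also works
wide <- reshape(subdf, v.names = "y", timevar = "t", idvar = "i",
                direction = "wide")
wide                             # one row per remaining id: columns i, x, y.1, y.2
```

Because the workaround only touches the data, it does not depend on the internals of reshapeWide.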

Daniel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: unused factor levels in reshape

Thomas Lumley
On Wed, 22 Feb 2006, Daniel Farewell wrote:

> When reshaping a dataframe in which there are unused factor levels in
> the id variable, I get the following error:
>
> Error in if (!all(really.constant)) warning(gettextf("some constant variables (%s) are really varying",  :
>        missing value where TRUE/FALSE needed

Yes, I think it's a bug. Thanks for pointing it out.

  -thomas

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle
