melt error that I don't understand.


nutterb
I'm stumped.  I have a dataset I want to melt to create a temporal sequence of events for each subject, but in each row, I would like to retain the baseline characteristics.

D <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
                    AGE = structure(c(68L, 63L, 55L, 64L, 60L, 78L, 60L, 62L, 60L, 75L),
                                    label = "Age", class = "labelled"),
                    BMI = structure(c(25L, 27L, 27L, 28L, 32L, NA, 36L, 27L, 31L, 25L),
                                    label = "BMI (kg/m2)", class = "labelled"),
                    EventDays = structure(c(722L, 738L, 707L, 751L, 735L, 728L, 731L, 717L, 728L, 735L),
                                          label = "Time to first ACM/censor (days)", class = "labelled"),
                    InterventionDays = c(NA, NA, 575, NA, NA, NA, 490, 643, NA, NA)),
               .Names = c("ID", "AGE", "BMI", "EventDays", "InterventionDays"),
               row.names = c(NA, 10L),
               class = "data.frame")

library(reshape2)  # provides melt()

melt(D, c("ID", "AGE", "BMI")) # produces the following error

Error in data.frame(ids, variable, value, stringsAsFactors = FALSE) :
  arguments imply differing number of rows: 10, 20


Now, I know AGE and BMI aren't exactly identifying variables, but since ID uniquely identifies the subjects, I hoped I could list them as id variables as a shortcut to getting the data set I want.  I can get the data I want if I go about it a little differently.

#* What I would like it to look like.
library(plyr)  # provides arrange()
Timeline <- melt(D[, c("ID", "EventDays", "InterventionDays")], "ID", na.rm=TRUE)
Timeline <- arrange(Timeline, ID, value)
Timeline <- merge(D[, c("ID", "AGE", "BMI")],
                  Timeline,
                  by="ID", all.x=TRUE)


At first I thought it might be the mixture of character and numeric variables as IDs, but the following example works:

A <- data.frame(id = LETTERS[1:10],
                age = c(50, NA, 51, 52, 53, 54, 55, 56, 57, 58),
                meas1 = rnorm(10),
                meas2 = rnorm(10, 5),
                stringsAsFactors=FALSE)
melt(A, c("id", "age"))
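
One visible difference between D and A is that the dput() output for D shows "label" and "class" attributes on AGE, BMI and EventDays, while the columns of A are plain vectors. A minimal base-R sketch for comparing the two follows; column_attrs() is a hypothetical helper written for this illustration, and the two small data frames are stand-ins for D and A, not the real data. The is.vector() check at the end is just one base-R test that tells the two kinds of column apart; whether melt's failure actually goes through that check is an assumption, not something stated in this thread.

```r
# Hypothetical helper (not part of reshape2): attribute names per column.
column_attrs <- function(df) {
  lapply(df, function(x) names(attributes(x)))
}

# Stand-in for D: a numeric column carrying "label" and "class" attributes.
failing <- data.frame(id = c("A", "B"), stringsAsFactors = FALSE)
failing$age <- structure(c(68L, 63L), label = "Age", class = "labelled")

# Stand-in for A: plain columns with no extra attributes.
working <- data.frame(id = c("A", "B"), age = c(50, NA),
                      stringsAsFactors = FALSE)

column_attrs(failing)$age   # the names "label" and "class"
column_attrs(working)$age   # NULL

# is.vector() is FALSE for a vector with attributes other than names.
is.vector(failing$age)      # FALSE
is.vector(working$age)      # TRUE
```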


I'm sure I'm missing something really obvious (kind of like how I can stare at the dry goods aisle for 10 minutes and still not find the chocolate chips).  If anyone could help me understand why this error is occurring, I'd greatly appreciate it.  

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] lazyWeave_2.2.3  Hmisc_3.10-1     survival_2.36-14 plyr_1.7.1       reshape2_1.2.2  

loaded via a namespace (and not attached):
[1] cluster_1.14.3  grid_2.15.2     lattice_0.20-10 stringr_0.6.1   tools_2.15.2


  Benjamin Nutter |  Biostatistician     |  Quantitative Health Sciences
  Cleveland Clinic    |  9500 Euclid Ave.  |  Cleveland, OH 44195  | (216) 445-1365




______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: melt error that I don't understand.

Ista Zahn
Hi Benjamin,

This looks like a bug, whereby melt fails when numeric id.vars have
attributes. Consider:

D <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
                    AGE = structure(c(68L, 63L, 55L, 64L, 60L, 78L, 60L, 62L, 60L, 75L),
                                    label = "Age", class = "labelled"),
                    BMI = structure(c(25L, 27L, 27L, 28L, 32L, NA, 36L, 27L, 31L, 25L),
                                    label = "BMI (kg/m2)", class = "labelled"),
                    EventDays = structure(c(722L, 738L, 707L, 751L, 735L, 728L, 731L, 717L, 728L, 735L),
                                          label = "Time to first ACM/censor (days)", class = "labelled"),
                    InterventionDays = c(NA, NA, 575, NA, NA, NA, 490, 643, NA, NA)),
               .Names = c("ID", "AGE", "BMI", "EventDays", "InterventionDays"),
               row.names = c(NA, 10L),
               class = "data.frame")

melt(D, c("ID", "AGE", "BMI")) ## does not work

D <- as.data.frame(lapply(D, as.vector)) ## strip attributes
melt(D, c("ID", "AGE", "BMI")) ## works

attr(D$ID, "label") <- "ID number"  ## add attribute to factor
melt(D, c("ID", "AGE", "BMI")) ## works

attr(D$AGE, "label") <- "Age" ## add attribute to numeric variable
melt(D, c("ID", "AGE", "BMI")) ## does not work
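
If the labels are still needed later, a variation on the as.vector() step above is to strip attributes only from the columns used as id variables and keep their labels on the side. This is a minimal base-R sketch under that assumption; strip_id_attrs() and D2 are hypothetical names made up for the illustration, and the final melt() call is left as a comment because it needs reshape2 attached.

```r
# Hypothetical helper: drop all attributes (label, class, ...) from the
# chosen columns and return their labels separately so they are not lost.
strip_id_attrs <- function(df, cols) {
  labels <- lapply(df[cols], attr, which = "label")
  df[cols] <- lapply(df[cols], as.vector)
  list(data = df, labels = labels)
}

# Small stand-in for D with one labelled id column and two measure columns.
D2 <- data.frame(ID = c("A", "B", "C"), stringsAsFactors = FALSE)
D2$AGE <- structure(c(68L, 63L, 55L), label = "Age", class = "labelled")
D2$EventDays <- structure(c(722L, 738L, 707L),
                          label = "Time to first ACM/censor (days)",
                          class = "labelled")
D2$InterventionDays <- c(NA, NA, 575)

res <- strip_id_attrs(D2, "AGE")
attributes(res$data$AGE)   # NULL: AGE is a plain integer vector again
res$labels$AGE             # "Age"

# With reshape2 attached, the call from the thread would then be:
# melt(res$data, c("ID", "AGE"))
```

The measure columns are left untouched here, since the original post's workaround melted a labelled EventDays column without trouble.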


I've reported the bug at https://github.com/hadley/reshape/issues/36

Best,
Ista

On Fri, Sep 6, 2013 at 11:59 AM, Nutter, Benjamin <[hidden email]> wrote:

> I'm stumped.  I have a dataset I want to melt to create a temporal sequence of events for each subject, but in each row, I would like to retain the baseline characteristics.
