Reshaping data from wide to tall format for multilevel modeling

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Reshaping data from wide to tall format for multilevel modeling

dadrivr
Hi,

I'm trying to reshape my data set from wide to tall format for multilevel modeling.  Unfortunately, the function I typically use (make.univ from the multilevel package) does not appear to work with unbalanced data frames, which is what I'm dealing with.

Below is an example of the columns of a data frame similar to what I'm working with:
ID  a1  a2  a4  b2  b3  b4  b5  b6

Below is what I want the columns to be after reshaping the data to long format:
ID  a  b time

Here is an example data frame that I want to reshape:
ID <- c(1,2,3)
a1 <- c(NA, rnorm(2))
a2 <- c(NA, rnorm(1), NA)
a4 <- c(NA, rnorm(2))
b2 <- c(rnorm(2), NA)
b3 <- rnorm(3)
b4 <- NA
b5 <- rnorm(3)
b6 <- rnorm(3)
mydata <- as.data.frame(cbind(ID,a1,a2,a4,b2,b3,b4,b5,b6))

What is the best way to do this efficiently with MANY variables with widely differing time ranges?  Note that I will have to manually enter the time for a given measurement because in the wide format, the time is in the variable names.  By the way, I have a fairly large data set, with some variables occurring at 2 time points and other variables occurring at 20 time points.  Thanks for your help!
Reply | Threaded
Open this post in threaded view
|

Re: Reshaping data from wide to tall format for multilevel modeling

Jim Lemon
On 09/08/2011 12:02 AM, dadrivr wrote:

> Hi,
>
> I'm trying to reshape my data set from wide to tall format for multilevel
> modeling.  Unfortunately, the function I typically use (make.univ from the
> multilevel package) does not appear to work with unbalanced data frames,
> which is what I'm dealing with.
>
> Below is an example of the columns of a data frame similar to what I'm
> working with:
> ID  a1  a2  a4  b2  b3  b4  b5  b6
>
> Below is what I want the columns to be after reshaping the data to long
> format:
> ID  a  b time
>
> Here is an example data frame that I want to reshape:
> ID<- c(1,2,3)
> a1<- c(NA, rnorm(2))
> a2<- c(NA, rnorm(1), NA)
> a4<- c(NA, rnorm(2))
> b2<- c(rnorm(2), NA)
> b3<- rnorm(3)
> b4<- NA
> b5<- rnorm(3)
> b6<- rnorm(3)
> mydata<- as.data.frame(cbind(ID,a1,a2,a4,b2,b3,b4,b5,b6))
>
> What is the best way to do this efficiently with MANY variables with widely
> differing time ranges?  Note that I will have to manually enter the time for
> a given measurement because in the wide format, the time is in the variable
> names.  By the way, I have a fairly large data set, with some variables
> occurring at 2 time points and other variables occurring at 20 time points.
> Thanks for your help!
>
Hi dadrivr,
I think you can do what you want using the rep_n_stack function in the
prettyR package. If you want a data frame at the end, you will have to
pad out your input data frame so that the lengths of the columns will be
equal. You'll get lots of NAs, but without them, you won't get a data frame.

mydata$a3<-NA
mydata$a5<-NA
mydata$a6<-NA
mydata$b1<-NA
mydata

Now you have equal numbers of "a" and "b" columns. To reshape this into
three columns is easy:

rep_n_stack(mydata,to.stack=c("a1","a2","a3","a4","a5","a6",
  "b1","b2","b3","b4","b5","b6"),stack.names=c("ab","time"))

If you want the "a" and "b" columns separate, try this:

rep_n_stack(mydata,to.stack=matrix(c(2,3,10,4,11,12,13,5,6,7,8,9),nrow=2,
  byrow=TRUE),stack.names=c("a","time","b","time"))

Currently you have to pass the column indices directly to get the
correct order in the output. I hadn't anticipated the missing column
problem when I wrote the function.

Jim

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.