Dear list,
I'm trying to figure out how to use the reshape package to reshape data from a "wide" format to a "long" format. I have data like this:

pid <- c(1:10)
predA <- c(-1,-2,-1,-2,-1,-2,-1,-2,-1,-2)
predB.1 <- c(0,0,0,1,1,0,0,0,1,1)
predB.2 <- c(2,2,3,3,3,2,2,3,3,3)
predC.1 <- c(10,10,10,10,10,11,11,11,11,11)
predC.2 <- c(12,12,13,13,13,12,12,13,13,13)
out.1 <- c(100:109)
out.2 <- c(200:209)
Data <- data.frame(pid, predA, predB.1, predB.2, predC.1, predC.2, out.1, out.2)

and I want to make it look like this:

head(L.Data <- reshape(Data, varying = list(3:4, 5:6, 7:8), idvar="pid",
     v.names=c("PredA", "PredB", "Out"), timevar="measure.num",
     times=c(1,2), direction="long"))
    pid predA measure.num PredA PredB Out
1.1   1    -1           1     0    10 100
2.1   2    -2           1     0    10 101
3.1   3    -1           1     0    10 102
4.1   4    -2           1     1    10 103
5.1   5    -1           1     1    10 104
6.1   6    -2           1     0    11 105

Using Hadley's JSS article "Reshaping Data with the reshape Package" as a guide, I tried the following:

M.Data <- melt(Data, id="pid")
M.Data2 <- cbind(M.Data, colsplit(M.Data$variable, split = ".", names = c("treatment", "time")))

but this gave a warning and resulted in

head(M.Data2)
  pid variable value treatment time NA. NA..1 NA..2 NA..3 NA..4
1   1    predA    -1        NA   NA  NA    NA    NA    NA    NA
2   2    predA    -2        NA   NA  NA    NA    NA    NA    NA
3   3    predA    -1        NA   NA  NA    NA    NA    NA    NA
4   4    predA    -2        NA   NA  NA    NA    NA    NA    NA
5   5    predA    -1        NA   NA  NA    NA    NA    NA    NA
6   6    predA    -2        NA   NA  NA    NA    NA    NA    NA

I searched the mailing list and found this post:
http://tolstoy.newcastle.edu.au/R/e4/help/08/05/11857.html
which led me to try

M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names = c("treatment", "time")))

which gave:

head(M.Data2)
  pid variable value treatment  time
1   1    predA    -1     predA predA
2   2    predA    -2     predA predA
3   3    predA    -1     predA predA
4   4    predA    -2     predA predA
5   5    predA    -1     predA predA
6   6    predA    -2     predA predA

Closer but no cigar.
I would be grateful if someone could tell me (a) how to reshape the data as described above using the reshape package, (b) what the difference between split = "." and split = "\\." is, and (c) whether more information about the colsplit command is available anywhere.

Thank you very much in advance,
Ista

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
> M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names
> = c("treatment", "time")))
>
> which gave:
>
> head(M.Data2)
>   pid variable value treatment  time
> 1   1    predA    -1     predA predA
> 2   2    predA    -2     predA predA
> 3   3    predA    -1     predA predA
> 4   4    predA    -2     predA predA
> 5   5    predA    -1     predA predA
> 6   6    predA    -2     predA predA
>
> Closer but no cigar.

Have a look at the whole thing - it's getting it right most of the time. Going back to the original variable names, I see that "PredA" does not have a time associated with it. What do you expect the time to be?

> I would be grateful if someone will tell me (a) how to reshape the data as
> described above using the reshape package, (b) what difference between split
> = "." and split = "\\." is,

The splitting argument is a regular expression, and in regular expression speak "." means to match any one character. "\\." escapes the full stop, so it only matches full stops.

> and (c) if more information about the colsplit
> command is available anywhere.

Probably the best way is just to look at the code (it's pretty simple):

> colsplit.character
function (x, split = "", names)
{
    vars <- as.data.frame(do.call(rbind, strsplit(x, split)))
    names(vars) <- names
    as.data.frame(lapply(vars, function(x) type.convert(as.character(x))))
}

If strsplit doesn't do what you want, you might need to write your own function following those lines.

Hadley

--
http://had.co.nz/
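The two points in the reply above can be tried out directly. This is a minimal sketch that assumes only base R: it calls strsplit, which is what colsplit.character uses internally, rather than the reshape package itself. The helper split_pad is hypothetical (it is not part of reshape) and only illustrates one way of writing your own function along those lines.

```r
vars <- c("predA", "predB.1", "predB.2", "out.1")

# An unescaped "." is a regex wildcard, so every character is treated
# as a separator and only empty strings are left.
wild <- strsplit("predB.1", ".")[[1]]

# Escaping the full stop (or using fixed = TRUE) splits on a literal "."
esc   <- strsplit("predB.1", "\\.")[[1]]              # "predB" "1"
fixed <- strsplit("predB.1", ".", fixed = TRUE)[[1]]  # "predB" "1"

# do.call(rbind, ...) recycles the single piece of "predA" to fill both
# columns, which is why treatment and time both read "predA".
recycled <- do.call(rbind, strsplit(vars, "\\."))

# Hypothetical helper: pad short pieces with NA instead of recycling them.
split_pad <- function(x, split, names) {
  pieces <- strsplit(x, split)
  width  <- length(names)
  padded <- lapply(pieces, function(p) c(p, rep(NA, width))[seq_len(width)])
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- names
  out
}

padded <- split_pad(vars, "\\.", c("treatment", "time"))
print(recycled)
print(padded)
```

With padding, a variable such as predA that has no time component gets NA in the time column rather than a repeated name, which makes it easier to spot that it should probably be treated as an id variable.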
Thanks Hadley, with your help I'm getting things figured out.
On Jun 13, 2008, at 2:09 PM, hadley wickham wrote:

>> M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names
>> = c("treatment", "time")))
>>
>> which gave:
>>
>> head(M.Data2)
>>   pid variable value treatment  time
>> 1   1    predA    -1     predA predA
>> 2   2    predA    -2     predA predA
>> 3   3    predA    -1     predA predA
>> 4   4    predA    -2     predA predA
>> 5   5    predA    -1     predA predA
>> 6   6    predA    -2     predA predA
>>
>> Closer but no cigar.
>
> Have a look at the whole thing - it's getting it right most of the
> time. Going back to the original variable names, I see that "PredA"
> does not have a time associated with it. What do you expect the time
> to be?

Right, there is no time associated with this variable. So I tried again, treating it as an id:

M.Data <- melt(Data, id = c("pid", "predA"))

From here I was able to achieve the desired result, as follows:

M.Data <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names=c("measure", "time")))
M.Data$variable <- M.Data$measure
M.Data <- M.Data[-5]
L.Data <- cast(M.Data, ... ~ variable)

This is perhaps a bit inelegant but it works! I'm interested in knowing if there is a better way to do it, but I'm happy that I've at least figured out this much. As always I'm humbled by the generosity of people who not only make their software available but also take the time to answer questions on this list.

Thank you!
-Ista
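On the question of whether there is a more direct route, one possible alternative is sketched below. It uses base R's reshape() rather than the reshape package, and lets reshape() work out the measure names and time values from the column names via sep = ".". It assumes that every time-varying column follows the name.time pattern and that columns without a "." (here pid and predA) are constant within each pid. The Data frame is rebuilt from the original post so the block runs on its own.

```r
# Data from the original post
Data <- data.frame(
  pid     = 1:10,
  predA   = c(-1, -2, -1, -2, -1, -2, -1, -2, -1, -2),
  predB.1 = c(0, 0, 0, 1, 1, 0, 0, 0, 1, 1),
  predB.2 = c(2, 2, 3, 3, 3, 2, 2, 3, 3, 3),
  predC.1 = c(10, 10, 10, 10, 10, 11, 11, 11, 11, 11),
  predC.2 = c(12, 12, 13, 13, 13, 12, 12, 13, 13, 13),
  out.1   = 100:109,
  out.2   = 200:209
)

# Columns whose names contain a literal full stop are the time-varying ones
varying <- grep("\\.", names(Data), value = TRUE)

# With varying given as a plain character vector and no v.names,
# reshape() splits each name on sep to get the measure and the time
L.Data <- reshape(Data, direction = "long", varying = varying, sep = ".",
                  idvar = "pid", timevar = "measure.num")

head(L.Data)
```

Here the long columns keep the names of the original measures (predB, predC, out), so nothing has to be renamed after the reshape.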
> Right, there is no time associated with this variable. So I tried again,
> treating it as an id:
>
> M.Data <- melt(Data, id = c("pid", "predA"))
>
> From here I was able to achieve the desired result, as follows:
>
> M.Data <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",
> names=c("measure", "time")))
> M.Data$variable <- M.Data$measure
> M.Data <- M.Data[-5]
> L.Data <- cast(M.Data, ... ~ variable)
>
> This is perhaps a bit inelegant but it works! I'm interested in knowing if
> there is a better way to do it, but I'm happy that I've at least figured out
> this much. As always I'm humbled by the generosity of people who not only
> make their software available but also take the time to answer questions on
> this list. Thank you!

You're welcome. And don't worry too much about data cleaning routines being elegant - it's very very hard to write elegant code to clean up something that's not at all elegant.

Hadley

--
http://had.co.nz/