help with colsplit (reshape)

help with colsplit (reshape)

Ista Zahn
Dear list,

I'm trying to figure out how to use the reshape package to reshape  
data from a "wide" format to a "long" format. I have data like this

pid <- c(1:10)
predA <- c(-1,-2,-1,-2,-1,-2,-1,-2,-1,-2)
predB.1 <- c(0,0,0,1,1,0,0,0,1,1)
predB.2 <- c(2,2,3,3,3,2,2,3,3,3)
predC.1 <- c(10,10,10,10,10,11,11,11,11,11)
predC.2 <- c(12,12,13,13,13,12,12,13,13,13)
out.1 <- c(100:109)
out.2 <- c(200:209)
Data <- data.frame(pid, predA, predB.1, predB.2, predC.1, predC.2,
                   out.1, out.2)

and I want to make it look like this:

head(L.Data <- reshape(Data, varying = list(3:4, 5:6, 7:8),  
idvar="pid", v.names=c("PredA", "PredB", "Out"),  
timevar="measure.num", times=c(1,2), direction="long"))
     pid predA measure.num PredA PredB Out
1.1   1    -1           1     0    10 100
2.1   2    -2           1     0    10 101
3.1   3    -1           1     0    10 102
4.1   4    -2           1     1    10 103
5.1   5    -1           1     1    10 104
6.1   6    -2           1     0    11 105

Using Hadley's JSS article "Reshaping Data with the reshape Package"  
as a guide, I tried the following:

M.Data <- melt(Data, id="pid")
M.Data2 <- cbind(M.Data, colsplit(M.Data$variable, split = ".", names  
= c("treatment", "time")))

but this gave a warning and resulted in

head(M.Data2)
   pid variable value treatment time NA. NA..1 NA..2 NA..3 NA..4
1   1    predA    -1        NA   NA  NA    NA    NA    NA    NA
2   2    predA    -2        NA   NA  NA    NA    NA    NA    NA
3   3    predA    -1        NA   NA  NA    NA    NA    NA    NA
4   4    predA    -2        NA   NA  NA    NA    NA    NA    NA
5   5    predA    -1        NA   NA  NA    NA    NA    NA    NA
6   6    predA    -2        NA   NA  NA    NA    NA    NA    NA

I searched the mailing list and found this post: http://tolstoy.newcastle.edu.au/R/e4/help/08/05/11857.html 
  which led me to try

M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",  
names = c("treatment", "time")))

which gave:

head(M.Data2)
   pid variable value treatment  time
1   1    predA    -1     predA predA
2   2    predA    -2     predA predA
3   3    predA    -1     predA predA
4   4    predA    -2     predA predA
5   5    predA    -1     predA predA
6   6    predA    -2     predA predA

Closer but no cigar.

I would be grateful if someone could tell me (a) how to reshape the  
data as described above using the reshape package, (b) what the  
difference between split = "." and split = "\\." is, and (c) whether  
more information about the colsplit command is available anywhere.

Thank you very much in advance,
Ista

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: help with colsplit (reshape)

hadley wickham
> M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names
> = c("treatment", "time")))
>
> which gave:
>
> head(M.Data2)
>  pid variable value treatment  time
> 1   1    predA    -1     predA predA
> 2   2    predA    -2     predA predA
> 3   3    predA    -1     predA predA
> 4   4    predA    -2     predA predA
> 5   5    predA    -1     predA predA
> 6   6    predA    -2     predA predA
>
> Closer but no cigar.

Have a look at the whole thing - it's getting it right most of the
time.  Going back to the original variable names, I see that "PredA"
does not have a time associated with it.  What do you expect the time
to be?

> I would be grateful if someone will tell me (a) how to reshape the data as
> described above using the reshape package, (b) what difference between split
> = "." and split = "\\." is,

The splitting argument is a regular expression, and in regular
expression speak "." means to match any one character.  "\\." escapes
the full stop, so it only matches full stops.
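
As a small illustration of this point, the base R function strsplit (which colsplit calls internally, per the source shown in this thread) can be run on two of the variable names. This is only a sketch: the vector name `vars` is made up for the example, and `fixed = TRUE` is an alternative way to get a literal match that was not mentioned in the original reply.

```r
vars <- c("predB.1", "out.2")   # hypothetical sample of the column names

# Unescaped "." is a regex wildcard: every character is treated as a
# separator, so only empty strings are left over.
any_char <- strsplit(vars, ".")

# "\\." (a backslash-escaped dot) matches a literal full stop only.
escaped <- strsplit(vars, "\\.")

# fixed = TRUE turns off regex interpretation, so "." is literal here too.
literal <- strsplit(vars, ".", fixed = TRUE)

print(any_char[[1]])  # nothing but empty strings
print(escaped[[1]])   # "predB" "1"
print(literal[[1]])   # "predB" "1"
```

The empty strings from the unescaped version would be consistent with the all-NA columns in the first attempt.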

> and (c) if more information about the colsplit
> command is available anywhere.

Probably the best way is just to look at the code (it's pretty simple):

> colsplit.character
function (x, split = "", names)
{
    vars <- as.data.frame(do.call(rbind, strsplit(x, split)))
    names(vars) <- names
    as.data.frame(lapply(vars, function(x) type.convert(as.character(x))))
}

If strsplit doesn't do what you want, you might need to write your own
function following those lines.
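
One way to follow that suggestion is sketched below in base R. The first part shows why a name without a full stop, such as "predA", ends up repeated in both columns: rbind recycles the shorter piece to fill the row. The helper `split_measure_time` is a hypothetical name invented for this sketch (it is not part of the reshape package); it fills a missing time with NA instead of recycling.

```r
vars <- c("predA", "predB.1", "out.2")   # sample variable names

# What colsplit does internally: split, then rbind the pieces.
parts <- strsplit(vars, "\\.")
recycled <- do.call(rbind, parts)
print(recycled)   # row 1 holds "predA" twice because rbind recycles it

# Hypothetical helper: like colsplit, but a missing time becomes NA.
split_measure_time <- function(x) {
  pieces <- strsplit(as.character(x), ".", fixed = TRUE)
  measure <- vapply(pieces, function(p) p[1], character(1))
  time <- vapply(
    pieces,
    function(p) if (length(p) >= 2) as.integer(p[2]) else NA_integer_,
    integer(1)
  )
  data.frame(measure = measure, time = time, stringsAsFactors = FALSE)
}

result <- split_measure_time(vars)
print(result)
```

Whether NA is the right value for a variable with no time is a judgment call; treating such a variable as an id when melting avoids the question altogether.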

Hadley

--
http://had.co.nz/


Re: help with colsplit (reshape)

Ista Zahn
Thanks Hadley, with your help I'm getting things figured out.
On Jun 13, 2008, at 2:09 PM, hadley wickham wrote:

>> M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",
>> names = c("treatment", "time")))
>>
>> which gave:
>>
>> head(M.Data2)
>> pid variable value treatment  time
>> 1   1    predA    -1     predA predA
>> 2   2    predA    -2     predA predA
>> 3   3    predA    -1     predA predA
>> 4   4    predA    -2     predA predA
>> 5   5    predA    -1     predA predA
>> 6   6    predA    -2     predA predA
>>
>> Closer but no cigar.
>
> Have a look at the whole thing - it's getting it right most of the
> time.  Going back to the original variable names, I see that "PredA"
> does not have a time associated with it.  What do you expect the time
> to be?
Right, there is no time associated with this variable. So I tried  
again, treating it as an id:

M.Data <- melt(Data, id = c("pid", "predA"))

From here I was able to achieve the desired result, as follows:

M.Data <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",  
names=c("measure", "time")))
M.Data$variable <- M.Data$measure
M.Data <- M.Data[-5]
L.Data <- cast(M.Data, ... ~ variable)

This is perhaps a bit inelegant but it works! I'm interested in  
knowing if there is a better way to do it, but I'm happy that I've at  
least figured out this much. As always I'm humbled by the generosity  
of people who not only make their software available but also take the  
time to answer questions on this list. Thank you!
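
For comparison, a possible alternative that avoids the melt/colsplit/cast round trip is base R's reshape(), which can guess the measure names and times from column names of the form name.time. This is a sketch under the assumption that every time-varying column follows that pattern. It is not the reshape-package approach discussed in this thread, and the helper variable `varying_cols` and the time column name "time" are chosen only for illustration.

```r
# Same example data as in the original post.
Data <- data.frame(
  pid     = 1:10,
  predA   = c(-1, -2, -1, -2, -1, -2, -1, -2, -1, -2),
  predB.1 = c(0, 0, 0, 1, 1, 0, 0, 0, 1, 1),
  predB.2 = c(2, 2, 3, 3, 3, 2, 2, 3, 3, 3),
  predC.1 = c(10, 10, 10, 10, 10, 11, 11, 11, 11, 11),
  predC.2 = c(12, 12, 13, 13, 13, 12, 12, 13, 13, 13),
  out.1   = 100:109,
  out.2   = 200:209
)

# Everything except the id-like columns varies with time.
varying_cols <- setdiff(names(Data), c("pid", "predA"))

# With sep = ".", reshape() splits each name into a measure and a time.
L.Data <- reshape(Data, direction = "long", varying = varying_cols,
                  sep = ".", idvar = "pid", timevar = "time")

head(L.Data)
```

Here predA is simply carried along as a constant column, which matches the decision above to treat it as an id.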

-Ista

>
>
>> I would be grateful if someone will tell me (a) how to reshape the  
>> data as
>> described above using the reshape package, (b) what difference  
>> between split
>> = "." and split = "\\." is,
>
> The splitting argument is a regular expression, and in regular
> expression speak "." means to match any one character.  "\\." escapes
> the full stop, so it only matches full stops.
>
>> and (c) if more information about the colsplit
>> command is available anywhere.
>
> Probably the best way is just to look at the code (it's pretty  
> simple):
>
>> colsplit.character
> function (x, split = "", names)
> {
>   vars <- as.data.frame(do.call(rbind, strsplit(x, split)))
>   names(vars) <- names
>   as.data.frame(lapply(vars, function(x) type.convert(as.character(x))))
> }
>
> If strsplit doesn't do what you want, you might need to write your own
> function following those lines.
>
> Hadley
>
> --
> http://had.co.nz/


Re: help with colsplit (reshape)

hadley wickham
> Right, there is no time associated with this variable. So I tried again,
> treating it as an id:
>
> M.Data <- melt(Data, id = c("pid", "predA"))
>
> From here I was able to achieve the desired result, as follows:
>
> M.Data <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",
> names=c("measure", "time")))
> M.Data$variable <- M.Data$measure
> M.Data <- M.Data[-5]
> L.Data <- cast(M.Data, ... ~ variable)
>
> This is perhaps a bit inelegant but it works! I'm interested in knowing if
> there is a better way to do it, but I'm happy that I've at least figured out
> this much. As always I'm humbled by the generosity of people who not only
> make their software available but also take the time to answer questions on
> this list. Thank you!

You're welcome.  And don't worry too much about data cleaning routines
being elegant - it's very very hard to write elegant code to clean up
something that's not at all elegant.

Hadley


--
http://had.co.nz/
