help for reshape function

help for reshape function

xin wei
Hi everyone,

I have a question about the reshape() function. I have the following dataset:
gene   tissue        patient1 patient2 patient3.............
_________________________________________________
gene1   breast         10       20       50
gene2   breast         20       40       60
gene3   breast         100      200      300

which I hope to convert to the following format:

gene patientID value  gene1
-----------------------------
gene1   1       10    10
gene1   2       20    20
gene1   3       50    100
gene2   1       20    10
gene2   2       40    20
gene2   3       60    100

The column "gene" is required and the column "tissue" is not needed. I use the following syntax to perform this task:

tdata <- reshape(data, varying = names(data)[-c(1, 2)], direction = "long",
                 timevar = "label", v.names = "value",
                 time = names(data)[-c(1, 2)])

However, I lose the column "gene" in the resulting transposed dataset. I did my best to go through the help doc for reshape, but I find the examples in it hard to follow. Can anyone help me modify the code to keep the column "gene" in the resulting table?

Any constructive suggestions are welcome.
Thanks


Re: help for reshape function

Henrique Dallazuanna
Try this:

reshape(x, direction = 'long', varying = list(3:5), timevar = "gene",
        v.names = "value")

On Thu, Jun 17, 2010 at 3:22 PM, xin wei <[hidden email]> wrote:

> [...]
> --
> View this message in context:
> http://r.789695.n4.nabble.com/help-for-reshape-function-tp2259286p2259286.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: help for reshape function

xin wei
I am afraid that your solution does not solve the problem. It seems that timevar="gene" just creates the following:

    GENE SAMPLE   value id
1.1    1 Kidney 3.69351  1
2.1    1 Kidney 5.42710  2
3.1    1 Kidney 5.26883  3
4.1    1 Kidney 2.88098  4
5.1    1 Kidney 4.68519  5
6.1    1 Kidney 5.92774  6

Here "gene" is just an empty column. I also lost the column that is supposed to store the header names of the transposed variables in my target table.

Any more suggestions?

Thanks
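
For anyone following along with the sample data from the first post, here is a minimal sketch of why the gene labels seem to disappear. It assumes a small data frame x built from that sample (the name x and its contents are illustrative, not the poster's real data). reshape() writes the time index into the column named by timevar, so timevar = "gene" replaces the gene labels with that index.

```r
# Sample data from the first post (three genes, three patients).
x <- data.frame(gene = c("gene1", "gene2", "gene3"),
                tissue = "breast",
                patient1 = c(10, 20, 100),
                patient2 = c(20, 40, 200),
                patient3 = c(50, 60, 300))

# timevar = "gene" asks reshape() to store the time index in a column
# called "gene", which overwrites the gene labels that were already there.
bad <- reshape(x, direction = "long", varying = list(3:5),
               timevar = "gene", v.names = "value")
unique(bad$gene)

# Giving the time index its own column name, and naming "gene" as the
# id variable, leaves the gene labels alone.
ok <- reshape(x, direction = "long", varying = list(3:5),
              timevar = "patientID", v.names = "value", idvar = "gene")
unique(ok$gene)
```

Comparing the two unique() calls shows the difference: the first holds only the time index, the second still holds the gene names.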

Re: help for reshape function

Joshua Wiley-2
Hello,

Try this; it is based on your sample wide-format data.  I am not
quite sure how you got the 'gene1' column in your desired output data.
It looks like it is just the data from patient1, but since I was not
sure, I did not include it.

##################################
# Sample data in wide format: one row per gene, one column per patient
temp <- data.frame(gene = c("gene1", "gene2", "gene3"),
                   tissue = c("breast", "breast", "breast"),
                   patient1 = c(10, 20, 100),
                   patient2 = c(20, 40, 200),
                   patient3 = c(50, 60, 300))

# Wide to long: one row per gene/patient pair
temp.long <- reshape(data = temp, direction = "long",
                     idvar = "gene", ids = temp$gene,
                     timevar = "patientID", times = c(1, 2, 3),
                     v.names = "value",
                     varying = list(patients = c("patient1", "patient2", "patient3")))

# Sort by gene and reset the row names to the default 1..n
temp.long <- temp.long[order(temp.long$gene), ]
rownames(temp.long) <- NULL

temp.long
##################################

Just as a note, the argument names in reshape() are designed for the
repeated measures to be over time, so they may seem a bit odd in your
example.

The id variable (what identifies multiple records from the same gene)
is "gene".  So, the argument idvar="gene" names the id variable in the
long format.  Because the wide data already has a 'gene' column, its
values are carried over as they are; the ids argument only supplies
values when idvar names a column that does not exist yet, so it can be
left out here.
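
As a rough illustration of how idvar and ids relate, the sketch below runs reshape() two ways on the sample data. The names cols, mat, long1 and long2 are made up for this example. When idvar names an existing column, that column is kept; ids only comes into play when idvar names a column that is not in the data yet.

```r
temp <- data.frame(gene = c("gene1", "gene2", "gene3"),
                   patient1 = c(10, 20, 100),
                   patient2 = c(20, 40, 200),
                   patient3 = c(50, 60, 300))
cols <- c("patient1", "patient2", "patient3")

# Case 1: "gene" already exists as a column, so idvar = "gene" is enough.
long1 <- reshape(temp, direction = "long", idvar = "gene",
                 timevar = "patientID", v.names = "value",
                 varying = list(cols))

# Case 2: the identifiers live in the row names rather than in a column,
# so idvar names a new column and ids provides its values.
mat <- temp[, cols]
rownames(mat) <- as.character(temp$gene)
long2 <- reshape(mat, direction = "long", idvar = "gene",
                 ids = rownames(mat),
                 timevar = "patientID", v.names = "value",
                 varying = list(cols))
```

Both calls should produce the same gene, patientID and value columns; only the column order differs, because in the second case the gene column is appended at the end.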

Next, the patients' numbers distinguish multiple records for the same
gene.  So, the variable's name is timevar="patientID" and the
actual values are times=c(1, 2, 3).  Obviously if you have more
patients, you would include a number for each one.  If their numbers
run 1, 2, 3, ... in column order, you could leave times out (it
defaults to that sequence), or just use 1:n where n is the last
patient's number.
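
If the patient numbers are not simply 1, 2, 3, one option is to pull them out of the column names. This is only a sketch: the columns patient1, patient4 and patient7 are hypothetical, and it assumes every measurement column is named "patient" followed by a number.

```r
# Hypothetical wide data with non-consecutive patient numbers.
temp <- data.frame(gene = c("gene1", "gene2", "gene3"),
                   patient1 = c(10, 20, 100),
                   patient4 = c(20, 40, 200),
                   patient7 = c(50, 60, 300))

# Find the measurement columns and strip the "patient" prefix
# to recover the numeric patient IDs.
cols <- grep("^patient", names(temp), value = TRUE)
patient_ids <- as.integer(sub("^patient", "", cols))

long <- reshape(temp, direction = "long", idvar = "gene",
                timevar = "patientID", times = patient_ids,
                v.names = "value", varying = list(cols))

# Sort by gene, then by patient, for easier reading.
long <- long[order(long$gene, long$patientID), ]
rownames(long) <- NULL
long
```

Deriving times from the names keeps the IDs and the columns in step even when more patients are added later.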

Finally we can specify what variables are time-varying (or repeated
for each gene in your case).  v.names="value" is the name for the new
variable, and varying=list(patients=c("patient1","patient2","patient3"))
specifies the names of the columns from the wide format data.  Note
that since it is a list, if you had multiple variables that varied,
you just create additional elements.
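
To make the last point concrete, here is a small sketch with two measured variables per patient. The expr and pval columns are invented for illustration; the only point is that varying gets one list element per variable and v.names gets one name per element, in the same order.

```r
# Hypothetical wide data: two measurements (expr, pval) for each of two patients.
wide <- data.frame(gene = c("gene1", "gene2"),
                   expr1 = c(10, 20), expr2 = c(20, 40),
                   pval1 = c(0.01, 0.20), pval2 = c(0.03, 0.50))

# One list element per measured variable; each element lists that
# variable's columns in patient order.
long <- reshape(wide, direction = "long", idvar = "gene",
                timevar = "patientID", times = 1:2,
                v.names = c("expr", "pval"),
                varying = list(c("expr1", "expr2"),
                               c("pval1", "pval2")))
long
```

Each row of the result then carries both measurements for one gene/patient pair.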

Now you have the long format data.  Then I reorder it by gene rather
than patientID and reset the row names to their default.

HTH,

Josh


On Thu, Jun 17, 2010 at 9:47 PM, xin wei <[hidden email]> wrote:

> [...]
> --
> View this message in context: http://r.789695.n4.nabble.com/help-for-reshape-function-tp2259286p2259706.html
> Sent from the R help mailing list archive at Nabble.com.



--
Joshua Wiley
Ph.D. Student
Health Psychology
University of California, Los Angeles


Re: help for reshape function

Andrie de Vries
Xin Wei

I have sympathy with your difficulties in understanding the reshape() function.

May I recommend using the melt() and cast() functions instead, available in the reshape package.  You can find information, help and examples here:

http://had.co.nz/reshape/

This simplifies the coding of your problem dramatically:

---

library(reshape)
df <- data.frame(gene=c("gene1", "gene2", "gene3"),
                 patient1=c(10,20,100),
                 patient2=c(20,40,200),
                 patient3=c(50,60,300))
                 
melt(df, id.vars="gene")

---
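
If the reshape package is not installed, the same long layout can be put together in base R. The sketch below is an assumption about what is wanted (columns gene, variable and value, as melt() returns, plus a numeric patientID derived from the column names); it is not how melt() is implemented.

```r
df <- data.frame(gene = c("gene1", "gene2", "gene3"),
                 patient1 = c(10, 20, 100),
                 patient2 = c(20, 40, 200),
                 patient3 = c(50, 60, 300))

# Every column except the id column is treated as a measurement.
cols <- setdiff(names(df), "gene")

# Stack the measurement columns one after another, repeating the gene
# labels once per column and the column name once per gene.
long <- data.frame(gene = rep(as.character(df$gene), times = length(cols)),
                   variable = rep(cols, each = nrow(df)),
                   value = unlist(df[cols], use.names = FALSE))

# Optional: turn "patient1", "patient2", ... into a numeric patient ID.
long$patientID <- as.integer(sub("^patient", "", as.character(long$variable)))
long
```

This avoids the extra dependency, at the cost of spelling out the stacking by hand.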

Andrie