
Pairwise correlation


muzz56
Dear All,
I am not familiar with R, yet I want to use it to perform a task, hence my posting here. I hope someone can help.
I have a data set with genes in rows and samples in columns. I want to compute the Pearson correlation for every possible pairwise combination of the genes (2000 of them). Does anyone have an idea of how to do this in R?

Thanks in advance.

Re: Pairwise correlation

plangfelder
On Wed, Nov 16, 2011 at 8:37 AM, muzz56 <[hidden email]> wrote:
> Dear All,
> I am not familiar with R yet I want to use it to perform some task, hence my
> posting here. I hope someone can help.
> I have a set of data, genes (rows) and samples (columns). I want to do a
> Pearson correlation on all the possible pairwise combinations of all the
> genes (2000). Does anyone have an idea of how to execute this in R?

Put the expression data in a matrix called expression

Then simply execute

correlations = cor(t(expression))

If you have missing data, use

correlations = cor(t(expression), use = 'p')  # 'p' is matched to "pairwise.complete.obs"
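
As a minimal sketch of the two calls above, assuming a small made-up matrix in place of the real 2000-gene data (the gene and sample names are invented for illustration):

```r
# Toy stand-in for the real data: 4 genes (rows) x 6 samples (columns).
set.seed(1)
expression <- matrix(rnorm(24), nrow = 4,
                     dimnames = list(paste0("gene", 1:4),
                                     paste0("sample", 1:6)))

# cor() correlates columns, so transpose to put the genes in columns.
correlations <- cor(t(expression))
dim(correlations)        # one row and one column per gene
rownames(correlations)   # gene names are carried through

# With a missing value, the default returns NA for every pair involving
# that gene; use = "pairwise.complete.obs" drops the missing sample only
# for the affected pairs.
expression["gene2", "sample3"] <- NA
cor(t(expression))["gene1", "gene2"]
cor(t(expression), use = "pairwise.complete.obs")["gene1", "gene2"]
```

The result is a symmetric genes-by-genes matrix, so each pair appears twice and the diagonal holds each gene's correlation with itself.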

HTH

Peter

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Pairwise correlation

Michael Weylandt
In reply to this post by muzz56
?cor

X = matrix(rnorm(400),ncol = 4)
cor(X)

Michael


Re: Pairwise correlation

muzz56
In reply to this post by plangfelder
Thanks Peter. I tried this after reading in the CSV (read.csv) and converting the data to a matrix (as.matrix). But when I tried the correlation, I kept getting the error ('x' must be numeric), yet when I view the data, it's numeric.


Re: Pairwise correlation

Nordlund, Dan (DSHS/RDA)
muzz56 wrote:
> Thanks Peter. I tried this after reading in the csv (read.csv) and
> converted the data to matrix (as.matrix). But when I tried the
> correlation, I kept getting the error (x must be numeric), yet when
> I view the data, it's numeric.
>

What does R tell you if you execute the following?

str(x)

Just because the data looks like it is numeric when it prints doesn't mean it is.
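
A minimal sketch of how this can happen, assuming the CSV has the gene names in its first column (the data frame below is a made-up stand-in for what read.csv() might return):

```r
# Hypothetical data frame: gene names in the first column, numbers after.
x <- data.frame(Gene   = c("Fth1", "B2m", "Tmsb4x"),
                Array1 = c(26016.01, 7573.64, 6192.44),
                Array2 = c(23134.66, 7768.52, 4277.22),
                stringsAsFactors = FALSE)

str(x)                 # Gene is chr, the Array columns are num

m <- as.matrix(x)      # one character column makes the whole matrix character
is.numeric(m)          # FALSE
mode(m)                # "character"

# cor(t(m)) would stop with "'x' must be numeric"; keeping only the
# numeric columns avoids that.
m_num <- as.matrix(x[, -1])
is.numeric(m_num)      # TRUE
```

A character matrix prints its numbers in quotes, which is easy to miss in a large table, so str() or is.numeric() is the more reliable check.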


Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204



Re: Pairwise correlation

muzz56
Thanks to everyone who replied to my post, I finally got it to work. I am not sure how well it worked, though, since it ran so quickly, but it seems I have a 2000 x 2000 data set. My follow-up questions: how do I get only the pairs with, say, a certain Pearson correlation value? Also, my output didn't retain the headers but replaced them with numbers, making it hard to know which gene pairs correlate.


Re: Pairwise correlation

Michael Weylandt
On Wed, Nov 16, 2011 at 11:22 PM, muzz56 <[hidden email]> wrote:
> Thanks to everyone who replied to my post, I finally got it to work. I am
> however not sure how well it worked since it run so quickly, but seems like
> I have a 2000 x 2000 data set.

Behold the great and mighty power that is R! Don't worry -- on a
decent machine the correlation of a 2k x 2k data set should be pretty
fast. (It's about 9 seconds on my old-ish laptop with a bunch of other
junk running)

>  My followup questions would be, how do I get
> only pairs with say a certain pearson correlation value additionally it
> seems like my output didn't retain the headers but instead replaced them
> with numbers making it hard to know which gene pairs correlate.

This is a little worrisome: R carries column names through cor() so
this would suggest you weren't using them. Were your headers listed as
part of your data (instead of being names)? If so, they would have
been taken as numbers.

Take a look at dimnames(NAMEOFDATA) -- if your headers aren't there,
then they are being treated as data instead of names. If they are,
can you provide some reproducible code so we can debug more fully?
The easiest way to send data is to use the dput() function to get a
copy-pasteable plain text representation. It would also be great if
you could restrict it to a subset of your data rather than the full 4M
data points, but if that's hard to do, don't worry.
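
A minimal sketch of both checks, assuming a tiny made-up matrix in place of the real data (the names geneA, s1 and so on are invented):

```r
# Hypothetical stand-in for the poster's matrix.
NAMEOFDATA <- matrix(c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5), nrow = 2,
                     dimnames = list(c("geneA", "geneB"),
                                     c("s1", "s2", "s3")))

dimnames(NAMEOFDATA)          # a list: row names first, then column names

# dput() on a small subset prints R code that rebuilds the object.
small <- NAMEOFDATA[1:2, 1:2]
dput(small)

# The printed text can be parsed back into an identical object, which is
# what a reader does when pasting it into their own session.
rebuilt <- eval(parse(text = capture.output(dput(small))))
identical(rebuilt, small)
```

Subsetting before dput() keeps the pasted text short while preserving the structure (types and dimnames) that matters for debugging.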

You should see behavior like

X <- matrix(1:9,3)
colnames(X) <- c("A","B","C")
cor(X) # Prints with labels

Michael


Re: Pairwise correlation

Michael Weylandt
I think something like this should do it, but I can't test without data:

rownames(mydata) <- mydata[,1] # Put the elements in the first column as rownames
mydata <- mydata[,-1] # drop the things that are now rownames
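
A minimal sketch of those two lines on a made-up data frame (the values are illustrative only). It also shows a likely reason the names turned into numbers: data.matrix() converts a character column to a factor and keeps only the numeric codes.

```r
# Made-up data frame with gene names in the first column.
mydata <- data.frame(Gene   = c("Fth1", "B2m", "Tmsb4x"),
                     Array1 = c(26016.01, 7573.64, 6192.44),
                     Array2 = c(23134.66, 7768.52, 4277.22),
                     Array3 = c(17445.71, 6608.24, 5024.59),
                     stringsAsFactors = FALSE)

# Without the fix, data.matrix() recodes the names as factor codes.
coded <- data.matrix(mydata)[, "Gene"]
is.numeric(coded)

# With the fix, the names become row names and only numbers remain.
rownames(mydata) <- mydata[, 1]
mydata <- mydata[, -1]
expression <- data.matrix(mydata)

correlations <- cor(t(expression))
dimnames(correlations)   # gene names on both rows and columns
```

Because the names live in the dimnames rather than in the data, t() and cor() carry them along without any extra bookkeeping.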

Michael

On Thu, Nov 17, 2011 at 9:23 AM, Musa Hassan <[hidden email]> wrote:

> Hi Michael,
> Thanks for the response. I have noticed that the error occurred during my
> data read. It appears that the rownames (which when the data is transposed
> become my colnames) were converted to numbers instead of strings as they
> should be. The original header names don't change, just the rownames. I have
> to figure out how to import the data and have the strings not converted.
> Right now I am using:
> mydata = read.csv("mydata.csv", header=T, stringsAsFactors=F)
>
> then to convert the data frame to matrix
> mydata=data.matrix(mydata)
>
> Then I just do the correlation as Peter suggested.
>
> expression=cor(t(expression))
>
> Thanks.

Re: Pairwise correlation

muzz56
Hi Michael,
Here is a sample of the data.

Gene Array1 Array2 Array3 Array4 Array5 Array6 Array7 Array8 Array9 Array10 Array11
Fth1 26016.01 23134.66 17445.71 39856.04 27245.45 23622.98 37887.75 49857.46 25864.73 21852.51 29198.4
B2m 7573.64 7768.52 6608.24 8571.65 6380.78 6242.76 6903.92 7330.63 7256.18 5678.21 10937.05
Tmsb4x 6192.44 4277.22 5024.59 4851.51 3062.55 4562.43 7948.1 5018.58 3200.17 2855.77 6139.23
H2-D1 3141.41 3986.06 3328.62 4726.6 3589.89 2885.95 7509.88 5257.62 4742.26 3431.33 5300.72
Prdx5 3935.7 3938.9 3401.68 4193.14 4028.95 3438.19 6640.15 5486.61 4424.57 3368.83 5265.92

I want to retain the gene names in the data. What you've proposed will take them out, and I'll have to append them back to the results after the cor() call.
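
A minimal sketch using the first four arrays of the sample above, under the assumption that the file is whitespace-separated as pasted. Storing the gene names as row names at read time (row.names = 1) keeps them attached through cor(), so nothing needs to be appended afterwards.

```r
# First four arrays of the sample, read from inline text.
txt <- "Gene Array1 Array2 Array3 Array4
Fth1 26016.01 23134.66 17445.71 39856.04
B2m 7573.64 7768.52 6608.24 8571.65
Tmsb4x 6192.44 4277.22 5024.59 4851.51
H2-D1 3141.41 3986.06 3328.62 4726.6
Prdx5 3935.7 3938.9 3401.68 4193.14"

# row.names = 1 stores the first column as row names instead of data.
mydata <- read.table(text = txt, header = TRUE, row.names = 1)
expression <- as.matrix(mydata)

correlations <- cor(t(expression))
rownames(correlations)          # gene names, no re-attaching needed
correlations["Fth1", "B2m"]     # look up a pair by name
```

For a comma-separated file, read.csv() accepts the same row.names = 1 argument.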


Re: Pairwise correlation

Michael Weylandt
I can't tell how the data is stored from that, and the email servers garble
it. Use dput() to create a plain text representation and paste that
back in.

Thanks,
Michael


Re: Pairwise correlation

Michael Weylandt
Here's a function Josh Wiley provided in another thread:

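spec.cor <- function(dat, r, ...) {
    # Correlate the columns of dat; extra arguments (e.g. use = "p") go to cor()
    x <- cor(dat, ...)
    # Blank the upper triangle and diagonal so each pair is reported once
    x[upper.tri(x, TRUE)] <- NA
    # Row/column indices of the pairs with |correlation| >= r
    i <- which(abs(x) >= r, arr.ind = TRUE)
    # One row per pair: the two column names and their correlation
    data.frame(matrix(colnames(x)[as.vector(i)], ncol = 2), value = x[i])
}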
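
A minimal sketch of how the function might be used on data laid out as in this thread (genes in rows, samples in columns). The gene names, sample names, and values below are made up for illustration, and the 0.8 cutoff is just the example value mentioned in the question. Reading with row.names = 1 is one way to keep the gene names as row names so they carry through cor().

```r
# Function quoted from the thread (attributed there to Josh Wiley).
spec.cor <- function(dat, r, ...) {
    x <- cor(dat, ...)
    x[upper.tri(x, TRUE)] <- NA
    i <- which(abs(x) >= r, arr.ind = TRUE)
    data.frame(matrix(colnames(x)[as.vector(i)], ncol = 2), value = x[i])
}

# Hypothetical toy data: genes in rows, samples in columns.
csv_text <- "Gene,S1,S2,S3,S4,S5,S6
GeneA,1,2,3,4,5,6
GeneB,2,4,6,8,10,13
GeneC,5,1,4,2,6,3"

# row.names = 1 turns the first column into row names instead of data,
# so the matrix is purely numeric and the gene labels are kept.
expression <- as.matrix(read.csv(text = csv_text, row.names = 1))

# cor() works on columns, so transpose to put the genes in columns.
hits <- spec.cor(t(expression), r = 0.8)
print(hits)
```

With this toy input only the GeneB/GeneA pair passes the cutoff, since GeneB is almost a multiple of GeneA while GeneC is close to uncorrelated with both.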

Michael

On Thu, Nov 17, 2011 at 4:08 PM, Musa Hassan <[hidden email]> wrote:

> Hi Michael,
> I was able to solve this. I just used the WGCNA library which allows for
> stringsAsFactors to be defined in the work space making everything stored as
> strings remain strings. My problem now is parsing through the results to
> pull out only significant correlations defined by a certain Pearson
> correlation value say 0.8.
>
> On 17 November 2011 15:32, R. Michael Weylandt <[hidden email]>
> wrote:
>>
>> I can't see how it's stored like that and the email servers garble it
>> up. Use dput() to create a plain text representation and paste that
>> back in.
>>
>> Thanks,
>> Michael
>>
>> On Thu, Nov 17, 2011 at 9:37 AM, muzz56 <[hidden email]> wrote:
>> > Hi Michael,
>> > Here is a sample of the data.
>> >
>> > Gene      Array1    Array2    Array3    Array4    Array5    Array6    Array7    Array8    Array9   Array10   Array11
>> > Fth1    26016.01  23134.66  17445.71  39856.04  27245.45  23622.98  37887.75  49857.46  25864.73  21852.51   29198.4
>> > B2m      7573.64   7768.52   6608.24   8571.65   6380.78   6242.76   6903.92   7330.63   7256.18   5678.21  10937.05
>> > Tmsb4x   6192.44   4277.22   5024.59   4851.51   3062.55   4562.43    7948.1   5018.58   3200.17   2855.77   6139.23
>> > H2-D1    3141.41   3986.06   3328.62    4726.6   3589.89   2885.95   7509.88   5257.62   4742.26   3431.33   5300.72
>> > Prdx5     3935.7    3938.9   3401.68   4193.14   4028.95   3438.19   6640.15   5486.61   4424.57   3368.83   5265.92
>> >
>> > I want to retain the gene names in the data. What you've proposed will take
>> > them out and I'll have to append them back to the results after the cor().
>> >
>> > On 17 November 2011 09:33, Michael Weylandt [via R] <
>> > [hidden email]> wrote:
>> >
>> >> I think something like this should do it, but I can't test without
>> >> data:
>> >>
>> >> # Put the elements in the first column as rownames
>> >> rownames(mydata) <- mydata[,1]
>> >> mydata <- mydata[,-1] # drop the things that are now rownames
>> >>
>> >> Michael
>> >>
>> >> On Thu, Nov 17, 2011 at 9:23 AM, Musa Hassan <[hidden email]> wrote:
>> >>
>> >> > Hi Michael,
>> >> > Thanks for the response. I have noticed that the error occurred during my
>> >> > data read. It appears that the rownames (which when the data is transposed
>> >> > become my colnames) were converted to numbers instead of strings as they
>> >> > should be. The original header names don't change, just the rownames. I have
>> >> > to figure out how to import the data and have the strings not converted.
>> >> > Right now I am using:
>> >> > mydata = read.csv("mydata.csv", header=TRUE, stringsAsFactors=FALSE)
>> >> >
>> >> > then to convert the data frame to matrix
>> >> > mydata=data.matrix(mydata)
>> >> >
>> >> > Then I just do the correlation as Peter suggested.
>> >> >
>> >> > expression=cor(t(expression))
>> >> >
>> >> > Thanks.
>> >> >
>> >> > On 17 November 2011 08:51, R. Michael Weylandt <[hidden email]> wrote:
>> >> >>
>> >> >> On Wed, Nov 16, 2011 at 11:22 PM, muzz56 <[hidden email]> wrote:
>> >> >> > Thanks to everyone who replied to my post, I finally got it to work. I am
>> >> >> > however not sure how well it worked since it ran so quickly, but it seems
>> >> >> > like I have a 2000 x 2000 data set.
>> >> >>
>> >> >> Behold the great and mighty power that is R! Don't worry -- on a
>> >> >> decent machine the correlation of a 2k x 2k data set should be pretty
>> >> >> fast. (It's about 9 seconds on my old-ish laptop with a bunch of other
>> >> >> junk running)
>> >> >>
>> >> >> > My follow-up questions would be: how do I get only pairs with, say, a
>> >> >> > certain Pearson correlation value? Additionally, it seems like my output
>> >> >> > didn't retain the headers but instead replaced them with numbers, making
>> >> >> > it hard to know which gene pairs correlate.
>> >> >>
>> >> >> This is a little worrisome: R carries column names through cor() so
>> >> >> this would suggest you weren't using them. Were your headers listed as
>> >> >> part of your data (instead of being names)? If so, they would have
>> >> >> been taken as numbers.
>> >> >>
>> >> >> Take a look at dimnames(NAMEOFDATA) -- if your headers aren't there,
>> >> >> then they are being treated as data instead of names. If they are,
>> >> >> please provide some reproducible code so we can debug more fully.
>> >> >> The easiest way to send data is to use the dput() function to get a
>> >> >> copy-pasteable plain text representation. It would also be great if
>> >> >> you could restrict it to a subset of your data rather than the full 4M
>> >> >> data points, but if that's hard to do, don't worry.
>> >> >>
>> >> >> You should have expected behavior like
>> >> >>
>> >> >> X <- matrix(1:9,3)
>> >> >> colnames(X) <- c("A","B","C")
>> >> >> cor(X) # Prints with labels
>> >> >>
>> >> >> Michael
>> >> >>
>> >> >> >
>> >> >> > On 16 November 2011 17:11, Nordlund, Dan (DSHS/RDA) [via R] <[hidden email]> wrote:
>> >> >> >
>> >> >> >> > -----Original Message-----
>> >> >> >> > From: [hidden email] [mailto:r-help-bounces@r-project.org] On Behalf Of muzz56
>> >> >> >> > Sent: Wednesday, November 16, 2011 12:28 PM
>> >> >> >> > To: [hidden email]
>> >> >> >> > Subject: Re: [R] Pairwise correlation
>> >> >> >> >
>> >> >> >> > Thanks Peter. I tried this after reading in the csv (read.csv) and
>> >> >> >> > converted the data to a matrix (as.matrix). But when I tried the
>> >> >> >> > correlation, I kept getting the error (x must be numeric), yet when
>> >> >> >> > I view the data, it's numeric.
>> >> >> >> >
>> >> >> >>
>> >> >> >> What does R tell you if you execute the following?
>> >> >> >>
>> >> >> >> str(x)
>> >> >> >>
>> >> >> >> Just because the data looks like it is numeric when it prints doesn't
>> >> >> >> mean it is.
>> >> >> >>
>> >> >> >>
>> >> >> >> Dan
>> >> >> >>
>> >> >> >> Daniel J. Nordlund
>> >> >> >> Washington State Department of Social and Health Services
>> >> >> >> Planning, Performance, and Accountability
>> >> >> >> Research and Data Analysis Division
>> >> >> >> Olympia, WA 98504-5204