Principal component analysis PCA

Principal component analysis PCA

SNN
Hi,

I am trying to run PCA on a data set with dimensions 115 x 300,000. The columns represent the SNPs and the rows represent the individuals. This is what I did:

#load the data

 code<-read.table("code.txt", sep='\t', header=F, nrows=300000)

# do PCA #

pr<-prcomp(code, retx=T, center=T)

I am getting the following error message

"Error: cannot allocate vector of size 275.6 Mb"

I tried to increase the memory size :

"memory.size(4000)"

but it did not work. Is there a solution for this, or is there other software that can handle large data sets?

Thanks


Re: Principal component analysis PCA

Wang, Zhaoming (NIH/NCI) [C]
 
Try EIGENSTRAT http://www.nature.com/ng/journal/v38/n8/abs/ng1847.html 

or use a subset of SNPs.

Zhaoming
-----Original Message-----
From: SNN [mailto:[hidden email]]
Sent: Wednesday, February 13, 2008 9:14 PM
To: [hidden email]
Subject: [R] Principal component analysis PCA


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: Principal component analysis PCA

Thomas Lumley
On Wed, 13 Feb 2008, Wang, Zhaoming (NIH/NCI) [C] wrote:

>
> Try EIGENSTRAT http://www.nature.com/ng/journal/v38/n8/abs/ng1847.html

The same approach as EIGENSTRAT is pretty straightforward in R.

You need to create the covariance matrix of people (rather than of SNPs)
for the 0/1/2 genotype at each SNP and take the principal components of
that matrix.

In this case the number of individuals is small enough that you should be
able to create the covariance matrix directly by matrix operations.  In
larger data sets where the entire data matrix doesn't fit in memory, you
need some sort of double loop.

  -thomas
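
As a rough sketch of the approach described above (not part of the original reply), the following uses a small simulated genotype matrix in place of the real data. The names `geno`, `K` and `scores` and the matrix sizes are made up for illustration. It assumes each SNP (column) is centered, which matches prcomp(center = TRUE); cov(t(code)) would center each individual instead, so the two are not numerically identical.

```r
# Simulated stand-in for the real data: 20 individuals (rows) by 500 SNPs
# (columns), genotypes coded 0/1/2. Sizes and names are hypothetical.
set.seed(1)
n_ind <- 20
n_snp <- 500
geno <- matrix(rbinom(n_ind * n_snp, size = 2, prob = 0.3),
               nrow = n_ind, ncol = n_snp)

# Center each SNP (column), as prcomp(center = TRUE) would.
geno_c <- scale(geno, center = TRUE, scale = FALSE)

# People-by-people matrix: n_ind x n_ind, however many SNPs there are.
K <- tcrossprod(geno_c) / (n_ind - 1)

# Eigenvalues of K are the component variances; the eigenvectors, rescaled,
# are the individuals' coordinates on each component.
e <- eigen(K, symmetric = TRUE)
keep <- seq_len(n_ind - 1)   # centering removes one dimension
scores <- e$vectors[, keep] %*% diag(sqrt(e$values[keep] * (n_ind - 1)))

# Cross-check against prcomp on the full matrix (feasible at this toy size).
pr <- prcomp(geno, center = TRUE, scale. = FALSE)
```

At this size the columns of `scores` should agree with `pr$x` up to sign, and `e$values` with `pr$sdev^2`, while the matrix being decomposed stays n_ind x n_ind regardless of the number of SNPs.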



Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle


Re: Principal component analysis PCA

SNN
Thanks for the advice.

I tried to find the covariance of my matrix using R and it ran out of memory. I am not sure how to do a double loop to create the covariance matrix. Also, is doing prcomp(covariance matrix) the same as doing prcomp(original data matrix of SNPs)?

Thanks for your help,




Re: Principal component analysis PCA

Thomas Lumley
On Thu, 14 Feb 2008, SNN wrote:

>
> Thanks for the advice.
>
> I tried to find the cov of my matrix using R and it ran out of memory.

How did you do this? The covariance matrix is only 115x115, so it
shouldn't run out of memory.
   cov(t(code))
should work.

If that doesn't work, then
   tcrossprod(code)/300000 - tcrossprod(rowMeans(code))
might.
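
A small check of the two expressions above, plus one possible way to accumulate the same quantity in blocks of SNPs when the full matrix does not fit in memory. This is only a sketch on simulated data: the sizes, the block size and the names `shortcut`, `blockwise`, `xx` and `sx` are assumptions, and in practice each block would be read from the file rather than sliced from a matrix already in memory. It also assumes `code` is a numeric matrix with individuals in rows; read.table() returns a data frame, so an as.matrix() conversion would be needed first.

```r
# Toy data: 10 individuals (rows) x 1000 SNPs (columns), coded 0/1/2.
set.seed(2)
n_ind <- 10
n_snp <- 1000
code <- matrix(rbinom(n_ind * n_snp, size = 2, prob = 0.25), nrow = n_ind)
storage.mode(code) <- "double"

# The two suggestions differ only in the divisor (n_snp - 1 versus n_snp).
direct   <- cov(t(code))
shortcut <- tcrossprod(code) / n_snp - tcrossprod(rowMeans(code))

# Block-wise accumulation: only `block` SNP columns are touched at a time.
block <- 128
xx <- matrix(0, n_ind, n_ind)  # running sum of chunk %*% t(chunk)
sx <- numeric(n_ind)           # running row sums
for (start in seq(1, n_snp, by = block)) {
  cols  <- start:min(start + block - 1, n_snp)
  chunk <- code[, cols, drop = FALSE]
  xx <- xx + tcrossprod(chunk)
  sx <- sx + rowSums(chunk)
}
blockwise <- xx / n_snp - tcrossprod(sx / n_snp)
```

Here `blockwise` should equal `shortcut`, and `shortcut * n_snp / (n_snp - 1)` should equal `direct`.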

> I am
> not sure how to do a double loop to create the covariance matrix?  Also is
> doing prcomp(covariance matrix) the same as finding
> prcomp(original data, matrix of SNPs)?

That's the point of the paper behind the EIGENSTRAT software, which is
worth reading.  The eigenvalues are the same and the eigenvectors are
related.  One way around gives the left singular vectors of the data
matrix, the other gives the right singular vectors.


  -thomas
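
The relationship between the two decompositions can be illustrated on a tiny random matrix. This is a sketch, not taken from the EIGENSTRAT paper; `X`, `small`, `large` and `v_from_u` are hypothetical names, and column centering is assumed.

```r
# 6 "individuals" (rows) x 40 "SNPs" (columns); made-up numeric data.
set.seed(3)
X  <- matrix(rnorm(6 * 40), nrow = 6)
Xc <- scale(X, center = TRUE, scale = FALSE)   # center each column

s     <- svd(Xc)
small <- eigen(tcrossprod(Xc), symmetric = TRUE)  # 6 x 6, between people
large <- eigen(crossprod(Xc),  symmetric = TRUE)  # 40 x 40, between SNPs

# Centering 6 rows leaves at most 5 nonzero singular values.
k <- 5

# Both matrices share the nonzero eigenvalues s$d^2. The eigenvectors of
# `small` are the left singular vectors s$u, those of `large` are the right
# singular vectors s$v, each up to sign.
# The right vectors can be recovered from the left ones: v = t(Xc) u / d.
v_from_u <- crossprod(Xc, small$vectors[, 1:k]) %*%
  diag(1 / sqrt(small$values[1:k]))
```

So the small 6 x 6 problem is enough to recover the SNP loadings as well, without ever forming the 40 x 40 matrix.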
