# help, please! matrix operations inside 3 nested loops

9 messages

## help, please! matrix operations inside 3 nested loops

hello, this is my script:

    #1) read in data:
    daten<-read.table('K:/Analysen/STRUCTURE/input_STRUCTURE_tab_excl_5_282_559.txt', header=TRUE, sep="\t")
    daten<-as.matrix(daten)

    #2) create empty matrix:
    indxind<-matrix(nrow=617, ncol=617)
    indxind[1:20,1:19]

    #3) compare cells to each other, score:
    for (s in 3:34) {   #walks through the matrix column by column, starting at column 3
      for (z1 in 1:617) {  #for each current column, take one row (z1)...
        for (z2 in 1:617) {  #...and compare it to another row (z2) of the current column
          if (z1!=z2) {
            topf<-indxind[z1,z2]
            if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1   #actually, 2 rows make up 1 individual,
            if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1     #therefore i compare 2 rows
            if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1     #with another 2 rows
            if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
            indxind[z1,z2]<-topf
            indxind[z2,z1]<-topf
          }
          #print(c(s,z1,z2,indxind[1,2])) ##counts s, z1 and z2 properly, but gives NA for indxind[1,2]
        }
        #indxind[1:5,1:5] #empty matrix
      }
      #indxind[1:5,1:5] #empty matrix
    }

    #4) check:
    indxind[1:5,1:5]

This gives no errors, but my matrix indxind remains empty (only NAs), though all columns and rows are counted properly. R needs quite a while to get through all this (there are probably smarter and faster ways to calculate this, but I am not too deep into R and bioinformatics, and I need to calculate this only once). Could the 3 for-loops already be too computationally intense for adding matrix operations?

Any help would be much appreciated!

thx, frido

## Re: help, please! matrix operations inside 3 nested loops

On Wed, Aug 8, 2012 at 9:06 AM, Fridolin <[hidden email]> wrote:

> hello, this is my script:
> [...]
> any help would be much appreciated!
>
> thx, frido

Hi Frido,

I'm afraid I get a little lost in your code, but I'd be willing to bet we can cut the loops out entirely and speed things up. Can you give us a "big picture" description of the algorithm you're implementing as well as (if it's not too hard) a small reproducible example [1]?

Note also that most of us don't use Nabble, so you'll need to explicitly quote any relevant context.

Thanks,
Michael

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Odp: help, please! matrix operations inside 3 nested loops


## Re: help, please! matrix operations inside 3 nested loops

Fridolin wrote:

> hello, this is my script:
>
>     #1) read in data:
>     daten<-read.table('K:/Analysen/STRUCTURE/input_STRUCTURE_tab_excl_5_282_559.txt', header=TRUE, sep="\t")
>     daten<-as.matrix(daten)
>
>     #2) create empty matrix:
>     indxind<-matrix(nrow=617, ncol=617)
>     indxind[1:20,1:19]

You should at least initialize indxind to 0 with

    indxind<-matrix(0,nrow=617, ncol=617)

because the default for matrix is to use NA for data.

Berend
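A minimal sketch of why the original matrix stayed empty, using a hypothetical 3x3 matrix instead of the 617x617 one. With no data argument, `matrix()` fills every cell with `NA`, and `NA + 1` is still `NA`, so the running score `topf` never turns into a number and no error is raised.

```r
# Hypothetical small matrices, for illustration only
empty  <- matrix(nrow = 3, ncol = 3)     # no data argument: every cell is NA
zeroed <- matrix(0, nrow = 3, ncol = 3)  # explicit 0: a usable counter

topf_na <- empty[1, 2] + 1    # NA + 1 stays NA
topf_ok <- zeroed[1, 2] + 1   # 0 + 1 gives 1

print(all(is.na(empty)))
print(is.na(topf_na))
print(topf_ok)
```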

## Re: help, please! matrix operations inside 3 nested loops

Thank you for your help.

My input data looks like this (tab separated):

    Ind.nr.  Pop.nr.  scm266  rms1280  scm247  rms1107
    1        101      305     318      222     135
    1        101      305     318      231     135
    2        101      305     313      999     96
    2        101      305     321      999     130
    3        101      305     324      231     135
    3        101      305     324      231     135
    4        101      305     313      230     126
    4        101      305     313      230     135
    6        101      305     313      231     135
    6        101      305     321      231     135

It is a dataset with genetic marker alleles for single individuals. The first row is the header, all following rows are individuals. 2 rows count for 1 individual. The first column is the individual's number, the second column is the number of the population the individual comes from, and all following columns are different genetic markers.

What I want to do with this data in R is to compare one individual with each of the other individuals, allele-wise. There are five possibilities: the two compared individuals share 4, 3, 2, 1 or 0 alleles of the currently examined marker (= column). For each shared allele this pair of individuals shall get 1 scoring point. For each pair of individuals, all scoring points shall be summed over all markers.

My code again, modified according to your suggestions:

    #1) read in data:
    daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
    daten<-as.data.frame(daten)

    #2) create empty matrix:
    indxind<-matrix(0,nrow=617, ncol=617)
    indxind[1:20,1:19]

    #3) compare cells to each other, score:
    #for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
    for (s in 3:6) {   #walks through the matrix column by column, starting at column 3
      for (z1 in 1:6) {  #for each current column, take one row (z1)...
        for (z2 in 1:6) {  #...and compare it to another row (z2) of the current column
          if (z1!=z2) {
            topf<-indxind[z1,z2]
            if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1   #actually, 2 rows make up 1 individual,
            if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1     #therefore i compare 2 rows
            if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1     #with another 2 rows
            if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
            indxind[z1,z2]<-topf
            indxind[z2,z1]<-topf
          }
          #print(c(s,z1,z2,indxind[1,2])) ##counts s, z1 and z2 properly, but gives always 8 for indxind[1,2]
        }
        #indxind[1:5,1:5] #empty matrix
      }
      #indxind[1:5,1:5] #empty matrix
    }

    #4) check:
    indxind[1:5,1:5]

@ Michael Weylandt: I've done my best with regard to the "big picture" of my algorithm and the small reproducible example. I hope both are sufficient.

@ Petr Pikal-3: In this case there are only numerical values, but it's a useful hint for my other code.

@ Petr Pikal-3 and Berend Hasselman: Initializing indxind with 0's instead of NAs helps, it fills something into indxind now. But it does the calculation only for the first marker (column 3), afterwards I get an error (translated from the German R message):

    Error in if (daten[2 * z1 - 1, s] == daten[2 * z2 - 1, s]) topf <- topf +  :
      missing value where TRUE/FALSE needed

Has this something to do with the change to daten<-as.data.frame(daten) in line 3 (instead of as.matrix before)?
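The scoring described in this post can be sketched without the two inner loops. This is only an illustration under assumptions, not the method the thread settled on: it assumes every individual occupies exactly two consecutive rows and that markers start in column 3, and the names `n_ind`, `first`, `second` and `score` are made up here. `outer(x, y, "==")` builds the full table of pairwise comparisons in one call.

```r
# Sample data from the post (whitespace-separated here instead of tabs)
daten <- read.table(header = TRUE, text = "
Ind.nr. Pop.nr. scm266 rms1280 scm247 rms1107
1 101 305 318 222 135
1 101 305 318 231 135
2 101 305 313 999 96
2 101 305 321 999 130
3 101 305 324 231 135
3 101 305 324 231 135
4 101 305 313 230 126
4 101 305 313 230 135
6 101 305 313 231 135
6 101 305 321 231 135
")

n_ind  <- nrow(daten) / 2                # assumption: two rows per individual
first  <- seq(1, nrow(daten), by = 2)    # rows holding the first allele
second <- first + 1                      # rows holding the second allele

score <- matrix(0, nrow = n_ind, ncol = n_ind)
for (s in 3:ncol(daten)) {               # assumption: markers start at column 3
  a1 <- daten[first, s]
  a2 <- daten[second, s]
  # outer(x, y, "==")[i, j] is TRUE when x[i] equals y[j]
  score <- score +
    outer(a1, a1, "==") + outer(a1, a2, "==") +
    outer(a2, a1, "==") + outer(a2, a2, "==")
}
diag(score) <- 0                         # an individual is not compared with itself

print(score)
```

For the sample data, `score[1, 2]` is 4: individuals 1 and 2 match in all four comparisons at scm266 and in none at the other markers. The loop version above reports 8 for the same pair because it visits both (z1, z2) and (z2, z1) and writes to both cells each time, so every pair is counted twice.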

## Re: help, please! matrix operations inside 3 nested loops

SORRY!!!! It should be:

    for (s in 3:6) {   #walks through the matrix column by column, starting at column 3
      for (z1 in 1:5) {  #for each current column, take one row (z1)...
        for (z2 in 1:5) {  #...and compare it to another row (z2) of the current column

The error is gone now.... SORRY!!!
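A minimal sketch of where that error came from, assuming the ten-row sample (five individuals): with `z1` running up to 6, the code asks for rows 11 and 12. A data frame returns `NA` for a row past its last one, whereas a matrix stops with a "subscript out of bounds" error, so switching to `as.data.frame` changed the symptom rather than the cause. The toy `daten` below is hypothetical.

```r
# Hypothetical toy data: 10 rows, like the five-individual sample
daten <- data.frame(marker = rep(305, 10))

beyond <- daten[11, 1]        # row 11 does not exist: a data frame yields NA
print(beyond)

cmp <- daten[1, 1] == beyond  # comparing with NA gives NA, not TRUE/FALSE
print(cmp)

# if (NA) is what raises the "missing value where TRUE/FALSE needed" error
msg <- tryCatch(if (cmp) "match" else "no match",
                error = function(e) conditionMessage(e))
print(msg)

# The same index on a matrix fails earlier, with a different error
msg_matrix <- tryCatch(as.matrix(daten)[11, 1],
                       error = function(e) conditionMessage(e))
print(msg_matrix)
```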

## Re: help, please! matrix operations inside 3 nested loops

All problems solved. Thank you for your help!

For the sake of completeness, here is my solution:

    #1) read in data:
    daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
    daten<-as.data.frame(daten)

    #2) create empty matrix:
    indxind<-matrix(0,nrow=617, ncol=617)
    #indxind[1:20,1:19]

    #3) compare cells to each other, score:
    #for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
    z1<-1 #running variable for rows in daten
    z2<-1 #running variable for rows in daten
    l1<-1 #running variable for rows in indxind
    l2<-1 #running variable for columns in indxind
    for (s in 3:6) {   #walks through the matrix column by column, starting at column 3
      while (z1<11) {  #for each current column, take one row (z1)...
        while (z2<11) {  #...and compare it to another row (z2) of the current column
          if (z1!=z2) {
            l1
            topf<-indxind[l1,l2]
            if (daten[z1,s]==daten[z2,s]) topf<-topf+1       #actually, 2 rows make up 1 individual,
            if (daten[z1,s]==daten[z2+1,s]) topf<-topf+1     #therefore i compare 2 rows
            if (daten[z1+1,s]==daten[z2,s]) topf<-topf+1     #with another 2 rows
            if (daten[z1+1,s]==daten[z2+1,s]) topf<-topf+1
            indxind[l1,l2]<-topf
          }
          z2<-z2+2
          l2<-l2+1
        }
        z2<-1
        l2<-1
        z1<-z1+2
        l1<-l1+1
      }
      z1<-1
      l1<-1
    }

    #4) check:
    indxind[1:5,1:5]
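The same scoring can also be sketched with loop bounds taken from the data instead of hard-coded limits such as `11` or `617`, which avoids running past the last row. This is a sketch under assumptions, not a replacement for the solution above: `score_pairs` and its `first_marker` argument are hypothetical names, and the toy input is invented for illustration.

```r
# score_pairs() is a hypothetical helper name, not from the thread
score_pairs <- function(daten, first_marker = 3) {
  n_ind <- nrow(daten) %/% 2               # assumption: two rows per individual
  indxind <- matrix(0, nrow = n_ind, ncol = n_ind)
  for (s in first_marker:ncol(daten)) {
    for (i in seq_len(n_ind)) {
      for (j in seq_len(n_ind)) {
        if (i != j) {
          mine   <- daten[c(2 * i - 1, 2 * i), s]  # both alleles of individual i
          theirs <- daten[c(2 * j - 1, 2 * j), s]  # both alleles of individual j
          # count how many of the four allele-to-allele comparisons match
          indxind[i, j] <- indxind[i, j] + sum(outer(mine, theirs, "=="))
        }
      }
    }
  }
  indxind
}

# Hypothetical toy input: 3 individuals, one marker in column 3
toy <- data.frame(ind = c(1, 1, 2, 2, 3, 3),
                  pop = 101,
                  m1  = c(305, 305, 305, 313, 318, 318))
result <- score_pairs(toy)
print(result)
```

Each cell is written only once per ordered pair, so the matrix comes out symmetric without a mirrored assignment.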