|
hello, this is my script:
#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/input_STRUCTURE_tab_excl_5_282_559.txt', header=TRUE, sep="\t")
daten<-as.matrix(daten)

#2) create empty matrix:
indxind<-matrix(nrow=617, ncol=617)
indxind[1:20,1:19]

#3) compare cells to each other, score:
for (s in 3:34) {          #walks through the matrix column by column, starting at column 3
  for (z1 in 1:617) {      #for each current column, take one row (z1)...
    for (z2 in 1:617) {    #...and compare it to another row (z2) of the current column
      if (z1!=z2) {
        topf<-indxind[z1,z2]
        if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1   #actually, 2 rows make up 1 individual,
        if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1     #therefore i compare 2 rows
        if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1     #with another 2 rows
        if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
        indxind[z1,z2]<-topf
        indxind[z2,z1]<-topf
      }
      #print(c(s,z1,z2,indxind[1,2]))   ##counts s, z1 and z2 properly, but gives NA for indxind[1,2]
    }
    #indxind[1:5,1:5]   #empty matrix
  }
  #indxind[1:5,1:5]   #empty matrix
}

#4) check:
indxind[1:5,1:5]

this gives no errors, but my matrix indxind remains empty (only NAs), although all columns and rows are counted properly. R needs quite a while to get through all this (there are probably smarter and faster ways to calculate this, but i am not too deep into R and bioinformatics, and i need to calculate this only once). could the 3 for-loops already be too computationally intense for adding matrix operations?

any help would be much appreciated!

thx, frido
|
On Wed, Aug 8, 2012 at 9:06 AM, Fridolin <[hidden email]> wrote:
> hello, this is my script:
>
> #1) read in data:
> daten<-read.table('K:/Analysen/STRUCTURE/input_STRUCTURE_tab_excl_5_282_559.txt', header=TRUE, sep="\t")
> daten<-as.matrix(daten)
>
> #2) create empty matrix:
> indxind<-matrix(nrow=617, ncol=617)
> indxind[1:20,1:19]
>
> #3) compare cells to each other, score:
> for (s in 3:34) {          #walks through the matrix column by column, starting at column 3
>   for (z1 in 1:617) {      #for each current column, take one row (z1)...
>     for (z2 in 1:617) {    #...and compare it to another row (z2) of the current column
>       if (z1!=z2) {
>         topf<-indxind[z1,z2]
>         if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1   #actually, 2 rows make up 1 individual,
>         if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1     #therefore i compare 2 rows
>         if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1     #with another 2 rows
>         if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
>         indxind[z1,z2]<-topf
>         indxind[z2,z1]<-topf
>       }
>       #print(c(s,z1,z2,indxind[1,2]))   ##counts s, z1 and z2 properly, but gives NA for indxind[1,2]
>     }
>     #indxind[1:5,1:5]   #empty matrix
>   }
>   #indxind[1:5,1:5]   #empty matrix
> }
>
> #4) check:
> indxind[1:5,1:5]
>
> this gives no errors, but my matrix indxind remains empty (only NAs),
> although all columns and rows are counted properly. R needs quite a while to
> get through all this (there are probably smarter and faster ways to
> calculate this, but i am not too deep into R and bioinformatics, and i need
> to calculate this only once). could the 3 for-loops already be too
> computationally intense for adding matrix operations?
>
> any help would be much appreciated!
>
> thx, frido

Hi Frido,

I'm afraid I get a little lost in your code, but I'd be willing to bet we can cut the loops out entirely and speed things up. Can you give us a "big picture" description of the algorithm you're implementing as well as (if it's not too hard) a small reproducible example [1]?

Note also that most of us don't use Nabble, so you'll need to explicitly quote any relevant context.

Thanks,
Michael

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
|
In reply to this post by Fridolin
Hi
> hello, this is my script:
>
> #1) read in data:
> daten<-read.table('K:/Analysen/STRUCTURE/input_STRUCTURE_tab_excl_5_282_559.txt', header=TRUE, sep="\t")
> daten<-as.matrix(daten)

If any column contains non-numeric values, as.matrix will convert all numeric values from the daten data frame to character values.

> #2) create empty matrix:
> indxind<-matrix(nrow=617, ncol=617)
> indxind[1:20,1:19]
>
> #3) compare cells to each other, score:
> for (s in 3:34) {          #walks through the matrix column by column, starting at column 3
>   for (z1 in 1:617) {      #for each current column, take one row (z1)...
>     for (z2 in 1:617) {    #...and compare it to another row (z2) of the current column
>       if (z1!=z2) {
>         topf<-indxind[z1,z2]
>         if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1   #actually, 2 rows make up 1 individual,
>         if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1     #therefore i compare 2 rows
>         if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1     #with another 2 rows
>         if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
>         indxind[z1,z2]<-topf
>         indxind[z2,z1]<-topf
>       }

The above code is rather clumsy, and it is difficult to understand what it is supposed to do without extensive study. AFAIU you first set topf to NA and then try to add 1 to topf. The result is again NA, regardless of your sophisticated z construction. Therefore you are just computing NA in each cycle, so you cannot expect any result other than NA.

>       #print(c(s,z1,z2,indxind[1,2]))   ##counts s, z1 and z2 properly, but gives NA for indxind[1,2]
>     }
>     #indxind[1:5,1:5]   #empty matrix
>   }
>   #indxind[1:5,1:5]   #empty matrix
> }
>
> #4) check:
> indxind[1:5,1:5]
>
> this gives no errors, but my matrix indxind remains empty (only NAs),
> although all columns and rows are counted properly. R needs quite a while to
> get through all this (there are probably smarter and faster ways to
> calculate this, but i am not too deep into R and bioinformatics, and i need
> to calculate this only once). could the 3 for-loops already be too
> computationally intense for adding matrix operations?

What is this?

Please try to set up a small example showing what you have and what you want to achieve. Unless you can explain better what you want, you probably will not get a better answer. I may, however, be proven wrong, as some clever people on this list are far better at mind reading than I am :-)

Regards
Petr
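The two points Petr raises can be checked in isolation. This is only a minimal sketch: `demo` is a made-up data frame for illustration, not the poster's input file.

```r
# Arithmetic with NA yields NA, so a counter that starts as NA never becomes a number.
topf <- NA
topf <- topf + 1
is.na(topf)                          # TRUE

# as.matrix() on a data frame with a non-numeric column gives a character matrix.
# 'demo' is a hypothetical data frame, not the poster's file.
demo <- data.frame(id = c("a", "b"), allele = c(305, 318))
demo_mat <- as.matrix(demo)
typeof(demo_mat)                     # "character"

# With numeric columns only, the matrix stays numeric.
typeof(as.matrix(demo["allele"]))    # "double"
```

If the real file contains only numbers, as the poster later confirms, the second point does not apply to it.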
|
In reply to this post by Fridolin
You should at least initialize indxind to 0 with

indxind<-matrix(0,nrow=617, ncol=617)

because the default for matrix is to use NA for data.

Berend
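A small sketch of the difference, using a 3 x 3 matrix instead of the poster's 617 x 617 one (the names `empty` and `counts` are just for illustration):

```r
# matrix() without a 'data' argument is filled with NA.
empty <- matrix(nrow = 3, ncol = 3)
all(is.na(empty))            # TRUE

# Filling with 0 gives an accumulator that can actually be incremented.
counts <- matrix(0, nrow = 3, ncol = 3)
counts[1, 2] <- counts[1, 2] + 1
counts[1, 2]                 # 1
```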
|
thank you for your help.
my input data looks like this (tab separated):

Ind.nr.  Pop.nr.  scm266  rms1280  scm247  rms1107
1        101      305     318      222     135
1        101      305     318      231     135
2        101      305     313      999     96
2        101      305     321      999     130
3        101      305     324      231     135
3        101      305     324      231     135
4        101      305     313      230     126
4        101      305     313      230     135
6        101      305     313      231     135
6        101      305     321      231     135

it is a dataset with genetic marker alleles for single individuals. the first row is the header, all following rows are individuals. 2 rows count for 1 individual. the first column is the individual's number, the second column is the number of the population the individual comes from, and all following columns are different genetic markers.

what i want to do with this data in R is to compare one individual with each of the other individuals, allele-wise. there are five possibilities: the two compared individuals share 4, 3, 2, 1 or 0 alleles of the currently examined marker (= column). for each shared allele this pair of individuals shall get 1 scoring point. for each pair of individuals, all scoring points shall be summed over all markers.

my code again, modified according to your suggestions:

#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
daten<-as.data.frame(daten)

#2) create empty matrix:
indxind<-matrix(0,nrow=617, ncol=617)
indxind[1:20,1:19]

#3) compare cells to each other, score:
#for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
for (s in 3:6) {         #walks through the matrix column by column, starting at column 3
  for (z1 in 1:6) {      #for each current column, take one row (z1)...
    for (z2 in 1:6) {    #...and compare it to another row (z2) of the current column
      if (z1!=z2) {
        topf<-indxind[z1,z2]
        if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1   #actually, 2 rows make up 1 individual,
        if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1     #therefore i compare 2 rows
        if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1     #with another 2 rows
        if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
        indxind[z1,z2]<-topf
        indxind[z2,z1]<-topf
      }
      #print(c(s,z1,z2,indxind[1,2]))   ##counts s, z1 and z2 properly, but gives always 8 for indxind[1,2]
    }
    #indxind[1:5,1:5]   #empty matrix
  }
  #indxind[1:5,1:5]   #empty matrix
}

#4) check:
indxind[1:5,1:5]

@ Michael Weylandt: i've done my best with regard to the "big picture" of my algorithm and the small reproducible example. i hope both are sufficient.

@ Petr Pikal-3: in this case, there are only numerical values, but it's a useful hint for my other code.

@ Petr Pikal-3 and Berend Hasselman: initializing indxind with 0's instead of NAs helps, it fills something in indxind now. but it does the calculation only for the first marker (column 3), afterwards i get an error:

Fehler in if (daten[2 * z1 - 1, s] == daten[2 * z2 - 1, s]) topf <- topf + :
  Fehlender Wert, wo TRUE/FALSE nötig ist

in English:

Error in if (daten[2 * z1 - 1, s] == daten[2 * z2 - 1, s]) topf <- topf + :
  missing value where TRUE/FALSE needed

does this have something to do with changing to daten<-as.data.frame(daten) in line 3 (instead of as.matrix before)?
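One way this error can arise, sketched under an assumption: the test loops run `z1 in 1:6`, while the ten sample rows hold only five individuals, so `daten[2*6-1, s]` would ask for row 11. On a data frame, a row index past the end returns NA rather than stopping, and `if (NA == ...)` then fails with exactly this message (a matrix would stop earlier with "subscript out of bounds"). Whether this is what happened in the poster's run is not known. `small` below is a made-up data frame, not the poster's file.

```r
# 'small' is a hypothetical data frame with 2 individuals (4 rows).
small <- data.frame(scm266 = c(305L, 305L, 305L, 305L))

z1 <- 3                              # a third individual that does not exist
beyond <- small[2 * z1 - 1, 1]       # row 5 is past the end: NA, no error yet
is.na(beyond)                        # TRUE

# Using that NA as an if() condition raises the error quoted in the post.
outcome <- tryCatch(
  {
    if (beyond == small[1, 1]) "match" else "no match"
  },
  error = function(e) "error"
)
outcome                              # "error"

# Deriving the loop bound from the data avoids running past the last row.
n_ind <- nrow(small) / 2
n_ind                                # 2
```

Looping over `seq_len(n_ind)` instead of a hard-coded range keeps the indices inside the data, whatever the size of the file.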
|
SORRY!!!! it should be:
error is gone now.... SORRY!!! |
|
all problems solved. thank you for your help!
for the sake of completeness, here is my solution:

#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
daten<-as.data.frame(daten)

#2) create empty matrix:
indxind<-matrix(0,nrow=617, ncol=617)
#indxind[1:20,1:19]

#3) compare cells to each other, score:
#for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
z1<-1   #running variable for rows in daten
z2<-1   #running variable for rows in daten
l1<-1   #running variable for rows in indxind
l2<-1   #running variable for columns in indxind
for (s in 3:6) {         #walks through the matrix column by column, starting at column 3
  while (z1<11) {        #for each current column, take one row (z1)...
    while (z2<11) {      #...and compare it to another row (z2) of the current column
      if (z1!=z2) {
        topf<-indxind[l1,l2]
        if (daten[z1,s]==daten[z2,s]) topf<-topf+1       #actually, 2 rows make up 1 individual,
        if (daten[z1,s]==daten[z2+1,s]) topf<-topf+1     #therefore i compare 2 rows
        if (daten[z1+1,s]==daten[z2,s]) topf<-topf+1     #with another 2 rows
        if (daten[z1+1,s]==daten[z2+1,s]) topf<-topf+1
        indxind[l1,l2]<-topf
      }
      z2<-z2+2
      l2<-l2+1
    }
    z2<-1
    l2<-1
    z1<-z1+2
    l1<-l1+1
  }
  z1<-1
  l1<-1
}

#4) check:
indxind[1:5,1:5]
|
In reply to this post by Fridolin
Hi
> thank you for your help.
>
> my input data looks like this (tab separated):
>
> Ind.nr.  Pop.nr.  scm266  rms1280  scm247  rms1107
> 1        101      305     318      222     135
> 1        101      305     318      231     135
> 2        101      305     313      999     96
> 2        101      305     321      999     130
> 3        101      305     324      231     135
> 3        101      305     324      231     135
> 4        101      305     313      230     126
> 4        101      305     313      230     135
> 6        101      305     313      231     135
> 6        101      305     321      231     135

Better to use dput(your.data) for sharing data. Anyway, I am still confused, but you can probably clarify things further.

> it is a dataset with genetic marker alleles for single individuals.
> the first row is the header, all following rows are individuals. 2 rows
> count for 1 individual.
> the first column is the individual's number, the second column is the number of the
> population the individual comes from, and all following columns are different
> genetic markers.

In those 2 rows for one individual, the genetic marker sometimes differs:

  > test[1:2, "scm247"]
  [1] 222 231

What do you want to do with them?

> what i want to do with this data in R is to compare one individual with
> each of the other individuals, allele-wise. there are five possibilities:
> the two compared individuals share 4, 3, 2, 1 or 0 alleles of the currently
> examined marker (= column). for each shared allele this pair of individuals
> shall get 1 scoring point. for each pair of individuals, all scoring points
> shall be summed over all markers.

Based on your example,

  > dput(test)
  structure(list(Ind.nr. = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 6L, 6L),
      Pop.nr. = c(101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L),
      scm266 = c(305L, 305L, 305L, 305L, 305L, 305L, 305L, 305L, 305L, 305L),
      rms1280 = c(318L, 318L, 313L, 321L, 324L, 324L, 313L, 313L, 313L, 321L),
      scm247 = c(222L, 231L, 999L, 999L, 231L, 231L, 230L, 230L, 231L, 231L),
      rms1107 = c(135L, 135L, 96L, 130L, 135L, 135L, 126L, 135L, 135L, 135L)),
      .Names = c("Ind.nr.", "Pop.nr.", "scm266", "rms1280", "scm247", "rms1107"),
      class = "data.frame", row.names = c(NA, -10L))

what is your desired result?

Regards
Petr
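To make the scoring rule concrete for a single marker, here is a small sketch. It assumes the score is the number of matching allele-by-allele comparisons, which is what the four `if` statements in the poster's loop count. The values are those of individuals 1 and 3 at marker scm247 in the sample data.

```r
# Alleles of individuals 1 and 3 at marker scm247, copied from the sample data.
ind1 <- c(222, 231)
ind3 <- c(231, 231)

# outer() builds the 2 x 2 table of all allele-by-allele comparisons.
hits <- outer(ind1, ind3, "==")
sum(hits)          # 2
```

Here 222 matches neither allele of individual 3, while 231 matches both, so the pair gets 2 points for this marker.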
|
In reply to this post by Fridolin
Hi
> all problems solved. thank you for your help!
> for the sake of completeness, here is my solution:
> #1) read in data:
> daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
> daten<-as.data.frame(daten)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  not needed, daten is already a data frame

> #2) create empty matrix:
> indxind<-matrix(0,nrow=617, ncol=617)
> #indxind[1:20,1:19]
>
> #3) compare cells to each other, score:
> #for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
> z1<-1   #running variable for rows in daten
> z2<-1   #running variable for rows in daten
> l1<-1   #running variable for rows in indxind
> l2<-1   #running variable for columns in indxind
> for (s in 3:6) {         #walks through the matrix column by column, starting at column 3
>   while (z1<11) {        #for each current column, take one row (z1)...
>     while (z2<11) {      #...and compare it to another row (z2) of the current column
>       if (z1!=z2) {
>         topf<-indxind[l1,l2]
>         if (daten[z1,s]==daten[z2,s]) topf<-topf+1       #actually, 2 rows make up 1 individual,
>         if (daten[z1,s]==daten[z2+1,s]) topf<-topf+1     #therefore i compare 2 rows
>         if (daten[z1+1,s]==daten[z2,s]) topf<-topf+1     #with another 2 rows
>         if (daten[z1+1,s]==daten[z2+1,s]) topf<-topf+1
>         indxind[l1,l2]<-topf
>       }
>       z2<-z2+2
>       l2<-l2+1
>     }
>     z2<-1
>     l2<-1
>     z1<-z1+2
>     l1<-l1+1
>   }
>   z1<-1
>   l1<-1
> }
>
> #4) check:
> indxind[1:5,1:5]

I believe the above loops can be simplified, maybe by changing your daten to a three-dimensional array or with some clever **ply construction, but if your loops work it is probably not worth the effort.

Regards
Petr
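For readers who do want a shorter version, one possible simplification is sketched below. It is not the array or **ply approach Petr mentions, but a swapped-in alternative based on `outer()`. It keeps one loop over markers and replaces the two inner loops with four whole-matrix comparisons. `shared_alleles()` is a hypothetical helper name, and the sketch assumes exactly two consecutive rows per individual and no NA values. A value such as 999 is compared like any other allele, exactly as in the loops.

```r
# Sample data from the thread (rebuilt from the dput() output in Petr's earlier reply).
test <- data.frame(
  Ind.nr. = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 6L, 6L),
  Pop.nr. = rep(101L, 10),
  scm266  = rep(305L, 10),
  rms1280 = c(318L, 318L, 313L, 321L, 324L, 324L, 313L, 313L, 313L, 321L),
  scm247  = c(222L, 231L, 999L, 999L, 231L, 231L, 230L, 230L, 231L, 231L),
  rms1107 = c(135L, 135L, 96L, 130L, 135L, 135L, 126L, 135L, 135L, 135L)
)

# shared_alleles() is a hypothetical helper, not code from the thread.
shared_alleles <- function(daten, marker_cols) {
  n  <- nrow(daten) / 2                 # two rows per individual
  r1 <- seq(1, by = 2, length.out = n)  # first allele row of each individual
  r2 <- r1 + 1                          # second allele row of each individual
  score <- matrix(0, nrow = n, ncol = n)
  for (s in marker_cols) {
    x <- daten[, s]
    # Four comparisons per pair, as in the four if() statements of the loop version.
    score <- score +
      outer(x[r1], x[r1], "==") + outer(x[r1], x[r2], "==") +
      outer(x[r2], x[r1], "==") + outer(x[r2], x[r2], "==")
  }
  diag(score) <- 0                      # the loop version skips z1 == z2
  score
}

indxind <- shared_alleles(test, 3:6)
indxind[1, 2]   # 4
indxind[1, 3]   # 10
```

For the full dataset the call would presumably be `shared_alleles(daten, 3:34)`, which derives the 617 individuals from the row count instead of hard-coding the loop bounds.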
