help, please! matrix operations inside 3 nested loops

Fridolin
hello, this is my script:

#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/input_STRUCTURE_tab_excl_5_282_559.txt', header=TRUE, sep="\t")
daten<-as.matrix(daten)

#2) create empty matrix:
indxind<-matrix(nrow=617, ncol=617)
indxind[1:20,1:19]

#3) compare cells to each other, score:
for (s in 3:34) {   #walks through the matrix column by column, starting at column 3
  for (z1 in 1:617) {  #for each current column, take one row (z1)...
    for (z2 in 1:617) {  #...and compare it to another row (z2) of the current column
      if (z1!=z2) {topf<-indxind[z1,z2]
                   if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1   #actually, 2 rows make up 1 individual,
                   if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1      #therefore i compare 2 rows
                   if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1      #with another 2 rows
                   if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
                   indxind[z1,z2]<-topf
                   indxind[z2,z1]<-topf
                  }
      #print(c(s,z1,z2,indxind[1,2])) ##counts s, z1 and z2 properly, but gives NA for indxind[1,2]
      }
    #indxind[1:5,1:5] #empty matrix
  }
  #indxind[1:5,1:5] #empty matrix
  }

#4) check:
indxind[1:5,1:5]

this gives no errors, but my matrix indxind remains empty (only NAs), even though all columns and rows are counted properly. R needs quite a while to get through all this (there are probably smarter and faster ways to calculate this, but i am not too deep into R and bioinformatics, and i need to calculate this only once). could the 3 for-loops already be too computationally intensive for adding matrix operations?

any help would be much appreciated!

thx, frido

Re: help, please! matrix operations inside 3 nested loops

Michael Weylandt
On Wed, Aug 8, 2012 at 9:06 AM, Fridolin <[hidden email]> wrote:

> hello, this is my script:
> [...]

Hi Frido,

I'm afraid I get a little lost in your code, but I'd be willing to bet
we can cut the loops out entirely and speed things up.

Can you give us a "big picture" description of the algorithm you're
implementing as well as (if it's not too hard) a small reproducible
example [1]?

Note also that most of us don't use Nabble so you'll need to
explicitly quote any relevant context.

Thanks,
Michael

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: help, please! matrix operations inside 3 nested loops

PIKAL Petr
In reply to this post by Fridolin
Hi

> hello, this is my script:
>
> #1) read in data:
> daten<-read.table('K:/Analysen/STRUCTURE/input_STRUCTURE_tab_excl_5_282_559.txt', header=TRUE, sep="\t")
> daten<-as.matrix(daten)

If any column contains non-numeric values, as.matrix() will convert all
numeric values of the daten data frame to character values.
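
A minimal sketch of that coercion, using made-up toy data (the names df_num and df_mixed are hypothetical and not part of the original script):

```r
# Hypothetical toy data: one all-numeric frame, one with a character column.
df_num   <- data.frame(a = c(305L, 318L), b = c(222L, 135L))
df_mixed <- data.frame(a = c(305L, 318L), b = c("x", "y"),
                       stringsAsFactors = FALSE)

m_num   <- as.matrix(df_num)    # all columns numeric: the matrix stays numeric
m_mixed <- as.matrix(df_mixed)  # one character column: every cell becomes a string

is.numeric(m_num)      # TRUE
is.character(m_mixed)  # TRUE
```

Comparisons with == still work on a character matrix, but arithmetic on its cells does not, so checking is.numeric(daten) after the conversion is a cheap safeguard.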

>
> #2) create empty matrix:
> indxind<-matrix(nrow=617, ncol=617)
> indxind[1:20,1:19]
>
> #3) compare cells to each other, score:
> for (s in 3:34) {   #walks through the matrix column by column, starting at column 3
>   for (z1 in 1:617) {  #for each current column, take one row (z1)...
>     for (z2 in 1:617) {  #...and compare it to another row (z2) of the current column
>       if (z1!=z2) {topf<-indxind[z1,z2]
>                    if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1   #actually, 2 rows make up 1 individual,
>                    if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1      #therefore i compare 2 rows
>                    if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1      #with another 2 rows
>                    if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
>                    indxind[z1,z2]<-topf
>                    indxind[z2,z1]<-topf
>                   }

The above code is rather clumsy, and it is difficult to understand what it
is meant to do without extensive study. AFAIU you first set topf to NA
(indxind is filled with NA) and then try to add 1 to topf. The result is
again NA, regardless of your sophisticated z construction. Therefore you are
just computing NA in each cycle, so you cannot expect any result other than NA.
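
The NA propagation described here can be reproduced in a few lines (a small sketch on a 2 x 2 matrix, not the original 617 x 617 one):

```r
# matrix() without a data argument is filled with NA.
m <- matrix(nrow = 2, ncol = 2)

topf <- m[1, 2]    # NA
topf <- topf + 1   # arithmetic with NA yields NA again
is.na(topf)        # TRUE
```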


>       #print(c(s,z1,z2,indxind[1,2])) ##counts s, z1 and z2 properly, but gives NA for indxind[1,2]
>       }
>     #indxind[1:5,1:5] #empty matrix
>   }
>   #indxind[1:5,1:5] #empty matrix
>   }
>
> #4) check:
> indxind[1:5,1:5]
>
> this results no errors, but my matrix indxind remains empty (only NAs).
> though all columns and rows are counted properly. R needs quite a while to
> get through all this (there are probably smarter and faster ways to
> calculate this but i am not too deep into R and bioinformatics, and i need
> to calculate this only once). could the 3 for-loops already be too
> computationally intense for adding matrix operations?

What is this? Please try to set up a small example showing what you have and
what you want to achieve. Unless you can explain better what you want, you
probably will not get a better answer.

I may, however, be proven wrong, as some clever people on this list are far
better at mind reading than I am :-)

Regards
Petr

Re: help, please! matrix operations inside 3 nested loops

Berend Hasselman
In reply to this post by Fridolin
Fridolin wrote
> hello, this is my script:
>
> #1) read in data:
> daten<-read.table('K:/Analysen/STRUCTURE/input_STRUCTURE_tab_excl_5_282_559.txt', header=TRUE, sep="\t")
> daten<-as.matrix(daten)
>
> #2) create empty matrix:
> indxind<-matrix(nrow=617, ncol=617)
> indxind[1:20,1:19]

You should at least initialize indxind to 0 with

indxind<-matrix(0,nrow=617, ncol=617)

because the default for matrix is to use NA for data.
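
A short sketch of the difference on a small 3 x 3 matrix (the names empty and zeroed are only illustrative):

```r
empty  <- matrix(nrow = 3, ncol = 3)      # data defaults to NA
zeroed <- matrix(0, nrow = 3, ncol = 3)   # explicit 0 fill

all(is.na(empty))                         # TRUE

# With a 0 start value the cell can be used as a counter.
zeroed[1, 2] <- zeroed[1, 2] + 1
zeroed[1, 2]                              # 1
```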

Berend

Re: help, please! matrix operations inside 3 nested loops

Fridolin
thank you for your help.

my input data looks like this (tab separated):

Ind.nr. Pop.nr. scm266 rms1280 scm247 rms1107
1 101 305 318 222 135
1 101 305 318 231 135
2 101 305 313 999 96
2 101 305 321 999 130
3 101 305 324 231 135
3 101 305 324 231 135
4 101 305 313 230 126
4 101 305 313 230 135
6 101 305 313 231 135
6 101 305 321 231 135

it is a dataset with genetic marker alleles for single individuals.
the first row is the header, all following rows are individuals. 2 rows count for 1 individual.
the first column is the individual's number, the second column is the number of the population the individual comes from, and all following columns are different genetic markers.

what i want to do with this data in R is to compare each individual with each of the other individuals, allele-wise. there are five possibilities: the two compared individuals share 4, 3, 2, 1 or 0 alleles of the currently examined marker (=column). for each shared allele, this pair of individuals gets 1 scoring point. for each pair of individuals, all scoring points are summed over all markers.
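
The scoring rule for a single pair of individuals can be sketched as follows. This is only an illustration of the description above, not the code from the post; the helper names shared_alleles and pair_score are hypothetical, and the marker values are copied from the first three individuals in the example data.

```r
# Shared alleles at one marker: all 2 x 2 comparisons between the two
# allele values of individual a and the two of individual b (0 to 4 points).
shared_alleles <- function(a, b) sum(outer(a, b, "=="))

# Marker columns (scm266, rms1280, scm247, rms1107) of individuals 1, 2 and 3.
ind1 <- rbind(c(305, 318, 222, 135), c(305, 318, 231, 135))
ind2 <- rbind(c(305, 313, 999,  96), c(305, 321, 999, 130))
ind3 <- rbind(c(305, 324, 231, 135), c(305, 324, 231, 135))

# Total score of a pair: shared alleles summed over all marker columns.
pair_score <- function(x, y) {
  sum(sapply(seq_len(ncol(x)), function(s) shared_alleles(x[, s], y[, s])))
}

pair_score(ind1, ind2)  # 4
pair_score(ind1, ind3)  # 10
```

Individuals 1 and 2 only share the marker scm266 (four matching comparisons), while individuals 1 and 3 also match on scm247 (2 points) and rms1107 (4 points).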


my code again, modified according to your suggestions:

#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
daten<-as.data.frame(daten)

#2) create empty matrix:
indxind<-matrix(0,nrow=617, ncol=617)
indxind[1:20,1:19]

#3) compare cells to each other, score:
#for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
for (s in 3:6) {   #walks through the matrix column by column, starting at column 3
  for (z1 in 1:6) {  #for each current column, take one row (z1)...
    for (z2 in 1:6) {  #...and compare it to another row (z2) of the current column
      if (z1!=z2) {topf<-indxind[z1,z2]
                   if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1   #actually, 2 rows make up 1 individual,
                   if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1      #therefore i compare 2 rows
                   if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1      #with another 2 rows
                   if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
                   indxind[z1,z2]<-topf
                   indxind[z2,z1]<-topf
      }
      #print(c(s,z1,z2,indxind[1,2])) ##counts s, z1 and z2 properly, but gives always 8 for indxind[1,2]
    }
    #indxind[1:5,1:5] #empty matrix
  }
  #indxind[1:5,1:5] #empty matrix
}

#4) check:
indxind[1:5,1:5]



@ Michael Weylandt: i've done my best with regard to the "big picture" of my algorithm and the small reproducible example. i hope both are sufficient.
@ Petr Pikal-3: in this case, there are only numerical values, but it's a useful hint for my other code.
@ Petr Pikal-3 and Berend Hasselman: initializing indxind with 0's instead of NAs helps, it fills something into indxind now. but it does the calculation only for the first marker (column 3), afterwards i get an error (translated from the German message):
Error in if (daten[2 * z1 - 1, s] == daten[2 * z2 - 1, s]) topf <- topf +  :
  missing value where TRUE/FALSE needed
does this have something to do with changing to daten<-as.data.frame(daten) in line 3 (instead of as.matrix before)?
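
One way this error can arise is sketched below with a hypothetical 4-row data frame: asking a data frame for a row that does not exist returns NA instead of failing, and if() cannot branch on NA. A matrix would stop earlier with "subscript out of bounds", which may be why the switch to a data frame changed the behaviour.

```r
# Hypothetical 4-row frame standing in for the real data.
daten <- data.frame(scm266 = c(305L, 305L, 305L, 305L))

beyond <- daten[5, 1]   # row 5 does not exist: the result is NA, not an error
is.na(beyond)           # TRUE

# Comparing with NA gives NA, and if() stops with an error on NA.
# In an English locale the message is "missing value where TRUE/FALSE needed".
msg <- tryCatch(
  { if (daten[1, 1] == beyond) "equal" else "different" },
  error = function(e) conditionMessage(e)
)
msg
```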

Re: help, please! matrix operations inside 3 nested loops

Fridolin
SORRY!!!! it should be:

Fridolin wrote
> for (s in 3:6) {   #walks through the matrix column by column, starting at column 3
>   for (z1 in 1:5) {  #for each current column, take one row (z1)...
>     for (z2 in 1:5) {  #...and compare it to another row (z2) of the current column

error is gone now.... SORRY!!!

Re: help, please! matrix operations inside 3 nested loops

Fridolin
all problems solved. thank you for your help!
for the sake of completeness, here is my solution:
#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
daten<-as.data.frame(daten)

#2) create empty matrix:
indxind<-matrix(0,nrow=617, ncol=617)
#indxind[1:20,1:19]

#3) compare cells to each other, score:
#for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
z1<-1 #running variable for rows in daten
z2<-1 #running variable for rows in daten
l1<-1 #running variable for rows in indxind
l2<-1 #running variable for columns in indxind
for (s in 3:6) {   #walks through the data column by column, starting at column 3
  while (z1<11) {  #for each current column, take one individual (rows z1 and z1+1)...
    while (z2<11) {  #...and compare it to another individual (rows z2 and z2+1)
      if (z1!=z2) {
        topf<-indxind[l1,l2]
        if (daten[z1,s]==daten[z2,s]) topf<-topf+1       #actually, 2 rows make up 1 individual,
        if (daten[z1,s]==daten[z2+1,s]) topf<-topf+1     #therefore i compare 2 rows
        if (daten[z1+1,s]==daten[z2,s]) topf<-topf+1     #with another 2 rows
        if (daten[z1+1,s]==daten[z2+1,s]) topf<-topf+1
        indxind[l1,l2]<-topf
      }
      z2<-z2+2   #next individual in daten
      l2<-l2+1   #next column in indxind
    }
    z2<-1
    l2<-1
    z1<-z1+2     #next individual in daten
    l1<-l1+1     #next row in indxind
  }
  z1<-1
  l1<-1
}

#4) check:
indxind[1:5,1:5]

Re: help, please! matrix operations inside 3 nested loops

PIKAL Petr
In reply to this post by Fridolin
Hi

> thank you for your help.
>
> my input data looks like this (tab separated):
>
> Ind.nr.   Pop.nr.   scm266   rms1280   scm247   rms1107
> 1   101   305   318   222   135
> 1   101   305   318   231   135
> 2   101   305   313   999   96
> 2   101   305   321   999   130
> 3   101   305   324   231   135
> 3   101   305   324   231   135
> 4   101   305   313   230   126
> 4   101   305   313   230   135
> 6   101   305   313   231   135
> 6   101   305   321   231   135

It is better to use dput(your.data) for sharing data. Anyway, I am still
confused, but you can probably clarify things further.
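
A minimal sketch of why dput() is convenient for sharing data, assuming a hypothetical two-row excerpt of the posted table: its output is R code that rebuilds the object exactly, including column types.

```r
# Hypothetical two-row excerpt of the posted data.
test <- data.frame(Ind.nr. = c(1L, 1L), scm247 = c(222L, 231L))

dput(test)   # prints R code that recreates `test`; paste that into the message

# A reader recreates the object by evaluating the pasted text.
txt  <- paste(capture.output(dput(test)), collapse = "\n")
copy <- eval(parse(text = txt))
all.equal(test, copy)
```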

>
> it is a dataset with genetic marker alleles for single individuals.
> the first row is the header, all following rows are individuals. 2 rows
> count for 1 individual.
> first column is the individual's number, second column is the number for the
> population the individual comes from, and all following columns are different
> genetic markers.
>
> what i want to do with this data in R, is to compare one individual with

In those 2 rows for one individual sometimes the genetic marker differs

> test[1:2, "scm247"]
[1] 222 231

What do you want to do with them?

> each of the other individuals, allele-wise. there are five possibilities:
> the two compared individuals share 4,3,2,1,0 alleles of the currently
> examined marker (=column). for each shared allele this pair of individuals
> shall get 1 scoring point. for each pair of individuals, all scoring points
> shall be summarized over all markers.

Based on your example,

> dput(test)
structure(list(Ind.nr. = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 6L,
6L), Pop.nr. = c(101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L,
101L, 101L), scm266 = c(305L, 305L, 305L, 305L, 305L, 305L, 305L,
305L, 305L, 305L), rms1280 = c(318L, 318L, 313L, 321L, 324L,
324L, 313L, 313L, 313L, 321L), scm247 = c(222L, 231L, 999L, 999L,
231L, 231L, 230L, 230L, 231L, 231L), rms1107 = c(135L, 135L,
96L, 130L, 135L, 135L, 126L, 135L, 135L, 135L)), .Names = c("Ind.nr.",
"Pop.nr.", "scm266", "rms1280", "scm247", "rms1107"), class =
"data.frame", row.names = c(NA,
-10L))

what is your desired result?

Regards
Petr



Re: help, please! matrix operations inside 3 nested loops

PIKAL Petr
In reply to this post by Fridolin
Hi

> all problems solved. thank you for your help!
> for the sake of completeness, here my solution:
> #1) read in data:
> daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
> daten<-as.data.frame(daten)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
not needed, daten is already a data frame
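
This can be checked quickly with inline text in place of the file (the two-column table here is made up for illustration):

```r
# read.table() with inline text instead of a file path.
daten <- read.table(text = "a\tb\n1\t2\n3\t4", header = TRUE, sep = "\t")

is.data.frame(daten)                     # TRUE
identical(daten, as.data.frame(daten))   # TRUE: the conversion changes nothing
```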

>
> #2) create empty matrix:
> indxind<-matrix(0,nrow=617, ncol=617)
> #indxind[1:20,1:19]
>
> #3) compare cells to each other, score:
> [...]
>
> #4) check:
> indxind[1:5,1:5]

I believe that the above loops can be simplified, maybe by changing your
daten to a three-dimensional array or by some clever **ply construction, but
if your loops work it is probably not worth the effort.
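
One possible reading of the three-dimensional array idea, as a hedged sketch on hypothetical toy data (3 individuals, 2 markers named m1 and m2). It assumes the rows strictly alternate allele 1, allele 2 for each individual, as in the posted example, and it is not taken from the thread.

```r
# Hypothetical toy data: 3 individuals x 2 allele rows, 2 markers.
daten <- data.frame(
  Ind.nr. = c(1, 1, 2, 2, 3, 3),
  Pop.nr. = 101,
  m1 = c(305, 305, 305, 305, 305, 305),
  m2 = c(318, 318, 313, 321, 313, 313)
)
markers <- as.matrix(daten[, -(1:2)])   # drop the id and population columns
n <- nrow(markers) / 2                  # number of individuals

# Reshape to arr[individual, allele, marker]; this relies on the rows
# alternating allele 1, allele 2 within each individual.
arr <- aperm(array(markers, dim = c(2, n, ncol(markers))), c(2, 1, 3))

score <- matrix(0, n, n)
for (s in seq_len(dim(arr)[3])) {
  for (a in 1:2) for (b in 1:2) {
    # outer() compares allele a of every individual with allele b of every other
    score <- score + outer(arr[, a, s], arr[, b, s], "==")
  }
}
diag(score) <- 0   # an individual is not scored against itself
score
```

The loop over individuals disappears because outer() handles all pairs at once; only the short loops over markers and the two alleles remain.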

Regards
Petr
