Convert phylogenetic tree to binary matrix

Convert phylogenetic tree to binary matrix

vanderlei52
I need to create a binary matrix with one column for each node of a phylogenetic tree, recording which taxa (tips) descend from that node.

Example:

library(ape)
y <- read.tree(text = "(E,((H,I)D,(F,G)C)B)A;")
y
plot(y, show.node.label = TRUE)

I need to create a binary matrix as follows:

  A B C D
G 1 1 1 0
F 1 1 1 0
I 1 1 0 1
H 1 1 0 1
E 1 0 0 0

Could somebody help me solve this problem?

Thanks,


Vanderlei Debastiani

Re: Convert phylogenetic tree to binary matrix

bbolker
vanderlei52 <vanderleidebastianimach <at> yahoo.com.br> writes:

>
> I need to create a binary matrix with all nodes of a phylogenetic tree and the
> presence of each taxon in its respective nodes.
>

  I would suggest that you try this question on the r-sig-phylo mailing
list instead.  The phylobase package has an ancestors() function that
could help you put together a solution, but there may well be a quicker,
easier way.

   Ben Bolker

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Convert phylogenetic tree to binary matrix

vanderlei52
Hi Ben,

Thank you for your help.

I asked the same question on the r-sig-phylo mailing list. Liam Revell gave the following solution:

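# 'tree' is the phylo object, e.g. y from the example above
temp <- prop.part(tree)  # tips descending from each internal node
X <- matrix(0, nrow = length(tree$tip.label), ncol = length(temp),
            dimnames = list(tree$tip.label, tree$node.label))
for (i in 1:ncol(X)) X[temp[[i]], i] <- 1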

Vanderlei
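
To make explicit what this matrix encodes, here is a minimal base-R sketch that builds the same table without ape, by walking from each tip up to the root. It is only an illustration, not the method from the list: the edge matrix is typed in by hand for the example tree, the node numbering (tips 1 to 5, internal nodes 6 to 9) is assumed to mirror what ape would assign, and the names tip_labels, node_labels and edge are made up for this example.

```r
# Example tree (E,((H,I)D,(F,G)C)B)A; written as a parent/child edge list.
# Assumed numbering: tips E,H,I,F,G = 1..5; internal nodes A,B,D,C = 6..9.
tip_labels  <- c("E", "H", "I", "F", "G")
node_labels <- c("A", "B", "D", "C")
edge <- rbind(c(6, 1), c(6, 7), c(7, 8), c(8, 2), c(8, 3),
              c(7, 9), c(9, 4), c(9, 5))
n_tip <- length(tip_labels)

# One row per tip, one column per internal node, all zeros to start.
X <- matrix(0, nrow = n_tip, ncol = length(node_labels),
            dimnames = list(tip_labels, node_labels))

# For each tip, step to the parent repeatedly and mark every ancestor.
for (tip in seq_len(n_tip)) {
  node <- tip
  while (node %in% edge[, 2]) {          # the root never appears as a child
    node <- edge[edge[, 2] == node, 1]   # move to the parent node
    X[tip, node - n_tip] <- 1
  }
}

# Reorder to match the layout requested in the first post.
X[c("G", "F", "I", "H", "E"), c("A", "B", "C", "D")]
```

With ape available, the prop.part() approach above is shorter and should be preferred; the loop is only meant to show where each 1 comes from.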

strange fluctuations in system.time with kernapply

Alexander Senger
Hello expeRts,


here is something which strikes me as kind of odd and I would like to ask for some enlightenment:

First let's do this:

tkern <- kernel("modified.daniell", c(5,5))
test <- rep(1,1000000)
system.time(kernapply(test,tkern))
   user  system elapsed
  1.100   0.040   1.136

That was easy. Now this:

test <- rep(1,1100000)
system.time(kernapply(test,tkern))
   user  system elapsed
   1.40    0.02    1.43

Still fine. Now this:

test <- rep(1,1110000)
system.time(kernapply(test,tkern))
   user  system elapsed
  1.390   0.020   1.409

Ok, by now it seems boring. But wait:

test <- rep(1,1110300)
system.time(kernapply(test,tkern))
   user  system elapsed
 12.270   0.030  12.319

There is a sudden - and repeatable! - jump in the time needed to execute kernapply. At least from a
naive point of view there should not be much difference between applying a kernel to a vector
1110000 or 1110300 entries long. But maybe there is some limit here?

So I tried this:

test <- rep(1,1110400)
system.time(kernapply(test,tkern))
   user  system elapsed
   1.96    0.01    1.97

which doesn't fit into the pattern. But the best thing is still to come. When I try this

test <- rep(1,1110308)
system.time(kernapply(test,tkern))

then the computer runs for more than 15 minutes, at which point I usually kill
the process. As noted above, this behaviour is repeatable and occurs every time I issue these commands.

I really would like to know if there is some magic to the number 1110308 I'm not aware of.


Last but not least, here is my

sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu

locale:
  [1] LC_CTYPE=de_DE.utf8       LC_NUMERIC=C
  [3] LC_TIME=de_DE.utf8        LC_COLLATE=de_DE.utf8
  [5] LC_MONETARY=C             LC_MESSAGES=de_DE.utf8
  [7] LC_PAPER=de_DE.utf8       LC_NAME=C
  [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.1


Thank you,

Alex

--
Dipl.-Phys. Alexander Senger        Tel   : +49 30 2093 4941
Humboldt-Universitaet zu Berlin     Fax   : +49 30 2093 4718
AG Quantenoptik und Metrologie
Hausvogteiplatz 5-7                 Email :
10117 Berlin, Germany               [hidden email]


Re: strange fluctuations in system.time with kernapply

Uwe Ligges-3


On 29.04.2011 23:38, Alexander Senger wrote:

> I really would like to know if there is some magic to the number 1110308
> I'm not aware of.

The magic is that the length of the vector, 1110308, is inefficient for
the fft() used within kernapply(). You need integer powers of 2 for a
really fast FFT.

You can also get long runtimes with much smaller lengths, e.g. 100003.

As an example, compare:

system.time(fft(rep(1, 32768))) # roughly 0 seconds
system.time(fft(rep(1, 32771))) # almost 10 seconds

Uwe Ligges
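
As a rough way to check a length before running kernapply(), the sketch below looks at its prime factors. It relies on the note in the fft() help page that the transform is fastest when the length is highly composite, so a large prime factor is a warning sign. largest_prime_factor() is a made-up helper written for this illustration, not part of R; nextn() is the function from the stats package that finds a nearby "good" length.

```r
# Hypothetical helper: largest prime factor of n, by simple trial division.
largest_prime_factor <- function(n) {
  largest <- 1
  p <- 2
  while (p * p <= n) {
    while (n %% p == 0) {   # divide out every copy of p
      largest <- p
      n <- n / p
    }
    p <- p + 1
  }
  if (n > 1) largest <- n   # whatever remains is itself prime
  largest
}

largest_prime_factor(32768)    # 2: a pure power of two
largest_prime_factor(1110308)  # a large value here suggests a slow fft()

# stats::nextn() gives the smallest length >= n built only from the given factors.
nextn(1110308)                 # factors 2, 3 and 5 (the default)
nextn(1110308, factors = 2)    # 2097152, the next power of two
```

The helper is only meant for lengths of this size; for very large n, trial division itself becomes slow.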





Re: strange fluctuations in system.time with kernapply

Ravi Varadhan
Why not use zero padding to improve the efficiency, i.e. append zeros to the end of the data vector so that the length of the resulting vector is a power of 2?  This is very common in signal processing, and is legitimate since zero padding does not add any new information.

Ravi.
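
A minimal sketch of the zero-padding idea, under a few assumptions: kernapply_padded() is a hypothetical wrapper written for this illustration, it pads to the length returned by nextn() (pass factors = 2 for a strict power of two), and it relies on kernapply() dropping k$m values at each end of a plain vector. The trimming step keeps only the smoothed values whose kernel window lies entirely inside the original data, so the appended zeros should not leak into the result.

```r
# Hypothetical wrapper: zero-pad x to an FFT-friendly length, smooth, then trim.
kernapply_padded <- function(x, k, factors = c(2, 3, 5)) {
  n <- length(x)
  n_pad <- nextn(n, factors)          # smallest suitable length >= n
  x_pad <- c(x, rep(0, n_pad - n))    # append zeros at the end
  y_pad <- kernapply(x_pad, k)        # loses k$m values at each end of x_pad
  y_pad[seq_len(n - 2 * k$m)]         # values that never touched the padding
}

# Small check against the unpadded result; 1009 is used as an awkward length.
tkern <- kernel("modified.daniell", c(5, 5))
set.seed(1)
test <- rnorm(1009)
all.equal(kernapply(test, tkern), kernapply_padded(test, tkern))
```

For the problematic length from the original post, the call would be system.time(kernapply_padded(rep(1, 1110308), tkern)); no timing is claimed here since it depends on the machine.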

-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University

Ph. (410) 502-2619
email: [hidden email]

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Uwe Ligges
Sent: Monday, May 02, 2011 5:31 AM
To: Alexander Senger
Cc: [hidden email]
Subject: Re: [R] strange fluctuations in system.time with kernapply



