Quantcast

ape comparative analysis query

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

ape comparative analysis query

Chris Knight-4
I've been comparing variables among objects (taxa) related by known
trees, using phylogentically independent contrasts in the ape package,
and want to move on to more complex models e.g. by using gls with
appropriate correlation terms. My trees contain lots of (hard)
polytomies and information about ancestors, which I've been including-
creating fully dichotomous trees by using zero branch lengths. For
instance:

library(ape)
tree<-read.tree(text="((B1:3,(((D1:5,C1:0):5,B2:0):4,(B3:3,B4:4):0):0):0,A1:0):0;")
plot(tree)

Where all the B taxa arise in a true polytomy from A1 and B2 is the
ancestor of C1 which is the ancestor of D1

I have information on all the taxa and can create independent contrasts
for this:

x<-c(B1=47,D1=43,C1=45,B2=50,B3=47,B4=48,A1=48)
y<-c(B1=2.9,D1=5.4,C1=2.8,B2=3.5,B3=3.2,B4=3.5,A1=1.8)
picx<-pic(x,phy=tree)
picy<-pic(y,phy=tree)

Which seems to make sense. However, if I try to use anything more
sophisticated I run into problems:

>dat<-as.data.frame(cbind(x,y))
>model1<-gls(y~x,data=dat, correlation=corBrownian(phy=tree))
Error in corFactor.corStruct(object) : NA/NaN/Inf in foreign function
call (arg 1)

or equally:

> model2<-compar.gee(y~x,data=dat, phy=tree)
[1] "Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27"
[1] "running glm to get initial regression estimate"
[1] 15.6000 -0.2625
Error in gee(y ~ x, c(1, 1, 1, 1, 1, 1, 1), data = list(x = c(47, 43,  :
        NA/NaN/Inf in foreign function call (arg 14)

(NB using the 'data=' argument seems to be necessary in the gls case- if
I don't, there is a further problem: 'Row names in dataframe do not
match tree tip names. data taken to be in the same order as in tree. in:
Initialize.corPhyl(X[[1]], ...)')

This seems to go away if I remove the root taxon (A1):
tree2<-drop.tip(tree, "A1")
x2<-c(B1=47,D1=43,C1=45,B2=50,B3=47,B4=48)
y2<-c(B1=2.9,D1=5.4,C1=2.8,B2=3.5,B3=3.2,B4=3.5)
dat2<-as.data.frame(cbind(x2,y2))
model2<-gls(y2~x2,data=dat2, correlation=corBrownian(phy=tree2))

This raises various questions:

1) Was I misleading myself that my independent contrasts were valid in
the first place?
2) What is it, if anything, about the root taxon that causes this issue,
given that other taxa also have zero branch lengths?
3) Is there any way of getting around this and including data on the
root taxon, or am I better off just dropping it (ultimately I want to
work with much larger trees (up to tens of thousands of taxa) where that
one piece of information will become relatively less important)

Any help very much appreciated,

Chris

I'm working with ape 1.8-2 in R 2.1.1 under ubuntu 'Breezy' linux
(unfortunately 2.1.1 is the latest easily available in breezy)

--
--------------------------------------------------------------
Dr Christopher Knight                      School of Chemistry
[hidden email]     The University of Manchester
Tel: extn 64414                               Faraday Building
     +44 (0)161 3064414                              PO Box 88
Fax: +44 (0)161 3064556                       Sackville Street
                                            Manchester M60 1QD
 ` · . ,,><(((°>                                            UK

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: ape comparative analysis query

Simon Blomberg
Chris Knight wrote:
>
> This raises various questions:
>
> 1) Was I misleading myself that my independent contrasts were valid in
> the first place?
>  
I think partly. Independent contrasts were designed for data for tip
taxa only, not ancestral state data. One of the steps in the algorithm
requires a correction to account for the fact that ordinarily the values
for traits at nodes are estimated and not known, basically lengthening
the branch length to that node by a certain amount (See Felsenstein's
original paper). If you have known ancestral states, then you should not
make this correction.
> 2) What is it, if anything, about the root taxon that causes this issue,
> given that other taxa also have zero branch lengths?
>  
When converting a tree to a correlation matrix for use in GLS or GEE,
the Brownian motion assumption implies that the covariances among taxa
are represented by their shared branch length from the root, and the
height of each terminal taxon is equal to the variance of the trait for
that taxon. If the tree is ultrametric, then the variances are equal. In
your case, the root has zero covariance with each taxon, and zero
variance. Now, the error in the gls call is caused by a division by zero
in the calculation of the correlation matrix, because the root has zero
variance. try:

vcv.phylo(tree)
vcv.phylo(tree, cor=TRUE)

> 3) Is there any way of getting around this and including data on the
> root taxon, or am I better off just dropping it (ultimately I want to
> work with much larger trees (up to tens of thousands of taxa) where that
> one piece of information will become relatively less important)
>  
I'm surprised that you have known values for the root data. Are you sure
that the root taxon is not actually an outgroup? If so, it will have
some branch length (variance), and the model should fit OK.

HTH,

Simon.

> Any help very much appreciated,
>
> Chris
>
> I'm working with ape 1.8-2 in R 2.1.1 under ubuntu 'Breezy' linux
> (unfortunately 2.1.1 is the latest easily available in breezy)
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: ape comparative analysis query

Chris Knight-4
Thank you very much, that makes things a lot clearer.

The issue with known roots and ancestors in my case is that these taxa
are actually molecules evolved from one another in the lab (with control
over the evolutionary process, though not the traits that emerge), so
the data on the root and intermediate taxa are as good as any of the
tips.

To follow up your points:

On Thu, 2006-05-11 at 12:02 +1000, Simon Blomberg wrote:

> > 1) Was I misleading myself that my independent contrasts were valid in
> > the first place?
> >  
> I think partly. Independent contrasts were designed for data for tip
> taxa only, not ancestral state data. One of the steps in the algorithm
> requires a correction to account for the fact that ordinarily the values
> for traits at nodes are estimated and not known, basically lengthening
> the branch length to that node by a certain amount (See Felsenstein's
> original paper). If you have known ancestral states, then you should not
> make this correction.

I don't think the the branch lengthening for unknown ancestors is a
problem here- if, for instance one's looking at a contrast between node
B2 and C1 in the tree (in my original email, also below), the branch to
C1 would be extended because it's an internal node, but the amounts it
should extend by (looking at Felsenstein '85) is ij/(i+j) where i and j
are the branch lengths coming from the node, which will be zero when, as
in this case, one of the branches from the node has length zero.

That's not to say that there's not some other issue- I was under the
impression (immediately I can only find it as an un-referenced assertion
in Halsey et al '06 Am Nat167:276-287) that when run very simply with a
Brownian model (gamma parameter of 1), gls should give the same result
as independent contrasts, which it doesn't seem to here.

> > 2) What is it, if anything, about the root taxon that causes this issue,
> > given that other taxa also have zero branch lengths?
> >  
> When converting a tree to a correlation matrix for use in GLS or GEE,
> the Brownian motion assumption implies that the covariances among taxa
> are represented by their shared branch length from the root, and the
> height of each terminal taxon is equal to the variance of the trait for
> that taxon. If the tree is ultrametric, then the variances are equal. In
> your case, the root has zero covariance with each taxon, and zero
> variance. Now, the error in the gls call is caused by a division by zero
> in the calculation of the correlation matrix, because the root has zero
> variance. try:
>
> vcv.phylo(tree)
> vcv.phylo(tree, cor=TRUE)
>

So, if I read that correctly, the model assumes that the expected
variance of the traits, not just their covariance, is directly
proportional to their distance from the root, so at the root there is no
variance so an undefined correlation. This seems a little odd. I guess I
could stop the variance in the root being zero by somehow adding a small
variance due to measurement error (which I could probably come up with a
number for)?

It also seems odd that I can't get round this by putting in a root edge:
vcv.phylo(tree)
gives exactly the same as:
tree$root.edge<-10
vcv.phylo(tree)

However, perhaps it would be better just to use a different model with a
more appropriate expected variance. Looking at corMartins(), which
depends on distance between taxa, not shared common history, it seems
the expected variance (i.e. where the distance between the taxa, tij is
zero) should equal the gamma parameter rather than the distance from the
root. That seems more sensible here, but I'm not convinced I want to
assume the sort of stabilising selection that was designed for (?).

> > 3) Is there any way of getting around this and including data on the
> > root taxon, or am I better off just dropping it (ultimately I want to
> > work with much larger trees (up to tens of thousands of taxa) where that
> > one piece of information will become relatively less important)
> >  
> I'm surprised that you have known values for the root data. Are you sure
> that the root taxon is not actually an outgroup? If so, it will have
> some branch length (variance), and the model should fit OK.
>

In this particular case that does suggest a way around- I have several
lineages with different starting molecules which are essentially
un-related- I guess I can just make one big tree with an unknown root
and analyse that (the branch lengths from the root will necessarily be a
little arbitrary, but I can probably define something justifiable).
However, perhaps I'd still do better with corMartins()? (I guess I can
try both and look at AIC or similar?)

Many thanks for your help,

Chris


>library(ape)
> tree<-read.tree(text="((B1:3,(((D1:5,C1:0):5,B2:0):4,(B3:3,B4:4):0):0):0,A1:0):0;")
> plot(tree)

--
--------------------------------------------------------------
Dr Christopher Knight                      School of Chemistry
[hidden email]     The University of Manchester
Tel: extn 64414                               Faraday Building
     +44 (0)161 3064414                              PO Box 88
Fax: +44 (0)161 3064556                       Sackville Street
                                            Manchester M60 1QD
 ` · . ,,><(((°>                                            UK

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Loading...