|
I've been comparing variables among objects (taxa) related by known
trees, using phylogentically independent contrasts in the ape package, and want to move on to more complex models e.g. by using gls with appropriate correlation terms. My trees contain lots of (hard) polytomies and information about ancestors, which I've been including- creating fully dichotomous trees by using zero branch lengths. For instance: library(ape) tree<-read.tree(text="((B1:3,(((D1:5,C1:0):5,B2:0):4,(B3:3,B4:4):0):0):0,A1:0):0;") plot(tree) Where all the B taxa arise in a true polytomy from A1 and B2 is the ancestor of C1 which is the ancestor of D1 I have information on all the taxa and can create independent contrasts for this: x<-c(B1=47,D1=43,C1=45,B2=50,B3=47,B4=48,A1=48) y<-c(B1=2.9,D1=5.4,C1=2.8,B2=3.5,B3=3.2,B4=3.5,A1=1.8) picx<-pic(x,phy=tree) picy<-pic(y,phy=tree) Which seems to make sense. However, if I try to use anything more sophisticated I run into problems: >dat<-as.data.frame(cbind(x,y)) >model1<-gls(y~x,data=dat, correlation=corBrownian(phy=tree)) Error in corFactor.corStruct(object) : NA/NaN/Inf in foreign function call (arg 1) or equally: > model2<-compar.gee(y~x,data=dat, phy=tree) [1] "Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27" [1] "running glm to get initial regression estimate" [1] 15.6000 -0.2625 Error in gee(y ~ x, c(1, 1, 1, 1, 1, 1, 1), data = list(x = c(47, 43, : NA/NaN/Inf in foreign function call (arg 14) (NB using the 'data=' argument seems to be necessary in the gls case- if I don't, there is a further problem: 'Row names in dataframe do not match tree tip names. data taken to be in the same order as in tree. in: Initialize.corPhyl(X[[1]], ...)') This seems to go away if I remove the root taxon (A1): tree2<-drop.tip(tree, "A1") x2<-c(B1=47,D1=43,C1=45,B2=50,B3=47,B4=48) y2<-c(B1=2.9,D1=5.4,C1=2.8,B2=3.5,B3=3.2,B4=3.5) dat2<-as.data.frame(cbind(x2,y2)) model2<-gls(y2~x2,data=dat2, correlation=corBrownian(phy=tree2)) This raises various questions: 1) Was I misleading myself that my independent contrasts were valid in the first place? 2) What is it, if anything, about the root taxon that causes this issue, given that other taxa also have zero branch lengths? 3) Is there any way of getting around this and including data on the root taxon, or am I better off just dropping it (ultimately I want to work with much larger trees (up to tens of thousands of taxa) where that one piece of information will become relatively less important) Any help very much appreciated, Chris I'm working with ape 1.8-2 in R 2.1.1 under ubuntu 'Breezy' linux (unfortunately 2.1.1 is the latest easily available in breezy) -- -------------------------------------------------------------- Dr Christopher Knight School of Chemistry [hidden email] The University of Manchester Tel: extn 64414 Faraday Building +44 (0)161 3064414 PO Box 88 Fax: +44 (0)161 3064556 Sackville Street Manchester M60 1QD ` · . ,,><(((°> UK ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html |
|
Chris Knight wrote:
> > This raises various questions: > > 1) Was I misleading myself that my independent contrasts were valid in > the first place? > I think partly. Independent contrasts were designed for data for tip taxa only, not ancestral state data. One of the steps in the algorithm requires a correction to account for the fact that ordinarily the values for traits at nodes are estimated and not known, basically lengthening the branch length to that node by a certain amount (See Felsenstein's original paper). If you have known ancestral states, then you should not make this correction. > 2) What is it, if anything, about the root taxon that causes this issue, > given that other taxa also have zero branch lengths? > When converting a tree to a correlation matrix for use in GLS or GEE, the Brownian motion assumption implies that the covariances among taxa are represented by their shared branch length from the root, and the height of each terminal taxon is equal to the variance of the trait for that taxon. If the tree is ultrametric, then the variances are equal. In your case, the root has zero covariance with each taxon, and zero variance. Now, the error in the gls call is caused by a division by zero in the calculation of the correlation matrix, because the root has zero variance. try: vcv.phylo(tree) vcv.phylo(tree, cor=TRUE) > 3) Is there any way of getting around this and including data on the > root taxon, or am I better off just dropping it (ultimately I want to > work with much larger trees (up to tens of thousands of taxa) where that > one piece of information will become relatively less important) > I'm surprised that you have known values for the root data. Are you sure that the root taxon is not actually an outgroup? If so, it will have some branch length (variance), and the model should fit OK. HTH, Simon. > Any help very much appreciated, > > Chris > > I'm working with ape 1.8-2 in R 2.1.1 under ubuntu 'Breezy' linux > (unfortunately 2.1.1 is the latest easily available in breezy) > > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html |
|
Thank you very much, that makes things a lot clearer.
The issue with known roots and ancestors in my case is that these taxa are actually molecules evolved from one another in the lab (with control over the evolutionary process, though not the traits that emerge), so the data on the root and intermediate taxa are as good as any of the tips. To follow up your points: On Thu, 2006-05-11 at 12:02 +1000, Simon Blomberg wrote: > > 1) Was I misleading myself that my independent contrasts were valid in > > the first place? > > > I think partly. Independent contrasts were designed for data for tip > taxa only, not ancestral state data. One of the steps in the algorithm > requires a correction to account for the fact that ordinarily the values > for traits at nodes are estimated and not known, basically lengthening > the branch length to that node by a certain amount (See Felsenstein's > original paper). If you have known ancestral states, then you should not > make this correction. I don't think the the branch lengthening for unknown ancestors is a problem here- if, for instance one's looking at a contrast between node B2 and C1 in the tree (in my original email, also below), the branch to C1 would be extended because it's an internal node, but the amounts it should extend by (looking at Felsenstein '85) is ij/(i+j) where i and j are the branch lengths coming from the node, which will be zero when, as in this case, one of the branches from the node has length zero. That's not to say that there's not some other issue- I was under the impression (immediately I can only find it as an un-referenced assertion in Halsey et al '06 Am Nat167:276-287) that when run very simply with a Brownian model (gamma parameter of 1), gls should give the same result as independent contrasts, which it doesn't seem to here. > > 2) What is it, if anything, about the root taxon that causes this issue, > > given that other taxa also have zero branch lengths? > > > When converting a tree to a correlation matrix for use in GLS or GEE, > the Brownian motion assumption implies that the covariances among taxa > are represented by their shared branch length from the root, and the > height of each terminal taxon is equal to the variance of the trait for > that taxon. If the tree is ultrametric, then the variances are equal. In > your case, the root has zero covariance with each taxon, and zero > variance. Now, the error in the gls call is caused by a division by zero > in the calculation of the correlation matrix, because the root has zero > variance. try: > > vcv.phylo(tree) > vcv.phylo(tree, cor=TRUE) > So, if I read that correctly, the model assumes that the expected variance of the traits, not just their covariance, is directly proportional to their distance from the root, so at the root there is no variance so an undefined correlation. This seems a little odd. I guess I could stop the variance in the root being zero by somehow adding a small variance due to measurement error (which I could probably come up with a number for)? It also seems odd that I can't get round this by putting in a root edge: vcv.phylo(tree) gives exactly the same as: tree$root.edge<-10 vcv.phylo(tree) However, perhaps it would be better just to use a different model with a more appropriate expected variance. Looking at corMartins(), which depends on distance between taxa, not shared common history, it seems the expected variance (i.e. where the distance between the taxa, tij is zero) should equal the gamma parameter rather than the distance from the root. That seems more sensible here, but I'm not convinced I want to assume the sort of stabilising selection that was designed for (?). > > 3) Is there any way of getting around this and including data on the > > root taxon, or am I better off just dropping it (ultimately I want to > > work with much larger trees (up to tens of thousands of taxa) where that > > one piece of information will become relatively less important) > > > I'm surprised that you have known values for the root data. Are you sure > that the root taxon is not actually an outgroup? If so, it will have > some branch length (variance), and the model should fit OK. > In this particular case that does suggest a way around- I have several lineages with different starting molecules which are essentially un-related- I guess I can just make one big tree with an unknown root and analyse that (the branch lengths from the root will necessarily be a little arbitrary, but I can probably define something justifiable). However, perhaps I'd still do better with corMartins()? (I guess I can try both and look at AIC or similar?) Many thanks for your help, Chris >library(ape) > tree<-read.tree(text="((B1:3,(((D1:5,C1:0):5,B2:0):4,(B3:3,B4:4):0):0):0,A1:0):0;") > plot(tree) -- -------------------------------------------------------------- Dr Christopher Knight School of Chemistry [hidden email] The University of Manchester Tel: extn 64414 Faraday Building +44 (0)161 3064414 PO Box 88 Fax: +44 (0)161 3064556 Sackville Street Manchester M60 1QD ` · . ,,><(((°> UK ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html |
| Powered by Nabble | Edit this page |
