How to run Hutcheson t-test on R?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

How to run Hutcheson t-test on R?

Luigi
Hello,
is it possible to run the Hutcheson t-test
(https://www.sciencedirect.com/science/article/abs/pii/0022519370901244)
on R? How?

--
Best regards,
Luigi

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to run Hutcheson t-test on R?

Rolf Turner

On Mon, 7 Sep 2020 11:17:36 +0200
Luigi Marongiu <[hidden email]> wrote:

> Hello,
> is it possible to run the Hutcheson t-test
> (https://www.sciencedirect.com/science/article/abs/pii/0022519370901244)
> on R?

Almost surely.  With R, all things are possible. :-)

> How?

Program it up?

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to run Hutcheson t-test on R?

Bernard McGarvey
This website has an example calculation shown in Excel Which might help in programming it in R.

https://www.dataanalytics.org.uk/comparing-diversity/


Bernard
Sent from my iPhone so please excuse the spelling!"

> On Sep 7, 2020, at 6:17 PM, Rolf Turner <[hidden email]> wrote:
>
> 
>> On Mon, 7 Sep 2020 11:17:36 +0200
>> Luigi Marongiu <[hidden email]> wrote:
>>
>> Hello,
>> is it possible to run the Hutcheson t-test
>> (https://www.sciencedirect.com/science/article/abs/pii/0022519370901244)
>> on R?
>
> Almost surely.  With R, all things are possible. :-)
>
>> How?
>
> Program it up?
>
> cheers,
>
> Rolf Turner
>
> --
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to run Hutcheson t-test on R?

David Winsemius
In reply to this post by Rolf Turner

On 9/7/20 3:17 PM, Rolf Turner wrote:

> On Mon, 7 Sep 2020 11:17:36 +0200
> Luigi Marongiu <[hidden email]> wrote:
>
>> Hello,
>> is it possible to run the Hutcheson t-test
>> (https://www.sciencedirect.com/science/article/abs/pii/0022519370901244)
>> on R?
> Almost surely.  With R, all things are possible. :-)
>
>> How?
> Program it up?


To Luigi;


Citing a 50 year-old paper that sits behind a paywall seems a bit
ineffective in getting coding support.

Seems this might be a more appropriate question on the R-SIG-ecology or
R-SIG-phylo mailing lists. (It would also have been appropriate to
indicate what sort of searching has been done. My efforts at searching
led me to the vegan package and this tutorial:
https://cran.r-project.org/web/packages/vegan/vignettes/diversity-vegan.pdf 
. It doesn't appear to have a Hutcheson t-test, but I'm guessing that is
because there are more modern and more sophisticated tests currently in
use.)


See: https://www.r-project.org/mail.html

--

David

>
> cheers,
>
> Rolf Turner
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to run Hutcheson t-test on R?

Luigi
I cited that paper to show what test I was referring to, I was hoping it
was already coded...
Thanks, I will look into vegan
Best regards

On Tue, 8 Sep 2020, 01:23 David Winsemius, <[hidden email]> wrote:

>
> On 9/7/20 3:17 PM, Rolf Turner wrote:
> > On Mon, 7 Sep 2020 11:17:36 +0200
> > Luigi Marongiu <[hidden email]> wrote:
> >
> >> Hello,
> >> is it possible to run the Hutcheson t-test
> >> (https://www.sciencedirect.com/science/article/abs/pii/0022519370901244
> )
> >> on R?
> > Almost surely.  With R, all things are possible. :-)
> >
> >> How?
> > Program it up?
>
>
> To Luigi;
>
>
> Citing a 50 year-old paper that sits behind a paywall seems a bit
> ineffective in getting coding support.
>
> Seems this might be a more appropriate question on the R-SIG-ecology or
> R-SIG-phylo mailing lists. (It would also have been appropriate to
> indicate what sort of searching has been done. My efforts at searching
> led me to the vegan package and this tutorial:
> https://cran.r-project.org/web/packages/vegan/vignettes/diversity-vegan.pdf
> . It doesn't appear to have a Hutcheson t-test, but I'm guessing that is
> because there are more modern and more sophisticated tests currently in
> use.)
>
>
> See: https://www.r-project.org/mail.html
>
> --
>
> David
>
> >
> > cheers,
> >
> > Rolf Turner
> >
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to run Hutcheson t-test on R?

karl.schilling
In reply to this post by Luigi
Could it be that the test you are looking for is implemented in the
vegan package (function diversity(... index = "shannon" ...), and/or the
BiodiversityR package, function "diversityresult (..., index =
"Shannon",...)

best,
Karl Schilling

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to run Hutcheson t-test on R?

Rui Barradas
Hello,

No, it's not. That's the Shannon diversity index, the test the OP is
looking for is a t-test for Shannon diversity index equality. The index
itself is easy to code. A very simple example, based on ?vegan::diversity:


library(vegan)

data(BCI)
H <- diversity(BCI[1,])    # just first row

divers <- function(n){
   p <- n/sum(n)
   log_p <- numeric(length(n))
   log_p[n != 0] <- log(p[n != 0])
   -sum(p * log_p)
}
HRui <- divers(BCI[1,])

identical(H, HRui)
#[1] TRUE


The vegan function is more general, it applies this and other indices
calculations to a matrix or array.

The t-test doesn't seem difficult to code.
The variance formula in the paper and in the OP's posted link [1] are
not the same, the original has one more term, but the degrees of freedom
formula are the same. It all seems straightforward coding.

Luigi: Maybe later today I will have time but I am not making promises.


[1] https://www.dataanalytics.org.uk/comparing-diversity/


Hope this helps,

Rui Barradas


Às 12:35 de 08/09/20, Karl Schilling escreveu:

> Could it be that the test you are looking for is implemented in the
> vegan package (function diversity(... index = "shannon" ...), and/or the
> BiodiversityR package, function "diversityresult (..., index =
> "Shannon",...)
>
> best,
> Karl Schilling
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to run Hutcheson t-test on R?

karl.schilling
Maybe the following is a solution:

# load package needed
# QSutils is on Bioconductor
library(QSutils)

# here some exemplary data - these are the data from Pilou 1966 that are
used
# in the second example of Hutcheson, J theor Biol 129:151-154 (1970)

earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)

# here starts the code ; you may replace the variables "earlier" and "later"
# by your own numbers.

# calculate h, var(h) etc
h1 <- Shannon(earlier)
varh1 <- ShannonVar(earlier)
n1 <- sum (earlier)
h2 <- Shannon(later)
varh2 <- ShannonVar(later)
n2 <- sum (later)
degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)

# compare numbers with those in the paper
h1
h2
varh1
varh2

Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
that explains the minor numerical differences obtained with the code
above and the published variances.

# this is the actual t-test
t <- (h1-h2) /sqrt(varh1 + varh2)
p <- 2*pt(-abs(t),df= degfree)
p

that's it
Best
Karl




On 08.09.2020 16:55, Rui Barradas wrote:

> Hello,
>
> No, it's not. That's the Shannon diversity index, the test the OP is
> looking for is a t-test for Shannon diversity index equality. The index
> itself is easy to code. A very simple example, based on ?vegan::diversity:
>
>
> library(vegan)
>
> data(BCI)
> H <- diversity(BCI[1,])    # just first row
>
> divers <- function(n){
>    p <- n/sum(n)
>    log_p <- numeric(length(n))
>    log_p[n != 0] <- log(p[n != 0])
>    -sum(p * log_p)
> }
> HRui <- divers(BCI[1,])
>
> identical(H, HRui)
> #[1] TRUE
>
>
> The vegan function is more general, it applies this and other indices
> calculations to a matrix or array.
>
> The t-test doesn't seem difficult to code.
> The variance formula in the paper and in the OP's posted link [1] are
> not the same, the original has one more term, but the degrees of freedom
> formula are the same. It all seems straightforward coding.
>
> Luigi: Maybe later today I will have time but I am not making promises.
>
>
> [1] https://www.dataanalytics.org.uk/comparing-diversity/
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 12:35 de 08/09/20, Karl Schilling escreveu:
>> Could it be that the test you are looking for is implemented in the
>> vegan package (function diversity(... index = "shannon" ...), and/or
>> the BiodiversityR package, function "diversityresult (..., index =
>> "Shannon",...)
>>
>> best,
>> Karl Schilling

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Some code to run Hutcheson t-test using R

Luigi
In reply to this post by Luigi
Thank you very much for the code, that was very helpful.
I got the article by Hutcheson -- I don't know if I can distribute it
, given the possible copyrights, or if I can attach it here -- but it
does not report numbers directly: it refers to a previous article
counting bird death on a telegraph each year. The numbers
are:
bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
bird_1959 <- c(0,0,14,59,26,68,0)
bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)

This for sake of the argument.
As for my problem, I implemented the Shannon index with the package
iNext, which only gives me the index itself and the 95% CI. Even when
I implemented it with vegan, I only got the index. Essentially I don't
have a count of species I could feed into the Hutcheson's. Is there a
way to extract these data? Or to run a Hutcheson's on the final index?
Thank you

On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
<[hidden email]> wrote:

>
> Dear Luigi,
>
> below some code I cobbled together based on the Hutcheson paper you
> mentioned. I was lucky to find code to calculate h and, importantly, its
> variance in the R-package QSutils - you may find it on the Bioconductor
> website.
>
> here is the code, along with an example. I also attach the code as an
> R-file.
>
> Hope that helps.
> All my best
>
> Karl
> PS don't forget to adjust for multiple testing if you compare more than
> two groups.
> K
>
>
> # load package needed
> # QSutils is on Bioconductor
> library(QSutils)
>
> # here some exemplary data - these are the data from Pilou 1966 that are
> used
> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970)
>
> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> # numbers of the first example used by Hutcheson were unfortunately not
> # available to me
>
> # here starts the code ; you may replace the variables "earlier" and "later"
> # by your own numbers.
>
> # calculate h, var(h) etc
> h1 <- Shannon(earlier)
> varh1 <- ShannonVar(earlier)
> n1 <- sum (earlier)
> h2 <- Shannon(later)
> varh2 <- ShannonVar(later)
> n2 <- sum (later)
> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)
>
> # compare numbers with those in the paper
> h1
> h2
> varh1
> varh2
> # I assume that minor numerical differences are due to differences in the
> # numerical precision of computers in the early seventies and today / KS
>
> # this is the actual t-test
> t <- (h1-h2) /sqrt(varh1 + varh2)
> p <- 2*pt(-abs(t),df= degfree)
> p
>
> # that's it
> # Best
> # Karl
> --
> Karl Schilling, MD
> Professor of Anatomy and Cell Biology
> Anatomisches Institut
> Rheinische Friedrich-Wilhelms-Universität
> Nussallee 10
>
> D-53115 Bonn
> Germany
>
> phone ++49-228-73-2602
>


--
Best regards,
Luigi

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Some code to run Hutcheson t-test using R

Luigi
Update:
I can see that you used the function Shannon from the package QSutils.
This would supplement the iNext package I used and solve the problem.
Thank you.

On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
<[hidden email]> wrote:

>
> Thank you very much for the code, that was very helpful.
> I got the article by Hutcheson -- I don't know if I can distribute it
> , given the possible copyrights, or if I can attach it here -- but it
> does not report numbers directly: it refers to a previous article
> counting bird death on a telegraph each year. The numbers
> are:
> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> bird_1959 <- c(0,0,14,59,26,68,0)
> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
>
> This for sake of the argument.
> As for my problem, I implemented the Shannon index with the package
> iNext, which only gives me the index itself and the 95% CI. Even when
> I implemented it with vegan, I only got the index. Essentially I don't
> have a count of species I could feed into the Hutcheson's. Is there a
> way to extract these data? Or to run a Hutcheson's on the final index?
> Thank you
>
> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
> <[hidden email]> wrote:
> >
> > Dear Luigi,
> >
> > below some code I cobbled together based on the Hutcheson paper you
> > mentioned. I was lucky to find code to calculate h and, importantly, its
> > variance in the R-package QSutils - you may find it on the Bioconductor
> > website.
> >
> > here is the code, along with an example. I also attach the code as an
> > R-file.
> >
> > Hope that helps.
> > All my best
> >
> > Karl
> > PS don't forget to adjust for multiple testing if you compare more than
> > two groups.
> > K
> >
> >
> > # load package needed
> > # QSutils is on Bioconductor
> > library(QSutils)
> >
> > # here some exemplary data - these are the data from Pilou 1966 that are
> > used
> > # in the second example of Hutcheson, J theor Biol 129:151-154 (1970)
> >
> > earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> > # numbers of the first example used by Hutcheson were unfortunately not
> > # available to me
> >
> > # here starts the code ; you may replace the variables "earlier" and "later"
> > # by your own numbers.
> >
> > # calculate h, var(h) etc
> > h1 <- Shannon(earlier)
> > varh1 <- ShannonVar(earlier)
> > n1 <- sum (earlier)
> > h2 <- Shannon(later)
> > varh2 <- ShannonVar(later)
> > n2 <- sum (later)
> > degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)
> >
> > # compare numbers with those in the paper
> > h1
> > h2
> > varh1
> > varh2
> > # I assume that minor numerical differences are due to differences in the
> > # numerical precision of computers in the early seventies and today / KS
> >
> > # this is the actual t-test
> > t <- (h1-h2) /sqrt(varh1 + varh2)
> > p <- 2*pt(-abs(t),df= degfree)
> > p
> >
> > # that's it
> > # Best
> > # Karl
> > --
> > Karl Schilling, MD
> > Professor of Anatomy and Cell Biology
> > Anatomisches Institut
> > Rheinische Friedrich-Wilhelms-Universität
> > Nussallee 10
> >
> > D-53115 Bonn
> > Germany
> >
> > phone ++49-228-73-2602
> >
>
>
> --
> Best regards,
> Luigi



--
Best regards,
Luigi

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Some code to run Hutcheson t-test using R

Rui Barradas
If you want a function automating Karl's code, here it is. It returns an
object of S3 class "htest", R's standard for hypothesis tests functions.
The returned object can then be subset in the usual ways, ht$statistic,
ht$parameter, ht$p.value, etc.


library(QSutils)

hutcheson.test <- function(x1, x2){
   dataname1 <- deparse(substitute(x1))
   dataname2 <- deparse(substitute(x2))
   method <- "Hutcheson's t-test for Shannon diversity equality"
   alternative <- "the diversities of the two samples are not equal"
   h1 <- Shannon(x1)
   varh1 <- ShannonVar(x1)
   n1 <- sum(x1)
   h2 <- Shannon(x2)
   varh2 <- ShannonVar(x2)
   n2 <- sum(x2)
   degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2)
   tstat <- (h1 - h2)/sqrt(varh1 + varh2)
   p.value <- 2*pt(-abs(tstat), df = degfree)
   ht <- list(
     statistic = c(t = tstat),
     parameter = c(df = degfree),
     p.value = p.value,
     alternative = alternative,
     method = method,
     data.name = paste(dataname1, dataname2, sep = ", ")
   )
   class(ht) <- "htest"
   ht
}

earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)

hutcheson.test(earlier, later)



With the data you provided:


bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
bird_1959 <- c(0,0,14,59,26,68,0)
bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)

hutcheson.test(bird_1956, bird_1957)




Note that like David said earlier, there might be better ways to
interpret Shannon's diversity index. If h is the sample's diversity,
exp(h) gives the number of equally-common species with equivalent
diversity.


s1 <- Shannon(earlier)
s2 <- Shannon(later)
c(earlier = s1, later = s2)
exp(c(earlier = s1, later = s2))   # Both round to 3
eq_common <- rep(1, 3)             # Can be 1 specimen or any other number
Shannon(eq_common)                 # Slightly greater than the samples'
diversity


round(exp(sapply(birds, Shannon))) # Your data


#-------------------------------------


Earlier Karl wrote [1] that


Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
that explains the minor numerical differences obtained with the code
above and the published variances.


I don't believe the published variances were computed with the published
variance estimator. The code below computes the variances like QSutils
and with formula (4) in Hutcheson's paper. The latter does not give the
same results.

var_est <- function(n){
   s <- length(n)
   N <- sum(n)
   p <- n/N
   i <- p != 0
   inv.p <- numeric(s)
   inv.p[i] <- 1/p[i]
   log.p <- numeric(s)
   log.p[i] <- log(p[i])
   #
   term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
   term2 <- (s - 1)/(2*N^2)
   #
   numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p *
log.p)
   denom3 <- 6*N^3
   term3 <- numer3/denom3
   list(
     Bioc = term1 + term2,
     Hutch = term1 + term2 + term3
   )
}

Vh1 <- var_est(earlier)
Vh1
all.equal(ShannonVar(earlier), Vh1$Bioc)
ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31

Vh2 <- var_est(later)
Vh2
identical(ShannonVar(later), Vh2$Bioc)    # TRUE



[1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html


Hope this helps,

Rui Barradas


Às 09:38 de 10/09/20, Luigi Marongiu escreveu:

> Update:
> I can see that you used the function Shannon from the package QSutils.
> This would supplement the iNext package I used and solve the problem.
> Thank you.
>
> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
> <[hidden email]> wrote:
>>
>> Thank you very much for the code, that was very helpful.
>> I got the article by Hutcheson -- I don't know if I can distribute it
>> , given the possible copyrights, or if I can attach it here -- but it
>> does not report numbers directly: it refers to a previous article
>> counting bird death on a telegraph each year. The numbers
>> are:
>> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
>> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
>> bird_1959 <- c(0,0,14,59,26,68,0)
>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
>>
>> This for sake of the argument.
>> As for my problem, I implemented the Shannon index with the package
>> iNext, which only gives me the index itself and the 95% CI. Even when
>> I implemented it with vegan, I only got the index. Essentially I don't
>> have a count of species I could feed into the Hutcheson's. Is there a
>> way to extract these data? Or to run a Hutcheson's on the final index?
>> Thank you
>>
>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
>> <[hidden email]> wrote:
>>>
>>> Dear Luigi,
>>>
>>> below some code I cobbled together based on the Hutcheson paper you
>>> mentioned. I was lucky to find code to calculate h and, importantly, its
>>> variance in the R-package QSutils - you may find it on the Bioconductor
>>> website.
>>>
>>> here is the code, along with an example. I also attach the code as an
>>> R-file.
>>>
>>> Hope that helps.
>>> All my best
>>>
>>> Karl
>>> PS don't forget to adjust for multiple testing if you compare more than
>>> two groups.
>>> K
>>>
>>>
>>> # load package needed
>>> # QSutils is on Bioconductor
>>> library(QSutils)
>>>
>>> # here some exemplary data - these are the data from Pilou 1966 that are
>>> used
>>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970)
>>>
>>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
>>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
>>> # numbers of the first example used by Hutcheson were unfortunately not
>>> # available to me
>>>
>>> # here starts the code ; you may replace the variables "earlier" and "later"
>>> # by your own numbers.
>>>
>>> # calculate h, var(h) etc
>>> h1 <- Shannon(earlier)
>>> varh1 <- ShannonVar(earlier)
>>> n1 <- sum (earlier)
>>> h2 <- Shannon(later)
>>> varh2 <- ShannonVar(later)
>>> n2 <- sum (later)
>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)
>>>
>>> # compare numbers with those in the paper
>>> h1
>>> h2
>>> varh1
>>> varh2
>>> # I assume that minor numerical differences are due to differences in the
>>> # numerical precision of computers in the early seventies and today / KS
>>>
>>> # this is the actual t-test
>>> t <- (h1-h2) /sqrt(varh1 + varh2)
>>> p <- 2*pt(-abs(t),df= degfree)
>>> p
>>>
>>> # that's it
>>> # Best
>>> # Karl
>>> --
>>> Karl Schilling, MD
>>> Professor of Anatomy and Cell Biology
>>> Anatomisches Institut
>>> Rheinische Friedrich-Wilhelms-Universität
>>> Nussallee 10
>>>
>>> D-53115 Bonn
>>> Germany
>>>
>>> phone ++49-228-73-2602
>>>
>>
>>
>> --
>> Best regards,
>> Luigi
>
>
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Some code to run Hutcheson t-test using R

Rui Barradas
Hello,

Sorry, there's an instruction missing. See inline.

Às 11:44 de 10/09/20, Rui Barradas escreveu:

> If you want a function automating Karl's code, here it is. It returns an
> object of S3 class "htest", R's standard for hypothesis tests functions.
> The returned object can then be subset in the usual ways, ht$statistic,
> ht$parameter, ht$p.value, etc.
>
>
> library(QSutils)
>
> hutcheson.test <- function(x1, x2){
>    dataname1 <- deparse(substitute(x1))
>    dataname2 <- deparse(substitute(x2))
>    method <- "Hutcheson's t-test for Shannon diversity equality"
>    alternative <- "the diversities of the two samples are not equal"
>    h1 <- Shannon(x1)
>    varh1 <- ShannonVar(x1)
>    n1 <- sum(x1)
>    h2 <- Shannon(x2)
>    varh2 <- ShannonVar(x2)
>    n2 <- sum(x2)
>    degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2)
>    tstat <- (h1 - h2)/sqrt(varh1 + varh2)
>    p.value <- 2*pt(-abs(tstat), df = degfree)
>    ht <- list(
>      statistic = c(t = tstat),
>      parameter = c(df = degfree),
>      p.value = p.value,
>      alternative = alternative,
>      method = method,
>      data.name = paste(dataname1, dataname2, sep = ", ")
>    )
>    class(ht) <- "htest"
>    ht
> }
>
> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
>
> hutcheson.test(earlier, later)
>
>
>
> With the data you provided:
>
>
> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> bird_1959 <- c(0,0,14,59,26,68,0)
> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
>
> hutcheson.test(bird_1956, bird_1957)
>
>
>
>
> Note that like David said earlier, there might be better ways to
> interpret Shannon's diversity index. If h is the sample's diversity,
> exp(h) gives the number of equally-common species with equivalent
> diversity.
>
>
> s1 <- Shannon(earlier)
> s2 <- Shannon(later)
> c(earlier = s1, later = s2)
> exp(c(earlier = s1, later = s2))   # Both round to 3
> eq_common <- rep(1, 3)             # Can be 1 specimen or any other number
> Shannon(eq_common)                 # Slightly greater than the samples'
> diversity
>
>

# Create a list with all the data
birds <- mget(ls(pattern = "^bird"))

> round(exp(sapply(birds, Shannon))) # Your data


Hope this helps,

Rui Barradas

>
>
> #-------------------------------------
>
>
> Earlier Karl wrote [1] that
>
>
> Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
> that explains the minor numerical differences obtained with the code
> above and the published variances.
>
>
> I don't believe the published variances were computed with the published
> variance estimator. The code below computes the variances like QSutils
> and with formula (4) in Hutcheson's paper. The latter does not give the
> same results.
>
> var_est <- function(n){
>    s <- length(n)
>    N <- sum(n)
>    p <- n/N
>    i <- p != 0
>    inv.p <- numeric(s)
>    inv.p[i] <- 1/p[i]
>    log.p <- numeric(s)
>    log.p[i] <- log(p[i])
>    #
>    term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
>    term2 <- (s - 1)/(2*N^2)
>    #
>    numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p *
> log.p)
>    denom3 <- 6*N^3
>    term3 <- numer3/denom3
>    list(
>      Bioc = term1 + term2,
>      Hutch = term1 + term2 + term3
>    )
> }
>
> Vh1 <- var_est(earlier)
> Vh1
> all.equal(ShannonVar(earlier), Vh1$Bioc)
> ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31
>
> Vh2 <- var_est(later)
> Vh2
> identical(ShannonVar(later), Vh2$Bioc)    # TRUE
>
>
>
> [1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 09:38 de 10/09/20, Luigi Marongiu escreveu:
>> Update:
>> I can see that you used the function Shannon from the package QSutils.
>> This would supplement the iNext package I used and solve the problem.
>> Thank you.
>>
>> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
>> <[hidden email]> wrote:
>>>
>>> Thank you very much for the code, that was very helpful.
>>> I got the article by Hutcheson -- I don't know if I can distribute it
>>> , given the possible copyrights, or if I can attach it here -- but it
>>> does not report numbers directly: it refers to a previous article
>>> counting bird death on a telegraph each year. The numbers
>>> are:
>>> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
>>> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
>>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
>>> bird_1959 <- c(0,0,14,59,26,68,0)
>>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
>>>
>>> This for sake of the argument.
>>> As for my problem, I implemented the Shannon index with the package
>>> iNext, which only gives me the index itself and the 95% CI. Even when
>>> I implemented it with vegan, I only got the index. Essentially I don't
>>> have a count of species I could feed into the Hutcheson's. Is there a
>>> way to extract these data? Or to run a Hutcheson's on the final index?
>>> Thank you
>>>
>>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
>>> <[hidden email]> wrote:
>>>>
>>>> Dear Luigi,
>>>>
>>>> below some code I cobbled together based on the Hutcheson paper you
>>>> mentioned. I was lucky to find code to calculate h and, importantly,
>>>> its
>>>> variance in the R-package QSutils - you may find it on the Bioconductor
>>>> website.
>>>>
>>>> here is the code, along with an example. I also attach the code as an
>>>> R-file.
>>>>
>>>> Hope that helps.
>>>> All my best
>>>>
>>>> Karl
>>>> PS don't forget to adjust for multiple testing if you compare more than
>>>> two groups.
>>>> K
>>>>
>>>>
>>>> # load package needed
>>>> # QSutils is on Bioconductor
>>>> library(QSutils)
>>>>
>>>> # here some exemplary data - these are the data from Pilou 1966 that
>>>> are
>>>> used
>>>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970)
>>>>
>>>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
>>>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
>>>> # numbers of the first example used by Hutcheson were unfortunately not
>>>> # available to me
>>>>
>>>> # here starts the code ; you may replace the variables "earlier" and
>>>> "later"
>>>> # by your own numbers.
>>>>
>>>> # calculate h, var(h) etc
>>>> h1 <- Shannon(earlier)
>>>> varh1 <- ShannonVar(earlier)
>>>> n1 <- sum (earlier)
>>>> h2 <- Shannon(later)
>>>> varh2 <- ShannonVar(later)
>>>> n2 <- sum (later)
>>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)
>>>>
>>>> # compare numbers with those in the paper
>>>> h1
>>>> h2
>>>> varh1
>>>> varh2
>>>> # I assume that minor numerical differences are due to differences
>>>> in the
>>>> # numerical precision of computers in the early seventies and today
>>>> / KS
>>>>
>>>> # this is the actual t-test
>>>> t <- (h1-h2) /sqrt(varh1 + varh2)
>>>> p <- 2*pt(-abs(t),df= degfree)
>>>> p
>>>>
>>>> # that's it
>>>> # Best
>>>> # Karl
>>>> --
>>>> Karl Schilling, MD
>>>> Professor of Anatomy and Cell Biology
>>>> Anatomisches Institut
>>>> Rheinische Friedrich-Wilhelms-Universität
>>>> Nussallee 10
>>>>
>>>> D-53115 Bonn
>>>> Germany
>>>>
>>>> phone ++49-228-73-2602
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Luigi
>>
>>
>>
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Some code to run Hutcheson t-test using R

Luigi
Hello,
thank you for the code. To explain better, when I used vegan, I did
not count the species directly but simply prepared a dataframe where,
for each species, I counted the number of samples bearing such
species:
```

> str(new_df)
'data.frame': 3 obs. of  46 variables:
 $ NC_001416 Enterobacteria phage lambda   : int  5 4 5
 $ NC_001623 Autographa californica nucl...: int  7 7 7
 $ NC_001895 Enterobacteria phage P2       : int  1 0 0
 $ NC_004745 Yersinia phage L-413C         : int  1 0 0
```
here the triplettes refer to healthy, tumor and metastasis. The outcome is:
```
# Shannon index
diversity(new_df)
#> Normal     Tumour     Metastasis
#> 2.520139   3.109512   1.890404
```
Using iNext, I provided a list of all the species counted in a samples
```
> new_list
$Healthy
 [1] 5 7 1 1 1 8 1 1 2 1 2 1 1 1 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0

$Tumour
 [1] 4 7 0 0 0 7 0 0 1 0 1 0 0 0 0 2 0 0 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1
1 2 1 1 1 1 1 1 1 0 0 0 0

$Metastasis
 [1] 5 7 0 0 0 9 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 1 1 1 1
```
From this I get:
```
mod = iNEXT(new_list, q=0, datatype="abundance")
mod$AsyEst
#Site         Diversity Observed Estimator   s.e.    LCL     UCL
#1     Normal  Species richness   18.000    41.368 19.683 23.563 116.155
#2     Normal Shannon diversity   12.430    21.343  5.183 12.430  31.501
#4     Tumour  Species richness   30.000    94.776 42.936 49.848 241.396
#5     Tumour Shannon diversity   22.410    53.135 14.486 24.743  81.526
#7 Metastasis  Species richness   10.000    27.379 22.821 12.443 133.640
#8 Metastasis Shannon diversity    6.622     9.980  3.102  6.622  16.059
```
So here the Shannon index is 12 instead of 2.5...
Using Karl's function, I get:
```
# compute Shannon
norm_sIdx <- Shannon(array(as.numeric(unlist(new_list[1]))))
canc_sIdx <- Shannon(array(as.numeric(unlist(new_list[2]))))
meta_sIdx <- Shannon(array(as.numeric(unlist(new_list[3]))))
norm_var <- ShannonVar(array(as.numeric(unlist(new_list[1]))))
canc_var <- ShannonVar(array(as.numeric(unlist(new_list[2]))))
meta_var <- ShannonVar(array(as.numeric(unlist(new_list[3]))))
norm_sum <- sum(array(as.numeric(unlist(new_list[1]))))
canc_sum <- sum(array(as.numeric(unlist(new_list[2]))))
meta_sum <- sum(array(as.numeric(unlist(new_list[3]))))
# compute Hutcheson
degfree <- (norm_var + canc_var)**2 /(norm_var**2/norm_sum +
canc_var**2 /canc_sum)
test <- (norm_sIdx-canc_sIdx) /sqrt(norm_var + canc_var)
(p <- 2*pt(-abs(test),df= degree))
> [1] 0.01825784
```
remarkably, the indices are the same as obtained by vegan:
```
> norm_sIdx
[1] 2.520139
> canc_sIdx
[1] 3.109512
> meta_sIdx
[1] 1.890404
```

I tried Rui's function but I got an error, so I wrote it as
```
hutcheson = function(A, B){
  # compute Shannon index, variance and sum of elements
  A_index <- Shannon(A)
  B_index <- Shannon(B)
  A_var <- ShannonVar(A)
  B_var <- ShannonVar(B)
  A_sum <- sum(A)
  B_sum <- sum(B)
  # compute Hutcheson
  DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
  test <- (A_index-B_index) /sqrt(A_var + B_var)
  p <- 2*pt(-abs(test),df= DF)
  # closure
  cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon
index first group: ",
      round(A_index, 4), "\n\tShannon index second group: ", round(B_index, 4),
      "\n\tp-value : ", round(p, 4), "\n", sep = "")
  return(p)
}
```
and I got:
```

> n_t = hutcheson(array(as.numeric(unlist(new_list[1]))), array(as.numeric(unlist(new_list[2]))))
Hutcheson's t-test for Shannon diversity equality
Shannon index first group: 2.5201
Shannon index second group: 3.1095
p-value : 0.0183
> n_m = hutcheson(array(as.numeric(unlist(new_list[1]))), array(as.numeric(unlist(new_list[3]))))
Hutcheson's t-test for Shannon diversity equality
Shannon index first group: 2.5201
Shannon index second group: 1.8904
p-value : 0.0371
> t_m = hutcheson(array(as.numeric(unlist(new_list[2]))), array(as.numeric(unlist(new_list[3]))))
Hutcheson's t-test for Shannon diversity equality
Shannon index first group: 3.1095
Shannon index second group: 1.8904
p-value : 0
```
new_list[1]|[2]|[3] refer to healthy, tumor and metastasis. applied to
the original Hutcheson data:
```
> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> hutcheson(bird_1956, bird_1957)
Hutcheson's t-test for Shannon diversity equality
Shannon index first group: 1.8429
Shannon index second group: 1.0689
p-value : 0

```
This is to compare two groups at the time. I'll probably have to
compensate for multiple testing...
But if this all OK, then the case is closed.
Thank you

On Thu, Sep 10, 2020 at 1:04 PM Rui Barradas <[hidden email]> wrote:

>
> Hello,
>
> Sorry, there's an instruction missing. See inline.
>
> Às 11:44 de 10/09/20, Rui Barradas escreveu:
> > If you want a function automating Karl's code, here it is. It returns an
> > object of S3 class "htest", R's standard for hypothesis tests functions.
> > The returned object can then be subset in the usual ways, ht$statistic,
> > ht$parameter, ht$p.value, etc.
> >
> >
> > library(QSutils)
> >
> > hutcheson.test <- function(x1, x2){
> >    dataname1 <- deparse(substitute(x1))
> >    dataname2 <- deparse(substitute(x2))
> >    method <- "Hutcheson's t-test for Shannon diversity equality"
> >    alternative <- "the diversities of the two samples are not equal"
> >    h1 <- Shannon(x1)
> >    varh1 <- ShannonVar(x1)
> >    n1 <- sum(x1)
> >    h2 <- Shannon(x2)
> >    varh2 <- ShannonVar(x2)
> >    n2 <- sum(x2)
> >    degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2)
> >    tstat <- (h1 - h2)/sqrt(varh1 + varh2)
> >    p.value <- 2*pt(-abs(tstat), df = degfree)
> >    ht <- list(
> >      statistic = c(t = tstat),
> >      parameter = c(df = degfree),
> >      p.value = p.value,
> >      alternative = alternative,
> >      method = method,
> >      data.name = paste(dataname1, dataname2, sep = ", ")
> >    )
> >    class(ht) <- "htest"
> >    ht
> > }
> >
> > earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> >
> > hutcheson.test(earlier, later)
> >
> >
> >
> > With the data you provided:
> >
> >
> > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> > bird_1959 <- c(0,0,14,59,26,68,0)
> > bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> >
> > hutcheson.test(bird_1956, bird_1957)
> >
> >
> >
> >
> > Note that like David said earlier, there might be better ways to
> > interpret Shannon's diversity index. If h is the sample's diversity,
> > exp(h) gives the number of equally-common species with equivalent
> > diversity.
> >
> >
> > s1 <- Shannon(earlier)
> > s2 <- Shannon(later)
> > c(earlier = s1, later = s2)
> > exp(c(earlier = s1, later = s2))   # Both round to 3
> > eq_common <- rep(1, 3)             # Can be 1 specimen or any other number
> > Shannon(eq_common)                 # Slightly greater than the samples'
> > diversity
> >
> >
>
> # Create a list with all the data
> birds <- mget(ls(pattern = "^bird"))
>
> > round(exp(sapply(birds, Shannon))) # Your data
>
>
> Hope this helps,
>
> Rui Barradas
>
> >
> >
> > #-------------------------------------
> >
> >
> > Earlier Karl wrote [1] that
> >
> >
> > Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
> > that explains the minor numerical differences obtained with the code
> > above and the published variances.
> >
> >
> > I don't believe the published variances were computed with the published
> > variance estimator. The code below computes the variances like QSutils
> > and with formula (4) in Hutcheson's paper. The latter does not give the
> > same results.
> >
> > var_est <- function(n){
> >    s <- length(n)
> >    N <- sum(n)
> >    p <- n/N
> >    i <- p != 0
> >    inv.p <- numeric(s)
> >    inv.p[i] <- 1/p[i]
> >    log.p <- numeric(s)
> >    log.p[i] <- log(p[i])
> >    #
> >    term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
> >    term2 <- (s - 1)/(2*N^2)
> >    #
> >    numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p *
> > log.p)
> >    denom3 <- 6*N^3
> >    term3 <- numer3/denom3
> >    list(
> >      Bioc = term1 + term2,
> >      Hutch = term1 + term2 + term3
> >    )
> > }
> >
> > Vh1 <- var_est(earlier)
> > Vh1
> > all.equal(ShannonVar(earlier), Vh1$Bioc)
> > ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31
> >
> > Vh2 <- var_est(later)
> > Vh2
> > identical(ShannonVar(later), Vh2$Bioc)    # TRUE
> >
> >
> >
> > [1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> >
> > Às 09:38 de 10/09/20, Luigi Marongiu escreveu:
> >> Update:
> >> I can see that you used the function Shannon from the package QSutils.
> >> This would supplement the iNext package I used and solve the problem.
> >> Thank you.
> >>
> >> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
> >> <[hidden email]> wrote:
> >>>
> >>> Thank you very much for the code, that was very helpful.
> >>> I got the article by Hutcheson -- I don't know if I can distribute it
> >>> , given the possible copyrights, or if I can attach it here -- but it
> >>> does not report numbers directly: it refers to a previous article
> >>> counting bird death on a telegraph each year. The numbers
> >>> are:
> >>> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> >>> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> >>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> >>> bird_1959 <- c(0,0,14,59,26,68,0)
> >>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> >>>
> >>> This for sake of the argument.
> >>> As for my problem, I implemented the Shannon index with the package
> >>> iNext, which only gives me the index itself and the 95% CI. Even when
> >>> I implemented it with vegan, I only got the index. Essentially I don't
> >>> have a count of species I could feed into the Hutcheson's. Is there a
> >>> way to extract these data? Or to run a Hutcheson's on the final index?
> >>> Thank you
> >>>
> >>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
> >>> <[hidden email]> wrote:
> >>>>
> >>>> Dear Luigi,
> >>>>
> >>>> below some code I cobbled together based on the Hutcheson paper you
> >>>> mentioned. I was lucky to find code to calculate h and, importantly,
> >>>> its
> >>>> variance in the R-package QSutils - you may find it on the Bioconductor
> >>>> website.
> >>>>
> >>>> here is the code, along with an example. I also attach the code as an
> >>>> R-file.
> >>>>
> >>>> Hope that helps.
> >>>> All my best
> >>>>
> >>>> Karl
> >>>> PS don't forget to adjust for multiple testing if you compare more than
> >>>> two groups.
> >>>> K
> >>>>
> >>>>
> >>>> # load package needed
> >>>> # QSutils is on Bioconductor
> >>>> library(QSutils)
> >>>>
> >>>> # here some exemplary data - these are the data from Pilou 1966 that
> >>>> are
> >>>> used
> >>>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970)
> >>>>
> >>>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> >>>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> >>>> # numbers of the first example used by Hutcheson were unfortunately not
> >>>> # available to me
> >>>>
> >>>> # here starts the code ; you may replace the variables "earlier" and
> >>>> "later"
> >>>> # by your own numbers.
> >>>>
> >>>> # calculate h, var(h) etc
> >>>> h1 <- Shannon(earlier)
> >>>> varh1 <- ShannonVar(earlier)
> >>>> n1 <- sum (earlier)
> >>>> h2 <- Shannon(later)
> >>>> varh2 <- ShannonVar(later)
> >>>> n2 <- sum (later)
> >>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)
> >>>>
> >>>> # compare numbers with those in the paper
> >>>> h1
> >>>> h2
> >>>> varh1
> >>>> varh2
> >>>> # I assume that minor numerical differences are due to differences
> >>>> in the
> >>>> # numerical precision of computers in the early seventies and today
> >>>> / KS
> >>>>
> >>>> # this is the actual t-test
> >>>> t <- (h1-h2) /sqrt(varh1 + varh2)
> >>>> p <- 2*pt(-abs(t),df= degfree)
> >>>> p
> >>>>
> >>>> # that's it
> >>>> # Best
> >>>> # Karl
> >>>> --
> >>>> Karl Schilling, MD
> >>>> Professor of Anatomy and Cell Biology
> >>>> Anatomisches Institut
> >>>> Rheinische Friedrich-Wilhelms-Universität
> >>>> Nussallee 10
> >>>>
> >>>> D-53115 Bonn
> >>>> Germany
> >>>>
> >>>> phone ++49-228-73-2602
> >>>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Luigi
> >>
> >>
> >>
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.



--
Best regards,
Luigi

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Some code to run Hutcheson t-test using R

Luigi
Update:
I also added the confidence interval for the Shannon index:
```
#! Hutcheson's t-test for Shannon diversity equality
# thanks to Karl Schilling and Rui Barradas
hutcheson = function(A, B){
  # compute Shannon index, variance and sum of elements
  A_index <- Shannon(A)
  B_index <- Shannon(B)
  A_var <- ShannonVar(A)
  B_var <- ShannonVar(B)
  A_sum <- sum(A)
  B_sum <- sum(B)
  # compute Hutcheson
  DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
  test <- (A_index-B_index) /sqrt(A_var + B_var)
  p <- 2*pt(-abs(test),df= DF)
  if (p < 0.001) {
    P = "<0.001"
  } else {
    P = round(p, 3)
  }
  if (p < 0.001) {
    S = "***"
  } else if (p < 0.01) {
    S = "**"
  } else if (p < 0.05) {
    S = "*"
  } else {
    S = ""
  }
  # closure
  cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon
index first group: \t",
      round(A_index, 3), " (", round((A_index-2*A_var),3), "-",
round((A_index+2*A_var),3),
      ")\n\tShannon index second group: \t",
      round(B_index, 3), " (", round((B_index-2*B_var),3), "-",
round((B_index+2*B_var),3),
      ")\n\tp-value: ", P, " ", S, "\n", sep = "")
  return(p)
}
```

On Thu, Sep 10, 2020 at 2:10 PM Luigi Marongiu <[hidden email]> wrote:

>
> Hello,
> thank you for the code. To explain better, when I used vegan, I did
> not count the species directly but simply prepared a dataframe where,
> for each species, I counted the number of samples bearing such
> species:
> ```
>
> > str(new_df)
> 'data.frame': 3 obs. of  46 variables:
>  $ NC_001416 Enterobacteria phage lambda   : int  5 4 5
>  $ NC_001623 Autographa californica nucl...: int  7 7 7
>  $ NC_001895 Enterobacteria phage P2       : int  1 0 0
>  $ NC_004745 Yersinia phage L-413C         : int  1 0 0
> ```
> here the triplettes refer to healthy, tumor and metastasis. The outcome is:
> ```
> # Shannon index
> diversity(new_df)
> #> Normal     Tumour     Metastasis
> #> 2.520139   3.109512   1.890404
> ```
> Using iNext, I provided a list of all the species counted in a samples
> ```
> > new_list
> $Healthy
>  [1] 5 7 1 1 1 8 1 1 2 1 2 1 1 1 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> $Tumour
>  [1] 4 7 0 0 0 7 0 0 1 0 1 0 0 0 0 2 0 0 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1
> 1 2 1 1 1 1 1 1 1 0 0 0 0
>
> $Metastasis
>  [1] 5 7 0 0 0 9 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 1 0 0 1 1 1 1
> ```
> From this I get:
> ```
> mod = iNEXT(new_list, q=0, datatype="abundance")
> mod$AsyEst
> #Site         Diversity Observed Estimator   s.e.    LCL     UCL
> #1     Normal  Species richness   18.000    41.368 19.683 23.563 116.155
> #2     Normal Shannon diversity   12.430    21.343  5.183 12.430  31.501
> #4     Tumour  Species richness   30.000    94.776 42.936 49.848 241.396
> #5     Tumour Shannon diversity   22.410    53.135 14.486 24.743  81.526
> #7 Metastasis  Species richness   10.000    27.379 22.821 12.443 133.640
> #8 Metastasis Shannon diversity    6.622     9.980  3.102  6.622  16.059
> ```
> So here the Shannon index is 12 instead of 2.5...
> Using Karl's function, I get:
> ```
> # compute Shannon
> norm_sIdx <- Shannon(array(as.numeric(unlist(new_list[1]))))
> canc_sIdx <- Shannon(array(as.numeric(unlist(new_list[2]))))
> meta_sIdx <- Shannon(array(as.numeric(unlist(new_list[3]))))
> norm_var <- ShannonVar(array(as.numeric(unlist(new_list[1]))))
> canc_var <- ShannonVar(array(as.numeric(unlist(new_list[2]))))
> meta_var <- ShannonVar(array(as.numeric(unlist(new_list[3]))))
> norm_sum <- sum(array(as.numeric(unlist(new_list[1]))))
> canc_sum <- sum(array(as.numeric(unlist(new_list[2]))))
> meta_sum <- sum(array(as.numeric(unlist(new_list[3]))))
> # compute Hutcheson
> degfree <- (norm_var + canc_var)**2 /(norm_var**2/norm_sum +
> canc_var**2 /canc_sum)
> test <- (norm_sIdx-canc_sIdx) /sqrt(norm_var + canc_var)
> (p <- 2*pt(-abs(test),df= degree))
> > [1] 0.01825784
> ```
> remarkably, the indices are the same as obtained by vegan:
> ```
> > norm_sIdx
> [1] 2.520139
> > canc_sIdx
> [1] 3.109512
> > meta_sIdx
> [1] 1.890404
> ```
>
> I tried Rui's function but I got an error, so I wrote it as
> ```
> hutcheson = function(A, B){
>   # compute Shannon index, variance and sum of elements
>   A_index <- Shannon(A)
>   B_index <- Shannon(B)
>   A_var <- ShannonVar(A)
>   B_var <- ShannonVar(B)
>   A_sum <- sum(A)
>   B_sum <- sum(B)
>   # compute Hutcheson
>   DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
>   test <- (A_index-B_index) /sqrt(A_var + B_var)
>   p <- 2*pt(-abs(test),df= DF)
>   # closure
>   cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon
> index first group: ",
>       round(A_index, 4), "\n\tShannon index second group: ", round(B_index, 4),
>       "\n\tp-value : ", round(p, 4), "\n", sep = "")
>   return(p)
> }
> ```
> and I got:
> ```
>
> > n_t = hutcheson(array(as.numeric(unlist(new_list[1]))), array(as.numeric(unlist(new_list[2]))))
> Hutcheson's t-test for Shannon diversity equality
> Shannon index first group: 2.5201
> Shannon index second group: 3.1095
> p-value : 0.0183
> > n_m = hutcheson(array(as.numeric(unlist(new_list[1]))), array(as.numeric(unlist(new_list[3]))))
> Hutcheson's t-test for Shannon diversity equality
> Shannon index first group: 2.5201
> Shannon index second group: 1.8904
> p-value : 0.0371
> > t_m = hutcheson(array(as.numeric(unlist(new_list[2]))), array(as.numeric(unlist(new_list[3]))))
> Hutcheson's t-test for Shannon diversity equality
> Shannon index first group: 3.1095
> Shannon index second group: 1.8904
> p-value : 0
> ```
> new_list[1]|[2]|[3] refer to healthy, tumor and metastasis. applied to
> the original Hutcheson data:
> ```
> > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > hutcheson(bird_1956, bird_1957)
> Hutcheson's t-test for Shannon diversity equality
> Shannon index first group: 1.8429
> Shannon index second group: 1.0689
> p-value : 0
>
> ```
> This is to compare two groups at the time. I'll probably have to
> compensate for multiple testing...
> But if this all OK, then the case is closed.
> Thank you
>
> On Thu, Sep 10, 2020 at 1:04 PM Rui Barradas <[hidden email]> wrote:
> >
> > Hello,
> >
> > Sorry, there's an instruction missing. See inline.
> >
> > Às 11:44 de 10/09/20, Rui Barradas escreveu:
> > > If you want a function automating Karl's code, here it is. It returns an
> > > object of S3 class "htest", R's standard for hypothesis tests functions.
> > > The returned object can then be subset in the usual ways, ht$statistic,
> > > ht$parameter, ht$p.value, etc.
> > >
> > >
> > > library(QSutils)
> > >
> > > hutcheson.test <- function(x1, x2){
> > >    dataname1 <- deparse(substitute(x1))
> > >    dataname2 <- deparse(substitute(x2))
> > >    method <- "Hutcheson's t-test for Shannon diversity equality"
> > >    alternative <- "the diversities of the two samples are not equal"
> > >    h1 <- Shannon(x1)
> > >    varh1 <- ShannonVar(x1)
> > >    n1 <- sum(x1)
> > >    h2 <- Shannon(x2)
> > >    varh2 <- ShannonVar(x2)
> > >    n2 <- sum(x2)
> > >    degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2)
> > >    tstat <- (h1 - h2)/sqrt(varh1 + varh2)
> > >    p.value <- 2*pt(-abs(tstat), df = degfree)
> > >    ht <- list(
> > >      statistic = c(t = tstat),
> > >      parameter = c(df = degfree),
> > >      p.value = p.value,
> > >      alternative = alternative,
> > >      method = method,
> > >      data.name = paste(dataname1, dataname2, sep = ", ")
> > >    )
> > >    class(ht) <- "htest"
> > >    ht
> > > }
> > >
> > > earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > > later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> > >
> > > hutcheson.test(earlier, later)
> > >
> > >
> > >
> > > With the data you provided:
> > >
> > >
> > > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > > bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> > > bird_1959 <- c(0,0,14,59,26,68,0)
> > > bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> > >
> > > hutcheson.test(bird_1956, bird_1957)
> > >
> > >
> > >
> > >
> > > Note that like David said earlier, there might be better ways to
> > > interpret Shannon's diversity index. If h is the sample's diversity,
> > > exp(h) gives the number of equally-common species with equivalent
> > > diversity.
> > >
> > >
> > > s1 <- Shannon(earlier)
> > > s2 <- Shannon(later)
> > > c(earlier = s1, later = s2)
> > > exp(c(earlier = s1, later = s2))   # Both round to 3
> > > eq_common <- rep(1, 3)             # Can be 1 specimen or any other number
> > > Shannon(eq_common)                 # Slightly greater than the samples'
> > > diversity
> > >
> > >
> >
> > # Create a list with all the data
> > birds <- mget(ls(pattern = "^bird"))
> >
> > > round(exp(sapply(birds, Shannon))) # Your data
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > >
> > >
> > > #-------------------------------------
> > >
> > >
> > > Earlier Karl wrote [1] that
> > >
> > >
> > > Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
> > > that explains the minor numerical differences obtained with the code
> > > above and the published variances.
> > >
> > >
> > > I don't believe the published variances were computed with the published
> > > variance estimator. The code below computes the variances like QSutils
> > > and with formula (4) in Hutcheson's paper. The latter does not give the
> > > same results.
> > >
> > > var_est <- function(n){
> > >    s <- length(n)
> > >    N <- sum(n)
> > >    p <- n/N
> > >    i <- p != 0
> > >    inv.p <- numeric(s)
> > >    inv.p[i] <- 1/p[i]
> > >    log.p <- numeric(s)
> > >    log.p[i] <- log(p[i])
> > >    #
> > >    term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
> > >    term2 <- (s - 1)/(2*N^2)
> > >    #
> > >    numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p *
> > > log.p)
> > >    denom3 <- 6*N^3
> > >    term3 <- numer3/denom3
> > >    list(
> > >      Bioc = term1 + term2,
> > >      Hutch = term1 + term2 + term3
> > >    )
> > > }
> > >
> > > Vh1 <- var_est(earlier)
> > > Vh1
> > > all.equal(ShannonVar(earlier), Vh1$Bioc)
> > > ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31
> > >
> > > Vh2 <- var_est(later)
> > > Vh2
> > > identical(ShannonVar(later), Vh2$Bioc)    # TRUE
> > >
> > >
> > >
> > > [1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > >
> > > Às 09:38 de 10/09/20, Luigi Marongiu escreveu:
> > >> Update:
> > >> I can see that you used the function Shannon from the package QSutils.
> > >> This would supplement the iNext package I used and solve the problem.
> > >> Thank you.
> > >>
> > >> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
> > >> <[hidden email]> wrote:
> > >>>
> > >>> Thank you very much for the code, that was very helpful.
> > >>> I got the article by Hutcheson -- I don't know if I can distribute it
> > >>> , given the possible copyrights, or if I can attach it here -- but it
> > >>> does not report numbers directly: it refers to a previous article
> > >>> counting bird death on a telegraph each year. The numbers
> > >>> are:
> > >>> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > >>> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > >>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> > >>> bird_1959 <- c(0,0,14,59,26,68,0)
> > >>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> > >>>
> > >>> This for sake of the argument.
> > >>> As for my problem, I implemented the Shannon index with the package
> > >>> iNext, which only gives me the index itself and the 95% CI. Even when
> > >>> I implemented it with vegan, I only got the index. Essentially I don't
> > >>> have a count of species I could feed into the Hutcheson's. Is there a
> > >>> way to extract these data? Or to run a Hutcheson's on the final index?
> > >>> Thank you
> > >>>
> > >>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
> > >>> <[hidden email]> wrote:
> > >>>>
> > >>>> Dear Luigi,
> > >>>>
> > >>>> below some code I cobbled together based on the Hutcheson paper you
> > >>>> mentioned. I was lucky to find code to calculate h and, importantly,
> > >>>> its
> > >>>> variance in the R-package QSutils - you may find it on the Bioconductor
> > >>>> website.
> > >>>>
> > >>>> here is the code, along with an example. I also attach the code as an
> > >>>> R-file.
> > >>>>
> > >>>> Hope that helps.
> > >>>> All my best
> > >>>>
> > >>>> Karl
> > >>>> PS don't forget to adjust for multiple testing if you compare more than
> > >>>> two groups.
> > >>>> K
> > >>>>
> > >>>>
> > >>>> # load package needed
> > >>>> # QSutils is on Bioconductor
> > >>>> library(QSutils)
> > >>>>
> > >>>> # here some exemplary data - these are the data from Pilou 1966 that
> > >>>> are
> > >>>> used
> > >>>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970)
> > >>>>
> > >>>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > >>>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> > >>>> # numbers of the first example used by Hutcheson were unfortunately not
> > >>>> # available to me
> > >>>>
> > >>>> # here starts the code ; you may replace the variables "earlier" and
> > >>>> "later"
> > >>>> # by your own numbers.
> > >>>>
> > >>>> # calculate h, var(h) etc
> > >>>> h1 <- Shannon(earlier)
> > >>>> varh1 <- ShannonVar(earlier)
> > >>>> n1 <- sum (earlier)
> > >>>> h2 <- Shannon(later)
> > >>>> varh2 <- ShannonVar(later)
> > >>>> n2 <- sum (later)
> > >>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)
> > >>>>
> > >>>> # compare numbers with those in the paper
> > >>>> h1
> > >>>> h2
> > >>>> varh1
> > >>>> varh2
> > >>>> # I assume that minor numerical differences are due to differences
> > >>>> in the
> > >>>> # numerical precision of computers in the early seventies and today
> > >>>> / KS
> > >>>>
> > >>>> # this is the actual t-test
> > >>>> t <- (h1-h2) /sqrt(varh1 + varh2)
> > >>>> p <- 2*pt(-abs(t),df= degfree)
> > >>>> p
> > >>>>
> > >>>> # that's it
> > >>>> # Best
> > >>>> # Karl
> > >>>> --
> > >>>> Karl Schilling, MD
> > >>>> Professor of Anatomy and Cell Biology
> > >>>> Anatomisches Institut
> > >>>> Rheinische Friedrich-Wilhelms-Universität
> > >>>> Nussallee 10
> > >>>>
> > >>>> D-53115 Bonn
> > >>>> Germany
> > >>>>
> > >>>> phone ++49-228-73-2602
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Best regards,
> > >>> Luigi
> > >>
> > >>
> > >>
> > >
> > > ______________________________________________
> > > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Best regards,
> Luigi



--
Best regards,
Luigi

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Some code to run Hutcheson t-test using R

Luigi
Hello,
I have just realized in the original paper, the t test is defined as:
`t = h1-h2 -(µ1µ2)/(var1-var2)^1/2`. is the term -(µ1µ2) missing in
your formula? How to calculate µ1µ2?
Thank you

On Thu, Sep 10, 2020 at 2:41 PM Luigi Marongiu <[hidden email]> wrote:

>
> Update:
> I also added the confidence interval for the Shannon index:
> ```
> #! Hutcheson's t-test for Shannon diversity equality
> # thanks to Karl Schilling and Rui Barradas
> hutcheson = function(A, B){
>   # compute Shannon index, variance and sum of elements
>   A_index <- Shannon(A)
>   B_index <- Shannon(B)
>   A_var <- ShannonVar(A)
>   B_var <- ShannonVar(B)
>   A_sum <- sum(A)
>   B_sum <- sum(B)
>   # compute Hutcheson
>   DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
>   test <- (A_index-B_index) /sqrt(A_var + B_var)
>   p <- 2*pt(-abs(test),df= DF)
>   if (p < 0.001) {
>     P = "<0.001"
>   } else {
>     P = round(p, 3)
>   }
>   if (p < 0.001) {
>     S = "***"
>   } else if (p < 0.01) {
>     S = "**"
>   } else if (p < 0.05) {
>     S = "*"
>   } else {
>     S = ""
>   }
>   # closure
>   cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon
> index first group: \t",
>       round(A_index, 3), " (", round((A_index-2*A_var),3), "-",
> round((A_index+2*A_var),3),
>       ")\n\tShannon index second group: \t",
>       round(B_index, 3), " (", round((B_index-2*B_var),3), "-",
> round((B_index+2*B_var),3),
>       ")\n\tp-value: ", P, " ", S, "\n", sep = "")
>   return(p)
> }
> ```
>
> On Thu, Sep 10, 2020 at 2:10 PM Luigi Marongiu <[hidden email]> wrote:
> >
> > Hello,
> > thank you for the code. To explain better, when I used vegan, I did
> > not count the species directly but simply prepared a dataframe where,
> > for each species, I counted the number of samples bearing such
> > species:
> > ```
> >
> > > str(new_df)
> > 'data.frame': 3 obs. of  46 variables:
> >  $ NC_001416 Enterobacteria phage lambda   : int  5 4 5
> >  $ NC_001623 Autographa californica nucl...: int  7 7 7
> >  $ NC_001895 Enterobacteria phage P2       : int  1 0 0
> >  $ NC_004745 Yersinia phage L-413C         : int  1 0 0
> > ```
> > here the triplettes refer to healthy, tumor and metastasis. The outcome is:
> > ```
> > # Shannon index
> > diversity(new_df)
> > #> Normal     Tumour     Metastasis
> > #> 2.520139   3.109512   1.890404
> > ```
> > Using iNext, I provided a list of all the species counted in a samples
> > ```
> > > new_list
> > $Healthy
> >  [1] 5 7 1 1 1 8 1 1 2 1 2 1 1 1 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0
> >
> > $Tumour
> >  [1] 4 7 0 0 0 7 0 0 1 0 1 0 0 0 0 2 0 0 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1
> > 1 2 1 1 1 1 1 1 1 0 0 0 0
> >
> > $Metastasis
> >  [1] 5 7 0 0 0 9 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 1 0 0 1 1 1 1
> > ```
> > From this I get:
> > ```
> > mod = iNEXT(new_list, q=0, datatype="abundance")
> > mod$AsyEst
> > #Site         Diversity Observed Estimator   s.e.    LCL     UCL
> > #1     Normal  Species richness   18.000    41.368 19.683 23.563 116.155
> > #2     Normal Shannon diversity   12.430    21.343  5.183 12.430  31.501
> > #4     Tumour  Species richness   30.000    94.776 42.936 49.848 241.396
> > #5     Tumour Shannon diversity   22.410    53.135 14.486 24.743  81.526
> > #7 Metastasis  Species richness   10.000    27.379 22.821 12.443 133.640
> > #8 Metastasis Shannon diversity    6.622     9.980  3.102  6.622  16.059
> > ```
> > So here the Shannon index is 12 instead of 2.5...
> > Using Karl's function, I get:
> > ```
> > # compute Shannon
> > norm_sIdx <- Shannon(array(as.numeric(unlist(new_list[1]))))
> > canc_sIdx <- Shannon(array(as.numeric(unlist(new_list[2]))))
> > meta_sIdx <- Shannon(array(as.numeric(unlist(new_list[3]))))
> > norm_var <- ShannonVar(array(as.numeric(unlist(new_list[1]))))
> > canc_var <- ShannonVar(array(as.numeric(unlist(new_list[2]))))
> > meta_var <- ShannonVar(array(as.numeric(unlist(new_list[3]))))
> > norm_sum <- sum(array(as.numeric(unlist(new_list[1]))))
> > canc_sum <- sum(array(as.numeric(unlist(new_list[2]))))
> > meta_sum <- sum(array(as.numeric(unlist(new_list[3]))))
> > # compute Hutcheson
> > degfree <- (norm_var + canc_var)**2 /(norm_var**2/norm_sum +
> > canc_var**2 /canc_sum)
> > test <- (norm_sIdx-canc_sIdx) /sqrt(norm_var + canc_var)
> > (p <- 2*pt(-abs(test),df= degree))
> > > [1] 0.01825784
> > ```
> > remarkably, the indices are the same as obtained by vegan:
> > ```
> > > norm_sIdx
> > [1] 2.520139
> > > canc_sIdx
> > [1] 3.109512
> > > meta_sIdx
> > [1] 1.890404
> > ```
> >
> > I tried Rui's function but I got an error, so I wrote it as
> > ```
> > hutcheson = function(A, B){
> >   # compute Shannon index, variance and sum of elements
> >   A_index <- Shannon(A)
> >   B_index <- Shannon(B)
> >   A_var <- ShannonVar(A)
> >   B_var <- ShannonVar(B)
> >   A_sum <- sum(A)
> >   B_sum <- sum(B)
> >   # compute Hutcheson
> >   DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
> >   test <- (A_index-B_index) /sqrt(A_var + B_var)
> >   p <- 2*pt(-abs(test),df= DF)
> >   # closure
> >   cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon
> > index first group: ",
> >       round(A_index, 4), "\n\tShannon index second group: ", round(B_index, 4),
> >       "\n\tp-value : ", round(p, 4), "\n", sep = "")
> >   return(p)
> > }
> > ```
> > and I got:
> > ```
> >
> > > n_t = hutcheson(array(as.numeric(unlist(new_list[1]))), array(as.numeric(unlist(new_list[2]))))
> > Hutcheson's t-test for Shannon diversity equality
> > Shannon index first group: 2.5201
> > Shannon index second group: 3.1095
> > p-value : 0.0183
> > > n_m = hutcheson(array(as.numeric(unlist(new_list[1]))), array(as.numeric(unlist(new_list[3]))))
> > Hutcheson's t-test for Shannon diversity equality
> > Shannon index first group: 2.5201
> > Shannon index second group: 1.8904
> > p-value : 0.0371
> > > t_m = hutcheson(array(as.numeric(unlist(new_list[2]))), array(as.numeric(unlist(new_list[3]))))
> > Hutcheson's t-test for Shannon diversity equality
> > Shannon index first group: 3.1095
> > Shannon index second group: 1.8904
> > p-value : 0
> > ```
> > new_list[1]|[2]|[3] refer to healthy, tumor and metastasis. applied to
> > the original Hutcheson data:
> > ```
> > > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > > hutcheson(bird_1956, bird_1957)
> > Hutcheson's t-test for Shannon diversity equality
> > Shannon index first group: 1.8429
> > Shannon index second group: 1.0689
> > p-value : 0
> >
> > ```
> > This is to compare two groups at the time. I'll probably have to
> > compensate for multiple testing...
> > But if this all OK, then the case is closed.
> > Thank you
> >
> > On Thu, Sep 10, 2020 at 1:04 PM Rui Barradas <[hidden email]> wrote:
> > >
> > > Hello,
> > >
> > > Sorry, there's an instruction missing. See inline.
> > >
> > > Às 11:44 de 10/09/20, Rui Barradas escreveu:
> > > > If you want a function automating Karl's code, here it is. It returns an
> > > > object of S3 class "htest", R's standard for hypothesis tests functions.
> > > > The returned object can then be subset in the usual ways, ht$statistic,
> > > > ht$parameter, ht$p.value, etc.
> > > >
> > > >
> > > > library(QSutils)
> > > >
> > > > hutcheson.test <- function(x1, x2){
> > > >    dataname1 <- deparse(substitute(x1))
> > > >    dataname2 <- deparse(substitute(x2))
> > > >    method <- "Hutcheson's t-test for Shannon diversity equality"
> > > >    alternative <- "the diversities of the two samples are not equal"
> > > >    h1 <- Shannon(x1)
> > > >    varh1 <- ShannonVar(x1)
> > > >    n1 <- sum(x1)
> > > >    h2 <- Shannon(x2)
> > > >    varh2 <- ShannonVar(x2)
> > > >    n2 <- sum(x2)
> > > >    degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2)
> > > >    tstat <- (h1 - h2)/sqrt(varh1 + varh2)
> > > >    p.value <- 2*pt(-abs(tstat), df = degfree)
> > > >    ht <- list(
> > > >      statistic = c(t = tstat),
> > > >      parameter = c(df = degfree),
> > > >      p.value = p.value,
> > > >      alternative = alternative,
> > > >      method = method,
> > > >      data.name = paste(dataname1, dataname2, sep = ", ")
> > > >    )
> > > >    class(ht) <- "htest"
> > > >    ht
> > > > }
> > > >
> > > > earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > > > later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> > > >
> > > > hutcheson.test(earlier, later)
> > > >
> > > >
> > > >
> > > > With the data you provided:
> > > >
> > > >
> > > > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > > > bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> > > > bird_1959 <- c(0,0,14,59,26,68,0)
> > > > bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> > > >
> > > > hutcheson.test(bird_1956, bird_1957)
> > > >
> > > >
> > > >
> > > >
> > > > Note that like David said earlier, there might be better ways to
> > > > interpret Shannon's diversity index. If h is the sample's diversity,
> > > > exp(h) gives the number of equally-common species with equivalent
> > > > diversity.
> > > >
> > > >
> > > > s1 <- Shannon(earlier)
> > > > s2 <- Shannon(later)
> > > > c(earlier = s1, later = s2)
> > > > exp(c(earlier = s1, later = s2))   # Both round to 3
> > > > eq_common <- rep(1, 3)             # Can be 1 specimen or any other number
> > > > Shannon(eq_common)                 # Slightly greater than the samples'
> > > > diversity
> > > >
> > > >
> > >
> > > # Create a list with all the data
> > > birds <- mget(ls(pattern = "^bird"))
> > >
> > > > round(exp(sapply(birds, Shannon))) # Your data
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > > >
> > > >
> > > > #-------------------------------------
> > > >
> > > >
> > > > Earlier Karl wrote [1] that
> > > >
> > > >
> > > > Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
> > > > that explains the minor numerical differences obtained with the code
> > > > above and the published variances.
> > > >
> > > >
> > > > I don't believe the published variances were computed with the published
> > > > variance estimator. The code below computes the variances like QSutils
> > > > and with formula (4) in Hutcheson's paper. The latter does not give the
> > > > same results.
> > > >
> > > > var_est <- function(n){
> > > >    s <- length(n)
> > > >    N <- sum(n)
> > > >    p <- n/N
> > > >    i <- p != 0
> > > >    inv.p <- numeric(s)
> > > >    inv.p[i] <- 1/p[i]
> > > >    log.p <- numeric(s)
> > > >    log.p[i] <- log(p[i])
> > > >    #
> > > >    term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
> > > >    term2 <- (s - 1)/(2*N^2)
> > > >    #
> > > >    numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p *
> > > > log.p)
> > > >    denom3 <- 6*N^3
> > > >    term3 <- numer3/denom3
> > > >    list(
> > > >      Bioc = term1 + term2,
> > > >      Hutch = term1 + term2 + term3
> > > >    )
> > > > }
> > > >
> > > > Vh1 <- var_est(earlier)
> > > > Vh1
> > > > all.equal(ShannonVar(earlier), Vh1$Bioc)
> > > > ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31
> > > >
> > > > Vh2 <- var_est(later)
> > > > Vh2
> > > > identical(ShannonVar(later), Vh2$Bioc)    # TRUE
> > > >
> > > >
> > > >
> > > > [1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html
> > > >
> > > >
> > > > Hope this helps,
> > > >
> > > > Rui Barradas
> > > >
> > > >
> > > > Às 09:38 de 10/09/20, Luigi Marongiu escreveu:
> > > >> Update:
> > > >> I can see that you used the function Shannon from the package QSutils.
> > > >> This would supplement the iNext package I used and solve the problem.
> > > >> Thank you.
> > > >>
> > > >> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
> > > >> <[hidden email]> wrote:
> > > >>>
> > > >>> Thank you very much for the code, that was very helpful.
> > > >>> I got the article by Hutcheson -- I don't know if I can distribute it
> > > >>> , given the possible copyrights, or if I can attach it here -- but it
> > > >>> does not report numbers directly: it refers to a previous article
> > > >>> counting bird death on a telegraph each year. The numbers
> > > >>> are:
> > > >>> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > >>> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > > >>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> > > >>> bird_1959 <- c(0,0,14,59,26,68,0)
> > > >>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> > > >>>
> > > >>> This for sake of the argument.
> > > >>> As for my problem, I implemented the Shannon index with the package
> > > >>> iNext, which only gives me the index itself and the 95% CI. Even when
> > > >>> I implemented it with vegan, I only got the index. Essentially I don't
> > > >>> have a count of species I could feed into the Hutcheson's. Is there a
> > > >>> way to extract these data? Or to run a Hutcheson's on the final index?
> > > >>> Thank you
> > > >>>
> > > >>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
> > > >>> <[hidden email]> wrote:
> > > >>>>
> > > >>>> Dear Luigi,
> > > >>>>
> > > >>>> below some code I cobbled together based on the Hutcheson paper you
> > > >>>> mentioned. I was lucky to find code to calculate h and, importantly,
> > > >>>> its
> > > >>>> variance in the R-package QSutils - you may find it on the Bioconductor
> > > >>>> website.
> > > >>>>
> > > >>>> here is the code, along with an example. I also attach the code as an
> > > >>>> R-file.
> > > >>>>
> > > >>>> Hope that helps.
> > > >>>> All my best
> > > >>>>
> > > >>>> Karl
> > > >>>> PS don't forget to adjust for multiple testing if you compare more than
> > > >>>> two groups.
> > > >>>> K
> > > >>>>
> > > >>>>
> > > >>>> # load package needed
> > > >>>> # QSutils is on Bioconductor
> > > >>>> library(QSutils)
> > > >>>>
> > > >>>> # here some exemplary data - these are the data from Pilou 1966 that
> > > >>>> are
> > > >>>> used
> > > >>>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970)
> > > >>>>
> > > >>>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > > >>>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> > > >>>> # numbers of the first example used by Hutcheson were unfortunately not
> > > >>>> # available to me
> > > >>>>
> > > >>>> # here starts the code ; you may replace the variables "earlier" and
> > > >>>> "later"
> > > >>>> # by your own numbers.
> > > >>>>
> > > >>>> # calculate h, var(h) etc
> > > >>>> h1 <- Shannon(earlier)
> > > >>>> varh1 <- ShannonVar(earlier)
> > > >>>> n1 <- sum (earlier)
> > > >>>> h2 <- Shannon(later)
> > > >>>> varh2 <- ShannonVar(later)
> > > >>>> n2 <- sum (later)
> > > >>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)
> > > >>>>
> > > >>>> # compare numbers with those in the paper
> > > >>>> h1
> > > >>>> h2
> > > >>>> varh1
> > > >>>> varh2
> > > >>>> # I assume that minor numerical differences are due to differences
> > > >>>> in the
> > > >>>> # numerical precision of computers in the early seventies and today
> > > >>>> / KS
> > > >>>>
> > > >>>> # this is the actual t-test
> > > >>>> t <- (h1-h2) /sqrt(varh1 + varh2)
> > > >>>> p <- 2*pt(-abs(t),df= degfree)
> > > >>>> p
> > > >>>>
> > > >>>> # that's it
> > > >>>> # Best
> > > >>>> # Karl
> > > >>>> --
> > > >>>> Karl Schilling, MD
> > > >>>> Professor of Anatomy and Cell Biology
> > > >>>> Anatomisches Institut
> > > >>>> Rheinische Friedrich-Wilhelms-Universität
> > > >>>> Nussallee 10
> > > >>>>
> > > >>>> D-53115 Bonn
> > > >>>> Germany
> > > >>>>
> > > >>>> phone ++49-228-73-2602
> > > >>>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Best regards,
> > > >>> Luigi
> > > >>
> > > >>
> > > >>
> > > >
> > > > ______________________________________________
> > > > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > Best regards,
> > Luigi
>
>
>
> --
> Best regards,
> Luigi



--
Best regards,
Luigi

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Some code to run Hutcheson t-test using R

Luigi
Actually,
in the working example, Hutcheson himself did not report the term
µ1µ2: `t0 = h1-h2/(var1-var2)^1/2`. so I think we can live without it.
Case closed.
Thank you

On Fri, Sep 11, 2020 at 11:11 AM Luigi Marongiu
<[hidden email]> wrote:

>
> Hello,
> I have just realized in the original paper, the t test is defined as:
> `t = h1-h2 -(µ1µ2)/(var1-var2)^1/2`. is the term -(µ1µ2) missing in
> your formula? How to calculate µ1µ2?
> Thank you
>
> On Thu, Sep 10, 2020 at 2:41 PM Luigi Marongiu <[hidden email]> wrote:
> >
> > Update:
> > I also added the confidence interval for the Shannon index:
> > ```
> > #! Hutcheson's t-test for Shannon diversity equality
> > # thanks to Karl Schilling and Rui Barradas
> > hutcheson = function(A, B){
> >   # compute Shannon index, variance and sum of elements
> >   A_index <- Shannon(A)
> >   B_index <- Shannon(B)
> >   A_var <- ShannonVar(A)
> >   B_var <- ShannonVar(B)
> >   A_sum <- sum(A)
> >   B_sum <- sum(B)
> >   # compute Hutcheson
> >   DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
> >   test <- (A_index-B_index) /sqrt(A_var + B_var)
> >   p <- 2*pt(-abs(test),df= DF)
> >   if (p < 0.001) {
> >     P = "<0.001"
> >   } else {
> >     P = round(p, 3)
> >   }
> >   if (p < 0.001) {
> >     S = "***"
> >   } else if (p < 0.01) {
> >     S = "**"
> >   } else if (p < 0.05) {
> >     S = "*"
> >   } else {
> >     S = ""
> >   }
> >   # closure
> >   cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon
> > index first group: \t",
> >       round(A_index, 3), " (", round((A_index-2*A_var),3), "-",
> > round((A_index+2*A_var),3),
> >       ")\n\tShannon index second group: \t",
> >       round(B_index, 3), " (", round((B_index-2*B_var),3), "-",
> > round((B_index+2*B_var),3),
> >       ")\n\tp-value: ", P, " ", S, "\n", sep = "")
> >   return(p)
> > }
> > ```
> >
> > On Thu, Sep 10, 2020 at 2:10 PM Luigi Marongiu <[hidden email]> wrote:
> > >
> > > Hello,
> > > thank you for the code. To explain better, when I used vegan, I did
> > > not count the species directly but simply prepared a dataframe where,
> > > for each species, I counted the number of samples bearing such
> > > species:
> > > ```
> > >
> > > > str(new_df)
> > > 'data.frame': 3 obs. of  46 variables:
> > >  $ NC_001416 Enterobacteria phage lambda   : int  5 4 5
> > >  $ NC_001623 Autographa californica nucl...: int  7 7 7
> > >  $ NC_001895 Enterobacteria phage P2       : int  1 0 0
> > >  $ NC_004745 Yersinia phage L-413C         : int  1 0 0
> > > ```
> > > here the triplettes refer to healthy, tumor and metastasis. The outcome is:
> > > ```
> > > # Shannon index
> > > diversity(new_df)
> > > #> Normal     Tumour     Metastasis
> > > #> 2.520139   3.109512   1.890404
> > > ```
> > > Using iNext, I provided a list of all the species counted in a samples
> > > ```
> > > > new_list
> > > $Healthy
> > >  [1] 5 7 1 1 1 8 1 1 2 1 2 1 1 1 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > > 0 0 0 0 0 0 0 0 0 0 0 0 0
> > >
> > > $Tumour
> > >  [1] 4 7 0 0 0 7 0 0 1 0 1 0 0 0 0 2 0 0 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1
> > > 1 2 1 1 1 1 1 1 1 0 0 0 0
> > >
> > > $Metastasis
> > >  [1] 5 7 0 0 0 9 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > > 0 0 0 0 0 0 1 0 0 1 1 1 1
> > > ```
> > > From this I get:
> > > ```
> > > mod = iNEXT(new_list, q=0, datatype="abundance")
> > > mod$AsyEst
> > > #Site         Diversity Observed Estimator   s.e.    LCL     UCL
> > > #1     Normal  Species richness   18.000    41.368 19.683 23.563 116.155
> > > #2     Normal Shannon diversity   12.430    21.343  5.183 12.430  31.501
> > > #4     Tumour  Species richness   30.000    94.776 42.936 49.848 241.396
> > > #5     Tumour Shannon diversity   22.410    53.135 14.486 24.743  81.526
> > > #7 Metastasis  Species richness   10.000    27.379 22.821 12.443 133.640
> > > #8 Metastasis Shannon diversity    6.622     9.980  3.102  6.622  16.059
> > > ```
> > > So here the Shannon index is 12 instead of 2.5...
> > > Using Karl's function, I get:
> > > ```
> > > # compute Shannon
> > > norm_sIdx <- Shannon(array(as.numeric(unlist(new_list[1]))))
> > > canc_sIdx <- Shannon(array(as.numeric(unlist(new_list[2]))))
> > > meta_sIdx <- Shannon(array(as.numeric(unlist(new_list[3]))))
> > > norm_var <- ShannonVar(array(as.numeric(unlist(new_list[1]))))
> > > canc_var <- ShannonVar(array(as.numeric(unlist(new_list[2]))))
> > > meta_var <- ShannonVar(array(as.numeric(unlist(new_list[3]))))
> > > norm_sum <- sum(array(as.numeric(unlist(new_list[1]))))
> > > canc_sum <- sum(array(as.numeric(unlist(new_list[2]))))
> > > meta_sum <- sum(array(as.numeric(unlist(new_list[3]))))
> > > # compute Hutcheson
> > > degfree <- (norm_var + canc_var)**2 /(norm_var**2/norm_sum +
> > > canc_var**2 /canc_sum)
> > > test <- (norm_sIdx-canc_sIdx) /sqrt(norm_var + canc_var)
> > > (p <- 2*pt(-abs(test),df= degree))
> > > > [1] 0.01825784
> > > ```
> > > remarkably, the indices are the same as obtained by vegan:
> > > ```
> > > > norm_sIdx
> > > [1] 2.520139
> > > > canc_sIdx
> > > [1] 3.109512
> > > > meta_sIdx
> > > [1] 1.890404
> > > ```
> > >
> > > I tried Rui's function but I got an error, so I wrote it as
> > > ```
> > > hutcheson = function(A, B){
> > >   # compute Shannon index, variance and sum of elements
> > >   A_index <- Shannon(A)
> > >   B_index <- Shannon(B)
> > >   A_var <- ShannonVar(A)
> > >   B_var <- ShannonVar(B)
> > >   A_sum <- sum(A)
> > >   B_sum <- sum(B)
> > >   # compute Hutcheson
> > >   DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
> > >   test <- (A_index-B_index) /sqrt(A_var + B_var)
> > >   p <- 2*pt(-abs(test),df= DF)
> > >   # closure
> > >   cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon
> > > index first group: ",
> > >       round(A_index, 4), "\n\tShannon index second group: ", round(B_index, 4),
> > >       "\n\tp-value : ", round(p, 4), "\n", sep = "")
> > >   return(p)
> > > }
> > > ```
> > > and I got:
> > > ```
> > >
> > > > n_t = hutcheson(array(as.numeric(unlist(new_list[1]))), array(as.numeric(unlist(new_list[2]))))
> > > Hutcheson's t-test for Shannon diversity equality
> > > Shannon index first group: 2.5201
> > > Shannon index second group: 3.1095
> > > p-value : 0.0183
> > > > n_m = hutcheson(array(as.numeric(unlist(new_list[1]))), array(as.numeric(unlist(new_list[3]))))
> > > Hutcheson's t-test for Shannon diversity equality
> > > Shannon index first group: 2.5201
> > > Shannon index second group: 1.8904
> > > p-value : 0.0371
> > > > t_m = hutcheson(array(as.numeric(unlist(new_list[2]))), array(as.numeric(unlist(new_list[3]))))
> > > Hutcheson's t-test for Shannon diversity equality
> > > Shannon index first group: 3.1095
> > > Shannon index second group: 1.8904
> > > p-value : 0
> > > ```
> > > new_list[1]|[2]|[3] refer to healthy, tumor and metastasis. applied to
> > > the original Hutcheson data:
> > > ```
> > > > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > > > hutcheson(bird_1956, bird_1957)
> > > Hutcheson's t-test for Shannon diversity equality
> > > Shannon index first group: 1.8429
> > > Shannon index second group: 1.0689
> > > p-value : 0
> > >
> > > ```
> > > This is to compare two groups at the time. I'll probably have to
> > > compensate for multiple testing...
> > > But if this all OK, then the case is closed.
> > > Thank you
> > >
> > > On Thu, Sep 10, 2020 at 1:04 PM Rui Barradas <[hidden email]> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Sorry, there's an instruction missing. See inline.
> > > >
> > > > Às 11:44 de 10/09/20, Rui Barradas escreveu:
> > > > > If you want a function automating Karl's code, here it is. It returns an
> > > > > object of S3 class "htest", R's standard for hypothesis tests functions.
> > > > > The returned object can then be subset in the usual ways, ht$statistic,
> > > > > ht$parameter, ht$p.value, etc.
> > > > >
> > > > >
> > > > > library(QSutils)
> > > > >
> > > > > hutcheson.test <- function(x1, x2){
> > > > >    dataname1 <- deparse(substitute(x1))
> > > > >    dataname2 <- deparse(substitute(x2))
> > > > >    method <- "Hutcheson's t-test for Shannon diversity equality"
> > > > >    alternative <- "the diversities of the two samples are not equal"
> > > > >    h1 <- Shannon(x1)
> > > > >    varh1 <- ShannonVar(x1)
> > > > >    n1 <- sum(x1)
> > > > >    h2 <- Shannon(x2)
> > > > >    varh2 <- ShannonVar(x2)
> > > > >    n2 <- sum(x2)
> > > > >    degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2)
> > > > >    tstat <- (h1 - h2)/sqrt(varh1 + varh2)
> > > > >    p.value <- 2*pt(-abs(tstat), df = degfree)
> > > > >    ht <- list(
> > > > >      statistic = c(t = tstat),
> > > > >      parameter = c(df = degfree),
> > > > >      p.value = p.value,
> > > > >      alternative = alternative,
> > > > >      method = method,
> > > > >      data.name = paste(dataname1, dataname2, sep = ", ")
> > > > >    )
> > > > >    class(ht) <- "htest"
> > > > >    ht
> > > > > }
> > > > >
> > > > > earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > > > > later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> > > > >
> > > > > hutcheson.test(earlier, later)
> > > > >
> > > > >
> > > > >
> > > > > With the data you provided:
> > > > >
> > > > >
> > > > > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > > > > bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> > > > > bird_1959 <- c(0,0,14,59,26,68,0)
> > > > > bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> > > > >
> > > > > hutcheson.test(bird_1956, bird_1957)
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Note that like David said earlier, there might be better ways to
> > > > > interpret Shannon's diversity index. If h is the sample's diversity,
> > > > > exp(h) gives the number of equally-common species with equivalent
> > > > > diversity.
> > > > >
> > > > >
> > > > > s1 <- Shannon(earlier)
> > > > > s2 <- Shannon(later)
> > > > > c(earlier = s1, later = s2)
> > > > > exp(c(earlier = s1, later = s2))   # Both round to 3
> > > > > eq_common <- rep(1, 3)             # Can be 1 specimen or any other number
> > > > > Shannon(eq_common)                 # Slightly greater than the samples'
> > > > > diversity
> > > > >
> > > > >
> > > >
> > > > # Create a list with all the data
> > > > birds <- mget(ls(pattern = "^bird"))
> > > >
> > > > > round(exp(sapply(birds, Shannon))) # Your data
> > > >
> > > >
> > > > Hope this helps,
> > > >
> > > > Rui Barradas
> > > >
> > > > >
> > > > >
> > > > > #-------------------------------------
> > > > >
> > > > >
> > > > > Earlier Karl wrote [1] that
> > > > >
> > > > >
> > > > > Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
> > > > > that explains the minor numerical differences obtained with the code
> > > > > above and the published variances.
> > > > >
> > > > >
> > > > > I don't believe the published variances were computed with the published
> > > > > variance estimator. The code below computes the variances like QSutils
> > > > > and with formula (4) in Hutcheson's paper. The latter does not give the
> > > > > same results.
> > > > >
> > > > > var_est <- function(n){
> > > > >    s <- length(n)
> > > > >    N <- sum(n)
> > > > >    p <- n/N
> > > > >    i <- p != 0
> > > > >    inv.p <- numeric(s)
> > > > >    inv.p[i] <- 1/p[i]
> > > > >    log.p <- numeric(s)
> > > > >    log.p[i] <- log(p[i])
> > > > >    #
> > > > >    term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
> > > > >    term2 <- (s - 1)/(2*N^2)
> > > > >    #
> > > > >    numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p *
> > > > > log.p)
> > > > >    denom3 <- 6*N^3
> > > > >    term3 <- numer3/denom3
> > > > >    list(
> > > > >      Bioc = term1 + term2,
> > > > >      Hutch = term1 + term2 + term3
> > > > >    )
> > > > > }
> > > > >
> > > > > Vh1 <- var_est(earlier)
> > > > > Vh1
> > > > > all.equal(ShannonVar(earlier), Vh1$Bioc)
> > > > > ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31
> > > > >
> > > > > Vh2 <- var_est(later)
> > > > > Vh2
> > > > > identical(ShannonVar(later), Vh2$Bioc)    # TRUE
> > > > >
> > > > >
> > > > >
> > > > > [1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html
> > > > >
> > > > >
> > > > > Hope this helps,
> > > > >
> > > > > Rui Barradas
> > > > >
> > > > >
> > > > > Às 09:38 de 10/09/20, Luigi Marongiu escreveu:
> > > > >> Update:
> > > > >> I can see that you used the function Shannon from the package QSutils.
> > > > >> This would supplement the iNext package I used and solve the problem.
> > > > >> Thank you.
> > > > >>
> > > > >> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
> > > > >> <[hidden email]> wrote:
> > > > >>>
> > > > >>> Thank you very much for the code, that was very helpful.
> > > > >>> I got the article by Hutcheson -- I don't know if I can distribute it
> > > > >>> , given the possible copyrights, or if I can attach it here -- but it
> > > > >>> does not report numbers directly: it refers to a previous article
> > > > >>> counting bird death on a telegraph each year. The numbers
> > > > >>> are:
> > > > >>> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > > >>> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > > > >>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> > > > >>> bird_1959 <- c(0,0,14,59,26,68,0)
> > > > >>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> > > > >>>
> > > > >>> This for sake of the argument.
> > > > >>> As for my problem, I implemented the Shannon index with the package
> > > > >>> iNext, which only gives me the index itself and the 95% CI. Even when
> > > > >>> I implemented it with vegan, I only got the index. Essentially I don't
> > > > >>> have a count of species I could feed into the Hutcheson's. Is there a
> > > > >>> way to extract these data? Or to run a Hutcheson's on the final index?
> > > > >>> Thank you
> > > > >>>
> > > > >>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
> > > > >>> <[hidden email]> wrote:
> > > > >>>>
> > > > >>>> Dear Luigi,
> > > > >>>>
> > > > >>>> below some code I cobbled together based on the Hutcheson paper you
> > > > >>>> mentioned. I was lucky to find code to calculate h and, importantly,
> > > > >>>> its
> > > > >>>> variance in the R-package QSutils - you may find it on the Bioconductor
> > > > >>>> website.
> > > > >>>>
> > > > >>>> here is the code, along with an example. I also attach the code as an
> > > > >>>> R-file.
> > > > >>>>
> > > > >>>> Hope that helps.
> > > > >>>> All my best
> > > > >>>>
> > > > >>>> Karl
> > > > >>>> PS don't forget to adjust for multiple testing if you compare more than
> > > > >>>> two groups.
> > > > >>>> K
> > > > >>>>
> > > > >>>>
> > > > >>>> # load package needed
> > > > >>>> # QSutils is on Bioconductor
> > > > >>>> library(QSutils)
> > > > >>>>
> > > > >>>> # here some exemplary data - these are the data from Pilou 1966 that
> > > > >>>> are
> > > > >>>> used
> > > > >>>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970)
> > > > >>>>
> > > > >>>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > > > >>>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> > > > >>>> # numbers of the first example used by Hutcheson were unfortunately not
> > > > >>>> # available to me
> > > > >>>>
> > > > >>>> # here starts the code ; you may replace the variables "earlier" and
> > > > >>>> "later"
> > > > >>>> # by your own numbers.
> > > > >>>>
> > > > >>>> # calculate h, var(h) etc
> > > > >>>> h1 <- Shannon(earlier)
> > > > >>>> varh1 <- ShannonVar(earlier)
> > > > >>>> n1 <- sum (earlier)
> > > > >>>> h2 <- Shannon(later)
> > > > >>>> varh2 <- ShannonVar(later)
> > > > >>>> n2 <- sum (later)
> > > > >>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)
> > > > >>>>
> > > > >>>> # compare numbers with those in the paper
> > > > >>>> h1
> > > > >>>> h2
> > > > >>>> varh1
> > > > >>>> varh2
> > > > >>>> # I assume that minor numerical differences are due to differences
> > > > >>>> in the
> > > > >>>> # numerical precision of computers in the early seventies and today
> > > > >>>> / KS
> > > > >>>>
> > > > >>>> # this is the actual t-test
> > > > >>>> t <- (h1-h2) /sqrt(varh1 + varh2)
> > > > >>>> p <- 2*pt(-abs(t),df= degfree)
> > > > >>>> p
> > > > >>>>
> > > > >>>> # that's it
> > > > >>>> # Best
> > > > >>>> # Karl
> > > > >>>> --
> > > > >>>> Karl Schilling, MD
> > > > >>>> Professor of Anatomy and Cell Biology
> > > > >>>> Anatomisches Institut
> > > > >>>> Rheinische Friedrich-Wilhelms-Universität
> > > > >>>> Nussallee 10
> > > > >>>>
> > > > >>>> D-53115 Bonn
> > > > >>>> Germany
> > > > >>>>
> > > > >>>> phone ++49-228-73-2602
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>> Best regards,
> > > > >>> Luigi
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > > > ______________________________________________
> > > > > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide
> > > > > http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Luigi
> >
> >
> >
> > --
> > Best regards,
> > Luigi
>
>
>
> --
> Best regards,
> Luigi



--
Best regards,
Luigi

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Some code to run Hutcheson t-test using R

karl.schilling
Dear Luigi:

no, µ1µ2 is not "missing"

First, it should actually be "µ1 - µ2".

And as your usual null-hypothesis when comparing h1 and h2 is that they
are not different (i.e. µ1 = µ2), the latter term adds up to 0 and may
be omitted.

Karl Schilling

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Some code to run Hutcheson t-test using R

Luigi
Thank you for the clarification.

On Fri, Sep 11, 2020 at 2:36 PM Karl Schilling
<[hidden email]> wrote:

>
> Dear Luigi:
>
> no, µ1µ2 is not "missing"
>
> First, it should actually be "µ1 - µ2".
>
> And as your usual null-hypothesis when comparing h1 and h2 is that they
> are not different (i.e. µ1 = µ2), the latter term adds up to 0 and may
> be omitted.
>
> Karl Schilling



--
Best regards,
Luigi

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.