glmer with non integer weights

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

glmer with non integer weights

Kay Cichini
hello,

i'd appreciate help with my glmer.
i have a dependent which is an index (MH.index) ranging from 0-1. this index can also be considered as a propability. as i have a fixed factor (stage) and a nested random factor (site) i tried to model with glmer. i read that it's possible to use a quasibinomial distribution, for this kind of data, which i than actually did - but firstly

(1) i'm not quite sure if that's appropiate for my data, secondly
(2) i wondered if the model can be correct when variance of then main and nested factor are zero.
(3) also i could not yield p-values for that model.

here's data, call and output:

##########################################################
#call:
##########################################################

glmer(MH~stage+(1|stage/site),family=quasibinomial)

##########################################################
#output:
##########################################################
#Generalized linear mixed model fit by the Laplace approximation
#Formula: MH ~ stage + (1 | stage/site)
#  AIC   BIC logLik deviance
# 66.03 86.47 -26.01    52.03
#Random effects:
# Groups     Name        Variance Std.Dev.
# site:stage (Intercept) 0.000000 0.000  
# stage      (Intercept) 0.000000 0.000  
# Residual               0.076175 0.276  
# Number of obs: 137, groups: site:stage, 39; stage, 4

#Fixed effects:
#            Estimate Std. Error t value
#(Intercept)  0.39205    0.09009   4.352
#stageB      -0.87214    0.12498  -6.978
#stageC      -0.36153    0.12202  -2.963
#stageD      -0.09884    0.19811  -0.499

#Correlation of Fixed Effects:
#       (Intr) stageB stageC
#stageB -0.721              
#stageC -0.738  0.532      
#stageD -0.455  0.328  0.336
##########################################################
#my data:
##########################################################
similarity<-data.frame(list(structure(list(stage = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"), site = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 6L, 6L,
6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L,
10L, 10L, 11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L,
14L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 18L,
18L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L,
22L, 22L, 22L, 22L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 25L, 25L,
25L, 25L, 26L, 26L, 26L, 26L, 27L, 27L, 27L, 27L, 28L, 28L, 28L,
28L, 29L, 29L, 29L, 30L, 30L, 30L, 30L, 31L, 31L, 32L, 32L, 32L,
32L, 33L, 33L, 33L, 33L, 34L, 34L, 34L, 34L, 35L, 35L, 35L, 35L,
36L, 36L, 36L, 36L, 37L, 37L, 38L, 38L, 38L, 38L, 39L, 39L, 39L
), .Label = c("A11", "A12", "A14", "A15", "A16", "A17", "A18",
"A19", "A20", "A5", "A7", "A8", "B1", "B12", "B13", "B14", "B15",
"B17", "B18", "B2", "B4", "B7", "B8", "B9", "C1", "C10", "C11",
"C15", "C17", "C18", "C19", "C2", "C20", "C3", "C4", "C6", "D1",
"D4", "D7"), class = "factor"), MH.Index = c(0.392156863, 0.602434077,
0.576923077, 0.647482014, 0.989010989, 0.857142857, 1, 1, 1,
0, 1, 0.378378378, 0.839087948, 0.252915554, 1, 0.22556391, 0.510366826,
0.476190476, 0.555819477, 0.961538462, 0.666666667, 0.089285714,
0.923076923, 0.571428571, 0, 0.923076923, 0.617647059, 0.599423631,
0, 0.727272727, 0.998112812, 0, 0, 0, 1, 0.565656566, 0.75, 0.923076923,
0.654545455, 0.14084507, 0.617647059, 0.315789474, 0.179347826,
0.583468021, 0.165525114, 0.817438692, 0.455551457, 0.49548886,
0.556127703, 0.707431246, 0.506757551, 0.689655172, 0.241433511,
0.379232506, 0.241935484, 0, 0.30848329, 0.530973451, 0.148148148,
0, 0.976744186, 0.550218341, 0.542168675, 0.769230769, 0.153310105,
0, 0, 0.380569406, 0.742174733, 0.222222222, 0.046925432, 0,
0.068076328, 0.772727273, 0.830039526, 0.503458415, 0.863910822,
0.39401263, 0.081818182, 0.368421053, 0.088607595, 0, 0.575499851,
0.605657238, 0.714854232, 0.855881172, 0.815689401, 0.552207228,
0.81708081, 0.583228133, 0.334466349, 0.259477365, 0.194711538,
0.278916707, 0.636304805, 0.593715432, 0.661016949, 0.626865672,
0.420219245, 0.453535143, 0.471243706, 0.462427746, 0.56980057,
0.453821155, 0.052828527, 0.926829268, 0.51988266, 0.472200264,
0.351219512, 0.290030211, 0.765258974, 0.564894108, 0.789699571,
0.863378215, 0.525181559, 0.803061458, 0.260164645, 0.477265792,
0.265889379, 0.317791411, 0.107623318, 0.279181709, 0.471953363,
0.463724265, 0.241966696, 0.403647213, 0.693087992, 0.494259925,
0.68904453, 0.39329147, 0.498161213, 0.376225983, 0.407001046,
0.825016633, 0.718991658, 0.662995912)), .Names = c("stage",
"site", "MH.Index"), class = "data.frame", row.names = c(NA,
-136L))))
##########################################################  
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Thierry Onkelinx
Dear Kay,

There is a R list about mixed models. Which is a better place for your
questions.

The (quasi)binomial family is used with binary data or a ratio that
originates from binary data. In case of a ratio you need to provide the
number of trials through the weights argument.

Further I would suggest to drop stage from either the random effects or
the fixed effects. You are trying to estimate the same effect twice in a
model.

HTH,

Thierry

------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
[hidden email]
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
 

> -----Oorspronkelijk bericht-----
> Van: [hidden email]
> [mailto:[hidden email]] Namens Kay Cichini
> Verzonden: maandag 12 april 2010 16:12
> Aan: [hidden email]
> Onderwerp: [R] glmer with non integer weights
>
>
> hello,
>
> i'd appreciate help with my glmer.
> i have a dependent which is an index (MH.index) ranging from
> 0-1. this index can also be considered as a propability. as i
> have a fixed factor (stage) and a nested random factor (site)
> i tried to model with glmer. i read that it's possible to use
> a quasibinomial distribution, for this kind of data, which i
> than actually did - but firstly
>
> (1) i'm not quite sure if that's appropiate for my data, secondly
> (2) i wondered if the model can be correct when variance of
> then main and nested factor are zero.
> (3) also i could not yield p-values for that model.
>
> here's data, call and output:
>
> ##########################################################
> #call:
> ##########################################################
>
> glmer(MH~stage+(1|stage/site),family=quasibinomial)
>
> ##########################################################
> #output:
> ##########################################################
> #Generalized linear mixed model fit by the Laplace approximation
> #Formula: MH ~ stage + (1 | stage/site)
> #  AIC   BIC logLik deviance
> # 66.03 86.47 -26.01    52.03
> #Random effects:
> # Groups     Name        Variance Std.Dev.
> # site:stage (Intercept) 0.000000 0.000  
> # stage      (Intercept) 0.000000 0.000  
> # Residual               0.076175 0.276  
> # Number of obs: 137, groups: site:stage, 39; stage, 4
>
> #Fixed effects:
> #            Estimate Std. Error t value
> #(Intercept)  0.39205    0.09009   4.352
> #stageB      -0.87214    0.12498  -6.978
> #stageC      -0.36153    0.12202  -2.963
> #stageD      -0.09884    0.19811  -0.499
>
> #Correlation of Fixed Effects:
> #       (Intr) stageB stageC
> #stageB -0.721              
> #stageC -0.738  0.532      
> #stageD -0.455  0.328  0.336
> ##########################################################
> #my data:
> ##########################################################
> similarity<-data.frame(list(structure(list(stage =
> structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L, 4L), .Label = c("A", "B", "C", "D"), class =
> "factor"), site = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L,
> 3L, 3L, 4L, 4L, 4L, 4L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 8L,
> 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 11L, 11L,
> 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 15L, 15L,
> 15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 18L, 18L, 19L,
> 19L, 19L, 19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L, 22L,
> 22L, 22L, 22L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 25L, 25L,
> 25L, 25L, 26L, 26L, 26L, 26L, 27L, 27L, 27L, 27L, 28L, 28L,
> 28L, 28L, 29L, 29L, 29L, 30L, 30L, 30L, 30L, 31L, 31L, 32L,
> 32L, 32L, 32L, 33L, 33L, 33L, 33L, 34L, 34L, 34L, 34L, 35L,
> 35L, 35L, 35L, 36L, 36L, 36L, 36L, 37L, 37L, 38L, 38L, 38L,
> 38L, 39L, 39L, 39L ), .Label = c("A11", "A12", "A14", "A15",
> "A16", "A17", "A18", "A19", "A20", "A5", "A7", "A8", "B1",
> "B12", "B13", "B14", "B15", "B17", "B18", "B2", "B4", "B7",
> "B8", "B9", "C1", "C10", "C11", "C15", "C17", "C18", "C19",
> "C2", "C20", "C3", "C4", "C6", "D1", "D4", "D7"), class =
> "factor"), MH.Index = c(0.392156863, 0.602434077,
> 0.576923077, 0.647482014, 0.989010989, 0.857142857, 1, 1, 1,
> 0, 1, 0.378378378, 0.839087948, 0.252915554, 1, 0.22556391,
> 0.510366826, 0.476190476, 0.555819477, 0.961538462,
> 0.666666667, 0.089285714, 0.923076923, 0.571428571, 0,
> 0.923076923, 0.617647059, 0.599423631, 0, 0.727272727,
> 0.998112812, 0, 0, 0, 1, 0.565656566, 0.75, 0.923076923,
> 0.654545455, 0.14084507, 0.617647059, 0.315789474,
> 0.179347826, 0.583468021, 0.165525114, 0.817438692,
> 0.455551457, 0.49548886, 0.556127703, 0.707431246,
> 0.506757551, 0.689655172, 0.241433511, 0.379232506,
> 0.241935484, 0, 0.30848329, 0.530973451, 0.148148148, 0,
> 0.976744186, 0.550218341, 0.542168675, 0.769230769,
> 0.153310105, 0, 0, 0.380569406, 0.742174733, 0.222222222,
> 0.046925432, 0, 0.068076328, 0.772727273, 0.830039526,
> 0.503458415, 0.863910822, 0.39401263, 0.081818182,
> 0.368421053, 0.088607595, 0, 0.575499851, 0.605657238,
> 0.714854232, 0.855881172, 0.815689401, 0.552207228,
> 0.81708081, 0.583228133, 0.334466349, 0.259477365,
> 0.194711538, 0.278916707, 0.636304805, 0.593715432,
> 0.661016949, 0.626865672, 0.420219245, 0.453535143,
> 0.471243706, 0.462427746, 0.56980057, 0.453821155,
> 0.052828527, 0.926829268, 0.51988266, 0.472200264,
> 0.351219512, 0.290030211, 0.765258974, 0.564894108,
> 0.789699571, 0.863378215, 0.525181559, 0.803061458,
> 0.260164645, 0.477265792, 0.265889379, 0.317791411,
> 0.107623318, 0.279181709, 0.471953363, 0.463724265,
> 0.241966696, 0.403647213, 0.693087992, 0.494259925,
> 0.68904453, 0.39329147, 0.498161213, 0.376225983,
> 0.407001046, 0.825016633, 0.718991658, 0.662995912)), .Names
> = c("stage", "site", "MH.Index"), class = "data.frame",
> row.names = c(NA,
> -136L))))
> ##########################################################
> --
> View this message in context:
> http://n4.nabble.com/glmer-with-non-integer-weights-tp1837179p
> 1837179.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message
and any annex are purely those of the writer and may not be regarded as stating
an official position of INBO, as long as the message is not confirmed by a duly
signed document.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Kay Cichini
In reply to this post by Kay Cichini
thanks thierry,

my problem is that the index is a propability which is not derived from incidents per nr. of observations, thus i don't have those numbers but only the plain index, which i want to test.

greatings,
kay
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Thierry Onkelinx
So your respons variable behaves like a continuous variable except that
is range is limited to the 0-1 interval. In such a case I would
transform the respons variable (e.g. logit, sqrt(arcsin())) and use a
gaussian model.

HTH,

Thierry

------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
[hidden email]
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
 

> -----Oorspronkelijk bericht-----
> Van: [hidden email]
> [mailto:[hidden email]] Namens Kay Cichini
> Verzonden: dinsdag 13 april 2010 13:37
> Aan: [hidden email]
> Onderwerp: Re: [R] glmer with non integer weights
>
>
> thanks thierry,
>
> my problem is that the index is a propability which is not
> derived from incidents per nr. of observations, thus i don't
> have those numbers but only the plain index, which i want to test.
>
> greatings,
> kay
> --
> View this message in context:
> http://n4.nabble.com/glmer-with-non-integer-weights-tp1837179p
> 1838268.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message
and any annex are purely those of the writer and may not be regarded as stating
an official position of INBO, as long as the message is not confirmed by a duly
signed document.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Thomas Lumley
On Tue, 13 Apr 2010, ONKELINX, Thierry wrote:

> So your respons variable behaves like a continuous variable except that
> is range is limited to the 0-1 interval. In such a case I would
> transform the respons variable (e.g. logit, sqrt(arcsin())) and use a
> gaussian model.

A logit-Normal has variance roughly mu^2(1-mu)^2 and a quasibinomial logistic uses mu(1-mu),  with the parameters having the same interpretations. The question of which variance function best approximates the data should really be an empirical one, not an a prioiri one.  There is an example of exactly this in the quasilikelihood chapter in McCullagh and Nelder, where the observations are the proportion of damage on a set of leaves.

   -thomas

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Kay Cichini
In reply to this post by Thierry Onkelinx
thanks thierry,

i considered this transformations already, but variance is not stabilized and/or normality is neither achieved.
i guess i'll have to look out for non-parametrics?

best regards,
kay
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Kay Cichini
In reply to this post by Thomas Lumley
thank you thomas for the helpful hint!

yours,
kay
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Emmanuel Charpentier
In reply to this post by Kay Cichini
Le vendredi 16 avril 2010 à 00:15 -0800, Kay Cichini a écrit :
> thanks thierry,
>
> i considered this transformations already, but variance is not stabilized
> and/or normality is neither achieved.
> i guess i'll have to look out for non-parametrics?

Or (maybe) a model based on a non-Gaussian likelihood ? A beta
distribution comes to mind, either fitted by maximum likelihood or (if
relevant prior information is available) in a Bayesian framework ?

But beware : you have a not-so-small problem ...

Your data have zeroes and ones, which, if you have no information on a
"sample size", are "sharp" zeroes and ones, and there therefore
theoretically bound to infinite linear predictors (in plain English :
bloody unlikely). These values make a "fixed effect" analysis
impossible : these points "at infinite" will make regression essentially
impossible. Consider :

> logit<-function(x)log(x/(1-x))
> ilogit<-function(x)1/(1+exp(-x))
> ilogit(coef(lm(logit(MH.Index)~0+stage,data=similarity)))
Erreur dans lm.fit(x, y, offset = offset, singular.ok =
singular.ok, ...) :
  NA/NaN/Inf dans un appel à une fonction externe (argument 4)

You need to have some information on the precision of your zeroes and
ones nd use it to get some "possible" values of MH.Index

One might be tempted to "unround" them by a small amount (representing a
"reasonable guess" on your precision :
> epsilon=0.01
> ilogit(coef(lm(logit(MH.Index)~0
+stage,data=within(similarity,{MH.Index<-pmax(epsilon,pmin(1-epsilon,MH.Index))}))))
   stageA    stageB    stageC    stageD
0.6490997 0.2914323 0.5087639 0.5721789

BUT :

> epsilon=0.000001
> ilogit(coef(lm(logit(MH.Index)~0
+stage,data=within(similarity,{MH.Index<-pmax(epsilon,pmin(1-epsilon,MH.Index))}))))
   stageA    stageB    stageC    stageD
0.6588222 0.1020177 0.5087639 0.5721789

The estimation for stageB depends critically of the "unrounding" amount
chosen.

I tried a small fixed effect logit model in JAGS : it won't initialize
with the original data (O and 1 are effectively impossible with possible
values of the beta coefficients), and seems to exhibit, the same kind of
sensitivity to "unrounding" amount that the linear model :

LogitModFix<-local({
  Modele<-function(){
    for(k in 1:nobs) {
      ## logit(MH.Index[k])<-lpi.i[k]
      lpi.i[k]~dnorm(lpi[stage[k]], tau.lpi[stage[k]])
    }
    for(i in 1:nstage) {
      lpi[i]~dnorm(0,1.0E-6)
      tau.lpi[i]<-pow(sigma.lpi[i],-2)
      sigma.lpi[i]~dunif(0,100)
      pi[i]<-ilogit(lpi[i])
    }
  }
  Data<-function() {
    for (i in 1:nobs) {
      lpi.i[i]<-logit(MH.Index[i])
    }
  }
  tmpf<-tempfile()
  ## write.model has been shoplifted from R2WinBUGS and adapted to JAGS
  ## by allowing a "data" argument for transformations
  write.model(Modele,tmpf, data=Data)
  epsilon<-0.00001
  Modele.jags<-jags.model(tmpf,
                          data=with(similarity,

list(MH.Index=pmax(epsilon,pmin(1-epsilon,MH.Index)),
                                 ## MH.Index=MH.Index,
                                 stage=stage,
                                 nobs=nrow(similarity),
                                 nstage=nlevels(stage))),
                          n.chains=3)
  unlink(tmpf)
  Modele.coda<-coda.samples(Modele.jags,
                            variable.names=c("deviance", "pi",
"sigma.lpi"),
                            n.iter=1000)
  list(Modele.jags, Modele.coda)
})

## Convergence (not shown) is quite acceptable

> summary(LogitModFix[[2]])

Iterations = 1001:2000
Thinning interval = 1
Number of chains = 3
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                 Mean      SD  Naive SE Time-series SE
deviance     660.5656 4.71697 0.0861196      0.1672079
pi[1]          0.6304 0.21666 0.0039557      0.0040347
pi[2]          0.1506 0.08036 0.0014671      0.0014460
pi[3]          0.5082 0.03956 0.0007222      0.0006777
pi[4]          0.5696 0.07473 0.0013644      0.0012952
sigma.lpi[1]   6.9409 0.86442 0.0157821      0.0287626
sigma.lpi[2]   4.2775 0.49281 0.0089974      0.0132247
sigma.lpi[3]   1.0875 0.11506 0.0021006      0.0026048
sigma.lpi[4]   0.8801 0.29371 0.0053623      0.0118630

2. Quantiles for each variable:

                 2.5%       25%      50%      75%    97.5%
deviance     654.0216 657.13069 659.6788 662.9535 672.2101
pi[1]          0.1695   0.47600   0.6681   0.8107   0.9485
pi[2]          0.0413   0.09206   0.1345   0.1914   0.3564
pi[3]          0.4320   0.48079   0.5078   0.5366   0.5851
pi[4]          0.4154   0.52518   0.5722   0.6168   0.7135
sigma.lpi[1]   5.5553   6.33366   6.8287   7.4412   8.8207
sigma.lpi[2]   3.5082   3.93440   4.2274   4.5598   5.3497
sigma.lpi[3]   0.8833   1.00397   1.0805   1.1565   1.3412
sigma.lpi[4]   0.5304   0.67740   0.8189   1.0067   1.5987

The same model re-fitted with epsilon=0.01 gives :

> summary(LogitModFix[[2]])

Iterations = 1001:2000
Thinning interval = 1
Number of chains = 3
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                 Mean      SD  Naive SE Time-series SE
deviance     534.2639 4.42164 0.0807278       0.122396
pi[1]          0.6396 0.10877 0.0019859       0.001892
pi[2]          0.2964 0.06592 0.0012036       0.001163
pi[3]          0.5083 0.04120 0.0007522       0.000805
pi[4]          0.5702 0.07170 0.0013091       0.001221
sigma.lpi[1]   3.0387 0.36356 0.0066376       0.008803
sigma.lpi[2]   2.0519 0.22665 0.0041381       0.005358
sigma.lpi[3]   1.0928 0.12531 0.0022879       0.003305
sigma.lpi[4]   0.8699 0.26650 0.0048656       0.010146

2. Quantiles for each variable:

                 2.5%      25%      50%      75%    97.5%
deviance     527.9142 530.9711 533.4347 536.6295 544.7738
pi[1]          0.4130   0.5683   0.6429   0.7195   0.8251
pi[2]          0.1826   0.2511   0.2909   0.3373   0.4390
pi[3]          0.4266   0.4809   0.5093   0.5359   0.5892
pi[4]          0.4183   0.5254   0.5701   0.6150   0.7106
sigma.lpi[1]   2.4362   2.7772   2.9965   3.2600   3.8395
sigma.lpi[2]   1.6606   1.8935   2.0320   2.1811   2.5598
sigma.lpi[3]   0.8803   1.0035   1.0754   1.1711   1.3670
sigma.lpi[4]   0.5022   0.6901   0.8198   0.9906   1.5235

One should also note that the deviance is *extremely* sensitive to the
choice of epsilon, whic tends to indicate that, at small epsilon values,
the "sharp" "impossible" values dominate the evaluation of the
oefficients they are involved in.

Beta models (not shown) give similar results, to a lesser extend.

In short, your "sharp" data are essentially incompatible with your
model, which can't be fitted unless you get at least a rough idea of
their precision.

HTH,

                                Emmanuel Charpentier, DDS, MSc

> best regards,
> kay

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Emmanuel Charpentier
In reply to this post by Kay Cichini
Addendum to my previous answer :

In that special case, the limited range of the asin(sqrt())
transformation, which is a shortcoming, turns out to be useful. The
fixed-effect doefficients seem semi-reasonable (except for stageB) :

> (sin(coef(lm(asin(sqrt(MH.Index))~0+stage, data=similarity))))^2
   stageA    stageB    stageC    stageD
0.6164870 0.3389430 0.5083574 0.5672021

The random effects being nested in the fixed efect, one can't afford to
be lazy in the parametrization of the corresponding random effect :

> summary(lmer(asin(sqrt(MH.Index))~stage+(stage|site),
data=similarity))
Linear mixed model fit by REML
Formula: asin(sqrt(MH.Index)) ~ stage + (stage | site)
   Data: similarity
   AIC BIC logLik deviance REMLdev
 155.3 199 -62.65    111.8   125.3
Random effects:
 Groups   Name        Variance Std.Dev. Corr                
 site     (Intercept) 0.043579 0.20876                      
          stageB      0.033423 0.18282  -0.999              
          stageC      0.043580 0.20876  -1.000  0.999        
          stageD      0.043575 0.20875  -1.000  0.999  1.000
 Residual             0.128403 0.35833                      
Number of obs: 136, groups: site, 39

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.93036    0.08431  11.035
stageB      -0.30879    0.10079  -3.064
stageC      -0.13660    0.09981  -1.369
stageD      -0.07755    0.14620  -0.530

Correlation of Fixed Effects:
       (Intr) stageB stageC
stageB -0.836              
stageC -0.845  0.707      
stageD -0.577  0.482  0.487
> v<-fixef(lmer(asin(sqrt(MH.Index))~stage+(stage|site),
data=similarity))
> v[2:4]<-v[1]+v[2:4]
> names(v)[1]<-"stageA"
> (sin(v))^2
   stageA    stageB    stageC    stageD
0.6429384 0.3390903 0.5083574 0.5672021

But again, we're exploiting a shortcoming of the asin(sqrt())
transformation.

HTH,

                                        Emmanuel Charpentier

Le vendredi 16 avril 2010 à 00:15 -0800, Kay Cichini a écrit :
> thanks thierry,
>
> i considered this transformations already, but variance is not stabilized
> and/or normality is neither achieved.
> i guess i'll have to look out for non-parametrics?
>
> best regards,
> kay

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Kay Cichini
hi emmanuel,

thanks a lot for your extensive answer.
do you think using the asin(sqrt()) transf. can be justified for publishing prurpose or do i have to expect criticism.

naivly i excluded that possibility, because of violated anova-assumptions, but if i did get you right the finite range rather posses a problem here.

why is it in this special case an advantage?

greetings,
kay
Reply | Threaded
Open this post in threaded view
|

logit() etc {was "Re: glmer with non integer weights"}

Martin Maechler
In reply to this post by Emmanuel Charpentier
>>>>> "EC" == Emmanuel Charpentier <[hidden email]>
>>>>>     on Sun, 18 Apr 2010 11:29:29 +0200 writes:

    EC> Le vendredi 16 avril 2010 à 00:15 -0800, Kay Cichini a
    EC> écrit :
    >> thanks thierry,
    >>
    >> i considered this transformations already, but variance
    >> is not stabilized and/or normality is neither achieved.
    >> i guess i'll have to look out for non-parametrics?

    EC> Or (maybe) a model based on a non-Gaussian likelihood ?
    EC> A beta distribution comes to mind, either fitted by
    EC> maximum likelihood or (if relevant prior information is
    EC> available) in a Bayesian framework ?

    EC> But beware : you have a not-so-small problem ...

    EC> Your data have zeroes and ones, which, if you have no
    EC> information on a "sample size", are "sharp" zeroes and
    EC> ones, and there therefore theoretically bound to
    EC> infinite linear predictors (in plain English : bloody
    EC> unlikely). These values make a "fixed effect" analysis
    EC> impossible : these points "at infinite" will make
    EC> regression essentially impossible. Consider :

    >> logit<-function(x)log(x/(1-x))
    >> ilogit<-function(x)1/(1+exp(-x))

Hmmm,  and some CRAN packages even define these ..

Now, please,  the help page   ?Logistic
has contained for a long time now

 >> Note:
 >>
 >>      ‘qlogis(p)’ is the same as the well known ‘_logit_’ function,
 >>      logit(p) = log(p/(1-p), and ‘plogis(x)’ has consequently been
 >>      called the ‘inverse logit’.

So please "note", and do use qlogis() and plogis() instead of
logit() and ilogit() ...
or if you really really must (e.g. for didactical reasons), use

logit <- qlogis

Using the logistic functions directly may also remind you or
your user that sometimes it will be advantageous to use
'log.p=TRUE' or 'lower.tail=FALSE'  ``coordinate systems"

Martin


  [...................................]
  [...................................]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Emmanuel Charpentier
In reply to this post by Kay Cichini
Le lundi 19 avril 2010 à 03:00 -0800, Kay Cichini a écrit :
> hi emmanuel,
>
> thanks a lot for your extensive answer.
> do you think using the asin(sqrt()) transf. can be justified for publishing
> prurpose or do i have to expect criticism.

Hmmm ... depends of your reviewers. But if an half-asleep dental surgeon
caught that after an insomnia, you might expect that a fully caffeinated
reviewer will. Add Murphy's law to the mix and ... boom !

> naivly i excluded that possibility, because of violated anova-assumptions,
> but if i did get you right the finite range rather posses a problem here.

No. your problem is that you model a probability as a smooth (linear)
finite function of finite variables. Under those assumptions, you can't
get a *certitude* (probability 0 or 1). Your model is *intrinsically*
inconsistent with your data.

In other word, I'm unable to believe both your model (linear
whathyoumaycallit regression) and your data (wich include certainties)
*simultaneously*.

I'd reconsider your 0 or 1, as meaning *censored* quantities (i. e. no
farther than some epsilon from 0 or 1), with *hard* data (i. e. not a
cooked-up estimate such as the ones i used) to estimate epsilon. There
are *lots* of ways to fit models with censored dependent variables.

> why is it in this special case an advantage?

It's bloody hell *not* a specific advantage : if you want to fit a
linear model to a a probability, you *need* some function mapping R to
the open ]0 1[ (i. e. all reals strictly superior to 0 and strictly
inferior to 1 ; I thing that's denoted (0 1) in English/American usage).
Asin(sqrt()) does that.

However, (asin(sqrt()))^-1 has a big problem (mapping back [0 1] i. e.
*including* 0 and 1, *not* (0 1), to R) which *hides* the (IMHO bigger)
problem of the inadequacy of your model to your data ! In other words,
it lets you shoot yourself in the foot after a nice sciatic nerve
articaïne block making the operation painless (but still harmful). On
the other hand, logit (or, as pointed by Martin Maechler, qlogis), is
kind enough to choke on this (i. e. returning back Inf values, which
will make the regression program choke).

So please quench my thirst : what exactly is MH.Index supposed to be ?
How is it measured, estimated, guessed or divined ?

HTH,

Emmanuel Charpentier

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Kay Cichini
hello,

it's the Morisita Horn Index, which is an ecological index for similarity between two multivariate objects (vegetation samples with species and its abundance) where a value of one indicates completely same relative importance of species in both samples and 0 denotes total absence of any same species.

it can be expressed as a probability:

(prob. that an individual drawn from sample j
and one from sample k belong to the same species)
-------------------------------------------------   = MH-INdex
(prob. that two ind. drawn from either sample will
belong to same species)

it is also covered in
library(vegan);?vegdist
here it is its complement: 1-MH, which then is a dissimilarity measure

best regards,
kay
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Emmanuel Charpentier
Sorry for this late answer (I've had a seriously nonmaskable interrupt).

Since I have technical questions not related to R, I take the liberty to
follow this up by e-mail.

I might post a followup summary if  another R problem arises...

                                        Emmanuel Charpentier

Le lundi 19 avril 2010 à 23:40 -0800, Kay Cichini a écrit :

> hello,
>
> it's the Morisita Horn Index, which is an ecological index for similarity
> between two multivariate objects (vegetation samples with species and its
> abundance) where a value of one indicates completely same relative
> importance of species in both samples and 0 denotes total absence of any
> same species.
>
> it can be expressed as a probability:
>
> (prob. that an individual drawn from sample j
> and one from sample k belong to the same species)
> -------------------------------------------------   = MH-INdex
> (prob. that two ind. drawn from either sample will
> belong to same species)

[ Technical rambling ]

Hmmm ... that's the *ratio* of two probabilities, not a probability.
According to <a href="http://www.tnstate.edu/ganter/B412%20Ch%2015&16%">http://www.tnstate.edu/ganter/B412%20Ch%2015&16%
20CommMetric.html, that I found in the first page of answers to a simple
google query, in can also be thought of the ratio of two "distances"
between sites : (maximal distnce - actual distance) / maximal distance
(with a massive (over-?) simplification). There is no reason to think a
priori that the logit transformation (or the asin(sqrt()) transformation
has better properties for this index than any other mapping from [0 1]
to R.

(BTW, a better mapping might be from [0 1] to [0 Inf] or conversely, but
"negative" distances have no (obvious) meaning. Here asin(sqrt()) might
make more sense that qlogis().)

[ End of technical rambling ]

But I have trouble understanding how a "similarity" or "distance" index
can characterize *one* site... Your data clearly associate a MH.Index to
each site : what distance or similarity do you measure at this site ?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: glmer with non integer weights

Kay Cichini
hello,

krebs (1995) states MH as prob., but yes it's rather a ratio of probs.
at each site i had 4 blocks with 2 treatments (treat vs. control) - after treating i looked for similarity between each of those pairs.

it is of interest if changes in similarity due to treatment differ between stages.

hope this clarifies the thing a bit.

greetings,
kay