Weights in binomial glm

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Weights in binomial glm

Jan van der Laan
I have some questions about the use of weights in binomial glm as I am
not getting the results I would expect. In my case the weights I have
can be seen as 'replicate weights'; one respondent i in my dataset
corresponds to w[i] persons in the population. From the documentation
of the glm method, I understand that the weights can indeed be used
for this: "For a binomial GLM prior weights are used to give the
number of trials when the response is the proportion of successes."
>From "Modern applied statistics with S-Plus 3rd ed." I understand the
same.

However, I am getting some strange results. I generated an example:

Generate some data which is simular to my dataset
> Z <- rbinom(1000, 1, 0.1)
> W <- round(rnorm(1000, 100, 40))
> W[W < 1] <- 1

Probability of success can either be estimated using:
> sum(Z*W)/sum(W)
[1] 0.09642109

Or using glm:
> model <- glm(Z ~ 1, weights=W, family=binomial())
Warning message:
In glm.fit(x = X, y = Y, weights = weights, start = start, etastart =
etastart,  :
  fitted probabilities numerically 0 or 1 occurred
> predict(model, type="response")[1]
           1
2.220446e-16

These two results are obviously not the same. The strange thing is
that when I scale the weights, such that the total equals one, the
probability is correctly estimated:

> model <- glm(Z ~ 1, weights=W/sum(W), family=binomial())
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
> predict(model, type="response")[1]
         1
0.09642109


However scaling of the weights should, as far as I am aware, not have
an effect on the estimated parameters. I also tried some other
scalings. And, for example scaling the weights by 20 also gives me the
correct result.

> model <- glm(Z ~ 1, weights=W/20, family=binomial())
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
> predict(model, type="response")[1]
         1
0.09642109


Am I misinterpreting the weights? Could this be a numerical problem?

Regards,

Jan

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Weights in binomial glm

Thierry Onkelinx
Jan,

It looks like you did not understand the line "For a binomial GLM prior
weights are used to give the number of trials when the response is the
proportion of successes."

Weights must be a number of trials (hence integer). Not a proportion of
a population. Here is an example that clarifies the use of weights.

library(boot)
library(reshape)
dataset <- data.frame(Person = c(rep("A", 20), rep("B", 10)), Success =
c(rbinom(20, 1, 0.25), rbinom(10, 1, 0.75)))
Aggregated <- cast(Person ~ ., data = dataset, value = "Success", fun =
list(mean, length))

m0 <- glm(Success ~ 1, data = dataset, family = binomial)
m1 <- glm(mean ~ 1, data = Aggregated, family = binomial, weights =
length)

inv.logit(coef(m0))
inv.logit(coef(m1))

Have a look at the survey package is you want to analyse stratified
data.

Thierry

------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
[hidden email]
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
 

> -----Oorspronkelijk bericht-----
> Van: [hidden email]
> [mailto:[hidden email]] Namens Jan van der Laan
> Verzonden: vrijdag 16 april 2010 14:11
> Aan: [hidden email]
> Onderwerp: [R] Weights in binomial glm
>
> I have some questions about the use of weights in binomial
> glm as I am not getting the results I would expect. In my
> case the weights I have can be seen as 'replicate weights';
> one respondent i in my dataset corresponds to w[i] persons in
> the population. From the documentation of the glm method, I
> understand that the weights can indeed be used for this: "For
> a binomial GLM prior weights are used to give the number of
> trials when the response is the proportion of successes."
> >From "Modern applied statistics with S-Plus 3rd ed." I understand the
> same.
>

Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message
and any annex are purely those of the writer and may not be regarded as stating
an official position of INBO, as long as the message is not confirmed by a duly
signed document.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Weights in binomial glm

Jan van der Laan
Thierry,

Thank you for your answer.

>From the documentation it looks like it is valid to assume that the
weights can be used for replicate weights. Continuing your example:

dataset$Success2 <- dataset$Success
Aggregated2 <- cast(Person+Success ~ ., data = dataset, value =
"Success2", fun =list(mean, length))
m2 <- glm(mean ~ 1, data = Aggregated2, family = binomial, weights =length)

In this case the weights can be seen as replicate weights. In my case
the proportion of successes for each group is either 0 or 1.

I am familiar with the survey package. However, in this case there
should not be difference between the two as far as the parameter
estimates are concerned (the standard errors are incorrect for glm).

The strange thing in this case is that the estimates seem to depend on
the scaling of the weights, which should not be the case. Also in your
example scaling the weights gives the same estimate:

m1 <- glm(mean ~ 1, data = Aggregated, family = binomial, weights = length/10)

Regards,
Jan



On Fri, Apr 16, 2010 at 3:19 PM, ONKELINX, Thierry
<[hidden email]> wrote:

> Jan,
>
> It looks like you did not understand the line "For a binomial GLM prior
> weights are used to give the number of trials when the response is the
> proportion of successes."
>
> Weights must be a number of trials (hence integer). Not a proportion of
> a population. Here is an example that clarifies the use of weights.
>
> library(boot)
> library(reshape)
> dataset <- data.frame(Person = c(rep("A", 20), rep("B", 10)), Success =
> c(rbinom(20, 1, 0.25), rbinom(10, 1, 0.75)))
> Aggregated <- cast(Person ~ ., data = dataset, value = "Success", fun =
> list(mean, length))
>
> m0 <- glm(Success ~ 1, data = dataset, family = binomial)
> m1 <- glm(mean ~ 1, data = Aggregated, family = binomial, weights =
> length)
>
> inv.logit(coef(m0))
> inv.logit(coef(m1))
>
> Have a look at the survey package is you want to analyse stratified
> data.
>
> Thierry
>
> ------------------------------------------------------------------------
> ----
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek
> team Biometrie & Kwaliteitszorg
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
>
> Research Institute for Nature and Forest
> team Biometrics & Quality Assurance
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
>
> tel. + 32 54/436 185
> [hidden email]
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to
> say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of
> data.
> ~ John Tukey
>
>
>> -----Oorspronkelijk bericht-----
>> Van: [hidden email]
>> [mailto:[hidden email]] Namens Jan van der Laan
>> Verzonden: vrijdag 16 april 2010 14:11
>> Aan: [hidden email]
>> Onderwerp: [R] Weights in binomial glm
>>
>> I have some questions about the use of weights in binomial
>> glm as I am not getting the results I would expect. In my
>> case the weights I have can be seen as 'replicate weights';
>> one respondent i in my dataset corresponds to w[i] persons in
>> the population. From the documentation of the glm method, I
>> understand that the weights can indeed be used for this: "For
>> a binomial GLM prior weights are used to give the number of
>> trials when the response is the proportion of successes."
>> >From "Modern applied statistics with S-Plus 3rd ed." I understand the
>> same.
>>
>
> Druk dit bericht a.u.b. niet onnodig af.
> Please do not print this message unnecessarily.
>
> Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer
> en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
> door een geldig ondertekend document. The views expressed in  this message
> and any annex are purely those of the writer and may not be regarded as stating
> an official position of INBO, as long as the message is not confirmed by a duly
> signed document.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Weights in binomial glm

Thierry Onkelinx
Jan,

You misread the documentation of ?glm. Note that glm works with different kinds of families. So the first statement about weights is rather general: it holds for most of the families. It explicitly tells you that is not the case with the binomial family. From the documentation: "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes". Nothing more, nothing less.

Scaling the weights will change the results because you change the NUMBER OF TRIALS. More trials = more information = lower variances. So you only need to give the weights when the response is expressed as a ratio. If you have it as a binary variable or as cbind(NummerOfSuccesses,NumberOfFailures) then you don't need weights.

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
[hidden email]
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
 

> -----Oorspronkelijk bericht-----
> Van: Jan van der Laan [mailto:[hidden email]]
> Verzonden: vrijdag 16 april 2010 16:09
> Aan: ONKELINX, Thierry
> CC: [hidden email]
> Onderwerp: Re: [R] Weights in binomial glm
>
> Thierry,
>
> Thank you for your answer.
>
> From the documentation it looks like it is valid to assume
> that the weights can be used for replicate weights.
> Continuing your example:
>
> dataset$Success2 <- dataset$Success
> Aggregated2 <- cast(Person+Success ~ ., data = dataset, value
> = "Success2", fun =list(mean, length))
> m2 <- glm(mean ~ 1, data = Aggregated2, family = binomial,
> weights =length)
>
> In this case the weights can be seen as replicate weights. In
> my case the proportion of successes for each group is either 0 or 1.
>
> I am familiar with the survey package. However, in this case
> there should not be difference between the two as far as the
> parameter estimates are concerned (the standard errors are
> incorrect for glm).
>
> The strange thing in this case is that the estimates seem to
> depend on the scaling of the weights, which should not be the
> case. Also in your example scaling the weights gives the same
> estimate:
>
> m1 <- glm(mean ~ 1, data = Aggregated, family = binomial,
> weights = length/10)
>
> Regards,
> Jan
>
>
>
> On Fri, Apr 16, 2010 at 3:19 PM, ONKELINX, Thierry
> <[hidden email]> wrote:
> > Jan,
> >
> > It looks like you did not understand the line "For a binomial GLM
> > prior weights are used to give the number of trials when
> the response
> > is the proportion of successes."
> >
> > Weights must be a number of trials (hence integer). Not a
> proportion
> > of a population. Here is an example that clarifies the use
> of weights.
> >
> > library(boot)
> > library(reshape)
> > dataset <- data.frame(Person = c(rep("A", 20), rep("B",
> 10)), Success
> > = c(rbinom(20, 1, 0.25), rbinom(10, 1, 0.75))) Aggregated <-
> > cast(Person ~ ., data = dataset, value = "Success", fun =
> list(mean,
> > length))
> >
> > m0 <- glm(Success ~ 1, data = dataset, family = binomial)
> > m1 <- glm(mean ~ 1, data = Aggregated, family = binomial, weights =
> > length)
> >
> > inv.logit(coef(m0))
> > inv.logit(coef(m1))
> >
> > Have a look at the survey package is you want to analyse stratified
> > data.
> >
> > Thierry
> >
> >
> ----------------------------------------------------------------------
> > --
> > ----
> > ir. Thierry Onkelinx
> > Instituut voor natuur- en bosonderzoek team Biometrie &
> Kwaliteitszorg
> > Gaverstraat 4 9500 Geraardsbergen Belgium
> >
> > Research Institute for Nature and Forest team Biometrics & Quality
> > Assurance Gaverstraat 4 9500 Geraardsbergen Belgium
> >
> > tel. + 32 54/436 185
> > [hidden email]
> > www.inbo.be
> >
> > To call in the statistician after the experiment is done may be no
> > more than asking him to perform a post-mortem examination:
> he may be
> > able to say what the experiment died of.
> > ~ Sir Ronald Aylmer Fisher
> >
> > The plural of anecdote is not data.
> > ~ Roger Brinner
> >
> > The combination of some data and an aching desire for an
> answer does
> > not ensure that a reasonable answer can be extracted from a
> given body
> > of data.
> > ~ John Tukey
> >
> >
> >> -----Oorspronkelijk bericht-----
> >> Van: [hidden email]
> >> [mailto:[hidden email]] Namens Jan van der Laan
> >> Verzonden: vrijdag 16 april 2010 14:11
> >> Aan: [hidden email]
> >> Onderwerp: [R] Weights in binomial glm
> >>
> >> I have some questions about the use of weights in binomial
> glm as I
> >> am not getting the results I would expect. In my case the
> weights I
> >> have can be seen as 'replicate weights'; one respondent i in my
> >> dataset corresponds to w[i] persons in the population. From the
> >> documentation of the glm method, I understand that the weights can
> >> indeed be used for this: "For a binomial GLM prior weights
> are used
> >> to give the number of trials when the response is the
> proportion of
> >> successes."
> >> >From "Modern applied statistics with S-Plus 3rd ed." I understand
> >> >the
> >> same.
> >>
> >
> > Druk dit bericht a.u.b. niet onnodig af.
> > Please do not print this message unnecessarily.
> >
> > Dit bericht en eventuele bijlagen geven enkel de visie van de
> > schrijver weer en binden het INBO onder geen enkel beding,
> zolang dit
> > bericht niet bevestigd is door een geldig ondertekend document. The
> > views expressed in  this message and any annex are purely
> those of the
> > writer and may not be regarded as stating an official position of
> > INBO, as long as the message is not confirmed by a duly
> signed document.
> >
>

Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message
and any annex are purely those of the writer and may not be regarded as stating
an official position of INBO, as long as the message is not confirmed by a duly
signed document.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Weights in binomial glm

Thomas Lumley
In reply to this post by Jan van der Laan


Jan,

Thierry is correct in saying that you are misusing glm(), but there is also a numerical problem.

You are misusing glm() because your model specification claims to have Binomial(n,p) observations with w in the vicinity of 100, where there is a single common p but the observed binomial proportion is either 1 or 0, never anything in between.  These data are a very poor fit to a binomial model.

The correct specification if you have what you call replicate weights and I call frequency weights is to produce a single data record for each covariate pattern that has both the 1 and 0 observations. This can either be two columns for successes and failures, or one column of proportions and one column of weights.  As your quote from MASS says "weights are used to give the number of trials when the response is the proportion of successes." In your data the response is *not* the proportion of successes.


However, the MLE should still be equal to the weighted mean even with this misuse.  The reason it is not is because of the starting values.  R has to find some starting values for the iterative maximization of the likelihood, and for binomial data with y successes out of n it uses  starting values for the fitted means of  (y+0.5)/(n+1).  Starting the iteration at the data in this way usually makes the Fisher scoring algorithm very reliable -- it is correctly scaled to the data, in some sense.   Unfortunately, if you separate out the successes and failures, you have some points starting with values very close to 0.  When I used your code the starting value for the point with the largest weight was 0.5/199.   At iteration 2, the estimated mean ends up very small for all observations, and then the iteration diverges.  However, if you provide a starting value then the fitting works, even if you start the iteration at, say beta=1, corresponding to a fitted mean of over 70%.

So, the result is wrong in the sense that it is not the mle, because of a failure of convergence, which happens because specifying the weights the way you did rather than the documented way leads to bad default starting values for the iteration.  You need either to specify the data as recommended or supply starting values.

     =thomas


On Fri, 16 Apr 2010, Jan van der Laan wrote:

> I have some questions about the use of weights in binomial glm as I am
> not getting the results I would expect. In my case the weights I have
> can be seen as 'replicate weights'; one respondent i in my dataset
> corresponds to w[i] persons in the population. From the documentation
> of the glm method, I understand that the weights can indeed be used
> for this: "For a binomial GLM prior weights are used to give the
> number of trials when the response is the proportion of successes."
>> From "Modern applied statistics with S-Plus 3rd ed." I understand the
> same.
>
> However, I am getting some strange results. I generated an example:
>
> Generate some data which is simular to my dataset
>> Z <- rbinom(1000, 1, 0.1)
>> W <- round(rnorm(1000, 100, 40))
>> W[W < 1] <- 1
>
> Probability of success can either be estimated using:
>> sum(Z*W)/sum(W)
> [1] 0.09642109
>
> Or using glm:
>> model <- glm(Z ~ 1, weights=W, family=binomial())
> Warning message:
> In glm.fit(x = X, y = Y, weights = weights, start = start, etastart =
> etastart,  :
>  fitted probabilities numerically 0 or 1 occurred
>> predict(model, type="response")[1]
>           1
> 2.220446e-16
>
> These two results are obviously not the same. The strange thing is
> that when I scale the weights, such that the total equals one, the
> probability is correctly estimated:
>
>> model <- glm(Z ~ 1, weights=W/sum(W), family=binomial())
> Warning message:
> In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
>> predict(model, type="response")[1]
>         1
> 0.09642109
>
>
> However scaling of the weights should, as far as I am aware, not have
> an effect on the estimated parameters. I also tried some other
> scalings. And, for example scaling the weights by 20 also gives me the
> correct result.
>
>> model <- glm(Z ~ 1, weights=W/20, family=binomial())
> Warning message:
> In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
>> predict(model, type="response")[1]
>         1
> 0.09642109
>
>
> Am I misinterpreting the weights? Could this be a numerical problem?
>
> Regards,
>
> Jan
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Weights in binomial glm

Jan van der Laan
Thomas, Thierry,

Thank your for your answers and my appogies for my late reply. Thomas,
from your reply it seems that dividing the weights by their average
would also make finding a suitable starting value more robust. This
indeed seems to be case from test I've ran.

However, your comments about me misusing glm() make me wonder at how
the weights are used in glm(). I was under the impression that weights
in glm are usually replicae/frequency weights. When I look at how
weights are usually taken into account in the likelihood, namely

L = Product_i L_i^w_i

each record i contributes w_i times a factor L_i to the total
likelihood, which  is the same as saying that record i represents w_i
observations. This is exactly what I want to express with my weights.
Writing out the likelihood for binomial regression also shows that in
that case it is possible to have Y's that are not only 0 or 1, but can
also have values between 0 and 1 as in R. However, looking at this I
do not see any problems with large weights and Y's equal to 0 or one.

Is it just the fact that R has problems generating the starting values
when the Y's are all between 0 and 1, or are there also some other
reasons for this?

Regards,
Jan





On Fri, Apr 16, 2010 at 6:28 PM, Thomas Lumley <[hidden email]> wrote:

>
>
> Jan,
>
> Thierry is correct in saying that you are misusing glm(), but there is also
> a numerical problem.
>
> You are misusing glm() because your model specification claims to have
> Binomial(n,p) observations with w in the vicinity of 100, where there is a
> single common p but the observed binomial proportion is either 1 or 0, never
> anything in between.  These data are a very poor fit to a binomial model.
>
> The correct specification if you have what you call replicate weights and I
> call frequency weights is to produce a single data record for each covariate
> pattern that has both the 1 and 0 observations. This can either be two
> columns for successes and failures, or one column of proportions and one
> column of weights.  As your quote from MASS says "weights are used to give
> the number of trials when the response is the proportion of successes." In
> your data the response is *not* the proportion of successes.
>
>
> However, the MLE should still be equal to the weighted mean even with this
> misuse.  The reason it is not is because of the starting values.  R has to
> find some starting values for the iterative maximization of the likelihood,
> and for binomial data with y successes out of n it uses  starting values for
> the fitted means of  (y+0.5)/(n+1).  Starting the iteration at the data in
> this way usually makes the Fisher scoring algorithm very reliable -- it is
> correctly scaled to the data, in some sense.   Unfortunately, if you
> separate out the successes and failures, you have some points starting with
> values very close to 0.  When I used your code the starting value for the
> point with the largest weight was 0.5/199.   At iteration 2, the estimated
> mean ends up very small for all observations, and then the iteration
> diverges.  However, if you provide a starting value then the fitting works,
> even if you start the iteration at, say beta=1, corresponding to a fitted
> mean of over 70%.
>
> So, the result is wrong in the sense that it is not the mle, because of a
> failure of convergence, which happens because specifying the weights the way
> you did rather than the documented way leads to bad default starting values
> for the iteration.  You need either to specify the data as recommended or
> supply starting values.
>
>    =thomas
>
>
> On Fri, 16 Apr 2010, Jan van der Laan wrote:
>
>> I have some questions about the use of weights in binomial glm as I am
>> not getting the results I would expect. In my case the weights I have
>> can be seen as 'replicate weights'; one respondent i in my dataset
>> corresponds to w[i] persons in the population. From the documentation
>> of the glm method, I understand that the weights can indeed be used
>> for this: "For a binomial GLM prior weights are used to give the
>> number of trials when the response is the proportion of successes."
>>>
>>> From "Modern applied statistics with S-Plus 3rd ed." I understand the
>>
>> same.
>>
>> However, I am getting some strange results. I generated an example:
>>
>> Generate some data which is simular to my dataset
>>>
>>> Z <- rbinom(1000, 1, 0.1)
>>> W <- round(rnorm(1000, 100, 40))
>>> W[W < 1] <- 1
>>
>> Probability of success can either be estimated using:
>>>
>>> sum(Z*W)/sum(W)
>>
>> [1] 0.09642109
>>
>> Or using glm:
>>>
>>> model <- glm(Z ~ 1, weights=W, family=binomial())
>>
>> Warning message:
>> In glm.fit(x = X, y = Y, weights = weights, start = start, etastart =
>> etastart,  :
>>  fitted probabilities numerically 0 or 1 occurred
>>>
>>> predict(model, type="response")[1]
>>
>>          1
>> 2.220446e-16
>>
>> These two results are obviously not the same. The strange thing is
>> that when I scale the weights, such that the total equals one, the
>> probability is correctly estimated:
>>
>>> model <- glm(Z ~ 1, weights=W/sum(W), family=binomial())
>>
>> Warning message:
>> In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
>>>
>>> predict(model, type="response")[1]
>>
>>        1
>> 0.09642109
>>
>>
>> However scaling of the weights should, as far as I am aware, not have
>> an effect on the estimated parameters. I also tried some other
>> scalings. And, for example scaling the weights by 20 also gives me the
>> correct result.
>>
>>> model <- glm(Z ~ 1, weights=W/20, family=binomial())
>>
>> Warning message:
>> In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
>>>
>>> predict(model, type="response")[1]
>>
>>        1
>> 0.09642109
>>
>>
>> Am I misinterpreting the weights? Could this be a numerical problem?
>>
>> Regards,
>>
>> Jan
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> Thomas Lumley                   Assoc. Professor, Biostatistics
> [hidden email]        University of Washington, Seattle
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.