(svy)glm and weights question

Jos Elkink-2
Hi all,

I am running a set of logistic regressions, where we want to use some
weights, and I am not sure whether what I am doing is reasonable or
not.

The dependent variable is turnout in an election - i.e. survey
respondents were asked whether or not they voted. The percentage of
those who say they voted is much higher than the actual turnout,
probably due both to non-response bias and social desirability issues.
So now the suggestion is to weight the cases, to weight down the
respondents who say they voted and weight more heavily those who
say they did not vote. So the questions that arise from this are:

1) Is it reasonable to use the distribution of the dependent variable
to calculate the weights used in a logistic regression? It feels
wrong, but I cannot find, so far, any sources on this.

2) How to implement this in R? I tried the weights option in glm(),
but I think that is meant for when you have one row in your data for
multiple observations, not for this kind of weight. Although I have
the McCullagh and Nelder book explaining in detail how glm() operates,
I cannot find a similar book for svyglm(). Is svyglm() better for this
type of weighting?

3) Where would I find a good source describing the estimation
procedure, including weighting, applied in svyglm()?

Thanks in advance for any help!

Jos

--
Johan A. Elkink
Lecturer in Social Science Research Methods
School of Politics and International Relations & CHS Graduate School
University College Dublin
Ph. +353 1 716 8150  |  Newman Building, Rm F304
http://jaeweb.cantr.net

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: (svy)glm and weights question

Thomas Lumley
On Tue, 11 May 2010, Jos Elkink wrote:

> Hi all,
>
> I am running a set of logistic regressions, where we want to use some
> weights, and I am not sure whether what I am doing is reasonable or
> not.
>
> The dependent variable is turnout in an election - i.e. survey
> respondents were asked whether or not they voted. The percentage of
> those who say they voted is much higher than the actual turnout,
> probably due both to non-response bias and social desirability issues.
> So now the suggestion is to weight the cases, to weight down the
> respondents who say they voted and weight more heavily those who
> say they did not vote. So the questions that arise from this are:
>
> 1) Is it reasonable to use the distribution of the dependent variable
> to calculate the weights used in a logistic regression? It feels
> wrong, but I cannot find, so far, any sources on this.

Yes and no.  There's nothing special about it being the dependent variable.  As with any other method for handling missing data and measurement error, it won't fully remove the bias, but it might reduce it.

However, there is something special about it being a logistic regression model with biased sampling only on the dependent variable. This is better known as case-control sampling: it shifts the intercept but does not bias the coefficients of the predictors, so reweighting won't help with those.
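
The case-control point can be checked with a small simulation. This is only a sketch under made-up assumptions: the population model, the sampling probabilities 0.30 and 0.10, and all variable names are hypothetical. It also mimics selection on the outcome only, not misreporting by respondents.

```r
set.seed(1)

# Hypothetical population: x is a predictor, voted is the true outcome.
n     <- 200000
x     <- rnorm(n)
voted <- rbinom(n, 1, plogis(-0.5 + 1 * x))

# Biased sampling that depends only on the outcome:
# voters are three times as likely to end up in the sample as non-voters.
p_keep <- ifelse(voted == 1, 0.30, 0.10)
keep   <- runif(n) < p_keep
samp   <- data.frame(x = x[keep], voted = voted[keep])

fit_pop  <- glm(voted ~ x, family = binomial)               # full population
fit_samp <- glm(voted ~ x, family = binomial, data = samp)  # biased sample, unweighted

# Compare coefficients: the slopes should agree closely, the intercepts should not.
round(rbind(population = coef(fit_pop), sample = coef(fit_samp)), 2)

# The intercept shift should be close to the log ratio of the sampling probabilities.
coef(fit_samp)[1] - coef(fit_pop)[1]
log(0.30 / 0.10)
```

If the over-reporting comes from people misstating whether they voted rather than from who responds, this argument no longer applies, which is the "measurement error" caveat above.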


> 2) How to implement this in R? I tried the weights option in glm(),
> but I think that is meant for when you have one row in your data for
> multiple observations, not for this kind of weight. Although I have
> the McCullagh and Nelder book explaining in detail how glm() operates,
> I cannot find a similar book for svyglm(). Is svyglm() better for this
> type of weighting?

In general svyglm() is better for this type of weighting.  The point estimates are the same (and in fact are obtained from glm()), but the standard errors are more appropriate. Under the unreasonable assumption that the weighting does correct the bias, the standard errors will also be correct.
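
A minimal sketch of that comparison, assuming the survey package is installed. The data are simulated, and the official turnout of 0.55 is a placeholder rather than a real figure; quasibinomial is used in both fits so that glm() does not warn about non-integer weights.

```r
library(survey)  # assumes the survey package is installed

set.seed(2)

# Hypothetical survey data: 'voted' is self-reported turnout.
n  <- 2000
df <- data.frame(x = rnorm(n))
df$voted <- rbinom(n, 1, plogis(1 + 0.8 * df$x))

# Assumed official turnout (placeholder value).
true_turnout <- 0.55
samp_turnout <- mean(df$voted)

# Weight = population share / sample share within each outcome group.
df$w <- ifelse(df$voted == 1,
               true_turnout / samp_turnout,
               (1 - true_turnout) / (1 - samp_turnout))

# Same weights, two fitting routes.
fit_glm <- glm(voted ~ x, family = quasibinomial, weights = w, data = df)
des     <- svydesign(ids = ~1, weights = ~w, data = df)
fit_svy <- svyglm(voted ~ x, design = des, family = quasibinomial())

# Point estimates agree; the standard errors are computed differently.
cbind(glm = coef(fit_glm), svyglm = coef(fit_svy))
cbind(glm    = sqrt(diag(vcov(fit_glm))),
      svyglm = sqrt(diag(vcov(fit_svy))))
```

The glm() standard errors are model-based, while svyglm() uses design-based (sandwich-type) variances, which is why only the second column of the last table should be reported for weights of this kind.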

> 3) Where would I find a good source describing the estimation
> procedure, including weighting, applied in svyglm()?

Well, one source is the book of the package (see http://faculty.washington.edu/tlumley/svybook/ for its web page).  I'm perhaps not the best person to say whether it's a good source.  Chapters 5 and 6 on regression and 7 on post-stratification, raking and calibration would be relevant.

There is much more detail about the general weighting approach in Sarndal, Swensson, Wretman "Model Assisted Survey Sampling".  Or you can search for papers on "calibration" and "non-response".   The survey literature generally will not say that much about applying these methods to regression modelling, but the principles are the same.
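
For the post-stratification route, the survey package can build the weights itself instead of having them computed by hand. The following is a sketch only: the data are simulated, and the population size N and the 45/55 split are invented placeholders, not real election figures.

```r
library(survey)  # assumes the survey package is installed

set.seed(3)

# Hypothetical survey with over-reported turnout.
n  <- 1000
df <- data.frame(x = rnorm(n))
df$voted     <- rbinom(n, 1, plogis(1 + 0.8 * df$x))
df$voted_lab <- ifelse(df$voted == 1, "yes", "no")

# Assumed population counts by turnout (placeholders).
N   <- 100000
pop <- data.frame(voted_lab = c("no", "yes"),
                  Freq      = c(0.45, 0.55) * N)

# Start from equal weights, then post-stratify on reported turnout.
df$base_w <- N / n
des    <- svydesign(ids = ~1, weights = ~base_w, data = df)
des_ps <- postStratify(des, strata = ~voted_lab, population = pop)

svymean(~voted, des_ps)  # weighted turnout now equals the assumed 0.55

# Regression on the post-stratified design.
fit <- svyglm(voted ~ x, design = des_ps, family = quasibinomial())
coef(fit)
sqrt(diag(vcov(fit)))
```

With more than one margin to match (say turnout and region), rake() or calibrate() from the same package would be the analogous tools.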

     -thomas

Thomas Lumley
Assoc. Professor, Biostatistics
University of Washington, Seattle
[hidden email]
