relative risk regression with survey data


relative risk regression with survey data

Daniel Nordlund-3
I have been asked to look at options for doing relative risk regression
on some survey data.  I have a binary DV and several predictor /
adjustment variables.  In R, would this be as "simple" as using the
survey package to set up an appropriate design object and then running
svyglm with family=binomial(log) ?  Any other suggestions for covariate
adjustment of relative risk estimates?  Any and all suggestions welcomed.

Dan

--
Daniel Nordlund
Bothell, WA USA

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: relative risk regression with survey data

Thomas Lumley
On Mon, 13 Sep 2010, Daniel Nordlund wrote:

> I have been asked to look at options for doing relative risk regression on
> some survey data.  I have a binary DV and several predictor / adjustment
> variables.  In R, would this be as "simple" as using the survey package to
> set up an appropriate design object and then running svyglm with
> family=binomial(log) ?  Any other suggestions for covariate adjustment of
> relative risk estimates?  Any and all suggestions welcomed.

If the fitted values don't get too close to 1, then svyglm(..., family = quasibinomial(log)) will do it.

The log-binomial model is very non-robust when the fitted values get close to 1, and there is some controversy over the best approach.  You can still use svyglm(..., family = quasibinomial(log)), but you will probably need to set the number of iterations much higher (perhaps 200).

Alternatively, you can use nonlinear least squares [svyglm(..., family = gaussian(log))] or other quasilikelihood approaches, such as family = quasipoisson(log).  These are all consistent for the same parameter if the model is correctly specified and are much more robust to x-outliers.  I rather like nonlinear least squares, because it's easy to explain.
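[Editor's note: a minimal R sketch of the approaches described above. The design variables (psu, strata, wts), the outcome y, the covariates x1 and x2, and the data frame dat are hypothetical placeholders, not taken from the thread; the log-binomial fit may also need explicit starting values via start= on some data sets.]

```r
## Hypothetical sketch of the approaches described above.
## All variable names (psu, strata, wts, y, x1, x2, dat) are placeholders.
library(survey)

## Set up the complex survey design object (clusters, strata, weights)
des <- svydesign(id = ~psu, strata = ~strata, weights = ~wts, data = dat)

## Log-binomial relative risk model; raise maxit because convergence
## can be very slow when fitted values approach 1
fit_logbin <- svyglm(y ~ x1 + x2, design = des,
                     family = quasibinomial(log),
                     control = list(maxit = 200))

## More robust alternatives, consistent for the same parameter
## when the model is correctly specified
fit_pois <- svyglm(y ~ x1 + x2, design = des, family = quasipoisson(log))
fit_nls  <- svyglm(y ~ x1 + x2, design = des, family = gaussian(log))

## Exponentiated coefficients are the adjusted relative risks
exp(coef(fit_logbin))
```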

      -thomas


Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle


Re: relative risk regression with survey data

Daniel Nordlund-3
Thanks to Thomas Lumley and David Winsemius for their responses.  I had read a number of papers by Thomas and have ordered his book on survey analysis, but I wanted some confirmation so that I could get started before the book arrived.  Thanks, again.

Dan

Daniel Nordlund
Bothell, WA USA

On Mon, Sep 13, 2010 at 7:40 PM, Thomas Lumley <[hidden email]> wrote:

> On Mon, 13 Sep 2010, Daniel Nordlund wrote:
>
>> I have been asked to look at options for doing relative risk regression on
>> some survey data.  I have a binary DV and several predictor / adjustment
>> variables.  In R, would this be as "simple" as using the survey package to
>> set up an appropriate design object and then running svyglm with
>> family=binomial(log) ?  Any other suggestions for covariate adjustment of
>> relative risk estimates?  Any and all suggestions welcomed.
>
> If the fitted values don't get too close to 1 then svyglm(
>  ,family=quasibinomial(log)) will do it.
>
> The log-binomial model is very non-robust when the fitted values get close
> to 1, and there is some controversy over the best approach.  You can still
> use svyglm(  ,family=quasibinomial(log)) but you will probably need to set
> the number of iterations much higher (perhaps 200).
>
> Alternatively, you can use nonlinear least squares  [svyglm(,
> family=gaussian(log))] or other quasilikelihood approaches, such as
> family=quasipoisson(log).  These are all consistent for the same parameter
> if the model is correctly specified and are much more robust to x-outliers.
>  I rather like nonlinear least squares, because it's easy to explain.
>
>     -thomas
>
>
> Thomas Lumley
> Professor of Biostatistics
> University of Washington, Seattle
>
>


Re: relative risk regression with survey data

Ravi Varadhan
In reply to this post by Thomas Lumley
Dear Thomas,

You said, "the log-binomial model is very non-robust when the fitted values
get close to 1, and there is some controversy over the best approach."
Could you please point me to a paper that discusses the issues?

I have written some code to do maximum likelihood estimation for relative,
additive, and mixed risk regression models with a binomial likelihood.  I have
been able to obtain good convergence.  I have used the bootstrap to get
standard errors.  However, I am not sure whether these standard errors are
valid when the fitted values are close to 0 or 1.  It seems to me that when
the fitted probabilities are close to 0 or 1, there is not a good way to
estimate standard errors.

Thanks,
Ravi.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On
Behalf Of Thomas Lumley
Sent: Monday, September 13, 2010 10:41 PM
To: Daniel Nordlund
Cc: [hidden email]
Subject: Re: [R] relative risk regression with survey data

On Mon, 13 Sep 2010, Daniel Nordlund wrote:

> I have been asked to look at options for doing relative risk regression on
> some survey data.  I have a binary DV and several predictor / adjustment
> variables.  In R, would this be as "simple" as using the survey package to
> set up an appropriate design object and then running svyglm with
> family=binomial(log) ?  Any other suggestions for covariate adjustment of
> relative risk estimates?  Any and all suggestions welcomed.

If the fitted values don't get too close to 1 then svyglm(
,family=quasibinomial(log)) will do it.

The log-binomial model is very non-robust when the fitted values get close
to 1, and there is some controversy over the best approach.  You can still
use svyglm(  ,family=quasibinomial(log)) but you will probably need to set
the number of iterations much higher (perhaps 200).

Alternatively, you can use nonlinear least squares  [svyglm(,
family=gaussian(log))] or other quasilikelihood approaches, such as
family=quasipoisson(log).  These are all consistent for the same parameter
if the model is correctly specified and are much more robust to x-outliers.
I rather like nonlinear least squares, because it's easy to explain.

      -thomas


Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle


Re: relative risk regression with survey data

Thomas Lumley
On Wed, 15 Sep 2010, Ravi Varadhan wrote:

> Dear Thomas,
>
> You said, "the log-binomial model is very non-robust when the fitted values
> get close to 1, and there is some controversy over the best approach."
> Could you please point me to a paper that discusses the issues?
>
> I have written some code to do maximum likelihood estimation for relative,
> additive, and mixed risk regression models with binomial model.  I have been
> able to obtain good convergence.  I have used bootstrap to get standard
> errors.  However, I am not sure if these standard errors are valid when
> fitted values were close to 0 or 1. It seems to me that when the fitted
> probabilities are close to 0 or 1, there is not a good way to estimate
> standard errors.

There's a technical report at
http://www.bepress.com/uwbiostat/paper293/
with simulations, some theory, and references.  It's under review at the moment, after being forgotten for a few years.

The distribution of the parameter estimates when the true parameter is on the boundary of the parameter space is a separate mess.  Theoretically, it is the intersection of the multivariate Normal with the parameter space, and if the parameter space has a piecewise-linear boundary, the log-likelihood ratio has a chi-squared mixture distribution.  In practice, if there isn't a hard edge to the covariate distribution, it is not going to be easy to get a good approximation to the distribution of the parameter estimates.  As an example of the complications, the sampling distributions for fixed and random design matrices can be very different, because a random design matrix means that the estimated edge of the parameter space moves from one realization to another.
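[Editor's note: to make the mixture result above concrete, here is the standard boundary asymptotics of the Chernoff / Self-and-Liang type, added for illustration rather than quoted from the thread. For a single active linear boundary constraint, the likelihood ratio statistic satisfies:]

```latex
% Asymptotic null distribution of the LR statistic when the true
% parameter lies on a single linear boundary of the parameter space:
% an equal mixture of a point mass at zero (\chi^2_0) and \chi^2_1.
-2 \log \Lambda \;\xrightarrow{\;d\;}\; \tfrac{1}{2}\,\chi^2_0 \;+\; \tfrac{1}{2}\,\chi^2_1
```

[With several boundary constraints, the limit is a chi-bar-squared mixture with more components, with weights that depend on the geometry of the constraint cone.]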

     -thomas

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.