svydesign syntax


svydesign syntax

R user
This message is for those familiar with the survey package. I need to fit a weighted Cox model to accommodate the sampling weights, as I have a case-control study with controls sampled at random from a database in a 2:1 ratio to cases (who were all sampled). I want to make sure I am using the right svydesign syntax to specify this sampling design. Can anyone please check whether the statement below is appropriate for my design?

#group represents the case (all 132) vs control (253 sampled out of a total of 853 controls) groups; prob is 1 for cases and 253/853 for controls; ssize is 132 for cases and 853 for controls

dstr=svydesign(id=~1, strata=~group, prob=~prob, fpc=~ssize, data=noNA)
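The column definitions in the comment above can be sketched in base R as follows. The data frame is a hypothetical stand-in for `noNA` that holds only the `group` column; the `svydesign()` call runs only if the survey package is installed. The sketch checks that the inverse-probability weights add back up to the 132 cases and 853 controls in the database.

```r
## Hypothetical stand-in for 'noNA': 132 cases plus 253 sampled controls
n_case <- 132
n_ctrl_sampled <- 253
N_ctrl <- 853
noNA <- data.frame(group = rep(c("case", "control"), c(n_case, n_ctrl_sampled)))

# Sampling probability: every case was taken; controls are a simple random sample
noNA$prob <- ifelse(noNA$group == "case", 1, n_ctrl_sampled / N_ctrl)
# Stratum population sizes, as passed to fpc= in the call above
noNA$ssize <- ifelse(noNA$group == "case", n_case, N_ctrl)

# Implied sampling weights: each sampled control stands for 853/253 controls
w <- 1 / noNA$prob
print(tapply(w, noNA$group, sum))  # should total 132 for cases, 853 for controls

# The original call, run only when the survey package is available
if (requireNamespace("survey", quietly = TRUE)) {
  library(survey)
  dstr <- svydesign(id = ~1, strata = ~group, prob = ~prob, fpc = ~ssize,
                    data = noNA)
}
```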

Re: svydesign syntax

Thomas Lumley
On Thu, 22 Jul 2010, R user wrote:

> This message is for those familiar with the survey package. I need to fit a
> weighted Cox model to accommodate the sampling weights as I have a
> case-control study with controls sampled at random from a database in a
> ratio 2:1 to cases (who were all sampled). I want to make sure I am using
> the right svydesign syntax to specify this sampling design. Can anyone
> please check if the statement below is appropriate for my design?
>
> #group represents the case (total of 132) vs control (253 out of the total
> of 853 controls) groups; prob is 1 for cases and 253/853 for controls and
> ssize=132 for cases and 853 otherwise;
>
> dstr=svydesign(id=~1, strata=~group, prob=~prob, fpc=~ssize, data=noNA)
>

This is technically correct, but probably not what you want. You probably want

dstr=svydesign(id=~1, strata=~group, prob=~prob,  data=noNA)
or
dstr = twophase(id=list(~1,~1), strata=list(NULL, ~group), data=noNA)

Your svydesign() call treats the database as the full population. This could be correct, but usually people want estimates for the 'superpopulation' from which the database was drawn. The first option above is very slightly conservative (its standard errors are very slightly too large); the second describes the two phases of sampling that give first the whole database and then your subsample.
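A minimal sketch of both suggestions, under stated assumptions: `db` is a simulated stand-in for the full database (all 132 cases and 853 controls), `sampled` is a hypothetical logical column marking the 385 subjects actually selected, and `time`, `status` and `x` are made-up outcome and covariate columns. `twophase()` also takes a `subset` argument identifying the phase-two sample, which the one-line call above leaves out, so its `data` must be the whole database rather than only the sampled rows. The survey calls are skipped if the package is not installed.

```r
set.seed(1)
## Hypothetical full database (phase one): 132 cases plus 853 controls
db <- data.frame(group = rep(c("case", "control"), c(132, 853)))

## Hypothetical phase-two indicator: all cases plus an SRS of 253 controls
db$sampled <- db$group == "case"
ctrl_rows <- which(db$group == "control")
db$sampled[sample(ctrl_rows, 253)] <- TRUE

## Made-up outcome and covariate, for illustration only
db$time   <- rexp(nrow(db))
db$status <- as.integer(db$group == "case")
db$x      <- rnorm(nrow(db))

if (requireNamespace("survey", quietly = TRUE)) {
  library(survival)
  library(survey)

  # Option 1: single-phase stratified design, no fpc, sampled rows only
  samp <- subset(db, sampled)
  samp$prob <- ifelse(samp$group == "case", 1, 253 / 853)
  d1 <- svydesign(id = ~1, strata = ~group, prob = ~prob, data = samp)

  # Option 2: two-phase design; phase-two sampling fractions are computed
  # from the stratum counts in the full database
  d2 <- twophase(id = list(~1, ~1), strata = list(NULL, ~group),
                 subset = ~sampled, data = db)

  # Weighted Cox model with design-based standard errors
  fit <- svycoxph(Surv(time, status) ~ x, design = d2)
}
```

`summary(fit)` then reports the coefficient with its design-based standard error; swapping `design = d1` gives the first option for comparison.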

    -thomas

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.