Package survey: singularities in linear regression models

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Package survey: singularities in linear regression models

Sebastian Weirich-2
Hello,

I want to specify a linear regression model in which the metric outcome
is predicted by two factors and their interaction. glm() computes
effects for each factor level and the levels of the interaction. In the
case of singularities glm() displays "NA" for the corresponding
coefficients. However, svyglm() aborts with an error message. Is there a
possibility that svyglm() provides output for coefficients without
singularities like glm()?

Thank you very much,
Sebastian Weirich

--
Sebastian Weirich, Dipl.-Psych.

Institut zur Qualitätsentwicklung im Bildungswesen
Humboldt-Universität zu Berlin
Sitz: Hannoversche Straße 19, 10115 Berlin
Postadresse: Unter den Linden 6, 10099 Berlin

Tel: +49-(0)30-2093-46512

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Package survey: singularities in linear regression models

Thomas Lumley-2
On Fri, May 3, 2013 at 2:27 AM, Sebastian Weirich <
[hidden email]> wrote:

> Hello,
>
> I want to specify a linear regression model in which the metric outcome is
> predicted by two factors and their interaction. glm() computes effects for
> each factor level and the levels of the interaction. In the case of
> singularities glm() displays "NA" for the corresponding coefficients.
> However, svyglm() aborts with an error message. Is there a possibility that
> svyglm() provides output for coefficients without singularities like glm()?
>
>
It's not true that svyglm() aborts with an error message whenever there are
singularities, eg

> svyglm(enroll~stype+I(stype),design=dclus1)
1 - level Cluster Sampling design
With (15) clusters.
svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

Call:  svyglm(formula = enroll ~ stype + I(stype), design = dclus1)

Coefficients:
(Intercept)       stypeH       stypeM    I(stype)H    I(stype)M
      432.9        697.4        464.9           NA           NA

Degrees of Freedom: 182 Total (i.e. Null);  12 Residual
Null Deviance:    24830000
Residual Deviance: 15120000 AIC: 2599


So, perhaps you could show us what you actually did, and what actually
happened, as the posting guidelines request.

    -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Package survey: singularities in linear regression models

Sebastian Weirich-2
Well, I have uploaded the data in the public folder of my dropbox. Due
to data confidentiality, I haved to change the labels. To load the data:

con <- url( "http://dl.dropboxusercontent.com/u/101865137/datEx.rda" )
print(load(con))

# The replicate weights were created according to the jackknife (JK2)
procedure in the same way as implemented in WesVar.
# According to 100 JK zones, 100 replicate weights result. The replicate
weights are labelled "totwgtM_1" to "totwgtM_100"
# The regression I want to specify is achievement on group and origin.
Both predictors are factors.

library(survey)
design   <- svrepdesign(data = datEx[, c("origin", "group",
"achievement")], weights = datEx[ ,"pweight"],
         type="JKn", scale = 1, rscales = 1, repweights =
datEx[,grep("^totwgtM_", colnames(datEx))], combined.weights = TRUE, mse
= TRUE)

# This works
mod1     <- svyglm(formula = achievement ~ origin + group, design =
design, return.replicates = FALSE, family = gaussian(link="identity"))

# I get the error message when specifying the interaction
mod2     <- svyglm(formula = achievement ~ origin * group, design =
design, return.replicates = FALSE, family = gaussian(link="identity"))

# The output of the conventional glm() function reports singularities
for one coefficient of the interaction
mod3     <- glm(formula = achievement ~ origin * group, data = datEx,
family = gaussian(link = "identity"))

Thanks again,
Sebastian

--
Sebastian Weirich, Dipl.-Psych.

Institut zur Qualitätsentwicklung im Bildungswesen
Humboldt-Universität zu Berlin
Sitz: Hannoversche Straße 19, 10115 Berlin
Postadresse: Unter den Linden 6, 10099 Berlin

Tel: +49-(0)30-2093-46512

Am 02.05.2013 22:02, schrieb Thomas Lumley:

> On Fri, May 3, 2013 at 2:27 AM, Sebastian Weirich
> <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hello,
>
>     I want to specify a linear regression model in which the metric
>     outcome is predicted by two factors and their interaction. glm()
>     computes effects for each factor level and the levels of the
>     interaction. In the case of singularities glm() displays "NA" for
>     the corresponding coefficients. However, svyglm() aborts with an
>     error message. Is there a possibility that svyglm() provides
>     output for coefficients without singularities like glm()?
>
>
> It's not true that svyglm() aborts with an error message whenever
> there are singularities, eg
>
> > svyglm(enroll~stype+I(stype),design=dclus1)
> 1 - level Cluster Sampling design
> With (15) clusters.
> svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
>
> Call:  svyglm(formula = enroll ~ stype + I(stype), design = dclus1)
>
> Coefficients:
> (Intercept)       stypeH       stypeM    I(stype)H  I(stype)M
>       432.9        697.4        464.9           NA         NA
>
> Degrees of Freedom: 182 Total (i.e. Null);  12 Residual
> Null Deviance:   24830000
> Residual Deviance: 15120000 AIC: 2599
>
>
> So, perhaps you could show us what you actually did, and what actually
> happened, as the posting guidelines request.
>
>     -thomas
>
> --
> Thomas Lumley
> Professor of Biostatistics
> University of Auckland

        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Package survey: singularities in linear regression models

Thomas Lumley-2
Ok, that's more helpful.

The problem is with replicate-weight designs, and it's because svyglm()
uses the fitted coefficients from the point estimate as starting values for
fitting the replicates.  And even if that is changed, the computation of
the replicate variance doesn't like all the replicates having NAs.

A workaround is to use withReplicates() and to extract only the non-NA
coefficients

> withReplicates(design, quote(coef(glm(achievement ~ origin * group,
weights=.weights))[-12]))
                                       theta      SE
(Intercept)                        446.55058  8.3059
originFrance                         4.44506 15.1254
originGB                            82.74938 15.1038
originNetherlands                   40.31827 14.9648
originother                         67.83235  9.1836
originSweden                        30.45314 16.3332
groupschooltype2                     0.89859 10.2257
originFrance:groupschooltype2        6.73524 23.2032
originGB:groupschooltype2          -40.73301 18.1516
originNetherlands:groupschooltype2  15.62253 19.8704
originother:groupschooltype2       -34.59610 11.9646

   -thomas





On Sat, May 4, 2013 at 2:59 AM, Sebastian Weirich <
[hidden email]> wrote:

>  Well, I have uploaded the data in the public folder of my dropbox. Due
> to data confidentiality, I haved to change the labels. To load the data:
>
> con <- url( "http://dl.dropboxusercontent.com/u/101865137/datEx.rda"<http://dl.dropboxusercontent.com/u/101865137/datEx.rda>)
> print(load(con))
>
> # The replicate weights were created according to the jackknife (JK2)
> procedure in the same way as implemented in WesVar.
> # According to 100 JK zones, 100 replicate weights result. The replicate
> weights are labelled "totwgtM_1" to "totwgtM_100"
> # The regression I want to specify is achievement on group and origin.
> Both predictors are factors.
>
> library(survey)
> design   <- svrepdesign(data = datEx[, c("origin", "group",
> "achievement")], weights = datEx[ ,"pweight"],
>         type="JKn", scale = 1, rscales = 1, repweights =
> datEx[,grep("^totwgtM_", colnames(datEx))], combined.weights = TRUE, mse =
> TRUE)
>
> # This works
> mod1     <- svyglm(formula = achievement ~ origin + group, design =
> design, return.replicates = FALSE, family = gaussian(link="identity"))
>
> # I get the error message when specifying the interaction
> mod2     <- svyglm(formula = achievement ~ origin * group, design =
> design, return.replicates = FALSE, family = gaussian(link="identity"))
>
> # The output of the conventional glm() function reports singularities for
> one coefficient of the interaction
> mod3     <- glm(formula = achievement ~ origin * group, data = datEx,
> family = gaussian(link = "identity"))
>
> Thanks again,
> Sebastian
>
> --
> Sebastian Weirich, Dipl.-Psych.
>
> Institut zur Qualitätsentwicklung im Bildungswesen
> Humboldt-Universität zu Berlin
> Sitz: Hannoversche Straße 19, 10115 Berlin
> Postadresse: Unter den Linden 6, 10099 Berlin
>
> Tel: +49-(0)30-2093-46512
>
> Am 02.05.2013 22:02, schrieb Thomas Lumley:
>
> On Fri, May 3, 2013 at 2:27 AM, Sebastian Weirich <
> [hidden email]> wrote:
>
>> Hello,
>>
>> I want to specify a linear regression model in which the metric outcome
>> is predicted by two factors and their interaction. glm() computes effects
>> for each factor level and the levels of the interaction. In the case of
>> singularities glm() displays "NA" for the corresponding coefficients.
>> However, svyglm() aborts with an error message. Is there a possibility that
>> svyglm() provides output for coefficients without singularities like glm()?
>>
>>
>  It's not true that svyglm() aborts with an error message whenever there
> are singularities, eg
>
>  > svyglm(enroll~stype+I(stype),design=dclus1)
> 1 - level Cluster Sampling design
> With (15) clusters.
> svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
>
>  Call:  svyglm(formula = enroll ~ stype + I(stype), design = dclus1)
>
>  Coefficients:
> (Intercept)       stypeH       stypeM    I(stype)H    I(stype)M
>       432.9        697.4        464.9           NA           NA
>
>  Degrees of Freedom: 182 Total (i.e. Null);  12 Residual
> Null Deviance:    24830000
> Residual Deviance: 15120000 AIC: 2599
>
>
>  So, perhaps you could show us what you actually did, and what actually
> happened, as the posting guidelines request.
>
>      -thomas
>
>  --
> Thomas Lumley
> Professor of Biostatistics
> University of Auckland
>
>
>

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.