bigglm() results different from glm()

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

bigglm() results different from glm()

Francisco J. Zagmutt
Dear all,

I am using the bigglm package to fit a few GLM's to a large dataset (3
million rows, 6 columns).  While trying to fit a Poisson GLM I noticed
that the coefficient estimates were very different from what I obtained
when estimating the model on a smaller dataset using glm(), I wrote a
very basic toy example to compare the results of bigglm() against a
glm() call.  Consider the following code:


 > require(biglm)
 > options(digits=6, scipen=3, contrasts = c("contr.treatment",
"contr.poly"))
 > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
ttment=gl(2,50000))
 > m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
 > m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
 > summary(m1)

<snipped output for this email>
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.30305    0.00141    1629   <2e-16 ***
ttment2      0.40429    0.00183     221   <2e-16 ***

     Null deviance: 151889  on 99999  degrees of freedom
Residual deviance: 101848  on 99998  degrees of freedom
AIC: 533152

 > summary(m1big)
Large data regression model: bigglm(y ~ ttment, data = dat, family =
poisson(link = "log"))
Sample size =  100000
              Coef  (95%   CI)    SE p
(Intercept) 2.651 2.650 2.653 0.001 0
ttment2     4.346 4.344 4.348 0.001 0

 > m1big$deviance
[1] 287158986


Notice that the coefficients and deviance are quite different in the
model estimated using bigglm(). If I change the chunk to
seq(1000,10000,1000) the estimates remain the same.

Can someone help me understand what is causing these differences?

Here is my version info:

 > version
                _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          8.1
year           2008
month          12
day            22
svn rev        47281
language       R
version.string R version 2.8.1 (2008-12-22)


Many thanks in advance for your help,

Francisco

--
Francisco J. Zagmutt
Vose Consulting
2891 20th Street
Boulder, CO, 80304
USA
www.voseconsulting.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bigglm() results different from glm()

Thomas Lumley

This is a surprisingly interesting problem that took a while to debug, because the computations all seemed correct.

Your model hasn't converged yet.  You can get the right answer either by running longer:

> summary(m1big_longer)
Large data regression model: bigglm(y ~ ttment, data = dat, family = poisson(link = "log"),
     chunksize = 100000, maxit = 20)
Sample size =  100000
              Coef  (95%   CI)    SE p
(Intercept) 2.304 2.301 2.307 0.001 0
ttment2     0.405 0.401 0.408 0.002 0

or supplying starting values:

> summary(m1big_started)
Large data regression model: bigglm(y ~ ttment, data = dat, family = poisson(link = "log"),
     chunksize = 100000, start = c(2, 0))
Sample size =  100000
              Coef  (95%   CI)    SE p
(Intercept) 2.304 2.301 2.307 0.001 0
ttment2     0.405 0.401 0.408 0.002 0


The bug is that you weren't told about the lack of convergence.  There is a flag in the object, but it is only set after the model is converged, so it is not there when convergence fails.

> m1big$converged
NULL
> m1big_longer$converged
[1] TRUE
> m1big_started$converged
[1] TRUE

For the next version I will make sure there is a clear warning when the model hasn't converged.  The default maximum number of iterations is fairly small, by design --- if it isn't working, you want to find out and specify starting values rather than wait for dozens of potentially slow iterations.  This strategy obviously breaks down when you don't notice that failure. :(

      -thomas


On Mon, 16 Mar 2009, Francisco J. Zagmutt wrote:

> Dear all,
>
> I am using the bigglm package to fit a few GLM's to a large dataset (3 million
> rows, 6 columns).  While trying to fit a Poisson GLM I noticed that the
> coefficient estimates were very different from what I obtained when estimating
> the model on a smaller dataset using glm(), I wrote a very basic toy example to
> compare the results of bigglm() against a glm() call.  Consider the following
> code:
>
>
>> require(biglm)
>> options(digits=6, scipen=3, contrasts = c("contr.treatment", "contr.poly"))
>> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)), ttment=gl(2,50000))
>> m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
>> m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
>> summary(m1)
>
> <snipped output for this email>
> Coefficients:
>            Estimate Std. Error z value Pr(>|z|)
> (Intercept)  2.30305    0.00141    1629   <2e-16 ***
> ttment2      0.40429    0.00183     221   <2e-16 ***
>
>    Null deviance: 151889  on 99999  degrees of freedom
> Residual deviance: 101848  on 99998  degrees of freedom
> AIC: 533152
>
>> summary(m1big)
> Large data regression model: bigglm(y ~ ttment, data = dat, family =
> poisson(link = "log"))
> Sample size =  100000
>             Coef  (95%   CI)    SE p
> (Intercept) 2.651 2.650 2.653 0.001 0
> ttment2     4.346 4.344 4.348 0.001 0
>
>> m1big$deviance
> [1] 287158986
>
>
> Notice that the coefficients and deviance are quite different in the model
> estimated using bigglm(). If I change the chunk to seq(1000,10000,1000) the
> estimates remain the same.
>
> Can someone help me understand what is causing these differences?
>
> Here is my version info:
>
>> version
>               _
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          2
> minor          8.1
> year           2008
> month          12
> day            22
> svn rev        47281
> language       R
> version.string R version 2.8.1 (2008-12-22)
>
>
> Many thanks in advance for your help,
>
> Francisco
>
> --
> Francisco J. Zagmutt
> Vose Consulting
> 2891 20th Street
> Boulder, CO, 80304
> USA
> www.voseconsulting.com
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bigglm() results different from glm()

Fox, John
In reply to this post by Francisco J. Zagmutt
Dear Francisco,

I was able to duplicate the problem that you reported, and in addition
discovered that the problem seems to be peculiar to the poisson family.
lm(y~ttment, data=dat) and biglm(y~ttment, data=dat) produce identical
results, as do glm(y~ttment, data=dat) and bigglm(y~ttment, data=dat).
Another example, with the binomial family:

> set.seed(12345)
> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
ttment=gl(2,50000))
> dat$y0 <- ifelse(dat$y > 12, 1, 0)
> m1b <- glm(y0~ttment, data=dat, family=binomial)  
> m1bigb <- bigglm(y0~ttment , data=dat, family=binomial())
> coef(m1b)
(Intercept)     ttment2
   -1.33508     2.34263
> coef(m1bigb)
(Intercept)     ttment2
   -1.33508     2.34263
> deviance(m1b)
[1] 109244
> deviance(m1bigb)  
[1] 109244

I'm copying this message to Thomas Lumley, who's the author and maintainer
of the biglm package.

Regards,
 John


> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
On

> Behalf Of Francisco J. Zagmutt
> Sent: March-16-09 10:26 PM
> To: [hidden email]
> Subject: [R] bigglm() results different from glm()
>
> Dear all,
>
> I am using the bigglm package to fit a few GLM's to a large dataset (3
> million rows, 6 columns).  While trying to fit a Poisson GLM I noticed
> that the coefficient estimates were very different from what I obtained
> when estimating the model on a smaller dataset using glm(), I wrote a
> very basic toy example to compare the results of bigglm() against a
> glm() call.  Consider the following code:
>
>
>  > require(biglm)
>  > options(digits=6, scipen=3, contrasts = c("contr.treatment",
> "contr.poly"))
>  > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
> ttment=gl(2,50000))
>  > m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
>  > m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
>  > summary(m1)
>
> <snipped output for this email>
> Coefficients:
>              Estimate Std. Error z value Pr(>|z|)
> (Intercept)  2.30305    0.00141    1629   <2e-16 ***
> ttment2      0.40429    0.00183     221   <2e-16 ***
>
>      Null deviance: 151889  on 99999  degrees of freedom
> Residual deviance: 101848  on 99998  degrees of freedom
> AIC: 533152
>
>  > summary(m1big)
> Large data regression model: bigglm(y ~ ttment, data = dat, family =
> poisson(link = "log"))
> Sample size =  100000
>               Coef  (95%   CI)    SE p
> (Intercept) 2.651 2.650 2.653 0.001 0
> ttment2     4.346 4.344 4.348 0.001 0
>
>  > m1big$deviance
> [1] 287158986
>
>
> Notice that the coefficients and deviance are quite different in the
> model estimated using bigglm(). If I change the chunk to
> seq(1000,10000,1000) the estimates remain the same.
>
> Can someone help me understand what is causing these differences?
>
> Here is my version info:
>
>  > version
>                 _
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          2
> minor          8.1
> year           2008
> month          12
> day            22
> svn rev        47281
> language       R
> version.string R version 2.8.1 (2008-12-22)
>
>
> Many thanks in advance for your help,
>
> Francisco
>
> --
> Francisco J. Zagmutt
> Vose Consulting
> 2891 20th Street
> Boulder, CO, 80304
> USA
> www.voseconsulting.com
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bigglm() results different from glm()

Thomas Lumley

Yes, the slow convergence is easier to get with the log link.  Overshooting the correct coefficient vector has more dramatic effects on the fitted values and weights, and in this example the starting value of (0,0) is a long way from the truth.   The same sort of thing happens in the Cox model, where there are real data sets that will cause numeric overflow in a careless implementation.

It might be worth trying to guess better starting values: saving an iteration or two would be useful with large data sets.

      -thomas


On Tue, 17 Mar 2009, John Fox wrote:

> Dear Francisco,
>
> I was able to duplicate the problem that you reported, and in addition
> discovered that the problem seems to be peculiar to the poisson family.
> lm(y~ttment, data=dat) and biglm(y~ttment, data=dat) produce identical
> results, as do glm(y~ttment, data=dat) and bigglm(y~ttment, data=dat).
> Another example, with the binomial family:
>
>> set.seed(12345)
>> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
> ttment=gl(2,50000))
>> dat$y0 <- ifelse(dat$y > 12, 1, 0)
>> m1b <- glm(y0~ttment, data=dat, family=binomial)
>> m1bigb <- bigglm(y0~ttment , data=dat, family=binomial())
>> coef(m1b)
> (Intercept)     ttment2
>   -1.33508     2.34263
>> coef(m1bigb)
> (Intercept)     ttment2
>   -1.33508     2.34263
>> deviance(m1b)
> [1] 109244
>> deviance(m1bigb)
> [1] 109244
>
> I'm copying this message to Thomas Lumley, who's the author and maintainer
> of the biglm package.
>
> Regards,
> John
>
>
>> -----Original Message-----
>> From: [hidden email] [mailto:[hidden email]]
> On
>> Behalf Of Francisco J. Zagmutt
>> Sent: March-16-09 10:26 PM
>> To: [hidden email]
>> Subject: [R] bigglm() results different from glm()
>>
>> Dear all,
>>
>> I am using the bigglm package to fit a few GLM's to a large dataset (3
>> million rows, 6 columns).  While trying to fit a Poisson GLM I noticed
>> that the coefficient estimates were very different from what I obtained
>> when estimating the model on a smaller dataset using glm(), I wrote a
>> very basic toy example to compare the results of bigglm() against a
>> glm() call.  Consider the following code:
>>
>>
>> > require(biglm)
>> > options(digits=6, scipen=3, contrasts = c("contr.treatment",
>> "contr.poly"))
>> > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
>> ttment=gl(2,50000))
>> > m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
>> > m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
>> > summary(m1)
>>
>> <snipped output for this email>
>> Coefficients:
>>              Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  2.30305    0.00141    1629   <2e-16 ***
>> ttment2      0.40429    0.00183     221   <2e-16 ***
>>
>>      Null deviance: 151889  on 99999  degrees of freedom
>> Residual deviance: 101848  on 99998  degrees of freedom
>> AIC: 533152
>>
>> > summary(m1big)
>> Large data regression model: bigglm(y ~ ttment, data = dat, family =
>> poisson(link = "log"))
>> Sample size =  100000
>>               Coef  (95%   CI)    SE p
>> (Intercept) 2.651 2.650 2.653 0.001 0
>> ttment2     4.346 4.344 4.348 0.001 0
>>
>> > m1big$deviance
>> [1] 287158986
>>
>>
>> Notice that the coefficients and deviance are quite different in the
>> model estimated using bigglm(). If I change the chunk to
>> seq(1000,10000,1000) the estimates remain the same.
>>
>> Can someone help me understand what is causing these differences?
>>
>> Here is my version info:
>>
>> > version
>>                 _
>> platform       i386-pc-mingw32
>> arch           i386
>> os             mingw32
>> system         i386, mingw32
>> status
>> major          2
>> minor          8.1
>> year           2008
>> month          12
>> day            22
>> svn rev        47281
>> language       R
>> version.string R version 2.8.1 (2008-12-22)
>>
>>
>> Many thanks in advance for your help,
>>
>> Francisco
>>
>> --
>> Francisco J. Zagmutt
>> Vose Consulting
>> 2891 20th Street
>> Boulder, CO, 80304
>> USA
>> www.voseconsulting.com
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bigglm() results different from glm()

Francisco J. Zagmutt
Dear Thomas and John,

Thanks for your prompt reply and for looking at your code to explain
these differences. I see you replied very late at night, so I am sorry
if my little problem kept you awake!

As you pointed out, the model indeed converges (as reported in
model$converged) once I specify a larger number of iterations.

A very minor comment: it seems that the reporting of the estimates in
summary.biglm() is not taking the parameters from options(digits).  For
example, using the same data and models as before:

 > require(biglm)
 > options(digits=8)
 > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
ttment=gl(2,50000))
 > m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
 > m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"),
maxit=20)

 > summary(m1)
<Snipped>
Coefficients:
              Estimate Std. Error z value  Pr(>|z|)
(Intercept) 2.3019509  0.0014147 1627.21 < 2.2e-16 ***
ttment2     0.4052002  0.0018264  221.86 < 2.2e-16 ***

 > summary(m1big)
              Coef  (95%   CI)    SE p
(Intercept) 2.302 2.299 2.305 0.001 0
ttment2     0.405 0.402 0.409 0.002 0


To get more digits I can extract the point estimates using coef(m1big),
but after looking at str(m1big) the only way I could figure to extract
the p-values was:

 > summary(m1big)$mat[,"p"]
(Intercept)     ttment2
           0           0

Is there a way I can get summary.biglm() to report more digits directly?

Thanks again,

Francisco



--
Francisco J. Zagmutt
Vose Consulting
2891 20th Street
Boulder, CO, 80304
USA
www.voseconsulting.com


Thomas Lumley wrote:

>
> Yes, the slow convergence is easier to get with the log link.  
> Overshooting the correct coefficient vector has more dramatic effects on
> the fitted values and weights, and in this example the starting value of
> (0,0) is a long way from the truth.   The same sort of thing happens in
> the Cox model, where there are real data sets that will cause numeric
> overflow in a careless implementation.
>
> It might be worth trying to guess better starting values: saving an
> iteration or two would be useful with large data sets.
>
>      -thomas
>
>
> On Tue, 17 Mar 2009, John Fox wrote:
>
>> Dear Francisco,
>>
>> I was able to duplicate the problem that you reported, and in addition
>> discovered that the problem seems to be peculiar to the poisson family.
>> lm(y~ttment, data=dat) and biglm(y~ttment, data=dat) produce identical
>> results, as do glm(y~ttment, data=dat) and bigglm(y~ttment, data=dat).
>> Another example, with the binomial family:
>>
>>> set.seed(12345)
>>> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
>> ttment=gl(2,50000))
>>> dat$y0 <- ifelse(dat$y > 12, 1, 0)
>>> m1b <- glm(y0~ttment, data=dat, family=binomial)
>>> m1bigb <- bigglm(y0~ttment , data=dat, family=binomial())
>>> coef(m1b)
>> (Intercept)     ttment2
>>   -1.33508     2.34263
>>> coef(m1bigb)
>> (Intercept)     ttment2
>>   -1.33508     2.34263
>>> deviance(m1b)
>> [1] 109244
>>> deviance(m1bigb)
>> [1] 109244
>>
>> I'm copying this message to Thomas Lumley, who's the author and
>> maintainer
>> of the biglm package.
>>
>> Regards,
>> John
>>
>>
>>> -----Original Message-----
>>> From: [hidden email] [mailto:[hidden email]]
>> On
>>> Behalf Of Francisco J. Zagmutt
>>> Sent: March-16-09 10:26 PM
>>> To: [hidden email]
>>> Subject: [R] bigglm() results different from glm()
>>>
>>> Dear all,
>>>
>>> I am using the bigglm package to fit a few GLM's to a large dataset (3
>>> million rows, 6 columns).  While trying to fit a Poisson GLM I noticed
>>> that the coefficient estimates were very different from what I obtained
>>> when estimating the model on a smaller dataset using glm(), I wrote a
>>> very basic toy example to compare the results of bigglm() against a
>>> glm() call.  Consider the following code:
>>>
>>>
>>> > require(biglm)
>>> > options(digits=6, scipen=3, contrasts = c("contr.treatment",
>>> "contr.poly"))
>>> > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
>>> ttment=gl(2,50000))
>>> > m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
>>> > m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
>>> > summary(m1)
>>>
>>> <snipped output for this email>
>>> Coefficients:
>>>              Estimate Std. Error z value Pr(>|z|)
>>> (Intercept)  2.30305    0.00141    1629   <2e-16 ***
>>> ttment2      0.40429    0.00183     221   <2e-16 ***
>>>
>>>      Null deviance: 151889  on 99999  degrees of freedom
>>> Residual deviance: 101848  on 99998  degrees of freedom
>>> AIC: 533152
>>>
>>> > summary(m1big)
>>> Large data regression model: bigglm(y ~ ttment, data = dat, family =
>>> poisson(link = "log"))
>>> Sample size =  100000
>>>               Coef  (95%   CI)    SE p
>>> (Intercept) 2.651 2.650 2.653 0.001 0
>>> ttment2     4.346 4.344 4.348 0.001 0
>>>
>>> > m1big$deviance
>>> [1] 287158986
>>>
>>>
>>> Notice that the coefficients and deviance are quite different in the
>>> model estimated using bigglm(). If I change the chunk to
>>> seq(1000,10000,1000) the estimates remain the same.
>>>
>>> Can someone help me understand what is causing these differences?
>>>
>>> Here is my version info:
>>>
>>> > version
>>>                 _
>>> platform       i386-pc-mingw32
>>> arch           i386
>>> os             mingw32
>>> system         i386, mingw32
>>> status
>>> major          2
>>> minor          8.1
>>> year           2008
>>> month          12
>>> day            22
>>> svn rev        47281
>>> language       R
>>> version.string R version 2.8.1 (2008-12-22)
>>>
>>>
>>> Many thanks in advance for your help,
>>>
>>> Francisco
>>>
>>> --
>>> Francisco J. Zagmutt
>>> Vose Consulting
>>> 2891 20th Street
>>> Boulder, CO, 80304
>>> USA
>>> www.voseconsulting.com
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> Thomas Lumley            Assoc. Professor, Biostatistics
> [hidden email]    University of Washington, Seattle
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bigglm() results different from glm()

Francisco J. Zagmutt
In reply to this post by Thomas Lumley
Dear Thomas and John,

Thanks for your prompt reply and for looking at your code to explain
these differences. I see you replied very late at night, so I am sorry
if my little problem kept you awake!

As you pointed out, the model indeed converges (as reported in
model$converged) once I specify a larger number of iterations.

A very minor comment: it seems that the reporting of the estimates in
summary.biglm() is not taking the parameters from options(digits).  For
example, using the same data and models as before:

 > require(biglm)
 > options(digits=8)
 > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
ttment=gl(2,50000))
 > m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
 > m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"),
maxit=20)

 > summary(m1)
<Snipped>
Coefficients:
              Estimate Std. Error z value  Pr(>|z|)
(Intercept) 2.3019509  0.0014147 1627.21 < 2.2e-16 ***
ttment2     0.4052002  0.0018264  221.86 < 2.2e-16 ***

 > summary(m1big)
              Coef  (95%   CI)    SE p
(Intercept) 2.302 2.299 2.305 0.001 0
ttment2     0.405 0.402 0.409 0.002 0


To get more digits I can extract the point estimates using coef(m1big),
but after looking at str(m1big) the only way I could figure to extract
the p-values was:

 > summary(m1big)$mat[,"p"]
(Intercept)     ttment2
           0           0

Is there a way I can get summary.biglm() to report more digits directly?

Thanks again,

Francisco



--
Francisco J. Zagmutt
Vose Consulting
2891 20th Street
Boulder, CO, 80304
USA
www.voseconsulting.com


Thomas Lumley wrote:

>
> Yes, the slow convergence is easier to get with the log link.  
> Overshooting the correct coefficient vector has more dramatic effects on
> the fitted values and weights, and in this example the starting value of
> (0,0) is a long way from the truth.   The same sort of thing happens in
> the Cox model, where there are real data sets that will cause numeric
> overflow in a careless implementation.
>
> It might be worth trying to guess better starting values: saving an
> iteration or two would be useful with large data sets.
>
>      -thomas
>
>
> On Tue, 17 Mar 2009, John Fox wrote:
>
>> Dear Francisco,
>>
>> I was able to duplicate the problem that you reported, and in addition
>> discovered that the problem seems to be peculiar to the poisson family.
>> lm(y~ttment, data=dat) and biglm(y~ttment, data=dat) produce identical
>> results, as do glm(y~ttment, data=dat) and bigglm(y~ttment, data=dat).
>> Another example, with the binomial family:
>>
>>> set.seed(12345)
>>> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
>> ttment=gl(2,50000))
>>> dat$y0 <- ifelse(dat$y > 12, 1, 0)
>>> m1b <- glm(y0~ttment, data=dat, family=binomial)
>>> m1bigb <- bigglm(y0~ttment , data=dat, family=binomial())
>>> coef(m1b)
>> (Intercept)     ttment2
>>   -1.33508     2.34263
>>> coef(m1bigb)
>> (Intercept)     ttment2
>>   -1.33508     2.34263
>>> deviance(m1b)
>> [1] 109244
>>> deviance(m1bigb)
>> [1] 109244
>>
>> I'm copying this message to Thomas Lumley, who's the author and
>> maintainer
>> of the biglm package.
>>
>> Regards,
>> John
>>
>>
>>> -----Original Message-----
>>> From: [hidden email] [mailto:[hidden email]]
>> On
>>> Behalf Of Francisco J. Zagmutt
>>> Sent: March-16-09 10:26 PM
>>> To: [hidden email]
>>> Subject: [R] bigglm() results different from glm()
>>>
>>> Dear all,
>>>
>>> I am using the bigglm package to fit a few GLM's to a large dataset (3
>>> million rows, 6 columns).  While trying to fit a Poisson GLM I noticed
>>> that the coefficient estimates were very different from what I obtained
>>> when estimating the model on a smaller dataset using glm(), I wrote a
>>> very basic toy example to compare the results of bigglm() against a
>>> glm() call.  Consider the following code:
>>>
>>>
>>> > require(biglm)
>>> > options(digits=6, scipen=3, contrasts = c("contr.treatment",
>>> "contr.poly"))
>>> > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
>>> ttment=gl(2,50000))
>>> > m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
>>> > m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
>>> > summary(m1)
>>>
>>> <snipped output for this email>
>>> Coefficients:
>>>              Estimate Std. Error z value Pr(>|z|)
>>> (Intercept)  2.30305    0.00141    1629   <2e-16 ***
>>> ttment2      0.40429    0.00183     221   <2e-16 ***
>>>
>>>      Null deviance: 151889  on 99999  degrees of freedom
>>> Residual deviance: 101848  on 99998  degrees of freedom
>>> AIC: 533152
>>>
>>> > summary(m1big)
>>> Large data regression model: bigglm(y ~ ttment, data = dat, family =
>>> poisson(link = "log"))
>>> Sample size =  100000
>>>               Coef  (95%   CI)    SE p
>>> (Intercept) 2.651 2.650 2.653 0.001 0
>>> ttment2     4.346 4.344 4.348 0.001 0
>>>
>>> > m1big$deviance
>>> [1] 287158986
>>>
>>>
>>> Notice that the coefficients and deviance are quite different in the
>>> model estimated using bigglm(). If I change the chunk to
>>> seq(1000,10000,1000) the estimates remain the same.
>>>
>>> Can someone help me understand what is causing these differences?
>>>
>>> Here is my version info:
>>>
>>> > version
>>>                 _
>>> platform       i386-pc-mingw32
>>> arch           i386
>>> os             mingw32
>>> system         i386, mingw32
>>> status
>>> major          2
>>> minor          8.1
>>> year           2008
>>> month          12
>>> day            22
>>> svn rev        47281
>>> language       R
>>> version.string R version 2.8.1 (2008-12-22)
>>>
>>>
>>> Many thanks in advance for your help,
>>>
>>> Francisco
>>>
>>> --
>>> Francisco J. Zagmutt
>>> Vose Consulting
>>> 2891 20th Street
>>> Boulder, CO, 80304
>>> USA
>>> www.voseconsulting.com
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> Thomas Lumley            Assoc. Professor, Biostatistics
> [hidden email]    University of Washington, Seattle
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bigglm() results different from glm()

Thomas Lumley

There's a digits= argument to the print method.

> a <- bigglm(ff,data=trees, chunksize=10, sandwich=TRUE)
> print(summary(a),digits=5)
Large data regression model: bigglm(ff, data = trees, chunksize = 10, sandwich = TRUE)
Sample size =  31
                 Coef     (95%      CI)      SE p
(Intercept) -6.63162 -8.08729 -5.17595 0.72783 0
log(Girth)   1.98265  1.87132  2.09398 0.05567 0
log(Height)  1.11712  0.73339  1.50085 0.19186 0
Sandwich (model-robust) standard errors


I suppose I should make it take its default from options(digits)-3 or something.

      -thomas


On Tue, 17 Mar 2009, Francisco J. Zagmutt wrote:

> Dear Thomas and John,
>
> Thanks for your prompt reply and for looking at your code to explain these
> differences. I see you replied very late at night, so I am sorry if my little
> problem kept you awake!
>
> As you pointed out, the model indeed converges (as reported in model$converged)
> once I specify a larger number of iterations.
>
> A very minor comment: it seems that the reporting of the estimates in
> summary.biglm() is not taking the parameters from options(digits).  For
> example, using the same data and models as before:
>
>> require(biglm)
>> options(digits=8)
>> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)), ttment=gl(2,50000))
>> m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
>> m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"), maxit=20)
>
>> summary(m1)
> <Snipped>
> Coefficients:
>             Estimate Std. Error z value  Pr(>|z|)
> (Intercept) 2.3019509  0.0014147 1627.21 < 2.2e-16 ***
> ttment2     0.4052002  0.0018264  221.86 < 2.2e-16 ***
>
>> summary(m1big)
>             Coef  (95%   CI)    SE p
> (Intercept) 2.302 2.299 2.305 0.001 0
> ttment2     0.405 0.402 0.409 0.002 0
>
>
> To get more digits I can extract the point estimates using coef(m1big), but
> after looking at str(m1big) the only way I could figure to extract the p-values
> was:
>
>> summary(m1big)$mat[,"p"]
> (Intercept)     ttment2
>          0           0
>
> Is there a way I can get summary.biglm() to report more digits directly?
>
> Thanks again,
>
> Francisco
>
>
>
> --
> Francisco J. Zagmutt
> Vose Consulting
> 2891 20th Street
> Boulder, CO, 80304
> USA
> www.voseconsulting.com
>
>
> Thomas Lumley wrote:
>>
>> Yes, the slow convergence is easier to get with the log link.  Overshooting
>> the correct coefficient vector has more dramatic effects on the fitted
>> values and weights, and in this example the starting value of (0,0) is a
>> long way from the truth.   The same sort of thing happens in the Cox model,
>> where there are real data sets that will cause numeric overflow in a
>> careless implementation.
>>
>> It might be worth trying to guess better starting values: saving an
>> iteration or two would be useful with large data sets.
>>
>>      -thomas
>>
>>
>> On Tue, 17 Mar 2009, John Fox wrote:
>>
>>> Dear Francisco,
>>>
>>> I was able to duplicate the problem that you reported, and in addition
>>> discovered that the problem seems to be peculiar to the poisson family.
>>> lm(y~ttment, data=dat) and biglm(y~ttment, data=dat) produce identical
>>> results, as do glm(y~ttment, data=dat) and bigglm(y~ttment, data=dat).
>>> Another example, with the binomial family:
>>>
>>>> set.seed(12345)
>>>> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
>>> ttment=gl(2,50000))
>>>> dat$y0 <- ifelse(dat$y > 12, 1, 0)
>>>> m1b <- glm(y0~ttment, data=dat, family=binomial)
>>>> m1bigb <- bigglm(y0~ttment , data=dat, family=binomial())
>>>> coef(m1b)
>>> (Intercept)     ttment2
>>>   -1.33508     2.34263
>>>> coef(m1bigb)
>>> (Intercept)     ttment2
>>>   -1.33508     2.34263
>>>> deviance(m1b)
>>> [1] 109244
>>>> deviance(m1bigb)
>>> [1] 109244
>>>
>>> I'm copying this message to Thomas Lumley, who's the author and maintainer
>>> of the biglm package.
>>>
>>> Regards,
>>> John
>>>
>>>
>>>> -----Original Message-----
>>>> From: [hidden email] [mailto:[hidden email]]
>>> On
>>>> Behalf Of Francisco J. Zagmutt
>>>> Sent: March-16-09 10:26 PM
>>>> To: [hidden email]
>>>> Subject: [R] bigglm() results different from glm()
>>>>
>>>> Dear all,
>>>>
>>>> I am using the bigglm package to fit a few GLM's to a large dataset (3
>>>> million rows, 6 columns).  While trying to fit a Poisson GLM I noticed
>>>> that the coefficient estimates were very different from what I obtained
>>>> when estimating the model on a smaller dataset using glm(), I wrote a
>>>> very basic toy example to compare the results of bigglm() against a
>>>> glm() call.  Consider the following code:
>>>>
>>>>
>>>> > require(biglm)
>>>> > options(digits=6, scipen=3, contrasts = c("contr.treatment",
>>>> "contr.poly"))
>>>> > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
>>>> ttment=gl(2,50000))
>>>> > m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
>>>> > m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
>>>> > summary(m1)
>>>>
>>>> <snipped output for this email>
>>>> Coefficients:
>>>>              Estimate Std. Error z value Pr(>|z|)
>>>> (Intercept)  2.30305    0.00141    1629   <2e-16 ***
>>>> ttment2      0.40429    0.00183     221   <2e-16 ***
>>>>
>>>>      Null deviance: 151889  on 99999  degrees of freedom
>>>> Residual deviance: 101848  on 99998  degrees of freedom
>>>> AIC: 533152
>>>>
>>>> > summary(m1big)
>>>> Large data regression model: bigglm(y ~ ttment, data = dat, family =
>>>> poisson(link = "log"))
>>>> Sample size =  100000
>>>>               Coef  (95%   CI)    SE p
>>>> (Intercept) 2.651 2.650 2.653 0.001 0
>>>> ttment2     4.346 4.344 4.348 0.001 0
>>>>
>>>> > m1big$deviance
>>>> [1] 287158986
>>>>
>>>>
>>>> Notice that the coefficients and deviance are quite different in the
>>>> model estimated using bigglm(). If I change the chunk to
>>>> seq(1000,10000,1000) the estimates remain the same.
>>>>
>>>> Can someone help me understand what is causing these differences?
>>>>
>>>> Here is my version info:
>>>>
>>>> > version
>>>>                 _
>>>> platform       i386-pc-mingw32
>>>> arch           i386
>>>> os             mingw32
>>>> system         i386, mingw32
>>>> status
>>>> major          2
>>>> minor          8.1
>>>> year           2008
>>>> month          12
>>>> day            22
>>>> svn rev        47281
>>>> language       R
>>>> version.string R version 2.8.1 (2008-12-22)
>>>>
>>>>
>>>> Many thanks in advance for your help,
>>>>
>>>> Francisco
>>>>
>>>> -- Francisco J. Zagmutt
>>>> Vose Consulting
>>>> 2891 20th Street
>>>> Boulder, CO, 80304
>>>> USA
>>>> www.voseconsulting.com
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> Thomas Lumley            Assoc. Professor, Biostatistics
>> [hidden email]    University of Washington, Seattle
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bigglm() results different from glm()

Francisco J. Zagmutt
That's a much cleaner solution!  It would be nice if biglm takes the
defaults from options(digits), but off course we can also just use
print() as you pointed out.

Thanks again for your replies and for making this library available to
the community.

Francisco

--
Francisco J. Zagmutt
Vose Consulting
2891 20th Street
Boulder, CO, 80304
USA
www.voseconsulting.com



Thomas Lumley wrote:

>
> There's a digits= argument to the print method.
>
>> a <- bigglm(ff,data=trees, chunksize=10, sandwich=TRUE)
>> print(summary(a),digits=5)
> Large data regression model: bigglm(ff, data = trees, chunksize = 10,
> sandwich = TRUE)
> Sample size =  31
>                 Coef     (95%      CI)      SE p
> (Intercept) -6.63162 -8.08729 -5.17595 0.72783 0
> log(Girth)   1.98265  1.87132  2.09398 0.05567 0
> log(Height)  1.11712  0.73339  1.50085 0.19186 0
> Sandwich (model-robust) standard errors
>
>
> I suppose I should make it take its default from options(digits)-3 or
> something.
>
>      -thomas
>
>
> On Tue, 17 Mar 2009, Francisco J. Zagmutt wrote:
>
>> Dear Thomas and John,
>>
>> Thanks for your prompt reply and for looking at your code to explain
>> these differences. I see you replied very late at night, so I am sorry
>> if my little problem kept you awake!
>>
>> As you pointed out, the model indeed converges (as reported in
>> model$converged) once I specify a larger number of iterations.
>>
>> A very minor comment: it seems that the reporting of the estimates in
>> summary.biglm() is not taking the parameters from options(digits).  
>> For example, using the same data and models as before:
>>
>>> require(biglm)
>>> options(digits=8)
>>> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
>>> ttment=gl(2,50000))
>>> m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
>>> m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"),
>>> maxit=20)
>>
>>> summary(m1)
>> <Snipped>
>> Coefficients:
>>             Estimate Std. Error z value  Pr(>|z|)
>> (Intercept) 2.3019509  0.0014147 1627.21 < 2.2e-16 ***
>> ttment2     0.4052002  0.0018264  221.86 < 2.2e-16 ***
>>
>>> summary(m1big)
>>             Coef  (95%   CI)    SE p
>> (Intercept) 2.302 2.299 2.305 0.001 0
>> ttment2     0.405 0.402 0.409 0.002 0
>>
>>
>> To get more digits I can extract the point estimates using
>> coef(m1big), but after looking at str(m1big) the only way I could
>> figure to extract the p-values was:
>>
>>> summary(m1big)$mat[,"p"]
>> (Intercept)     ttment2
>>          0           0
>>
>> Is there a way I can get summary.biglm() to report more digits directly?
>>
>> Thanks again,
>>
>> Francisco
>>
>>
>>
>> --
>> Francisco J. Zagmutt
>> Vose Consulting
>> 2891 20th Street
>> Boulder, CO, 80304
>> USA
>> www.voseconsulting.com
>>
>>
>> Thomas Lumley wrote:
>>>
>>> Yes, the slow convergence is easier to get with the log link.  
>>> Overshooting the correct coefficient vector has more dramatic effects
>>> on the fitted values and weights, and in this example the starting
>>> value of (0,0) is a long way from the truth.   The same sort of thing
>>> happens in the Cox model, where there are real data sets that will
>>> cause numeric overflow in a careless implementation.
>>>
>>> It might be worth trying to guess better starting values: saving an
>>> iteration or two would be useful with large data sets.
>>>
>>>      -thomas
>>>
>>>
>>> On Tue, 17 Mar 2009, John Fox wrote:
>>>
>>>> Dear Francisco,
>>>>
>>>> I was able to duplicate the problem that you reported, and in addition
>>>> discovered that the problem seems to be peculiar to the poisson family.
>>>> lm(y~ttment, data=dat) and biglm(y~ttment, data=dat) produce identical
>>>> results, as do glm(y~ttment, data=dat) and bigglm(y~ttment, data=dat).
>>>> Another example, with the binomial family:
>>>>
>>>>> set.seed(12345)
>>>>> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
>>>> ttment=gl(2,50000))
>>>>> dat$y0 <- ifelse(dat$y > 12, 1, 0)
>>>>> m1b <- glm(y0~ttment, data=dat, family=binomial)
>>>>> m1bigb <- bigglm(y0~ttment , data=dat, family=binomial())
>>>>> coef(m1b)
>>>> (Intercept)     ttment2
>>>>   -1.33508     2.34263
>>>>> coef(m1bigb)
>>>> (Intercept)     ttment2
>>>>   -1.33508     2.34263
>>>>> deviance(m1b)
>>>> [1] 109244
>>>>> deviance(m1bigb)
>>>> [1] 109244
>>>>
>>>> I'm copying this message to Thomas Lumley, who's the author and
>>>> maintainer
>>>> of the biglm package.
>>>>
>>>> Regards,
>>>> John
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: [hidden email]
>>>>> [mailto:[hidden email]]
>>>> On
>>>>> Behalf Of Francisco J. Zagmutt
>>>>> Sent: March-16-09 10:26 PM
>>>>> To: [hidden email]
>>>>> Subject: [R] bigglm() results different from glm()
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I am using the bigglm package to fit a few GLM's to a large dataset (3
>>>>> million rows, 6 columns).  While trying to fit a Poisson GLM I noticed
>>>>> that the coefficient estimates were very different from what I
>>>>> obtained
>>>>> when estimating the model on a smaller dataset using glm(), I wrote a
>>>>> very basic toy example to compare the results of bigglm() against a
>>>>> glm() call.  Consider the following code:
>>>>>
>>>>>
>>>>> > require(biglm)
>>>>> > options(digits=6, scipen=3, contrasts = c("contr.treatment",
>>>>> "contr.poly"))
>>>>> > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
>>>>> ttment=gl(2,50000))
>>>>> > m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
>>>>> > m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
>>>>> > summary(m1)
>>>>>
>>>>> <snipped output for this email>
>>>>> Coefficients:
>>>>>              Estimate Std. Error z value Pr(>|z|)
>>>>> (Intercept)  2.30305    0.00141    1629   <2e-16 ***
>>>>> ttment2      0.40429    0.00183     221   <2e-16 ***
>>>>>
>>>>>      Null deviance: 151889  on 99999  degrees of freedom
>>>>> Residual deviance: 101848  on 99998  degrees of freedom
>>>>> AIC: 533152
>>>>>
>>>>> > summary(m1big)
>>>>> Large data regression model: bigglm(y ~ ttment, data = dat, family =
>>>>> poisson(link = "log"))
>>>>> Sample size =  100000
>>>>>               Coef  (95%   CI)    SE p
>>>>> (Intercept) 2.651 2.650 2.653 0.001 0
>>>>> ttment2     4.346 4.344 4.348 0.001 0
>>>>>
>>>>> > m1big$deviance
>>>>> [1] 287158986
>>>>>
>>>>>
>>>>> Notice that the coefficients and deviance are quite different in the
>>>>> model estimated using bigglm(). If I change the chunk to
>>>>> seq(1000,10000,1000) the estimates remain the same.
>>>>>
>>>>> Can someone help me understand what is causing these differences?
>>>>>
>>>>> Here is my version info:
>>>>>
>>>>> > version
>>>>>                 _
>>>>> platform       i386-pc-mingw32
>>>>> arch           i386
>>>>> os             mingw32
>>>>> system         i386, mingw32
>>>>> status
>>>>> major          2
>>>>> minor          8.1
>>>>> year           2008
>>>>> month          12
>>>>> day            22
>>>>> svn rev        47281
>>>>> language       R
>>>>> version.string R version 2.8.1 (2008-12-22)
>>>>>
>>>>>
>>>>> Many thanks in advance for your help,
>>>>>
>>>>> Francisco
>>>>>
>>>>> -- Francisco J. Zagmutt
>>>>> Vose Consulting
>>>>> 2891 20th Street
>>>>> Boulder, CO, 80304
>>>>> USA
>>>>> www.voseconsulting.com
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> Thomas Lumley            Assoc. Professor, Biostatistics
>>> [hidden email]    University of Washington, Seattle
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
> Thomas Lumley            Assoc. Professor, Biostatistics
> [hidden email]    University of Washington, Seattle
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.