Using response variable in interaction as explanatory variable in glm crashes R

Jan van der Laan

The following code crashes R (I know I shouldn't try to estimate such a
model; this was a bug in some code of mine). I also tried with R-devel;
same result.


tab <- structure(list(dob_day = c(FALSE, FALSE, FALSE, FALSE, TRUE,
TRUE, TRUE, TRUE), dob_mon = c(FALSE, FALSE, TRUE, TRUE, FALSE,
FALSE, TRUE, TRUE), dob_year = c(FALSE, TRUE, FALSE, TRUE, FALSE,
TRUE, FALSE, TRUE), n = c(1489634L, 17491L, 134985L, 1639L, 47892L,
611L, 4365L, 750L), pred1 = c(1488301, 18187, 135605, 1657, 48547,
593, 4423, 54)), .Names = c("dob_day", "dob_mon", "dob_year",
"n", "pred1"), row.names = c(NA, -8L), class = "data.frame")

m <- glm(dob_mon ~ dob_day*dob_mon, data = tab, family = binomial())


The crash doesn't occur when the variables are added just as main effects
(dob_day+dob_mon): this results in a warning and the removal of dob_mon
from the formula.

--

Jan



 > R.version
                _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          4.1
year           2017
month          06
day            30
svn rev        72865
language       R
version.string R version 3.4.1 (2017-06-30)
nickname       Single Candle

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Using response variable in interaction as explanatory variable in glm crashes R

Jan van der Laan
It is actually model.matrix that crashes, not glm. Same crash occurs
with e.g. lm.

model.matrix(dob_mon ~ dob_day*dob_mon, data = tab)

also crashes R.

Jan



On 06-10-17 12:08, Jan van der Laan wrote:

> [...]

Re: Using response variable in interaction as explanatory variable in glm crashes R

Martin Maechler
>>>>> Jan van der Laan <[hidden email]>
>>>>>     on Fri, 6 Oct 2017 12:13:39 +0200 writes:

    > It is actually model.matrix that crashes, not glm. Same
    > crash occurs with e.g. lm.

    > model.matrix(dob_mon ~ dob_day*dob_mon, data = tab)

    > also crashes R.

Yes, segmentation fault.

It only happens when these are *logical* variables, not, e.g., when
transformed to integer.

The C code in src/library/stats/src/model.c  tries to eliminate
occurrences of the LHS of the formula from the RHS when building
the model matrix, and it does work fine in the integer case.

Part of the culprit code may be this (from line 717),
with the  isLogical(.) which in our case, shifts the pointer by
1  in the call to firstfactor() :

                        int adj = isLogical(var_i)?1:0;
                        // avoid overflow of jstart * nn PR#15578
                        firstfactor(&rx[jstart * nn], n, jnext - jstart,
                                    REAL(contrast), nrows(contrast),
                                    ncols(contrast), INTEGER(var_i)+adj);

then in firstfactor(), we see the segfault (when running R with
'-d gdb') :

    > model.matrix(dob_mon ~ dob_day*dob_mon, data = tab)

  Program received signal SIGSEGV, Segmentation fault.
  0x00007fffeafa76b5 in firstfactor (ncx=0, v=0x5c3b37c, ncc=1, nrc=2, c=0x5c90008,
   nrx=8, x=0x5cbf150) at ../../../../../R/src/library/stats/src/model.c:252
    252    else xj[i] = cj[v[i]-1];
    Missing separate debuginfos, .................
    (gdb) list
    247    for (int j = 0; j < ncc; j++) {
    248 xj = &x[j * (R_xlen_t)nrx];
    249 cj = &c[j * (R_xlen_t)nrc];
    250 for (int i = 0; i < nrx; i++)
    251    if(v[i] == NA_INTEGER) xj[i] = NA_REAL;
    252    else xj[i] = cj[v[i]-1];
    253    }
    254 }
    255

and indeed in the debugger,  i=7  and  v[i] is "outside", v[]
being of length 7, hence indexed 0:6.
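
To make the arithmetic concrete, here is a minimal C sketch of the effect
described above. It is not the R source: reads_past_end() and fill_column()
are hypothetical helpers written only for illustration. The first counts how
many iterations of a loop over nrx elements fall outside a buffer once the
base pointer has been advanced. The second assumes that the intent of adj was
to map logical 0/1 values onto contrast rows, and applies the adjustment to
the value rather than to the pointer; whether that is the right change for
model.c is an assumption.

```c
#include <assert.h>

/* Hypothetical helper, not part of R: count the loop iterations i in
   [0, nrx) that would read past a buffer of `len` ints when the base
   pointer has been advanced by `shift` elements. */
static int reads_past_end(int len, int nrx, int shift)
{
    assert(len >= 0 && nrx >= 0 && shift >= 0);
    int bad = 0;
    for (int i = 0; i < nrx; i++)
        if (i + shift >= len) bad++;
    return bad;
}

/* Hypothetical variant of the inner loop of firstfactor() for a single
   contrast column: the adjustment is applied to the VALUE v[i], not to
   the pointer v, so all nrx elements of v stay in range.
   Returns the number of rows whose contrast index was out of range;
   those rows are left untouched instead of being read out of bounds.
   NA handling is omitted for brevity. */
static int fill_column(double *xj, int nrx, const double *cj, int nrc,
                       const int *v, int adj)
{
    int skipped = 0;
    for (int i = 0; i < nrx; i++) {
        int k = v[i] - 1 + adj;   /* logical 0/1 with adj = 1 gives 0/1 */
        if (k < 0 || k >= nrc) { skipped++; continue; }
        xj[i] = cj[k];
    }
    return skipped;
}
```

With a buffer of 8 ints and nrx = 8, reads_past_end(8, 8, 1) reports one read
past the end (the one at i = 7), while reads_past_end(8, 8, 0) reports none.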


Re: Using response variable in interaction as explanatory variable in glm crashes R

Scott Kostyshak
On Mon, Oct 09, 2017 at 03:52:43PM +0000, Martin Maechler wrote:

> [...]

Dear Martin,

I just wanted to thank you for providing details on your approach to
debugging. Often I see bug fixes and I wonder "how the heck did they
figure that out?" so I am very excited when I see details like these on
the process (and not just the end result), so that I can learn.

Best,

Scott


--
Scott Kostyshak
Assistant Professor of Economics
University of Florida
https://people.clas.ufl.edu/skostyshak/
