How to plot marginal effects (MEM) in R?

classic Classic list List threaded Threaded
4 messages Options
F86
Reply | Threaded
Open this post in threaded view
|

How to plot marginal effects (MEM) in R?

F86
Dear all,

I have two logistic regression models:


   • model <- glm(Y ~ X1+X2+X3+X4, data = data, family = "binomial")



   • modelInteraction <- glm(Y ~ X1+X2+X3+X4+X1*X4, data = data, family = "binomial")

To calculate the marginal effects (MEM approach) for these models, I used the `mfx` package:


   • a<- logitmfx(model, data=data, atmean=TRUE)



    •b<- logitmfx(modelInteraction, data=data, atmean=TRUE)


What I want to do now is 1) plot all the results for "model" and 2) show the result just for two variables: X1 and X2.
3) I also want to plot the interaction term in ”modelInteraction”.


I have been looking around for the solutions but haven't been able to find any. I would appreciate any suggestions.

A reproducible sample:

> dput(data)
structure(list(Y = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X1 = c(1L, 0L, 1L,
0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L,
1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 0L), X2 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X3 = c(0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 2L, 2L, 3L, 4L, 5L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
), X4 = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L)), .Names = c("Y", "X1", "X2",
"X3", "X4"), row.names = c(NA, -69L), class = "data.frame")




        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to plot marginal effects (MEM) in R?

David Winsemius

> On Jul 21, 2016, at 2:22 PM, Faradj Koliev <[hidden email]> wrote:
>
> Dear all,
>
> I have two logistic regression models:
>
>
>   • model <- glm(Y ~ X1+X2+X3+X4, data = data, family = "binomial")
>
>
>
>   • modelInteraction <- glm(Y ~ X1+X2+X3+X4+X1*X4, data = data, family = "binomial")
>
> To calculate the marginal effects (MEM approach) for these models, I used the `mfx` package:
>
>
>   • a<- logitmfx(model, data=data, atmean=TRUE)
>
>
>
>    •b<- logitmfx(modelInteraction, data=data, atmean=TRUE)
>
>
> What I want to do now is 1) plot all the results for "model" and 2) show the result just for two variables: X1 and X2.
> 3) I also want to plot the interaction term in ”modelInteraction”.

There is no longer a single "effect" for X1 in modelInteraction in contrast to the manner as there might be an "effect" for X2. There can only be predictions for combined situations with particular combinations of values for X1 and X4.

> model

Call:  glm(formula = Y ~ X1 + X2 + X3 + X4, family = "binomial", data = data)

Coefficients:
(Intercept)           X1           X2           X3           X4  
    -0.3601       1.3353       0.1056       0.2898      -0.3705  

Degrees of Freedom: 68 Total (i.e. Null);  64 Residual
Null Deviance:    66.78
Residual Deviance: 62.27 AIC: 72.27


> modelInteraction

Call:  glm(formula = Y ~ X1 + X2 + X3 + X4 + X1 * X4, family = "binomial",
    data = data)

Coefficients:
(Intercept)           X1           X2           X3           X4        X1:X4  
    90.0158     -90.0747       0.1183       0.3064     -15.3688      15.1593  

Degrees of Freedom: 68 Total (i.e. Null);  63 Residual
Null Deviance:    66.78
Residual Deviance: 61.49 AIC: 73.49

Notice that a naive attempt to plot an X1  "effect" in modelInteraction might pick the -90.07 value which would then ignore both the much larger Intercept value and also ignore the fact that the interaction term has now split the X4 (and X1) "effects" into multiple pieces.

You need to interpret the effects of X1 in the context of a specification of a particular X4 value and not forget that the Intercept should not be ignored. It appears to me that the estimates of the mfx package are essentially meaningless with the problem you have thrown at it.

> a
Call:
logitmfx(formula = model, data = data, atmean = TRUE)

Marginal Effects:
       dF/dx Std. Err.       z   P>|z|  
X1  0.147532  0.087865  1.6791 0.09314 .
X2  0.015085  0.193888  0.0778 0.93798  
X3  0.040309  0.063324  0.6366 0.52441  
X4 -0.050393  0.092947 -0.5422 0.58770  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

dF/dx is for discrete change for the following variables:

[1] "X1" "X2" "X4"
> b
Call:
logitmfx(formula = modelInteraction, data = data, atmean = TRUE)

Marginal Effects:
            dF/dx   Std. Err.         z  P>|z|    
X1    -1.0000e+00  1.2121e-07 -8.25e+06 <2e-16 ***
X2     6.5595e-03  8.1616e-01  8.00e-03 0.9936    
X3     1.6312e-02  2.0326e+00  8.00e-03 0.9936    
X4    -9.6831e-01  1.5806e+01 -6.13e-02 0.9511    
X1:X4  8.0703e-01  1.4572e+01  5.54e-02 0.9558    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

dF/dx is for discrete change for the following variables:

[1] "X1" "X2" "X4"

I see no sensible interpretation of the phrase "X1 effect" in the comparison tables above. The "p-value" in the second table appears to be nonsense induced by throwing a model formulation that was not anticipated. There is a negligible improvement in the glm fits:

> anova(model,modelInteraction)
Analysis of Deviance Table

Model 1: Y ~ X1 + X2 + X3 + X4
Model 2: Y ~ X1 + X2 + X3 + X4 + X1 * X4
  Resid. Df Resid. Dev Df Deviance
1        64     62.274            
2        63     61.495  1  0.77908


So the notion that the "X1 effect" is now "highly significant" where it was before not even suggestive of significance seem to point to either an error in the underlying theory or a failure to anticipate and trap (and warn the user) that an erroneous model (or at least an unanticipated model) is being passed to a procedure.

At least the 'effects- package gives you a tiny warning about this issue, although I think it really should throw an informative error when a user attempts to estimate only a "main effect" in a model that has an interaction involving such a covariate:

> library(effects)

> effect('X1', model)

 X1 effect
X1
         0        0.2        0.4        0.6        0.8          1
0.06706123 0.08582757 0.10923061 0.13805139 0.17299973 0.21459275
> effect('X1', modelInteraction)
NOTE: X1 is not a high-order term in the model

 X1 effect
X1
           0          0.2          0.4          0.6          0.8            1
0.0002418661 0.0009864740 0.0040142251 0.0161843996 0.0629206979 0.2151098752
> effect('X1:X4', modelInteraction)

 X1*X4 effect
     X4
X1            6         6.2          6.4          6.6          6.8            7
  0   0.1100479 0.005686142 0.0002643982 1.223058e-05 5.656287e-07 2.615838e-08
  0.2 0.1285241 0.012352473 0.0010595321 8.994106e-05 7.628099e-06 6.469071e-07
  0.4 0.1495811 0.026625017 0.0042357682 6.610806e-04 1.028639e-04 1.599803e-05
  0.6 0.1734015 0.056446132 0.0167737545 4.841483e-03 1.385458e-03 3.954877e-04
  0.8 0.2001225 0.115698003 0.0640377838 3.454327e-02 1.836677e-02 9.689636e-03
  1   0.2298165 0.222481442 0.2153150766 2.083177e-01 2.014893e-01 1.948297e-01

--
David.



>
>
> I have been looking around for the solutions but haven't been able to find any. I would appreciate any suggestions.
>
> A reproducible sample:
>
>> dput(data)
> structure(list(Y = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
> 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
> 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
> 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X1 = c(1L, 0L, 1L,
> 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L,
> 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
> 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 0L), X2 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X3 = c(0L, 0L, 0L, 0L, 0L,
> 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 2L, 2L, 3L, 4L, 5L, 0L, 0L,
> 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
> ), X4 = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
> 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 6L, 6L, 6L, 6L, 6L, 6L,
> 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
> 7L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L,
> 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L)), .Names = c("Y", "X1", "X2",
> "X3", "X4"), row.names = c(NA, -69L), class = "data.frame")
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
F86
Reply | Threaded
Open this post in threaded view
|

Re: How to plot marginal effects (MEM) in R?

F86
Dear David Winsemius,

Thank you!  

The sample make no sense, I know. The real data is too big. So, I only want to understand how to plot marginal effects, to visualize them in a proper way.

Best,


> 22 juli 2016 kl. 08:35 skrev David Winsemius <[hidden email]>:
>
>>
>> On Jul 21, 2016, at 2:22 PM, Faradj Koliev <[hidden email]> wrote:
>>
>> Dear all,
>>
>> I have two logistic regression models:
>>
>>
>>  • model <- glm(Y ~ X1+X2+X3+X4, data = data, family = "binomial")
>>
>>
>>
>>  • modelInteraction <- glm(Y ~ X1+X2+X3+X4+X1*X4, data = data, family = "binomial")
>>
>> To calculate the marginal effects (MEM approach) for these models, I used the `mfx` package:
>>
>>
>>  • a<- logitmfx(model, data=data, atmean=TRUE)
>>
>>
>>
>>   •b<- logitmfx(modelInteraction, data=data, atmean=TRUE)
>>
>>
>> What I want to do now is 1) plot all the results for "model" and 2) show the result just for two variables: X1 and X2.
>> 3) I also want to plot the interaction term in ”modelInteraction”.
>
> There is no longer a single "effect" for X1 in modelInteraction in contrast to the manner as there might be an "effect" for X2. There can only be predictions for combined situations with particular combinations of values for X1 and X4.
>
>> model
>
> Call:  glm(formula = Y ~ X1 + X2 + X3 + X4, family = "binomial", data = data)
>
> Coefficients:
> (Intercept)           X1           X2           X3           X4  
>    -0.3601       1.3353       0.1056       0.2898      -0.3705  
>
> Degrees of Freedom: 68 Total (i.e. Null);  64 Residual
> Null Deviance:    66.78
> Residual Deviance: 62.27 AIC: 72.27
>
>
>> modelInteraction
>
> Call:  glm(formula = Y ~ X1 + X2 + X3 + X4 + X1 * X4, family = "binomial",
>    data = data)
>
> Coefficients:
> (Intercept)           X1           X2           X3           X4        X1:X4  
>    90.0158     -90.0747       0.1183       0.3064     -15.3688      15.1593  
>
> Degrees of Freedom: 68 Total (i.e. Null);  63 Residual
> Null Deviance:    66.78
> Residual Deviance: 61.49 AIC: 73.49
>
> Notice that a naive attempt to plot an X1  "effect" in modelInteraction might pick the -90.07 value which would then ignore both the much larger Intercept value and also ignore the fact that the interaction term has now split the X4 (and X1) "effects" into multiple pieces.
>
> You need to interpret the effects of X1 in the context of a specification of a particular X4 value and not forget that the Intercept should not be ignored. It appears to me that the estimates of the mfx package are essentially meaningless with the problem you have thrown at it.
>
>> a
> Call:
> logitmfx(formula = model, data = data, atmean = TRUE)
>
> Marginal Effects:
>       dF/dx Std. Err.       z   P>|z|  
> X1  0.147532  0.087865  1.6791 0.09314 .
> X2  0.015085  0.193888  0.0778 0.93798  
> X3  0.040309  0.063324  0.6366 0.52441  
> X4 -0.050393  0.092947 -0.5422 0.58770  
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> dF/dx is for discrete change for the following variables:
>
> [1] "X1" "X2" "X4"
>> b
> Call:
> logitmfx(formula = modelInteraction, data = data, atmean = TRUE)
>
> Marginal Effects:
>            dF/dx   Std. Err.         z  P>|z|    
> X1    -1.0000e+00  1.2121e-07 -8.25e+06 <2e-16 ***
> X2     6.5595e-03  8.1616e-01  8.00e-03 0.9936    
> X3     1.6312e-02  2.0326e+00  8.00e-03 0.9936    
> X4    -9.6831e-01  1.5806e+01 -6.13e-02 0.9511    
> X1:X4  8.0703e-01  1.4572e+01  5.54e-02 0.9558    
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> dF/dx is for discrete change for the following variables:
>
> [1] "X1" "X2" "X4"
>
> I see no sensible interpretation of the phrase "X1 effect" in the comparison tables above. The "p-value" in the second table appears to be nonsense induced by throwing a model formulation that was not anticipated. There is a negligible improvement in the glm fits:
>
>> anova(model,modelInteraction)
> Analysis of Deviance Table
>
> Model 1: Y ~ X1 + X2 + X3 + X4
> Model 2: Y ~ X1 + X2 + X3 + X4 + X1 * X4
>  Resid. Df Resid. Dev Df Deviance
> 1        64     62.274            
> 2        63     61.495  1  0.77908
>
>
> So the notion that the "X1 effect" is now "highly significant" where it was before not even suggestive of significance seem to point to either an error in the underlying theory or a failure to anticipate and trap (and warn the user) that an erroneous model (or at least an unanticipated model) is being passed to a procedure.
>
> At least the 'effects- package gives you a tiny warning about this issue, although I think it really should throw an informative error when a user attempts to estimate only a "main effect" in a model that has an interaction involving such a covariate:
>
>> library(effects)
>
>> effect('X1', model)
>
> X1 effect
> X1
>         0        0.2        0.4        0.6        0.8          1
> 0.06706123 0.08582757 0.10923061 0.13805139 0.17299973 0.21459275
>> effect('X1', modelInteraction)
> NOTE: X1 is not a high-order term in the model
>
> X1 effect
> X1
>           0          0.2          0.4          0.6          0.8            1
> 0.0002418661 0.0009864740 0.0040142251 0.0161843996 0.0629206979 0.2151098752
>> effect('X1:X4', modelInteraction)
>
> X1*X4 effect
>     X4
> X1            6         6.2          6.4          6.6          6.8            7
>  0   0.1100479 0.005686142 0.0002643982 1.223058e-05 5.656287e-07 2.615838e-08
>  0.2 0.1285241 0.012352473 0.0010595321 8.994106e-05 7.628099e-06 6.469071e-07
>  0.4 0.1495811 0.026625017 0.0042357682 6.610806e-04 1.028639e-04 1.599803e-05
>  0.6 0.1734015 0.056446132 0.0167737545 4.841483e-03 1.385458e-03 3.954877e-04
>  0.8 0.2001225 0.115698003 0.0640377838 3.454327e-02 1.836677e-02 9.689636e-03
>  1   0.2298165 0.222481442 0.2153150766 2.083177e-01 2.014893e-01 1.948297e-01
>
> --
> David.
>
>
>
>>
>>
>> I have been looking around for the solutions but haven't been able to find any. I would appreciate any suggestions.
>>
>> A reproducible sample:
>>
>>> dput(data)
>> structure(list(Y = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
>> 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
>> 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
>> 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X1 = c(1L, 0L, 1L,
>> 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L,
>> 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
>> 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 0L), X2 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X3 = c(0L, 0L, 0L, 0L, 0L,
>> 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 2L, 2L, 3L, 4L, 5L, 0L, 0L,
>> 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
>> ), X4 = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
>> 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 6L, 6L, 6L, 6L, 6L, 6L,
>> 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
>> 7L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L,
>> 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L)), .Names = c("Y", "X1", "X2",
>> "X3", "X4"), row.names = c(NA, -69L), class = "data.frame")
>>
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] <mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help <https://stat.ethz.ch/mailman/listinfo/r-help>
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html <http://www.r-project.org/posting-guide.html>
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to plot marginal effects (MEM) in R?

David Winsemius

> On Jul 21, 2016, at 11:44 PM, Faradj Koliev <[hidden email]> wrote:
>
> Dear David Winsemius,
>
> Thank you!  
>
> The sample make no sense, I know. The real data is too big. So, I only want to understand how to plot marginal effects, to visualize them in a proper way.
>

Before you start plotting, you first need to first understand that with interactions in the model there are N X M effects where N and M are the number of levels in the two covariates. The whole point of models is to boil down large amounts of "real data", but not so much boiling that you get a congealed, burned molasses.

The package you chose to examine the interactions appears ill-equipted to provide you with the necessary analysis.

--
David.

> Best,
>
>
>> 22 juli 2016 kl. 08:35 skrev David Winsemius <[hidden email]>:
>>
>>>
>>> On Jul 21, 2016, at 2:22 PM, Faradj Koliev <[hidden email]> wrote:
>>>
>>> Dear all,
>>>
>>> I have two logistic regression models:
>>>
>>>
>>>  • model <- glm(Y ~ X1+X2+X3+X4, data = data, family = "binomial")
>>>
>>>
>>>
>>>  • modelInteraction <- glm(Y ~ X1+X2+X3+X4+X1*X4, data = data, family = "binomial")
>>>
>>> To calculate the marginal effects (MEM approach) for these models, I used the `mfx` package:
>>>
>>>
>>>  • a<- logitmfx(model, data=data, atmean=TRUE)
>>>
>>>
>>>
>>>   •b<- logitmfx(modelInteraction, data=data, atmean=TRUE)
>>>
>>>
>>> What I want to do now is 1) plot all the results for "model" and 2) show the result just for two variables: X1 and X2.
>>> 3) I also want to plot the interaction term in ”modelInteraction”.
>>
>> There is no longer a single "effect" for X1 in modelInteraction in contrast to the manner as there might be an "effect" for X2. There can only be predictions for combined situations with particular combinations of values for X1 and X4.
>>
>>> model
>>
>> Call:  glm(formula = Y ~ X1 + X2 + X3 + X4, family = "binomial", data = data)
>>
>> Coefficients:
>> (Intercept)           X1           X2           X3           X4  
>>    -0.3601       1.3353       0.1056       0.2898      -0.3705  
>>
>> Degrees of Freedom: 68 Total (i.e. Null);  64 Residual
>> Null Deviance:    66.78
>> Residual Deviance: 62.27 AIC: 72.27
>>
>>
>>> modelInteraction
>>
>> Call:  glm(formula = Y ~ X1 + X2 + X3 + X4 + X1 * X4, family = "binomial",
>>    data = data)
>>
>> Coefficients:
>> (Intercept)           X1           X2           X3           X4        X1:X4  
>>    90.0158     -90.0747       0.1183       0.3064     -15.3688      15.1593  
>>
>> Degrees of Freedom: 68 Total (i.e. Null);  63 Residual
>> Null Deviance:    66.78
>> Residual Deviance: 61.49 AIC: 73.49
>>
>> Notice that a naive attempt to plot an X1  "effect" in modelInteraction might pick the -90.07 value which would then ignore both the much larger Intercept value and also ignore the fact that the interaction term has now split the X4 (and X1) "effects" into multiple pieces.
>>
>> You need to interpret the effects of X1 in the context of a specification of a particular X4 value and not forget that the Intercept should not be ignored. It appears to me that the estimates of the mfx package are essentially meaningless with the problem you have thrown at it.
>>
>>> a
>> Call:
>> logitmfx(formula = model, data = data, atmean = TRUE)
>>
>> Marginal Effects:
>>       dF/dx Std. Err.       z   P>|z|  
>> X1  0.147532  0.087865  1.6791 0.09314 .
>> X2  0.015085  0.193888  0.0778 0.93798  
>> X3  0.040309  0.063324  0.6366 0.52441  
>> X4 -0.050393  0.092947 -0.5422 0.58770  
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> dF/dx is for discrete change for the following variables:
>>
>> [1] "X1" "X2" "X4"
>>> b
>> Call:
>> logitmfx(formula = modelInteraction, data = data, atmean = TRUE)
>>
>> Marginal Effects:
>>            dF/dx   Std. Err.         z  P>|z|    
>> X1    -1.0000e+00  1.2121e-07 -8.25e+06 <2e-16 ***
>> X2     6.5595e-03  8.1616e-01  8.00e-03 0.9936    
>> X3     1.6312e-02  2.0326e+00  8.00e-03 0.9936    
>> X4    -9.6831e-01  1.5806e+01 -6.13e-02 0.9511    
>> X1:X4  8.0703e-01  1.4572e+01  5.54e-02 0.9558    
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> dF/dx is for discrete change for the following variables:
>>
>> [1] "X1" "X2" "X4"
>>
>> I see no sensible interpretation of the phrase "X1 effect" in the comparison tables above. The "p-value" in the second table appears to be nonsense induced by throwing a model formulation that was not anticipated. There is a negligible improvement in the glm fits:
>>
>>> anova(model,modelInteraction)
>> Analysis of Deviance Table
>>
>> Model 1: Y ~ X1 + X2 + X3 + X4
>> Model 2: Y ~ X1 + X2 + X3 + X4 + X1 * X4
>>  Resid. Df Resid. Dev Df Deviance
>> 1        64     62.274            
>> 2        63     61.495  1  0.77908
>>
>>
>> So the notion that the "X1 effect" is now "highly significant" where it was before not even suggestive of significance seem to point to either an error in the underlying theory or a failure to anticipate and trap (and warn the user) that an erroneous model (or at least an unanticipated model) is being passed to a procedure.
>>
>> At least the 'effects- package gives you a tiny warning about this issue, although I think it really should throw an informative error when a user attempts to estimate only a "main effect" in a model that has an interaction involving such a covariate:
>>
>>> library(effects)
>>
>>> effect('X1', model)
>>
>> X1 effect
>> X1
>>         0        0.2        0.4        0.6        0.8          1
>> 0.06706123 0.08582757 0.10923061 0.13805139 0.17299973 0.21459275
>>> effect('X1', modelInteraction)
>> NOTE: X1 is not a high-order term in the model
>>
>> X1 effect
>> X1
>>           0          0.2          0.4          0.6          0.8            1
>> 0.0002418661 0.0009864740 0.0040142251 0.0161843996 0.0629206979 0.2151098752
>>> effect('X1:X4', modelInteraction)
>>
>> X1*X4 effect
>>     X4
>> X1            6         6.2          6.4          6.6          6.8            7
>>  0   0.1100479 0.005686142 0.0002643982 1.223058e-05 5.656287e-07 2.615838e-08
>>  0.2 0.1285241 0.012352473 0.0010595321 8.994106e-05 7.628099e-06 6.469071e-07
>>  0.4 0.1495811 0.026625017 0.0042357682 6.610806e-04 1.028639e-04 1.599803e-05
>>  0.6 0.1734015 0.056446132 0.0167737545 4.841483e-03 1.385458e-03 3.954877e-04
>>  0.8 0.2001225 0.115698003 0.0640377838 3.454327e-02 1.836677e-02 9.689636e-03
>>  1   0.2298165 0.222481442 0.2153150766 2.083177e-01 2.014893e-01 1.948297e-01
>>
>> --
>> David.
>>
>>
>>
>>>
>>>
>>> I have been looking around for the solutions but haven't been able to find any. I would appreciate any suggestions.
>>>
>>> A reproducible sample:
>>>
>>>> dput(data)
>>> structure(list(Y = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
>>> 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
>>> 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
>>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
>>> 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X1 = c(1L, 0L, 1L,
>>> 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L,
>>> 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
>>> 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>> 1L, 0L), X2 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
>>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>>> 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
>>> 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X3 = c(0L, 0L, 0L, 0L, 0L,
>>> 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 2L, 2L, 3L, 4L, 5L, 0L, 0L,
>>> 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
>>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
>>> ), X4 = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
>>> 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 6L, 6L, 6L, 6L, 6L, 6L,
>>> 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
>>> 7L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L,
>>> 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L)), .Names = c("Y", "X1", "X2",
>>> "X3", "X4"), row.names = c(NA, -69L), class = "data.frame")
>>>
>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius
>> Alameda, CA, USA
>

David Winsemius
Alameda, CA, USA

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.