Dependent Variable in Logistic Regression

PaulJr
Dear friends,

Hope you are doing great. I want to fit a logistic regression in R, where
the dependent variable is the covid status (I used 1 for covid positives,
and 0 for covid negatives), but when I ran the glm, R complains that I
should make the dependent variable a factor.

What would be more advisable, to keep the dependent variable with 1s and
0s, or code it as yes/no and then make it a factor?

Any guidance will be greatly appreciated,

Best regards,

Paul

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Dependent Variable in Logistic Regression

Bert Gunter-2
x <- factor(0:1)
x <- factor(c("no", "yes"))

will produce identical results up to labeling.
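
As a minimal sketch of the point above: for a binomial family, glm() treats the first level of a factor response as "failure" and all other levels as "success", so the fit depends on the level order and not on the labels. The data and variable names below (x, y01, y_yn, y_pn, y_rev) are simulated and purely illustrative; they are not from the original poster's dataset.

```r
set.seed(42)
x   <- rnorm(200)
y01 <- rbinom(200, 1, plogis(0.5 + x))   # 1 = positive, 0 = negative

# Default factor levels are sorted, so "no" becomes the first level (failure)
y_yn <- factor(ifelse(y01 == 1, "yes", "no"))
levels(y_yn)

# Setting the levels explicitly makes the reference category unambiguous
y_pn <- factor(ifelse(y01 == 1, "positive", "negative"),
               levels = c("negative", "positive"))

fit_01 <- glm(y01  ~ x, family = binomial)
fit_yn <- glm(y_yn ~ x, family = binomial)
fit_pn <- glm(y_pn ~ x, family = binomial)

all.equal(coef(fit_01), coef(fit_yn))
all.equal(coef(fit_01), coef(fit_pn))

# Reversing the level order models the other outcome, which flips the
# sign of every coefficient on the logit scale
y_rev   <- factor(y_yn, levels = c("yes", "no"))
fit_rev <- glm(y_rev ~ x, family = binomial)
all.equal(coef(fit_01), -coef(fit_rev))
```

If the labels do not sort the way you want (for example "positive" and "negative" happen to work, but "sick" and "healthy" would not), passing levels = explicitly avoids modelling the wrong outcome by accident.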


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: Dependent Variable in Logistic Regression

Rich Shepard
In reply to this post by PaulJr
On Sat, 1 Aug 2020, Paul Bernal wrote:

> Hope you are doing great. I want to fit a logistic regression in R, where
> the dependent variable is the covid status (I used 1 for covid positives,
> and 0 for covid negatives), but when I ran the glm, R complains that I
> should make the dependent variable a factor.
>
> What would be more advisable, to keep the dependent variable with 1s and
> 0s, or code it as yes/no and then make it a factor?

Paul,

1 or 0 are equivalent to yes or no, success or failure. All are nominal
variables so all should be factors, regardless of the coding.

Rich

Re: Dependent Variable in Logistic Regression

PaulJr
In reply to this post by Bert Gunter-2
Hi Bert,

Thank you for the kind reply.

But what if I don't turn the variable into a factor? Let's say that in
Excel I just coded the variable as 1s and 0s, imported the dataset
into R, and fitted the logistic regression without turning any categorical
variable or dummy variable into a factor.

Does R require every dummy variable to be treated as a factor?

Best regards,

Paul

Re: Dependent Variable in Logistic Regression

Bert Gunter-2
You appear to be confusing a binomial **response** with categorical
"dependent variables." glm() of course fits continuous or categorical
dependent variables. If a continuous dependent variable has only 2 values,
the results for glm() will be the same whether or not it is considered to
be continuous or categorical, though you may not recognize it as such.

This discussion has already wandered off topic to statistical issues. I
will not comment further on or off list. I suggest you consult a good
reference on linear/generalized linear models or talk with a local
statistician.

Re: Dependent Variable in Logistic Regression

Bert Gunter-2
Sorry, typo. My first sentences should read:

"You appear to be confusing a binomial **response** with categorical
"independent variables." glm() of course fits continuous or categorical
independent variables."

Re: Dependent Variable in Logistic Regression

Bert Gunter-2
In reply to this post by Bert Gunter-2
... and further:
"If a continuous independent variable has only 2 values, ..."

Re: Dependent Variable in Logistic Regression

Patrick (Malone Quantitative)
In reply to this post by PaulJr
No, R does not. glm() does in order to do logistic regression.

--
Patrick S. Malone, Ph.D., Malone Quantitative
NEW Service Models: http://malonequantitative.com

He/Him/His

Re: Dependent Variable in Logistic Regression

Bert Gunter-2
... yes, but so does lm() for a categorical **INdependent** variable with
more than 2 numerically labeled levels. A categorical covariate with n
levels uses (n-1) df, but a continuous one uses 1 (unless more complex
models are explicitly specified, of course). As I said, the OP seems
confused about whether he is referring to the response or covariates. Or
maybe he just made the same typo I did.
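
A small sketch of the degrees-of-freedom point, using simulated data (the names g, b, y, fit_num and fit_fac are illustrative assumptions, not from the thread): a numerically labeled three-level covariate gets one slope when left numeric, but two contrast coefficients when wrapped in factor(). With only two values, the numeric and factor codings give the same fit.

```r
set.seed(1)
g <- rep(1:3, each = 20)                  # three groups, numerically labeled
y <- rnorm(60, mean = c(0, 1, 0.5)[g])

fit_num <- lm(y ~ g)                      # g treated as continuous: one slope
fit_fac <- lm(y ~ factor(g))              # g treated as categorical: two contrasts

length(coef(fit_num))                     # intercept + 1 slope
length(coef(fit_fac))                     # intercept + (3 - 1) dummy coefficients
df.residual(fit_num)
df.residual(fit_fac)

# With only two values, numeric 0/1 and factor coding give the same estimates
b <- rep(0:1, each = 30)
all.equal(unname(coef(lm(y ~ b))), unname(coef(lm(y ~ factor(b)))))
```

The same reasoning applies to covariates in glm(): whether a predictor needs factor() depends on whether its numeric labels are meant as quantities or as category codes.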

Re: Dependent Variable in Logistic Regression

PaulJr
Dear friend,

I am aware that I have a binomial dependent variable, which is covid status
(1 if covid positive, and 0 otherwise).

My question was whether R requires a binomial response variable to be
turned into a factor or not, that's all.

Cheers,

Paul

Re: Dependent Variable in Logistic Regression

Patrick (Malone Quantitative)
In reply to this post by Bert Gunter-2
I didn't mean to imply that was the only time that it was required, only
that it's not universal in R.

Re: Dependent Variable in Logistic Regression

Fox, John
In reply to this post by Bert Gunter-2
Dear Paul,

I think that this thread has gotten unnecessarily complicated. The
answer, as is easily demonstrated, is that a binary response for a
binomial GLM in glm() may be a factor, a numeric variable, or a logical
variable, with identical results; for example:

--------------- snip -------------

 > set.seed(123)

 > head(x <- rnorm(100))
[1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499

 > head(y <- rbinom(100, 1, 1/(1 + exp(-x))))
[1] 0 1 1 1 1 0

 > head(yf <- as.factor(y))
[1] 0 1 1 1 1 0
Levels: 0 1

 > head(yl <- y == 1)
[1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE

 > glm(y ~ x, family=binomial)

Call:  glm(formula = y ~ x, family = binomial)

Coefficients:
(Intercept)            x
      0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:    134.6
Residual Deviance: 114.9 AIC: 118.9

 > glm(yf ~ x, family=binomial)

Call:  glm(formula = yf ~ x, family = binomial)

Coefficients:
(Intercept)            x
      0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:    134.6
Residual Deviance: 114.9 AIC: 118.9

 > glm(yl ~ x, family=binomial)

Call:  glm(formula = yl ~ x, family = binomial)

Coefficients:
(Intercept)            x
      0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:    134.6
Residual Deviance: 114.9 AIC: 118.9

--------------- snip -------------

The original poster claimed to have encountered an error with a 0/1
numeric response, but didn't show any data or even a command. I suspect
that the response was a character variable, but of course can't really
know that.
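
The equivalence shown in the transcript can also be checked programmatically. What follows is only a minimal sketch under the same simulated setup; the object names `fits` and `coefs` are introduced here for illustration and are not part of the original example.

```r
set.seed(123)
x  <- rnorm(100)
y  <- rbinom(100, 1, 1 / (1 + exp(-x)))  # numeric 0/1 response
yf <- as.factor(y)                       # factor with levels "0" and "1"
yl <- y == 1                             # logical response

# Fit the same binomial GLM with each representation of the response.
fits <- list(
  numeric = glm(y  ~ x, family = binomial),
  factor  = glm(yf ~ x, family = binomial),
  logical = glm(yl ~ x, family = binomial)
)

# One column of coefficients per fit.
coefs <- sapply(fits, coef)
print(coefs)

# all.equal() compares up to numerical tolerance.
stopifnot(
  isTRUE(all.equal(coefs[, "numeric"], coefs[, "factor"])),
  isTRUE(all.equal(coefs[, "numeric"], coefs[, "logical"]))
)
```

If the three columns agree, the choice between numeric, factor, and logical coding is a matter of convenience rather than correctness.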

Best,
  John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

Re: Dependent Variable in Logistic Regression

Rui Barradas
In reply to this post by PaulJr
Hello,

From the documentation, help('glm'):


      Details

A typical predictor has the form 'response ~ terms' where 'response' is the
(numeric) response vector and 'terms' is a series of terms which specifies
a linear predictor for 'response'.
For 'binomial' and 'quasibinomial' families the response can also be
specified as a 'factor' (when the first level denotes failure and all
others success) or as a two-column matrix with the columns giving the
numbers of successes and failures. A terms specification of the form
'first + second' indicates all the terms in 'first' together with all the
terms in 'second' with any duplicates removed.


There is no need for the response to be a factor; it is optional. The
wording is very clear:

"For 'binomial' and 'quasibinomial' families the response *can* also be
specified as a 'factor'"

And with binary numeric responses I cannot reproduce the warning
message; the models fit silently.
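
The "first level denotes failure" rule in the quoted help text is the one detail that matters when the response is a factor. A small sketch of its effect, using simulated data and made-up level names ("neg", "pos") that are assumptions for illustration only:

```r
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + x))

# Factor response: the first level ("neg") is treated as failure,
# so the model describes P(pos).
status <- factor(ifelse(y == 1, "pos", "neg"), levels = c("neg", "pos"))
fit_pos <- glm(status ~ x, family = binomial)

# Making "pos" the first level turns it into the failure level, so the
# model now describes P(neg) and every coefficient changes sign.
status_rev <- relevel(status, ref = "pos")
fit_neg <- glm(status_rev ~ x, family = binomial)

print(rbind(pos_is_success = coef(fit_pos),
            neg_is_success = coef(fit_neg)))

stopifnot(isTRUE(all.equal(coef(fit_pos), -coef(fit_neg), tolerance = 1e-6)))
```

With a yes/no factor the default alphabetical level order happens to make "no" the failure level, but it seems safer to set the levels explicitly than to rely on that.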


Hope this helps,

Rui Barradas





Re: Dependent Variable in Logistic Regression

Rui Barradas
In reply to this post by Fox, John
Hello,

Inline.

At 20:01 on 01/08/2020, John Fox wrote:

> The original poster claimed to have encountered an error with a 0/1
> numeric response, but didn't show any data or even a command. I
> suspect that the response was a character variable, but of course
> can't really know that.

So continuing with your example:

 > head(yc <- as.character(y))
[1] "0" "1" "1" "1" "1" "0"
 > glm(yc ~ x, family=binomial)
Error in weights * y : non-numeric argument to binary operator


But the OP says that

[...] R complains that I should make the dependent variable a factor.

That is not what the error message says; it "asks" for a numeric
argument to the '*' operator.
We haven't seen the exact R message yet so, like others have said, the
OP should post it along with code.
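
If the response really was stored as text, converting it before fitting is enough. The following is a sketch only: it assumes the simulated data from the example above, the names `res`, `fit_num` and `fit_fac` are made up here, and the exact wording of the error may differ between R versions.

```r
set.seed(123)
x  <- rnorm(100)
y  <- rbinom(100, 1, 1 / (1 + exp(-x)))
yc <- as.character(y)                 # "0"/"1" stored as text

# Capture the condition instead of stopping, so the message can be inspected.
res <- tryCatch(glm(yc ~ x, family = binomial), error = function(e) e)
if (inherits(res, "error")) message("glm() failed: ", conditionMessage(res))

# Two possible conversions before fitting:
fit_num <- glm(as.numeric(yc) ~ x, family = binomial)  # back to numeric 0/1
fit_fac <- glm(factor(yc) ~ x, family = binomial)      # factor, "0" = failure

# Both conversions lead to the same estimates.
stopifnot(isTRUE(all.equal(unname(coef(fit_num)), unname(coef(fit_fac)))))
```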

Hope this helps,

Rui Barradas


Re: Dependent Variable in Logistic Regression

R help mailing list-2
In reply to this post by Fox, John
I like using a logical response in cases like this, but put its
construction in the formula so it is unambiguous when I look at the
results later.
> d <- data.frame(Covid=c("Pos","Pos","Neg","Pos","Neg","Neg"), Age=41:46)
> glm(family=binomial, data=d, Covid=="Pos"~Age)

Call:  glm(formula = Covid == "Pos" ~ Age, family = binomial, data = d)

Coefficients:
(Intercept)          Age
     52.810       -1.214

Degrees of Freedom: 5 Total (i.e. Null);  4 Residual
Null Deviance:      8.318
Residual Deviance: 4.956        AIC: 8.956
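
As a rough illustration of why the in-formula construction stays unambiguous, the sketch below fits the same toy data twice: once as above and once with a precomputed logical column. The column name `is_pos` is hypothetical and only used for the comparison.

```r
d <- data.frame(Covid = c("Pos", "Pos", "Neg", "Pos", "Neg", "Neg"),
                Age = 41:46)

# Response built inside the formula.
fit_inline <- glm(Covid == "Pos" ~ Age, family = binomial, data = d)

# Response built beforehand in a separate column (name is hypothetical).
d$is_pos <- d$Covid == "Pos"
fit_column <- glm(is_pos ~ Age, family = binomial, data = d)

# Same estimates either way ...
stopifnot(isTRUE(all.equal(unname(coef(fit_inline)),
                           unname(coef(fit_column)))))

# ... but only the inline version records what "success" means.
lhs_inline <- deparse(formula(fit_inline)[[2]])
lhs_column <- deparse(formula(fit_column)[[2]])
print(c(inline = lhs_inline, column = lhs_column))
```

The stored formula of the inline fit carries the definition of the event, so the printed call documents it without having to look back at the data preparation.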


Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [FORGED] Dependent Variable in Logistic Regression

Rolf Turner
In reply to this post by PaulJr

On 2/08/20 5:39 am, Paul Bernal wrote:

> Dear friends,
>
> Hope you are doing great. I want to fit a logistic regression in R, where
> the dependent variable is the covid status (I used 1 for covid positives,
> and 0 for covid negatives), but when I ran the glm, R complains that I
> should make the dependent variable a factor.
>
> What would be more advisable, to keep the dependent variable with 1s and
> 0s, or code it as yes/no and then make it a factor?
>
> Any guidance will be greatly appreciated,


There have been many responses to this post, the majority of them being
confusing and off the point.

BOTTOM LINE:  R/glm() does *NOT* complain that one "should make the
dependent variable a factor".   This is bovine faecal output.

As Rui Barradas has pointed out (alternatively: RTFM!) when you fit a
Bernoulli model using glm(), your response/dependent variable is allowed
to be

     * a numeric variable with values 0 or 1
     * a logical variable
     * a factor with two levels

The OP presumably fed glm() a *character* vector with values "0" and
"1".  Doing *this* will cause glm() to whinge.

I reiterate:  RTFM!!!  (And perhaps learn to distinguish between
character vectors and factors.)
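
For anyone unsure how a response is stored, a quick check before fitting may help. This is a sketch only: `describe_response()` is a hypothetical helper written for this illustration, not a function from any package, and it looks at the storage type only, not at whether numeric values are actually 0 and 1.

```r
# Hypothetical helper: report how a candidate binary response is stored
# and whether that storage type is one of the three listed above.
describe_response <- function(v) {
  kind <- class(v)[1]
  ok <- is.factor(v) || is.logical(v) || is.numeric(v)
  data.frame(class = kind, accepted = ok, stringsAsFactors = FALSE)
}

status_num <- c(0, 1, 1, 0)
candidates <- list(
  numeric   = status_num,
  logical   = status_num == 1,
  factor    = factor(status_num),
  character = as.character(status_num)
)

# One row per candidate representation.
report <- do.call(rbind, lapply(candidates, describe_response))
print(report)
```

A factor and a character vector can print almost identically, which is why checking class() is more reliable than looking at the values.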

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


Re: [FORGED] Dependent Variable in Logistic Regression

Abby Spurdle
That's a bit harsh.
Isn't the best advice here to post a reproducible example?
I believe that has already been mentioned.

Also, I'd strongly encourage people to use the package+function name for
this sort of thing.

    stats::glm

As there are many R functions for GLMs...
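
A short sketch of what the qualified name buys, and of how to check which glm() an unqualified call would reach in a given session. The built-in mtcars data set is used purely as a stand-in; the names `fit` and `glm_home` are made up for this illustration.

```r
# Explicit namespace: always calls the glm() from the stats package,
# even if another attached package or a user object masks the name.
fit <- stats::glm(am ~ wt, family = stats::binomial, data = mtcars)

# Which glm() does an unqualified call resolve to in this session?
glm_home <- environmentName(environment(glm))
print(glm_home)

# find() lists every attached location that defines an object named "glm".
print(find("glm"))
```

In a fresh session both checks are expected to point at the stats package; a different answer would indicate masking.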



Re: [FORGED] Dependent Variable in Logistic Regression

Martin Maechler
>>>>> Abby Spurdle
>>>>>     on Sun, 2 Aug 2020 15:13:51 +1200 writes:

    > That's a bit harsh.  Isn't the best advice here, to post a
    > reproducible example...  Which I believe has been
    > mentioned.

    > Also, I'd strongly encourage people to use
    > package+function name, for this sort of thing.

    >     stats::glm

    > As there are many R functions for GLMs...

Sorry, Abby, I do disagree here (strongly enough to warrant
this reply):

We're talking about doing "basic" statistics with R, and these
functions in the stats package have been part of R since even before
it got a version number.

So, no, glm() {and the stats package} are the default and I still
think everybody should know and assume that.

Martin


Re: [FORGED] Dependent Variable in Logistic Regression

Abby Spurdle
> Sorry, Abby, I do disagree here (strongly enough to warrant
> this reply):

Which part are you disagreeing with?
That unambiguous names/references should be used, or that there are
many R functions for GLMs?
The wording of your post suggests (kind of) that there is only one R
function for GLMs.

> We're talking about doing "basic" statistics with R, and these
> functions in the stats package have been part of R since even before
> it got a version number.

Remember, not everyone is using the same R packages as you.
And some people have done university courses, online courses,
etc., in R without ever using a single function from the stats package.

I'm reluctant to assume that all R users will have a common understanding.
And what may seem obvious to you or me may seem quite foreign to some
users, or vice versa.

> So, no,  glm()  {and the stats package} are the default and I still
> think everybody should know and assume that.

But perhaps most importantly, the OP said "the glm".
He never said "glm()", but rather the subsequent posters did.

Rolf suggested his post was bullshit, after removing the lexical peroxide.
How does anyone know that it wasn't a genuine post, but in reference
to something other than stats::glm?

Shouldn't people be innocent until proven guilty?
Otherwise (something I have been guilty of in the past), the mailing
list turns into statistical propaganda...

Even if the OP was referring to stats::glm, I'm still inclined to feel
the post was legitimate, just a bit short on technical details...


Re: [FORGED] Dependent Variable in Logistic Regression

Bert Gunter-2
All: Kindly take this offline please.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Aug 3, 2020 at 12:39 PM Abby Spurdle <[hidden email]> wrote:

> > Sorry, Abby, I do disagree here (strongly enough to warrant
> > this reply):
>
> Which part are you disagreeing with?
> That unambiguous names/references should be used, or that there are
> many R functions for GLMs?
> The wording of your post suggests (kind of) that there is only one R
> function for GLMs.
>
> > We're talking about doing "basic" statistics with R, and these
> > functions in the stats package have been part of R even before
> > R got a version number.
>
> Remember, not everyone is using the same R packages as you.
> And some people have taken university courses, online courses, etc.,
> in R without ever using a single function from the stats package.
>
> I'm reluctant to assume that all R users will have a common understanding.
> And what may seem obvious to you or me may seem quite foreign to some
> users, or vice versa.
>
> > So, no,  glm()  {and the stats package} are the default and I still
> > think everybody should know and assume that.
>
> But perhaps most importantly, the OP said "the glm".
> He never said "glm()", but rather the subsequent posters did.
>
> Rolf suggested his post was bullshit, after removing the lexical peroxide.
> How does anyone know that it wasn't a genuine post, but in reference
> to something other than stats::glm?
>
> Shouldn't people be innocent until proven guilty?
> Otherwise (something I have been guilty of in the past), the mailing
> list turns into statistical propaganda...
>
> Even if the OP was referring to stats::glm, I'm still inclined to feel
> the post was legitimate, just a bit short on technical details...
