Factors in a regression using lm()

Gabriel Bergin
Hi,

I am trying to do a multiple regression on the dataset "Hdma", available in
the Ecdat package.

The data looks like this:
> str(Hdma)
'data.frame': 2381 obs. of  13 variables:
 $ dir        : num  0.221 0.265 0.372 0.32 0.36 ...
 $ hir        : num  0.221 0.265 0.248 0.25 0.35 ...
 $ lvr        : num  0.8 0.922 0.92 0.86 0.6 ...
 $ ccs        : num  5 2 1 1 1 1 1 2 2 2 ...
 $ mcs        : num  2 2 2 2 1 1 2 2 2 1 ...
 $ pbcr       : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ dmi        : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...
 $ self       : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ single     : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 2 1 1 2 ...
 $ uria       : num  3.9 3.2 3.2 4.3 3.2 ...
 $ comdominiom: num  0 0 0 0 0 0 1 0 0 0 ...
 $ black      : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ deny       : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...

I would like to try a more complex regression, but even this relatively
uncomplicated one returns an error:

summary(lm(deny ~ hir + dir + ccs + mcs + black))

The error I get is:
Error in storage.mode(y) <- "double" :
  invalid to change the storage mode of a factor
In addition: Warning message:
In model.response(mf, "numeric") :
  using type="numeric" with a factor response will be ignored

I understand that there is something wrong due to the fact that some of the
variables are factors. But as far as I've grasped, it should be possible to
include factor variables when using lm(). Am I in error in thinking this?

Sincerely,
Gabriel Bergin
Undergraduate economics student

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Factors in a regression using lm()

ONKELINX, Thierry
The problem is not in the covariates but in the response variable. lm()
requires a numeric response; the covariates may be numeric or factors.
deny is a factor, hence the error.
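
A minimal sketch of the point above, using simulated data rather than the
real Hdma data: the data frame d and all its values are made up, and only
the column names deny, hir and black mirror the original post. It shows
lm() rejecting a factor response, then recoding the response as 0/1 before
fitting. The glm() call at the end is an alternative that is not part of
this thread; it is included only as a pointer, since glm() with a binomial
family accepts a two-level factor response directly.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the Hdma data (values are made up)
d <- data.frame(
  hir   = runif(n, 0.1, 0.5),
  black = factor(sample(c("no", "yes"), n, replace = TRUE),
                 levels = c("no", "yes"))
)
p <- plogis(-3 + 6 * d$hir + 0.5 * (d$black == "yes"))
d$deny <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))

# 1. Factor response: lm() complains. Depending on the R version this is
#    an error or a warning, so both are caught here.
outcome <- tryCatch(
  { lm(deny ~ hir + black, data = d); "ok" },
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)
print(outcome)

# 2. Recode the response as 0/1, then lm() fits without complaint
d$deny01 <- as.integer(d$deny == "yes")
lpm <- lm(deny01 ~ hir + black, data = d)
print(coef(lpm))

# 3. Alternative (assumption, not from the thread): logistic regression
logit <- glm(deny ~ hir + black, data = d, family = binomial)
print(coef(logit))
```

Note that black stays a factor on the right-hand side in both fits; only
the response had to change.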

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
[hidden email]
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
 

> -----Original message-----
> From: [hidden email]
> [mailto:[hidden email]] On behalf of Gabriel Bergin
> Sent: Tuesday 12 October 2010 11:39
> To: [hidden email]
> Subject: [R] Factors in an regression using lm()
Re: Factors in a regression using lm()

Ivan Calandra
In reply to this post by Gabriel Bergin
  Hi,

Your response (dependent) variable, which has to be on the left side of
the '~' in the formula, should be numeric. In your example deny is a
factor; that is the first problem.
The explanatory variables, on the right side of the '~', should be
factors. Here, hir, dir, ccs and mcs are numeric; that is the second
problem. Only black is a factor.

There are two possibilities (not mutually exclusive):
- you should convert your factors to numeric and vice versa as
needed; see ?factor and ?as.numeric, as well as the stringsAsFactors
argument in ?read.table (I guess you imported your data.frame that way)
- you should adjust your model formula. It might be that you mixed up
the variables in the formula. See ?formula
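
A small sketch of the first option, converting a two-level factor such as
deny into a 0/1 numeric variable. The vectors below are made-up stand-ins,
not the real data. One thing to watch: as.numeric() applied to a factor
returns the internal level codes (1, 2, ...), not 0/1 and not the labels.

```r
# Hypothetical two-level factor shaped like deny in the original post
deny <- factor(c("no", "no", "yes", "no", "yes"), levels = c("no", "yes"))

# Internal level codes: 1 for "no", 2 for "yes"
codes <- as.numeric(deny)

# 0/1 indicator: 1 where deny is "yes", 0 otherwise
deny01 <- as.numeric(deny == "yes")

# For a factor whose labels are themselves numbers, go through character
f <- factor(c("10", "20", "10"))
vals <- as.numeric(as.character(f))

print(codes)
print(deny01)
print(vals)
```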

HTH,
Ivan

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
[hidden email]

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php

Re: Factors in a regression using lm()

Ivan Calandra
  Oops, my bad.
I rarely do regression, so I forgot that in your case the explanatory
variables do not have to be factors; numeric predictors are fine in lm().
The rest stands.
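
To illustrate the correction, a minimal sketch with made-up data (the data
frame d is hypothetical): lm() accepts numeric and factor explanatory
variables side by side, and expands each factor into indicator (dummy)
columns behind the scenes, which model.matrix() makes visible.

```r
set.seed(42)

# Hypothetical data: numeric response, one numeric and one factor predictor
d <- data.frame(
  y     = rnorm(6),
  hir   = c(0.22, 0.27, 0.25, 0.25, 0.35, 0.30),
  black = factor(c("no", "yes", "no", "no", "yes", "no"),
                 levels = c("no", "yes"))
)

fit <- lm(y ~ hir + black, data = d)

# The design matrix lm() actually used: black became a 0/1 column
X <- model.matrix(fit)
print(colnames(X))
print(X[, "blackyes"])
```

The first level of the factor ("no") is the reference level, so only a
blackyes column appears.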
Ivan

On 10/12/2010 11:56, Ivan Calandra wrote:

