(no subject)

(no subject)

Arantzazu Blanco Bernardeau

Hello
I have a data frame of soil variables (caperf) in which the variable "clay" is a factor (as I see from str(caperf)). I need to fit a regression model, so I need arcilla (= clay) as a numeric variable. For that I have entered

as.numeric(as.character(arcilla))

and even

as.numeric(levels(arcilla))[arcilla]

but the variable remains a factor, and the linear model is not valid (for my purposes).
The decimal commas have already been converted to decimal points, so I have no idea what to do.
Thanks a lot


Arantzazu Blanco Bernardeau
Dpto de Química Agrícola, Geología y Edafología 
Universidad de Murcia-Campus de Espinardo





     
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: (no subject)

Steve Lianoglou-6
Hi,

Sorry, I'm not really getting what's going on here ... perhaps having
more domain knowledge would help me make better sense of your question.

In particular:

On Tue, May 18, 2010 at 11:35 AM, Arantzazu Blanco Bernardeau
<[hidden email]> wrote:
>
> Hello
> I have a data array with soil variables (caperf), in which the variable "clay" is factor (as I see entering str(caperf)) . I need to do a regression model, so I need to have arcilla (=clay) as a numeric variable.  For that I have entered
>
> as.numeric(as.character(arcilla))
>
> and even entering
>  'as.numeric(levels(arcilla))[arcilla]'

The above code doesn't make sense to me ...
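
For readers following along, here is a minimal sketch of what the two quoted expressions do. Both are the usual idioms for turning a factor with number-like labels into numbers (R FAQ 7.10). The toy values below are made up and are not the poster's data. The sketch also shows that neither call changes the factor itself unless the result is assigned.

```r
# Toy factor whose labels look like numbers (made-up values, not the poster's data)
arcilla <- factor(c("10.3", "0.9", "10.3", "25"))

# as.numeric() applied directly to a factor returns the internal integer codes
codes <- as.numeric(arcilla)

# Two equivalent ways to recover the numbers that the labels spell out
via_character <- as.numeric(as.character(arcilla))
via_levels    <- as.numeric(levels(arcilla))[arcilla]

print(codes)
print(via_character)
print(identical(via_character, via_levels))

# Neither call modified arcilla; the result has to be assigned to be kept
still_factor <- is.factor(arcilla)
print(still_factor)
```

If the conversion is typed at the prompt without an assignment, the converted values are only printed, and the column used by lm() stays a factor.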

Perhaps you could clean up your question and provide a reproducible
example that we can use to help you (just describing what a
variable has isn't enough -- give us minimal code we can paste into R
that reproduces your problem).

Alternatively, depending on what your "levels" mean, you might want to
recode your data using "dummy variables" (I'm not sure if that's the
official term) ... this is what I mean:

http://dss.princeton.edu/online_help/analysis/dummy_variables.htm

In your example, let's say you have four levels for "clay" ... maybe
"soft", "hard", "smooth", "red"

Instead of only using 1 variable with values 1-4, you would recode
this into 4 variables with values 0,1

So, if one example has a value of "smooth" for clay, instead of coding it like:
clay: 3

You would do:
soft: 0
hard: 0
smooth: 1
red : 0
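
The recoding described above can be sketched in R with model.matrix(). This is only an illustration under the hypothetical four-level "clay" variable from this message; the five observations are invented.

```r
# Hypothetical categorical clay variable with the four example levels
clay <- factor(c("soft", "hard", "smooth", "red", "smooth"),
               levels = c("soft", "hard", "smooth", "red"))

# One 0/1 column per level; "- 1" drops the intercept so all four levels get a column
dummies <- model.matrix(~ clay - 1)
colnames(dummies) <- levels(clay)

print(dummies)
```

Note that lm() builds such indicator columns by itself whenever a predictor is a factor (keeping one level as the baseline), so manual recoding is rarely needed in R.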

-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: (no subject)

Arantzazu Blanco Bernardeau

Hello
Well, the problem is that arcilla is the percentage of clay in the soil sample (arcilla means clay in Spanish). So for the linear model I need to work with that number. Right now R treats arcilla as a factor and fits it as a factor, so the output of the linear model is
Call:
lm(formula = formula, data = caperf)

Residuals:
       Min         1Q     Median         3Q        Max
-1.466e+01 -1.376e-15  1.780e-16  2.038e-15  1.279e+01

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)    1.68964    6.33889   0.267 0.790221   
arcilla0.9     1.90228    8.90888   0.214 0.831239   
arcilla10      1.26371    7.96734   0.159 0.874212   
arcilla10.3   15.70081    9.05141   1.735 0.085090 . 
arcilla10.4    7.27517    7.72806   0.941 0.348183   
arcilla10.45   7.03879    9.02600   0.780 0.436853   
arcilla10.5    2.41241    8.90827   0.271 0.786954   
arcilla10.65  15.44298    9.03879   1.709 0.089838 . 
arcilla10.7   19.35651    9.04675   2.140 0.034185 * 
arcilla10.9    3.55947    9.18501   0.388 0.698974   

[...]

arcilla9.9     6.31949    7.35724   0.859 0.391892   
arcilla#N/A   24.17959    8.87201   2.725 0.007274 **
limo           0.24920    0.04605   5.412 2.76e-07 ***
CO_gkg1        0.21015    0.03931   5.346 3.73e-07 ***
C03Ca          0.01711    0.02727   0.628 0.531337   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.249 on 135 degrees of freedom
  (50 observations deleted due to missingness)
Multiple R-squared: 0.9736,    Adjusted R-squared: 0.9014
F-statistic: 13.47 on 370 and 135 DF,  p-value: < 2.2e-16

So, in the desired linear model, arcilla should appear as just one line, with its own estimate from the linear model.
I hope this makes it clearer. If not, I could make an English version of the file to send, so you can try the commands.
Thanks a lot for your help!



Arantzazu Blanco Bernardeau
Dpto de Química Agrícola, Geología y Edafología
Universidad de Murcia-Campus de Espinardo










Re: (no subject)

Steve Lianoglou-6
In reply to this post by Steve Lianoglou-6
One last thing:

before you take my advice on how to recode your nominal/categorical
"clay" variable for your regression model, take some time to see how
other people talk about this and do some searching on phrases like
"regression model with nominal variables" (that's just the one I
used).

You'll find more (and better) advice on how to do it correctly.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: (no subject)

Peter Ehlers
In reply to this post by Arantzazu Blanco Bernardeau
Arantzazu,

Your problem is that the data were probably imported
from Excel where you had at least one cell containing "#N/A".
You need to replace those cases in your dataframe with NA.
Then you should be able to do as.numeric(as.character(arcilla)).
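
A minimal sketch of that clean-up, assuming the column arrived as a factor containing the Excel error string "#N/A". The small data frame below is a made-up stand-in for caperf, not the real data.

```r
# Made-up stand-in for the imported data; the "#N/A" entry is what
# makes R read the whole column as a factor instead of numbers
caperf <- data.frame(
  arcilla = factor(c("10.3", "0.9", "#N/A", "25")),
  limo    = c(30.1, 22.4, 18.0, 41.7)
)

# Work on the labels as text, mark the Excel error string as missing,
# then convert and assign the result back into the data frame
clay_text <- as.character(caperf$arcilla)
clay_text[clay_text == "#N/A"] <- NA
caperf$arcilla <- as.numeric(clay_text)

str(caperf)
```

An alternative is to handle this at import time: read.table() and read.csv() accept an na.strings argument, so passing na.strings = c("NA", "#N/A") when reading the file should make the column numeric from the start.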

  -Peter Ehlers

On 2010-05-18 10:07, Arantzazu Blanco Bernardeau wrote:

>
> Hello
> Well, the problem is, that arcilla is the percentage of clay in the soil sample. So, for linear model, I need to work with that number or value. Now, R thinks that arcilla (arcilla means clay in spanish), is a factor, and gives me the value as a factor, so the output of the linear model is
> Call:
> lm(formula = formula, data = caperf)
>
> Residuals:
>         Min         1Q     Median         3Q        Max
> -1.466e+01 -1.376e-15  1.780e-16  2.038e-15  1.279e+01
>
> Coefficients:
>                Estimate Std. Error t value Pr(>|t|)
> (Intercept)    1.68964    6.33889   0.267 0.790221
> arcilla0.9     1.90228    8.90888   0.214 0.831239
> arcilla10      1.26371    7.96734   0.159 0.874212
> arcilla10.3   15.70081    9.05141   1.735 0.085090 .
> arcilla10.4    7.27517    7.72806   0.941 0.348183
> arcilla10.45   7.03879    9.02600   0.780 0.436853
> arcilla10.5    2.41241    8.90827   0.271 0.786954
> arcilla10.65  15.44298    9.03879   1.709 0.089838 .
> arcilla10.7   19.35651    9.04675   2.140 0.034185 *
> arcilla10.9    3.55947    9.18501   0.388 0.698974
>
> [...]
>
> arcilla9.9     6.31949    7.35724   0.859 0.391892

===> arcilla#N/A   24.17959    8.87201   2.725 0.007274 **

> limo           0.24920    0.04605   5.412 2.76e-07 ***
> CO_gkg1        0.21015    0.03931   5.346 3.73e-07 ***
> C03Ca          0.01711    0.02727   0.628 0.531337
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 6.249 on 135 degrees of freedom
>    (50 observations deleted due to missingness)
> Multiple R-squared: 0.9736,    Adjusted R-squared: 0.9014
> F-statistic: 13.47 on 370 and 135 DF,  p-value:<  2.2e-16
>
> So, in the desired linear model, arcilla should be just a line, with the valors of the linear model.
> I hope you understand better more. If not, I could make an english version of the file to send, so you can try the commands.
> Thanks a lot for your help!
>
>
>
> Arantzazu Blanco Bernardeau
> Dpto de Química Agrícola, Geología y Edafología
> Universidad de Murcia-Campus de Espinardo
>
>
[snip]


Re: (no subject)

David Winsemius
In reply to this post by Arantzazu Blanco Bernardeau

On May 18, 2010, at 12:07 PM, Arantzazu Blanco Bernardeau wrote:

>
> Hello
> Well, the problem is, that arcilla is the percentage of clay in the  
> soil sample. So, for linear model, I need to work with that number  
> or value. Now, R thinks that arcilla (arcilla means clay in  
> spanish), is a factor, and gives me the value as a factor, so the  
> output of the linear model is
> Call:
> lm(formula = formula, data = caperf)

It would help if you also displayed the value of "formula", so we might
have an idea what you are calling your "y"-variable ... and it would
be wise not to continue to name your formulas "formula".

require(fortunes)
fortune("dog")

What happens when you create a new variable in caperf with the numeric
equivalent of the arcilla levels?

caperf$claynum <- as.numeric(as.character(caperf$arcilla))

lm(y ~ claynum + limo + CO_gkg1 + C03Ca  , data=caperf)
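
To illustrate what this changes in the fit, here is a small sketch on simulated data. The response y, the coefficients, and the sample size are all invented for the example; only the column names arcilla, limo and claynum come from the thread.

```r
set.seed(1)

# Simulated data: clay percentages on a 0.5 grid, so several rows share a value
n <- 60
clay_pct <- sample(seq(5, 14.5, by = 0.5), n, replace = TRUE)
caperf <- data.frame(
  arcilla = factor(clay_pct),   # mimics the column as it was imported
  limo    = runif(n, 10, 50)
)
caperf$y <- 2 + 0.5 * clay_pct + 0.25 * caperf$limo + rnorm(n)

# Store the converted values in a new column of the data frame
caperf$claynum <- as.numeric(as.character(caperf$arcilla))

fit_factor  <- lm(y ~ arcilla + limo, data = caperf)
fit_numeric <- lm(y ~ claynum + limo, data = caperf)

# Factor predictor: one coefficient per non-baseline clay level, plus intercept and limo
print(length(coef(fit_factor)))
# Numeric predictor: intercept, a single slope for claynum, and limo
print(length(coef(fit_numeric)))
```

With the numeric column, the summary shows a single claynum line, which is the form of output asked for earlier in the thread.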

--
David.


David Winsemius, MD
West Hartford, CT
