Hi,
I am trying to create a set of dummy variables to use in a multiple linear regression, and I am unable to find the relevant commands in the manuals.

For example, I have:

Price   Weight   Clarity
                 IF   VVS1   VVS2
 500    8        1    0      0
1000    5.2      0    0      1
 864    3        0    1      0
 340    2.6      0    0      1
  90    0.5      1    0      0
 450    2.3      0    1      0

where Price depends on Weight (a single value in each observation) and Clarity (split into three levels: IF, VVS1, VVS2).

I am having trouble telling the program that Clarity is a set of 3 dummy variables, and I keep getting error messages. What is the correct way to do this?

Any help is greatly appreciated.
Matthew

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi
[hidden email] wrote on 16.12.2009 15:58:56:

> I am having trouble telling the program that clarity is a set of 3 dummy
> variables and keep getting error messages, what is the correct way?

Well, try to bribe it. Or ask what would please it enough to break its resistance.

Seriously: what is the structure of your data in R (see ?str), and what commands did you use for the regression?

I suppose

    lm(Price ~ Weight + IF + VVS1 + VVS2, data = your.data)

should not complain if your.data is a data frame.

Regards
Petr
In reply to this post by whitaker m. (mw1006)
On 12/16/2009 03:58 PM, whitaker m. (mw1006) wrote:
> I am having trouble telling the program that clarity is a set of 3 dummy
> variables and keep getting error messages, what is the correct way?

Without an example of your code, it's a bit difficult to say. But it might be easier to use a single variable "Clarity" with three possible values (IF, VVS1, VVS2), defined as a factor.

    lm(Price ~ Weight + Clarity)

should then do the trick (unless you explicitly want to use a different dummy coding than the default).

Stephan
In reply to this post by whitaker m. (mw1006)
On Wed, 16 Dec 2009, whitaker m. (mw1006) wrote:
> I am having trouble telling the program that clarity is a set of 3 dummy
> variables and keep getting error messages, what is the correct way?

You should code the categorical variable "Clarity" as a "factor" so that R knows it is categorical and can deal with it appropriately in subsequent computations such as summary() or lm(). Thus, I would recommend storing your data as

    dat <- data.frame(
      Price   = c(500, 1000, 864, 340, 90, 450),
      Weight  = c(8, 5.2, 3, 2.6, 0.5, 2.3),
      # factor() makes the coding explicit: since R 4.0.0, data.frame()
      # no longer converts character vectors to factors by default
      Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)]))

which yields, e.g.,

    R> summary(dat)
         Price            Weight       Clarity
     Min.   :  90.0   Min.   :0.500   IF  :2
     1st Qu.: 367.5   1st Qu.:2.375   VVS1:2
     Median : 475.0   Median :2.800   VVS2:2
     Mean   : 540.7   Mean   :3.600
     3rd Qu.: 773.0   3rd Qu.:4.650
     Max.   :1000.0   Max.   :8.000

and then you can also do

    R> lm(Price ~ Weight + Clarity, data = dat)

    Call:
    lm(formula = Price ~ Weight + Clarity, data = dat)

    Coefficients:
    (Intercept)       Weight  ClarityVVS1  ClarityVVS2
         -45.05        80.01       490.02       403.00

or, if you wish to choose a different coding (here without an intercept, so that each Clarity level gets its own coefficient),

    R> lm(Price ~ 0 + Weight + Clarity, data = dat)

    Call:
    lm(formula = Price ~ 0 + Weight + Clarity, data = dat)

    Coefficients:
         Weight    ClarityIF  ClarityVVS1  ClarityVVS2
          80.01       -45.05       444.97       357.95

Some further reading of introductory material on linear regression in R would be useful. Also look at ?lm, ?factor, ?model.matrix, ?contrasts etc.

hth,
Z
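Editor's sketch: the dummy columns that lm() builds behind the scenes can be inspected with model.matrix(), one of the help pages named in the reply above. The data frame below simply re-enters the six rows from the original post; nothing else is assumed.

```r
# Toy data from the original post, with Clarity stored as a factor.
dat <- data.frame(
  Price   = c(500, 1000, 864, 340, 90, 450),
  Weight  = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  Clarity = factor(c("IF", "VVS2", "VVS1", "VVS2", "IF", "VVS1"))
)

# The design matrix lm() would use for Price ~ Weight + Clarity.
# With the default treatment coding, the first level ("IF") is the
# reference and gets no column of its own.
X <- model.matrix(Price ~ Weight + Clarity, data = dat)
print(colnames(X))
print(X)
```

The ClarityVVS1 and ClarityVVS2 columns are the 0/1 dummies; the reference level IF is absorbed into the intercept.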
In reply to this post by PIKAL Petr
I don't think R will complain if you use the approach below. However, IF, VVS1 and VVS2 are linearly dependent once the model includes an intercept, because the three dummies sum to 1 in every row. It is better to use the factor approach and define which level should be the reference.

Nikhil

On 16 Dec 2009, at 10:12AM, Petr PIKAL wrote:

> what commands did you use for regression
>
> I suppose
>
> lm(Price~Weight+IF+VVS1+VVS2, data=your.data)
>
> shall not complain if your.data is a data frame.
>
> Regards
> Petr
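Editor's sketch of the linear dependence mentioned here, using the six rows from the original post entered as hand-made dummy columns (the object names `dummies`, `fit_all` and `fit_k1` are illustrative only). With an intercept and all three dummies, lm() still runs but cannot estimate one of the coefficients and reports it as NA.

```r
# Hand-made dummy columns, as in the original post.
dummies <- data.frame(
  Price  = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  IF     = c(1, 0, 0, 0, 1, 0),
  VVS1   = c(0, 0, 1, 0, 0, 1),
  VVS2   = c(0, 1, 0, 1, 0, 0)
)

# The three dummies sum to 1 in every row, i.e. they reproduce the
# intercept column exactly.
stopifnot(all(rowSums(dummies[, c("IF", "VVS1", "VVS2")]) == 1))

# All k = 3 dummies plus an intercept: one coefficient is aliased (NA).
fit_all <- lm(Price ~ Weight + IF + VVS1 + VVS2, data = dummies)
print(coef(fit_all))

# k - 1 = 2 dummies plus an intercept: every coefficient is estimable.
fit_k1 <- lm(Price ~ Weight + VVS1 + VVS2, data = dummies)
print(coef(fit_k1))
```

Dropping one dummy by hand gives the same fit as the factor approach with IF as the reference level.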
In reply to this post by whitaker m. (mw1006)
Is your variable Clarity a categorical variable with 4 levels? Hence the need for k-1 (3) dummies? Your error may be the result of creating k instead of k-1 dummies, but I can't be sure from the example.

In R, you don't have to explicitly create separate variables (unless you really want to). You can use the built-in contrast functions. See

    ?contr.treatment

which is dummy coding and is the default for unordered factors. You can specify which group is the reference group. Alternatively, if you prefer effects coding, see

    ?contr.sum

There are others as well.

Tom Fletcher
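Editor's sketch of the contrast functions named above, applied to the three Clarity levels from the original post. The variable names `clarity` and `clarity_ref` are illustrative; relevel() is one common way to pick the reference group, not necessarily the one the poster had in mind.

```r
clarity <- factor(c("IF", "VVS2", "VVS1", "VVS2", "IF", "VVS1"))

# Default treatment (dummy) coding: the first level, "IF", is the
# reference and is coded as all zeros.
print(contr.treatment(levels(clarity)))

# Choosing a different reference group by reordering the levels.
clarity_ref <- relevel(clarity, ref = "VVS2")
print(levels(clarity_ref))

# Effects (sum-to-zero) coding, set for this factor only.
contrasts(clarity) <- contr.sum(nlevels(clarity))
print(contrasts(clarity))
```

A factor prepared this way can be used directly in a formula such as Price ~ Weight + clarity; lm() then picks up the contrasts attached to it.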
In reply to this post by Nikhil Kaza-2
I have a much larger dataset than in my original email (attached: price depends on weight, Clarity (levels IF-SI2), colour (levels D-L) and Cut (ideal-fair)), and tried the regression command:

> diamond.lm<-lm(price~weight+IF+VVS1+VVS2+VS1+VS2+SI1+SI2+I1+I2+D+E+F+G+H+I+J+K+L+ideal+excellent+very.good+good+fair, data="Diamonds2.txt")
Error in eval(predvars, data, env) : invalid 'envir' argument

which led to the error message shown below the command. I have tried searching for this error, and assumed it was down to having categorical variables within the data. Is this a correct assumption, or am I doing something else wrong? Apologies if this is a bit of a basic question!

Thanks again,
Matthew

Attachment: Diamonds2.txt (25K)
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of whitaker m. (mw1006)
> Sent: Wednesday, December 16, 2009 2:14 PM
> To: Nikhil Kaza; Petr PIKAL
> Cc: [hidden email]
> Subject: Re: [R] Odp: Creating Dummy Variables in R
>
> [...]
> Error in eval(predvars, data, env) : invalid 'envir' argument
> [...]

You need to read your data from Diamonds2.txt into a data frame before running the lm() function. What does your file Diamonds2.txt look like?

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
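Editor's sketch of the read-then-fit workflow described above. The real layout of Diamonds2.txt is not shown in the thread, so the example writes a small stand-in file to a temporary path; the header row, the whitespace separator and the column names are all assumptions made for illustration.

```r
# Hypothetical stand-in for Diamonds2.txt (layout assumed, not known).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "price weight Clarity",
  "500 8 IF",
  "1000 5.2 VVS2",
  "864 3 VVS1",
  "340 2.6 VVS2",
  "90 0.5 IF",
  "450 2.3 VVS1"
), path)

# Passing the file name as 'data' fails: a character string is not
# something lm() can look up variables in.
res <- tryCatch(lm(price ~ weight, data = path), error = function(e) e)
print(inherits(res, "error"))

# Read the file into a data frame first, then pass the object itself.
diamonds <- read.table(path, header = TRUE, stringsAsFactors = TRUE)
str(diamonds)
fit <- lm(price ~ weight + Clarity, data = diamonds)
print(coef(fit))
```

Note that the failing call contains no dummy variables at all, which shows that the error comes from the data argument rather than from the categorical columns.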
Nordlund, Dan (DSHS/RDA) wrote:
> You need to read your data from Diamonds2.txt into a dataframe first
> before running the lm() function. What does your file Diamonds2.txt look like?

And, to put it more bluntly, he needs to study some introductory R text rather more carefully to learn how the pieces fit together.

(Although I can see that the error may be cryptic to a beginner, I am at a loss to explain what kind of leap of logic led him to believe that dummy variables had _anything_ to do with it. For Heaven's sake, he must be getting the same error if he leaves out the dummy variables!)

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([hidden email])                        FAX: (+45) 35327907
In reply to this post by whitaker m. (mw1006)
On 17/12/2009, at 11:14 AM, whitaker m. (mw1006) wrote:

> [...] tried the regression command:
>
>> diamond.lm<-lm(price~weight+IF+VVS1+VVS2+VS1+VS2+SI1+SI2+I1+I2+D+E
>> +F+G+H+I+J+K+L+ideal+excellent+very.good+good+fair,
>> data="Diamonds2.txt")
>
> Error in eval(predvars, data, env) : invalid 'envir' argument
>
> [...] assumed this was down to having categorical variables within
> the data, is this a correct assumption or am i doing something else wrong?

(a) You don't want the quote marks around the data argument. That is the source of the "invalid 'envir' argument" error.

(b) You are not using the power of R. ***Don't*** create your own dummy variables; let lm() do it for you. Learn something about how R works, for crying out loud.

Essentially you should be doing something like

    diamond.lm <- lm(price ~ weight + Clarity + colour + Cut, data = Diamonds2.txt)

where price, weight, Clarity, colour, and Cut are columns of the data frame Diamonds2.txt. The columns price and weight should be numeric vectors; Clarity, colour, and Cut should be ***factors***.

It is slightly worrying that you refer to "Diamonds2.txt". That ".txt" suffix would lead me to believe that "Diamonds2.txt" is a (text) file containing your data. If that is the case, this won't work. The "data" argument to lm() must be an ***R object***. You have to read the data file into an R object before trying to use the data in a call to lm(). Something like

    # Note that you ***do*** want to quote the file name.
    # header = TRUE assumes the first line of the file holds the column
    # names; stringsAsFactors = TRUE turns text columns into factors.
    Diamonds2 <- read.table("Diamonds2.txt", header = TRUE, stringsAsFactors = TRUE)

Then

    diamond.lm <- lm(price ~ weight + Clarity + colour + Cut, data = Diamonds2)

should do what you want.

The dummy variable encoding used will be determined by the (first) value of options()$contrasts, which by default is contr.treatment. Read up on factors and contrasts.

cheers,
Rolf Turner
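Editor's sketch for the case where the data already arrive as hand-made 0/1 columns, as in the original post: the columns can be collapsed back into a single factor before fitting. This is one possible approach rather than something proposed in the thread; the names `dummies` and `clarity_cols` are illustrative, and the code assumes exactly one 1 per row within each group of dummy columns.

```r
# Six rows from the original post, with hand-made dummy columns.
dummies <- data.frame(
  Price  = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  IF     = c(1, 0, 0, 0, 1, 0),
  VVS1   = c(0, 0, 1, 0, 0, 1),
  VVS2   = c(0, 1, 0, 1, 0, 0)
)

clarity_cols <- c("IF", "VVS1", "VVS2")

# Guard: every row must belong to exactly one level.
stopifnot(all(rowSums(dummies[clarity_cols]) == 1))

# max.col() returns, for each row, the index of the column holding the 1;
# indexing the column names with it recovers the level label.
idx <- max.col(as.matrix(dummies[clarity_cols]), ties.method = "first")
dummies$Clarity <- factor(clarity_cols[idx], levels = clarity_cols)
print(dummies$Clarity)

# The single factor now replaces the three dummy columns in the formula.
fit <- lm(Price ~ Weight + Clarity, data = dummies)
print(coef(fit))

# The contrast coding in effect for unordered and ordered factors.
print(getOption("contrasts"))
```

The same steps would be repeated for each group of dummy columns (colour, cut) in a larger file.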