Regression Tree Questions


GaryB
Hi All,

I'm a newbie and have two questions.  Please pardon me if they are very basic.


1.  I'm using a regression tree to predict the selling prices of 10 new records (homes).  The following code is resulting in an error message:  pred <- predict(model, newdata = outOfSample[, -6])

The error message is:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object,  :
factor Sq. Feet has new levels 1375, 1421, 1547, 1621, 1868, 2211, 2265, 2530, 2672, 3365


Does anybody know what is causing this?  I've pasted a snippet of my original dataset (Crankshaw) and my out-of-sample dataset below, followed by all the code I entered up to that point.  The error message appears at the end of that code.


2.  How can I get the regression tree to display in a more "friendly" way?  Unfortunately I cannot paste a picture of it in this email, but it displays the values of individual records at each node instead of the decision rule logic (e.g., Age >= 28).  I'm using the command > fancyRpartPlot(model) to display the tree.


Thank you!
Gary

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


Original Data (Crankshaw):

Sq. Feet  Age  Bedrm  Bathrm  Garage  Sell Price ($)
    1620   17      3       2       2          185500
    1864   28      3       2       2          195250
    1628   15      3       2       2          190750
    1670    1      4       3       2          195750
    1762   23      3       4       2          197250
    1520    1      3       3       2          192900


Out-of-Sample Data:

NEW RECORDS:
Sq. Feet  Age  Bedrm  Bathrm  Garage  Sell Price ($)
    3365    8      4       4       3
    1547   28      3       2       2
    1375   36      2       1       1
    1621   53      3       1       2
    2530   23      4       3       2
    1868   42      3       2       2
    2211   23      3       2       2
    1421   39      2       1       1
    2672    3      4       2       3
    2265    7      3       2       2


All Code Entered:

> Crankshaw <- read_excel("C:/Data/Excel/Crankshaw.xlsx")
> View(Crankshaw)
> outOfSample <- Crankshaw[305:nrow(Crankshaw), ]
> Crankshaw <- Crankshaw[1:300, ]
> install.packages("caret")
Installing package into ‘C:/Users/Jason/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/caret_6.0-78.zip'
Content type 'application/zip' length 5155836 bytes (4.9 MB)
downloaded 4.9 MB

package ‘caret’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\Jason\AppData\Local\Temp\RtmpmAxrJR\downloaded_packages
> install.packages("rattle")
Installing package into ‘C:/Users/Jason/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/rattle_5.1.0.zip'
Content type 'application/zip' length 1287407 bytes (1.2 MB)
downloaded 1.2 MB

package ‘rattle’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\Jason\AppData\Local\Temp\RtmpmAxrJR\downloaded_packages
> library(rpart)
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
Warning messages:
1: package ‘caret’ was built under R version 3.4.3
2: package ‘ggplot2’ was built under R version 3.4.3

> library(rattle)
> n <- nrow(Crankshaw)
> train <- sample(1:n, size = 0.5 * n, replace = FALSE)
> CrankshawTrain <- Crankshaw[train, ]
> temp <- (1:n)[-train]
> val <- sample(temp, size = (0.3 / 0.5) * length(temp), replace = FALSE)
> CrankshawVal <- Crankshaw[val, ]
> test <- (1:n)[-c(train, val)]
> CrankshawTest <- Crankshaw[test, ]
> model <- rpart(`Selling Price ($)` ~ ., method = "anova", data = CrankshawTrain)
> fancyRpartPlot(model)
> pred <- predict(model, newdata = outOfSample[, -6])
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object,  :
  factor Sq. Feet has new levels 1375, 1421, 1547, 1621, 1868, 2211, 2265, 2530, 2672, 3365



______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Regression Tree Questions

José María Mateos-2
On Sat, Feb 24, 2018 at 01:16:27PM -0600, Gary Black wrote:

> Hi All,
>
> I'm a newbie and have two questions.  Please pardon me if they are very basic.
>
>
> 1.  I'm using a regression tree to predict the selling prices of 10 new records (homes).  The following code is resulting in an error message:  pred <- predict(model, newdata = outOfSample[, -6])
>
> The error message is:
>
> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object,  :
> factor Sq. Feet has new levels 1375, 1421, 1547, 1621, 1868, 2211, 2265, 2530, 2672, 3365
>

It seems to me that the variable 'Sq. Feet' is being encoded as a factor
instead of holding numerical values. When you train, the model sees a
series of values that it treats as categories, and when you try to
predict, it encounters categories it has never seen and doesn't know
what to do with them.

As that variable is most probably numeric, it should be read as such.
You can try converting it on both your train and test datasets.
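
A minimal sketch of that check and conversion, using a few rows from the posted snippet. The data frames train_df and new_df and the column name SqFeet are made up for illustration, and the sketch assumes the column arrived as text rather than as a factor; it is not a diagnosis of the actual Excel file.

```r
# Hypothetical stand-ins for the training and out-of-sample data.
# SqFeet is deliberately stored as text to mimic the suspected problem.
train_df <- data.frame(SqFeet = c("1620", "1864", "1628"),
                       Age    = c(17, 28, 15),
                       Price  = c(185500, 195250, 190750),
                       stringsAsFactors = FALSE)
new_df   <- data.frame(SqFeet = c("3365", "1547"),
                       Age    = c(8, 28),
                       stringsAsFactors = FALSE)

# Inspect how each column is stored before fitting anything.
sapply(train_df, class)

# Convert the text column to numeric in BOTH data sets, so the model
# and the new records agree on the type of the predictor.
train_df$SqFeet <- as.numeric(train_df$SqFeet)
new_df$SqFeet   <- as.numeric(new_df$SqFeet)

# Confirm the change took effect.
sapply(train_df, class)
sapply(new_df, class)
```

Doing the conversion before the data are split into training, validation and out-of-sample pieces would avoid having to repeat it for each piece.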

Cheers,

JMM.

-- José María Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en


Re: Regression Tree Questions

Bert Gunter-2
But note that converting a factor directly, e.g. via as.numeric(), would be
disastrous, because it returns the internal level codes rather than the
original values:

> as.numeric(factor(c(3,5,7)))
[1] 1 2 3

The OP may need to do some homework with R tutorials to learn about basic R
data structures; or if he has already done this, he may need to be more
explicit about how the data were created/entered.
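
For reference, a short sketch of the usual way around this pitfall: go through the level labels rather than the codes. It reuses the toy factor from the example above and assumes the column really is a factor whose labels are all valid numbers.

```r
# A factor stores integer codes plus a vector of level labels.
f <- factor(c(3, 5, 7))

# Direct conversion returns the internal codes.
as.numeric(f)                 # 1 2 3

# Converting to character first recovers the original values.
as.numeric(as.character(f))   # 3 5 7

# Equivalent: convert each level once, then index by the factor's codes.
as.numeric(levels(f))[f]      # 3 5 7
```

If any label is not a number (a stray "N/A", for instance), both conversions yield NA with a warning, which is itself a useful clue about how the data were entered.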

-- Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: Regression Tree Questions

Jeff Newmiller
In reply to this post by GaryB
As Bert implies, you may be getting ahead of yourself. An 8 may be a number, or it may be the character 8, or it could be a factor, and you don't seem to know the difference yet (thus suggesting tutorials). If you go to the trouble of making a reproducible example [1][2][3] then you may find the problem yourself or we will be able to check things using the example that you would not think to try. The str function can be helpful to find problems like the above.

One surprisingly valuable step mentioned in the reprex references below is giving us the data for your example using the dput function. Another surprisingly useful technique is sending your question in plain text email format, as the Posting Guide indicates (the details of how to do that depend on your email client, which is off topic here).
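
As a rough illustration of both functions, here is a sketch on a three-row excerpt. The data frame homes and its column names are hypothetical, and SqFeet is made a factor on purpose so that str has something to reveal.

```r
# Hypothetical three-row excerpt; SqFeet is a factor on purpose.
homes <- data.frame(SqFeet = factor(c("1620", "1864", "1628")),
                    Age    = c(17, 28, 15))

# str() shows the storage type of every column at a glance,
# which is where a factor posing as a number shows up.
str(homes)

# dput() prints R code that recreates the object, column types included,
# so it can be pasted into a plain-text email.
dput(homes)

# Round trip: evaluating the deparsed text rebuilds the data frame.
homes_copy <- eval(parse(text = paste(deparse(homes), collapse = "\n")))
all.equal(homes, homes_copy)
```

Pasting dput output instead of a printed table lets readers see the column types as well as the values, which a printed table hides.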

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

[2] http://adv-r.had.co.nz/Reproducibility.html

[3] https://cran.r-project.org/web/packages/reprex/index.html (read the vignette)
--
Sent from my phone. Please excuse my brevity.
