How to deal with missing values when using randomForest

kevin123
I am using the randomForest package to train and test a model.
I aim to predict LengthOfStay.days:

> library(randomForest)
> model <- randomForest( LengthOfStay.days~.,data = training,  
+ importance=TRUE,
+ keep.forest=TRUE
+ )
 

This is a small portion of the data frame:  

data(training)

   LengthOfStay.days CharlsonIndex.numeric DSFS.months
1                  0                   0.0         8.5
6                  0                   0.0         3.5
7                  0                   0.0         0.5
8                  0                   0.0         0.5
9                  0                   0.0         1.5
11                 0                   1.5         NaN



Error message:

Error in na.fail.default(list(LengthOfStay.days = c(0, 0, 0, 0, 0, 0,  :
  missing values in object

I would greatly appreciate any help.

Thanks

Kevin
Re: How to deal with missing values when using randomForest

David Winsemius

On Feb 25, 2012, at 6:24 PM, kevin123 wrote:

> *Error message*
>
> Error in na.fail.default(list(LengthOfStay.days = c(0, 0, 0, 0, 0, 0,  :
>  missing values in object,

What part of that error message is unclear? Have you looked at the
randomForest help page? It tells you that the default na.action is na.fail.

>
> I would greatly appreciate any help


It would seem that the way forward is to remove the cases with missing
values or to impute values.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
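
As a rough illustration of the first suggestion (removing the cases with missing values), here is a minimal sketch in base R. It rebuilds only the six rows shown in the question, so the data frame is a stand-in for the real `training` data, and the name `training_complete` is made up for this example.

```r
# Stand-in for the six rows shown in the question (not the full data set)
training <- data.frame(
  LengthOfStay.days     = c(0, 0, 0, 0, 0, 0),
  CharlsonIndex.numeric = c(0, 0, 0, 0, 0, 1.5),
  DSFS.months           = c(8.5, 3.5, 0.5, 0.5, 1.5, NaN)
)

# is.na() is TRUE for NaN as well as NA, so this counts both per column
colSums(is.na(training))

# Keep only the rows that have no missing values in any column
training_complete <- training[complete.cases(training), ]
nrow(training_complete)  # 5 of the 6 rows remain
```

Dropping rows is the simplest fix, but it discards data; whether that is acceptable depends on how many rows are affected.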

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: How to deal with missing values when using randomForest

Weidong Gu-2
In reply to this post by kevin123
Hi,

You can set na.action=na.roughfix, which fills NAs with the column
median (numeric variables) or the mode (factors).

Another option is to impute missing values using rfImpute, then run
randomForest on the complete data set.

Weidong Gu
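
The two options above might look roughly like this. This is only a sketch under stated assumptions: the data frame `toy` and its columns `y`, `x1` and `x2` are invented for illustration and are not the poster's data. Note that rfImpute expects the response itself to have no missing values.

```r
library(randomForest)

set.seed(1)
# Hypothetical toy data: numeric response, one numeric and one factor predictor
toy <- data.frame(
  y  = rnorm(50),
  x1 = rnorm(50),
  x2 = factor(sample(c("a", "b"), 50, replace = TRUE))
)
toy$x1[c(3, 10)] <- NA   # introduce some missing predictor values
toy$x2[7] <- NA

# Option 1: fill NAs with the median (numeric) or most frequent level (factor)
# while the model frame is built
fit_rough <- randomForest(y ~ ., data = toy, na.action = na.roughfix)

# Option 2: impute using random forest proximities, then fit on the
# completed data; the response is returned as the first column
toy_imputed <- rfImpute(y ~ ., data = toy)
fit_imputed <- randomForest(y ~ ., data = toy_imputed)
```

na.roughfix is quick but crude; rfImpute is slower because it grows several forests, though it uses the other variables when filling in values.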

Re: How to deal with missing values when using randomForest

kevin123
Hi,

Thanks for your help,

This worked very well:

na.action=na.roughfix

Kevin
