Random Forest Reading N/A's, I don't see them


Lost in R
After checking the original data in Excel for blanks and running summary(cm3) to identify any null values in my data, I'm unable to identify any instances. Yet when I attempt to use the data in Random Forest, I get the following error. Is there something that Random Forest is reading as null which is not actually null? Is there a better way to check for this?

> library(randomForest)
> system.time(
+ rf1 <- randomForest(as.matrix(cm3[,c(2:length(colnames(cm3)))]),
+ cm3[,1],data=cm3,ntree=50)
+ )
Error in randomForest.default(as.matrix(cm3[, c(2:length(colnames(cm3)))]),  :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In storage.mode(x) <- "double" : NAs introduced by coercion
Timing stopped at: 1.33 0.01 1.35



Thanks in advance,
Mike

Re: Random Forest Reading N/A's, I don't see them

Michael Weylandt
Use str() on your object and attach the result. For even faster help, use dput() on a *small* sample of your data to make the problem reproducible.

My guess is that there are characters or, less likely, factors lurking about...

Michael
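
One way to look for characters or factors "lurking about" is to list the class of every column. The following is only a minimal sketch in base R; cm3_demo is a hypothetical stand-in for the poster's cm3, and its column names and values are invented for illustration.

```r
# Hypothetical stand-in for cm3: one column was read in as text
cm3_demo <- data.frame(
  y  = c(1, 0, 1, 0),
  x1 = c(0.5, 1.2, 3.3, 4.1),
  x2 = c("1,200", "35%", "17", "8"),
  stringsAsFactors = FALSE
)

# Class of every column; anything other than numeric/integer is a suspect
col_classes <- sapply(cm3_demo, function(col) class(col)[1])
print(col_classes)

# Names of the columns that are not numeric
suspects <- names(cm3_demo)[!sapply(cm3_demo, is.numeric)]
print(suspects)
```

Running the same two sapply() calls on the real data frame should point at the columns worth inspecting with str().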


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Random Forest Reading N/A's, I don't see them

Lost in R
Thanks Michael - that was a help. I got rid of the "," in my numbers and the "%", which were making many of the numeric variables FACTORS. It appears that I made all of those revisions, but I'm still getting the same error. Attached is the str() output; if anyone could shed some light, it would be much appreciated.
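
The clean-up described here can also be done inside R, which makes it easier to see which values still refuse to convert. This is a hedged sketch only: the raw vector is made up, and it assumes that "35%" should become 35 rather than 0.35.

```r
# Hypothetical raw values as they might arrive from a spreadsheet export
raw <- c("1,200", "35%", " 17 ", "N/A", "8.5")

# Strip thousands separators and percent signs, then surrounding whitespace
cleaned <- trimws(gsub("[,%]", "", raw))

# Convert; values that still are not numbers become NA
num <- suppressWarnings(as.numeric(cleaned))

# Entries that were present in the input but failed to convert
bad <- raw[is.na(num) & !is.na(raw)]
print(num)
print(bad)
```

Anything that shows up in bad is a value that would turn into NA when the column is coerced to double.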



Thanks,
Mike

Str().docx

Re: Random Forest Reading N/A's, I don't see them

Lost in R
I've also attached here a sample of my data in Excel. I'm thinking it must be a problem with a character, but can't figure it out. Is there a list somewhere of characters to avoid in R?

Thanks,
Mike

Sample_Data_Set.csv

Re: Random Forest Reading N/A's, I don't see them

jholtman
What exactly is your problem with this file?  The file that you sent
had 10 lines of what appeared to be data and 4489 lines with just
commas which would read in as NAs.  When you do an 'str' you get:

> str(x)
'data.frame':   4498 obs. of  195 variables:
 $ Good_Bad                   : Factor w/ 3 levels "","BAD","GOOD": 3 3 3 3 2 2 2 3 3 1 ...
 $ Good1Bad0                  : int  1 1 1 1 0 0 0 1 1 NA ...
 $ PercUltColl                : num  1 1 1 0.98 0.09 0.01 0.19 1 1 NA ...
 $ GoodMerchant.              : int  1 1 1 1 0 0 0 1 1 NA ...
 $ Fundid

so there are 4498 lines of data in the file, but you probably only
want the first 10.  Is this what your problem is?
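
If comma-only lines like these are the issue, they can be dropped after reading. A minimal sketch, assuming a tiny made-up CSV that reuses three of the column names shown above; the real file has many more columns.

```r
# Hypothetical CSV text: two real records followed by comma-only lines
csv_text <- "Good_Bad,Good1Bad0,PercUltColl
GOOD,1,0.98
BAD,0,0.09
,,
,,"

# Treat empty strings as NA while reading
x <- read.csv(text = csv_text, na.strings = c("", "NA"),
              stringsAsFactors = FALSE)

# Keep only rows with at least one non-missing value
all_missing <- rowSums(!is.na(x)) == 0
x_clean <- x[!all_missing, , drop = FALSE]

print(dim(x))
print(dim(x_clean))
```

Rows that are only partly missing are kept by this filter and would still need separate handling.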




--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: Random Forest Reading N/A's, I don't see them

David Winsemius
In reply to this post by Lost in R

On Dec 16, 2011, at 12:20 PM, Lost in R wrote:

> I've also attached here a sample of my data in Excel. I'm thinking it

It? What is "it"?

> must be
> a problem with a character, but can't figure it out. Is there a list
> somewhere of characters to avoid in R?
>
> Thanks,
> Mike
>
> http://r.789695.n4.nabble.com/file/n4205479/Sample_Data_Set.csv
> Sample_Data_Set.csv
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Random-Forest-Reading-N-A-s-I-don-t-see-them-tp4201546p4205479.html
> Sent from the R help mailing list archive at Nabble.com.

We are not looking at this with Nabble. This is a mailing list. You
are asked to include context. That is something that can easily be
done in Nabble, so your failure to do so is seen by most viewers of
this list as one of:

cause <-  c("privileged attitude", "clueless about mailing lists",  
"persistent failure to read Posting Guide")


David Winsemius, MD
West Hartford, CT


Re: Random Forest Reading N/A's, I don't see them

Lost in R
In reply to this post by jholtman
The data set I attached was just those 10 lines. It was only meant to show any possible obvious mistake I may have made. The real set has 4498 lines of data.

Re: Random Forest Reading N/A's, I don't see them

David Winsemius
In reply to this post by Lost in R

On Dec 15, 2011, at 2:39 PM, Lost in R wrote:

> After checking the original data in Excel for blanks and running  
> Summary(cm3)
> to identify any null values in my data, I'm unable to identify an  
> instances.
> Yet when I attempted to use the data in Random Forest, I get the  
> following
> error. Is there something that Random Forest is reading as null  
> which is not
> actually null? Is there a better way to check for this?
>
>> library(randomForest)
>> system.time(
> + rf1 <- randomForest(as.matrix(

# Are you aware of the effect of using as.matrix(..) on the storage mode?

> cm3[,c(2:length(colnames(cm3)))]),

# that was the x argument

> + cm3[,1],

# The y variable

> data=cm3,

# That's odd. You already offered the data objects. I wonder what the function will do with that?


> ntree=50)
> + )
> *Error in randomForest.default(as.matrix(cm3[,  
> c(2:length(colnames(cm3)))]),
> :
>  NA/NaN/Inf in foreign function call (arg 1)
> In addition: Warning message:
> In storage.mode(x) <- "double" : NAs introduced by coercion

I can see two potential sources of such an error.

> Timing stopped at: 1.33 0.01 1.35 *

David Winsemius, MD
West Hartford, CT


Re: Random Forest Reading N/A's, I don't see them

William Dunlap
In reply to this post by Lost in R
Try randomForest with a small dataset to see how it works:
  > d <- data.frame(stringsAsFactors=FALSE,
  +                 Num=(1:10)%%9,
  +                 Fac=factor(rep(LETTERS[1:2],each=5)),
  +                 Char=rep(letters[24:26],len=10))
  > randomForest(x=d[,"Char",drop=FALSE], y=d$Num)
  Error in randomForest.default(x = d[, "Char", drop = FALSE], y = d$Num) :
    NA/NaN/Inf in foreign function call (arg 1)
  In addition: Warning message:
  In data.matrix(x) : NAs introduced by coercion
  > randomForest(x=d[,"Fac",drop=FALSE], y=d$Num)

  Call:
   randomForest(x = d[, "Fac", drop = FALSE], y = d$Num)
                 Type of random forest: regression
                       Number of trees: 500
  No. of variables tried at each split: 1

            Mean of squared residuals: 9.573558
                      % Var explained: -40.58

It appears to die if any predictors are character vectors:
it will not convert them to factors (as most modelling functions
do).
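
Since the factor column in the small example above is accepted, one possible workaround is to convert character columns to factors before calling randomForest. The sketch below uses base R only and a cut-down copy of the example data frame; it does not call randomForest itself.

```r
# Cut-down copy of the example data, built with base R only
d <- data.frame(Num = (1:10) %% 9,
                Char = rep(letters[24:26], length.out = 10),
                stringsAsFactors = FALSE)

# Convert every character column to a factor, leaving other columns alone
is_chr <- sapply(d, is.character)
d[is_chr] <- lapply(d[is_chr], factor)

str(d)
```

Whether factors are the right encoding for a given text column is a modelling decision; this only removes the character storage type.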

as.matrix(data.frame) creates a character matrix if not all columns
are numeric or logical, so I suspect you are running into the
no-character-data limitation.  Try leaving off the as.matrix and
pass in the data.frame that it expects:
   randomForest(x=cm3[,-1,drop=FALSE], y=cm3[,1])
(There is no need or use for the data= argument if you use the x=,y=
interface.  It is only there for the formula interface.)
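
The as.matrix() behaviour described above can be checked on a toy data frame. A minimal sketch, assuming a made-up two-column data frame df; it forces the storage mode to double by hand to mimic the coercion named in the original warning.

```r
# Hypothetical mixed data frame: one numeric and one character column
df <- data.frame(a = c(1.5, 2.5, 3.5),
                 b = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

m <- as.matrix(df)
print(typeof(m))   # the whole matrix is now character

# Forcing the matrix to double mirrors the coercion in the warning message
m_num <- m
suppressWarnings(storage.mode(m_num) <- "double")

# Count the NAs that the coercion produced in each column
print(colSums(is.na(m_num)))
```

The numeric column survives the round trip, while every entry of the text column becomes NA, which is the kind of value the error message complains about.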

If you dislike the no-character-data limitation discuss it with
the person at the address given by maintainer("randomForest").

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


Re: Random Forest Reading N/A's, I don't see them

Lost in R
Bill, thanks so much. I left off the as.matrix and it worked! I really appreciate the help.