about randomForest


wanghong
Hello,
I want to use randomForest to classify a matrix that is 331030×42; the last column is the class label. I used:
Memebers.rf<-randomForest(class~.,data=Memebers,proximity=TRUE,mtry=6,ntree=200)
which told me "the error is matrix(0,n,n) set too elements".
Then I used:
Memebers.rf<-randomForest(class~.,data=Memebers,importance=TRUE,proximity=TRUE)
which told me "the error is na.fail.default(list(class = c(17L, 17L, 17L, 29L, 29L, 29L,  :
  missing values in object
"

What's wrong with it? Thanks a lot.
 

        wanghong
        [hidden email]
          2008-12-26
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: about randomForest

Uwe Ligges-3


wanghong wrote:
> hello,
> I want to use randomForest to classify a matrix which is 331030×42,the last column is class signal.I use :
> Memebers.rf<-randomForest(class~.,data=Memebers,proximity=TRUE,mtry=6,ntree=200) which told me" the error is matrix(0,n,n) set too elements"

I doubt that "the error is matrix(0,n,n) set too elements" is really an
error message from randomForest.
I would rather expect "Error in matrix(0, n, n) : too many elements
specified", which tells us that randomForest cannot deal with such a huge
*data.frame* (rather than a matrix, I guess).

Finally, how much RAM do you think will be required to store 200
trees grown with default settings on such a huge data.frame? I doubt it
will fit on your whole HDD (without having done any calculations), and
certainly not in your RAM.
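
A rough back-of-the-envelope check of the matrix(0, n, n) failure may help here. It assumes the call comes from the n x n proximity matrix requested by proximity=TRUE, which the thread does not confirm. The row count is taken from the original post; the variable names are only illustrative.

```r
# Rows in the data set, as reported in the original post.
n <- 331030

# Number of cells in an n x n matrix of doubles.
cells <- n^2

# Largest value an R integer can hold; in older R versions a single
# matrix could not have more elements than this.
limit <- .Machine$integer.max

cells > limit   # TRUE: far too many elements for one matrix

# Memory needed at 8 bytes per double, expressed in GiB.
gib <- cells * 8 / 2^30
gib
```

Under these assumptions the matrix alone would need on the order of 800 GiB, so proximity=TRUE is not realistic for the full data set.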

> then I use:
> Memebers.rf<-randomForest(class~.,data=Memebers,importance=TRUE,proximity=TRUE) which told me"the error is na.fail.default(list(class = c(17L, 17L, 17L, 29L, 29L, 29L,  :
>   missing values in object
> "

Missing values?
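
One way to follow up on this hint is to count the missing values per column before fitting. This is a minimal sketch in base R; `toy` is a made-up stand-in for the poster's data frame, not the real data.

```r
# Hypothetical stand-in for the real data: two predictors and a class column.
toy <- data.frame(
  x1    = c(1.2, NA, 3.4, 4.1),
  x2    = c(10, 20, NA, 40),
  class = factor(c("a", "b", "a", "b"))
)

# Number of NA values in each column.
na_per_column <- colSums(is.na(toy))
na_per_column

# Keep only the rows that have no missing value in any column.
complete <- toy[complete.cases(toy), ]
nrow(complete)
```

The formula interface of randomForest uses na.fail by default, which is consistent with the na.fail.default error above. Passing na.action = na.omit, or imputing with the package's na.roughfix, are two possible ways around it.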


Uwe Ligges




Re: about randomForest

Jim Porzak
Hi Wanghong,

Unless you have a huge Linux box, you will need to sample down your 300k
rows to a few thousand.

In marketing apps, I often have data sets of comparable size.

I would suggest you start with just a few thousand rows to make sure
everything else is working as you wish. Also, study carefully Andy's
randomForest docs, including the R News article from a couple of years ago.

In particular,

1) the formula interface is a memory hog. Andy suggests just using explicit
declaration. In your case, something like
      randomForest(Memebers[42], Memebers[-42], ...
2) the proximity matrix is also memory & time intensive. Suggest proximity =
FALSE until other things are sorted out.
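
The two points above, combined with the subsampling advice, could be sketched roughly as follows. The data frame `big` is simulated and purely hypothetical, the sizes are arbitrary, and the arguments are named because the default method of randomForest takes the predictors as x and the response as y.

```r
library(randomForest)

set.seed(1)  # reproducible subsample

# Hypothetical stand-in for the large data frame; the last column is the class.
big <- data.frame(matrix(rnorm(20000 * 5), ncol = 5),
                  class = factor(sample(c("a", "b", "c"), 20000, replace = TRUE)))

# Draw a couple of thousand rows at random.
idx <- sample(nrow(big), 2000)
small <- big[idx, ]

# Position of the class column (the last one here).
k <- ncol(small)

# Predictors as x, response as y; proximity stays off while testing.
fit <- randomForest(x = small[-k], y = small[[k]],
                    ntree = 50, proximity = FALSE)
print(fit)
```

Once this runs cleanly on the subsample, the sample size and ntree can be increased step by step while watching memory use.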

HTH,
Jim Porzak
TGN.com
San Francisco, CA
http://www.linkedin.com/in/jimporzak
useR Group SF: http://ia.meetup.com/67/



Re: about randomForest

Liaw, Andy
Apologies for catching this so late.  Have been out for a few weeks and still trying to recover from that...

From: Jim Porzak

> Hi Wanghong,
>
> Unless you have a huge Linux box, you will need to sample down your 300k
> rows to a few thousand.
>
> In marketing apps, I often have data sets of comparable size.
>
> I would suggest you start with just a few thousand rows to make sure
> everything else is working as you wish. Also, study carefully Andy's
> randomForest docs, including the R News article from a couple of years ago.
>
> In particular,
>
> 1) the formula interface is a memory hog. Andy suggests just using explicit
> declaration. In your case, something like
>       randomForest(Memebers[42], Memebers[-42], ...

Actually, that first argument should probably be Memebers[[42]].  I believe you get a data frame with one variable if you do Memebers[42].
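
The difference between the two indexing forms can be checked on a small example. `toy` below is a hypothetical stand-in for Memebers, with the class in the second column rather than the 42nd.

```r
# Hypothetical toy data frame standing in for Memebers.
toy <- data.frame(x1 = c(1, 2, 3),
                  class = factor(c("a", "b", "a")))

one_col <- toy[2]     # single brackets: still a data frame with one column
vec     <- toy[[2]]   # double brackets: the column itself, here a factor

is.data.frame(one_col)  # TRUE
is.factor(vec)          # TRUE
```

With named arguments the call would then look something like randomForest(x = Memebers[-42], y = Memebers[[42]], ...), since the default method expects the predictors as x and the response as y.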

Best,
Andy
