rpart weight prior

Aurélie Davranche
Hi!

Could you please explain the difference between "prior" and "weight" in
rpart? They seem to be the same. But in that case, why include a weight
option in the latest versions? For unbalanced sampling, which is best to
use: weight, prior, or both together?

Thanks a lot.

Aurélie Davranche.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: rpart weight prior

Prof Brian Ripley
On Sun, 8 Jul 2007, Aurélie Davranche wrote:

> Hi!
>
> Could you please explain the difference between "prior" and "weight" in
> rpart? They seem to be the same. But in that case, why include a weight
> option in the latest versions? For unbalanced sampling, which is best to
> use: weight, prior, or both together?

The 'weight' argument (sic) has been there for a decade, and is not the
same as the 'prior' param.

The help file (which you seem unfamiliar with) says

  weights: optional case weights.

    parms: optional parameters for the splitting function. Anova
           splitting has no parameters. Poisson splitting has a single
           parameter, the coefficient of variation of the prior
           distribution on the rates.  The default value is 1.
           Exponential splitting has the same parameter as Poisson. For
           classification splitting, the list can contain any of: the
           vector of prior probabilities (component 'prior'), the loss
           matrix (component 'loss') or the splitting index (component
           'split').  The priors must be positive and sum to 1.  The
           loss matrix must have zeros on the diagonal and positive
           off-diagonal elements.  The splitting index can be 'gini' or
           'information'.  The default priors are proportional to the
           data counts, the losses default to 1, and the split defaults
           to 'gini'.
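
To make the distinction in the help text concrete, here is a minimal sketch of where each option goes in a call. It is an illustration, not part of the original reply: the kyphosis data set (bundled with rpart), the 50/50 prior, and the weight of 4 are all assumptions picked for the example, not recommendations for any particular unbalanced sample.

```r
library(rpart)  # recommended package, shipped with standard R distributions

# kyphosis: example data bundled with rpart; the two classes are unbalanced.
data(kyphosis)
table(kyphosis$Kyphosis)

# (a) Default fit: priors proportional to the observed class counts.
fit_default <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     method = "class")

# (b) Class priors are a component of 'parms', one value per class in the
#     order of levels(kyphosis$Kyphosis); they must be positive and sum to 1.
#     The 0.5/0.5 values are an arbitrary choice for illustration.
fit_prior <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   parms = list(prior = c(0.5, 0.5), split = "gini"))

# (c) Case weights are a separate argument, 'weights', with one value per
#     row of the data. The factor 4 is an arbitrary choice for illustration.
w <- ifelse(kyphosis$Kyphosis == "present", 4, 1)
fit_weights <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     method = "class", weights = w)
```

Printing the three fits side by side (for example with print(fit_prior)) shows how the node summaries change under each option. Note that 'prior' has one entry per class while 'weights' has one entry per observation, so weights can express things a class prior cannot, such as differing reliability of individual cases.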

The rpart technical report at

http://mayoresearch.mayo.edu/mayo/research/biostat/upload/61.pdf

may help you understand this.
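
As a brief pointer to what that report covers (a summary in its notation, added here as a reading aid and not part of the original reply): for classification with C classes, let pi_i be the prior for class i, n_i the number of observations in class i, and n_{iA} the number of those that fall in node A. The node probability and the class probabilities within the node are then estimated as

```latex
P(A) \approx \sum_{i=1}^{C} \pi_i \, \frac{n_{iA}}{n_i},
\qquad
p(i \mid A) \approx
  \frac{\pi_i \, n_{iA} / n_i}{\sum_{j=1}^{C} \pi_j \, n_{jA} / n_j}
```

With the default priors, pi_i = n_i / n, the second expression reduces to the plain within-node proportion n_{iA} / n_A. The priors therefore act per class, whereas case weights are attached to individual observations.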

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595