Christopher,

thanks for your interest.

> I'm currently exploring a dataset with the help of conditional inference

> trees (still very much a beginner with this technique & log. reg.

> methods as a whole t.b.h.), since they explained more variation in my

> dataset than a binary logistic regression with /glm/. I started out with

> the /party /package, but after I while I ran into the 'updated'

> /partykit /package and tried this out, too.

If you want to use individual trees (as opposed to forests), then the

"partykit" package is recommended because it contains much improved

re-implementations of ctree() and mob() as well as the mob() convenience

interfaces lmtree() and glmtree(). For forests see below.

> Now, the strange thing is that both trees look quite different -

> actually even the very first split is different.

This might be due to several partitioning variables being associated with

tiny p-values in the root node. The re-implementation in partykit

internally computes with log-p-values and hence should be numerically more

stable. In the old implementation it could happen that, among several highly

significant variables, the first one was always chosen because their p-values

were numerically indistinguishable in floating-point arithmetic.
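
The numerical effect can be illustrated in base R. This is only a sketch of the floating-point issue, not the code used inside either package, and the two test statistics (40 and 50) are made-up values:

```r
# Two hypothetical standardized test statistics, both extremely significant.
z <- c(x1 = 40, x2 = 50)

# On the ordinary scale both upper-tail p-values underflow to exactly 0,
# so the variables are tied and which.min() falls back to the first one.
p <- pnorm(z, lower.tail = FALSE)
print(p)
names(z)[which.min(p)]      # "x1", only because it comes first

# On the log scale the p-values remain distinct, so x2 is selected.
logp <- pnorm(z, lower.tail = FALSE, log.p = TRUE)
print(logp)
names(z)[which.min(logp)]   # "x2"
```

With p-values this small, the order of the variables in the data decides the split on the ordinary scale, whereas the log scale still ranks them.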

If you think that this is not the problem, then please contact the package

maintainer with a reproducible example.

Except for bug fixes like the one above, the trees grown by

partykit::ctree and party::ctree should be the same.
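
A quick way to check this on your own data, and a possible starting point for a reproducible example, is to fit both implementations to the same data and cross-tabulate their predictions. The sketch below assumes both packages are installed and uses iris purely as a stand-in dataset:

```r
# Assumption: both "party" and "partykit" are installed.
if (requireNamespace("party", quietly = TRUE) &&
    requireNamespace("partykit", quietly = TRUE)) {
  ct_old <- party::ctree(Species ~ ., data = iris)
  ct_new <- partykit::ctree(Species ~ ., data = iris)

  # Cross-tabulate the in-sample predictions of the two implementations;
  # off-diagonal counts are observations on which the trees disagree.
  pred_old <- predict(ct_old)
  pred_new <- predict(ct_new)
  print(table(party = pred_old, partykit = pred_new))

  # Share of observations with identical predicted class.
  agreement <- mean(as.character(pred_old) == as.character(pred_new))
  print(agreement)
}
```

Calling the functions with `::` avoids the masking that occurs when both packages are attached at the same time.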

> So I did some research and came across the 'forest' concept. However, it

> seems that the /varImp /function does not yet work in the /partykit

> /implementation,

Correct. While the ctree() implementation in partykit is better than that

in party, the same is _not_ true for cforest(). The new partykit::cforest

is currently still a basic implementation which doesn't offer as many

features as the party::cforest implementation. More work is needed

especially for variable importance measures and different kinds of

predictions.
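
For variable importance, the forest can therefore be grown with the older package. A minimal sketch, assuming "party" is installed and again using iris as a stand-in (ntree and mtry are arbitrary illustrative settings):

```r
# Assumption: "party" is installed; the settings are for illustration only.
if (requireNamespace("party", quietly = TRUE)) {
  set.seed(1)  # forests are random, so fix the seed for reproducibility
  cf <- party::cforest(Species ~ ., data = iris,
    controls = party::cforest_unbiased(ntree = 50, mtry = 2))

  # Permutation importance; larger values mean a more important variable.
  vi <- party::varimp(cf)
  print(sort(vi, decreasing = TRUE))
}
```

Note that the function is spelled varimp(), in lower case. The single tree for plotting can still be fitted with partykit::ctree on the same data.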

> which raises the question for me how I should evaluate the /partykit

> /forest - how can I find out whether the variables are important in the

> forest as in my /partykit /tree? Is there some way to do this or some

> other solution for this problem? I'd prefer to continue the /partykit

> /implementation of ctree, since it allows more settings for the final

> plot, which I'd need to get the final (large) plot into a readable form.

>

> Related to this project, I'd also like to give statistics for the overall

> model, e.g. overall significance, Nagelkerke's R², a C-value. After a

> 'regular' binary log. reg., I would use the lrm function to get these

> values, but I am unsure whether it would be correct to also apply this

> method to my tree data.

Overall significance is difficult because you have done model selection

when growing the tree. As for pseudo R-squared or information criteria

etc., it is relatively easy to compute these "by hand" based on the

observed and fitted responses. An example for this is provided at:

http://stackoverflow.com/questions/29524670/how-to-find-the-the-deviance-of-an-as-party-object-converted-from-rpart-tree-in/29693223#29693223
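
For a binary response, such a computation "by hand" might look as follows. This is a sketch rather than an established interface: fit_stats() is a hypothetical helper written for this message, and the iris-based tree is only a stand-in for your data.

```r
# Hypothetical helper: y is the observed 0/1 response, p the fitted
# probability of y == 1 (e.g., from the terminal nodes of a tree).
fit_stats <- function(y, p) {
  stopifnot(all(y %in% c(0, 1)), length(y) == length(p))
  n <- length(y)
  # Bernoulli log-likelihoods of the fitted and the intercept-only model;
  # dbinom() copes with fitted probabilities of exactly 0 or 1.
  ll1 <- sum(dbinom(y, size = 1, prob = p, log = TRUE))
  ll0 <- sum(dbinom(y, size = 1, prob = mean(y), log = TRUE))
  # Cox-Snell R-squared, rescaled to Nagelkerke's R-squared.
  r2_cs <- 1 - exp(2 * (ll0 - ll1) / n)
  r2_n <- r2_cs / (1 - exp(2 * ll0 / n))
  # C index (area under the ROC curve) via the rank-sum formula.
  n1 <- sum(y == 1)
  n0 <- n - n1
  c_index <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  c(deviance = -2 * ll1, null_deviance = -2 * ll0,
    nagelkerke = r2_n, c_index = c_index)
}

# Toy check with perfectly ordered fitted probabilities.
s <- fit_stats(y = c(0, 0, 1, 1), p = c(0.1, 0.2, 0.8, 0.9))
print(s)

# Usage with a tree, assuming "partykit" is installed.
if (requireNamespace("partykit", quietly = TRUE)) {
  d <- transform(iris, virginica = factor(Species == "virginica"))
  ct <- partykit::ctree(virginica ~ Sepal.Length + Sepal.Width, data = d)
  p_hat <- predict(ct, type = "prob")[, "TRUE"]
  print(fit_stats(as.integer(d$virginica == "TRUE"), p_hat))
}
```

All of these are in-sample quantities computed on the data used to grow the tree, so they will tend to be optimistic.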
