Accessing terminal datasets in Ctree()

Accessing terminal datasets in Ctree()

Preetam Pal
Hi guys,

If I apply ctree() to a dataset (specifying control parameters such as
maxdepth), is there a way to programmatically access the (smaller)
datasets corresponding to the terminal nodes of the tree? Say there are
7 terminal nodes: I need those 7 datasets. Of course, I could look at the
node-splitting attributes and write a filtering function by hand, but
that becomes impractical with a large number of terminal nodes. The
intention is to fit a regression on each of these terminal datasets.

Regards,
Preetam

--
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Statistics Division,
Indian Statistical Institute, Kolkata.
Room No. N-114, C.V.Raman Hall, B.H.O.S.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Accessing terminal datasets in Ctree()

Achim Zeileis-4
On Mon, 2 May 2016, Preetam Pal wrote:

> Hi guys,
>
> If I am applying ctree() on a data (specifying some control parameters like
> maxdepth), is there a way I can programmatically access the (smaller)
> datasets corresponding to the terminal nodes in the tree? Say, if there are
> 7 terminal nodes, I need those 7 datasets (of course, I can look at the
> respective node-splitting attributes and write out a filtering function -
> but clearly too much to ask for if I have a large number of terminal
> nodes). Intention is to perform regression on each of these terminal
> datasets.

If you use the "partykit" implementation you can do:

library("partykit")
ct <- ctree(Species ~ ., data = iris)  # conditional inference tree on iris
data_party(ct, id = 6)                 # observations falling into node 6

to obtain the data associated with node 6 for example. You can also use
ct[6] to obtain the subtree and ct[6]$data for its associated data.

For setting up a factor with the terminal node IDs, you can also use
predict(ct, type = "node") and then use that in lm() etc.
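
A minimal sketch of that idea, assuming the iris tree from above. The
regression of Sepal.Length on Sepal.Width and the copy named dat are only
illustrative choices and do not come from the original question:

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris)

# Work on a copy so that iris itself stays unchanged
dat <- iris
dat$node <- factor(predict(ct, type = "node"))

# One intercept and one Sepal.Width slope per terminal node
fit <- lm(Sepal.Length ~ 0 + node + node:Sepal.Width, data = dat)
coef(fit)
```

Because the node factor is built from the fitted tree, the number of
terminal nodes never has to be written down by hand.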

Finally, note that there is also lmtree() and glmtree() for trees with
(generalized) linear models in their nodes.
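
As a rough sketch of the lmtree() route, again on iris: the choice of
Sepal.Length ~ Sepal.Width as the node model and of the remaining columns
as splitting variables is an assumption made purely for illustration.

```r
library("partykit")

# Sepal.Length ~ Sepal.Width is the model fitted in every node;
# the variables after "|" are the candidate splitting variables.
lmt <- lmtree(Sepal.Length ~ Sepal.Width | Petal.Length + Petal.Width + Species,
              data = iris)

print(lmt)   # tree structure together with the node-wise coefficients
coef(lmt)    # coefficients of the models in the terminal nodes
```

Note that this grows the tree with respect to the regression model itself,
which is not the same as fitting regressions inside a ctree() that was
grown for a different response.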


Re: Accessing terminal datasets in Ctree()

Preetam Pal
Great, thank you so much, Achim.
One issue remains: what do I do if I do not know in advance how many
terminal nodes there will be? Note that I do not need the datasets for
the intermediate nodes, only those for the terminal nodes.
Regards,
Preetam


Re: Accessing terminal datasets in Ctree()

Achim Zeileis-4
On Mon, 2 May 2016, Preetam Pal wrote:

> Great, thank you so much Achim. But one issue, in case I do not know how
> many terminal nodes would be there, what do I do? Note that I do not need
> the datasets corresponding to the intermediate nodes, only need the
> terminal datasets.

With predict(ct, type = "node") you can set up a new variable, e.g.,

iris$node <- factor(predict(ct, type = "node"))

and then use this to obtain the subset corresponding to each of the
terminal nodes.
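
One way this could look in practice, as a sketch rather than a definitive
recipe: split() is used here to build all terminal datasets at once, and
the names term_ids, terminal_data and fits as well as the regression
formula are hypothetical choices for illustration.

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris)

# IDs of the terminal nodes only; no need to know their number in advance
term_ids <- nodeids(ct, terminal = TRUE)

# Named list with one data frame per terminal node
terminal_data <- split(iris, predict(ct, type = "node"))

# Fit the same (illustrative) regression separately in each terminal dataset
fits <- lapply(terminal_data, function(d) lm(Sepal.Length ~ Sepal.Width, data = d))
sapply(fits, coef)
```

The list names are the terminal node IDs, so terminal_data[["6"]] is only
available if node 6 is a terminal node of the fitted tree.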


Re: Accessing terminal datasets in Ctree()

Preetam Pal
Again, really appreciate your help on this. Thanks, Achim.
-Preetam
