How to extract the splitting rules for the terminal nodes of ctree()

I have a data set with 6 categorical variables whose number of levels ranges from 5 to 28. I fitted a tree with ctree() (party package) and obtained 17 terminal nodes. To arrive at my desired output, I followed the answer by @Galled in "ctree() - How to get the list of splitting conditions for each terminal node?".

However, I get the following error after running the code:

Error in data.frame(ResulTable, Means, Counts) : 
  arguments imply differing number of rows: 17, 2

I have tried adding these extra lines:

ResulTable <- rbind(ResulTable, cbind(Node = Node, Path = Path2))

ResulTable$Node <- rownames(ResulTable)

melt(ResulTable)

but with no success so far. Any pointers on what is going wrong?

Zielinski answered 2/5, 2015 at 7:27 Comment(0)

I recommend using the new partykit implementation of ctree() rather than the old party package. You can then use the function .list.rules.party(). It is not officially exported yet, but it can be used to extract the desired information.

library("partykit")
airq <- subset(airquality, !is.na(Ozone))
ct <- ctree(Ozone ~ ., data = airq)
partykit:::.list.rules.party(ct)
##                                      3                                      5 
##             "Temp <= 82 & Wind <= 6.9" "Temp <= 82 & Wind > 6.9 & Temp <= 77" 
##                                      6                                      8 
##  "Temp <= 82 & Wind > 6.9 & Temp > 77"             "Temp > 82 & Wind <= 10.3" 
##                                      9 
##              "Temp > 82 & Wind > 10.3" 
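
To get from these rules to a table with one row per terminal node (the kind of result the question was building with ResulTable, Means and Counts), the pieces can be aligned by node id rather than by position. The following is only a sketch on the airquality example above, not the code from the linked answer; the object names (rules, node, means, counts, result) are made up for illustration, and .list.rules.party() is unexported, so it may change between partykit versions.

```r
library("partykit")

airq <- subset(airquality, !is.na(Ozone))
ct <- ctree(Ozone ~ ., data = airq)

# Splitting rules, named by terminal node id (unexported helper)
rules <- partykit:::.list.rules.party(ct)

# Terminal node id of every observation in the learning data
node <- predict(ct, type = "node")

# Per-node summaries; both are indexed by node id (as character)
means  <- tapply(airq$Ozone, node, mean)
counts <- table(node)

# Align everything on the node ids taken from the rules,
# so that all columns have the same length and order
result <- data.frame(
  Node  = names(rules),
  Path  = unname(rules),
  Mean  = as.numeric(means[names(rules)]),
  Count = as.integer(counts[names(rules)]),
  stringsAsFactors = FALSE
)
print(result)
```

Indexing means and counts by names(rules) keeps every column at one entry per terminal node, which avoids the "differing number of rows" error that data.frame() raises when its arguments have mismatched lengths.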
Cand answered 2/5, 2015 at 8:17 Comment(4)
Thank you for your prompt reply. With the above code, I'm getting this error: Error in UseMethod("nodeids") : no applicable method for 'nodeids' applied to an object of class "c('BinaryTree', 'BinaryTreePartition')" - Zielinski
Then you have fitted your tree with party::ctree, not with partykit::ctree. Make sure that you do not load both packages simultaneously. This is bound to lead to confusion. - Cand
Running ctree with the partykit package (with the default control parameters) takes an indefinite amount of time, whereas running ctree with the party package was much faster. I have a dataset with 100K rows and 6 columns. I'm running R version 3.1.3 on a 32-bit 64 GB machine. Any inputs on this? - Zielinski
The old party implementation could run into numerical problems when comparing p-values from datasets with hundreds of thousands of observations. The new partykit implementation uses log-p-values instead, which is numerically more stable. For your data this appears to lead to differences in the splitting, with partykit continuing to split for longer. I would recommend not relying on the default values alone but restricting mincriterion, minbucket, or maxdepth to values that are better suited to your data. - Cand
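
As a rough illustration of the last comment, the restrictions can be passed through ctree_control(). This is a minimal sketch on the airquality example from the answer; the values 0.99, 20 and 3 are arbitrary placeholders and not recommendations for the 100K-row data set, and ct_small is a name chosen for illustration.

```r
library("partykit")

airq <- subset(airquality, !is.na(Ozone))

# Stricter stopping rules than the defaults (values are illustrative only)
ctrl <- ctree_control(
  mincriterion = 0.99,  # require 1 - p-value of at least 0.99 to split
  minbucket    = 20,    # at least 20 observations in each terminal node
  maxdepth     = 3      # never grow deeper than 3 levels
)

ct_small <- ctree(Ozone ~ ., data = airq, control = ctrl)

# Inspect the size of the resulting tree
depth(ct_small)  # depth of the tree
width(ct_small)  # number of terminal nodes
```

Tightening any one of these parameters limits how far the tree can grow, which also limits the number of splits that have to be evaluated on a large data set.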
