How do I jitter the node split strings in plotting ctree output from partykit?
I am building a classification tree from mainly categorical data, stored as factors. I am using the partykit package in R rather than party, because previous answers here suggested that partykit is better for manipulating the graphics output.

I do not have many nodes (about 7) in my real dataset, but I have quite a few factor levels for some variables and I am encountering the issue that the factor levels on the left side of the split and those from the right side are interfering with each other. Specifically, this occurs because of the horizontal orientation of the factor level lists in combination with the length of the factor levels.

I can reproduce the issue using the Aids2 dataset in the MASS package. This is a nonsense example, but it generates the behaviour I wish to solve:

library("partykit")
data("Aids2", package = "MASS")  # Aids2 ships with MASS, so load it from there
SexTest <- ctree(sex ~ ., data = Aids2)
plot(SexTest)

If you look at the node split information for Node 1, you will see the behaviour I am describing:

In my real data frame, shrinking the font only works if I get it down to 4-point, which is unreadable.

Is there some way to define a text box for that string, and enable the text to wrap? I've looked through par and gpar trying to find a solution, but have been unsuccessful. Another option which would be suitable would be to stagger the vertical position of the factor information for each node, so that they are situated one under the other.

Tinytinya answered 16/5, 2013 at 7:38

Hmmm. I've been there. Without modifying the internals of the partykit package, I don't know of a way to improve the output at that particular size (I frequently have issues with the X axis labels being too long on the bar chart output from plotting a tree with a polychotomous dependent variable).

It's an ugly workaround, but you can print the tree to see which categories go where and then use something like GIMP to highlight the image appropriately for your PowerPoint/report/whatever.

Model formula:
sex ~ state + diag + death + status + T.categ + age

Fitted party:
[1] root
|   [2] T.categ in hs, hsid, haem, other
|   |   [3] T.categ in hs, hsid, haem
|   |   |   [4] state in NSW, Other, VIC: M (n = 2386, err = 0.0%)
|   |   |   [5] state in QLD: M (n = 197, err = 0.5%)
|   |   [6] T.categ in other: M (n = 70, err = 10.0%)
|   [7] T.categ in id, het, blood, mother: M (n = 190, err = 42.6%)

Number of inner nodes:    3
Number of terminal nodes: 4
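
For reference, a listing like the one above is what printing the fitted tree gives. Here is a minimal sketch, assuming the SexTest model from the question; the variable tree_text and the file name SexTest_splits.txt are just illustrative choices, not anything partykit requires.

```r
library("partykit")
data("Aids2", package = "MASS")

# Refit the example tree from the question
SexTest <- ctree(sex ~ ., data = Aids2)

# Show the split rules in the console
print(SexTest)

# Keep the same text as a character vector and save it next to the image,
# so the categories can be looked up while annotating the plot by hand
tree_text <- capture.output(print(SexTest))
writeLines(tree_text, "SexTest_splits.txt")
```

The saved text file can then sit alongside the exported plot while you label the splits manually.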

You could also adjust the size of the output to something bigger, say with png():

png('tmp.png',width=1024,height=768)
plot(SexTest)
dev.off()

[Image: larger-resolution output from plot]

Verbid answered 15/10, 2014 at 21:35

An alternative that sort of works is to split the lists manually at the relevant points. You can do this by renaming the levels after which you want a line break so that they include a "\n", for example "haem\n". This looks a bit ugly because the line then partially overlaps with the factor level, but it's the only real workaround I have found so far.
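
Roughly, the renaming could look like this on the Aids2 example from the question. This is only a sketch: the choice of "haem" as the break point and the names aids and SexTestWrapped are illustrative, and you will probably need to experiment with which level carries the newline.

```r
library("partykit")
data("Aids2", package = "MASS")

aids <- Aids2  # work on a copy so the original level names stay intact

# Append a newline to the level after which the label should break.
# "haem" is just an example; pick whichever level suits your split.
lv <- levels(aids$T.categ)
lv[lv == "haem"] <- "haem\n"
levels(aids$T.categ) <- lv

# Refit and plot; the edge labels are built from the renamed levels
SexTestWrapped <- ctree(sex ~ ., data = aids)
plot(SexTestWrapped)
```

Because the newline becomes part of the level name, it also shows up in printed output and tables, so it is probably best kept to a plotting-only copy of the data.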

Theaterintheround answered 15/11, 2016 at 17:3