Python, PyDot and DecisionTree
L

3

11

I'm trying to visualize my DecisionTree, but I'm getting an error. The code is:

from io import StringIO

import pydot
from sklearn import tree

X = [i[1:] for i in dataset]  # attributes (every column but the first)
y = [i[0] for i in dataset]   # class label (first column)
clf = tree.DecisionTreeClassifier()

dot_data = StringIO()
tree.export_graphviz(clf.fit(X, y), out_file=dot_data)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("tree.pdf")

And the error is

Traceback (most recent call last):
if data.startswith(codecs.BOM_UTF8):
TypeError: startswith first arg must be str or a tuple of str, not bytes

Can anyone explain what the problem is? Thanks a lot!

Lightfingered answered 3/7, 2015 at 14:20 Comment(1)
Are you showing us all the code? I don't see the if statement that the traceback points to. Other than that, the startswith() method expects either a string ("string") or a tuple of strings ("st", "st2", "st3"). You passed the wrong data type into the startswith() call. Either you're not using codecs.BOM_UTF8 correctly, or you have to cast it to a string --> str(codecs.BOM_UTF8) - Shoestring
R
5

I had the same exact problem and just spent a couple hours trying to figure this out. I can't guarantee what I share here will work for others but it may be worth a shot.

  1. I tried installing the official pydot packages, but I'm on Python 3 and they simply did not work. Following a note in one of the many threads I scoured, I ended up installing this forked repository of pydot.
  2. I went to graphviz.org and installed their software on my Windows 7 machine. If you don't have Windows, look under their Download section for your system.
  3. After a successful install, I opened Environment Variables (Control Panel\All Control Panel Items\System\Advanced system settings > Environment Variables button), found the variable path under System variables, clicked Edit..., and added ;C:\Program Files (x86)\Graphviz2.38\bin to the end of the Variable value: field.
  4. To confirm I can now use dot commands in the Command Line (Windows Command Processor), I typed dot -V which returned dot - graphviz version 2.38.0 (20140413.2041).
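The check in step 4 can also be done from Python, which is handy when the notebook runs in a different environment than your command prompt. This is only a minimal sketch using the standard library; the helper name find_dot is made up for illustration.

```python
import shutil
import subprocess


def find_dot(executable="dot"):
    """Return the full path of the Graphviz binary, or None if it is not on PATH."""
    return shutil.which(executable)


dot_path = find_dot()
if dot_path is None:
    print("dot was not found on PATH; recheck the PATH entry from step 3")
else:
    # Graphviz normally writes the version banner to stderr, so read both streams.
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    print((result.stderr or result.stdout).strip())
```

If this prints the "not found" message inside the notebook while dot -V works in a terminal, the notebook process was probably started before PATH was changed and needs a restart.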

In the code below, keep in mind that I'm reading a DataFrame from my clipboard. You might be reading it from a file or somewhere else.

In IPython Notebook:

import pandas as pd
import numpy as np
from sklearn import tree
import pydot
from IPython.display import Image
from io import StringIO  # sklearn.externals.six is gone from newer scikit-learn releases

# The last column is the target; every column before it is a feature.
df = pd.read_clipboard()
X = df[df.columns[:-1]]
y = df[df.columns[-1]]

dtr = tree.DecisionTreeRegressor(max_depth=3)
dtr.fit(X, y)

# Export the fitted tree as DOT text, then render it inline as a PNG.
dot_data = StringIO()
tree.export_graphviz(dtr, out_file=dot_data, feature_names=list(X.columns))
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
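One caveat if you end up on a recent pydot rather than the fork: newer pydot releases (1.2 and later, as far as I know) return a list of graphs from graph_from_dot_data instead of a single graph, so calling create_png() directly on the result fails. A small sketch that copes with both return types; first_graph is a hypothetical helper name, not part of pydot.

```python
def first_graph(result):
    """Return a single graph from whatever graph_from_dot_data() handed back."""
    if isinstance(result, (list, tuple)):
        if not result:
            raise ValueError("no graph was parsed from the DOT source")
        return result[0]
    # Older pydot and pydotplus return the graph object itself.
    return result


# Intended use (needs pydot and a fitted tree, so it is left as a comment):
# graph = first_graph(pydot.graph_from_dot_data(dot_data.getvalue()))
# Image(graph.create_png())
```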

[Image: Decision Tree Visualization]

Alternatively, if you're not using IPython, you can generate the image from the command line, as long as you have Graphviz installed (step 2 above). With the same example code, add this line after fitting the model:

tree.export_graphviz(dtr, out_file='treepic.dot', feature_names=list(X.columns))

Then open a command prompt in the folder that contains treepic.dot and run:

dot -T png treepic.dot -o treepic.png

A .png file should be created with your decision tree.
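If you would rather not leave Python for that last step, the same dot invocation can be launched with subprocess. This is a rough sketch, not something the original setup used; dot_command and render are made-up helper names, and it passes the format as -Tpng, the usual spelling of that option.

```python
import shutil
import subprocess


def dot_command(dot_file, out_file, fmt="png"):
    """Build the argument list for rendering a DOT file with Graphviz."""
    return ["dot", "-T" + fmt, dot_file, "-o", out_file]


def render(dot_file, out_file, fmt="png"):
    """Run Graphviz on dot_file; raises if dot is missing or exits with an error."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' is not on PATH")
    subprocess.run(dot_command(dot_file, out_file, fmt), check=True)


cmd = dot_command("treepic.dot", "treepic.png")
print(" ".join(cmd))
```

Calling render("treepic.dot", "treepic.png") after export_graphviz has written the file should produce the same image as the manual command.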

Roadbed answered 11/12, 2015 at 8:51 Comment(0)
H
12

If you're on Python 3, just use pydotplus instead of pydot. It also installs easily with pip.

import pydotplus

<your code>

dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
Heredes answered 6/4, 2016 at 16:29 Comment(2)
This is the best advice, thank you, +1. I used it with Image(graph.create_png()) in Jupyter instead of writing to a PDF, and it worked like a charm. - Matronage
You can also do dot_data = tree.export_graphviz(clf, out_file=None) - Vulgus
N
0

The line in question checks whether the stream/file starts with the UTF-8 byte order mark (BOM).

Instead of:

if data.startswith(codecs.BOM_UTF8):

use:

if codecs.BOM_UTF8 in data:

You will likely have more success...
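For what it's worth, on Python 3 the traceback comes from mixing types: the DOT text is a str, while codecs.BOM_UTF8 is bytes. The sketch below reproduces that without pydot (the sample DOT string is made up) and suggests that the in variant raises the same kind of TypeError, so comparing against a decoded BOM is probably the safer check.

```python
import codecs

data = "digraph Tree { 0 -> 1; }"  # made-up DOT text; a str, like StringIO.getvalue()

errors = {}
for label, check in (
    ("startswith", lambda: data.startswith(codecs.BOM_UTF8)),
    ("in", lambda: codecs.BOM_UTF8 in data),
):
    try:
        check()
    except TypeError as exc:
        # Both checks mix str and bytes, which Python 3 refuses to do.
        errors[label] = str(exc)

for label, message in errors.items():
    print(label, "->", message)

# Comparing like with like works: decode the BOM to its str form first.
bom_text = codecs.BOM_UTF8.decode("utf-8")
has_bom = data.startswith(bom_text)
print("starts with BOM:", has_bom)
```

Since the failing line lives inside the library rather than in your own code, switching to a Python 3 compatible package (as in the pydotplus answer) avoids patching it by hand.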

Nikolia answered 3/7, 2015 at 14:48 Comment(4)
Note that those two lines aren't quite equivalent: if data needs to start with the BOM, the second one might not work. - Rubadub
He is looking for a Unicode value with a string method, which is not likely to work. Although the two may not be equivalent, the BOM is usually at the beginning of a file and not used anywhere else (unless you really messed up your file); see en.wikipedia.org/wiki/Byte_order_mark - Nikolia
I guess the problem is in my data file. Does anybody know what it should look like? I have a CSV file whose first row contains the attribute names for each column, and the following rows contain numeric data. So my X and y are the numeric data from the file; I got them by passing "skiprows=1" when opening it. - Lightfingered
@Lightfingered without seeing the file, we are all going to be guessing. You need to provide more details if you want more constructive answers. My answer above will likely deal with your original issue, but only within a certain context. - Nikolia

© 2022 - 2024 — McMap. All rights reserved.