How to parse a DOT file in Python

Asked 4/2, 2015 at 5:8 Answered 1/7, 2020 at 2:23

python parsing dot morphological-analysis transducer

I have a transducer saved as a DOT file. I can view the graph with gvedit, but I want to convert the DOT file into an executable transducer, so that I can test which strings it accepts and which it rejects.

In most of the tools I have seen (OpenFst, Graphviz, and their Python extensions), DOT files are only used to create a graphical representation. What if I want to parse the file to get an interactive program where I can test strings against the transducer?

Are there any libraries out there that would do the task or should I just write it from scratch?

As I said, the DOT file describes a transducer I designed to simulate the morphology of English. It is a huge file, but to give you an idea of what it looks like, here is a sample. Let's say I want a transducer that models how English nouns behave with respect to plurality. My lexicon consists of only three words (book, boy, girl). My transducer in this case would look something like this:

[image: rendered graph of the sample transducer]

which is directly constructed from this DOT file:

digraph A {
rankdir = LR;
node [shape=circle,style=filled] 0
node [shape=circle,style=filled] 1
node [shape=circle,style=filled] 2
node [shape=circle,style=filled] 3
node [shape=circle,style=filled] 4
node [shape=circle,style=filled] 5
node [shape=circle,style=filled] 6
node [shape=circle,style=filled] 7
node [shape=circle,style=filled] 8
node [shape=circle,style=filled] 9
node [shape=doublecircle,style=filled] 10
0 -> 4 [label="g "];
0 -> 1 [label="b "];
1 -> 2 [label="o "];
2 -> 7 [label="y "];
2 -> 3 [label="o "];
3 -> 7 [label="k "];
4 -> 5 [label="i "];
5 -> 6 [label="r "];
6 -> 7 [label="l "];
7 -> 9 [label="<+N:s> "];
7 -> 8 [label="<+N:0> "];
8 -> 10 [label="<+Sg:0> "];
9 -> 10 [label="<+Pl:0> "];
}

Testing this transducer against the words means that if you feed it book+Pl it should return books, and vice versa. I'd like to see how to turn the DOT file into a format that allows this kind of analysis and testing.

Dario answered 4/2, 2015 at 5:8 Comment(3)

Any chance we could see the .dot file? – Colis 4/2, 2015 at 5:15

A DOT file represents a graph that consists of nodes and edges. I guess that the nodes are input or output points, and that an edge between two nodes represents transportation. If you show the .dot file, you may get a more useful comment or answer. – Geof 4/2, 2015 at 5:26

I just updated and added a sample. – Dario 4/2, 2015 at 5:49

Install the graphviz library. Then try the following:

import graphviz
graphviz.Source.from_file('graph4.dot')

Lichter answered 3/4, 2016 at 17:31 Comment(3)

This is not really parsing the DOT file. – Bullion 17/3, 2017 at 17:49

You're right, it isn't parsing the file into a useful structure like the OP asked. However, it is enough to render the graph (in Spyder), which solved my problem! – Asynchronism 3/1, 2019 at 7:10

If I do that, technically I'm parsing the file with Python, now I can dump it in other formats. So the answer is valid, OP was not requesting to avoid using third party libraries. – Acrylonitrile 22/1, 2021 at 1:0

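Use this to load a .dot file in Python:

import tempfile

import pydot
from PIL import Image  # Pillow; the bare "import Image" only worked with the old PIL

# apath is the path to your .dot file.
# Recent pydot versions return a list of graphs, so take the first one.
graph = pydot.graph_from_dot_file(apath)[0]

# Show as an image (rendering to PNG needs the Graphviz binaries installed)
fout = tempfile.NamedTemporaryFile(suffix=".png")
graph.write(fout.name, format="png")
Image.open(fout.name).show()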

Chrissychrist answered 30/4, 2019 at 20:32 Comment(0)

You could start by loading the file using https://code.google.com/p/pydot/ . From there it should be relatively simple to write the code to traverse the in-memory graph according to an input string.
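A minimal sketch of such a traversal, using only the standard library. It assumes the edges have already been extracted as (source, target, label) triples; here they are copied by hand from the question's sample. The names parse_label, build_transitions and transduce are made up for illustration, and the sketch further assumes that state 0 is the start state, that the doublecircle node is the only final state, that "<a:b>" means "read a, write b", and that "0" stands for the empty string. Note that in the sample the full upper string is book+N+Pl, because of the +N arc.

```python
import re

EPSILON = "0"  # assumption: "0" in a label stands for the empty string

# Edges of the question's sample as (source, target, label) triples.
EDGES = [
    ("0", "4", "g"), ("0", "1", "b"), ("1", "2", "o"), ("2", "7", "y"),
    ("2", "3", "o"), ("3", "7", "k"), ("4", "5", "i"), ("5", "6", "r"),
    ("6", "7", "l"), ("7", "9", "<+N:s>"), ("7", "8", "<+N:0>"),
    ("8", "10", "<+Sg:0>"), ("9", "10", "<+Pl:0>"),
]
FINALS = {"10"}  # the doublecircle node in the sample


def parse_label(label):
    """Split '<+N:s>' into ('+N', 's'); a plain label like 'b' maps to itself."""
    label = label.strip()
    match = re.fullmatch(r"<(.+):(.+)>", label)
    if match:
        return match.group(1), match.group(2)
    return label, label


def build_transitions(edges):
    """Map each state to a list of (upper, lower, target) transitions."""
    transitions = {}
    for source, target, label in edges:
        upper, lower = parse_label(label)
        transitions.setdefault(source, []).append((upper, lower, target))
    return transitions


def transduce(transitions, finals, text, start="0", invert=False):
    """Return every output for `text`; an empty list means it is rejected.

    With invert=False the upper side is read and the lower side written
    (generation); with invert=True the roles are swapped (analysis).
    Assumes the graph has no cycle made only of empty-string reads.
    """
    results = []
    stack = [(start, 0, "")]  # (state, position in text, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(text) and state in finals:
            results.append(out)
        for upper, lower, target in transitions.get(state, []):
            read, write = (lower, upper) if invert else (upper, lower)
            read = "" if read == EPSILON else read
            write = "" if write == EPSILON else write
            if text.startswith(read, pos):
                stack.append((target, pos + len(read), out + write))
    return results


TRANSITIONS = build_transitions(EDGES)
# transduce(TRANSITIONS, FINALS, "book+N+Pl")          -> ['books']
# transduce(TRANSITIONS, FINALS, "books", invert=True) -> ['book+N+Pl']
# transduce(TRANSITIONS, FINALS, "cat")                -> []
```

The search keeps a stack of partial paths rather than a single current state because the sample is nondeterministic: state 7 has two arcs that both read +N.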

Beam answered 4/2, 2015 at 5:57 Comment(3)

Could you elaborate on that a bit more? I know about pydot and I know that you can load a dot file in there. The dot_parser in pydot converts the dot file into some internal class representation. But I am not sure how I can use that. Pydot is basically an interface for Graphviz afaik. – Dario 4/2, 2015 at 6:17

@schmutter: see here: https://mcmap.net/q/745922/-parsing-comments-in-dot-file-with-python - you can load the edges. If you want a more full-featured graph library, see code.google.com/p/python-graph which can also load Dot files, and has algorithms included. – Beam 4/2, 2015 at 6:22

I'm not able to use (the current version) of pydot; it says it requires pyparsing. I downloaded the latest version of pyparsing, but pydot tried to import something from pyparsing that doesn't exist. Grr >:( – Aftermath 8/2, 2016 at 22:56

Another path, and a simple way of finding cycles in a dot file:

import pygraphviz as pgv
import networkx as nx

gv = pgv.AGraph('my.dot', strict=False, directed=True)
G = nx.DiGraph(gv)

cycles = nx.simple_cycles(G)
for cycle in cycles:
    print(cycle)

Froebel answered 1/7, 2020 at 2:23 Comment(4)

Looks good, but cannot be installed at the moment. pip install pygraphviz fails as well as pip3 install pygraphviz. – Sculpture 22/9, 2020 at 10:29

@Sculpture - on what platform? I have it working on Mac & Linux, but have had issues on Windows configurations (using Anaconda). – Froebel 23/9, 2020 at 8:15

I'm on latest Debian Buster. – Sculpture 23/9, 2020 at 8:50

@Sculpture - from memory, you'll need to install both Graphviz, and something like libgraphviz-dev, to get the build prerequisites. If that doesn't work, please post the error you're seeing. – Froebel 24/9, 2020 at 3:58

Guillaume's answer is sufficient to render the graph in Spyder (3.3.2), which might solve some folks' problems.

If you really need to manipulate the graph, as the OP does, it will be a bit complex. Part of the problem is that Graphviz is a graph rendering library, while you are trying to analyse the graph. What you are trying to do is similar to reverse engineering a Word or LaTeX document from a PDF file.

If you can assume the nice structure of the OP's example, then regular expressions work. An aphorism I like is that if you solve a problem with regular expressions, now you have two problems. Nonetheless, that might just be the most practical thing to do for these cases.

Here are expressions to capture:

your node information: r"node.*?=(\w+).*?\s(\d+)". The capture groups are the kind and the node label.
your edge information: r"(\d+).*?(\d+).*?\"(.+?)\s". The capture groups are source, sink, and the edge label.

To try them out easily see https://regex101.com/r/3UKKwV/1/ and https://regex101.com/r/Hgctkp/2/.
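As a rough sketch of how the two expressions could be applied line by line with Python's re module: the function name parse_dot and the DOT_TEXT fragment (a few lines taken from the OP's sample) are only for illustration, and this relies on the file having exactly the regular layout of that sample.

```python
import re

# The two expressions from above, compiled once.
NODE_RE = re.compile(r"node.*?=(\w+).*?\s(\d+)")
EDGE_RE = re.compile(r"(\d+).*?(\d+).*?\"(.+?)\s")

# A fragment of the OP's sample, used here only as test input.
DOT_TEXT = '''digraph A {
rankdir = LR;
node [shape=circle,style=filled] 7
node [shape=circle,style=filled] 9
node [shape=doublecircle,style=filled] 10
7 -> 9 [label="<+N:s> "];
9 -> 10 [label="<+Pl:0> "];
}'''


def parse_dot(text):
    """Return (nodes, edges) extracted with the regular expressions.

    nodes maps a node id to its shape; edges is a list of
    (source, target, label) tuples. All values are strings.
    """
    nodes = {}
    edges = []
    for line in text.splitlines():
        node = NODE_RE.search(line)
        if node:
            shape, node_id = node.groups()
            nodes[node_id] = shape
            continue
        edge = EDGE_RE.search(line)
        if edge:
            edges.append(edge.groups())
    return nodes, edges


nodes, edges = parse_dot(DOT_TEXT)
# nodes -> {'7': 'circle', '9': 'circle', '10': 'doublecircle'}
# edges -> [('7', '9', '<+N:s>'), ('9', '10', '<+Pl:0>')]
```

Nodes whose shape is doublecircle can then be treated as final states, and the edge tuples used to build a transition table.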

Asynchronism answered 3/1, 2019 at 7:6 Comment(1)

Well, no, it isn't exactly like trying to reverse engineer a PDF file. At least not into a Word or latex file. Here we want to construct an internal computer representation, a parse tree, from the file. This very operation is performed by the graphviz program before generating its output. – Veroniqueverras 16/1, 2019 at 13:29

I haven’t tried it yet with the sample above, but NetworkX has a read_dot function that might have been a good way to solve this by converting the file into a graph object with good abilities to then analyze and test the graph.

Azeria answered 6/4, 2020 at 17:34 Comment(0)
