Plot a directed graph in Python?

I am trying to make a directed graph or a Sankey diagram (either would work) showing customer state migration. The data looks like the table below; count is the number of users migrating from the current state to the next state.

**current_state         next_state          count**
New Profile              Initiated           37715
Profile Initiated          End               36411
JobRecommended             End                6202
New                        End                6171
ProfileCreated             JobRecommended     5799
Profile Initiated          ProfileCreated     4360
New                        NotOpted           3751
NotOpted                   Profile Initiated  2817
JobRecommended             InterestedInJob    2542
IntentDetected             ProfileCreated     2334
ProfileCreated             IntentDetected     1839
InterestedInJob            Applied            1671
JobRecommended             NotInterestedInJob 1477
NotInterestedInJob         ProfileCreated     1408
IntentDetected             End                1325
NotOpted                   End                1009
InterestedInJob            ProfileCreated     975
Applied                    IntentDetected     912
NotInterestedInJob         IntentDetected     720
Applied                    ProfileCreated     701
InterestedInJob            End                673

I have written code that builds a Sankey diagram, but the plot is not easily readable. I am looking for a readable directed graph. Here is my code:

    import pandas as pd
    import plotly.graph_objects as go

    df = pd.read_csv('input.csv')

    x = list(set(df.current_state.values) | set(df.next_state))
    di = dict()

    count = 0
    for i in x:
        di[i] = count
        count += 1

    # Map state names to the integer node indices that plotly expects
    df['source'] = df['current_state'].apply(lambda y : di[y])
    df['target'] = df['next_state'].apply(lambda y : di[y])


    # Build the Sankey figure from the indexed sources, targets and counts
    fig = go.Figure(data=[go.Sankey(
        node = dict(
          pad = 15,
          thickness = 20,
          line = dict(color = "black", width = 0.5),
          label = x,
          color = "blue"
        ),
        link = dict(
          source = df.source, 
          target = df.target,
          value = df['count']
      ))])


    # Set the title, a fixed canvas size and the margins
    fig.update_layout(title_text="Sankey Diagram", font_size=10, autosize=False,
        width=1000,
        height=1000,
        margin=go.layout.Margin(
            l=50,
            r=50,
            b=100,
            t=100,
            pad=4
        ))
    fig.show()
Darken answered 26/12, 2019 at 6:42 Comment(4)
So, what is the actual question? We cannot help you plot something we don't know about! You didn't even provide us any requirements... – Neddy
A directed graph or a Sankey diagram that explains customer migration from one state to another. – Darken
Have you checked the Plotly docs on network graphs? – Finnegan
Check the networkX package for expressing digraphs in Python. Here is what you would use to draw: networkx.github.io/documentation/stable/reference/drawing.html – Cobos

For directed graphs, graphviz would be my tool of choice instead of Python.

The following script txt2dot.py converts your data into an input file for graphviz:

text = '''New Profile              Initiated           37715
Profile Initiated          End               36411
JobRecommended             End                6202
New                        End                6171
ProfileCreated             JobRecommended     5799
Profile Initiated          ProfileCreated     4360
New                        NotOpted           3751
NotOpted                   Profile Initiated  2817
JobRecommended             InterestedInJob    2542
IntentDetected             ProfileCreated     2334
ProfileCreated             IntentDetected     1839
InterestedInJob            Applied            1671
JobRecommended             NotInterestedInJob 1477
NotInterestedInJob         ProfileCreated     1408
IntentDetected             End                1325
NotOpted                   End                1009
InterestedInJob            ProfileCreated     975
Applied                    IntentDetected     912
NotInterestedInJob         IntentDetected     720
Applied                    ProfileCreated     701
InterestedInJob            End                673'''

# Remove ambiguity and make suitable for graphviz.
text = text.replace('New Profile', 'NewProfile')
text = text.replace('New ', 'NewProfile ')
text = text.replace('Profile Initiated', 'ProfileInitiated')
text = text.replace(' Initiated', ' ProfileInitiated')

# Create edges and nodes for graphviz.
edges = [ln.split() for ln in text.splitlines()]
edges = sorted(edges, key=lambda x: -1*int(x[2]))
nodes = sorted(list(set(i[0] for i in edges) | set(i[1] for i in edges)))

print('digraph foo {')
for n in nodes:
    print(f'    {n};')
print()
for item in edges:
    print('    ', item[0],  ' -> ', item[1],  ' [label="', item[2], '"];', sep='')
print('}')

Running python3 txt2dot.py > foo.dot results in:

digraph foo {
    Applied;
    End;
    IntentDetected;
    InterestedInJob;
    JobRecommended;
    NewProfile;
    NotInterestedInJob;
    NotOpted;
    ProfileCreated;
    ProfileInitiated;

    NewProfile -> ProfileInitiated [label="37715"];
    ProfileInitiated -> End [label="36411"];
    JobRecommended -> End [label="6202"];
    NewProfile -> End [label="6171"];
    ProfileCreated -> JobRecommended [label="5799"];
    ProfileInitiated -> ProfileCreated [label="4360"];
    NewProfile -> NotOpted [label="3751"];
    NotOpted -> ProfileInitiated [label="2817"];
    JobRecommended -> InterestedInJob [label="2542"];
    IntentDetected -> ProfileCreated [label="2334"];
    ProfileCreated -> IntentDetected [label="1839"];
    InterestedInJob -> Applied [label="1671"];
    JobRecommended -> NotInterestedInJob [label="1477"];
    NotInterestedInJob -> ProfileCreated [label="1408"];
    IntentDetected -> End [label="1325"];
    NotOpted -> End [label="1009"];
    InterestedInJob -> ProfileCreated [label="975"];
    Applied -> IntentDetected [label="912"];
    NotInterestedInJob -> IntentDetected [label="720"];
    Applied -> ProfileCreated [label="701"];
    InterestedInJob -> End [label="673"];
}

Running dot -o foo.png -Tpng foo.dot gives:

[image: graphviz rendering of foo.dot]

Indigestible answered 31/12, 2019 at 0:18 Comment(3)
There are also Graphviz bindings for Python: pygraphviz.github.io – Elective
@roland: this makes a plot, but a messy one. Is there an option to place the nodes? – Darken
@Darken The different programs in graphviz do automatic layout. But you can certainly influence the placement. For example group nodes together in subgraphs, use constrained ranks or you can use a different model. E.g. circo places the nodes in a circle. Look at the documentation and gallery on the website.Indigestible
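The layout hints mentioned in the comment above can be sketched in Python as follows. This is a minimal illustration, not part of the original answer: the helper `to_dot`, its parameters and the shortened edge list are hypothetical, and it only relies on the standard graphviz attributes `rankdir`, `rank=same` and `penwidth`.

```python
# Hypothetical helper: emit DOT text with a few layout hints for graphviz.
# Only a subset of the edges is listed to keep the sketch short.
EDGES = [
    ("NewProfile", "ProfileInitiated", 37715),
    ("ProfileInitiated", "End", 36411),
    ("NewProfile", "NotOpted", 3751),
    ("NotOpted", "ProfileInitiated", 2817),
    ("ProfileInitiated", "ProfileCreated", 4360),
    ("ProfileCreated", "JobRecommended", 5799),
    ("JobRecommended", "End", 6202),
]


def to_dot(edges, same_rank=(), rankdir="LR", max_width=8.0):
    """Return DOT source; same_rank is an iterable of node-name groups."""
    biggest = max(count for _, _, count in edges)
    lines = ["digraph foo {", f"    rankdir={rankdir};"]
    # Each group is constrained to the same rank (same column when rankdir=LR).
    for group in same_rank:
        lines.append("    { rank=same; " + "; ".join(group) + "; }")
    # Scale the line width with the count so that heavy flows stand out.
    for src, dst, count in edges:
        width = max(1.0, max_width * count / biggest)
        lines.append(f'    {src} -> {dst} [label="{count}", penwidth={width:.1f}];')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(EDGES, same_rank=[("NotOpted", "ProfileInitiated")])
print(dot_text)
```

Redirect the output to foo.dot and render it with dot as before. The rank hints are specific to dot; running circo instead of dot on the same file tries the circular layout.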

This creates a basic Sankey Diagram, assuming you:

  1. Save your data in a file called state_migration.csv
  2. Replace whitespaces in labels (state names) with dash/underscore/nothing
  3. Replace whitespaces between columns with commas
  4. Have plotly, numpy and matplotlib installed

Steps 2 and 3 are easily doable with any non-prehistoric text editor, or even with Python itself if there is a lot of data. I strongly recommend you avoid working with whitespace in unquoted values.
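As a rough sketch of doing steps 2 and 3 in Python, assuming the columns in the pasted table are separated by at least two spaces and the count is always the last token (both hold for the sample in the question, but are assumptions about your real data). The helper names `clean_rows` and `to_csv` are made up for illustration.

```python
import csv
import io
import re

# Three rows copied from the question, including the awkward ones.
RAW = """current_state         next_state          count
New Profile              Initiated           37715
NotOpted                   Profile Initiated  2817
JobRecommended             NotInterestedInJob 1477
"""


def clean_rows(text):
    """Split each data line into [current_state, next_state, count]."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        # The count is the last whitespace-separated token.
        states, count = line.rsplit(None, 1)
        # The two state names are separated by a run of 2+ spaces.
        current, nxt = re.split(r"\s{2,}", states.strip())
        # Drop the spaces inside the labels themselves.
        rows.append([current.replace(" ", ""), nxt.replace(" ", ""), int(count)])
    return rows


def to_csv(rows):
    """Render the cleaned rows as comma-separated text with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["current_state", "next_state", "count"])
    writer.writerows(rows)
    return buf.getvalue()


rows = clean_rows(RAW)
csv_text = to_csv(rows)
print(csv_text)
```

A line whose state names are separated by a single space would fail to unpack into two values and raise a ValueError, which is preferable to silently producing a wrong row.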

Result

import plotly.graph_objects as go
import numpy as np
import matplotlib.colors  # cnames lives in the colors submodule

if __name__ == '__main__':

  with open('state_migration.csv', 'r') as finput:
    info = [[ _ for _ in _.strip().lower().split(',') ]
                for _ in finput.readlines()[1:]]
  info_t = [*map(list,zip(*info))] # info transposed

  # this exists to map the data to plotly's node indexing format
  index = {n: i for i, n in enumerate(set(info_t[0]+info_t[1]))}

  fig = go.Figure(data=[go.Sankey(
    node = dict(
      pad = 15,
      thickness = 20,
      line = dict(color = "black", width = 0.5),
      label = list(index.keys()),
      color = np.random.choice( list(matplotlib.colors.cnames.values()),
                                size=len(index.keys()), replace=False )
    ),
    link = dict(
      source = [index[_] for _ in info_t[0]],
      target = [index[_] for _ in info_t[1]],
      value = [int(_) for _ in info_t[2]]  # counts are read from the file as strings
  ))])

  fig.update_layout(title_text="State Migration", font_size=12)
  fig.show()

You can drag the nodes around. See this if you want to predefine their positions or check other parameters.

The data I used was a cleaned version of your input:

current_state,next_state,count
new,initiated,37715
profileinitiated,end,36411
jobrecommended,end,6202
new,end,6171
profilecreated,jobrecommended,5799
profileinitiated,profilecreated,4360
new,notopted,3751
notopted,profileinitiated,2817
jobrecommended,interestedinjob,2542
intentdetected,profilecreated,2334
profilecreated,intentdetected,1839
interestedinjob,applied,1671
jobrecommended,notinterestedinjob,1477
notinterestedinjob,profilecreated,1408
intentdetected,end,1325
notopted,end,1009
interestedinjob,profilecreated,975
applied,intentdetected,912
notinterestedinjob,intentdetected,720
applied,profilecreated,701
interestedinjob,end,673

I changed "New Profile" to the existing state "New", since the diagram was otherwise weird. Feel free to tweak as you need.

The libraries I used are not strictly required for what you want; I'm simply more familiar with them. For the directed graph, Roland Smith has you covered. It can also be done with Plotly; see their gallery.

  • Alternatives to Plotly, in order of preference: matplotlib, seaborn, ggplot, raw dot/graphviz
  • matplotlib was only used here to supply a list with pre-defined hex colors
  • numpy was only used to pick a random value from a list without replacement (a color in this case)

Tested on Python 3.8.1

Morelli answered 5/1, 2020 at 18:25 Comment(0)

It looks like condekind has the answer covered, but since you are using pandas, these previous answers should help with the practical side of organising the data and producing the diagram:

How to define the structure of a sankey diagram using a pandas dataframe?

Draw Sankey Diagram from dataframe

alishobeiri also has a number of useful examples and code you could use: https://plot.ly/~alishobeiri/1591/plotly-sankey-diagrams/#/

The plot.ly documentation also answers the specific question of node placement.

If the Sankey diagram is messy, remember that you can also try a vertical rather than a horizontal orientation.
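A minimal sketch of the vertical orientation, assuming plotly is installed. The helper names `sankey_kwargs` and `show_sankey` and the three-edge sample are hypothetical; the only plotly feature relied on is the `orientation` argument of `go.Sankey`, which accepts "h" or "v".

```python
def sankey_kwargs(edges, orientation="v"):
    """Build keyword arguments for go.Sankey from (source, target, count) tuples."""
    labels = sorted({s for s, _, _ in edges} | {t for _, t, _ in edges})
    index = {name: i for i, name in enumerate(labels)}  # label -> node index
    return dict(
        orientation=orientation,  # "v" stacks the flow top to bottom
        node=dict(label=labels, pad=15, thickness=20),
        link=dict(
            source=[index[s] for s, _, _ in edges],
            target=[index[t] for _, t, _ in edges],
            value=[c for _, _, c in edges],
        ),
    )


def show_sankey(edges, orientation="v"):
    """Render the diagram; plotly is imported lazily so the helper above has no dependencies."""
    import plotly.graph_objects as go

    fig = go.Figure(go.Sankey(**sankey_kwargs(edges, orientation)))
    fig.update_layout(title_text="State Migration", font_size=12)
    fig.show()


# A small subset of the question's data, for illustration only.
EDGES = [
    ("new", "profileinitiated", 37715),
    ("profileinitiated", "end", 36411),
    ("new", "end", 6171),
]
kwargs = sankey_kwargs(EDGES)
```

Calling `show_sankey(EDGES)` opens the vertical diagram; passing `orientation="h"` gives the default horizontal layout for comparison.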

Gulick answered 6/1, 2020 at 23:12 Comment(0)
