Plotting log-binned network degree distributions

I often encounter, and produce, long-tailed degree distributions (histograms) of complex networks like the figure below. The tail end of these plots is, well, very heavy and crowded with observations:

Classic long-tailed degree distribution

However, many publications I read show much cleaner degree distributions, without this clumpiness at the end of the distribution and with more evenly spaced observations.

Classic long-tailed degree distribution

How do you make a chart like this using NetworkX and matplotlib?

Schizo answered 10/5, 2013 at 19:38 Comment(4)
What exactly is the question here? It looks like you've already achieved the result you are looking for. You'll need to be more specific than "make it better". - Despumate
There's no question, just sharing how I solved a problem and opening it up to others' feedback if I've missed something in my approach. - Schizo
The better way to do this, otherwise it will get closed, is to break this up into a question and answer it yourself. See blog.stackoverflow.com/2011/07/… - Despumate
In this case you'll get feedback in the comments to the answer, where they belong. As it stands now this question should be closed, but fix it, since you've posted a lot of good information! - Despumate

Use log binning. The code below takes a Counter object representing a histogram of degree values and log-bins it to produce a sparser, smoother distribution.

import numpy as np

def drop_zeros(a_list):
    # Keep only strictly positive values (log10 is undefined at 0)
    return [i for i in a_list if i > 0]

def log_binning(counter_dict, bin_count=35):
    # Work on lists so this also runs on Python 3, where dict views are not lists
    keys = list(counter_dict.keys())
    values = list(counter_dict.values())

    max_x = np.log10(max(keys))
    max_y = np.log10(max(values))
    max_base = max([max_x, max_y])

    min_x = np.log10(min(drop_zeros(keys)))

    # Bin edges evenly spaced in log10 space
    bins = np.logspace(min_x, max_base, num=bin_count)

    # Based off of: https://mcmap.net/q/158510/-binning-data-in-python-with-scipy-numpy
    # Per-bin means of y and x; empty bins give 0/0 = nan
    bin_counts = np.histogram(keys, bins)[0]
    bin_means_y = np.histogram(keys, bins, weights=values)[0] / bin_counts
    bin_means_x = np.histogram(keys, bins, weights=keys)[0] / bin_counts

    return bin_means_x, bin_means_y
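
To see what the paired np.histogram calls are doing, here is a small sketch with made-up numbers (the arrays and variable names below are illustrative assumptions, not data from the network in this answer). Dividing a weighted histogram by an unweighted one over the same bins gives the mean of the weights in each bin:

```python
import numpy as np

# Made-up degree values and their frequencies (illustrative only)
degrees = np.array([2, 3, 5, 20, 30, 50])
freqs = np.array([50, 30, 20, 5, 2, 1])

# Three log-spaced edges (1, 10, 100), i.e. two bins
bins = np.logspace(0, 2, num=3)

counts = np.histogram(degrees, bins)[0]                     # points per bin
sum_freqs = np.histogram(degrees, bins, weights=freqs)[0]   # summed frequency per bin
sum_degs = np.histogram(degrees, bins, weights=degrees)[0]  # summed degree per bin

mean_freqs = sum_freqs / counts  # mean frequency per bin
mean_degs = sum_degs / counts    # mean degree per bin
```

Each bin here holds three points, so mean_freqs and mean_degs are simply the three-point averages of the frequencies and degrees that fall in that bin.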

Generating a classic scale-free network in NetworkX and then plotting this:

import networkx as nx
import matplotlib.pyplot as plt
from collections import Counter

ba_g = nx.barabasi_albert_graph(10000, 2)
ba_c = nx.degree_centrality(ba_g)
# To convert normalized degrees to raw degrees
#ba_c = {k:int(v*(len(ba_g)-1)) for k,v in ba_c.items()}
# Histogram: normalized degree -> number of nodes with that degree
ba_c2 = dict(Counter(ba_c.values()))

ba_x, ba_y = log_binning(ba_c2, 50)

plt.xscale('log')
plt.yscale('log')
# Binned distribution (red squares) over the raw distribution (blue crosses)
plt.scatter(ba_x, ba_y, c='r', marker='s', s=50)
plt.scatter(list(ba_c2.keys()), list(ba_c2.values()), c='b', marker='x')
plt.xlim((1e-4, 1e-1))
plt.ylim((.9, 1e4))
plt.xlabel('Connections (normalized)')
plt.ylabel('Frequency')
plt.show()

This produces the following plot, showing the overlap between the "raw" distribution in blue and the "binned" distribution in red.

Comparison between raw and log-binned
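
A side note on the commented-out conversion back to raw degrees: it multiplies a float by len(ba_g)-1 and passes the result to int(), which truncates, so any floating-point error that lands just below a whole number would lose a degree. A minimal sketch of a rounding-based alternative, using made-up degree values rather than a real graph (n and raw_degrees are assumptions for illustration):

```python
from collections import Counter

n = 10000                          # assumed number of nodes
raw_degrees = list(range(1, 300))  # made-up raw degrees

# Same normalization as degree centrality: degree / (n - 1)
normalized = [d / (n - 1) for d in raw_degrees]

# Round to the nearest integer instead of truncating with int() alone
recovered = [int(round(v * (n - 1))) for v in normalized]

# Histogram keyed by integer degree rather than by float
degree_hist = Counter(recovered)
```

Integer keys also avoid relying on exact float equality when Counter groups the values.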

Thoughts on how to improve this approach or feedback if I've missed something obvious are welcome.
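
One possible refinement, offered as a sketch rather than a definitive fix: log_binning sets the upper bin edge from the larger of the x and y maxima, so when the frequencies are much larger than the degree values, many bins fall beyond the data and come back as nan. The hypothetical variant below (the name log_binning_x_range and the toy histogram are my assumptions) spans only the observed x values and returns only occupied bins:

```python
import numpy as np
from collections import Counter

def log_binning_x_range(counter_dict, bin_count=35):
    """Hypothetical variant: bin edges span only the observed positive x values."""
    keys = np.array([k for k in counter_dict if k > 0], dtype=float)
    values = np.array([counter_dict[k] for k in counter_dict if k > 0], dtype=float)

    bins = np.logspace(np.log10(keys.min()), np.log10(keys.max()), num=bin_count)
    # Pin the outer edges so rounding in logspace cannot exclude the extremes
    bins[0] = keys.min()
    bins[-1] = keys.max()

    counts = np.histogram(keys, bins)[0]
    sum_x = np.histogram(keys, bins, weights=keys)[0]
    sum_y = np.histogram(keys, bins, weights=values)[0]

    # Keep only bins that contain at least one point (avoids 0/0)
    occupied = counts > 0
    return sum_x[occupied] / counts[occupied], sum_y[occupied] / counts[occupied]

# Toy histogram (made up): degree -> frequency
hist = Counter({1: 500, 2: 250, 3: 120, 4: 60, 8: 10, 16: 3, 64: 1})
x, y = log_binning_x_range(hist, bin_count=5)
```

The trade-off is that the number of returned points now depends on the data, since empty bins are dropped instead of being reported as nan.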

Schizo answered 10/5, 2013 at 20:54 Comment(4)
For noobs, what are the x-y labels here? - Aldrich
The x-y labels are: x axis -> the log of the degrees encountered in the network; y axis -> the log of the frequency of those degrees. - Fia
Note: in many places counter_dict.keys() should be replaced by list(counter_dict.keys()) on newer versions of Python, where dict.keys() is not a list. - Standstill
Does it ever make sense to draw a degree-distribution-style plot like the one you have shown for data that isn't network data? That is, to use this plot instead of a histogram, say on skewed blood pressure data or counts of something? Thanks. - Capone
