Use log binning (see also). Here is code to take a Counter
object representing a histogram of degree values and log-bin the distribution to produce a sparser and smoother distribution.
import numpy as np
def drop_zeros(a_list):
return [i for i in a_list if i>0]
def log_binning(counter_dict,bin_count=35):
max_x = log10(max(counter_dict.keys()))
max_y = log10(max(counter_dict.values()))
max_base = max([max_x,max_y])
min_x = log10(min(drop_zeros(counter_dict.keys())))
bins = np.logspace(min_x,max_base,num=bin_count)
# Based off of: https://mcmap.net/q/158510/-binning-data-in-python-with-scipy-numpy
bin_means_y = (np.histogram(counter_dict.keys(),bins,weights=counter_dict.values())[0] / np.histogram(counter_dict.keys(),bins)[0])
bin_means_x = (np.histogram(counter_dict.keys(),bins,weights=counter_dict.keys())[0] / np.histogram(counter_dict.keys(),bins)[0])
return bin_means_x,bin_means_y
Generating a classic scale-free network in NetworkX
and then plotting this:
import networkx as nx
ba_g = nx.barabasi_albert_graph(10000,2)
ba_c = nx.degree_centrality(ba_g)
# To convert normalized degrees to raw degrees
#ba_c = {k:int(v*(len(ba_g)-1)) for k,v in ba_c.iteritems()}
ba_c2 = dict(Counter(ba_c.values()))
ba_x,ba_y = log_binning(ba_c2,50)
plt.xscale('log')
plt.yscale('log')
plt.scatter(ba_x,ba_y,c='r',marker='s',s=50)
plt.scatter(ba_c2.keys(),ba_c2.values(),c='b',marker='x')
plt.xlim((1e-4,1e-1))
plt.ylim((.9,1e4))
plt.xlabel('Connections (normalized)')
plt.ylabel('Frequency')
plt.show()
Produces the following plot showing the overlap between the "raw" distribution in blue and the "binned" distribution in red.
Thoughts on how to improve this approach or feedback if I've missed something obvious are welcome.