Plotting a histogram on a log scale with Matplotlib
F

5

51

I have a Pandas DataFrame that has the following values in a Series

x = [2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27, 22, 1, 12, 7, 19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1]

I was instructed to plot two histograms in a Jupyter notebook with Python 3.6.

x.plot.hist(bins=8)
plt.show()

I chose 8 bins because that looked best to me. I have also been instructed to plot another histogram with the log of x.

x.plot.hist(bins=8)
plt.xscale('log')
plt.show()

This histogram looks TERRIBLE. Am I not doing something right? I've tried fiddling around with the plot, but everything I've tried just seems to make the histogram look even worse. Example:

x.plot(kind='hist', logx=True)

I was not given any instructions other than plot the log of X as a histogram.

For the record, I have imported pandas, numpy, and matplotlib and specified that the plot should be inline.

Freeliving answered 16/12, 2017 at 21:41 Comment(2)
What is "Terrible" about the histogram? - Allies
The best way/workaround is just plt.hist(np.log(x)). - Velvety
L
68

Specifying bins=8 in the hist call means that the range between the minimum and maximum value is divided equally into 8 bins. What is equal on a linear scale is distorted on a log scale.
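To make the distortion concrete, here is a small sketch (not part of the original answer; the variable names are only illustrative). It computes the edges of 8 equal-width bins for the question's data with np.histogram_bin_edges and then measures each bin's width in decades, which is how wide the bin is drawn on a log axis.

```python
import numpy as np

x = np.array([2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27,
              22, 1, 12, 7, 19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1])

# Edges of 8 equal-width bins between min(x) and max(x), as bins=8 produces.
linear_edges = np.histogram_bin_edges(x, bins=8)

# Width of each bin on a linear axis: all identical.
linear_widths = np.diff(linear_edges)

# Width of each bin in decades (log10 units), i.e. its drawn width on a log x-axis.
decade_widths = np.diff(np.log10(linear_edges))

print(linear_widths)
print(decade_widths)
```

For this data the first bin covers more than one and a half decades while the last covers less than a tenth of one, so the leftmost bar is drawn very wide and the remaining bars are squeezed together on the right.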

What you could do is specify the bins of the histogram such that they are unequal in width in a way that would make them look equal on a logarithmic scale.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

x = [2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27, 22, 1, 12, 7, 
     19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1]
x = pd.Series(x)

# histogram on linear scale
plt.subplot(211)
hist, bins, _ = plt.hist(x, bins=8)

# histogram on log scale. 
# Use non-equal bin sizes, such that they look equal on log scale.
logbins = np.logspace(np.log10(bins[0]),np.log10(bins[-1]),len(bins))
plt.subplot(212)
plt.hist(x, bins=logbins)
plt.xscale('log')
plt.show()


Lemmie answered 16/12, 2017 at 23:11 Comment(1)
I'd use logbins = np.geomspace(x.min(), x.max(), 8) to save typing all those logs (and bins[0], bins[-1] are just min and max anyway). - Gossett
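The geomspace suggestion in the comment above can be checked with a short sketch (variable names are illustrative). It builds the edges both ways for the question's data and compares them. Note that 8 bins need 9 edges, so n_bins + 1 is passed here; passing 8, as written in the comment, would give 7 bins.

```python
import numpy as np

x = np.array([2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27,
              22, 1, 12, 7, 19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1])
n_bins = 8

# Log-spaced edges via logspace: the endpoints are given as base-10 exponents.
edges_logspace = np.logspace(np.log10(x.min()), np.log10(x.max()), n_bins + 1)

# The same edges via geomspace: the endpoints are given directly.
edges_geomspace = np.geomspace(x.min(), x.max(), n_bins + 1)

print(np.allclose(edges_logspace, edges_geomspace))

# Consecutive edges share a constant ratio, which is why the bins
# appear equally wide on a logarithmic axis.
ratios = edges_geomspace[1:] / edges_geomspace[:-1]
print(ratios)
```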
A
36

Here is one more solution without using a subplot or plotting two things in the same image.

import numpy as np
import matplotlib.pyplot as plt

def plot_loghist(x, bins):
  hist, bins = np.histogram(x, bins=bins)
  logbins = np.logspace(np.log10(bins[0]),np.log10(bins[-1]),len(bins))
  plt.hist(x, bins=logbins)
  plt.xscale('log')

plot_loghist(np.random.rand(200), 10)


Amber answered 5/2, 2019 at 7:16 Comment(3)
You should test the code before posting it: it cannot run because the ":" after the function declaration is missing. And after adding it, the code still doesn't work; it just crashes. - Spectacle
Thanks for pointing that out; I fixed the typo. The code works fine for me on Python 3.5. - Amber
Works for me too, on Python 3.8. Thanks for the useful contribution. - Tavarez
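The comments above disagree on whether the function runs, and the thread does not say which input caused the crash. One plausible cause, stated here purely as an assumption, is data containing zeros or negative values: np.log10 then yields -inf or nan, and no valid log-spaced edges exist. A minimal defensive sketch, using the hypothetical helper name log_bin_edges, that drops non-positive values before building the edges:

```python
import numpy as np

def log_bin_edges(x, n_bins):
    """Return n_bins + 1 log-spaced edges covering the positive values of x.

    Hypothetical helper for illustration; it is not from the original answers.
    """
    x = np.asarray(x, dtype=float)
    positive = x[x > 0]  # the logarithm is undefined for zero and negative values
    if positive.size == 0:
        raise ValueError("need at least one positive value for log-spaced bins")
    return np.geomspace(positive.min(), positive.max(), n_bins + 1)

# Example input with a zero and a negative value mixed in.
edges = log_bin_edges([0, 1, 10, 100, -5], n_bins=4)
print(edges)
```

The result can be passed as plt.hist(x, bins=edges) followed by plt.xscale('log'). Values that fall outside the edges (here the zero and the negative number) are not counted in any bin, so this choice silently drops them from the plot.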
G
14

plot another histogram with the log of x.

is not the same as plotting x on the logarithmic scale. Plotting the logarithm of x would be

np.log(x).plot.hist(bins=8)
plt.show()


The difference is that the values of x themselves were transformed: we are looking at their logarithm.

This is different from plotting on the logarithmic scale, where we keep x the same but change the way the horizontal axis is marked up (which squeezes the bars to the right, and stretches those to the left).
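The two approaches are nevertheless closely related, which a small numerical sketch can show (an illustration that assumes the same number of bins in both cases; names are illustrative). Equal-width bins on log10(x) and log-spaced bins on x itself sort the question's data into the same groups; what differs is whether the axis is labelled with the log values or with the original values. log10 is used here, and the natural log gives the same counts.

```python
import numpy as np

x = np.array([2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27,
              22, 1, 12, 7, 19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1])

# Approach 1: transform the values, then use 8 equal-width bins.
counts_transformed, log_edges = np.histogram(np.log10(x), bins=8)

# Approach 2: keep the values, but use 8 log-spaced bins
# (these are the bins one would draw on a logarithmic x-axis).
counts_log_axis, edges = np.histogram(x, bins=np.geomspace(x.min(), x.max(), 9))

print(counts_transformed)
print(counts_log_axis)
print(np.array_equal(counts_transformed, counts_log_axis))
```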

Gossett answered 16/12, 2017 at 22:31 Comment(0)
E
3

Seaborn is also a good solution for histograms with a log scale, without having to manually specify the histogram bin edges, as you would with just matplotlib.

import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

x = [2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27, 22, 1, 12, 7, 19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1]
x = pd.Series(x)
plt.hist(x)
plt.xscale('log')
plt.gca().set(title='Matplotlib histogram, logarithmic x axis')
plt.show()
#x.plot(kind='hist', log=True)

sns.histplot(x, bins=8, log_scale=True)
plt.gca().set(title='Seaborn histogram, logarithmic x axis')
plt.show()
sns.histplot(x, bins=8, log_scale=True)
plt.gca().set(title='Seaborn histogram, logarithmic x axis, with scalar ticks')
plt.gca().xaxis.set_major_formatter(mpl.ticker.ScalarFormatter())
plt.gca().set_xticks([1, 10, 100, 150])
plt.show()


Eph answered 9/7, 2023 at 15:53 Comment(0)
L
1

In my experiments, the call to np.histogram turned out to be unnecessary: the outermost bin edges are exactly the minimum and maximum of x, so they can be computed directly without np.histogram:

import numpy as np
from matplotlib import pyplot as plt

def plot_loghist(x, bins):
    logbins = np.logspace(np.log10(np.min(x)),np.log10(np.max(x)),bins+1)
    plt.hist(x, bins=logbins)
    plt.xscale('log')


plot_loghist(np.random.rand(200), 10)
Lacto answered 5/7, 2023 at 10:55 Comment(0)
