Matplotlib - Boxplot calculated on log10 values but shown in logarithmic scale
I think this is a simple question, but I can't seem to find a simple solution. I have a data set of molecular abundances, with values spanning many orders of magnitude. I want to represent these abundances with boxplots (box-and-whisker plots), and I want the boxes to be calculated on a log scale because of the wide range of values. I know I can calculate the log10 of the data and pass it to matplotlib's boxplot, but then the plot no longer has a logarithmic axis.

So my question is basically this: once I have calculated a boxplot from the log10 of my values, how do I convert the plot so that it is shown on a logarithmic scale instead of a linear scale of log10 values? I can change the tick labels to partly fix this, but I have no idea how to get a logarithmic axis back on the plot.

Or is there a more direct way to plot this? Maybe a different package that already includes this option?

Many thanks for the help.

Counterspy answered 5/1, 2016 at 9:51 Comment(6)
Why not convert your log10 calculated values back to normal values (10**y) and set the y-scale to be logarithmic? - Orsola
Maybe I should clarify that I create the plot like this: bp = ax.boxplot(np.log10(abunds)). This command calculates the box values and creates the plot. I will need to change things in the plot, not the values, right? - Counterspy
The way you're doing it, you are plotting different things. I still don't understand why you can't do bp = ax.boxplot(abunds); ax.set_yscale('log'). That will give you a log scale, and thus the y-ticks properly correspond to your values. - Orsola
Because the log values are negative (values are 10^(-4) and lower), so I get an error with ax.set_yscale('log'). - Counterspy
Tobias, your log values are negative, but your original abunds values should not be. Are you sure you did exactly what @Evert suggested? - Inheritable
Sorry, my mistake: I thought he suggested ax.boxplot(np.log10(abunds)). However, I don't think that in this case the boxes are calculated on a logarithmic scale. There is too much spread in the plots, which causes a lot of outliers. - Counterspy

I'd advise against doing the boxplot on the raw values and setting the y-axis to logarithmic, because the default whisker rule of the boxplot function is not designed to work across orders of magnitude and you may get too many outliers (depending on your data, of course).

Instead, you can plot the logarithm of the data and manually adjust the y-labels.

Here is a very crude example:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

# Synthetic data spanning six orders of magnitude
values = 10 ** np.random.uniform(-3, 3, size=100)

fig = plt.figure(figsize=(9, 3))

# Left panel: box statistics on log10(values), ticks relabelled by hand
ax = plt.subplot(1, 3, 1)
ax.boxplot(np.log10(values))
ax.set_yticks(np.arange(-3, 4))
ax.set_yticklabels(10.0**np.arange(-3, 4))
ax.set_title('log')

# Middle panel: box statistics on the raw values, log-scaled axis
ax = plt.subplot(1, 3, 2)
ax.boxplot(values)
ax.set_yscale('log')
ax.set_title('raw')

# Right panel: raw values, whiskers at the 5th and 95th percentiles
ax = plt.subplot(1, 3, 3)
ax.boxplot(values, whis=[5, 95])
ax.set_yscale('log')
ax.set_title('5%')

plt.show()

[Figure: three boxplot panels titled 'log', 'raw', and '5%']

The middle panel ('raw') shows the box plot computed on the raw values. This leads to many outliers, because the maximum whisker length is a multiple (default: 1.5) of the interquartile range (the box height), a rule that does not adapt to data spanning orders of magnitude.
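To make the effect of the 1.5 x IQR rule concrete, here is a minimal sketch that counts the points falling outside the whisker fences for the raw and for the log10 values. The helper name count_iqr_outliers is hypothetical, and the percentile-based quartiles are an assumption about how the quartiles are computed, so the counts may differ slightly from what a given matplotlib version draws.

```python
import numpy as np

def count_iqr_outliers(x, whis=1.5):
    """Count points outside [Q1 - whis*IQR, Q3 + whis*IQR] (hypothetical helper)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return int(np.sum((x < lower) | (x > upper)))

np.random.seed(42)
# Same kind of synthetic data as in the example: six orders of magnitude
values = 10 ** np.random.uniform(-3, 3, size=100)

n_raw = count_iqr_outliers(values)            # fences computed in data units
n_log = count_iqr_outliers(np.log10(values))  # fences computed in log units

print(f"outliers with statistics on raw values:   {n_raw}")
print(f"outliers with statistics on log10 values: {n_log}")
```

On the raw values the box is dominated by the largest decade, so the upper fence sits only a little above the third quartile and everything beyond it is flagged. In log units the same data is roughly uniform and the fences lie well outside its range.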

Alternatively, you can draw the whiskers at given percentiles: ax.boxplot(values, whis=[5, 95]). In this case a fixed fraction of the points (5%) lies beyond the whiskers at each end and is drawn as outliers.
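A further option, not part of the answer above, is to compute the box statistics on the log10 values, convert them back to data units, and draw them on a true logarithmic axis. The sketch below assumes matplotlib.cbook.boxplot_stats and Axes.bxp behave as documented; the list of keys converted back is my reading of what bxp draws, so treat it as an assumption rather than a definitive recipe.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cbook

np.random.seed(42)
values = 10 ** np.random.uniform(-3, 3, size=100)

# Quartiles, whiskers and fliers are computed in log10 units
stats = cbook.boxplot_stats(np.log10(values))

# Convert every positional statistic back to data units
for s in stats:
    for key in ("med", "q1", "q3", "whislo", "whishi", "cilo", "cihi", "mean"):
        s[key] = 10 ** s[key]  # note: "mean" becomes the geometric mean
    s["fliers"] = 10 ** np.asarray(s["fliers"])
    # s["iqr"] is left in log units; it is a width, not a position

fig, ax = plt.subplots()
ax.bxp(stats)         # draw the precomputed statistics
ax.set_yscale("log")  # real log axis, including minor ticks
```

Calling plt.show() or fig.savefig(...) afterwards displays the result. Because the axis is a genuine log axis, tick labels and minor ticks come from matplotlib's own log locator instead of being set by hand.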

Hydrophobic answered 5/1, 2016 at 13:7 Comment(3)
Thank you for the nice example. Is there a way to also add minor ticks to the log plot, as they appear in the raw plot? - Counterspy
I don't know, sorry. Maybe it's possible with matplotlib.ticker: matplotlib.org/examples/pylab_examples/major_minor_demo1.html - Hydrophobic
You can set minor ticks following the same logic as for the major ticks. For example, to set minor ticks at positions 1, 2, ..., 9, 10, 20, 30, ..., 90, compute their log10 and set them as minor ticks: minor_xticks = np.log10(np.concatenate((np.arange(1, 10), np.arange(1, 10) * 10)).astype(float)); ax.set_xticks(minor_xticks, minor=True) (use float here, since np.float has been removed from recent NumPy releases). - Beneficial
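The minor-tick idea from the last comment can be sketched for the y-axis of a boxplot drawn from log10 values. This is only an illustration under assumptions: the decade range -3..3 matches the synthetic data used here, and the choice of FixedLocator and FuncFormatter is mine, not something stated in the thread.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedLocator, FuncFormatter

np.random.seed(42)
values = 10 ** np.random.uniform(-3, 3, size=100)
decades = np.arange(-3, 4)  # exponents used as major tick positions

fig, ax = plt.subplots()
ax.boxplot(np.log10(values))  # statistics computed in log10 units

# Major ticks at whole exponents, labelled as powers of ten
ax.yaxis.set_major_locator(FixedLocator(decades))
ax.yaxis.set_major_formatter(
    FuncFormatter(lambda y, pos: f"$10^{{{int(round(y))}}}$")
)

# Minor ticks at log10(m * 10^k) for m = 2..9 within each decade
minor = np.log10(np.outer(10.0 ** decades[:-1], np.arange(2, 10)).ravel())
ax.yaxis.set_minor_locator(FixedLocator(minor))
```

The axis itself stays linear in log10 units; only the tick positions and labels imitate a logarithmic axis.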

You can use plt.yscale:

plt.boxplot(data); plt.yscale('log')
Crossindex answered 5/3, 2021 at 20:24 Comment(0)
