I am often working with data that has a very 'long tail'. I want to plot histograms to summarize the distribution, but when I try to using pandas I wind up with a bar graph that has one giant visible bar and everything else invisible.
Here is an example of the series I am working with. Since it's very long, I used value_counts() so it will fit on this page.
In [10]: data.value_counts.sort_index()
Out[10]:
0 8012
25 3710
100 10794
200 11718
300 2489
500 7631
600 34
700 115
1000 3099
1200 1766
1600 63
2000 1538
2200 41
2500 208
2700 2138
5000 515
5500 201
8800 10
10000 10
10900 465
13000 9
16200 74
20000 518
21500 65
27000 64
53000 82
56000 1
106000 35
530000 3
I'm guessing that the answer involves binning the less common results into larger groups somehow (53000, 56000, 106000, and 53000 into one group of >50000, etc.), and also changing the y index to represent percentages of the occurrence rather than the absolute number. However, I don't understand how I would go about doing that automatically.