plotly: huge number of datapoints
4

11

I am trying to plot a huge number of data points (2-3 million) using plotly.

When I run

py.iplot(fig, filename='test plot')

I get the following error:

Woah there! Look at all those points! Due to browser limitations, the Plotly SVG drawing functions have a hard time graphing more than 500k data points for line charts, or 40k points for other types of charts. Here are some suggestions:
(1) Use the `plotly.graph_objs.Scattergl` trace object to generate a WebGl graph.
(2) Trying using the image API to return an image instead of a graph URL
(3) Use matplotlib
(4) See if you can create your visualization with fewer data points

If the visualization you're using aggregates points (e.g., box plot, histogram, etc.) you can disregard this warning.

So then I try to save it with this:

py.image.save_as(fig, 'my_plot.png')

But then I get this error:

PlotlyRequestError: Unknown Image Server Error

How do I do this properly? I don't care if it's a still image or an interactive display within my notebook.

Teeth answered 5/9, 2017 at 20:16 Comment(2)
What kind of plot are you generating? For scatter plots, try using scattergl. - Bottali
Right now, I'm trying to do a density plot. - Teeth
12

Plotly seems to handle this very poorly. I am just trying to create a boxplot with 5 million points, which is no problem with the simple R function "boxplot", but plotly computes endlessly.

Improving this should be a major priority. Not all of the data has to be stored (and shown) in the plotly object; I guess this is the main problem.
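
A minimal sketch of one possible workaround, under stated assumptions: compute the box statistics yourself and hand plotly only those numbers through the precomputed-quartile arguments of `go.Box` (`q1`, `median`, `q3`, `lowerfence`, `upperfence`), so the raw points never reach the browser. The helper names `box_summary` and `make_box_figure`, the 1.5 x IQR whisker rule, and the random stand-in data are illustrative choices, not part of the answer above. Outliers are not drawn in this version.

```python
import random
import statistics


def box_summary(values):
    """Return the five numbers a box plot needs (hypothetical helper)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_limit, hi_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme points still inside the 1.5*IQR limits.
    lowerfence = min(v for v in values if v >= lo_limit)
    upperfence = max(v for v in values if v <= hi_limit)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lowerfence": lowerfence,
        "upperfence": upperfence,
    }


def make_box_figure(summary, name="sample"):
    """Build a box trace from the precomputed statistics only."""
    import plotly.graph_objects as go  # lazy import: only needed for plotting

    trace = go.Box(
        name=name,
        q1=[summary["q1"]],
        median=[summary["median"]],
        q3=[summary["q3"]],
        lowerfence=[summary["lowerfence"]],
        upperfence=[summary["upperfence"]],
    )
    return go.Figure(trace)


# Stand-in data; a real dataset with millions of rows would go here.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200_000)]
summary = box_summary(data)
```

Calling `make_box_figure(summary).show()` should then render a single box from five numbers, regardless of how many points went into the summary.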

Potence answered 11/5, 2020 at 10:9 Comment(2)
Hi @PhillipPro! I'm trying to plot 100 million coordinates using the plotly mapbox function density_mapbox, which generates a heatmap. But due to the large number of coordinates, I'm not able to do so. Do you have any suggestions? - Washcloth
@Lostman Did you try downsampling, so that your dataset is still representative but contains fewer points? Try to get a representative subsample of your dataset. See also Petronella's answer above. - Potence
8

One option would be down-sampling your data, though I'm not sure that is acceptable for you: https://github.com/devoxi/lttb-py

I also have problems with plotly in the browser with large datasets - if anyone has solutions, please write! Thank you!
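
For reference, here is a rough pure-Python sketch of the Largest-Triangle-Three-Buckets (LTTB) idea behind the linked project. It is written from the general description of the algorithm, not copied from that repository, and the function name `lttb` and its signature are assumptions. It expects `xs` to be sorted ascending, so it suits time series and similar ordered data rather than arbitrary point clouds.

```python
import math


def lttb(xs, ys, n_out):
    """Downsample (xs, ys) to n_out points with an LTTB-style selection.

    Assumes xs is sorted ascending (e.g. timestamps).
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n_out >= n or n_out < 3:
        return list(xs), list(ys)  # nothing sensible to reduce

    every = (n - 2) / (n_out - 2)  # average bucket width, endpoints excluded
    out_x, out_y = [xs[0]], [ys[0]]  # always keep the first point
    a = 0  # index of the most recently selected point

    for i in range(n_out - 2):
        # The average of the *next* bucket is the third triangle corner.
        nxt_start = int((i + 1) * every) + 1
        nxt_end = min(int((i + 2) * every) + 1, n)
        count = nxt_end - nxt_start
        avg_x = sum(xs[nxt_start:nxt_end]) / count
        avg_y = sum(ys[nxt_start:nxt_end]) / count

        # Keep the point of the current bucket that spans the largest triangle
        # (twice the area is enough for comparison).
        cur_start = int(i * every) + 1
        cur_end = int((i + 1) * every) + 1
        best_area, best_idx = -1.0, cur_start
        for j in range(cur_start, cur_end):
            area = abs((xs[a] - avg_x) * (ys[j] - ys[a])
                       - (xs[a] - xs[j]) * (avg_y - ys[a]))
            if area > best_area:
                best_area, best_idx = area, j

        out_x.append(xs[best_idx])
        out_y.append(ys[best_idx])
        a = best_idx

    out_x.append(xs[-1])  # always keep the last point
    out_y.append(ys[-1])
    return out_x, out_y


# Stand-in series: a sine wave with one sharp spike.
xs = [float(i) for i in range(10_000)]
ys = [math.sin(x / 200.0) for x in xs]
ys[5_000] = 100.0
small_x, small_y = lttb(xs, ys, 500)
```

The reduced lists can then be passed to a plotly trace in place of the full series. Because each bucket keeps its most visually significant point, isolated peaks such as the spike above tend to survive the reduction.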

Scaramouch answered 18/9, 2017 at 9:50 Comment(3)
There's an open issue to integrate LTTB support into plotly, though it doesn't seem like a priority: github.com/plotly/plotly.js/issues/560 - Retrad
This is viable ONLY for linear data such as time series. If you have another kind of data, it might give a very erroneous result! - Scaramouch
Solution to the problem: #71643295 - Washcloth
0

You can try the render_mode argument. Example:

import plotly.express as px
import pandas as pd
import numpy as np

N = int(1e6) # Number of points

df = pd.DataFrame(dict(x=np.random.randn(N),
                       y=np.random.randn(N)))

fig = px.scatter(df, x="x", y="y", render_mode='webgl')
fig.update_traces(marker_line=dict(width=1, color='DarkSlateGray'))
fig.show()

On my computer, N=1e6 takes about 5 seconds until the plot is visible, and the interactivity is still very good. With N=10e6 (10 million points) it takes about 1 minute and the plot is no longer responsive (i.e. zooming, panning, and everything else are really slow).

Paramagnet answered 17/12, 2021 at 10:9 Comment(2)
In my case, it responds faster without using render_mode. - Fecundate
It doesn't work for me. - Dedie
0

Use the WebGL render mode. I had a chart with ~500k points, which was very slow in the browser when rendered as SVG. After switching to WebGL, it works like a charm.

You can find some examples of how to use WebGL in plotly here:

https://plotly.com/python/webgl-vs-svg/
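
As a minimal sketch of what that looks like with graph objects (assuming a plain scatter of markers; the helper names `make_points` and `build_webgl_figure`, the marker settings, and the random stand-in data are illustrative, not taken from the linked page), the SVG trace `go.Scatter` is swapped for its WebGL counterpart `go.Scattergl`:

```python
import random


def make_points(n, seed=0):
    """Generate n Gaussian (x, y) pairs as stand-in data."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return xs, ys


def build_webgl_figure(xs, ys):
    """Return a scatter figure drawn with WebGL instead of SVG."""
    import plotly.graph_objects as go  # lazy import: only needed for plotting

    trace = go.Scattergl(
        x=xs,
        y=ys,
        mode="markers",
        marker=dict(size=3, opacity=0.5),  # small, translucent markers
    )
    return go.Figure(trace)


# Roughly the size mentioned above; lists or NumPy arrays both work as input.
xs, ys = make_points(500_000)
```

Calling `build_webgl_figure(xs, ys).show()` in a notebook should render the chart through WebGL. How responsive it stays will depend on the browser and GPU.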

Suspender answered 29/2 at 11:33 Comment(1)
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review - Amalgamation
