I'm visualizing scatterplots with between 400K and 2.5M points. I expected to need to downsample before visualizing, but to see just how much, I ran a pilot test with a 400K-point dataset in plotly express, and the plot popped up quickly, beautifully, and responsively.
To build the interactive figure I actually need, I have to use plotly.graph_objects, since I need multiple traces with different colorscales. So I made basically the same graph with graph_objects, and it wasn't just slower: it crashed my computer.
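To show the kind of multi-trace figure I'm after, here is a minimal sketch with small random placeholder arrays (my real arrays are far larger):

import numpy as np
import plotly.graph_objects as go

# two stand-in point clouds, one per community
xa, ya = np.random.randn(2, 1000)
xb, yb = np.random.randn(2, 1000) + 3

fig = go.Figure()
# each trace gets its own colorscale, which a single px.scatter call ties to one coloraxis
fig.add_trace(go.Scatter(x=xa, y=ya, mode='markers',
                         marker={'color': ya, 'colorscale': 'Viridis'}))
fig.add_trace(go.Scatter(x=xb, y=yb, mode='markers',
                         marker={'color': yb, 'colorscale': 'Plasma'}))
fig.show()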
I'd really like to downsample as little as possible, and I'm surprised by the sheer performance difference between these two approaches, so I guess that boils down to my question:
Why is there such a performance difference, and is it possible to change layout/figure/whatever parameters in graph_objects so as to close the gap?
Here are snippets to show what I mean by "basically the same graph":
graph_objects
import plotly.graph_objects as go

fig = go.Figure()
fig.add_trace(go.Scatter(
    x=x_values, y=y_values,
    mode='markers',  # without this, go.Scatter defaults to lines for large traces
    opacity=opacity,
    marker={'size': size, 'color': community, 'colorscale': colorscale}
))
express
import plotly.express as px

pacmap_map = px.scatter(x=x_values, y=y_values, color=community,
                        color_continuous_scale=colorscale, opacity=opacity)
pacmap_map.update_traces(marker={'size': size})
I would have expected performance to be identical, or at least in the same ballpark, but express works like a dream while graph_objects crashes the Jupyter kernel and whatever IDE it's running from, so it's a large difference.
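For completeness, here is a self-contained version of the comparison with random stand-in data (the names, sizes, and values are placeholders for my real arrays); the graph_objects figure at the end is the pattern that crashes with my data:

import numpy as np
import plotly.express as px
import plotly.graph_objects as go

n = 400_000
x_values = np.random.randn(n)
y_values = np.random.randn(n)
community = np.random.randint(0, 50, n)  # stand-in for my community labels
colorscale = 'Viridis'
size = 2
opacity = 0.5

# express path: renders quickly and stays responsive
pacmap_map = px.scatter(x=x_values, y=y_values, color=community,
                        color_continuous_scale=colorscale, opacity=opacity)
pacmap_map.update_traces(marker={'size': size})
pacmap_map.show()

# graph_objects path: this is the one that kills the kernel for me
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_values, y=y_values, mode='markers', opacity=opacity,
                         marker={'size': size, 'color': community,
                                 'colorscale': colorscale}))
fig.show()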