How to change the point style in a vaex interactive Jupyter bqplot plot_widget to make individual points larger and visible?

I am evaluating vaex for an interactive outlier selection use case described at: Large plot: ~20 million samples, gigabytes of data

Basically, I have some individual points which are outliers, and I want to see them on a graph so that I can manually select them and then examine them further.

The problem is that individual points become invisible if the rest of the dataset is too large.

How can I make such individual points visible?

For example, if I generate a dataset with 1 billion points and one outlier on the center top:

import h5py
import numpy

size = 1000000000

with h5py.File('1b.hdf5', 'w') as f:
    # Indices 0..size-1 lie on a straight line; index `size` holds the outlier.
    x = numpy.arange(size + 1)
    x[size] = size // 2  # outlier x: horizontal center
    f.create_dataset('x', data=x, dtype='int64')
    y = numpy.arange(size + 1) * 2
    y[size] = 3 * size // 2  # outlier y: above the line, near the top
    f.create_dataset('y', data=y, dtype='int64')
    z = numpy.arange(size + 1) * 4
    z[size] = -1
    f.create_dataset('z', data=z, dtype='int64')

and then display it on a Jupyter notebook with:

import vaex
df = vaex.open('1b.hdf5')
df.plot_widget(df.x, df.y, backend='bqplot')

I get this on Jupyter:

so I can't see the outlier which should be at the center top.

I can however select it since I know where it is, and it does show on selection=True methods. It is just not getting displayed.

There are some examples at: https://vaex.readthedocs.io/en/latest/tutorial.html#Smaller-datasets-/-scatter-plot in which the points are clearly visible, but when I tried adding the extra arguments c="red", alpha=0.5, s=4 to plot_widget, it did not work. Presumably this backend does not support them.

Maybe there is a way to configure bqplot to change its plotting style?

Tested on vaex 2.0.2.

This could be a layout issue with the widgets. Looking at the top, I see it seems clipped. However, if you zoom out, you should be able to see it.

Assuming you did, you probably will not see much, since vaex's plot_widget does not plot symbols (otherwise it would not be able to show 1 billion points). Instead, it shows a heatmap.
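To illustrate why a single point disappears in a heatmap, here is a minimal sketch that uses plain NumPy binning as a stand-in for what plot_widget does. The dataset is scaled down to 1 million points so it fits in memory, and the 256x256 grid and the 8-bit color scale are assumptions made for illustration, not vaex's documented internals.

```python
import numpy as np

# Scaled-down version of the dataset from the question (assumption: 1e6
# points instead of 1e9 so the example runs quickly).
size = 1_000_000
x = np.arange(size + 1, dtype=np.int64)
y = x * 2
x[size] = size // 2       # outlier x: horizontal center
y[size] = 3 * size // 2   # outlier y: above the line

# Bin all points on a regular grid and count the points per bin.
shape = 256  # illustrative grid resolution
counts, xedges, yedges = np.histogram2d(x, y, bins=shape)

# Locate the outlier's bin by binning the outlier alone on the same edges.
only_outlier, _, _ = np.histogram2d([x[size]], [y[size]], bins=[xedges, yedges])
ix, iy = np.argwhere(only_outlier == 1)[0]

# On a linear color scale the outlier bin is tiny compared to the densest bin.
outlier_count = counts[ix, iy]
relative = outlier_count / counts.max()
color_level = int(round(relative * 255))  # level on a hypothetical 8-bit scale

print("outlier bin count:", outlier_count)
print("densest bin count:", counts.max())
print("8-bit color level of the outlier bin:", color_level)
```

The outlier's bin holds a single count while the bins along the line hold thousands, so on a linear color scale it rounds to the same level as an empty bin. With 1 billion points the ratio is far more extreme.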

If you want to see low-density regions, you may want to use a log scale (pass f='log', or open the drawer on the left and select it). On a log scale, the empty regions will be -inf, which will be shown as transparent. So you should be able to see the 'outlier pixels' more easily now, especially if you decrease the resolution (pass shape=128).

With those two options:

df.plot_widget(df.x, df.y, f='log', shape=128, backend='bqplot')

the output looks like this:

and the outlier point becomes clearly visible at the center top.
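The effect of the two options can be sketched with plain NumPy, again on a scaled-down dataset of 1 million points. The helper log_heatmap is hypothetical and only mimics the idea of taking the log of per-bin counts. It is not vaex's implementation.

```python
import numpy as np

def log_heatmap(x, y, shape):
    """Hypothetical helper: log of per-bin counts on a shape x shape grid."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=shape)
    # log(0) is -inf; silence the warning because that is the intended value.
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    return log_counts, xedges, yedges

# Scaled-down version of the dataset from the question (assumption).
size = 1_000_000
x = np.arange(size + 1, dtype=np.int64)
y = x * 2
x[size] = size // 2
y[size] = 3 * size // 2

results = {}
for shape in (256, 128):
    log_counts, xedges, yedges = log_heatmap(x, y, shape)
    # Index of the bin containing the outlier along each axis.
    ix = np.searchsorted(xedges, x[size], side="right") - 1
    iy = np.searchsorted(yedges, y[size], side="right") - 1
    results[shape] = {
        # A finite value means the bin gets a color instead of being transparent.
        "outlier_visible": bool(np.isfinite(log_counts[ix, iy])),
        # Empty bins are -inf and would be drawn as transparent.
        "transparent_bins": int(np.isneginf(log_counts).sum()),
        # Fraction of the plot area covered by the outlier's bin.
        "area_fraction": 1.0 / shape**2,
    }
    print(shape, results[shape])
```

After the log transform, the outlier's bin has a finite value while empty bins are -inf, so it stands out against a transparent background. Halving the resolution from 256 to 128 makes each bin cover four times as much of the plot area.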
