Using transparency/alpha, as suggested in this answer, can be very helpful for cases where crowding is a problem. However, if you have multiple data classes, you can still have issues with the later-plotted classes obscuring the earlier ones, especially if some classes have more data points than others.
This answer uses a heatmap to show density, which is great when you want to show the density for a single class, but not straightforward to adapt to the case where you have multiple overlapping classes and want all to be visible.
One approach I've sometimes found helpful in this situation is to randomize plot order, instead of plotting classes sequentially. This can be combined with transparency.
For instance, modifying the example given in tom10's answer:
import numpy as np
import matplotlib.pyplot as plt
N0 = 2000
x0 = np.random.normal(0,2,N0)
y0 = np.random.normal(0,0.2,N0) + 0.25/(x0**2+0.25)
plt.scatter(x0, y0, s=70, alpha=0.03,c="r")
N1 = 10000
mean = [0, 0]
cov = [[2, 2], [0, 2]]
x1,y1 = np.random.multivariate_normal(mean, cov, N1).T
plt.scatter(x1, y1, s=70, alpha=0.03,c="b")
plt.ylim((-5, 5))
plt.xlim((-5, 5))
plt.show()
results in:
Glancing at this plot, the red points appear to be distributed close to the line y = 0.
But if we randomize plot order:
x = np.concatenate((x0,x1))
y = np.concatenate((y0,y1))
cols = np.concatenate((np.tile("r",len(x0)),np.tile("b",len(x1))))
rng = np.random.default_rng()
neworder=rng.permutation(len(x))
x_shuffled = x[neworder]
y_shuffled = y[neworder]
cols_shuffled = cols[neworder]
plt.ylim((-5, 5))
plt.xlim((-5, 5))
plt.show()
we get this:
It's now much easier to see that near x = 0, the red points deviate significantly from the y=0 relationship that we'd have guessed when we could only see the edges of that distribution.
We could achieve similar results by binning the points (e.g. on a hex or square grid) and then setting colour and alpha for each bin according to the class distributions and number of data points for each bin. But I'm fond of the random-order approach because it's lower-tech and reduces the number of methods I need to remember.