You can use the `geomloss` or `dcor` packages for more general implementations of the Wasserstein and energy distances, respectively. `geomloss` also provides a range of other distances, such as the Hausdorff, energy, Gaussian, and Laplacian distances. It also selects different backends depending on the volume of the input data; by default, a tensor framework based on PyTorch is used. `dcor` relies primarily on `scipy.spatial.distance.pdist` and `scipy.spatial.distance.cdist` to calculate the energy distance.
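For reference, the energy distance can be written entirely in terms of pairwise Euclidean distances, which is essentially what those SciPy helpers provide. Here is a minimal sketch of that formula (my own code, not `dcor`'s actual implementation):

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

def energy_distance(x, y):
    """Energy distance between samples of shape (n, d) and (m, d):
    E(X, Y) = 2*E||X - Y|| - E||X - X'|| - E||Y - Y'||,
    with each expectation estimated from a pairwise distance matrix."""
    n, m = len(x), len(y)
    # mean cross-distance between the two samples
    xy = cdist(x, y).mean()
    # mean within-sample distances; pdist returns the condensed upper
    # triangle, so double the sum and divide by n**2 (zero diagonal included)
    xx = pdist(x).sum() * 2 / n**2
    yy = pdist(y).sum() * 2 / m**2
    return 2 * xy - xx - yy
```

Note that this returns the full energy distance, not the halved quantity that `geomloss` reports (more on that below).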
Here are a few examples of 1D, 2D, and 3D distance calculations:
```python
# create random 3D data for the test
import numpy as np
import torch

torch.random.manual_seed(0)
X = torch.rand((100, 3))
Y = torch.rand((100, 3))
```
Energy Distance
```python
# energy distance with geomloss
from geomloss import SamplesLoss

Loss = SamplesLoss("energy")

# 3D tensors
Loss(X, Y).item()
>>> 0.0063512325286865234

# 2D tensors
Loss(X[:, 0:2], Y[:, 0:2]).item()
>>> 0.005196928977966309

# 1D tensors
Loss(X[:, 0, np.newaxis], Y[:, 0, np.newaxis]).item()
>>> 0.004647582769393921
```
```python
# energy distance with dcor
import dcor

# 3D tensors
dcor.energy_distance(X, Y) / 2
>>> 0.006351688367976838

# 2D tensors
dcor.energy_distance(X[:, 0:2], Y[:, 0:2]) / 2
>>> 0.005197086538981377

# 1D tensors
dcor.energy_distance(X[:, 0, np.newaxis], Y[:, 0, np.newaxis]) / 2
>>> 0.004647575813531868
```
As you might have noticed, I divided the `dcor` energy distance by two. That's because `geomloss` computes half of the energy distance, and I wanted the results from the two packages to be comparable.
You can also look at my implementation of the energy distance, which is compatible with different input dimensions. It is written using Numba, which parallelizes the computation and takes advantage of available hardware acceleration; in principle it should also be possible to run it on a GPU, but I haven't tried that. It does perform slower than the `dcor` implementation, though.
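As a rough illustration of that approach (a sketch in the same spirit, not my actual code), a Numba-parallelized energy distance could look like the following. `njit` and `prange` are real Numba APIs; the `try`/`except` fallback just keeps the sketch runnable when Numba is not installed:

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fall back to plain Python if Numba is absent
    prange = range

    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

@njit(parallel=True)
def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all row pairs of a (n, d) and b (m, d);
    the outer loop is parallelized across threads by Numba."""
    total = 0.0
    for i in prange(a.shape[0]):
        for j in range(b.shape[0]):
            sq = 0.0
            for k in range(a.shape[1]):
                diff = a[i, k] - b[j, k]
                sq += diff * diff
            total += np.sqrt(sq)
    return total / (a.shape[0] * b.shape[0])

def energy_distance_numba(x, y):
    # E(X, Y) = 2*E||X - Y|| - E||X - X'|| - E||Y - Y'||
    return (2 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))
```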
Wasserstein Distance
Calculating the Wasserstein distance is a bit more involved and comes with more parameters. The Sinkhorn distance is a regularized version of the Wasserstein distance, and it is what the package uses to approximate the Wasserstein distance. It can also be seen as an interpolation between the Wasserstein and energy distances; more info in this paper. In principle, for values of `blur` near zero you would expect to get the Wasserstein distance, and for larger values the energy distance; but for some reason (I think due to implementation and numerical/precision issues) beyond some large values you get a negative value for the distance. Anyhow, if you are interested in the Wasserstein distance, here is an example:
```python
Loss = SamplesLoss("sinkhorn", blur=0.05)

Loss(X, Y).item()
>>> 0.01524302177131176

Loss(X[:, 0:2], Y[:, 0:2]).item()
>>> 0.005164701491594315

Loss(X[:, 0, np.newaxis], Y[:, 0, np.newaxis]).item()
>>> 0.0018786040600389242
```
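To make the "regularized" part concrete, here is a textbook Sinkhorn iteration for entropy-regularized optimal transport in plain NumPy (my own sketch, not what `geomloss` runs internally; `geomloss` works in the log domain and debiases the result, so the numbers won't match exactly):

```python
import numpy as np

def sinkhorn_cost(x, y, blur=0.05, n_iter=200):
    """Entropy-regularized OT cost between two uniformly weighted point
    clouds of shape (n, d) and (m, d). `blur` acts as the regularization
    strength epsilon; as it shrinks, the result approaches the
    unregularized Wasserstein cost. Naive version: exp(-C/blur) can
    underflow for very small blur or large costs."""
    n, m = len(x), len(y)
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
    K = np.exp(-C / blur)                               # Gibbs kernel
    a, b = np.full(n, 1 / n), np.full(m, 1 / m)         # uniform weights
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                             # alternating scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                     # transport plan
    return (P * C).sum()
```

For identical inputs the cost is essentially zero, and for two single points it is just the cost of moving one onto the other.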
Other than `blur`, I recommend looking into the other parameters of this method, such as `p`, `scaling`, and `debias`. Please note that the implementation of this method is a bit different from `scipy.stats.wasserstein_distance`, so you may want to look into the definitions in the documentation or code before doing any comparison between the two for the 1D case!
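For a quick sanity check in that 1D case: for two equally sized samples with uniform weights, the order-1 Wasserstein distance reduces to the mean absolute difference of the sorted samples (the helper name below is mine):

```python
import numpy as np

def wasserstein_1d(x, y):
    """W1 between two equally sized 1D samples with uniform weights:
    optimal matching pairs the i-th smallest of x with the i-th
    smallest of y."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    return np.abs(x - y).mean()
```

`scipy.stats.wasserstein_distance` agrees with this on equally sized uniform samples, whereas the `geomloss` Sinkhorn loss depends on `p`, `blur`, and `debias`, so its raw numbers will generally differ.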