You can also use the parDist
function of the parallelDist package, which is specifically built for parallelized distance matrix computations. Advantages are that the package is available on Mac OS, Windows and Linux and already supports 39 different distance measures (see parDist).
Performance comparison for manhattan distance (Sys spec: Mac OS; Intel Core i7 with 4 cores @ 2,5 GHz and hyperthreading enabled):
library(parallelDist)
library(amap)
library(wordspace)
library(microbenchmark)
set.seed(123)
x <- matrix(rnorm(2000 * 100), nrow = 2000, ncol = 100)
microbenchmark(parDist(x, method = "manhattan"),
Dist(x, method = "manhattan", nbproc = 8),
dist.matrix(x, method = "manhattan"),
times = 10)
Unit: milliseconds
expr min lq mean median uq max neval
parDist(x, method = "manhattan") 210.9478 214.3557 225.5894 221.3705 237.9829 247.0844 10
Dist(x, method = "manhattan", nbproc = 8) 749.9397 755.7351 797.6349 812.6109 824.4075 844.1090 10
dist.matrix(x, method = "manhattan") 256.0831 263.3273 279.0864 275.1882 296.3256 311.3821 10
With a larger matrix:
x <- matrix(rnorm(10000 * 100), nrow = 10000, ncol = 100)
microbenchmark(parDist(x, method = "manhattan"),
+ Dist(x, method = "manhattan", nbproc = 8),
+ dist.matrix(x, method = "manhattan"),
+ times = 10)
Unit: seconds
expr min lq mean median uq max neval
parDist(x, method = "manhattan") 6.298234 6.388501 6.737168 6.894203 6.947981 7.221661 10
Dist(x, method = "manhattan", nbproc = 8) 22.722947 24.113681 24.326157 24.477034 24.658145 25.301353 10
dist.matrix(x, method = "manhattan") 7.156861 7.505229 7.544352 7.567980 7.655624 7.800530 10
Further performance comparisons can be found in parallelDist
's vignette.
featureVector
? – Adis