Consider using Locality-Sensitive Hashing (LSH), a general technique that applies to certain distance metrics, including Hamming distance. Excerpt from Wikipedia:
> LSH hashes input items so that similar items map to the same “buckets” with high probability (the number of buckets being much smaller than the universe of possible input items).
In short, you can use LSH to obtain buckets, brute-force the Hamming distances within each bucket, and output the smallest distance found. To obtain the right answer with higher probability, you can tweak the parameters of the LSH scheme and/or run it multiple times (to get different allocations of items to buckets). I believe you can get arbitrarily close to the correct (optimal) answer with a failure rate that decreases exponentially in runtime. (You might have to binary search over the LSH parameters if your Hamming distances are all very close, but you'll still avoid computing all n^2 Hamming distances.)
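To make the bucketing idea concrete, here is a minimal sketch of bit-sampling LSH (the classic LSH family for Hamming distance): hash each string by `k` randomly chosen bit positions, so strings agreeing on all sampled bits share a bucket, then brute-force within buckets over several independent tables. The function name and parameter defaults are illustrative, not tuned:

```python
import random
from itertools import combinations

def lsh_min_hamming(strings, n_bits, k=2, n_tables=20, seed=0):
    """Approximate the minimum pairwise Hamming distance via bit-sampling LSH.

    strings: binary strings, all of length n_bits; requires k <= n_bits.
    Returns the smallest distance found, or None if no bucket ever held
    two strings. May miss the true minimum; raise n_tables (or lower k)
    to drive the failure probability down.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_tables):
        positions = rng.sample(range(n_bits), k)  # this table's sampled bits
        buckets = {}
        for s in strings:
            key = tuple(s[p] for p in positions)
            buckets.setdefault(key, []).append(s)
        for group in buckets.values():            # brute force inside buckets
            for a, b in combinations(group, 2):
                d = sum(x != y for x, y in zip(a, b))
                if best is None or d < best:
                    best = d
    return best
```

Each close pair collides in a given table unless a sampled position happens to hit one of their differing bits, so repeating with fresh random positions is what drives the failure rate down, matching the "run LSH multiple times" point above.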
The algorithm and analysis are fairly involved, so I don't think I can write a complete summary here at the moment (it's about 2-3 hours' worth of lecture material). I recommend taking a look at the lecture notes/slides here, here, and here; they all cover LSH (in varying degrees of detail) with some mention of Hamming distance.
The triangle inequality d(a,c) ≤ d(a,b) + d(b,c) holds, and that could certainly be used to avoid testing every pair. – Dysarthrian
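One way to cash in the triangle inequality mentioned above: distances to a fixed pivot give the lower bound d(a,c) ≥ |d(a,pivot) − d(c,pivot)|, so after sorting by pivot distance, inner loops can stop once that bound reaches the current best. A minimal sketch (the pivot choice and helper names are illustrative; worst case is still quadratic):

```python
def min_hamming_with_pruning(strings):
    """Exact minimum pairwise Hamming distance with pivot-based pruning."""
    def d(a, b):
        return sum(x != y for x, y in zip(a, b))

    pivot = strings[0]
    ranked = sorted(strings, key=lambda s: d(s, pivot))
    pd = [d(s, pivot) for s in ranked]          # distances to the pivot
    best = float("inf")
    for i in range(len(ranked)):
        for j in range(i + 1, len(ranked)):
            if pd[j] - pd[i] >= best:           # lower bound already too large;
                break                           # later j only get worse (sorted)
            best = min(best, d(ranked[i], ranked[j]))
    return best
```

Unlike the LSH approach, this is exact, but how much it prunes depends on how spread out the pivot distances are.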
n binary strings have (n-1)*n/2 pairwise Hamming distances, since the Hamming distance is defined between two strings. Efficient nearest-neighbor search in such a Hamming space is not trivial; my Google searches at least have not been fruitful so far. – Rattleheadm
) use a lookup-table to determine the distance between two strings. This leads to complexity O(n log n). – Rattlehead
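A common form of the lookup-table idea from the last comment: precompute the popcount of every byte once, then the distance between two bit strings (packed into integers) is the table-summed popcount of their XOR. A minimal sketch; the names are illustrative:

```python
# Popcount of every possible byte, computed once up front.
POPCOUNT = [bin(b).count("1") for b in range(256)]

def hamming(a: int, b: int) -> int:
    """Hamming distance between two equal-length bit strings packed as ints."""
    x = a ^ b                    # differing bits are exactly the set bits of the XOR
    d = 0
    while x:
        d += POPCOUNT[x & 0xFF]  # look up one byte at a time
        x >>= 8
    return d
```

For example, `hamming(0b0101, 0b0111)` returns 1. This makes each pairwise distance cheap, but on its own it does not reduce the number of pairs examined.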