Problem
Suppose you have N (~100k-1m) integers/bitstrings each K (e.g. 256) bits long. The algorithm should return the k pairs with the lowest pairwise Hamming distance.
Example
N = 4
K = 8
i1 = 00010011
i2 = 01010101
i3 = 11000000
i4 = 11000011
HammingDistance(i1,i2) = 3
HammingDistance(i1,i3) = 5
HammingDistance(i1,i4) = 3
HammingDistance(i2,i3) = 4
HammingDistance(i2,i4) = 4
HammingDistance(i3,i4) = 2
For k=1 it should return the pairlist {(i3,i4)}. For k=3 it should return {(i1,i2), (i1,i4), (i3,i4)}. And so on.
Algorithm
The naive implementation computes all pairwise distances, sorts the pairs and returns the k with the lowest distance: O(N^2). Are there any better data structures or algorithms? It looks like the ideas from Efficiently find binary strings with low Hamming distance in large set can not be used since there is no single query integer.