You can always implement a custom binary search to find the closest value. Alternatively, you can leverage the standard libc bsearch(). Other binary search implementations work as well, but either way you have to write the comparison function carefully to find the closest element in the array. The issue with a standard binary search is that it is meant for exact matches. That means your improvised comparison function has to decide for itself whether an element is the closest one and, if so, report it as an exact match. To do that, it needs to be aware of the other elements in the array, in particular:
- the position of the current element (the one being compared with the key).
- its distance from the key, and how that compares with the distance of its neighbors (previous or next element).
To give the comparison function this extra knowledge, the key has to be packaged with additional information (not just the key value). This works because bsearch() passes the comparison function a pointer to an actual array element, so the function can reach that element's neighbors. Once the comparison function is aware of these aspects, it can work out whether the element itself is the closest. When it is, the function returns a "match" (0).
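To illustrate the exact-match limitation mentioned above, here is a minimal sketch of bsearch() with a conventional comparison function. The names cmp_int and exact_lookup are made up for this illustration and are not part of any library.

```c
#include <stdlib.h>

/* Conventional exact-match comparison for ints (hypothetical helper). */
static int cmp_int(const void *k, const void *e) {
    int a = *(const int *)k;
    int b = *(const int *)e;
    return (a > b) - (a < b);   /* -1, 0 or 1 without overflow */
}

/* Hypothetical wrapper: returns a pointer to an element equal to key in
 * the sorted array arr[0..n-1], or NULL when no element is equal.
 */
static const int *exact_lookup(const int *arr, size_t n, int key) {
    return bsearch(&key, arr, n, sizeof(arr[0]), cmp_int);
}
```

With the array {10, 20, 30, 40, 50, 60, 70}, exact_lookup(arr, 7, 40) returns a pointer to 40, while exact_lookup(arr, 7, 44) returns NULL even though 40 is only 4 away. That is the gap the neighbor-aware comparison function has to close.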
The following C code finds the closest value.
#include <stdio.h>
#include <stdlib.h>

struct key {
    int key_val;      /* value being searched for */
    int *array_head;  /* first element of the searched array */
    int array_size;   /* number of elements in the array */
};

/* Comparison function for bsearch(): returns 0 when *e is the element
 * closest to the key, <0 to continue left, >0 to continue right.
 * Assumes the differences between values fit in an int.
 */
int compar(const void *k, const void *e) {
    const struct key *key = (const struct key*)k;
    const int *elem = (const int*)e;
    const int *arr_first = key->array_head;
    const int *arr_last = key->array_head + key->array_size - 1;
    int kv = key->key_val;
    int dist_left;
    int dist_right;

    if (kv == *elem) {
        /* easy case: if both same, got to be closest */
        return 0;
    } else if (key->array_size == 1) {
        /* easy case: only element got to be closest */
        return 0;
    } else if (elem == arr_first) {
        /* element is the first in array */
        if (kv < *elem) {
            /* if keyval is less than the first element then
             * first elem is closest.
             */
            return 0;
        } else {
            /* check distance between first and 2nd elem.
             * if distance with first elem is smaller, it is closest.
             */
            dist_left = kv - *elem;
            dist_right = *(elem + 1) - kv;
            return (dist_left <= dist_right) ? 0 : 1;
        }
    } else if (elem == arr_last) {
        /* element is the last in array */
        if (kv > *elem) {
            /* if keyval is larger than the last element then
             * last elem is closest.
             */
            return 0;
        } else {
            /* check distance between last and last-but-one.
             * if distance with last elem is smaller, it is closest.
             */
            dist_left = kv - *(elem - 1);
            dist_right = *elem - kv;
            return (dist_right <= dist_left) ? 0 : -1;
        }
    }

    /* condition for remaining cases (other cases are handled already):
     * - elem is neither first nor last in the array
     * - array has at least three elements.
     */
    if (kv < *elem) {
        /* keyval is smaller than elem */
        if (kv <= *(elem - 1)) {
            /* keyval is smaller than previous (of "elem") too.
             * hence, elem cannot be closest.
             */
            return -1;
        } else {
            /* check distance between elem and elem-prev.
             * if distance with elem is smaller, it is closest.
             */
            dist_left = kv - *(elem - 1);
            dist_right = *elem - kv;
            return (dist_right <= dist_left) ? 0 : -1;
        }
    }

    /* remaining case: (keyval > *elem) */
    if (kv >= *(elem + 1)) {
        /* keyval is larger than next (of "elem") too.
         * hence, elem cannot be closest.
         */
        return 1;
    }

    /* check distance between elem and elem-next.
     * if distance with elem is smaller, it is closest.
     */
    dist_right = *(elem + 1) - kv;
    dist_left = kv - *elem;
    return (dist_left <= dist_right) ? 0 : 1;
}

int main(int argc, char **argv) {
    int arr[] = {10, 20, 30, 40, 50, 60, 70};
    int *found;
    struct key k;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <integer>\n", argv[0]);
        return 1;
    }

    k.key_val = atoi(argv[1]);
    k.array_head = arr;
    k.array_size = sizeof(arr) / sizeof(arr[0]);

    found = (int*)bsearch(&k, arr, sizeof(arr) / sizeof(arr[0]), sizeof(arr[0]),
                          compar);
    if (found) {
        printf("found closest: %d\n", *found);
    } else {
        printf("closest not found. absurd!\n");
    }
    return 0;
}
Needless to say, bsearch() in the above example should never fail (unless the array size is zero).
If you implement your own binary search, you essentially have to embed the same comparison logic in the main body of the search (instead of putting it in the comparison function, as in the example above).
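As a rough sketch of the hand-written route, the function below (the name closest is made up for this illustration) does not replicate compar() line by line. It first runs a lower-bound binary search for the first element not smaller than the key, then compares the two neighbors around that position. It assumes the array is sorted in ascending order and, unlike the bsearch() version, always prefers the smaller element on a tie.

```c
#include <stddef.h>

/* Returns a pointer to the element of the ascending-sorted array
 * arr[0..n-1] that is closest to key, or NULL if the array is empty.
 * On a tie, the smaller (left) element is returned.
 */
const int *closest(const int *arr, size_t n, int key) {
    size_t lo = 0;
    size_t hi = n;  /* search window is [lo, hi) */

    if (n == 0) {
        return NULL;
    }

    /* lower bound: find the first index whose element is >= key */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (arr[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }

    if (lo == 0) {
        return &arr[0];      /* key is <= every element */
    }
    if (lo == n) {
        return &arr[n - 1];  /* key is > every element */
    }

    /* here arr[lo-1] < key <= arr[lo]; pick the nearer neighbor.
     * long long keeps the subtractions from overflowing int.
     */
    if ((long long)key - arr[lo - 1] <= (long long)arr[lo] - key) {
        return &arr[lo - 1];
    }
    return &arr[lo];
}
```

For the array {10, 20, 30, 40, 50, 60, 70}, closest(arr, 7, 44) points to 40 and closest(arr, 7, 46) points to 50. Keeping the neighbor comparison outside the loop means the loop itself stays an ordinary binary search, and the empty-array case is handled explicitly.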
fuzzyjoin could be helpful for setting explicit criteria for inexact matches and finding the match that has the best score. – Gora