I have a custom distance metric that I need to use for KNN (K Nearest Neighbors).
I tried following this, but I cannot get it to work for some reason.
I would assume that the distance metric is supposed to take two vectors/arrays of the same length, as I have written below:
import sklearn
from sklearn.neighbors import NearestNeighbors
import numpy as np
import pandas as pd
def d(a,b,L):
    # Inputs: a and b are rows from a data matrix
    return a+b+2+L
knn=NearestNeighbors(n_neighbors=1,
                     algorithm='auto',
                     metric='pyfunc',
                     func=lambda a,b: d(a,b,L)
                     )
X=pd.DataFrame({'b':[0,3,2],'c':[1.0,4.3,2.2]})
knn.fit(X)
However, when I call knn.kneighbors(), it doesn't seem to like the custom function. Here is the bottom of the error stack:
ValueError: Unknown metric pyfunc. Valid metrics are ['euclidean', 'l2', 'l1', 'manhattan', 'cityblock', 'braycurtis', 'canberra', 'chebyshev', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski'], or 'precomputed', or a callable
However, I see exactly the same thing in the question I cited. Any ideas on how to make this work on sklearn version 0.14? I'm not aware of any relevant differences between the versions.
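For reference, the error message lists "a callable" among the valid metrics, so one thing that may be worth trying is passing the function directly as metric instead of 'pyfunc'. Below is a minimal sketch of that idea, assuming a reasonably recent sklearn in which metric accepts a callable that takes two 1-D arrays and returns a single number. The body of d (a Manhattan distance scaled by L), the value of L, and the query point are placeholders made up for illustration, not my real metric:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def d(a, b, L):
    # a and b are 1-D rows of the data matrix; the result must be a scalar.
    # Placeholder metric (assumption): Manhattan distance scaled by L.
    return L * np.sum(np.abs(a - b))

L = 2.0  # hypothetical value, chosen only for this sketch

# Same numbers as the DataFrame in the question, as a plain float array.
X = np.array([[0, 1.0], [3, 4.3], [2, 2.2]])

# Pass the callable itself as the metric instead of metric='pyfunc' + func=...
knn = NearestNeighbors(n_neighbors=1,
                       algorithm='auto',
                       metric=lambda a, b: d(a, b, L))
knn.fit(X)

# Hypothetical query point; returns the distance to, and the row index of,
# the nearest row of X.
distances, indices = knn.kneighbors(np.array([[2.0, 2.0]]))
print(distances, indices)
```

Note that the function has to return a scalar, whereas the d in my code above returns an array (a+b+2+L), which may be a separate problem.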
Thanks.