I have a query that works well enough, but I want to sort its results by the Levenshtein distance between the query param and the field in question.
Right now I run the query in ES and then do the sorting in my application. I'm currently testing a script-based sort instead. This is the script:
import org.elasticsearch.common.logging.*;
ESLogger logger = ESLoggerFactory.getLogger('levenshtein_script');
def str1 = '%s'.split(' ').sort().join(' ');
def str2 = doc['%s'].values.join(' '); //Needed since the field is analyzed. This will change when I reindex the data.
def dist = new int[str1.size() + 1][str2.size() + 1]
(0..str1.size()).each { dist[it][0] = it }
(0..str2.size()).each { dist[0][it] = it }
(1..str1.size()).each { i ->
    (1..str2.size()).each { j ->
        dist[i][j] = [dist[i - 1][j] + 1, dist[i][j - 1] + 1, dist[i - 1][j - 1] + ((str1[i - 1] == str2[j - 1]) ? 0 : 1)].min()
    }
}
def result = dist[str1.size()][str2.size()]
logger.info('Query param: ['+str1+'] | Term: ['+str2+'] | Result: ['+result+']');
return result;
Basically this is a template (note the %s placeholders) that I fill in from my application like this:
sortScript = String.format(EDIT_DISTANCE_GROOVY_FUNC, fullname, FULLNAME_FIELD_NAME);
The problem is that scripts like this are costly, as explained in http://code972.com/blog/2015/03/84-elasticsearch-one-tip-a-day-avoid-costly-scripts-at-all-costs, which is understandable.
My question is: how can I do what I need (sort the results by Levenshtein distance) inside Elasticsearch, so I can avoid the overhead in my application? Can I use Lucene expressions for this? Do you have an example? Is there some other way to accomplish this?
I'm using Elasticsearch 1.7.5 as a service, so native plugins should not be the first solution. (I don't even know if they are possible; I'll have to check with my provider. If it's the only viable solution, I will do just that.)
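For comparison, the application-side sort described above could look roughly like this in Java. This is only an illustrative sketch: the class and method names (`LevenshteinSort`, `distance`, `sortByDistance`) are hypothetical, and it compares plain strings rather than real Elasticsearch hits. It uses the same dynamic-programming recurrence as the Groovy script, but keeps only two rows of the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; not part of Elasticsearch or of the original application.
final class LevenshteinSort {

    // Edit distance between a and b using two rolling rows of the DP table.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns a copy of hits ordered by ascending distance to the query.
    static List<String> sortByDistance(String query, List<String> hits) {
        List<String> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt(hit -> distance(query, hit)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("jon smith", "john smythe", "john smith");
        // prints [john smith, jon smith, john smythe]
        System.out.println(sortByDistance("john smith", hits));
    }
}
```

The comparator recomputes the distance on every comparison, which is acceptable for a small page of hits; for larger result sets the distances could be computed once per hit and cached before sorting.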
UPDATE
So it seems a good solution would be to save the script in the config/scripts folder, as it will then be compiled only once: https://www.elastic.co/blog/running-groovy-scripts-without-dynamic-scripting. The script can also be indexed instead of being saved as a file: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html. This is much more convenient for my use case. Does an indexed script behave the same way regarding compilation? Will it be compiled only once?
rescore query for sorting the top-N hits using a Groovy script file. – Photobathic
rescore, but for my use case it is a bit of an overkill. – Character