We're storing a title field in our index and want to use the field for two purposes:
- We're analyzing with an ngram filter so we can provide autocomplete and instant results
- We want to be able to list results using an ASC sort on the title field rather than score.
The index/filter/analyzer is defined like so:
array(
    'number_of_shards' => $this->shards,
    'number_of_replicas' => $this->replicas,
    'analysis' => array(
        'filter' => array(
            'nGram_filter' => array(
                'type' => 'nGram',
                'min_gram' => 2,
                'max_gram' => 20,
                'token_chars' => array('letter','digit','punctuation','symbol')
            )
        ),
        'analyzer' => array(
            'index_analyzer' => array(
                'type' => 'custom',
                'tokenizer' => 'whitespace',
                'char_filter' => 'html_strip',
                'filter' => array('lowercase','asciifolding','nGram_filter')
            ),
            'search_analyzer' => array(
                'type' => 'custom',
                'tokenizer' => 'whitespace',
                'char_filter' => 'html_strip',
                'filter' => array('lowercase','asciifolding')
            )
        )
    )
),
The problem we're experiencing is unpredictable results when we sort on the title field. After doing a little searching, we found this at the end of the sort page in the Elasticsearch reference (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_memory_considerations):
For string based types, the field sorted on should not be analyzed / tokenized.
How can we both analyze the field and sort on it later? Do we need to store the field twice, with one copy using not_analyzed, in order to sort? Since the _source field also stores the title value in its original state, can that not be used to sort on?
The naive approach to indexing the same string in two ways would be to include two separate fields in the document
on a related page to the one you linked to ;) – Shauna
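For anyone landing here, the "same string indexed two ways" idea from the comment can be sketched roughly as below. This is only a sketch under assumptions: it uses the older string / not_analyzed mapping syntax that matches the settings in the question, the sub-field name raw and the example query text are made up for illustration, and the analyzer names are the ones defined above. The title field keeps the ngram analysis for autocomplete, while title.raw holds the untouched value for sorting.

```php
<?php
// Sketch of a mapping for the title field (older string/not_analyzed syntax assumed).
// 'raw' is a hypothetical sub-field name; any name would do.
$mapping = array(
    'properties' => array(
        'title' => array(
            'type' => 'string',
            // Analyzers defined in the index settings from the question.
            'index_analyzer' => 'index_analyzer',
            'search_analyzer' => 'search_analyzer',
            'fields' => array(
                // Second copy of the same value, indexed as one unanalyzed term.
                'raw' => array(
                    'type' => 'string',
                    'index' => 'not_analyzed'
                )
            )
        )
    )
);

// Search on the analyzed field, sort on the unanalyzed sub-field.
// The query text 'elas' is just an example of a partial autocomplete input.
$searchBody = array(
    'query' => array(
        'match' => array('title' => 'elas')
    ),
    'sort' => array(
        array('title.raw' => array('order' => 'asc'))
    )
);
```

With a layout like this the document still only carries a single title value; the second representation exists only in the index. One thing to watch for: a not_analyzed field sorts on the exact stored string, so upper-case and lower-case titles will not interleave. As far as I know, _source can't stand in here, since it is stored for retrieval but not indexed in a form that sorting can use.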