I want to be able to autocomplete names.
For example, if we have the name "John Smith", I want to be able to search for "Jo", "Sm", and "John Sm" and get the document back.
In addition, I do not want "jo sm" to match the document.
I currently have this analyzer:
return array(
    'settings' => array(
        'index' => array(
            'analysis' => array(
                'analyzer' => array(
                    'autocomplete' => array(
                        'tokenizer' => 'autocompleteEngram',
                        'filter' => array('lowercase', 'whitespace')
                    )
                ),
                'tokenizer' => array(
                    'autocompleteEngram' => array(
                        'type' => 'edgeNGram',
                        'min_gram' => 1,
                        'max_gram' => 50
                    )
                )
            )
        )
    )
);
The problem with this is that the text is first split into words, and each word is then tokenized into edge n-grams.
This produces the following tokens:
j
jo
joh
john
s
sm
smi
smit
smith
This means that if I search for "john smith" or "john sm", nothing is returned.
So, I need to generate tokens that look like this:
j
jo
joh
john
s
sm
smi
smit
smith
john s
john sm
john smi
john smit
john smith
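To make the target concrete, here is a minimal PHP sketch of the token list I am after. It is only an illustration and not an Elasticsearch analyzer: desiredAutocompleteTokens is a hypothetical helper name, and it assumes ASCII names whose words are separated by whitespace.

```php
<?php
// Hypothetical helper (not part of Elasticsearch): lists the tokens
// I would like the analyzer to emit for a given name.
// Assumes ASCII input; strlen/substr count bytes, not characters.
function desiredAutocompleteTokens(string $name): array
{
    $text = strtolower(trim($name));
    // Split on runs of whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $tokens = array();

    // Edge n-grams of each individual word: j, jo, joh, john, s, sm, ...
    foreach ($words as $word) {
        for ($len = 1; $len <= strlen($word); $len++) {
            $tokens[] = substr($word, 0, $len);
        }
    }

    // Edge n-grams of the whole name that reach into a later word:
    // "john s", "john sm", ... (prefixes ending in a space are skipped).
    $full = implode(' ', $words);
    $firstLen = count($words) > 0 ? strlen($words[0]) : 0;
    for ($len = $firstLen + 1; $len <= strlen($full); $len++) {
        $prefix = substr($full, 0, $len);
        if (substr($prefix, -1) !== ' ') {
            $tokens[] = $prefix;
        }
    }

    // Drop duplicates (e.g. repeated words) and reindex the array.
    return array_values(array_unique($tokens));
}

print_r(desiredAutocompleteTokens('John Smith'));
```

For "John Smith" this yields the per-word prefixes plus the full-name prefixes from "john s" through "john smith". A string such as "jo sm" is never produced, which is why it should not match.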
How can I set up my analyzer so that it generates those extra tokens?