CloudSearch fuzzy terms and phrases

I am trying to get my head around how fuzzy search works on AWS CloudSearch.

I want to find "Star Wars" but in my search, I spell it

ster wers

My app's logic adds the fuzzy operator to the query, but the search never returns Star Wars. I have tried:

ster~1 wers~1
"ster wers"~2
"ster"~1 "wers"~1

What am I missing here?

Vannesavanness answered 31/3, 2015 at 11:55

Your query doesn't work because of how CloudSearch stems terms. If your field is indexed with the Analysis Scheme set to English, then wars is stored in its stemmed form, war.

Here's a little demo of how stemming is affecting your query.

Searching with the un-stemmed query ('ster wers'):

Searching with the un-stemmed query requires you to match wers to war, which is off by 2 characters (one substitution, e to a, and one deletion, the trailing s) and requires this query: q=ster~1+wers~2.

Searching with the stemmed query ('ster wer'):

Searching with the stemmed version means you're matching wer to war, and you're only off by 1 character. Thus ster~1 wer~1 will get the desired result (i.e. it matches Star Wars).
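The distances quoted above can be checked with a few lines of Python. This is only a sketch: it assumes the ~N operator counts edits in the plain Levenshtein sense (which agrees with the numbers above), and edit_distance and fuzzy_query are hypothetical helper names, not part of any CloudSearch API.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]


def fuzzy_query(typed_to_indexed: Dict[str, str]) -> str:
    """Give each typed term the smallest ~N that reaches its indexed (stemmed) form."""
    parts = []
    for typed, indexed in typed_to_indexed.items():
        distance = edit_distance(typed, indexed)
        parts.append(f"{typed}~{distance}" if distance else typed)
    return " ".join(parts)


# "wars" is assumed to be indexed as its stem "war", as described above.
print(fuzzy_query({"ster": "star", "wers": "war"}))
```

In a real app you would not know the indexed form in advance; the point is only that the fuzziness has to cover the distance to the stemmed term, not to the word as it appears in the document.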

How to fix:

The use case you described will work if you configure the Analysis Scheme for the field in question to not use any stemming.

  1. To do this, log into the AWS Web Console and go to Analysis Schemes --> Add Analysis Scheme.

  2. Then go to Indexing Options and configure your field to use your new no-stemming analysis scheme.

  3. Submit your changes and re-index.
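If you'd rather script the steps above than click through the console, something along these lines with boto3 should be roughly equivalent. Treat it as a sketch: the domain name "movies", the field name "title" and the scheme name "no_stemming" are made-up placeholders, and redefining a field may replace its other options, so carry over any settings you already rely on.

```python
def no_stemming_scheme(name: str, language: str = "en") -> dict:
    """Analysis scheme definition with algorithmic stemming switched off."""
    return {
        "AnalysisSchemeName": name,
        "AnalysisSchemeLanguage": language,
        "AnalysisOptions": {"AlgorithmicStemming": "none"},
    }


def text_field_using(field: str, scheme: str) -> dict:
    """Text index field definition that points at the given analysis scheme."""
    return {
        "IndexFieldName": field,
        "IndexFieldType": "text",
        "TextOptions": {"AnalysisScheme": scheme},
    }


def apply(domain: str, field: str, scheme: str = "no_stemming") -> None:
    """Steps 1 to 3: add the scheme, attach it to the field, then re-index."""
    import boto3  # imported here so the helpers above work without boto3 installed

    client = boto3.client("cloudsearch")
    client.define_analysis_scheme(
        DomainName=domain, AnalysisScheme=no_stemming_scheme(scheme))
    client.define_index_field(
        DomainName=domain, IndexField=text_field_using(field, scheme))
    client.index_documents(DomainName=domain)  # kicks off the re-index


# Hypothetical usage; needs AWS credentials and an existing domain:
# apply("movies", "title")
```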

That will address your issue, but of course you'll lose the benefits of stemming. You can't have your cake and eat it too.

Lemmuela answered 27/4, 2015 at 23:33
