Azure Search and Dashes

3

6

I am using Azure Search and trying to perform a search against documents:

It seems as though doing this: /indexes/blah/docs?api-version=2015-02-28&search=abc\-1003

returns the same results as this: /indexes/blah/docs?api-version=2015-02-28&search=abc-1003

Shouldn't the first one return different results than the second because of the escaping backslash? From what I understand, the backslash should allow an exact search on the whole string "abc-1003" instead of treating the dash as a "not" operator.

(more info here: https://msdn.microsoft.com/en-us/library/azure/dn798920.aspx)

The only way I can get it to work is by doing this (note the double quotes): /indexes/blah/docs?api-version=2015-02-28&search="abc-1003"

I would rather not do that, because it would mean making users type the quotes themselves, which they will not know to do.

Am I expecting something I shouldn't or is it possibly a bug with Azure Search?

Hayne answered 2/6, 2016 at 20:55 Comment(1)
Maybe I'm blind but... I don't see a difference in the two search strings in your question. – Cupronickel
6

First, a dash that is not preceded by whitespace is treated as a literal dash, not as the negation operator.

As per the MSDN docs for simple query syntax:

"-" only needs to be escaped if it's the first character after whitespace, not if it's in the middle of a term. For example, "wi-fi" is a single term.

Second, unless you are using a custom analyzer for your index, the analyzer will treat the dash almost like whitespace and break abc-1003 into two tokens, abc and 1003.

Then, when you put it in quotes ("abc-1003"), it is treated as a search for the phrase abc 1003, which returns what you expect.
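
Since the asker does not want users to type the quotes themselves, the application can add them before building the request. The following is only a minimal sketch: the helper names are hypothetical, and the backslash escaping of embedded quotes is an assumption based on the simple query syntax rules for special characters.

```python
from urllib.parse import urlencode


def as_phrase_query(user_input: str) -> str:
    """Wrap raw user input in double quotes so it is searched as a phrase."""
    # Escape backslashes first, then embedded quotes, so the user's text
    # cannot close the phrase early (assumed escaping rule).
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_search_url(index: str, user_input: str,
                     api_version: str = "2015-02-28") -> str:
    """Build the relative search URL with the phrase query URL-encoded."""
    params = urlencode({
        "api-version": api_version,
        "search": as_phrase_query(user_input),
    })
    return f"/indexes/{index}/docs?{params}"


print(build_search_url("blah", "abc-1003"))
# prints /indexes/blah/docs?api-version=2015-02-28&search=%22abc-1003%22
```

The user still types abc-1003; only the request sent to the service contains the quotes.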

If you want an exact match on abc-1003, consider using a filter instead. It is faster and can match things like GUIDs or text with dashes.
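
As a rough illustration of the filter approach, the sketch below builds a request URL with an OData $filter equality expression. The field name "name" and the helper names are hypothetical, and it assumes the field is marked filterable in the index; equality filters compare the stored value as-is, so the match is case-sensitive.

```python
from urllib.parse import urlencode


def odata_string_literal(value: str) -> str:
    """Quote a value as an OData string literal."""
    # OData string literals use single quotes; an embedded single quote is doubled.
    return "'" + value.replace("'", "''") + "'"


def build_filter_url(index: str, field: str, value: str,
                     api_version: str = "2015-02-28") -> str:
    """Build a relative URL that matches documents where field equals value."""
    params = urlencode({
        "api-version": api_version,
        "search": "*",  # match everything, then let the filter narrow it down
        "$filter": f"{field} eq {odata_string_literal(value)}",
    })
    return f"/indexes/{index}/docs?{params}"


# "name" is a placeholder field; use a filterable field from your own index.
print(build_filter_url("blah", "name", "abc-1003"))
```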

Jejune answered 2/6, 2016 at 23:9 Comment(1)
Thank you for the information about how it treats the dash. I have looked into the custom analyzer and it seems like it will definitely meet my needs. Appreciate the help. – Hayne
4

The documentation says that a hyphen "-" is treated as a special character that must be escaped.
In reality, a hyphen splits the token, and the words on both sides are searched separately, as Sean Saleh pointed out.

After a small investigation, I found that you do not need a custom analyzer; the built-in whitespace analyzer will do.
Here is how you can use it in the index definition:

{
    "name": "example-index-name",
    "fields": [
        {
            "name": "name",
            "type": "Edm.String",  
            "analyzer": "whitespace",
            ...
        },
    ],
...
}

Use this endpoint (with a PUT request) to update your index:

https://{service-name}.search.windows.net/indexes/{index-name}?api-version=2017-11-11&allowIndexDowntime=true

Do not forget to include the api-key in the request header.
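
Put together, the update call might look like the sketch below. It only builds the request and does not send it. The service name, the key value, and the "id" key field are placeholders, because the original definition elides the remaining fields and settings.

```python
import json
import urllib.request


def build_update_index_request(service_name, index_name, api_key,
                               index_definition, api_version="2017-11-11"):
    """Build (but do not send) a PUT request that updates an index definition."""
    url = (f"https://{service_name}.search.windows.net/indexes/{index_name}"
           f"?api-version={api_version}&allowIndexDowntime=true")
    body = json.dumps(index_definition).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


# Hypothetical minimal definition; a real index would carry its full field list.
definition = {
    "name": "example-index-name",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "name", "type": "Edm.String", "searchable": True,
         "analyzer": "whitespace"},
    ],
}

request = build_update_index_request(
    "my-service", "example-index-name", "<admin-api-key>", definition)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```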

You can also test this and other analyzers through the analyzer test endpoint, with a request body like this:

{
  "text": "Text to analyze",
  "analyzer": "whitespace"
}
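
To compare how different built-in analyzers tokenize the same text, you could prepare one test request per analyzer, as in this sketch. The /analyze path under the index is how I understand the endpoint to be addressed, so treat the URL shape as an assumption and check it against the REST reference for your API version; the service name and key are placeholders.

```python
import json
import urllib.request


def build_analyze_request(service_name, index_name, api_key, text, analyzer,
                          api_version="2017-11-11"):
    """Build (but do not send) a POST request to the analyzer test endpoint."""
    # Assumed URL shape: the /analyze path under the index resource.
    url = (f"https://{service_name}.search.windows.net/indexes/{index_name}"
           f"/analyze?api-version={api_version}")
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


# One request per analyzer, all for the same sample text.
requests_to_send = [
    build_analyze_request("my-service", "example-index-name",
                          "<admin-api-key>", "abc-1003", name)
    for name in ("whitespace", "keyword")
]
for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```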
Aborning answered 1/2, 2019 at 12:16 Comment(0)
2

Adding to Sean's answer, a custom analysis configuration with a keyword tokenizer and a lowercase token filter will address the issue.

It appears that you are using the default standard analyzer, which breaks up words containing special characters during lexical analysis at indexing time. At query time, this lexical analysis is applied to regular queries but not to wildcard search queries.

As a result, with your example, you have <abc> and <1003> in the search index. A wildcard search query is not tokenized the same way: it looks for terms that start with abc-1003 and finds nothing, because neither term in the index starts with abc-1003.

Hope this makes sense. Please let me know if you have any additional questions.
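
The mismatch can be illustrated with a small simulation. This is not the real Lucene analysis chain, only a rough stand-in: the regex split approximates what the standard analyzer does to this particular input, and all function names are made up for the example.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the default analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def keyword_lowercase_tokens(text):
    """Stand-in for a keyword tokenizer plus lowercase filter: one token for the whole input."""
    return [text.lower()]


def prefix_matches(indexed_terms, prefix_query):
    """Compare the raw prefix against indexed terms; the query itself is not tokenized."""
    prefix = prefix_query.rstrip("*")
    return [t for t in indexed_terms if t.startswith(prefix)]


doc = "ABC-1003"

print(standard_like_tokens(doc))                                   # ['abc', '1003']
print(prefix_matches(standard_like_tokens(doc), "abc-1003*"))      # []

print(keyword_lowercase_tokens(doc))                               # ['abc-1003']
print(prefix_matches(keyword_lowercase_tokens(doc), "abc-10*"))    # ['abc-1003']
```

With the split tokens, no indexed term starts with abc-1003, so the prefix query comes back empty; with a single lowercased token, the same query matches.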

Nate

Barri answered 16/2, 2017 at 18:41 Comment(0)