Elasticsearch exact match

I'm using Elasticsearch and am having a devil of a time getting an exact match to happen. I've tried various combinations of match, query_string, etc., and I either get nothing or bad results. My query looks like this:

{
  "filter": {
    "term": {
      "term": "dog",
      "type": "main"
    }
  },
  "query": {
    "match_phrase": {
      "term": "Dog"
    }
  },
  "sort": [
    "_score"
  ]
}

Sorted results

10.102211 {u'term': u'The Dog', u'type': u'main', u'conceptid': 7730506}
10.102211 {u'term': u'That Dog', u'type': u'main', u'conceptid': 4345664}
10.102211 {u'term': u'Dog', u'type': u'main', u'conceptid': 144}
7.147442 {u'term': u'Dog Eat Dog (song)', u'type': u'main', u'conceptid': u'5288184'}

I see, of course that "The Dog", "That Dog" and "Dog" all have the same score, but I need to figure out how I can boost the exact match "Dog" in score.

I also tried

{
  "sort": [
    "_score"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "term": "Dog"
          }
        },
        {
          "match_phrase": {
            "term": {
              "query": "Dog",
              "boost": 5
            }
          }
        }
      ]
    }
  },
  "filter": {
    "term": {
      "term": "dog",
      "type": "main"
    }
  }
}

but that still just gives me

11.887239 {u'term': u'The Dog', u'type': u'main', u'conceptid': 7730506}
11.887239 {u'term': u'That Dog', u'type': u'main', u'conceptid': 4345664}
11.887239 {u'term': u'Dog', u'type': u'main', u'conceptid': 144}
8.410372 {u'term': u'Dog Eat Dog (song)', u'type': u'main', u'conceptid': u'5288184'}
Tortricid answered 3/9, 2013 at 17:45

Fields are analyzed with the standard analyzer by default. If you want to check for an exact match, you can also store the field not analyzed, e.g.:

"dog": {
  "type": "multi_field",
  "fields": {
    "dog": {
      "include_in_all": false,
      "type": "string",
      "index": "not_analyzed",
      "store": "no"
    },
    "_tokenized": {
      "include_in_all": false,
      "type": "string",
      "index": "analyzed",
      "store": "no"
    }
  }
}

Then you can query the dog field for exact matches, and dog._tokenized for analyzed queries (like full text).
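For illustration, a query against a mapping like this could combine an exact term clause on the not-analyzed field with an analyzed match; the field names follow the mapping above, and the boost value is just an example:

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "dog": {
              "value": "Dog",
              "boost": 10
            }
          }
        },
        {
          "match": {
            "dog._tokenized": "Dog"
          }
        }
      ]
    }
  }
}
```

Because the dog field is not analyzed, the term clause only matches documents whose stored value is exactly "Dog" (case-sensitive), and the boost pushes those documents above partial matches like "The Dog".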

Falciform answered 4/9, 2013 at 9:54
Would this require making a change to all the records? I've got just shy of 50,000,000 (very small) records in this index. Or would it make more sense to nuke the index and reimport with this structure in place as appropriate? I'm a newbie when it comes to Lucene stuff; if I stored my data like this, how would my ES queries change? Thanks for your help! I was going to ask about the best way to design my index here, but I'll make that another question, I think. – Tortricid
This was what I needed. I ended up rebuilding the data anyhow, as I wanted to get at it in a different way, but I used this as the basis of my approach. Thanks! – Tortricid

I think your problem is that the field term is being analyzed (check your mapping) with the standard analyzer, which filters out stopwords such as "the" or "that". That is why "Dog", "The Dog", and "That Dog" all get the same score. You may be able to solve your problem by configuring a custom analyzer (see the analysis documentation page).
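As a sketch of that suggestion, a custom analyzer that tokenizes and lowercases but applies no stopword filtering could be defined in the index settings like this (the analyzer name no_stopwords is made up for the example):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "no_stopwords": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

You would then reference it from the field mapping (e.g. "analyzer": "no_stopwords"), so that "The Dog" is indexed as two tokens (the, dog) and no longer scores identically to the single-token "Dog".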

Howund answered 3/9, 2013 at 19:34 Comment(0)

Hash the two values you need to search on into a single hash key, then search on that key.

Eta answered 9/5, 2017 at 2:3 Comment(0)
