elasticsearch disable term frequency scoring

Asked 24/8, 2015 at 22:38 Answered 30/5, 2021 at 13:1

Solved java elasticsearch frequency scoring

I want to change the scoring system in elasticsearch to get rid of counting multiple appearances of a term. For example, I want:

"texas texas texas"

and

"texas"

to receive the same score. I found this mapping, which Elasticsearch says disables term frequency counting, but my searches still do not return the same score:

{
  "mappings": {
    "business": {
      "properties": {
        "name": {
          "type": "string",
          "index_options": "docs",
          "norms": { "enabled": false }
        }
      }
    }
  }
}

Any help would be appreciated; I have not been able to find much information on this.

I am adding my search code and what gets returned when I use explain.

My search code:

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "escluster")
        .build();
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));

// Match "Texas" against the "name" field. DFS_QUERY_THEN_FETCH gathers
// term statistics from all shards before scoring.
SearchRequest request = Requests.searchRequest("businesses")
        .source(SearchSourceBuilder.searchSource()
                .query(QueryBuilders.boolQuery()
                        .should(QueryBuilders.matchQuery("name", "Texas")
                                .minimumShouldMatch("1"))))
        .searchType(SearchType.DFS_QUERY_THEN_FETCH);

ExplainRequest request2 = client.prepareIndex("businesses", "business")

and when I search with explain I get:

  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9U5KBks4zEorv9YI4n",
      "_score" : 1.0,
      "_source":{
"name" : "texas"
}
,
      "_explanation" : {
        "value" : 1.0,
        "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 1.0,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(freq=1.0), with freq of:",
            "details" : [ {
              "value" : 1.0,
              "description" : "termFreq=1.0"
            } ]
          }, {
            "value" : 1.0,
            "description" : "idf(docFreq=2, maxDocs=3)"
          }, {
            "value" : 1.0,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    }, {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9U5K6Ks4zEorv9YI4o",
      "_score" : 0.8660254,
      "_source":{
"name" : "texas texas texas"
}
,
      "_explanation" : {
        "value" : 0.8660254,
        "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 0.8660254,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.7320508,
            "description" : "tf(freq=3.0), with freq of:",
            "details" : [ {
              "value" : 3.0,
              "description" : "termFreq=3.0"
            } ]
          }, {
            "value" : 1.0,
            "description" : "idf(docFreq=2, maxDocs=3)"
          }, {
            "value" : 0.5,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    } ]
  }

It looks like it is still taking term frequency and document frequency into account. Any ideas? Sorry for the bad formatting; I don't know why it looks so grotesque.

The response from the browser search http://localhost:9200/businesses/business/_search?pretty=true&qname=texas is:

    {
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YcCKjKvtg8NgyozGK",
      "_score" : 1.0,
      "_source":{"business" : {
"name" : "texas texas texas texas" }
}
    }, {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YateBKvtg8Ngyoy-p",
      "_score" : 1.0,
      "_source":{
"name" : "texas" }

    }, {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YavVnKvtg8Ngyoy-4",
      "_score" : 1.0,
      "_source":{
"name" : "texas texas texas" }

    }, {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9Yb7NgKvtg8NgyozFf",
      "_score" : 1.0,
      "_source":{"business" : {
"name" : "texas texas texas" }
}
    } ]
  }
}

It finds all 4 documents I have in the index and gives them all the same score. When I run my Java API search with explain, I get:

    {
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.287682,
    "hits" : [ {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YateBKvtg8Ngyoy-p",
      "_score" : 1.287682,
      "_source":{
"name" : "texas" }
,
      "_explanation" : {
        "value" : 1.287682,
        "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 1.287682,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(freq=1.0), with freq of:",
            "details" : [ {
              "value" : 1.0,
              "description" : "termFreq=1.0"
            } ]
          }, {
            "value" : 1.287682,
            "description" : "idf(docFreq=2, maxDocs=4)"
          }, {
            "value" : 1.0,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    }, {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YavVnKvtg8Ngyoy-4",
      "_score" : 1.1151654,
      "_source":{
"name" : "texas texas texas" }
,
      "_explanation" : {
        "value" : 1.1151654,
        "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 1.1151654,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.7320508,
            "description" : "tf(freq=3.0), with freq of:",
            "details" : [ {
              "value" : 3.0,
              "description" : "termFreq=3.0"
            } ]
          }, {
            "value" : 1.287682,
            "description" : "idf(docFreq=2, maxDocs=4)"
          }, {
            "value" : 0.5,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    } ]
  }
}
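The numbers in the explain output above can be reproduced by hand, which helps show which factors still depend on term frequency. The sketch below is only an illustration, not Elasticsearch code: the class and method names are hypothetical, and the formulas (tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1))) are inferred from the printed values and appear consistent with Lucene's classic TF/IDF similarity. The fieldNorm is taken directly from the output rather than computed.

```java
public class ClassicScoreSketch {

    // tf as it appears in the explain output: square root of the raw term frequency.
    static double tf(double termFreq) {
        return Math.sqrt(termFreq);
    }

    // idf that matches the explain output: 1 + ln(maxDocs / (docFreq + 1)).
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // fieldWeight is the product of the three factors listed under "fieldWeight in 0".
    // fieldNorm is passed in as printed by explain (assumption: not recomputed here).
    static double fieldWeight(double termFreq, long docFreq, long maxDocs, double fieldNorm) {
        return tf(termFreq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // "texas": termFreq=1, docFreq=2, maxDocs=4, fieldNorm=1.0
        System.out.println(fieldWeight(1.0, 2, 4, 1.0));
        // "texas texas texas": termFreq=3, docFreq=2, maxDocs=4, fieldNorm=0.5
        System.out.println(fieldWeight(3.0, 2, 4, 0.5));
    }
}
```

The two printed values should be close to the 1.287682 and 1.1151654 scores reported above. If term frequencies were not indexed and norms were disabled, one would expect tf and fieldNorm to both be 1.0, leaving only idf, so the two documents would score the same.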

Terrorstricken answered 24/8, 2015 at 22:38 Comment(6)

The mismatch probably has more to do with doc frequency than term frequency. Are you using search_type=dfs_query_then_fetch? If that doesn't help, try setting explain=true in the query to see the breakdown in scoring. – Chock 25/8, 2015 at 3:47

I switched it to dfs_query_then_fetch but that didn't work. I will post my code and explain results in a second. – Terrorstricken 25/8, 2015 at 14:7

Could you post the query too? – Chock 25/8, 2015 at 14:16

I'm sorry, what do you mean? I just execute the SearchRequest from above with: ActionFuture af = client.search(request); – Terrorstricken 25/8, 2015 at 14:20

And thank you for the formatting edit! – Terrorstricken 25/8, 2015 at 14:21

Oh, my bad, I did not realise the query is in the code snippet. Could you print the actual query DSL the code generates? The explain output seems to suggest the query is running against the _all field. – Chock 25/8, 2015 at 14:26

It looks like the index options for a field cannot be overridden once the field has been set in the mapping.

Example:

PUT test
PUT test/business/_mapping
{
  "properties": {
    "name": {
      "type": "string",
      "index_options": "freqs",
      "norms": {
        "enabled": false
      }
    }
  }
}
PUT test/business/_mapping
{
  "properties": {
    "name": {
      "type": "string",
      "index_options": "docs",
      "norms": {
        "enabled": false
      }
    }
  }
}
GET test/business/_mapping

{
  "test": {
    "mappings": {
      "business": {
        "properties": {
          "name": {
            "type": "string",
            "norms": {
              "enabled": false
            },
            "index_options": "freqs"
          }
        }
      }
    }
  }
}

You would have to recreate the index for the new mapping to take effect.
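Since the field's index options have to be right when the field is first mapped, one option on a test cluster is to delete the index and recreate it with the mapping in the creation request. Below is a minimal Java sketch of that idea using only the JDK. The class name, helper names, and any base URL passed in (for example http://localhost:9200/test) are assumptions for illustration; the mapping body uses the same string-type field definition as in the example above, and nothing is sent over the network unless a base URL is given as an argument.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RecreateIndexSketch {

    // Builds the body for "PUT /<index>" so the mapping exists before any document is indexed.
    static String createIndexBody(String type, String field) {
        return "{\"mappings\":{\"" + type + "\":{\"properties\":{\"" + field + "\":{"
                + "\"type\":\"string\",\"index_options\":\"docs\",\"norms\":{\"enabled\":false}}}}}}";
    }

    // Sends a request with an optional JSON body and returns the HTTP status code.
    static int send(String method, String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod(method);
        if (body != null) {
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        String body = createIndexBody("business", "name");
        if (args.length == 0) {
            // No cluster URL given: just show what would be sent.
            System.out.println(body);
            return;
        }
        String indexUrl = args[0]; // e.g. http://localhost:9200/test (assumed)
        System.out.println("DELETE -> " + send("DELETE", indexUrl, null));
        System.out.println("PUT    -> " + send("PUT", indexUrl, body));
    }
}
```

After recreating the index, fetching test/business/_mapping again is a quick way to confirm that "index_options" now reads "docs" before reindexing any documents.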

Chock answered 25/8, 2015 at 14:20 Comment(13)

Well, this is embarrassing; that was my own stupidity. I was testing just using my browser with the command: localhost:9200/businesses/…, and after I changed it to "qname=texas" it works and the scores are the same. So why doesn't it work with my Java API search, where it seems like I am searching the name field? – Terrorstricken 25/8, 2015 at 14:34

Could you paste the whole snippet or, better, the response with explain set in the Java client? – Chock 25/8, 2015 at 15:39

I'm sorry, I am not sure how to set it in the Java API; it doesn't seem to be an option with SearchRequest. I will update my OP with the code. – Terrorstricken 25/8, 2015 at 16:18

I changed to SearchResponse to be able to use explain; I am updating the OP again and overwriting the previous edit. It looks like when I'm using the Java API it's not hitting the settings that should ignore the frequencies. – Terrorstricken 25/8, 2015 at 16:36

Strange. Could you try this http://localhost:9200/businesses/business/_search?pretty=true&q=name:texas&search_type=dfs_query_then_fetch&explain=true in the browser and see if you still get the same score? I have a feeling the mapping wasn't applied, or was applied after indexing the documents. – Chock 25/8, 2015 at 17:42

That new search gives me the same results as my Java API. And regarding the mappings, why would it be working for one search but not the other when it is on the same documents? I set the mapping before indexing anything. – Terrorstricken 25/8, 2015 at 17:57

The previous http://localhost:9200/businesses/business/_search?pretty=true&qname=texas has the wrong syntax, and Elasticsearch unfortunately ignores the wrong URL params instead of throwing an error. It defaults to match all, which is the reason all documents have the same score. You can try http://localhost:9200/businesses/business/_search?pretty=true&qname=thiscannotbeinthedocument and you should get the same result as before. It looks very likely that the mapping wasn't applied correctly; try http://localhost:9200/businesses/business/_mapping – Chock 25/8, 2015 at 18:8

Wow, you're right on all counts, it looks like: same results, and the current mapping is not what I put in; it looks like the default mapping that Elasticsearch assigns. When I submit the mapping it gives me an all-good response. I don't remember exactly what it is, but it's something like acknowledged: true. Maybe I am putting it in the wrong place? – Terrorstricken 25/8, 2015 at 18:21

You are on to something. I updated the answer: it looks like once the index has been created and the field specified in the mapping, you cannot override it with a mapping call. I don't think this is mentioned in the documentation, though, so you could raise an issue with Elasticsearch, since it should at least raise an error rather than fail silently. – Chock 25/8, 2015 at 18:43

I am just using it on a test elasticsearch right now, so I am deleting the index, adding the mapping to "businesses" and then adding little test objects. Is there something different I can be doing when adding the mapping initially? – Terrorstricken 25/8, 2015 at 18:50

You were right, I was using the wrong way to map it. I'll update my post above with my working mapping, thank you so much!! – Terrorstricken 25/8, 2015 at 19:18

Is there a way to add "index_options" : "freqs" to all fields, not just the "name" field? I'm looking for something like "*" instead of "name" – Madonna 3/10, 2016 at 20:49

You should be able to achieve that using dynamic templates. – Chock 3/10, 2016 at 21:8
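As the comments above point out, "qname=texas" is not a recognised parameter, so the request silently falls back to matching everything; the query-string form is q=field:term. The following is a small Java sketch for building such a URL with the query value encoded. The class and method names are hypothetical, and the host, index, and type are simply the ones used in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SearchUrlSketch {

    // Percent-encodes a query-string value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Builds a URI search against one field, with explain output and DFS term statistics.
    static String searchUrl(String base, String index, String type, String field, String term) {
        return base + "/" + index + "/" + type + "/_search"
                + "?pretty=true"
                + "&q=" + encode(field + ":" + term)
                + "&search_type=dfs_query_then_fetch"
                + "&explain=true";
    }

    public static void main(String[] args) {
        System.out.println(
                searchUrl("http://localhost:9200", "businesses", "business", "name", "texas"));
    }
}
```

Checking the "description" lines in the explain output (for example weight(name:texas ...) rather than weight(_all:texas ...)) is one way to confirm the query actually ran against the intended field.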

Your field type must be text.

You must also reindex: create a new index with the mapping below and index your documents into it.

"mappings": {
    "properties": {
      "text": {
        "type": "text",
        "index_options": "docs"
      }
    }
  }

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-options.html

Endometriosis answered 30/5, 2021 at 13:1 Comment(0)
