Change default mapping of string to "not analyzed" in Elasticsearch

In my system, data is always inserted through CSV files via Logstash. I never pre-define the mapping, but whenever I input a string it is always analyzed, so an entry like hello I am Sinha is split into hello, I, am, Sinha. Is there any way I could change the default/dynamic mapping of Elasticsearch so that all strings, irrespective of index and type, are taken to be not_analyzed? Or is there a way of setting it in the .conf file? Say my conf file looks like this:

input {
    file {
        path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
        type => "promosms_dec15"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    csv {
        columns => ["Comm_Plan","Queue_Booking","Order_Reference","Multi_Ordertype"]
        separator => ","
    }
    ruby {
        code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
    }
}
output {
    elasticsearch {
        action => "index"
        host => "localhost"
        index => "promosms-%{+dd.MM.YYYY}"
        workers => 1
    }
}

I want all strings to be not_analyzed, and I don't mind that becoming the default setting for all future data inserted into Elasticsearch either.

Candancecandela answered 15/12, 2014 at 11:44

Answer (accepted, score 21):

You can query the .raw version of your field. This was added in Logstash 1.3.1:

The logstash index template we provide adds a “.raw” field to every field you index. These “.raw” fields are set by logstash as “not_analyzed” so that no analysis or tokenization takes place – our original value is used as-is!

So if your field is called foo, you'd query foo.raw to return the not_analyzed (not split on delimiters) version.
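
For example, an exact-match query or a terms aggregation can target the .raw sub-field instead of the analyzed field. A minimal sketch, assuming an index pattern and a State field like the ones discussed elsewhere in this question (adjust names to your data):

curl -XGET 'localhost:9200/promosms-*/_search?pretty' -d '{
    "size": 0,
    "aggs": {
        "states": { "terms": { "field": "State.raw" } }
    }
}'

Aggregating on State.raw returns whole values such as West Bengal as single buckets, whereas aggregating on State would produce one bucket per token (west, bengal).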

Dudleyduds answered 24/2, 2015 at 15:00. Comments:
Defeasible: The foo.raw fields are empty while the foo fields are filled. I don't get it.
Gretta: @Roland Kofler Did you find out why they were empty? I have the same issue.
Defeasible: @AviArro All I can remember is that it is intended that way. Sort of a hack. More I can't remember.
Tabular: Is it a given that everyone with this question will be using Logstash?
Dudleyduds: @JonCrowell Well, yes and no. I stumbled back on my answer the other day when NOT using Logstash. I couldn't find the .raw field and realized it was something Logstash had created for me, so my answer here was not helpful. :) However, the question is tagged with and refers to Logstash, so it makes sense that my answer is the accepted one. There are other related SO questions and answers about creating not_analyzed fields with vanilla Elasticsearch that can be referred to.
Striate: Logstash 5.x with Elasticsearch 5.x changes the mapping from .raw to .keyword - elastic.co/guide/en/logstash/current/…

Answer (score 28):

Just create a template. Run:

curl -XPUT localhost:9200/_template/template_1 -d '{
    "template": "*",
    "settings": {
        "index.refresh_interval": "5s"
    },
    "mappings": {
        "_default_": {
            "_all": {
                "enabled": true
            },
            "dynamic_templates": [
                {
                    "string_fields": {
                        "match": "*",
                        "match_mapping_type": "string",
                        "mapping": {
                            "index": "not_analyzed",
                            "omit_norms": true,
                            "type": "string"
                        }
                    }
                }
            ],
            "properties": {
                "@version": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "geoip": {
                    "type": "object",
                    "dynamic": true,
                    "path": "full",
                    "properties": {
                        "location": {
                            "type": "geo_point"
                        }
                    }
                }
            }
        }
    }
}'
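
A couple of notes on how this works: "template": "*" makes the template apply to every index created after it is registered (existing indices keep their old mapping, so delete and reinsert, or reindex, your data), and the string_fields dynamic template maps every newly seen string field as not_analyzed. To confirm the template was stored, you can fetch it back; a quick check, assuming the template_1 name used above:

curl -XGET 'localhost:9200/_template/template_1?pretty'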
Candancecandela answered 16/12, 2014 at 10:06. Comments:
Drynurse: What exactly does this do? How does it work? Can you elaborate a bit more on what this template is and how it applies to the question?
Candancecandela: @Drynurse As you can see, inside the dynamic templates and properties the index is set to not_analyzed. As a result, if we input a string hello, I am Sinha it will be treated as one string and not split into hello, I, am, Sinha.
Flame: What is the index name that this mapping applies to?
Candancecandela: @Flame It applies to all indices.
Flame: OK, that makes sense. What if I only want to apply it to some fields of an index? Say I have an index ind; the number of fields in the incoming data is not fixed, but there will definitely be a string field d. I only want field d to be analyzed and all other string fields to be not analyzed.
Quickman: @SagnikSinha What exactly are you doing to drop the ".raw" fields? Did you mean that dropping the "fields" parameter in the mapping file will drop the ".raw" part of each field and save 50% of the space? I have a mapping file with a "fields" parameter for each field I extract from the log.

Answer (score 13):

Make a copy of lib/logstash/outputs/elasticsearch/elasticsearch-template.json from your Logstash distribution (possibly installed as /opt/logstash/lib/logstash/outputs/elasticsearch/elasticsearch-template.json) and modify it by replacing

"dynamic_templates" : [ {
  "string_fields" : {
    "match" : "*",
    "match_mapping_type" : "string",
    "mapping" : {
      "type" : "string", "index" : "analyzed", "omit_norms" : true,
      "fields" : {
        "raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
      }
    }
  }
} ],

with

"dynamic_templates" : [ {
  "string_fields" : {
    "match" : "*",
    "match_mapping_type" : "string",
    "mapping" : {
      "type" : "string", "index" : "not_analyzed", "omit_norms" : true
    }
  }
} ],

Then point the template option of your output plugin at your modified file:

output {
  elasticsearch {
    ...
    template => "/path/to/my-elasticsearch-template.json"
  }
}

You can still override this default for particular fields.
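
For instance, an explicit entry under properties in the same template file takes precedence over the dynamic template for that field, so you could keep one field analyzed while everything else defaults to not_analyzed. A sketch to merge into the existing properties block (the field name message is only an illustration):

"properties" : {
  "message" : { "type" : "string", "index" : "analyzed", "omit_norms" : true }
}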

Topi answered 15/12, 2014 at 13:02. Comments:
Candancecandela: That doesn't seem to give me the desired results. I have a field named State with a value West Bengal. When I plot a bar graph, I get two different legends, west and bengal, which is wrong. That problem is still there.
Deformed: Is this a new index? Keep in mind that changing the index template won't make a difference for existing data.
Candancecandela: I deleted the previous index, refreshed, and then inserted the data again. Yes, it's a new index.
Deformed: Interesting. What if you get the mapping of the index and have a look at the actual mapping being used?
Candancecandela: It shows {"promosms-16.12.2014":{"mappings":{"promosms_dec15":{"properties":{..............,"State":{"type":"string"},.........}}}}}. There is no information as to whether it is analyzed or not.
Candancecandela: I declared the whole thing as a template and it worked. I set "template" : "*".
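
Regarding the mapping check suggested above: the get-mapping API shows what was actually applied. A minimal sketch using the index name from the comment; if a string field appears as just {"type":"string"} with no explicit "index" setting, it is using the default, i.e. analyzed:

curl -XGET 'localhost:9200/promosms-16.12.2014/_mapping?pretty'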

Answer (score 1):

I think updating the mapping is the wrong approach just to handle a field for reporting purposes. Sooner or later you may want to be able to search that field for tokens. If you map the field as not_analyzed and want to search for foo within the value foo bar, you won't be able to do that.

A more graceful solution is to use a Kibana filters aggregation instead of a terms aggregation. Something like the setup below searches for the terms ivr04 and ivr02; in your case you could define a filter for hello I am Sinha. Hope this helps.

[Screenshot: a Kibana visualization using a filters aggregation with two filter queries, ivr04 and ivr02]
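
Under the hood, a Kibana filters aggregation like the one in the screenshot corresponds roughly to the Elasticsearch filters aggregation. A sketch of the equivalent query DSL, assuming Elasticsearch 2.x or later and a hypothetical queue field holding the two example values:

curl -XGET 'localhost:9200/myindex/_search?pretty' -d '{
    "size": 0,
    "aggs": {
        "queues": {
            "filters": {
                "filters": {
                    "ivr02": { "match_phrase": { "queue": "ivr02" } },
                    "ivr04": { "match_phrase": { "queue": "ivr04" } }
                }
            }
        }
    }
}'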

Gaona answered 1/11, 2016 at 10:54
