Elasticsearch analyzer working when created through Spring Data but failing when created directly from Postman/curl

Goal: create an Elasticsearch index intended to hold 10 million simple documents. Each document is basically an "Elasticsearch id", "some company id" and a "name". Provide a search-as-user-types feature.

I could successfully create an index and an analyzer either directly from Postman (curl or any other tool not relying on Spring Data) or during Spring Boot initialization. Nevertheless, when I try to use the analyzer, it seems to be ignored for the index created directly from Postman.

So my main question is: is Spring Data adding some setting I am missing when I create the index by posting the settings JSON directly? A secondary question is: is there some way to make Spring Data print the auto-generated commands it executes (similar to how Hibernate lets you see the generated SQL)? If so, I can visually debug and check what is different.
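A quick way to check whether the custom analyzer is actually being applied (not part of the original post, just a debugging sketch that assumes the index and analyzer names used below; the sample text is arbitrary):

get http://localhost:9200/correntistas/_analyze

{
  "analyzer": "autocomplete_index",
  "text": "maria"
}

If the analyzer is active, the response lists the edge n-gram tokens (m, ma, mar, mari, maria); if the index does not know the analyzer, Elasticsearch returns an error instead.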

This is the way the index and analyzer are created from Spring Boot / Spring Data.

main method to boot

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

@EnableElasticsearchRepositories
@SpringBootApplication
public class SearchApplication {

    public static void main(String[] args) {
        SpringApplication.run(SearchApplication.class, args);
    }

}

my model

import lombok.Getter;
import lombok.Setter;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "correntistas")
@Setting(settingPath = "data/es-config/elastic-setting.json")
@Getter
@Setter
public class Correntista {
    @Id
    private String id;
    private String conta;
    private String sobrenome;

    @Field(type = FieldType.Text, analyzer = "autocomplete_index", searchAnalyzer = "autocomplete_search")
    private String nome;
}
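For reference (not in the original post): besides applying elastic-setting.json, Spring Data also derives an index mapping from the @Field annotation above when it creates the index. A sketch of how to inspect it, together with the excerpt the nome field is expected to produce (the surrounding structure depends on the Elasticsearch/Spring Data version):

get http://localhost:9200/correntistas/_mapping

"nome": {
    "type": "text",
    "analyzer": "autocomplete_index",
    "search_analyzer": "autocomplete_search"
}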

src/main/resources/data/es-config/elastic-setting.json (note: this is exactly the same settings JSON I post from Postman)

{
  "analysis": {
    "filter": {
      "autocomplete_filter": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 20
      }
    },
    "analyzer": {
      "autocomplete_search": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase"
        ]
      },
      "autocomplete_index": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "autocomplete_filter"
        ]
      }
    }
  }
}

Checking that it was created successfully, I see:

get http://localhost:9200/correntistas/_settings

{
    "correntistas": {
        "settings": {
            "index": {
                "number_of_shards": "5",
                "provided_name": "correntistas",
                "creation_date": "1586615323459",
                "analysis": {
                    "filter": {
                        "autocomplete_filter": {
                            "type": "edge_ngram",
                            "min_gram": "1",
                            "max_gram": "20"
                        }
                    },
                    "analyzer": {
                        "autocomplete_index": {
                            "filter": [
                                "lowercase",
                                "autocomplete_filter"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "autocomplete_search": {
                            "filter": [
                                "lowercase"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "xtN-NOX3RQWJjeRdyC8CVA",
                "version": {
                    "created": "6080499"
                }
            }
        }
    }
}

So far so good.

Now I delete the index with curl -XDELETE localhost:9200/correntistas and do the same thing, but this time creating the index and the analyzer at once from Postman:

put http://localhost:9200/correntistas with the exact same analysis settings posted above:

(screenshot: creating the index and analyzer at once)
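Since the screenshot is not reproduced here, this is a minimal sketch of what that PUT body presumably looked like, assuming only the analysis settings were sent. Note that, unlike the Spring Data bootstrap, it contains no mappings section telling Elasticsearch which field should use which analyzer:

put http://localhost:9200/correntistas

{
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase"]
                },
                "autocomplete_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"]
                }
            }
        }
    }
}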

Then, if I check the settings, I see exactly the same result as when the index was created above from Spring Data.

Am I missing some extra step that Spring Data performs for free and hides from my eyes?

To sum up: when the index is created from Spring Data, searching with a few letters works, but when I create it from Postman, the search only returns data when I type the whole word.

*** Thanks to the friendly and smart help from Opster Elasticsearch Ninja, I can add an extra trick I learned when posting from Postman: somehow a header enabled in my Postman was clashing with the solution answered below, causing "... Root mapping definition has unsupported parameters... mapper_parsing_exception...". I guess it can be useful to add here for future readers.

(screenshot: Postman headers messing things up)

Quiteria answered 11/4, 2020 at 14:38 Comment(4)
Wow, quite some details, but I guess I got lost somewhere. Are you saying it doesn't work with Postman? "I see searching with a few letters working, but when I created it from Postman it simply retrieves data when I search with the whole word"?Hindustani
Can you explain what you mean by failing? Can you provide your search query, sample docs and expected docs?Hindustani
@OpsterElasticsearchNinja regarding your first comment: I mean, if I create both the index and the analysis from Spring Boot it works, i.e. I can search using only the first letter. On the other hand, if I create both the index and the analysis from Postman, the search only returns results if I type the whole word. It seems to me that Spring Data is adding some extra settings. But when I compare the settings of the index created from Spring with the one I created manually from Postman, they are exactly the same. I will try to improve my question after studying your example below.Quiteria
Thanks for the clarification, now it's much clearer :). You need to compare the mappings, not only the settings, to figure out why the results are different; you can check the ES docs to understand the difference between them.Hindustani

You have not provided the search query you are using in Postman, nor the mapping, which would help us debug whether you are using the right analyzer on the fields referenced in your search query. Adding sample documents and your actual and expected search results always helps.

Never mind; I took your settings, added a mapping, and show below how, using Postman as well, you get the correct results.

Index definition, with the exact same analysis settings as yours, plus an explicit mapping:

{
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase"
                    ]
                },
                "autocomplete_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "autocomplete_index",
                "search_analyzer": "autocomplete_search"
            }
        }
    }
}
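A sketch of creating this index with curl instead of Postman; the index name 61158504 is taken from the results below, and index-definition.json is a hypothetical file containing the JSON above. The explicit Content-Type header is required by Elasticsearch 6.x and later, and relates to the Postman header issue mentioned in the question:

curl -X PUT "localhost:9200/61158504" -H "Content-Type: application/json" -d @index-definition.json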

Index sample docs

{
    "name" : "opster"
}

{
    "name" : "jim c"
}

{
    "name" : "jimc"
}

{
    "name" : "foo"
}
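A sketch of indexing these sample documents with curl (here the document ID is auto-generated; a fixed ID could also be set with PUT /61158504/_doc/<id>):

curl -X POST "localhost:9200/61158504/_doc" -H "Content-Type: application/json" -d '{ "name": "jim c" }'

(repeat once per sample document)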

Searching for a partial word like ji brings back both the jim c and jimc docs:

{
    "query": {
        "match": {
            "name": {
                "query": "ji"
            }
        }
    }
}
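The same query as a curl call against the _search endpoint, again just a sketch using the index name from the results:

curl -X GET "localhost:9200/61158504/_search" -H "Content-Type: application/json" -d '{ "query": { "match": { "name": { "query": "ji" } } } }'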

Result

  "hits": [
            {
                "_index": "61158504",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.69263697,
                "_source": {
                    "name": "jimc"
                }
            },
            {
                "_index": "61158504",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.6133945,
                "_source": {
                    "name": "jim c"
                }
            }
        ]
Hindustani answered 11/4, 2020 at 16:29 Comment(10)
Well, do I need to create a mapping if I have already created the analysis? I have used ELK a few times for logging purposes, but this is my first time using Elasticsearch for such a specific business requirement: a search-as-user-types feature. None of the samples I studied guided me to create a mapping. As a simple analogy, I understand a mapping to serve a somewhat similar purpose as an "index" does in a relational database (a very rough analogy).Quiteria
From elastic.co/guide/en/elasticsearch/reference/current/… I read "Mapping is the process of defining how a document, and the fields it contains, are stored and indexed". I understood that a mapping adds performance behind the scenes but doesn't change the query result at all. Am I wrong?Quiteria
@JimC, no, the mapping changes the query results as well; there are many types of queries, and how they work depends on the data types and analyzers you used.Hindustani
@JimC, "do I need to create a mapping if I had created analysis?" If you are just trying things out locally, the best option is to delete the index and create a new one. There are ways to update an existing mapping, but that is for more advanced use cases and has some caveats. I would suggest you follow my example with a fresh index and mapping and try to understand the concepts; it's best when you start from a blank slate :)Hindustani
Thanks, I totally agree with your statement "... I would suggest please follow my example with fresh index and mapping and try to understand the concepts as its best when you start with blank state." I need to learn more about Elasticsearch. If you don't mind, an extra question: what would you change in your analysis in order to match letters both at the beginning and in the middle of a word? (e.g. "im" would bring Jim C and Jim, and "st" would return Opster)Quiteria
@JimC, the beginning case already works with my example. To match letters in the middle or anywhere (this is called infix search), we need another mapping that uses an ngram analyzer instead of edge_ngram, but remember it greatly increases the index size (see the sketch after these comments). I talked about infix search in my other answer #60584599, the same link I gave earlier, but it looks like you missed it :)Hindustani
Let us continue this discussion in chat.Hindustani
I changed your proposal just a bit in order to better fit my model's needs, and it seems everything goes wrong. Kindly, if you have the chance, I would appreciate it if you could check #61690241Quiteria
@JimC, so basically now you want to be able to search in the middle of words as well, correct? Like if a doc contains foobar, then searching for fo, oob and bar should also return the result?Hindustani
@JimC, I provided an answer according to my understanding and am waiting for your comment on it 😊Hindustani
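As mentioned in the comment about infix search above, a hedged sketch of what an ngram-based analysis (instead of edge_ngram) could look like; the names infix_filter and infix_index are made up for illustration, and the gram sizes are only an example (wider min_gram/max_gram ranges also require raising index.max_ngram_diff):

{
    "settings": {
        "analysis": {
            "filter": {
                "infix_filter": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3
                }
            },
            "analyzer": {
                "infix_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "infix_filter"]
                }
            }
        }
    }
}

With this analyzer on the name field (and a plain lowercase search analyzer), a query for "im" would match "jim c" and a query for "st" would match "opster", at the cost of a considerably larger index.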
