Multi-field, multi-word, match without query_string

I would like to match a multi-word search against multiple fields so that every word in the search is contained in at least one of the fields, in any combination. The catch is that I would like to avoid using query_string.

curl -X POST "http://localhost:9200/index/document/1" -d '{"id":1,"firstname":"john","middlename":"clark","lastname":"smith"}'
curl -X POST "http://localhost:9200/index/document/2" -d '{"id":2,"firstname":"john","middlename":"paladini","lastname":"miranda"}'

I would like a search for 'John Smith' to match only document 1. The following query does what I need, but I would rather avoid query_string in case the user types "OR", "AND", or any of the other advanced syntax.

curl -X GET 'http://localhost:9200/index/_search?size=10&pretty' -d '{
  "query": {
    "query_string": {
      "query": "john smith",
      "default_operator": "AND",
      "fields": [
        "firstname",
        "lastname",
        "middlename"
      ]
    }
  }
}'
Linn answered 15/3, 2013 at 1:10 Comment(1)
I keep coming to this question over and over and over again. Great, evergreen question! - Jasso
O
36

What you are looking for is the multi_match query, but it doesn't behave in quite the way you would like.

Compare the output of validate for multi_match vs query_string.

multi_match (with operator and) requires that ALL terms exist together in the same field, for at least one of the fields:

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "multi_match" : {
      "operator" : "and",
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "((+lastname:john +lastname:smith) | (+firstname:john +firstname:smith))",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }

query_string (with default_operator AND), by contrast, checks that EACH term exists in at least one field, and different terms may match different fields:

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "query_string" : {
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith",
      "default_operator" : "AND"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "+(firstname:john | lastname:john) +(firstname:smith | lastname:smith)",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }
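
The difference between the two explanations can be sketched in plain Python. This is only a rough simulation of the boolean logic shown above, assuming whitespace tokenization and lowercasing; the function names are made up for illustration and nothing here calls Elasticsearch.

```python
def field_centric_match(doc, fields, terms):
    """Mimics the multi_match explanation: one single field must contain every term."""
    return any(
        all(term in doc.get(field, "").lower().split() for term in terms)
        for field in fields
    )


def term_centric_match(doc, fields, terms):
    """Mimics the query_string explanation: every term must appear in some field."""
    return all(
        any(term in doc.get(field, "").lower().split() for field in fields)
        for term in terms
    )


# The two documents from the question.
docs = {
    1: {"firstname": "john", "middlename": "clark", "lastname": "smith"},
    2: {"firstname": "john", "middlename": "paladini", "lastname": "miranda"},
}
fields = ["firstname", "middlename", "lastname"]
terms = "john smith".lower().split()

field_hits = [i for i, d in docs.items() if field_centric_match(d, fields, terms)]
term_hits = [i for i, d in docs.items() if term_centric_match(d, fields, terms)]

print(field_hits)  # [] because no single field holds both "john" and "smith"
print(term_hits)   # [1] because document 1 has "john" and "smith" in different fields
```

Under these assumptions only the term-centric form gives the behaviour the question asks for.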

So you have a few choices to achieve what you are after:

  1. Preparse the search terms to remove wildcards, operators, and other special syntax before passing them to query_string.

  2. Preparse the search terms to extract each word, then generate one multi_match query per word.

  3. Use index_name in the mapping for the name fields to index their data into a single field, which you can then search (like your own custom _all field), as follows:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "index_name" : "name",
               "type" : "string"
            },
            "lastname" : {
               "index_name" : "name",
               "type" : "string"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "VJFU_RWbRNaeHF9wNM8fRA",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 33
# }

Note, however, that firstname and lastname are no longer searchable independently: the data for both fields has been indexed into name.

You could use multi-fields with the path parameter to make them searchable both independently and together, as follows:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "fields" : {
                  "firstname" : {
                     "type" : "string"
                  },
                  "any_name" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            },
            "lastname" : {
               "fields" : {
                  "any_name" : {
                     "type" : "string"
                  },
                  "lastname" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

Searching the any_name field works:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "any_name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 11
# }

Searching firstname for john AND smith returns no hits, because firstname contains only john:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [],
#       "max_score" : null,
#       "total" : 0
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 2
# }

But searching firstname for just john works correctly:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.30685282,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.30685282,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 3
# }
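
Option 2 from the list above (one multi_match per word) could be built on the client side roughly like this. It is a minimal sketch: the helper name per_word_query is hypothetical, words are split on whitespace only, and the result is just a request body that you would still have to send to the _search endpoint yourself.

```python
import json


def per_word_query(user_input, fields):
    """Build a bool query holding one multi_match clause per whitespace-separated word.

    Every clause sits under "must", so each word has to match in at least one
    of the fields, but different words may match different fields.
    """
    words = user_input.split()
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": word, "fields": fields}}
                    for word in words
                ]
            }
        }
    }


body = per_word_query("john smith", ["firstname", "middlename", "lastname"])
print(json.dumps(body, indent=2))
```

Because multi_match does not parse query syntax, a word such as AND typed by the user is searched as ordinary text. Empty input produces an empty must list, which you would probably want to handle before sending the request.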
Occultism answered 15/3, 2013 at 10:40 Comment(4)
I'm confused about why that first multi_match query is returning something. I assume that when you say "all terms" you mean "john" and "smith". It is not the case that "john" and "smith" both exist in the first_name field. And it is not the case that "john" and "smith" both exist in the last_name field. - Agitate
@AdamZerner Is the above answer still valid? When I execute the validate request for the multi_match and query_string queries, they both produce the same explanation. I am using ES 7.10.2. - Shutz
@NishikantTayade I'm not sure, sorry. - Agitate
@AdamZerner No worries! I have posted the same question on the ES forum and will get back here if I get an answer. - Shutz
A
1

I would rather avoid using query_string in case the user passes "OR", "AND" and any of the other advanced params.

In my experience, escaping the special characters with a backslash is a simple and effective solution. The list of special characters can be found in the Lucene documentation at http://lucene.apache.org/core/4_5_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description; you also need to handle the operators AND, OR, NOT and TO.
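
A minimal sketch of that escaping in Python follows. The character set is taken from the Lucene page linked above. Lowercasing the operator words instead of backslash-escaping them is an assumption of this sketch, based on the parser recognising operators only in upper case, and the function name is made up.

```python
import re

# Special characters of the Lucene classic query parser, escaped one at a time.
_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')

# Assumption: operators are only recognised in upper case, so lowercasing
# them turns them into ordinary search terms.
_OPERATORS = re.compile(r"\b(AND|OR|NOT|TO)\b")


def escape_query_string(text):
    """Backslash-escape special characters and neutralise operator keywords."""
    escaped = "".join("\\" + ch if ch in _SPECIAL_CHARS else ch for ch in text)
    return _OPERATORS.sub(lambda m: m.group(1).lower(), escaped)


print(escape_query_string('john AND (smith OR "miranda")*'))
# john and \(smith or \"miranda\"\)\*
```

If the escaped string goes into a JSON request body, serialise it with a JSON library (for example json.dumps) so the backslashes themselves are encoded correctly.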

Abb answered 6/12, 2013 at 20:6 Comment(0)
E
0

Nowadays you can use the cross_fields type in multi_match:

GET /_validate/query?explain
{
    "query": {
        "multi_match": {
            "query":       "peter smith",
            "type":        "cross_fields", 
            "operator":    "and",
            "fields":      [ "firstname", "lastname", "middlename" ]
        }
    }
}

The cross_fields type takes a term-centric approach: it treats all of the fields as one big field and looks for each term in any field.

One thing to note, though: for it to work optimally, all of the searched fields should use the same analyzer (standard, english, etc.). From the documentation:

For the cross_fields query type to work optimally, all fields should have the same analyzer. Fields that share an analyzer are grouped together as blended fields.

If you include fields with a different analysis chain, they will be added to the query in the same way as for best_fields. For instance, if we added the title field to the preceding query (assuming it uses a different analyzer), the explanation would be as follows:

(+title:peter +title:smith) ( +blended("peter", fields: [first_name, last_name]) +blended("smith", fields: [first_name, last_name]) )
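
To make the grouping concrete, here is a rough Python simulation of the matching logic that explanation implies. It is a sketch only: the analyzer assignments are hypothetical, tokenization is a plain lowercase whitespace split, scoring is ignored, and the function names are invented.

```python
from collections import defaultdict


def group_fields_by_analyzer(field_analyzers):
    """Fields sharing an analyzer form one blended group."""
    groups = defaultdict(list)
    for field, analyzer in field_analyzers.items():
        groups[analyzer].append(field)
    return list(groups.values())


def cross_fields_and_match(doc, field_analyzers, terms):
    """Match if, inside at least one group, every term is found in some field of that group."""
    def has(field, term):
        return term in doc.get(field, "").lower().split()

    return any(
        all(any(has(field, term) for field in group) for term in terms)
        for group in group_fields_by_analyzer(field_analyzers)
    )


# Hypothetical mapping: the name fields share an analyzer, title uses another.
field_analyzers = {"first_name": "standard", "last_name": "standard", "title": "english"}
terms = ["peter", "smith"]

split_across_names = {"first_name": "peter", "last_name": "smith", "title": "engineer"}
split_across_groups = {"first_name": "peter", "last_name": "jones", "title": "smith"}

print(cross_fields_and_match(split_across_names, field_analyzers, terms))   # True
print(cross_fields_and_match(split_across_groups, field_analyzers, terms))  # False
```

In this simulation the second document fails because its terms are split across two analyzer groups, which is why keeping the same analyzer on all searched fields matters.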

Empress answered 12/4, 2019 at 15:11 Comment(0)
V
-1

I think the match query is what you are looking for:

"The match family of queries does not go through a “query parsing” process. It does not support field name prefixes, wildcard characters, or other “advance” features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does)"

http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html

Vola answered 15/3, 2013 at 7:47 Comment(0)
