Queries vs. Filters

I can't see any description of when I should use a query or a filter or some combination of the two. What is the difference between them? Can anyone please explain?

Leafy answered 30/1, 2013 at 3:3 Comment(3)
Official documentation is not very clear in factGallinacean
Looks like a page with a more advanced explanation has appeared: elastic.co/guide/en/elasticsearch/guide/master/…People
Worth noting that queries and filters will be merged in ES 2.0, hence most of what's been said and written for queries vs filters will not apply anymore. Also check the official blog post announcing this change.Judaize
238

The difference is simple: filters are cached and don't influence the score, so they are faster than queries. Have a look here too. Think of a query as something the user types, which is pretty much unpredictable, while filters help users narrow down the search results, for example using facets.
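
The split described above, typed text as a query and selections as filters, might be sketched like this. It is a minimal Python sketch that only builds the request body: the function name `build_search` and the field names `description` and `invoice_count` are hypothetical, and the `bool` query with `must` and `filter` clauses assumes Elasticsearch 2.0 or later.

```python
import json

def build_search(user_text, min_invoice_count=None):
    """Build a search body: typed text is scored, selections only restrict."""
    bool_query = {
        # Query context: contributes to _score.
        "must": [{"match": {"description": user_text}}],
    }
    filters = []
    if min_invoice_count is not None:
        # Filter context: a yes/no restriction, no scoring.
        filters.append({"range": {"invoice_count": {"gt": min_invoice_count}}})
    if filters:
        bool_query["filter"] = filters
    return {"query": {"bool": bool_query}}

# Hypothetical usage: free text plus one drop-down selection.
body = build_search("overdue invoices", min_invoice_count=50)
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request; calling `build_search` without a selection produces a plain scored query with no `filter` clause.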

Photoconduction answered 30/1, 2013 at 9:37 Comment(6)
Right, so if the user is doing a Google-type search then I would use a query? If they are selecting a possible value from a drop-down (e.g., invoice count > 50) then this would be a filter?Leafy
Yep, that's exactly right. Any time you need to restrict the entire set of documents by some metric, that's usually a case where a filter is appropriate. So maybe by age, length, size, etc.Dartmouth
My solution uses filters and queries in the same request and it is super fast on the test database. We will soon get the live data in there to see how fast it really is.Leafy
@Dartmouth To be absolutely clear, in a multi-tenant system -with permissions for users within a tenant-, it sounds like the tenant/authentication information would be a filter added to every query (i.e. a Filtered Query). Right?Ottar
@activescott Yep, that's what I would do. You can also set up filtered aliases so that "user aliases" always apply the appropriate filter. Makes administration easier and doesn't require code changes to update queries, extra cruft in your query, etc.Dartmouth
We use 'function_score', where you can set a query or a filter. We only set a filter. In the functions part you can also define filters which influence the score of your results.Gauthier
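
The filtered aliases mentioned in the comments above could look roughly like the following. This is a sketch that only builds a body for the `_aliases` endpoint; the helper name `tenant_alias_action`, the index name `invoices`, the alias naming scheme and the `tenant_id` field are all assumptions made for illustration.

```python
import json

def tenant_alias_action(index, tenant_id):
    """Build an _aliases 'add' action whose alias only sees one tenant's docs."""
    return {
        "add": {
            "index": index,
            "alias": f"{index}_tenant_{tenant_id}",
            # Applied to every search that goes through the alias.
            "filter": {"term": {"tenant_id": tenant_id}},
        }
    }

# Hypothetical usage: one alias for tenant 42 on the "invoices" index.
body = {"actions": [tenant_alias_action("invoices", 42)]}
print(json.dumps(body, indent=2))  # body for POST /_aliases
```

Searches sent to the alias instead of the index would then carry the tenant restriction without the application adding it to each request.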
119

This is what official documentation says:

As a general rule, filters should be used instead of queries:

  • for binary yes/no searches
  • for queries on exact values

As a general rule, queries should be used instead of filters:

  • for full text search
  • where the result depends on a relevance score
Jarad answered 10/8, 2014 at 13:48 Comment(2)
When I want to delete a document, should I use a filter if possible? I don't want it to be cached.Telepathist
When deleting a doc, you do not require any score, nor do you need to do a full text search. So this would be a filter then, as you just need to make a delete/not delete decision. filter-query-contextUdine
28

An example (try it yourself)

Say index myindex contains three documents:

curl -XPOST 'localhost:9200/myindex/mytype' -H 'Content-Type: application/json' -d '{ "msg": "Hello world!" }'
curl -XPOST 'localhost:9200/myindex/mytype' -H 'Content-Type: application/json' -d '{ "msg": "Hello world! I am Sam." }'
curl -XPOST 'localhost:9200/myindex/mytype' -H 'Content-Type: application/json' -d '{ "msg": "Hi Stack Overflow!" }'

Query: How well a document matches the query

Query hello sam (using keyword must)

curl 'localhost:9200/myindex/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "bool": { "must": { "match": { "msg": "hello sam" }}}}
}'

Document "Hello world! I am Sam." is assigned a higher score than "Hello world!", because the former matches both words in the query. Documents are scored.

"hits" : [
   ...
     "_score" : 0.74487394,
     "_source" : {
       "msg" : "Hello world! I am Sam."
     }
   ...
     "_score" : 0.22108285,
     "_source" : {
       "msg" : "Hello world!"
     }
   ...

Filter: Whether a document matches the query

Filter hello sam (using keyword filter)

curl 'localhost:9200/myindex/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "bool": { "filter": { "match": { "msg": "hello sam" }}}}
}'

Documents that contain either hello or sam are returned. Documents are NOT scored.

"hits" : [
   ...
     "_score" : 0.0,
     "_source" : {
       "msg" : "Hello world!"
     }
   ...
     "_score" : 0.0,
     "_source" : {
       "msg" : "Hello world! I am Sam."
     }
   ...

Unless you need full text search or scoring, filters are preferred because frequently used filters will be cached automatically by Elasticsearch, to speed up performance. See Elasticsearch: Query and filter context.

Anta answered 20/12, 2017 at 3:58 Comment(0)
21

Filters -> Does this document match? a binary yes or no answer

Queries -> Does this document match? How well does it match? uses scoring
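
The two questions above can be illustrated with a toy in-memory model. This is not how Elasticsearch computes relevance: the score here is just the number of shared words, and the names `tokens`, `filter_match` and `query_score` are made up for this sketch.

```python
def tokens(text):
    """Lowercase and split on non-alphanumerics; a crude stand-in for an analyzer."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())

def filter_match(doc, search):
    """Filter context: a plain yes/no answer."""
    return bool(tokens(doc) & tokens(search))

def query_score(doc, search):
    """Query context: a number saying how well the document matches."""
    return len(tokens(doc) & tokens(search))

docs = ["Hello world!", "Hello world! I am Sam.", "Hi Stack Overflow!"]
for doc in docs:
    # Same search, two kinds of answer: a boolean and a score.
    print(doc, filter_match(doc, "hello sam"), query_score(doc, "hello sam"))
```

The filter only separates matching documents from non-matching ones, while the score additionally ranks the document containing both words above the one containing only one of them.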

Regan answered 9/2, 2018 at 9:20 Comment(0)
13

A few additions to the above. A filter is applied first, and the query is then processed over its results. To store the binary true/false match per document, a structure called a bitset is used. This bitset is kept in memory and is reused from the second time the filter is executed. This is how the bitset data structure lets us reuse the cached result.

One more point to note: the filter cache is populated only when the request is executed, so the benefit of caching is seen only from the second hit onward.

You can, however, use the warmer API to work around this. When you register a query with a filter as a warmer, it is executed against each new segment as it goes live, so you get consistent speed from the first execution.
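
The caching behaviour described above can be mimicked with a toy model. This is only an illustration of the idea, one bitset per filter that is built on first use and reused afterwards, and not Elasticsearch's actual cache; the class `FilterCache` and its members are invented for the sketch.

```python
class FilterCache:
    """Toy cache: one bitset (a Python int) per filter, bit i = doc i matches."""

    def __init__(self, docs):
        self.docs = docs
        self.bitsets = {}
        self.evaluations = 0  # how many times a filter was actually run

    def matching_ids(self, name, predicate):
        if name not in self.bitsets:
            # First use: run the filter over every document and remember it.
            self.evaluations += 1
            bits = 0
            for i, doc in enumerate(self.docs):
                if predicate(doc):
                    bits |= 1 << i
            self.bitsets[name] = bits
        bits = self.bitsets[name]
        return [i for i in range(len(self.docs)) if (bits >> i) & 1]

docs = [{"rating": 4.5}, {"rating": 3.0}, {"rating": 4.0}]
cache = FilterCache(docs)

def high_rating(doc):
    return doc["rating"] >= 4.0

first = cache.matching_ids("rating>=4", high_rating)   # computed and stored
second = cache.matching_ids("rating>=4", high_rating)  # served from the bitset
print(first, second, cache.evaluations)
```

Only the first call pays for evaluating the filter; the second call reads the stored bitset, which mirrors the "advantage from the second hit" point made above.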

Christeenchristel answered 24/2, 2015 at 2:40 Comment(2)
Interesting! I didn't realise filters happen before queries. The caching of filters makes more sense now.Giantess
Not always. This is the basic, primary difference between the filtered query and the constant score query: constant score always executes the query first and then applies the filter over it. Even the filtered query has settings by which the query can execute before the filters.Deegan
11

Basically, a query is used when you want to search your documents with scoring, and filters are used to narrow down the set of results obtained by the query. Filters are boolean.

For example, say you have an index of restaurants, something like Zomato. Now you want to search for restaurants that serve 'pizza', which is basically your search keyword.

So you will use a query to find all the documents containing "pizza", and some results will be obtained.

Say you now want a list of restaurants that serve pizza and have a rating of at least 4.0.

What you have to do is use the keyword "pizza" in your query and apply a filter for a rating of 4.0 or higher.

What happens is that filters are usually applied to the results obtained by querying your index.
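
A request body for the restaurant example might look like the following. This is a sketch under assumptions: the field names `dishes` and `rating` are hypothetical, and the `bool` query with a `filter` clause assumes Elasticsearch 2.0 or later (older versions expressed the same idea with a filtered query).

```python
import json

body = {
    "query": {
        "bool": {
            # Scored: how well the restaurant matches "pizza".
            "must": {"match": {"dishes": "pizza"}},
            # Not scored: only keeps restaurants rated 4.0 or higher.
            "filter": {"range": {"rating": {"gte": 4.0}}},
        }
    }
}
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request against the restaurant index.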

Leanoraleant answered 23/11, 2014 at 5:50 Comment(1)
Can't you provide an example of a request body?Formaldehyde
1

Since version 2 of Elasticsearch, filters and queries have been merged and any query clause can be used as either a filter or a query (depending on the context). As with version 1, filters are cached and should be used if scoring does not matter.

Source: https://logz.io/blog/elasticsearch-queries/

Unfaithful answered 7/5, 2020 at 7:20 Comment(0)
1

Queries: calculate a score, so they are able to return results sorted by relevance. Filters: don't calculate a score, making them faster and easier to cache.

Pastorship answered 15/8, 2020 at 4:56 Comment(0)
0

In Elasticsearch, both queries and filters are used to search and retrieve data, but they have different purposes and impacts on search operations:

[image not preserved]

That's it.

Deerhound answered 25/4 at 4:28 Comment(0)
