Sub-queries with "union" in Elasticsearch

I'm currently busy working on a project in which we chose to use Elasticsearch as the search engine for a classifieds website.

Currently, I have the following business rule:

List 25 adverts per page. Of these 25, 10 must be "Paid Adverts" and the other 15 must be "Free". All 25 must be relevant to the search performed (e.g. keywords, region, price, category, etc.).

I know I can do this using two separate queries, but this seems like an immense waste of resources. Is it possible to do "sub-queries" (if you can call them that) and union the results into a single result set, somehow fetching only 10 "Paid" adverts and 15 "Free" ones from Elasticsearch in one single query? Assuming, of course, that there are enough adverts to satisfy this requirement.

Thanks for any help!

Edit: adding my mapping information for more clarity.

"properties": {
       "advertText": {
          "type": "string",
          "boost": 2,
          "store": true,
          "analyzer": "snowball"
       },
       "canonical": {
          "type": "string",
          "store": true
       },
       "category": {
          "properties": {
             "id": {
                "type": "string",
                "store": true
             },
             "name": {
                "type": "string",
                "store": true
             },
             "parentCategory": {
                "type": "string",
                "store": true
             }
          }
       },
       "contactNumber": {
          "type": "string",
          "index": "not_analyzed",
          "store": true
       },
       "emailAddress": {
          "type": "string",
          "store": true,
          "analyzer": "url_email_analyzer"
       },
       "advertType": {
          "type": "string",
          "index": "not_analyzed"
       },
       ...
}

What I want, then, is to query this and get 10 results where "advertType": "Paid" and 15 where "advertType": "Free".

Cusack asked on 25/6/2014 at 12:42. Comments:
Can you share your mappings, sample data and a sample query for each type? It would help a lot in coming up with a solution. - Distinctly
@JohnPetrone, I shall do so when I'm back at work (it's 8PM here in RSA). Still, let's take the concept of "adverts" completely out of the mix: is it possible to query Elasticsearch once but do something like "get 5 admins and 10 employees from the people index"? I don't know how to phrase the question in proper terms so that it makes more sense! - Cusack
I think I can get you pretty close. I'll start writing up an answer; it will require a bit of explanation. - Distinctly

There are a couple of approaches you can take.

First, you can try using the multi-search API:

Multi Search API

The multi search API allows to execute several search requests within the same API. The endpoint for it is _msearch.

The format of the request is similar to the bulk API format

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-multi-search.html

A basic example:

curl -XGET 'http://127.0.0.1:9200/advertising_index/_msearch?pretty=1' -H 'Content-Type: application/x-ndjson' -d '
{}
{"query" : {"match" : {"Paid_Ads" : "search terms"}}, "size" : 10}
{}
{"query" : {"match" : {"Free" : "search terms"}}, "size" : 15}
'

I've made up the field names and the query, but you should get the idea: you hit the _msearch endpoint and pass it a series of searches, each preceded by a header line. An empty header {} means the search runs against the index given in the URL. For Paid I've set size to 10, and for Free I've set size to 15.

Subject to the details of your own implementation, you should be able to use something like this.
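For the question's own mapping, a minimal Python sketch of the same _msearch request might look like the following. It assumes the full-text field is `advertText` and that the not_analyzed `advertType` field holds exactly "Paid" or "Free" (field names from the question; the exact values are an assumption), and it uses the `bool`/`filter` query syntax of Elasticsearch 2.x and later rather than the 1.x-era syntax used elsewhere on this page. `build_msearch_body` and `merge_page` are hypothetical helper names. The second helper performs the "union" on the client by concatenating the two hit lists from the `responses` array.

```python
import json


def build_msearch_body(search_terms, paid_size=10, free_size=15):
    """Build an NDJSON body for _msearch: a header line, then a query line,
    for each search. An empty header {} means "use the index from the URL"."""
    lines = []
    for advert_type, size in (("Paid", paid_size), ("Free", free_size)):
        lines.append({})  # header: no overrides, use the URL's index
        lines.append({
            "query": {
                "bool": {
                    # Relevance comes from the full-text match on advertText...
                    "must": [{"match": {"advertText": search_terms}}],
                    # ...while advertType only filters (assumed values "Paid"/"Free").
                    "filter": [{"term": {"advertType": advert_type}}],
                }
            },
            "size": size,
        })
    # Every line, including the last one, must end with a newline.
    return "".join(json.dumps(line) + "\n" for line in lines)


def merge_page(msearch_response):
    """Client-side "UNION ALL": concatenate the hits of each sub-search,
    in request order (Paid first, then Free)."""
    page = []
    for response in msearch_response["responses"]:
        # A failed sub-search comes back as an {"error": ...} item.
        if "error" in response:
            raise RuntimeError(f"sub-search failed: {response['error']}")
        page.extend(response["hits"]["hits"])
    return page


if __name__ == "__main__":
    print(build_msearch_body("mountain bike"), end="")
```

The body would be sent to `/advertising_index/_msearch` (the index name from the curl example) with `Content-Type: application/x-ndjson`, and the parsed JSON reply passed to `merge_page`. The other criteria from the business rule (region, price, category) would be added to each `filter` list.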

If that does not work for whatever reason, you can also try using a limit filter:

Limit Filter

A limit filter limits the number of documents (per shard) to execute on. For example:

{
    "filtered" : {
        "filter" : {
            "limit" : {"value" : 100}
        },
        "query" : {
            "term" : { "name.first" : "shay" }
        }
    }
}

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-limit-filter.html

Note that the limit is per shard, not per index. Given the default of 5 primary shards per index, to get a total of 10 results you would set the limit to 2 (2 × 5 = 10). Also note that this can return fewer results than expected if matches are unevenly spread across shards, for example several matches on one shard but none on another.
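The per-shard arithmetic can be sketched in Python as below. `max_hits` uses the simplified model described above (each shard contributes at most `limit` of its own matches); the real limit filter caps the documents examined per shard, so actual results may be lower still. Both function names are hypothetical, and the per-shard match counts in the examples are made up.

```python
import math


def per_shard_limit(total_wanted, num_shards=5):
    """Smallest per-shard limit that can add up to total_wanted across all shards."""
    return math.ceil(total_wanted / num_shards)


def max_hits(limit, matches_per_shard):
    """Simplified model: each shard returns at most `limit` of its own matches,
    so a shard with spare matches cannot make up for one with none."""
    return sum(min(limit, m) for m in matches_per_shard)


if __name__ == "__main__":
    print(per_shard_limit(10), per_shard_limit(15))  # 2 3
    # Matches spread evenly: the full 10 come back.
    print(max_hits(2, [3, 4, 2, 5, 2]))  # 10
    # All matches on one shard: only 2 come back, although 10 exist.
    print(max_hits(2, [10, 0, 0, 0, 0]))  # 2
```

When the total is not a multiple of the shard count (say 12 across 5 shards), `per_shard_limit` rounds up to 3, which allows up to 15 hits, so you would also cap the response with `size`.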

You would then combine two such filters (for example, one for Paid adverts and one for Free) with a bool filter:

Bool Filter

A filter that matches documents matching boolean combinations of other queries. Similar in concept to Boolean query, except that the clauses are other filters. Can be placed within queries that accept a filter.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html
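A rough sketch of the combined request, written in the 1.x-era syntax this answer uses (`filtered` query, `limit` and `bool` filters), might be built like this in Python. The field names come from the question's mapping, the per-shard limits of 2 and 3 assume 5 primary shards, and the helper names are hypothetical. This shows only the shape of the request: whether `limit` behaves as intended when nested inside a `bool` filter is something to verify against your own index.

```python
import json


def limited_type_filter(advert_type, limit_per_shard):
    """Filter branch: documents of one advertType, capped by a per-shard limit."""
    return {
        "bool": {
            "must": [
                {"term": {"advertType": advert_type}},
                {"limit": {"value": limit_per_shard}},
            ]
        }
    }


def build_filtered_query(search_terms, paid_limit=2, free_limit=3):
    """Full-text match on advertText, restricted to either limited branch."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"advertText": search_terms}},
                "filter": {
                    "bool": {
                        # A document passes if it satisfies either branch.
                        "should": [
                            limited_type_filter("Paid", paid_limit),
                            limited_type_filter("Free", free_limit),
                        ]
                    }
                },
            }
        },
        "size": 25,  # at most 10 Paid + 15 Free
    }


if __name__ == "__main__":
    print(json.dumps(build_filtered_query("mountain bike"), indent=2))
```

Unlike the _msearch approach, the hits come back as one list sorted by score, so Paid and Free adverts are interleaved rather than grouped.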

I've not fleshed this one out in any detail as it will require more information about your specific indexes, mappings, data and queries.

Distinctly answered on 25/6/2014 at 20:12. Comments:
I think the first one might be what I'm looking for. How does it perform? The multi search endpoint, I mean. Is the query cached? How much faster is it than running each query individually? I've updated my question with the mapping information; shout if you want anything else! Thanks for your help! - Cusack
It acts in a similar way to the bulk API; it's like a bulk query facility. You still pay the cost of two queries, but with only one round trip and only one payload delivered back to the client, which can have a significant positive impact on performance. - Distinctly
Here both queries are performed individually. But if the output of one query is used as the input to another query, how can this be done within a single API call? - Mindimindless
Doing it the first way, are the results deduplicated? Can we have different sorting in each of these sub-queries? - Arel

Try using a limit filter, which limits the number of documents each shard executes on:

{
    "filtered" : {
        "filter" : {
            "limit" : {"value" : 10}
        },
        "query" : {
            "term" : { "name.first" : "shay" }
        }
    }
}

Set value to 2 to get 10 results and to 3 to get 15 (the limit applies per shard, so this assumes the default of 5 primary shards).

Borchert answered on 25/6/2014 at 13:03.

Are you asking for a query like this?

(select * from tablename where advert = 'Paid Advert' limit 10) union (select * from tablename where advert = 'Free' limit 15);

Or for the logic to generate the limit per page?

Magellan answered on 25/6/2014 at 13:08. Comments:
This is a SQL query. I'm looking for an Elasticsearch query. - Cusack
