Is there a Python ElasticSearch client that supports asynchronous requests?
I'm looking for an Elasticsearch Python client that can make asynchronous requests. For example, I'd like to be able to write code like this:

query1_future = es.search('/foobar', query1_json)
query2_future = es.search('/baz', query2_json) # Submit query 2 right after query 1, don't wait for its response
query1 = query1_future.get()
query2 = query2_future.get()

However, I don't see any clients (PyES or the official client, for example) supporting this. Moreover, the two I'm familiar with couple the request logic to the response-processing logic, so modifying them myself looks difficult. Perhaps a sufficient interim solution would be to use grequests, the asynchronous version of Requests?

Also, it's worth pointing out that Elasticsearch's _msearch endpoint may be a better-performing option, though for real-world applications it would require some code restructuring.
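For reference, _msearch takes a newline-delimited body of header/query pairs and returns all responses in one round trip. A minimal sketch of that framing (the index names and queries here are made-up stand-ins):

```python
import json

def build_msearch_body(searches):
    """Serialize (index, query) pairs into the NDJSON body that the
    _msearch endpoint expects: a header line, then a query line, per search."""
    lines = []
    for index, query in searches:
        lines.append(json.dumps({"index": index}))   # header line
        lines.append(json.dumps({"query": query}))   # query line
    return "\n".join(lines) + "\n"                   # body must end with a newline

# Hypothetical stand-ins for query1_json / query2_json above
body = build_msearch_body([
    ("foobar", {"match": {"title": "python"}}),
    ("baz", {"match_all": {}}),
])
```

The resulting body can be POSTed to /_msearch (or passed to a client's msearch call); the responses come back in request order, so the two queries above become a single HTTP round trip.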

Watertight answered 9/10, 2013 at 18:56 Comment(0)
9

Just came across this question. There is an official asynchronous Elasticsearch client based on asyncio:

https://github.com/elastic/elasticsearch-py-async

Speculation answered 10/3, 2018 at 15:8 Comment(1)
Hey! I work at Elastic maintaining the Python client. The package mentioned above is deprecated; the official way to use asyncio with Elasticsearch is via AsyncElasticsearch in the "elasticsearch" package: elasticsearch-py.readthedocs.io/en/master/async.html – Foregoing
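To illustrate the pattern the question asks for with an asyncio client, here is a sketch of a helper that works with any client exposing an async search(index=..., query=...) method, such as AsyncElasticsearch; the index names and queries are placeholders:

```python
import asyncio

async def search_concurrently(es, searches):
    """Submit every search before awaiting any response, then return
    the results in the same order as the (index, query) pairs."""
    return await asyncio.gather(
        *(es.search(index=index, query=query) for index, query in searches)
    )

# With the official async client (assumes a server on localhost:9200):
#   from elasticsearch import AsyncElasticsearch
#   es = AsyncElasticsearch("http://localhost:9200")
#   r1, r2 = asyncio.run(search_concurrently(es, [
#       ("foobar", {"match_all": {}}),
#       ("baz", {"match_all": {}}),
#   ]))
```

asyncio.gather fires both coroutines before awaiting either result, mirroring the futures-style code in the question.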
3

You can also consider the following options for performing I/O without blocking the main process, using the existing clients:

  • Use multithreading on Jython or IronPython (they have no GIL, so threads can take advantage of multiple CPU cores)
  • Use a ProcessPoolExecutor (from concurrent.futures) on Python 3
  • Use gevent with socket monkey-patching to make the existing clients work over gevent sockets; this effectively makes a client asynchronous, but it also requires some additional code to manage the results

Gevent is the most lightweight option (in RAM and CPU) and can handle the most intensive I/O, but it's also the most complex of the three. Note too that it runs in a single process; to take advantage of multiple cores you would need the multiprocessing package.
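The executor option above can be sketched with the stdlib concurrent.futures; any blocking client with a search(index=..., body=...) method works. A ThreadPoolExecutor is usually enough for I/O-bound calls (the GIL is released while waiting on sockets); swap in ProcessPoolExecutor when CPU-bound response parsing dominates:

```python
from concurrent.futures import ThreadPoolExecutor

def submit_searches(es, executor, searches):
    """Submit each (index, query) pair to the executor and return the
    futures immediately, without waiting for any response."""
    return [executor.submit(es.search, index=index, body=query)
            for index, query in searches]

# Usage with any blocking client (names below are placeholders):
#   with ThreadPoolExecutor(max_workers=4) as pool:
#       f1, f2 = submit_searches(es, pool, [("foobar", q1), ("baz", q2)])
#       result1, result2 = f1.result(), f2.result()  # blocks only here
```

This gives exactly the futures-style API the question asked for, at the cost of a thread (or process) per in-flight request.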

Whereabouts answered 25/1, 2014 at 0:5 Comment(0)
2

I've forked txes into txes2. It features a more PEP8-friendly interface, test coverage (unit and integration), and support for ES 1.x.

Still a work in progress, but probably a good choice for people using Twisted.

Effluvium answered 31/1, 2015 at 3:24 Comment(0)
1

There's this Tornado async client for ES.

Lizethlizette answered 27/1, 2014 at 16:50 Comment(0)
1

This is an older question, but now in 2019 there is an official async wrapper package: https://github.com/elastic/elasticsearch-py-async

I have had success using it against ES 5.x, but the catch is that the 5.x branch is no longer maintained: https://github.com/elastic/elasticsearch-py-async/issues/46

Joellejoellen answered 14/3, 2019 at 19:2 Comment(0)
0

I haven't used it yet, but I found this:

https://github.com/jkoelker/txes

Orta answered 6/2, 2014 at 3:53 Comment(0)
0

Twistes is a good library if you are using Twisted.

Unscramble answered 24/2, 2017 at 22:15 Comment(0)
0

I've created an async Elasticsearch ORM based on Pydantic (v2.x), called ESORM.

You can easily create a model like this:

from esorm import ESModel

class User(ESModel):
    name: str
    age: int

or if you would like to use other ES field types:

from esorm.fields import byte, keyword, text

class User(ESModel):
    name: keyword
    age: byte
    cv: text

Nested documents:

class User(ESModel):
    name: text
    email: keyword
    age: byte = 18  # You can specify default values as in Pydantic

class Post(ESModel):
    title: text
    content: text
    writer: User  # User is a nested document

You can easily create mappings for your models, which will create your indices with the specified types automatically:

# Create indices and mappings
from esorm import setup_mappings

async def prepare_es():
    import models  # Import your models
    # The models argument is not strictly needed here, but passing it
    # prevents an unused-import warning
    await setup_mappings(models)

Search documents:

async def query():
    users = await User.search(
        query={
            'bool': {
                'must': [{
                    'range': {
                        'age': {
                            'gte': 18
                        }
                    }
                }]
            }
        }
    )

Everything is annotated, type-checked, and autocompleted by the IDE, even queries, because they use TypedDict.

There are tons of other features in it.

Shend answered 25/1 at 13:5 Comment(0)
-1

My suggestion is to just stick with curl for everything. There are so many different methods, filters, and queries that the various "wrappers" have a hard time recreating all the functionality. In my view, it is similar to using an ORM for databases: what you gain in ease of use you lose in flexibility and raw power.

Give curl a try for a while and see how it treats you. You can use external JSON formatters to check your JSON and the mailing list to look for examples; the docs are fine as long as you're comfortable with JSON.

Mcgehee answered 16/2, 2014 at 4:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.