Practical Limits of ElasticSearch + Cassandra

4

24

I am planning to use ElasticSearch to index my Cassandra database, and I am wondering whether anyone has seen its practical limits. Do things get slow in the petabyte range? Also, has anyone had any problems using ElasticSearch to index Cassandra?

Thievish answered 15/6, 2011 at 14:31 Comment(0)
25

See this thread from 2011, which mentions ElasticSearch configurations with 1700 shards of 200GB each, which would be in the 1/3 petabyte range (1700 × 200GB = 340TB). I would expect the architecture of ElasticSearch to support almost limitless horizontal scalability, because each shard's index works separately from all other shards.
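As a quick sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes decimal units (1 PB = 1,000,000 GB) and ignores replicas; the constant names are mine, not from the thread.

```python
# Back-of-the-envelope check of the shard figures quoted above.
# Assumptions: decimal units (1 PB = 1,000,000 GB), replicas not counted.
SHARDS = 1700
GB_PER_SHARD = 200
GB_PER_PB = 1_000_000

total_gb = SHARDS * GB_PER_SHARD   # total primary data in GB
total_pb = total_gb / GB_PER_PB    # same amount expressed in PB

print(f"{total_gb} GB is about {total_pb:.2f} PB")
```

So "1/3 petabyte" is a fair rounding of the quoted configuration, before any replication is taken into account.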

The practical limits (which would apply to any other solution as well) include the time needed to actually load that much data in the first place. Managing a Cassandra cluster (or any other distributed datastore) of that size will also involve significant workload just for maintenance, load balancing etc.

Enteron answered 21/6, 2011 at 7:5 Comment(1)
Thank you DNA for your response. It was quite helpful. - Thievish
13

Sonian is the company kimchy alludes to in that thread. We have over a petabyte on AWS across multiple ES clusters. There isn't a technical limitation to how far horizontally you can scale ES, but as DNA mentioned there are practical problems. The biggest by far is the network, and this applies to every distributed data store: you can only move so much across the wire at a time. When ES has to recover from a failure, it has to move data. The best option is to use smaller shards across more nodes (more concurrent transfers), but you risk a higher rate of failure and an exorbitant cost per byte.
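To make the shard-size trade-off concrete, here is a rough sketch of a recovery-time estimate. It is a toy model, not how ES actually schedules recovery: it assumes every shard copy gets its own stream at a fixed bandwidth, one stream per peer node, and no throttling or protocol overhead. The function name and all the numbers are hypothetical.

```python
import math

def recovery_seconds(lost_gb, shard_gb, peer_nodes, gbit_per_s_per_stream):
    """Estimate the time to re-copy lost_gb of data after a node failure.

    Toy model (assumption): shards are copied in waves, one stream per
    peer node, each stream running at a fixed bandwidth.
    """
    shards = math.ceil(lost_gb / shard_gb)        # shard copies to move
    streams = min(shards, peer_nodes)             # concurrent transfers
    waves = math.ceil(shards / streams)           # rounds of copying
    seconds_per_shard = shard_gb * 8 / gbit_per_s_per_stream
    return waves * seconds_per_shard

# Hypothetical numbers: 2 TB lost, 1 Gbit/s per stream.
big_shards = recovery_seconds(2000, 200, 4, 1.0)     # few large shards, few peers
small_shards = recovery_seconds(2000, 50, 40, 1.0)   # many small shards, many peers

print(f"large shards: {big_shards / 60:.0f} min, small shards: {small_shards / 60:.0f} min")
```

In this model the same amount of data recovers much faster when it is cut into smaller shards spread over more peers, which is the point about concurrent transfer. It says nothing about the extra failure rate or cost of running that many nodes.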

Englishism answered 1/5, 2012 at 17:12 Comment(0)
0

DNA mentioned 1700 shards, but it is not actually 1700 shards of one index: there are 1700 indexes, each with 1 shard and 1 replica. So it is quite possible that these 1700 indexes do not all sit on a single machine but are spread across multiple machines, in which case this need not be a problem.

Epexegesis answered 12/6, 2014 at 13:48 Comment(0)
-1

I am just starting to work with Elassandra (Elasticsearch + Cassandra).

I am also having problems indexing Cassandra with Elasticsearch. My problem is basically the node configuration.

Running $ nodetool status shows the Host ID of each node. Then run:

curl -XGET http://localhost:9200/_cluster/state/?pretty=true

You can then check that one of the entries under nodes: has the same name as the Host ID.
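If you want to do that comparison in a script rather than by eye, something like the following might work. It is only a sketch: it assumes the cluster state JSON has a top-level "nodes" object keyed by node ID with a "name" field in each entry, and since I am not sure whether the Host ID shows up as the key or as the name, it checks both. The sample data and the Host ID below are made up.

```python
def node_ids_matching(cluster_state: dict, host_id: str) -> list:
    """Return the node IDs whose key or "name" equals the Cassandra Host ID."""
    nodes = cluster_state.get("nodes", {})
    return [
        node_id
        for node_id, info in nodes.items()
        if host_id in (node_id, info.get("name"))
    ]

# Hypothetical, trimmed-down cluster state (as parsed from the curl output).
sample_state = {
    "cluster_name": "example",
    "nodes": {
        "a1b2c3d4-0000-0000-0000-000000000001": {"name": "127.0.0.1"},
        "a1b2c3d4-0000-0000-0000-000000000002": {"name": "127.0.0.2"},
    },
}

# Hypothetical Host ID, as it would be copied from `nodetool status`.
host_id = "a1b2c3d4-0000-0000-0000-000000000001"
matches = node_ids_matching(sample_state, host_id)
print(matches if matches else "no node matches this Host ID")
```

An empty result would suggest the Elasticsearch side does not know about that Cassandra node, which is the kind of configuration mismatch I was running into.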

Romeo answered 10/11, 2017 at 10:42 Comment(1)
This is not an answer. - Brunella
