Using etcd as primary store/database?

Asked 9/12, 2016 at 15:1 Answered 27/12, 2021 at 17:32

Can etcd be used as a reliable database replacement? Since it is distributed and stores key/value pairs persistently, it would make a great alternative NoSQL database. It also has a great API. Can someone explain why this is not a thing?

Stagnate answered 9/12, 2016 at 15:1 Comment(3)

I am trying to see if I can use etcd (k8s CRDs) as a database replacement. Can you share your experience with etcd? See #52565631 – Gyronny 29/9, 2018 at 4:38

I found etcd especially useful for storing config files / static files that need to be available all the time (as Kubernetes does, and as the name implies: a distributed /etc folder, i.e. etc + d(istributed) = etcd). By running a multi-node etcd cluster, one can be sure the files stay available. I would say it highly depends on your use case and the data you want to store. Benchmarks show a maximum of about 30k queries per second on etcd. – Stagnate 7/10, 2018 at 22:37
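To put "multi-node" into numbers: etcd uses Raft, so a write is committed only once a majority (quorum) of members has accepted it, and the cluster stays available as long as a majority is reachable. A minimal sketch of that arithmetic (the function names are made up for illustration):

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with `members` voting members."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 3, 4, 5, 7):
    print(f"{n} members: quorum {quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

This is why odd cluster sizes such as 3 or 5 are the usual choice: a fourth member raises the quorum without raising the number of failures the cluster survives.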

I used etcd for all sorts of config data stuff, and did so for a long time. It's not a generic database, but rather, a key-value database. For data stores which need high-speed distributed access using a model which is based on retrieving values by key or range of keys, possibly with namespacing and granular access control, it's a great option. For models where there is frequent searching of records for a value containing a string, for example, it's not so great. Choose a data store based on how the data will be used. :) – Kyles 19/2, 2020 at 16:53
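The difference between retrieving by key (or key range) and searching inside values can be sketched with a toy in-memory model. This is not the etcd client: `ToyKV` is a hypothetical class that keeps keys sorted (etcd keeps an in-memory B-tree index over keys) and derives a prefix query's range end the way etcd's prefix option does, by incrementing the last byte of the prefix that is below 0xff.

```python
import bisect
from typing import Dict, List, Optional, Tuple


def prefix_range_end(prefix: bytes) -> Optional[bytes]:
    """Exclusive upper bound for all keys starting with `prefix`.

    Increments the last byte below 0xff and drops everything after it;
    None means "no upper bound" (the prefix was all 0xff bytes).
    """
    for i in range(len(prefix) - 1, -1, -1):
        if prefix[i] < 0xFF:
            return prefix[:i] + bytes([prefix[i] + 1])
    return None


class ToyKV:
    """Hypothetical key-ordered store, used only to illustrate access patterns."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []  # kept sorted, standing in for an ordered index
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def range_query(self, start: bytes, end: Optional[bytes]) -> List[Tuple[bytes, bytes]]:
        """All pairs with start <= key < end, located by binary search."""
        lo = bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if end is None else bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]

    def prefix(self, prefix: bytes) -> List[Tuple[bytes, bytes]]:
        return self.range_query(prefix, prefix_range_end(prefix))

    def keys_with_value_containing(self, needle: bytes) -> List[bytes]:
        """Substring search: the key index cannot help, so every value is scanned."""
        return [k for k in self._keys if needle in self._data[k]]


kv = ToyKV()
kv.put(b"/config/app/db", b"postgres://db:5432")
kv.put(b"/config/app/timeout", b"30s")
kv.put(b"/config/other/db", b"mysql://db2")
kv.put(b"/secrets/token", b"abc")
print(kv.prefix(b"/config/app/"))
# [(b'/config/app/db', b'postgres://db:5432'), (b'/config/app/timeout', b'30s')]
```

The prefix query touches only the matching slice of the sorted keys, while `keys_with_value_containing` has to visit every record, which is the "frequent searching of records for a value containing a string" case the comment warns about.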

etcd

etcd is a highly available key-value store that Kubernetes uses for persistent storage of all of its objects, such as deployment, pod and service information.
etcd has strict access control: it can be accessed only through the API on the master node. Nodes in the cluster other than the master do not have access to the etcd store.

nosql database

There are currently more than 255 NoSQL databases, which can be broadly classified as key-value based, column based, document based and graph based. Considering etcd as a key-value store, let's look at the available NoSQL key-value data stores.
Redis, memcached and MemcacheDB are popular key-value stores. These are general-purpose distributed memory caching systems, often used to speed up dynamic database-driven websites by caching data and objects in memory.

Why etcd is not an alternative

etcd data cannot be kept in memory (RAM); it can only be persisted to disk storage, whereas Redis can keep data in RAM and can also persist it to disk.
etcd does not have various data types. It is made to store only Kubernetes objects. But Redis and other key-value stores have data-type flexibility.
etcd guarantees only high availability, but does not give you fast querying and indexing. All the NoSQL key-value stores are built with the goal of fast querying and searching.

Even though it may seem obvious that etcd cannot be used as an alternative NoSQL database, I think the above explanation shows why it is not a suitable alternative.

Hedley answered 14/12, 2016 at 7:25 Comment(6)

"It is made to store only kubernetes objects" --> this is not true. Although Kubernetes is one of the main users of etcd, that doesn't mean only Kubernetes objects can be stored in etcd. etcd is aimed more generally at storing data in a distributed environment. – Pellikka 16/3, 2019 at 10:25

Why do you state that "etcd has high access control, that it can be accessed only using API in master node. Nodes in the cluster other than master do not have access to etcd store"? Deploying your own etcd is as easy as deploying your own database, and access can be granted to whichever entity you want. – Vitrain 6/6, 2019 at 2:48

The cons here are all wrong, probably because the author has only worked with etcd in the context of Kubernetes. etcd works from memory, and only stores the journal on disk. etcd stores data (both key and value) as a binary array; the end user can apply whatever typing they want (often by storing values as JSON). And etcd uses a btree to index the keys, which is the same indexing that most any other DB uses on generic data. It doesn't use SQL, I suppose, but "queries and searches" appropriate for data in a key-value DB are extremely fast in etcd. – Kyles 19/2, 2020 at 16:49

Nobody should rely on this answer. The second part is completely wrong. – Lindner 22/10, 2020 at 11:54

I don't think the conclusion does justice to the whole discussion. I think etcd can be used as an alternative to a NoSQL database; the real answer is that it depends on your use case and what trade-offs you are willing to make. – Abaft 16/11, 2021 at 14:24

This answer is plain wrong on a lot of points, not misinterpreting, just wrong. How has this not been removed or edited? – Casilde 19/3 at 11:46

From the etcd.io site:

etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.

It has a simple interface (gRPC in the v3 API, with an HTTP/JSON gateway). It is NOT just for Kubernetes; Kubernetes is just one example of a critical application that uses it.

You are right, it should be a thing: a nice, reliable data store with an easy-to-use API and a nice way of telling you when things change (watches), all kept consistent across the cluster by the Raft consensus protocol. This is great for feature toggles and other items that everything needs to know about, and it is much better than putting a trigger in an SQL database and getting it to send an event to an external application, or than really horrible polling.
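The "telling you when things change" part can be illustrated with a toy model. `WatchableStore` below is hypothetical and is not the etcd client API; it only mimics two ideas from etcd: every write bumps a cluster-wide revision, and watchers registered on a key prefix are pushed the change instead of polling. (In real etcd a client can also start a watch from an older revision to replay events it missed, as long as that history has not been compacted.)

```python
from typing import Callable, Dict, List, Optional, Tuple

Callback = Callable[[int, str, str], None]


class WatchableStore:
    """Toy model of prefix watches with a global revision; not the etcd client API."""

    def __init__(self) -> None:
        self.revision = 0  # bumped on every write, like etcd's store revision
        self._data: Dict[str, str] = {}
        self._watchers: List[Tuple[str, Callback]] = []

    def watch_prefix(self, prefix: str, callback: Callback) -> None:
        self._watchers.append((prefix, callback))

    def put(self, key: str, value: str) -> int:
        self.revision += 1
        self._data[key] = value
        # Push the change to interested watchers instead of making them poll.
        for prefix, callback in self._watchers:
            if key.startswith(prefix):
                callback(self.revision, key, value)
        return self.revision

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


# Usage: an application keeps its feature toggles up to date via a watch.
store = WatchableStore()
toggles: Dict[str, bool] = {}


def on_toggle(revision: int, key: str, value: str) -> None:
    name = key[len("/toggles/"):]
    toggles[name] = value == "on"
    print(f"rev {revision}: {name} -> {toggles[name]}")


store.watch_prefix("/toggles/", on_toggle)
store.put("/toggles/new-checkout", "on")            # prints "rev 1: new-checkout -> True"
store.put("/config/db-url", "postgres://db:5432")   # no toggle watcher fires
store.put("/toggles/new-checkout", "off")           # prints "rev 3: new-checkout -> False"
```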

So if you are writing something like the Kubernetes use case, it is perfect: a well-proven store for a distributed application.

If you are writing something very different from the Kubernetes use case, then you are comparing it with all the other NoSQL databases. But it is very different from something like MongoDB, so it may be a better fit if MongoDB or similar does not work for you.

Other example users

M3, a large-scale metrics platform for Prometheus created by Uber, uses etcd for rule storage and other functions

Consistency: there is a nice comparison of NoSQL database consistency by Jepsen at https://jepsen.io/analyses

The etcd team summarizes the results at https://etcd.io/blog/jepsen-343-results/

Kaiulani answered 12/3, 2021 at 20:20 Comment(0)

See if this checklist of limitations of etcd compared to a more full-featured database will work for you:

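Your database size is going to stay within 2 GB (the default quota; it is configurable, and 8 GB is the suggested maximum)
No sharding, and hence none of the data scalability that NoSQL database clusters (Mongo, Redis, ...) provide
Meant for simple values, with payloads limited to 1.5 MB by default. The limit can be increased, but that impacts other requests. Most databases can store large BLOBs; Redis can store a value of up to 512 MB.
No query language for more complex searches beyond key prefixes and ranges. Other databases provide more complex data types, such as document and graph storage, with querying and indexing. Even the key-value database Redis supports more complex types through modules, along with querying and search capabilities
Only limited transactions: etcd v3 offers atomic mini-transactions (compare, then a set of operations, else another set, in a single request), but no interactive, multi-statement transactions like a relational database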
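The size limits in this checklist correspond to etcd's configuration defaults: the backend quota (`--quota-backend-bytes`, 2 GiB by default) and the maximum request size (`--max-request-bytes`, 1.5 MiB by default). A rough pre-write guard and capacity estimate could be sketched like this; the 1 KiB framing allowance is a hypothetical safety margin, and the record estimate is only an upper bound because etcd also keeps older revisions until they are compacted.

```python
MAX_REQUEST_BYTES = int(1.5 * 1024 * 1024)  # etcd default --max-request-bytes (1.5 MiB)
DEFAULT_QUOTA_BYTES = 2 * 1024 ** 3         # etcd default --quota-backend-bytes (2 GiB)
SUGGESTED_MAX_QUOTA_BYTES = 8 * 1024 ** 3   # suggested maximum quota (8 GiB)

FRAMING_ALLOWANCE = 1024  # hypothetical margin for request overhead


def fits_in_one_request(key: bytes, value: bytes) -> bool:
    """Reject payloads that would exceed the default request size limit."""
    return len(key) + len(value) + FRAMING_ALLOWANCE <= MAX_REQUEST_BYTES


def rough_record_capacity(avg_record_bytes: int, quota_bytes: int = DEFAULT_QUOTA_BYTES) -> int:
    """Upper bound on live records; real capacity is lower (history, index, overhead)."""
    return quota_bytes // avg_record_bytes


print(fits_in_one_request(b"/blobs/report", b"x" * 2_000_000))  # False
print(rough_record_capacity(1024))                              # 2097152
```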

Having a hammer, everything may look like a potential nail. You need to make sure it is indeed one.

Ophir answered 27/12, 2021 at 17:32 Comment(0)

The only obstacles I've come to see are the ones between our ears. I guess we first need to show that it can be done, and what the benefits are.

My colleagues seem to shy away from it because "it's for storing secrets, and common truth". The etcd v3 revision made etcd capable of much more, but the news simply hasn't trickled down yet.

Let's build some showcases and success stories. Personally, I like etcd for the reasons you mentioned, and because of its focus on dependable performance.

Keratin answered 30/10, 2017 at 20:53 Comment(0)

First, no. etcd is not the next NoSQL replacement. But there are some scenarios where it can come in handy.

Let's imagine you have (configuration) data that is mostly static but may change at runtime. Maybe your frontend needs to know the backend endpoints based on the customer's country to comply with legal requirements, and you know the worldwide rollout is done in phases.

So you could just use a Kubernetes ConfigMap to store the array of data (country -> endpoint) and let your backend watch this ConfigMap for changes. On a change, the application reads in the list and exposes it through a repository that your service layer uses to access the data. All operations (search, get, update, ...) have to be implemented in the repository, but your data will be in memory (probably in a linked hash map), so it will be very quick to retrieve, like a local cache.
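A minimal sketch of such a repository, assuming the ConfigMap's `data` field holds the country list as JSON under a key named `endpoints.json` (both the key name and the JSON shape are hypothetical). The watch itself is left out; `reload` is what the watch handler would call with the ConfigMap's new `data`. Python dicts preserve insertion order, so they play the role of the linked hash map.

```python
import json
import threading
from typing import Dict, List, Optional


class EndpointRepository:
    """Hypothetical in-memory repository fed from a watched ConfigMap."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_country: Dict[str, str] = {}  # insertion-ordered, like a linked hash map

    def reload(self, configmap_data: Dict[str, str]) -> None:
        # `configmap_data` mirrors a ConfigMap's string-to-string `data` field.
        entries = json.loads(configmap_data["endpoints.json"])
        fresh = {e["country"]: e["endpoint"] for e in entries}
        with self._lock:
            self._by_country = fresh  # swap the whole map in one step

    def get(self, country: str) -> Optional[str]:
        with self._lock:
            return self._by_country.get(country)

    def countries(self) -> List[str]:
        with self._lock:
            return list(self._by_country)


repo = EndpointRepository()
repo.reload({"endpoints.json": json.dumps([
    {"country": "DE", "endpoint": "https://eu.example.com"},
    {"country": "US", "endpoint": "https://us.example.com"},
])})
print(repo.get("DE"))    # https://eu.example.com
print(repo.countries())  # ['DE', 'US']
```

Building a fresh map and swapping it in, rather than mutating the existing one, means readers never see a half-updated list.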

If the data is changed by the application, just serialize the list and patch the ConfigMap. Any other application watching the ConfigMap will update its internal state. However, there is no locking, so quick successive changes may result in race conditions.
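Such races are usually handled with optimistic concurrency rather than locks: an etcd transaction can compare a key's modification revision before writing, and Kubernetes rejects an update whose `resourceVersion` is stale with a conflict. The toy model below (a hypothetical `VersionedStore`, not the etcd or Kubernetes API) shows the idea: a write succeeds only if the key has not changed since it was read, and the losing writer re-reads and retries.

```python
from typing import Callable, Dict, Optional, Tuple


class VersionedStore:
    """Toy model of revision-guarded writes; not the etcd or Kubernetes API."""

    def __init__(self) -> None:
        self._revision = 0
        self._data: Dict[str, Tuple[str, int]] = {}  # key -> (value, mod_revision)

    def get(self, key: str) -> Tuple[Optional[str], int]:
        # A missing key reports mod_revision 0, so "create if absent" is a CAS on 0.
        return self._data.get(key, (None, 0))

    def put_if_unchanged(self, key: str, value: str, expected_mod_revision: int) -> bool:
        _, current = self.get(key)
        if current != expected_mod_revision:
            return False  # someone else wrote the key since we read it
        self._revision += 1
        self._data[key] = (value, self._revision)
        return True


def update_with_retry(store: VersionedStore, key: str,
                      change: Callable[[Optional[str]], str], attempts: int = 5) -> bool:
    """Read-modify-write loop: re-read and retry whenever the compare fails."""
    for _ in range(attempts):
        value, mod_revision = store.get(key)
        if store.put_if_unchanged(key, change(value), mod_revision):
            return True
    return False


store = VersionedStore()
store.put_if_unchanged("endpoints", "DE=eu", 0)                     # create
_, rev_a = store.get("endpoints")                                   # writer A reads
_, rev_b = store.get("endpoints")                                   # writer B reads the same revision
print(store.put_if_unchanged("endpoints", "DE=eu;US=us", rev_a))    # True
print(store.put_if_unchanged("endpoints", "DE=eu;FR=eu", rev_b))    # False: stale read
print(update_with_retry(store, "endpoints", lambda v: v + ";FR=eu"))  # True
print(store.get("endpoints"))  # ('DE=eu;US=us;FR=eu', 3)
```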

A ConfigMap can hold up to 1 MiB of data (etcd's own default request limit is 1.5 MiB). That's enough for almost-static data.

Another application might be feature toggles. They do not change that much, but when they do, every application needs to know quickly, and polling sucks.

Leclaire answered 12/1, 2020 at 14:29 Comment(0)
