CouchDB keeps growing (file size)

I'm very confused about CouchDB's behaviour regarding database file size on disk. It seems that no matter what I do, the database file only gets bigger (even when deleting/purging documents or whole databases).

I watched my /var/lib/couchdb/_dbs.couch file and it has never decreased in size. Simple example:

curl -X PUT http://admin:secretpassword@localhost:5984/testdb

_dbs.couch grew by 5 kB.

curl -X DELETE http://admin:secretpassword@localhost:5984/testdb

No change in file size. Even if I do filtered replications of databases (filtering out deleted documents) or manually trigger a compaction, the file size on disk does not decrease. What's really confusing is that Fauxton actually shows reduced database sizes after those actions, but this is never reflected in the physical disk space used.

I'm using pretty much a standard configuration after a fresh installation.

Is this "working like intended" or is there anything wrong here?

More importantly: Is there anything I can do about it?

Limpkin answered 31/1, 2018 at 22:59 Comment(2)
Have you checked this? smartregister.atlassian.net/wiki/spaces/Documentation/pages/… - Claypool
Even though the first line sounds very sobering, yes. I tried pretty much everything suggested in that article, as mentioned here. - Limpkin

It's working as intended, you're just not looking at the right files.

Each database has corresponding files with the same name.

For example with:

curl -X PUT http://admin:secretpassword@localhost:5984/testdb

curl -X PUT http://admin:secretpassword@localhost:5984/emaildb

  • Since you have a _dbs.couch file, you're probably using CouchDB 2.X.X with the sharding feature. It will create multiple files in subfolders of the "shards" folder.

data/
+-- shards/
|   +-- 00000000-7fffffff/
|   |   +-- emaildb.124456678.couch
|   |   +-- testdb.647948447.couch
|   +-- 80000000-ffffffff/
|   |   +-- emaildb.124456678.couch
|   |   +-- testdb.647948447.couch

More info: http://docs.couchdb.org/en/latest/cluster/sharding.html

  • In a nutshell, the sharding and cluster features allow you to have a distributed database with distributed map/reduce computation. In the above example, each database has 2 shards, which means each database spans two files. Every new doc ends up in one of those two. Disk usage won't necessarily be evenly distributed, though. For example, if every doc is a small JSON doc but one of them gets a 1GB attachment (http://docs.couchdb.org/en/latest/intro/api.html#attachments), only one shard will get a 1GB bump, because sharding is doc based. You can have 2 shards or 20, and they don't all have to be on the same server (http://docs.couchdb.org/en/latest/cluster/theory.html). If you know that one server won't have enough disk space to hold all your data, you can set up 20 CouchDB servers that each hold 1 shard (around 1/20 of all the docs). Whether it's a single node in a basement or a cluster of CouchDB servers all over the world, the API is the same for the client app (curl, PouchDB, Firefox, etc.).

  • The _dbs database (_dbs.couch) records information about each database for cluster and shard management. Its size increases because it is updated each time you create or delete a database, and updates are appended rather than written in place (copy-on-write). From CouchDB 2.1.0 onwards, it will auto-compact. You can check the auto-compaction settings in your server's config (in a browser: http://localhost:5984/_utils/#/_config/, "compactions" section). The admin panel is on a different port: http://localhost:5986/_utils

  • The size reported in Fauxton is the "active size". It doesn't count deleted docs that are still on disk and will only be removed by compaction. curl http://localhost:5984/testdb will give additional information, such as the size on disk (http://docs.couchdb.org/en/latest/api/database/common.html#get--db).
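Since a database spans several shard files, no single .couch file will match the size of the whole database. A minimal sketch for adding the shard files up, assuming the data/shards/<range>/<dbname>.<timestamp>.couch layout shown above (the function name shard_total_bytes is made up for this example, and the data directory on your install may differ):

```shell
# Sum the on-disk size (in bytes) of every shard file belonging to one database.
# Assumed layout: <data_dir>/shards/<range>/<dbname>.<timestamp>.couch
# shard_total_bytes is a hypothetical helper, not part of CouchDB.
shard_total_bytes() {
    data_dir=$1
    db=$2
    total=0
    for f in "$data_dir"/shards/*/"$db".*.couch; do
        # If nothing matches, the glob stays literal; skip it.
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    echo "$total"
}
```

For the setup in the question this would be called as shard_total_bytes /var/lib/couchdb testdb (run as a user that can read the data directory). The total should be in the same ballpark as the size on disk returned by curl http://localhost:5984/testdb, and it is expected to be larger than the active size shown in Fauxton until compaction has run.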

Grass answered 4/2, 2018 at 4:24 Comment(4)
Very interesting, thanks for the insights. I still don't totally understand what is going on, but I have a much better clue now. For instance, I do have that shards folder with subfolders 00000000-1fffffff to e0000000-ffffffff, each containing a <dbsname.couch> file. But not a single one of those files matches the file size shown in Fauxton. Furthermore, I don't have a corresponding _dbs in Fauxton. Is that normal or do I have to create it manually? - Limpkin
In CouchDB 1, Futon (Fauxton's predecessor) used to report disk size. I suppose it switched to active data when the auto-compaction feature was added. - Grass
So, with shards you can no longer just back up a single database file, but have to copy/clone the whole shards folder, I suppose? What if you replicate a database, will this re-create all the shards too? - Limpkin
You can also create a database with one shard (see the documentation about sharding linked above). - Grass
