Configure a Mongo replica set to only replicate certain collections

Asked 22/1, 2013 at 21:18 Answered 30/12, 2022 at 6:6

I have a ~3GB mongo database with several dozen collections. Three of these collections handle ~300 queries per second, while the rest sustain a much lower volume. I expect the traffic to continue to grow quickly.

I'd like to set up a replica set to handle the high-traffic collections. It isn't necessary for this new instance to replicate the rest of the database. Is this possible?

Whitney answered 22/1, 2013 at 21:18 Comment(0)

This does not seem to be possible with MongoDB's built-in features at the moment. The only options are to write your own replication logic or to use a third-party tool.

According to the following post, the https://github.com/wordnik/wordnik-oss project might help you achieve this:

https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/Ap9V4ArGuFo

See also this question, which describes a workaround for filtering documents during replication:

Replicate only documents where {'public':true} in MongoDB

Or just replicate the data yourself manually, which might be worth trying.
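As a rough illustration of what manual, filtered replication involves: replica set members record their writes in the oplog (`local.oplog.rs`), and each entry carries an `op` type and an `ns` namespace of the form `database.collection`. The sketch below only shows the filtering step on in-memory sample entries. The namespace names and the `filter_oplog` helper are hypothetical, and reading the oplog and applying the entries to a target server are left out.

```python
from typing import Iterable, Iterator

# Hypothetical namespaces ("<database>.<collection>") to copy; not from the question.
WANTED = {"app.events", "app.sessions", "app.users"}


def filter_oplog(entries: Iterable[dict], wanted: set) -> Iterator[dict]:
    """Yield only insert/update/delete entries for the wanted namespaces."""
    for entry in entries:
        # "op" is i (insert), u (update) or d (delete); other types such as
        # n (no-op) or c (command) are skipped in this sketch.
        if entry.get("op") in {"i", "u", "d"} and entry.get("ns") in wanted:
            yield entry


# Sample entries shaped like oplog documents (assumed data, for illustration only).
sample = [
    {"op": "i", "ns": "app.events", "o": {"_id": 1}},
    {"op": "i", "ns": "app.audit_log", "o": {"_id": 2}},
    {"op": "n", "ns": "", "o": {"msg": "periodic noop"}},
    {"op": "d", "ns": "app.users", "o": {"_id": 3}},
]
kept = list(filter_oplog(sample, WANTED))
print([entry["ns"] for entry in kept])
```

In a real setup the entries would come from a cursor on the oplog rather than a list, and each kept entry would still have to be applied to the target in order.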

Good luck.

Branen answered 22/1, 2013 at 22:32 Comment(3)

Thanks for the links! "Filtered replication" sounds like the right phrase to keep on top of to see if this feature ever manifests. – Whitney 23/1, 2013 at 20:23

You are welcome =) The community is quite active, so I am sure they will develop the needed features eventually =) – Branen 23/1, 2013 at 20:30

@Branen - with regards to your last comment, it is now May 2016. Do you know the options today? – Retroact 16/5, 2016 at 5:21

No, that isn't possible right now. What you could do is move the collections you don't want replicated into a separate, unreplicated database. But this will cause headaches once those collections see higher traffic too, because you would then need to move them back into your "replication" database.

But in general, replication isn't the way to go if you need to scale; it is intended more for disaster recovery and failover. Replica set secondaries can (optionally) answer read queries but never write queries, which is something you should keep in mind. So if you have a high write load, this may not cure your problem.
Once you allow your application to read from secondaries, you need to live with eventual consistency, meaning that your application isn't guaranteed to always see the latest data. This is caused by the asynchronous replication to the secondaries.
You can mitigate this by configuring your write concern so that a write needs to succeed on all replicas before it is considered written and your driver returns. But this may slow down your write operations significantly.
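To make the read and write settings concrete, here is a minimal sketch that builds a connection string carrying both. `replicaSet`, `readPreference` and `w` are standard MongoDB connection string options; the host names, the set name `rs0` and the `replica_set_uri` helper are made up for this example.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred", w="majority"):
    """Build a mongodb:// URI that sets the read preference and write concern."""
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": read_preference,  # allow reads from secondaries
        "w": w,                             # members that must acknowledge a write
    })
    return "mongodb://" + ",".join(hosts) + "/?" + options


# Hypothetical three-member set; w=3 waits for all three members to acknowledge.
uri = replica_set_uri(["db1:27017", "db2:27017", "db3:27017"], "rs0", w=3)
print(uri)
```

Setting `w` to the number of members matches the "all replicas" case described above; `w=majority` is a less strict alternative that trades some of that guarantee for lower write latency.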

So for scaling query execution capabilities, I would go with sharding. This is possible at the per-collection level; all unsharded collections remain on a "default shard".
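Per-collection sharding comes down to two admin commands issued against a `mongos`: `enableSharding` for the database, then `shardCollection` for each collection you want distributed. The sketch below only builds those command documents. The database name, collection name, shard key and the `sharding_commands` helper are assumptions for illustration, and choosing a good shard key depends on your actual query pattern.

```python
def sharding_commands(database, collection, shard_key):
    """Return the admin commands that shard one collection, in run order."""
    namespace = f"{database}.{collection}"
    return [
        {"enableSharding": database},
        {"shardCollection": namespace, "key": shard_key},
    ]


# Hypothetical names: shard only the busy "events" collection on "user_id".
commands = sharding_commands("app", "events", {"user_id": 1})
for command in commands:
    print(command)
    # With a pymongo MongoClient connected to a mongos, each command
    # could be sent as: client.admin.command(command)
```

Collections that never get a `shardCollection` command stay unsharded, which is what lets you limit this to the three high-traffic collections.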

Strepphon answered 23/1, 2013 at 9:59 Comment(0)

It's not possible, but if the data size is that small and these collections aren't updated, then the only overhead of replicating them is a small amount of storage on the secondary. That is a relatively small price to pay compared with writing your own replication logic, especially since the collections won't grow in size.

Watery answered 23/1, 2013 at 16:6 Comment(0)

Instead, archive the data: keep only the latest data set on the production server and archive the rest of the data on the new server.

Auric answered 30/12, 2022 at 6:6 Comment(0)
