How to trigger or check status of chained map reduce (dbcopy)

With standard CouchDB view indexes, I have flexibility and introspection into staleness vs. freshness. How do I get the analogous functionality for Cloudant's dbcopy feature?

CouchDB view query freshness

  • current index on disk, possibly stale: stale=ok
  • current index on disk, but trigger updating: stale=update_after
  • up to date index, even if that requires updating the index: leave off stale flag (a.k.a. stale=false)
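As a rough illustration of the three modes above, here is a minimal sketch that only builds the query URLs. The helper name `view_url` and the host, database, design doc and view names are hypothetical; only the `stale` values come from the list above.

```python
from urllib.parse import urlencode

def view_url(base, db, ddoc, view, stale=None, **params):
    """Build a view query URL (hypothetical helper, not part of any client library).

    stale=None           -> up-to-date index (same as leaving the flag off)
    stale="ok"           -> current index on disk, possibly stale
    stale="update_after" -> current index on disk, then trigger an update
    Extra params are passed through as strings, e.g. group="true".
    """
    if stale not in (None, "ok", "update_after"):
        raise ValueError("stale must be None, 'ok' or 'update_after'")
    query = dict(params)
    if stale is not None:
        query["stale"] = stale
    url = f"{base}/{db}/_design/{ddoc}/_view/{view}"
    if query:
        url += "?" + urlencode(query)
    return url

fresh = view_url("http://localhost:5984", "db", "foo", "bar")
cached = view_url("http://localhost:5984", "db", "foo", "bar", stale="ok")
bump = view_url("http://localhost:5984", "db", "foo", "bar", stale="update_after")
```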

CouchDB view freshness introspection

I can compare the DB's update_seq with the design doc's update_seq, which can be obtained with update_seq=true in a view query or from GET /db/_design/foo/_info.
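A minimal sketch of that comparison, working on already-parsed JSON bodies from GET /db and GET /db/_design/foo/_info. It assumes the index sequence is found under `view_index.update_seq`, and that a composite sequence is a string whose leading number (before the first `-`) is comparable; both the function names and that parsing rule are assumptions for illustration.

```python
def seq_number(update_seq):
    """Return the numeric part of an update_seq.

    Plain CouchDB reports an integer. The "N-opaque" string form handled
    here is an assumption about composite (clustered) sequences.
    """
    if isinstance(update_seq, int):
        return update_seq
    return int(str(update_seq).split("-", 1)[0])

def index_lag(db_info, ddoc_info):
    """How many updates the view index is behind the database (approximate)."""
    db_seq = seq_number(db_info["update_seq"])
    index_seq = seq_number(ddoc_info["view_index"]["update_seq"])
    return db_seq - index_seq

# Example with hand-written response bodies (not real server output).
lag = index_lag({"update_seq": 120}, {"view_index": {"update_seq": 100}})
```

A lag of 0 would suggest the index is current; anything larger is a rough measure of staleness.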

BigCouch caveats

This is slightly clouded by BigCouch's DB partitioning and multiple servers. For example, update_seq is a composite value and should only be compared within a tolerance range, and stale=false might choose a different shard than stale=ok, which might be more or less up to date. There isn't a way to get the update_seq for all nodes (or for the specific node(s) that would be chosen by stale=false queries), but this can be approximated by quickly issuing multiple /db/_design/foo/_info queries. It would be nice to have additional shard/partition introspection here, but the above still works for my purposes.
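The tolerance idea can be sketched as follows: take the numeric index sequences from several quick _info requests and treat the index as fresh only if the stalest sample is close enough to the DB's sequence. The function name and the default tolerance of 10 are arbitrary assumptions.

```python
def is_fresh(db_seq, sampled_index_seqs, tolerance=10):
    """True if even the stalest sampled index is within `tolerance` of db_seq.

    sampled_index_seqs: numeric update_seq values collected from repeated
    _info requests (which may have been answered by different nodes).
    """
    if not sampled_index_seqs:
        raise ValueError("need at least one sample")
    worst_lag = db_seq - min(sampled_index_seqs)
    return worst_lag <= tolerance

# Three samples, one of them noticeably behind.
ok = is_fresh(1000, [998, 1000, 995])
behind = is_fresh(1000, [998, 1000, 950])
```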

Cloudant's dbcopy

dbcopy has roughly the same "eventual consistency" characteristics. Querying the docs in the chained DB is roughly analogous to querying the origin view with group=true&stale=ok, which is fine most of the time. But the documentation doesn't give any pointers on the following:

  • How can I query the current dbcopy state? E.g. Does the DB consider itself up to date or are view changes waiting their turn in the IOQ? If it's not up to date, roughly how stale is it?
  • How can I trigger or bump up the priority of the dbcopy (as in stale=update_after or stale=false)? E.g. I want something along the lines of POST /origin_db/_design/foo/_view/bar/_dbcopy that will forcibly push the reduced results to the dbcopy DB immediately (optionally updating the origin view first).
  • If the chained DB somehow gets out of sync (e.g. documents are deleted or updated directly in the DB rather than by the dbcopy mechanism or the dbcopy mechanism misses a few documents), can this be detected? How can it be corrected? Is there a dbcopy "reset button"?
Humane answered 16/5, 2013 at 15:40 Comment(0)

How can I query the current dbcopy state? E.g. Does the DB consider itself up to date or are view changes waiting their turn in the IOQ? If it's not up to date, roughly how stale is it?

We are looking into a better method, but at the moment the only way to get the dbcopy's current state is to compare the records in the view to the documents in the target database.
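A minimal sketch of such a comparison, assuming the view is queried with group=true and that each document in the target database carries the row's key and reduced value in `key` and `value` fields (an assumption about the dbcopy document layout; verify it against your own target DB). The function works on already-fetched rows and docs, so it still requires reading both sides in full.

```python
import json

def _canon(key):
    """Canonical string form so list/dict keys can be used as dict keys."""
    return json.dumps(key, sort_keys=True)

def diff_dbcopy(view_rows, target_docs):
    """Compare reduced view rows against dbcopy target docs.

    Returns keys that are missing from the target, present but with a
    different value (stale), or present only in the target (extra).
    """
    expected = {_canon(r["key"]): r["value"] for r in view_rows}
    # Skip docs without a "key" field, e.g. design docs in the target DB.
    actual = {_canon(d["key"]): d["value"] for d in target_docs if "key" in d}
    return {
        "missing": [json.loads(k) for k in expected if k not in actual],
        "stale": [json.loads(k) for k in expected
                  if k in actual and actual[k] != expected[k]],
        "extra": [json.loads(k) for k in actual if k not in expected],
    }

rows = [{"key": "a", "value": 1}, {"key": "b", "value": 2}, {"key": "c", "value": 3}]
docs = [{"_id": "x", "key": "a", "value": 1},
        {"_id": "y", "key": "b", "value": 5},
        {"_id": "z", "key": "d", "value": 9}]
report = diff_dbcopy(rows, docs)
```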

Is there a dbcopy "reset button"?

You can re-trigger the dbcopy by forcing a rebuild of the source view. This can be done by updating the design doc so that the view signature changes - e.g. adding whitespace or comments to an existing view. This is somewhat inelegant but would result in the dbcopy being re-run.
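One way to sketch the "add a comment" trick: take the design doc as a parsed JSON dict, append a comment to the view's map source, and PUT the result back. The helper name `touch_view` and the marker format are hypothetical, and the `//` comment assumes a JavaScript map function. Note that this forces a full rebuild of the view index.

```python
import copy
import time

def touch_view(ddoc, view_name, marker=None):
    """Return a copy of the design doc whose map source has a new trailing comment.

    Changing the source text is what alters the view signature; the
    comment itself has no effect on the map function's behaviour.
    """
    updated = copy.deepcopy(ddoc)  # leave the caller's dict untouched
    marker = marker or str(int(time.time()))
    updated["views"][view_name]["map"] += f"\n// rebuild {marker}"
    return updated

ddoc = {
    "_id": "_design/foo",
    "_rev": "1-abc",
    "views": {"bar": {"map": "function(doc){ emit(doc.k, 1); }",
                      "reduce": "_sum",
                      "dbcopy": "target_db"}},
}
bumped = touch_view(ddoc, "bar", marker="1")
```

The returned dict keeps `_id` and `_rev`, so it can be sent back with a normal document update.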

Actinoid answered 21/5, 2013 at 15:27 Comment(1)
Thanks. I look forward to any future better introspection methods. Alas, changing the source view is worse than just inelegant; it can also be very slow. Likewise, comparing the rows in the view to the target DB rows can also be very slow (although we do have a helper method that does just that). The main reason we're using dbcopy is because the data set is often too large to slurp down in its entirety. :) Humane
