Cost of continuous replications vs one-shot replications (using TouchDB and Cloudant)

We have an app that uses Cloudant as a remote server. In our experience, however, Cloudant is not completely compatible with TouchDB's continuous replications. Our workaround for now is to manually trigger one-shot replications at a fixed frequency. We would like to know whether that approach is going to cost us more money than continuous replications, since continuous replications use longpoll and don't need to query the server as often. In other words, does each one-shot pull replication from Cloudant cost us a GET request?

Thank you, Paul

Benildas answered 15/7, 2013 at 13:55 Comment(2)
"Cloudant is not compatible with TouchDB's continuous replications." Why not? - Whilst
Mike referred to the incompatibility issue in his reply. It's an open issue that has not been resolved yet. Neither the TouchDB authors nor Cloudant support were able to determine the origin of the problem, but we managed to reproduce it with relative ease. - Benildas

I think the issue you refer to is [1]. Cloudant's replication is 100% compatible with CouchDB. In this instance, TouchDB's logs indicate the iOS network stack passed on incomplete JSON to TouchDB. It's not clear who was to blame in this case for the replication failure.

[1] https://github.com/couchbaselabs/TouchDB-iOS/issues/241

For the cost question, a one-shot pull replication will result in a GET to the _changes feed each time it happens, plus the other requests required to replicate. This _changes request will be counted as a light HTTP request against your Cloudant account.
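As a rough illustration of the request being counted, the sketch below builds the `_changes` URL a puller might issue. The account URL, database name, and `since` value are placeholders, and `changes_url` is a hypothetical helper rather than anything TouchDB exposes; `feed`, `since`, `timeout`, and `heartbeat` are standard `_changes` query parameters in CouchDB.

```python
from urllib.parse import urlencode


def changes_url(base_url, db, since=None, feed="normal",
                timeout_ms=None, heartbeat_ms=None):
    """Build a _changes URL (hypothetical helper, for illustration only)."""
    params = {"feed": feed}
    if since is not None:
        params["since"] = since          # last sequence the client has seen
    if timeout_ms is not None:
        params["timeout"] = timeout_ms   # milliseconds
    if heartbeat_ms is not None:
        params["heartbeat"] = heartbeat_ms  # milliseconds
    return f"{base_url}/{db}/_changes?{urlencode(params)}"


# Placeholder account and database names, not real endpoints.
one_shot = changes_url("https://example.cloudant.com", "mydb", since="42")
long_poll = changes_url("https://example.cloudant.com", "mydb", since="42",
                        feed="longpoll", heartbeat_ms=30000)
print(one_shot)
print(long_poll)
```

Each one-shot pull issues one request of the first form; a continuous pull issues the second form and re-issues it whenever the connection closes.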

However, whether this works out as more or fewer requests overall depends on the number of changes coming down from the remote server.

It's also important to remember that the number of _changes calls is very small relative to the number of other calls involved (e.g., getting the content of the changes themselves, particularly if there are many attachments).

While this question is specific to TouchDB, and I mention specific behaviours of that codebase, this answer deals with the requests involved in replication between any two systems speaking the CouchDB replication protocol[2].

[2] http://www.dataprotocols.org/en/latest/couchdb_replication.html

Let's take a contrived example: 1 update per 10 second window to the source database for the replication, where a TouchDB database is the target. Let's take a 5 minute poll vs. a continuous replication. For simplicity of call-counting, let's also take attachments out of the picture. We'll also assume the device has a constant network connection.

For the continuous case, every 10s TouchDB will receive an update in the _changes feed. This causes the longpoll connection to close. TouchDB then runs through the changes, requesting the updates from the source database; one or more GET requests on the remote server. While this is happening, TouchDB has to open up another longpoll request to _changes. So in a five minute period, you'd end up with perhaps 30 calls to _changes, plus all the calls to get documents and record checkpoints.

Compare this with a one-shot replication every five minutes. You'd receive notification of all 30 updates in one _changes feed call. TouchDB implements an optimisation[3] whereby it calls _all_docs to fetch updated documents that are still at their first revision (revision IDs beginning with "1-"), so you might end up with a single call to get all 30 documents (not possible in the continuous case, where each _changes response carries a single change). Then you have the checkpoint documents to record. At best that is fewer than 5 HTTP calls; at worst it is about a third of the continuous case, as you've avoided the extra _changes requests.

[3] https://github.com/couchbaselabs/TouchDB-iOS/wiki/Replication-Algorithm#performance
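The arithmetic in the contrived example can be sketched as a small model. This is a back-of-the-envelope count, not a measurement: it assumes one document GET per change when bulk fetch is unavailable, and one checkpoint call per batch of changes processed (`checkpoint_calls_per_batch` is an assumed parameter; the real number depends on the replicator).

```python
def continuous_requests(window_s, update_interval_s,
                        checkpoint_calls_per_batch=1):
    """Estimate requests for a continuous pull over one window."""
    updates = window_s // update_interval_s
    changes_calls = updates   # longpoll closes and reopens per change
    doc_gets = updates        # one GET per changed document (assumed)
    checkpoints = updates * checkpoint_calls_per_batch
    return changes_calls + doc_gets + checkpoints


def one_shot_requests(window_s, update_interval_s, bulk_fetch=True,
                      checkpoint_calls_per_batch=1):
    """Estimate requests for a single one-shot pull covering the window."""
    updates = window_s // update_interval_s
    changes_calls = 1                         # one _changes call lists everything
    doc_gets = 1 if bulk_fetch else updates   # _all_docs vs. per-document GETs
    checkpoints = checkpoint_calls_per_batch  # a single batch
    return changes_calls + doc_gets + checkpoints


# Five-minute window, one update every 10 seconds.
print(continuous_requests(300, 10))                  # 90
print(one_shot_requests(300, 10))                    # 3
print(one_shot_requests(300, 10, bulk_fetch=False))  # 32
```

Under these assumptions the one-shot case lands at 3 calls when the bulk fetch applies and at 32 when it does not, which is roughly a third of the 90 calls in the continuous case.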

It comes down to the frequency of updates you expect to the source database. One-shot replication is likely to provide a smoother price curve as you're in better control of the number of requests you make.

A further question is how often connections will drop because of the network disconnects that happen regularly with mobile devices. TouchDB's continuous replications will fire back up each time the user comes online (if added via the _replicator database). This is a further source of unpredictable costs.

However, the benefits from more immediate visibility of changes may certainly be worth the uncertainty.

Prevot answered 15/7, 2013 at 15:29 Comment(5)
Thank you very much for the very detailed answer. Nevertheless, I was under the impression that the longpoll only closed when there is a change to pull. I did not know that it received an update telling it there is nothing to pull, which closes the longpoll. I recall testing a continuous pull with a proxy, and it only seemed to trigger requests when I did indeed change data on the server side, but I might be wrong. So, in other words, you confirm that every 10 seconds the longpoll is closed even if there is nothing to pull, forcing it to trigger another GET request to re-open it? - Benildas
The longpoll request has a timeout (by default 60s) after which it will reset if there have been no changes. This is configurable on a per-replication basis - see docs.couchdb.org/en/latest/changes.html. - Inborn
If changes do not arrive every 10 seconds, does that shift the cost calculation in favor of continuous replication, since the longpoll timeouts are pretty long? Also, in practice, wouldn't a longer timeout make continuous replications less expensive? What are the negative aspects of longer timeouts? - Benildas
It depends on your workload. A continuous replication will still perform individual GET requests in response to each change, whereas a one-shot replication can potentially benefit from batch fetching. If writes are rare, then a longer timeout will reduce the number of requests required by continuous replication, at the cost of the server being unable to free up connections held by unresponsive clients. Be aware that 60 seconds is (by default in CouchDB and, I believe, on Cloudant) the maximum timeout you can specify. - Inborn
airpaulg: the request isn't closed every 10s for longpoll; in my example an update happened to the database every 10s, which would appear in _changes and close the connection (which agrees with your thinking). In addition, if you specify a heartbeat on _changes, which TouchDB does, the default 60s timeout Will mentioned is disregarded. The negatives of long timeouts are primarily in device battery life, as holding the radios open is very power-hungry (in fact, that's a big reason to poll every few minutes if possible). - Prevot
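To put numbers on the idle case discussed in these comments, here is a minimal sketch. It assumes no changes arrive and the connection never drops during the window; `idle_changes_requests` is a hypothetical helper, and the 60s default mirrors the timeout mentioned above.

```python
import math


def idle_changes_requests(window_s, timeout_s=60, heartbeat=False):
    """Estimate _changes requests for an idle longpoll over one window."""
    if heartbeat:
        # With a heartbeat the timeout is disregarded, so a single
        # request stays open for the whole window (assumed no drops).
        return 1
    # Otherwise the request resets every timeout_s and must be re-issued.
    return math.ceil(window_s / timeout_s)


print(idle_changes_requests(300))                  # 5
print(idle_changes_requests(300, heartbeat=True))  # 1
```

For comparison, a one-shot poll every five minutes makes one _changes request in the same window, so with a heartbeat an idle continuous replication costs about the same in requests; the difference is then battery use rather than billing.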
