CouchDB Mango Performance vs Map/Reduce Views
A

3

20

I've just noticed that the CouchDB 2.0 release notes recommend Mango queries for new applications. They also mention that Mango indexes are apparently 2x to 10x faster than JavaScript queries, which really surprised me. As a result, I have a number of questions:

  • Are Map/Reduce views being phased out? I expect the answer to be no, since it seems to me that Mango does not cover all the use cases of Map/Reduce (the easiest example being Reduce itself), and this querying style seems less flexible too. But I prefer to ask because of the recommendation:

We recommend all new apps start using Mango as a default.

  • We know that Map/Reduce views rely on B-trees, but I can't find any insight in the docs or on the mailing list about the magic behind Mango. At the moment Mango is essentially white magic to me. Yet having in-depth knowledge of how JavaScript views are indexed behind the scenes was massively helpful for avoiding pitfalls and naive implementations, as well as for optimizing performance. Does anyone have any insight into how Mango works? Are the indexes B-trees too? When are the indexes updated, since there are no longer design documents? Where do the performance gains come from? (These gains are counter-intuitive to me, since in my understanding the performance of JavaScript queries came from the precomputed nature of map functions.)

What I'm essentially after is, on the one hand, some insight into Mango and, on the other hand, an overview of how Mango and Map/Reduce are supposed to live together in the 2.x era.

Axes answered 22/11, 2016 at 23:12 Comment(0)
A
7

Answer from a core developer:

Some good questions. I don't think Mango will ever replace Map/Reduce completely. It is an alternative querying tool. What is great about the Mango query syntax is that it is a lot easier to understand and get started with. We can also use it in a lot of places beyond querying for documents: it can be used for replication filtering and the changes feed. We hope to soon have support for validating document updates as well.

Underneath, Mango uses Erlang map/reduce, which means it creates a B-tree index just like Map/Reduce does. What makes it faster is that it uses native Erlang functions to build the B-tree instead of JavaScript. I wrote a blog post a long time ago about the internals of PouchDB-find [1], which is the Mango syntax for PouchDB. It might help you understand the internals a little better. The key thing to understand is that a query has a map part, which uses the B-tree, and an in-memory filter. Ideally, the less in-memory filtering you do, the faster your query will be.
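The split between the index-backed part of a query and the in-memory filter can be illustrated with a small Python sketch. This is not CouchDB's actual implementation: a sorted list searched with `bisect` stands in for the B-tree, and the documents and the `build_index` and `find` helpers are made up for the example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical documents; in CouchDB these would live in a database.
docs = [
    {"_id": "u1", "name": "John Rambo", "age": 36},
    {"_id": "u2", "name": "Sarah Connor", "age": 29},
    {"_id": "u3", "name": "John Rambo", "age": 41},
    {"_id": "u4", "name": "Ellen Ripley", "age": 36},
]


def build_index(docs, field):
    """Sorted (key, doc_id) pairs: a stand-in for a B-tree index on one field."""
    return sorted((doc[field], doc["_id"]) for doc in docs if field in doc)


def find(docs, index, key, extra_filter=None):
    """Range-scan the index for `key`, then filter the candidates in memory."""
    by_id = {doc["_id"]: doc for doc in docs}
    keys = [k for k, _ in index]
    # Index-backed part: only rows whose key equals `key` are touched.
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    candidates = [by_id[doc_id] for _, doc_id in index[lo:hi]]
    scanned = len(candidates)
    # In-memory part: every candidate is checked against the remaining condition.
    if extra_filter is not None:
        candidates = [d for d in candidates if extra_filter(d)]
    return candidates, scanned


name_index = build_index(docs, "name")
results, scanned = find(docs, name_index, "John Rambo", lambda d: d["age"] > 40)
print([d["_id"] for d in results], "matched out of", scanned, "index rows")
```

In this sketch the `name` condition is answered by the index, while the `age` condition is checked against every candidate row. The more selective the indexed part is, the fewer rows the in-memory filter has to visit.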

I would say that Mango is very much a work in progress, but the basic groundwork is done. There are definitely things we can improve on. I've seen it used quite a bit when developers start a new project because it's quick and simple to do basic querying, like finding a user by email address or finding all users with the name "John Rambo".
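To make the "quick and simple" point concrete, here is a rough side-by-side of the two styles for the name lookup, written as Python dictionaries that would be serialized to JSON. The design document name `users`, the view name `by_name`, and the `email` field are assumptions for illustration only.

```python
import json

# Mango: a declarative selector, sent as the JSON body of POST /{db}/_find.
mango_query = {
    "selector": {"name": "John Rambo"},
    "fields": ["_id", "name", "email"],
    "limit": 25,
}

# Map/Reduce: a design document holding a JavaScript map function.
# "users" and "by_name" are hypothetical names.
view_design_doc = {
    "_id": "_design/users",
    "views": {
        "by_name": {
            "map": "function (doc) { if (doc.name) { emit(doc.name, null); } }"
        }
    },
}

# View keys are JSON-encoded in the query string, hence json.dumps on the key.
# These would be the parameters of GET /{db}/_design/users/_view/by_name.
view_query_params = {"key": json.dumps("John Rambo"), "include_docs": "true"}

print(json.dumps(mango_query, indent=2))
print(json.dumps(view_design_doc, indent=2))
print(view_query_params)
```

The Mango version is a single request body, whereas the view needs a design document to be written and saved before it can be queried.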

Hope that helps.

[1] http://www.redcometlabs.com/blog/2015/12/1/a-look-under-the-covers-of-pouchdb-find

Axes answered 12/4, 2017 at 12:53 Comment(1)
What is the source for this answer? I know it's from a core developer - who? - Tropopause
A
13

I recently tried to switch my app over to Mango queries, and ended up scrapping the attempt completely and switching back to map/reduce. Here are a few of my reasons:

  1. Mango is buggy when dealing with queries that do not explicitly specify the index to use. This one drove me batty for a while last weekend. If you don't specify the index, sometimes an alternate index will be selected and the query will return no (or incorrect) results.
  2. Mango performance is not 'magic'. Many types of queries end up doing in-memory searches. Couch will select the best-fit index, then march through all those records in memory to handle the corner cases. Cloudant hand-waves over some of these issues by saying to use 'text'-based searches, which aren't available in CouchDB.
  3. As you pointed out, Mango simply cannot handle some types of query constructions well. I wouldn't consider my app to be overly complicated, yet I ran into several situations where I could not construct a suitable Mango query for the task at hand. A major one is searching arrays to find tags (for example, searching to see which users are members of a group). Mango cannot index array elements, so it resorts to doing full scans in memory.
  4. Views have some very powerful features for transforming search results in the form of Lists. Nothing equivalent exists in Mango.
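For the first point, one way to avoid the surprise is to name the index in every query. The sketch below only builds the JSON request bodies and sends nothing over the network. The `use_index` field and the `_index`, `_find`, and `_explain` endpoints are part of CouchDB's Mango API, while the design document name `users-idx`, the index name `by-email`, and the `pinned` helper are hypothetical.

```python
import json

# Body for POST /{db}/_index: creates a JSON index on the "email" field.
# "users-idx" and "by-email" are made-up names for this example.
create_index = {
    "index": {"fields": ["email"]},
    "ddoc": "users-idx",
    "name": "by-email",
    "type": "json",
}


def pinned(selector, ddoc, name):
    """Build a _find body that names the index instead of letting Mango choose."""
    return {"selector": selector, "use_index": [ddoc, name]}


# Body for POST /{db}/_find. Sending the same body to POST /{db}/_explain
# reports which index the query planner would use, without running the query.
find_by_email = pinned({"email": "john@example.com"}, "users-idx", "by-email")

print(json.dumps(create_index, indent=2))
print(json.dumps(find_by_email, indent=2))
```

Checking the `_explain` output before relying on a query is a cheap way to confirm that the intended index is actually being used.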

Your mileage may vary, but I just wanted to leave a warning that these are still quite new features.

Acton answered 2/12, 2016 at 8:30 Comment(1)
Views and lists make for an incredible combo. Sad to see lists being deprecated. - Exhibitionist
D
2

I am new to Mango and CouchDB, but I think I can provide some insight. Once your index/view is up to date, Mango is not any faster. The large performance gain with Mango comes when you build the index for the first time, because CouchDB doesn't need to spawn a separate couchjs process to do it.

I found that Mango works well even when some of your documents are large. Currently with CouchDB 2.0.0, at least on Windows, large documents crash the couchjs.exe view server used with Map/Reduce. This is not the case with CouchDB 1.6.1, and it is already fixed in the development version: https://github.com/apache/couchdb-couch/commit/1659fda5dd1808f55946a637fc26c73913b57e96

Dilatant answered 25/2, 2017 at 16:17 Comment(1)
I can confirm this memory issue with large documents and couchjs has been a huge problem for me too. - Axes
