Return unique values by key in CouchDB
Asked Answered
S

3

10

Is there a way to do the following in CouchDB? A way to return unique, distinct values by a given key?

SELECT DISTINCT field FROM table WHERE key="key1"

'key1' => 'somevalue'
'key1' => 'somevalue'
'key2' => 'anotherval'
'key2' => 'andanother'
'key2' => 'andanother'

For example:

http://localhost:5984/database/_design/designdoc/_view/distinctview?key="key1" would return ['somevalue']

http://localhost:5984/database/_design/designdoc/_view/distinctview?key="key2" would return ['anotherval', 'andanother']

Schecter answered 28/3, 2011 at 8:58 Comment(3)
Could you add a bit more detail regarding your document structure? Are key1 and key2 completely different fields altogether? Are they somehow related?Dalliance
@Dominic - I took it to mean objects like { "key":"key1", "value":"somevalue" } and { "key":"key2", "value":"othervalue" } rather than key1 and key2 being different fields, but I agree that it is vague.Lovering
@Matt That would be more or less right, we've tried very similar solutions to the ones presented, but have had to contend with "reduce_overflow_error" issues.Schecter
M
11

As suggested in the CouchDB definitive guide, you should put the values you want to be unique in the key, then query the reduce function with group=true.

For example, given that keyfield is the field with "key1" and "key2" and valuefield is the field with the values, your map function could be:

function(doc) {
  // filter to get only the interesting documents: change as needed
  if (doc.keyfield && doc.valuefield) {
    /*
     * This is the important stuff:
     * 
     *  - by putting both, the key and the value, in the emitted key,
     *    you can filter out duplicates
     *    (simply group the results on the full key);
     * 
     *  - as a bonus, by emitting 1 as the value, you get the number
     *    of duplicates by using the `_sum` reduce function.
     */
    emit([doc.keyfield, doc.valuefield], 1);
  }
}

and your reduce function could be:

_sum

Then querying with group=true&startkey=["key2"]&endkey=["key2",{}] gives:

{"rows":[
{"key":["key2","anotherval"],"value":1},
{"key":["key2","andanother"],"value":2}
]}
Meraz answered 29/3, 2011 at 17:58 Comment(0)
D
3

Based on what I see here, (I'll change my answer if needed) key1 and key2 look like independent fields, so you'll need 2 separate views.

I created 5 simple documents in my test database:

// I've left out fields like _id and _rev for the sake of simplicity
{ "key1": "somevalue" }
{ "key1": "somevalue" }
{ "key2": "anotherval" }
{ "key2": "andanother" }
{ "key2": "andanother" }

Here are the 2 view queries you'll need:

// view for key1
function(doc) {
  if (doc.key1) {
    emit("key1", doc.key1);
  }
}

// view for key2
function(doc) {
  if (doc.key2) {
    emit("key2", doc.key2);
  }
}

From there, your reduce function can return all the values in an array by just doing this:

function (key, values) {
  return values;
}

However, you specifically mentioned distinct values. Since JavaScript doesn't have a native unique() method for arrays, and we can't use CommonJS modules in view functions, we'll have to add our own logic for that. I just copy-pasted the first array.unique() function I found on Google, you can write your own that is better optimized for sure.

function (key, values, rereduce) {
  var o = {}, i, l = values.length, r = [];

  for (i = 0; i < l; i += 1) {
    o[values[i]] = values[i];
  }

  for (i in o) {
    r.push(o[i]);
  }

  return r;
}

You'll use this same reduce function in both views. When you query any of those views, by default it will also perform the reduce. (You'll need to explicitly pass reduce=false to get just the results of your map function.

Here are the result-sets you'd retrieve using the above map/reduce queries: (remember they are 2 separate queries)

{"rows":[
  {"key":"key1", "value": ["somevalue"]}
]}

{"rows":[
  {"key": "key2", "value": ["anotherval", "andanother"]}
]}
Dalliance answered 28/3, 2011 at 14:51 Comment(4)
I tried creating a reduce function that simply returns the values array, but unfortunately couchdb responds with reduce_overflow_error. The reason give is, "Reduce output must shrink more rapidly...". Is there any way to work around this issue?Schuster
You have 3 options. 1. You can either set reduce_limit = false in your couchdb configuration. (least favorable option) 2. You could use the map function, then perform your reduce (to get unique values) on your client. (still not very favorable) 3. Rewrite your map function to be emit(doc.key1, null); and use group=true (probably the best option)Dalliance
According to the CouchDB Guide this is the incorrect way of doing things. That being said, i would LOVE to do things this way. But the "Reduce output must shrink more rapidly..." error really gets in the way. This error is thrown if the output size from the reduce function is not half the size of the input. Size here means json encoded strings. If the input size is 200 bytes, this constraint is ignored. Theoretically it's possible to game the system by passing huge amounts of data in the front end to meet the 1/2 constraint. seems like a waste to me!Cm
The CouchDB section about this is here: guide.couchdb.org/editions/1/en/views.html#example/3Cm
O
0
    map: (doc, emit) => {
         emit(doc.tag, doc.value);
    },
    reduce: (keys, values, rereduce) => {
    
         if (rereduce) {
              let res = {};
              values.forEach(v => {
                 res = {
                       ...res,
                       ...v
                     }
              })
          }
          else {
               const res = {};
               keys.forEach((key, idx) => res[key[0] + ':' + values[idx]] = true);
               return res;
          }
    
    }

yields the following:

key1:somevalue: true
key2:andanother: true
key2:anotherval: true

which one can just extract.

Operant answered 27/6, 2023 at 16:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.