How to query a multi index in RethinkDB over an array of objects
Asked Answered
E

2

8

I'm working with a data set that looks something like this:

"bitrates": [
  {
    "format":  "mp3" ,
    "rate":  "128K"
  } ,
  {
    "format":  "aac" ,
    "rate":  "192K"
  }
] ,
"details": [ ... ] ,
"id": 1 ,
"name":  "For Those About To Rock We Salute You" ,
"price": 1026 ,
"requires_shipping": false ,
"sku":  "ALBUM-1" 
}

And I wanted to create a secondary index on bitrates, flexing {multi:true}. This was my attempt:

r.db("music").table("catalog").indexCreate("bitrates", {multi: true})

The index built just fine, but when I query it, nothing returns - which seems contrary to every example I've read here:

http://rethinkdb.com/docs/secondary-indexes/javascript/

The query I wrote is this:

r.db("music").table("catalog").getAll(["mp3", "128K"], {index : "bitrates"})

There is no error, just 0 results (and I have 300 or so documents with this exact data).

I'm using RethinkDB 2.0 RC1.

Ensconce answered 2/4, 2015 at 14:18 Comment(0)
N
13

When you create an index for a column, the values in the column are used literally as the keys of the index. In your case, the keys for your bitrates index would be the objects within the bitrates array in the document.

It seems like what you want is an index that's derived from the values in a field of the document. To do that, you want to define a custom indexing function that reduces the document to just the data you care about. The easiest way to experiment with them is to start by writing a query, and once you're happy with the results, converting it into an indexCreate() statement.

Here's a statement that grabs your sample document (with id 1), and plucks the format and rate terms from all of the objects in its bitrate array, and then merges them together to create a distinct set of strings:

r.db('music').table('catalog').get(1).do(function(row) {
  return row('bitrates').map(function(bitrate) {
    return [bitrate('format'), bitrate('rate')];
  }).reduce(function(left, right) {
    return left.setUnion(right);
  })
})

Running this statement will return the following:

["mp3", "128K", "aac", "192K"]

This looks good, so we can use our function to create an index. In this case, since we're expecting the indexing function to return a set of items, we also want to specify {multi: true} to ensure we can query by the items in the set, not the set itself:

r.db('music').table('catalog').indexCreate('bitrates', function(row) {
  return row('bitrates').map(function(bitrate) {
    return [bitrate('format'), bitrate('rate')];
  }).reduce(function(left, right) {
    return left.setUnion(right);
  })
}, {multi: true})

Once created, you can query your index like this:

r.db('music').table('catalog').getAll('mp3', {index: 'bitrates'})

You can also supply multiple query terms, to match rows that match any of the items:

r.db('music').table('catalog').getAll('mp3', '128K', {index: 'bitrates'})

However, if a single document matches more than one term in your query, it will be returned more than once. To fix this, add distinct():

r.db('music').table('catalog').getAll('mp3', '128K', {index: 'bitrates'}).distinct()

If necessary, you might also consider using downcase() to normalize the casing of the terms used in the secondary index.

You could also skip all of the indexing business entirely and use a filter() query:

r.db('music').table('catalog').filter(function(row) {
  return row('bitrates').map(function(bitrates) {
    return [bitrates('format'), bitrates('rate')];
  }).reduce(function(left, right) {
    return left.setUnion(right);
  }).contains('mp3');
})

That said, if you're almost always querying your table in the same manner, generating a secondary index using a custom function will result in dramatically better performance.

Nurse answered 2/4, 2015 at 15:8 Comment(1)
Thanks Nate - great answer! I think I should have added in my description that my goal was to use a multi index on an array of objects and it sounds like that's not possible with a straight up {multi:true}. This is more of an exploratory thing I'm doing - either way your answer is stellar thank you :)Ensconce
T
1

Keys of secondary indexes can't be objects right now:

> r.table('foo').indexCreate('bitrates', {multi: true})
> r.table('foo').getAll({format: "mp3", rate: "128K"}, {index: 'bitrates'})
RqlRuntimeError: Secondary keys must be a number, string, bool, pseudotype, or array

You can track this issue at https://github.com/rethinkdb/rethinkdb/issues/2773 .

For a work-around, you can do this:

> r.table('foo').indexCreate('bitrates', function(row){
    return row('bitrates').map(function(bitrate){return bitrate.coerceTo('array');})
  }, {multi: true});
> r.table('foo').getAll(r.expr({format: "mp3", rate: "128K"}).coerceTo('array'), {index: 'bitrates'})
Thremmatology answered 2/4, 2015 at 19:31 Comment(2)
Thanks... user359whatever :). As mentioned I'm using 2.0 RC1, which doesn't do what you're suggesting above - it works just fine without the error.Ensconce
Rob -- your post said r.db("music").table("catalog").getAll(["mp3", "128K"], {index : "bitrates"}) didn't produce an error, does r.table('foo').getAll({format: "mp3", rate: "128K"}, {index: 'bitrates'}) not produce an error either?Thremmatology

© 2022 - 2024 — McMap. All rights reserved.