RethinkDB - Find documents with missing field

I'm trying to write the most efficient query to find all of the documents that do not have a specific field. Is there a better way to do this than the examples I have listed below?

// Get the ids of all documents missing "location"
r.db("mydb").table("mytable").filter({location: null},{default: true}).pluck("id")

// Get a count of all documents missing "location"
r.db("mydb").table("mytable").filter({location: null},{default: true}).count()

Right now, these queries take about 300-400ms on a table with ~40k documents, which seems rather slow. Furthermore, in this specific case, the "location" attribute contains latitude/longitude and has a geospatial index.

Is there a faster way to accomplish this? Thanks!

Ha answered 18/4, 2015 at 23:25 Comment(0)

A naive suggestion

You could use the hasFields method along with the not method to filter out unwanted documents:

r.db("mydb").table("mytable")
  .filter(function (row) {
    return row.hasFields({ location: true }).not()
  })

This might or might not be faster, but it's worth trying.
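
For reference, the same predicate can also be written with the r.row shorthand in the JavaScript driver and combined with count, mirroring the count query from the question (an untested sketch along the same lines):

// Count of all documents missing "location", using the r.row shorthand
r.db("mydb").table("mytable")
  .filter(r.row.hasFields("location").not())
  .count()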

Using a secondary index

Ideally, you'd want to make location a secondary index and then use getAll or between, since queries that use an index are much faster. One way to work around the missing fields is to give every row that lacks a location a location value of false. Then you create a secondary index on location, and finally you can query the table with getAll as much as you want.

  1. Add a location property to all documents without a location

For that, you'd need to first insert location: false into all rows without a location. You could do this as follows:

r.db("mydb").table("mytable")
  .filter(function (row) {
    return row.hasFields({ location: true }).not()
  })
  .update({
    location: false
  })

After this, you would need to find a way to insert location: false every time you add a document without a location.
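
One way to do that is in application code, before the insert; the helper name below (insertWithLocationDefault) is hypothetical, just to sketch the idea:

// Hypothetical application-side helper: default a missing location to false
function insertWithLocationDefault(conn, doc) {
  if (doc.location === undefined) {
    doc = Object.assign({}, doc, { location: false });
  }
  return r.db("mydb").table("mytable").insert(doc).run(conn);
}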

  2. Create a secondary index for the table

Now that all documents have a location field, we can create a secondary index for location.

r.db("mydb").table("mytable")
 .indexCreate('location')

Keep in mind that you only have to add { location: false } and create the index once.
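
Also note that indexCreate only starts building the index in the background; if you want to query it right away, you can wait for it to finish with indexWait:

r.db("mydb").table("mytable")
  .indexWait('location')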

  3. Use getAll

Now we can just use getAll to query documents using the location index.

r.db("mydb").table("mytable")
 .getAll(false, { index: 'location' })

This will probably be faster than the query above.
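
For example, the two queries from the question would then become something like this (same idea, not benchmarked):

// Ids of all documents missing "location", via the index
r.db("mydb").table("mytable")
  .getAll(false, { index: 'location' })
  .pluck("id")

// Count of all documents missing "location", via the index
r.db("mydb").table("mytable")
  .getAll(false, { index: 'location' })
  .count()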

Using a secondary index (function)

You can also create a secondary index based on a function. Basically, the index stores the result of the function for each document, and you then query those results with getAll. This is probably easier and more straightforward than what I proposed before.

  1. Create the index

Here it is:

r.db("mydb").table("mytable")
 .indexCreate('has_location', 
   function(x) { return x.hasFields('location'); 
 })

  2. Use getAll

Here it is:

r.db("mydb").table("mytable")
 .getAll(false, { index: 'has_location' })
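
Since hasFields returns a boolean, the same index also covers the opposite query; a quick sketch:

// Documents that do have a location, using the same function-based index
r.db("mydb").table("mytable")
  .getAll(true, { index: 'has_location' })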
Paginate answered 19/4, 2015 at 3:34 Comment(10)
I forgot to mention that 'location' does have an index - but it's a geospatial index, so it would never itself be false. It either contains the geo data or is not present at all.Ha
If you really want to make those queries faster, how about this: Create a new field on every document called hasLocation. Make it a secondary index. Every field that doesn't have a location will have hasLocation: false and then you can use getAll with a secondary index. There might be a more elegant way of doing this, but this will solve your problem in the meantime. Would that work?Paginate
Also, quick question: Why do some rows have locations and some don't? What type of data are you working with? It seems that you might need some normalization to distinguish between entries that have a location and places that don't. Maybe there's a better way to structure your data? Maybe you should create two tables depending on the existence of a location? Just a thought.Paginate
I'm working with a table full of businesses and their addresses. In theory, all of the businesses should have geocode data - but I'm working against the limit of 2,500 geocoding requests per day on Google's geocoding API. Also, the number of records is growing every day, so I'm building an incremental process to geocode new addresses and addresses without coordinates daily. Right now the data is pretty sparse, but I actually like the idea of breaking the geodata into its own table (id, location). Then I'm sure I could run this operation and most geospatial operations much faster.Ha
Ok, that makes total sense. I think creating a different table would definitely work, although you might run into slower queries doing joins later. If that happens, then maybe the approach of having location and hasLocation might make more sense (since it doesn't require any joins). You could add the hasLocation: false as a default and let your Google geocode API handler switch it to true once it has updated the location.Paginate
@ambient Just added a new solution, in case you're interested.Paginate
Jorge, the new solution is great! It's very intuitive and optimal in terms of execution time. Since the index is based off of a function, does it still rebuild itself automatically? For example, if I add location data to a document that previously had none, will it automatically fall out of the has_location index? I suppose I could test this pretty easily myself and find out, but I'd like to hear your thoughts too.Ha
Yes, the index will rebuild itself automatically. If you add a new document, it will execute that function on it and save it for future querying.Paginate
How can I do this with multiple conditions? For example, hasLocation: false, isCurrent: true?Aindrea
SO question: #40065764Aindrea
