Couchbase : How to maintain arrays without duplicate elements?

I

2

5

We have a Couchbase store which has the Customer data.

Each customer has exactly one document in this bucket.
Daily transactions will result in making updates to this customer data.

Sample document. Let's focus on the purchased_product_ids array.

{
  "customer_id" : 1000
  "purchased_product_ids" : [1, 2, 3, 4, 5 ] 
      # in reality this is a big array - hundreds of elements
  ... 
  ... many other elements ...
  ...
} 

Existing purchased_product_ids : 
    [1, 2, 3, 4, 5]

products purchased today : 
    [1, 2, 3, 6]  // 6 is a new entry, others existing already

Expected result after the update: 
    [1, 2, 3, 4, 5, 6]

I am using Subdocument API to avoid large data transfer between server and clients.

Option1 "arrayAppend" :

customerBucket.mutateIn(customerKey)
    .arrayAppend("purchased_product_ids", JsonObject for [1,2,3,6] )
    .execute();

It results in duplicate elements. 
"purchased_product_ids" : [1, 2, 3, 4, 5, 1, 2, 3, 6]

Option2 "arrayAddUnique" :

customerBucket.mutateIn(customerKey)
    .arrayAddUnqiue("purchased_product_ids", 1 )
    .arrayAddUnqiue("purchased_product_ids", 2 )
    .arrayAddUnqiue("purchased_product_ids", 3 )
    .arrayAddUnqiue("purchased_product_ids", 6 )
    .execute();

It throws exception for most of the times, 
because those elements already existing.

Is there any better way to do this update ?

Insurer answered 28/11, 2018 at 10:21 Comment(0)

A

4

You could use N1QL, and the ARRAY_APPEND() and ARRAY_DISTINCT() functions.

UPDATE customer USE KEYS "foo" 
SET purchased_product_ids = ARRAY_DISTINCT(ARRAY_APPEND(purchased_product_ids, 9))

Presumably this would be a prepared statement and the key itself and the new value would be supplied as parameters.

Also, if you want to add multiple elements to the array at once, ARRAY_CONCAT() would be a better choice. More here:

https://docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/arrayfun.html

Alfonse answered 28/11, 2018 at 13:24 Comment(4)

thanks Johan. Seems to be simple if I go with N1QL route instead of SDK api's. If I use N1QL instead of SDK, am I going to miss anything on the performance or functionality ? Does that avoid server to client round trip data transfers like SubDocument api's do ? – Insurer 28/11, 2018 at 14:48

The document will not travel to the client if you launch a query of the form I use above. The execution happens in the query engine. The query engine will need to pull the document from whatever server it is stored on and write it back there, though, but that happens intra-cluster. – Alfonse 28/11, 2018 at 15:44

If you are using N1QL and you can also consider ARRAY_PUT(). docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/… – Aoudad 28/11, 2018 at 16:43

Generally speaking the SDK API will be faster than the N1QL route. – Jeffery 29/11, 2018 at 14:57

S

4

Do you need purchased_product_ids to be ordered? If not you can convert it to a map, e.g.

{
  "customer_id" : 1000
  "purchased_product_ids" : {1: {}, 3: {}, 5: {}, 2: {}, 4: {}}
}

and then write to that map with subdoc, knowing you won't be conflicting (assuming product IDs are unique):

customerBucket.mutateIn(customerKey)
   .upsert("purchased_product_ids.1", JsonObject.create()) // already exists
   .upsert("purchased_product_ids.6", JsonObject.create()) // new product
   .execute();

which will result in:

{
  "customer_id" : 1000
  "purchased_product_ids" : {1: {}, 3: {}, 6: {}, 5: {}, 2: {}, 4: {}}
}

(I've used JsonObject.create() as a placeholder here in case you need to associate additional information for each customer-order paid, but you could equally just write null. If you do need purchased_product_ids to be ordered, you can write the timestamp of the order, e.g. 1: {date: <TIMESTAMP>}, and then order it in code when you fetch.)

Saturnian answered 28/11, 2018 at 11:48 Comment(5)

So you suggest to use JSON dictionaries instead of JSON arrays. This solves my data update problem. Just one question though : Can we UNNEST dictionaries in N1QL like we do arrays ? (Most of our query patterns use N1QL UNNEST). Please help if there is a way to do that with dictionaries. Otherwise this data will not be much useful when we query it later. – Insurer 28/11, 2018 at 14:39

I'm not certain. Perhaps, instead of changing the data structure, it would be easiest to avoid subdoc and simply get() and replace() the document, using CAS to detect any concurrency issues? – Saturnian 28/11, 2018 at 15:12

Graham, in my case, compared to the transaction size, document size of a customer is so big. So, the "get, process and replace" will result in moving a much larger data(whole customer document) twice, for every transaction. But the changes I want to "push" is smaller, so that is the reason I went for subdoc. I will try the N1QL approach suggested by Johan. – Insurer 29/11, 2018 at 1:13

True, but if you know the document ID then it may still be faster (though admittedly higher bandwidth) to do a full document SDK 'get and replace' than the N1QL query, even with the two network round trips, as your app can send the requests directly to the correct data node, plus there's no index or query parsing involved. It will also reduce load on the query node(s). If this is a mission-critical part of the app and latency is crucial then it could be worth benchmarking both approaches. – Saturnian 29/11, 2018 at 18:5

you are correct. I need to check how each option behaves with a considerable load in the system. with few records it may be misleading. – Insurer 30/11, 2018 at 7:25

A

4