What is the most efficient way to store name/value pairs in a MarkLogic database?

My application often needs to decorate values in the documents it serves, using a lookup table to fetch human-readable forms of various codes.

For example, <product_code>PC001</product_code> should be returned as <product_code code='PC001'>Widgets</product_code>. It's not always product_code; there are a few different types of code that need similar behaviour (some with just a few dozen values, some with a few thousand).

What is the most efficient way to store that data in the database? I can think of two possibilities:

1) One document per code type, with many elements:

<product-codes>
  <product-code code = "PC001">Widgets</product-code>
  <product-code code = "PC002">Wodgets</product-code>
  <product-code code = "PC003">Wudgets</product-code>
</product-codes>

2) One document per code, each containing a <product-code> element as above.

(Obviously, both options would include sensible indexes)

Is either of these noticeably faster than the other? Is there another, better option?

My feeling is that it's generally better to keep one 'thing' per document since it's conceptually slightly cleaner and (I understand) better suited to ML's indexing, but in this case that seems like it would lead to a very large number of very small files. Is that something I should worry about?

Worsen answered 14/3, 2013 at 17:16

Anything that needs to be searched independently should be its own document or fragment. However, if you are just doing lookups then an element attribute range index should be very fast at returning values:

cts:element-attribute-range-query(xs:QName('product-code'), xs:QName('code'), '=', 'PC001')
=> 
Widgets

With a range index, the lookups all come from the same index regardless of how you chunk the documents. So unless you need to use cts:search on product-code to retrieve the actual elements, it shouldn't matter how you chunk the documents.
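The query constructor above builds a cts:query rather than returning the label itself, so something still has to evaluate it. A minimal sketch of one way to do that follows, assuming a string attribute range index is configured on product-code/@code with a collation matching the query default. The function name local:label-for is hypothetical, and running the query through cts:search is just one option, not necessarily what this answer had in mind.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: look up the human-readable label for a code.
   Assumes a string range index on the code attribute of product-code. :)
declare function local:label-for($code as xs:string) as xs:string?
{
  let $query :=
    cts:element-attribute-range-query(
      xs:QName("product-code"), xs:QName("code"), "=", $code)
  (: The default (filtered) search returns only product-code elements
     that actually match the query; take the first one, if any. :)
  let $hit := cts:search(//product-code, $query)[1]
  return $hit/fn:string()
};

local:label-for("PC001")
```

With the sample data from the question loaded, the call is intended to yield the label for PC001, and the empty sequence for an unknown code.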

Blotto answered 14/3, 2013 at 17:27

Another approach is to store a map that represents the name-value pairs.

let $m := map:map()
let $_ := map:put($m, 'a', 'fubar')
return document { $m }

This returns an XML representation of the hashmap, which can be stored directly in the database using xdmp:document-insert. You can turn an XML map back into a native map using map:map as a constructor function. The native map could also be memoized using xdmp:set-server-field.
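The read side of this approach might look roughly like the sketch below. It assumes the map document has already been stored with xdmp:document-insert and that its keys are the codes themselves. The URI, the server-field name, and the function local:codes are hypothetical names chosen for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical URI and server-field name, for illustration only. :)
declare variable $URI as xs:string := "/lookups/product-codes.xml";
declare variable $FIELD as xs:string := "product-codes-map";

(: Return the memoized map; on first use, rebuild it from the stored
   XML representation and cache it in a server field.
   Assumes the document at $URI exists and has a map:map root element. :)
declare function local:codes() as map:map
{
  let $cached := xdmp:get-server-field($FIELD)
  return
    if (fn:exists($cached))
    then $cached
    else xdmp:set-server-field($FIELD, map:map(fn:doc($URI)/map:map))
};

map:get(local:codes(), "PC001")
```

Because a server field is held in memory, a later update to the stored document would presumably not be visible until the field is reset, so this suits lookup data that changes rarely.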

Lynxeyed answered 14/3, 2013 at 20:23