Best NoSQL for filtering on multiple indexes/fields

Because of the size of the data that needs to be queried and the need to scale across multiple nodes, I am considering some type of NoSQL database. I have researched numerous NoSQL offerings but can't yet decide which would provide the best performance, scalability, and features for our data structure.

The data model is a product catalog in which each document/set contains properties and descriptions for an individual product. Properties vary from product to product, which is why a schema-less offering would work best.

A sample structure would look like this:

[
 {"name": "item name",
  "cost": 563.34,
  "category": "computer",
  "manufacturer": "sony",
.
.
.
 }
]

The requirement is that I need to filter/query on many different fields/indexes in the record set, and to both filter on and exclude multiple fields/indexes in the same query. The workload will be mostly reads, and there would not be much need for joins or relationship-style linking.
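
The include/exclude semantics described above can be sketched in plain Python. This is only an illustration of the intended query behaviour, not tied to any particular database; the `filter_products` helper, the extra `screen_size` field, and the sample catalog values are made up for the example.

```python
from typing import Any, Dict, List

Product = Dict[str, Any]


def filter_products(
    products: List[Product],
    include: Dict[str, Any],
    exclude: Dict[str, Any],
) -> List[Product]:
    """Keep products that match every `include` pair and no `exclude` pair."""
    result = []
    for product in products:
        # A missing field never matches, so schema-less documents are fine.
        matches_all = all(product.get(f) == v for f, v in include.items())
        matches_excluded = any(product.get(f) == v for f, v in exclude.items())
        if matches_all and not matches_excluded:
            result.append(product)
    return result


# Hypothetical catalog; fields vary from product to product.
catalog = [
    {"name": "laptop a", "cost": 563.34, "category": "computer", "manufacturer": "sony"},
    {"name": "laptop b", "cost": 499.00, "category": "computer", "manufacturer": "acme"},
    {"name": "tv c", "cost": 899.99, "category": "television", "manufacturer": "acme",
     "screen_size": 42},
]

hits = filter_products(
    catalog,
    include={"category": "computer"},
    exclude={"manufacturer": "sony"},
)
print([p["name"] for p in hits])
```

A linear scan like this does not scale to a large catalog, which is why the question is really about which store can answer such queries from indexes.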

I have looked into Elastic Search, MongoDB, OrientDB, Couchbase, and Aerospike.

  • Elastic Search seems like an obvious choice, but I am wondering about its performance and stability.
  • Aerospike seems like it would be really fast since it works mostly in memory, but its filtering and searching capabilities did not seem very strong.

What do you think the best option would be for my use case? Are there any other recommended DBs that I should look into?

I know that the best way is to test performance with the actual real-life use case, but I am hoping to narrow it down a little first.

Thank you

Catbird answered 4/2, 2015 at 16:56 Comment(0)
D
6

This is a variant on the popular question "what is the best product" :)

As always: this depends on your specific use case and goals. Database products (like all products) are always the result of trade-offs, so there does NOT exist a single product offering the best performance, scalability, and features. However, there are many very good products for your use case.

Because your question is about Product Data and I have been working with Product Data for more than 15 years, I will try to answer your question.

  • A document model is a perfect fit for Product Data. So for all use cases other than simple lookup, I would recommend a document store.
  • If your use case concerns a single application and you are using the Java platform, I would recommend an embedded database. This makes things simpler and has a big performance advantage.
  • If you need faceted search or other advanced product search, I recommend SOLR or Elastic Search.
  • If you need a distributed system, I recommend Elastic Search over SOLR.
  • If you need product recommendations based on reviews or other graph-oriented algorithms, I recommend OrientDB or ArangoDB (or Neo4J, but in this case it would be my second choice).

The products we use in production, or have evaluated in depth for the use case you describe, are:

  • SOLR and ES. Both are extremely well-engineered, mature, and stable products (ES included).
  • Neo4J. The most mature graph database. A big advantage, IMO, is its awesome query language. It has an integrated Lucene engine and is a very mature, well-engineered product. The disadvantage is that it is not a document graph but a property (key-value) graph. It can also be expensive.
  • MongoDB. Our first experience with a document store. Very good product. Big advantages: excellent documentation and (by far) the most popular NoSQL database.
  • OrientDB and ArangoDB. Both support the graph/document paradigm. These are lesser-known products, but very powerful. Because we are a Java-based shop, our preference goes to OrientDB. OrientDB has an integrated Lucene engine (although the implementation is quite simple). ArangoDB, on the other hand, has very good documentation and a very smart and efficient storage format, and AQL is also very nice!
  • Performance (tested with 11.43 million articles and 2.3 million products): all products are very fast, especially SOLR and ES in this use case. Embedded OrientDB is also mind-blowingly fast for imports and simple queries. For faceted search, only the search servers provide really fast performance!
  • Bottom line: I would go for a graph/document store and/or a search server (SOLR or ES). Because you mentioned "filtering" (I assume faceted search), the search server is the obvious first choice.
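
To show what "filtering plus facets" could look like against a search server, here is a minimal sketch that only builds an Elastic Search request body as a Python dict. It assumes Elasticsearch 2.0 or later (where the `bool` query accepts a `filter` clause) and that `category` and `manufacturer` are indexed as exact, non-analyzed values. The `build_faceted_query` helper and the `by_<field>` aggregation names are hypothetical; the field names come from the sample document in the question.

```python
import json
from typing import Any, Dict, List, Optional


def build_faceted_query(
    must_match: Dict[str, Any],
    must_not_match: Dict[str, Any],
    facet_fields: List[str],
    max_cost: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a search body: exact-match filters, exclusions, and facet counts."""
    # Filters do not affect scoring; each one must match.
    filters: List[Dict[str, Any]] = [
        {"term": {field: value}} for field, value in must_match.items()
    ]
    if max_cost is not None:
        filters.append({"range": {"cost": {"lte": max_cost}}})

    # Documents matching any of these clauses are excluded.
    excluded = [{"term": {field: value}} for field, value in must_not_match.items()]

    # One terms aggregation per facet gives the counts shown next to each filter.
    aggs = {f"by_{field}": {"terms": {"field": field}} for field in facet_fields}

    return {
        "query": {"bool": {"filter": filters, "must_not": excluded}},
        "aggs": aggs,
    }


body = build_faceted_query(
    must_match={"category": "computer"},
    must_not_match={"manufacturer": "sony"},
    facet_fields=["manufacturer"],
    max_cost=600,
)
print(json.dumps(body, indent=2))
```

The resulting dict would be sent as the JSON body of a search request; how it is sent (HTTP client or an official client library) is left out on purpose.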
Decomposer answered 5/2, 2015 at 9:28 Comment(1)
Yeah, it seems to me that either OrientDB or ES would be the best choice. Thank you for your detailed answer. Catbird
D
0

OrientDB supports composite indexes on multiple fields. Example:

CREATE INDEX Product_idx ON Product (name, category, manufacturer) unique

SELECT FROM INDEX:Product_idx WHERE key = ["Donald Knuth", "computer"]

You could also create a FULL-TEXT index that uses Lucene as its engine.
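
As a conceptual aid, the partial-key lookup above can be modelled as a sorted list of key tuples searched by prefix. This is a sketch of the general idea behind composite indexes, not OrientDB's actual implementation; the `CompositeIndex` class, the record ids, and the sample documents are all hypothetical.

```python
import bisect
from typing import Any, Dict, List, Sequence, Tuple


class CompositeIndex:
    """Toy composite index: sorted (key_tuple, record_id) entries."""

    def __init__(self, fields: Sequence[str]) -> None:
        self.fields = list(fields)
        self._entries: List[Tuple[Tuple[Any, ...], int]] = []

    def add(self, record_id: int, doc: Dict[str, Any]) -> None:
        # The key is built from the indexed fields, in index order.
        key = tuple(doc[f] for f in self.fields)
        bisect.insort(self._entries, (key, record_id))

    def lookup_prefix(self, prefix: Sequence[Any]) -> List[int]:
        """Return record ids whose key starts with `prefix` (leading fields only)."""
        prefix = tuple(prefix)
        # (prefix,) sorts before every entry whose key starts with prefix.
        start = bisect.bisect_left(self._entries, (prefix,))
        hits = []
        for key, record_id in self._entries[start:]:
            if key[: len(prefix)] != prefix:
                break  # entries are sorted, so no later key can match
            hits.append(record_id)
        return hits


index = CompositeIndex(["name", "category", "manufacturer"])
index.add(1, {"name": "item a", "category": "computer", "manufacturer": "sony"})
index.add(2, {"name": "item a", "category": "computer", "manufacturer": "acme"})
index.add(3, {"name": "item b", "category": "computer", "manufacturer": "sony"})

print(sorted(index.lookup_prefix(["item a", "computer"])))
print(index.lookup_prefix(["computer"]))
```

In this model a lookup only works on the leading fields of the key: searching by `category` alone finds nothing, because `name` comes first. If the real index behaves similarly, filtering on arbitrary field combinations would need several indexes or a different index type.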

Dorsad answered 4/2, 2015 at 22:49 Comment(0)
D
0

Aerospike is a key-value store, not a document database. A document database would handle such field-level indexing and deeper searching into nested objects better. The secondary indexes in Aerospike currently (version 3.4.x) work on string and integer 'bins' (a concept similar to a document's field or a SQL table's column).

That said, the list and map complex types of Aerospike are being augmented with those capabilities, in work being done this quarter. Keep an eye out for those changes in upcoming releases. You'll be able to index and query on bins of type list and map.


Deforce answered 5/2, 2015 at 0:13 Comment(0)
