Can Azure Cognitive Search be used as a primary database for some data?

Microsoft promotes Azure Search as "cloud search", but doesn't necessarily call it a "database" or "data storage". It also stops short of calling it big data.

Can/should Azure Search be used as the primary database for some data? Or should there always be some "primary" datastore that is "duplicated" in Azure Search for search purposes?

If so, under what circumstances does it make sense to use Azure Search as a primary database?

Macrae answered 18/10, 2016 at 6:36 Comment(0)

Although we generally don't recommend it, you might consider using Azure Search as a primary store if:

  1. Your app can tolerate some data inconsistency. Azure Search is eventually consistent.
    • When you index data, it is not available for querying immediately.
    • Currently there is no mechanism to control concurrent updates to the same document in an index.
    • When reading data using search queries, paging is not based on any kind of snapshot, so you may get missing or duplicated documents.
  2. You don't need to read out the entire contents of your index. Paging in Azure Search relies on the $skip parameter, which is currently capped at a maximum of 100,000. For indexes larger than 100,000 documents, it can be very tricky to read all your data out: you'll need to pick some field to partition on, and your reads have no consistency guarantees (see the range-based paging sketch after this list).
  3. You are OK with losing your data in case of accidental deletion. Azure Search does not support backup/restore as of the time of this writing. If you accidentally delete your data, you will need to re-index it from its original source.
  4. You won't need to change your index definition much. Modifying or removing fields from your index currently requires re-indexing all your data (you can add new fields without re-indexing). If Azure Search is your primary store, your only option may be to try to read all the data from your old index into a new one, which is subject to all the aforementioned limitations around consistency, $skip, etc.
  5. Your application's query needs match the features that Azure Search provides. Azure Search supports full-text search, facets, and a subset of the OData filter language, but it does not support things like joins between indexes or arbitrary aggregations. If your app needs different query features than what Azure Search provides, you should consider another NoSQL solution like Azure Cosmos DB.
  6. Your application can tolerate high write latency. Since it is a search engine and not a general-purpose DB, Azure Search is optimized heavily for query performance (especially full-text search queries). This comes at the cost of slower write performance, since every write requires a lot of work to index the data. In particular, you will get the best write throughput by batching indexing actions together (batches can contain up to 1000 indexing actions; see the batched-upload sketch below). Writing documents one at a time to the index will result in much lower throughput.
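
For point 2, here is a minimal sketch of reading an entire index by range-filtering on the key field instead of using $skip. It uses the azure-search-documents Python SDK (which post-dates this answer); the `emails` index name, the placeholder endpoint/key, and the assumption that the key field `id` is marked filterable and sortable are all illustrative, not part of the original answer:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Hypothetical service, index, and credentials for illustration only.
client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="emails",
    credential=AzureKeyCredential("<api-key>"),
)

def read_all_documents(page_size=1000):
    """Walk the whole index by range-filtering on the key field rather than $skip.

    Assumes 'id' is filterable and sortable. As noted above, this is not a
    consistent snapshot: documents written during the scan may be missed or
    returned twice.
    """
    last_id = None
    while True:
        flt = f"id gt '{last_id}'" if last_id is not None else None
        page = list(client.search(
            search_text="*",      # match everything
            filter=flt,           # resume after the last key we saw
            order_by=["id asc"],
            top=page_size,
        ))
        if not page:
            break
        for doc in page:
            yield doc
        last_id = page[-1]["id"]
```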

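For point 6, a minimal sketch of batched indexing with the same hypothetical client as above; the 1000-action cap per request comes from the answer, while the document shape and error handling are illustrative assumptions:

```python
BATCH_SIZE = 1000  # maximum indexing actions per request, per the answer above

def upload_in_batches(client, docs):
    """Upload documents in batches rather than one at a time for better throughput."""
    for start in range(0, len(docs), BATCH_SIZE):
        batch = docs[start:start + BATCH_SIZE]
        results = client.upload_documents(documents=batch)
        # A batch is not transactional: individual actions can succeed or fail
        # independently, so check each result.
        failed = [r.key for r in results if not r.succeeded]
        if failed:
            print(f"Failed to index {len(failed)} documents: {failed}")

# Hypothetical usage with email-like documents:
# upload_in_batches(client, [{"id": "1", "subject": "hello", "body": "..."}])
```
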
Note that many of these are areas where we want to improve Azure Search in the future for the sake of manageability and ease of use, but it has never been our goal to make Azure Search a general-purpose NoSQL database.

Capriccioso answered 18/10, 2016 at 19:11 Comment(8)
Bruce, thanks so much for this detailed answer. In particular, I am considering (ok, already implemented as a prototype) using it for the backing store of an email system - so basically storing all emails in there for all users. Commonly emails are stored in the file system anyway. I think all your bullet points match up nicely with the needs of email. The only thing that concerns me is latency under load. What kind of latency should I expect? What do you think about instead using Azure table storage as the main data store (easily retrieving data by descending date)... – Macrae
...and just using Azure Search for searches? – Macrae
As someone with thousands of emails piling up, I can imagine wanting capabilities for managing email that Azure Search doesn't support, like deleting based on a query (you'd have to explicitly list all IDs to delete otherwise). When I open a folder, hit "select all", then "delete", I expect it to be fast. Regarding indexing latency, it's going to depend on your service topology and load. My point is that it won't scale well if you're indexing one doc at a time, which you would have to for email. I'd master the emails elsewhere. – Capriccioso
Thanks Bruce. And what do you think about mastering them in Azure Table Storage? – Macrae
Just realized Table Storage has a 1 MB limit per record, so that won't work! – Macrae
I ended up mastering them in Table Storage in JSON format, with a time-sorted "index" in Table Storage and a searchable index in Azure Search. – Macrae
Actually, I ended up "mastering" them in Blob Storage as JSON blob files, with a quick-access summary in Table Storage (sorted by time descending) and a searchable index in Azure Search (a rough sketch of this layout follows these comments). Works a treat! Though it is three stores that I have to keep track of. – Macrae
Depending on what your search scenario is, you might find Cosmos to be a better primary store that can also do much of the searching quickly (2 for 1). We had a moderately complex search scenario (80+ fields, 10 facets, 5 text fields) about a year back that we'd architected in Azure Search successfully and the MS architects encouraged us to use Cosmos instead. We benchmarked both and the performance is very similar, with Cosmos offering better control over consistency and redundancy. – Autunite
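
As a rough illustration of the three-store layout described in the comments above (blob master copy, Table Storage summary sorted by time descending, Azure Search index), here is a sketch using the current Python SDKs. The container/table/index names, the document fields, and the reverse-timestamp RowKey scheme are all assumptions for illustration, not details from the original comments:

```python
import json
from datetime import datetime, timezone

from azure.core.credentials import AzureKeyCredential
from azure.data.tables import TableClient
from azure.search.documents import SearchClient
from azure.storage.blob import BlobServiceClient

# Hypothetical connection details for illustration only.
STORAGE_CONN = "<storage-connection-string>"
blobs = BlobServiceClient.from_connection_string(STORAGE_CONN).get_container_client("emails")
summaries = TableClient.from_connection_string(STORAGE_CONN, table_name="EmailSummaries")
search = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="emails",
    credential=AzureKeyCredential("<api-key>"),
)

def store_email(user_id: str, email_id: str, email: dict) -> None:
    """Write the master copy to Blob Storage, a summary row to Table Storage,
    and a searchable document to Azure Search."""
    # 1. Master copy: one JSON blob per email (no 1 MB-per-record limit here).
    blob_path = f"{user_id}/{email_id}.json"
    blobs.upload_blob(name=blob_path, data=json.dumps(email), overwrite=True)

    # 2. Quick-access summary, sorted newest-first via a reverse-timestamp RowKey.
    received = datetime.now(timezone.utc)
    row_key = f"{10**13 - int(received.timestamp() * 1000):013d}"
    summaries.create_entity({
        "PartitionKey": user_id,
        "RowKey": row_key,
        "EmailId": email_id,
        "Subject": email.get("subject", ""),
        "BlobPath": blob_path,
    })

    # 3. Searchable index (in practice you'd batch these uploads, per point 6 above).
    search.upload_documents(documents=[{
        "id": email_id,
        "user_id": user_id,
        "subject": email.get("subject", ""),
        "body": email.get("body", ""),
    }])
```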
