When should I NOT use App Engine's Full Text Search API?

Asked 8/6, 2012 at 15:40 Answered 26/11, 2016 at 20:58

Solved google-app-engine full-text-search

So far, I've used App Engine's Full Text Search to help search through existing entities in my datastore. This involves creating at least one Document per entity, and linking the two together somehow. And every time I change the entity, I must change the corresponding Documents.

My question is, why not just store all my data in Documents and forget about Datastore entities? The search API supports a much richer query language that can handle multiple inequality filters and boolean operators, unlike the datastore.

Am I missing something about the design of the search API that would preclude using it to replace the Datastore entirely?

Vitrics answered 8/6, 2012 at 15:40 Comment(0)

According to the Java docs:

However, an index search can find no more than 10,000 matching documents. The App Engine Datastore may be more appropriate for applications that need to retrieve very large result sets.

Though I don't see that as a common use case.

More realistically, getting entities by key will be a lot cheaper with the Datastore (presumably faster as well). With the search API, you can either use Index.get() to find a document by ID, or duplicate the ID by storing it in a field and searching on that field.

Here's a cost breakdown:

- Index.get():     $0.10 /  10,000, or $0.00001 per get
- Index.search():  $0.13 /  10,000, or $0.000013 per search
- Datastore get(): $0.06 / 100,000, or $0.0000006 per get

As you can see, a Datastore get is much cheaper than either Search API option (roughly 16.7x cheaper than Index.get()).

If your data is structured in a way that makes use of a lot of direct gets and few complex searches, the Datastore will be a clear winner in terms of cost.
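To make the arithmetic concrete, here is a minimal sketch that applies the prices quoted above to a made-up workload. The workload size (1,000,000 direct gets and 10,000 searches) and the names `PRICE_PER_OP` and `workload_cost` are assumptions for illustration only, and the prices themselves may no longer be current.

```python
# Per-operation prices in USD, taken from the breakdown in this answer.
PRICE_PER_OP = {
    "index_get": 0.10 / 10_000,
    "index_search": 0.13 / 10_000,
    "datastore_get": 0.06 / 100_000,
}


def workload_cost(direct_gets, searches, gets_via):
    """Cost of a workload where direct gets are served by `gets_via`
    and complex searches always go through Index.search()."""
    return (direct_gets * PRICE_PER_OP[gets_via]
            + searches * PRICE_PER_OP["index_search"])


# Hypothetical workload: many direct gets, few complex searches.
search_only = workload_cost(1_000_000, 10_000, gets_via="index_get")
hybrid = workload_cost(1_000_000, 10_000, gets_via="datastore_get")
ratio = PRICE_PER_OP["index_get"] / PRICE_PER_OP["datastore_get"]

print(f"Index.get() vs Datastore get(): {ratio:.1f}x")
print(f"search-only: ${search_only:.2f}, datastore + search: ${hybrid:.2f}")
```

With these assumed numbers, serving the gets from the Search API costs about $10.13, versus about $0.73 when the gets go to the Datastore. The gap narrows as the share of searches grows, since the search cost is the same in both cases.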

Note: I did not include the extra cost for storing duplicate data with the Index.search() method, since that depends on how many entities you store.

Anima answered 20/8, 2013 at 17:37 Comment(5)

Thanks, this is very helpful, and a good explanation of why Search might not be an appropriate drop-in Datastore replacement. – Vitrics 21/8, 2013 at 0:57

@pixel Where did you see this limitation of 1,000 API calls per day? From what I understand, this is only the free quota limit. – Beloved 21/12, 2013 at 13:31

@Beloved The document I linked mentions "These calls are subject to a daily limit of 1,000 operations per day." but I think you are correct that this only applies to the free quota and that the sentence is misleading, given that pricing is in 10k increments. I've edited my answer to remove that comment. – Anima 21/12, 2013 at 22:58

@pixel Also, correct me if I'm wrong, but if we are using only complex searches, the total number of entities in the table should also affect our decision. That's because, according to the documentation, not only the retrieved entities are billed but every entity that is read. So if we are dealing with a humongous table, this will cost us. – Beloved 21/12, 2013 at 23:36

@Beloved Not sure where you see that. In any case, only simple queries are needed for the situation described by the question. – Anima 22/12, 2013 at 0:44

Just put the data in both. Storage is cheap, and depending on how many writes your app does, updates could be cheap as well. For easy queries and for getting single entities by key, use memcache and the datastore. For complex queries, use the search API. You'll have to make the tradeoff once pricing is announced.
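A rough sketch of that split, using plain dictionaries as stand-ins. Nothing here is the real App Engine Datastore or Search API; the names `datastore`, `search_index`, `to_document`, `put_entity`, and `rebuild_index`, and the title/price schema, are all hypothetical. It only shows the shape of the pattern: write to both on every put, read by key from one, run complex queries against the other.

```python
# In-memory stand-ins only; these are NOT the App Engine Datastore or Search APIs.
datastore = {}     # key -> entity dict (source of truth, cheap gets by key)
search_index = {}  # doc_id -> searchable fields (used for complex queries)


def to_document(entity):
    """Extract only the fields that need to be searchable (hypothetical schema)."""
    return {"title": entity["title"].lower(), "price": entity["price"]}


def put_entity(key, entity):
    """Write the entity, then mirror it into the search index."""
    datastore[key] = dict(entity)
    search_index[key] = to_document(entity)


def get_by_key(key):
    """Simple lookups by key go to the datastore stand-in."""
    return datastore[key]


def search(word, min_price, max_price):
    """A text match plus two inequality filters goes to the index stand-in."""
    return sorted(
        key for key, doc in search_index.items()
        if word in doc["title"] and min_price <= doc["price"] <= max_price
    )


def rebuild_index():
    """Recreate the whole index from the entities, e.g. after index loss."""
    search_index.clear()
    for key, entity in datastore.items():
        search_index[key] = to_document(entity)


put_entity("book-1", {"title": "Search Basics", "price": 20})
put_entity("book-2", {"title": "Advanced Search", "price": 45})
put_entity("book-3", {"title": "Cooking", "price": 30})
matches = search("search", 10, 40)
```

Keeping the entities as the source of truth also means the index can be rebuilt from them, which is the backup argument raised in another answer on this page.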

Achondrite answered 18/10, 2012 at 17:48 Comment(1)

This is what we are doing today, but it would still be good to know more about the design and intent of the search API. – Vitrics 18/10, 2012 at 21:51

Right now I'm indexing an entity as a search document every time I put it, and I also index a serialized version of the entity.
It's actually much, much faster to search for documents through the search API and extract the serialized field than to get the same number of entities from the datastore.

Diaconal answered 18/10, 2012 at 18:47 Comment(3)

That's interesting to know that the search API is much faster - is there any reason to expect it to be faster? Or maybe it's because it's still in limited trial and not a lot of users are hammering it yet? – Vitrics 18/10, 2012 at 21:50

Can you detail how you are measuring "faster"? Clearly, if you just deserialize the entity from an existing search doc, it will be faster than a datastore get. But do you measure the increase in search latency caused by having fatter documents? – Achondrite 19/10, 2012 at 1:2

I don't have numbers right now, but fetching 1000 entities with only one StringProperty takes approximately 2s. Searching for the same number of entities (returning the serialized field only) and loading the serialized JSON for all those documents takes under 0.5s on an F1. – Diaconal 19/10, 2012 at 13:22

Wouldn't you:

- lose any benefits of memcache
- face lower quotas. "we expect that our free quota will cover about 1,000 searches per day once the feature has graduated from experimental" I can't see the number of reads you get, but I believe it's higher for the datastore. I looked at https://developers.google.com/appengine/docs/quotas#Resources

Also, for an entity update, we are charged differently for an update than for a new put. It seems the indexes are not updated but rather added as a new document (that's what I'm doing anyway). Without the details of index pricing it's difficult to know exactly, but perhaps updating one or two indexed values on an entity would be cheaper than putting a whole new index. It would depend on your data, I guess.

Finally, the Total Index Size for indexes is now at 250M, while data is capped at 1 GB. So the datastore is larger, and there is no word yet on additional pricing for the index.
You would also need to come up with a backup plan. I don't know of any way right now to back up or restore the index if it got corrupted. Having the data in entities means the search index could be recreated. You can back up the datastore with the admin console now.

Predominance answered 8/6, 2012 at 22:13 Comment(1)

Thanks for the answer. My responses: 1) Memcache is a totally separate system unrelated to datastore or search. 2) Current limited quotas are temporary. Updating+deleting Documents is possible (just like entities). 3) Good point about automatic backup; I'd expect them to support backing up Documents as well though, eventually. – Vitrics 9/6, 2012 at 15:30

In addition to the performance costs of querying large sets of data, the datastore also has the advantage of allowing strongly consistent data. Take a look at this link for more information on strongly consistent vs. eventually consistent data.

It should be assumed that documents stored in the Search API indexes are eventually consistent.

Irfan answered 26/11, 2016 at 20:58 Comment(0)
