Datastore vs Cloud SQL in Google App Engine
L

3

17

I want to build an application that will serve a lot of people (more than 2 million), so I think I should use Google Cloud Datastore. However, I also know that I could use Google Cloud SQL and still serve a lot of people using MySQL (as Facebook and YouTube do).

Is it correct to choose Datastore rather than the relational Cloud SQL for this many users? Thank you in advance.

Lonergan answered 7/3, 2014 at 11:28 Comment(1)
Google Cloud SQL is prohibitively expensive, and it's simply not feasible to use it unless you are creating a "test/play app" or your app doesn't need to store much data. - Masterstroke
E
19

It is not strictly true that Facebook and YouTube use MySQL to serve the majority of their content to the majority of their users. They both mainly use very large NoSQL stores (Cassandra and Bigtable) for scalability, and probably use MySQL for smaller-scale work that demands more complex relational storage. Try to use Datastore if you can, because you can start for free and will also save money when handling large volumes of data.

Ephebe answered 7/3, 2014 at 17:19 Comment(1)
Actually, my app will serve more than 2 million people and have about 50 thousand entities. Do you think Datastore is the best choice? - Lonergan
S
32

To give an intelligent answer, I would need to know a lot more about your app. But... I'll outline the biggest gotchas I've found...

Google Datastore is effectively a distributed hierarchical data store. To get the scalability they wanted, there had to be some compromises. As a developer you will find that these range from easy to work around, through difficult to work around, to impossible to work around. The last is far more likely than you would ever assume.

If you are accustomed to relational databases and the ability to manipulate data across multiple tables within the same transaction, you are likely to pull your hair out with Datastore. The biggest(?) gotcha is that transactions are only supported across a limited number of entity groups (5 at the current time). To give a simple example, say you had a simple parent-child relationship and you needed to update child records under more than 5 parents at the same time within a transaction... can't be done (yes, really). If you reorganize your data structures and try to put all of the former child records under a single entity so they can be updated in a single transaction, you will come across another limitation... the fact that you can't reliably update the same entity group more than once per second (yes, really). And if you query an entity type across parents without specifying the root entity of each, you will get what is euphemistically referred to as "eventual consistency"... which means it isn't consistent (yes, really).

The above is all in Google's documentation, but you are likely to gloss over it if you are just getting started (of course it can handle it!).
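One common way to live with the entity-group limit is to split the work into several smaller transactions. The sketch below shows only the batching step in plain Python. It does not call any Datastore API, and the names `MAX_GROUPS_PER_TXN`, `batches`, and `parent_keys` are made up for illustration. Note the trade-off: each batch would be atomic on its own, but the update as a whole would no longer be.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Assumption: the per-transaction entity-group limit. The answer cites 5 and a
# later comment cites 25, so check the current limits page before relying on it.
MAX_GROUPS_PER_TXN = 5


def batches(parents: Sequence[T], limit: int = MAX_GROUPS_PER_TXN) -> Iterator[List[T]]:
    """Yield consecutive slices of `parents`, each holding at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(parents), limit):
        yield list(parents[start:start + limit])


# Hypothetical parent keys; each batch would be updated in its own transaction.
parent_keys = [f"parent-{i}" for i in range(12)]
plan = list(batches(parent_keys))
print([len(batch) for batch in plan])  # [5, 5, 2]
```

If a later batch fails, the earlier ones have already been committed, so the application would need its own retry or compensation logic.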

Schenck answered 14/4, 2014 at 0:11 Comment(1)
It might be helpful to note that the limitations have changed. For example, the maximum number of entity groups that can be accessed in a transaction is now 25. You can find the most up-to-date limits here: cloud.google.com/datastore/docs/concepts/limits - Gladwin
T
9

It depends on what you mean by 'a lot of people', what sort of data you have, and what you want to do with it.

Cloud SQL is designed for applications that need a SQL database: it can handle any query you can write in SQL and ensures your data is always in a consistent state.

Cloud SQL can serve up to 3200 concurrent queries, depending on the tier. If the queries are simple and can be served from RAM, they should take just a few milliseconds, and assuming your users issue about 1 request per second, it could support tens of thousands of simultaneously active users. If, however, they are running more complex queries such as searches, or writing a lot of data, the number will be lower.
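As a rough sanity check on that estimate, Little's law says the number of queries in flight equals the arrival rate multiplied by the time each query takes. The sketch below applies it with an assumed 5 ms per query, which is only one reading of "a few ms"; the function name and the user counts are illustrative. It looks at concurrency alone and ignores CPU, disk, and write contention, which are likely to bite first.

```python
def required_concurrency(active_users: int,
                         requests_per_user_per_s: float,
                         query_ms: float) -> float:
    """Little's law: in-flight queries = arrival rate (1/s) * time per query (s)."""
    arrival_rate = active_users * requests_per_user_per_s
    return arrival_rate * query_ms / 1000.0


# Figure quoted in the answer; it depends on the tier.
CONCURRENT_QUERY_CEILING = 3200

for users in (10_000, 50_000):
    # Assumptions: 1 request per second per user, 5 ms per query.
    in_flight = required_concurrency(users, 1.0, 5.0)
    print(users, in_flight, in_flight <= CONCURRENT_QUERY_CEILING)
```

Under these assumptions, tens of thousands of active users stay well below the concurrent-query ceiling, so the practical limit would come from how expensive the queries really are.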

If you have a simple set of queries, are less concerned about immediate consistency, or expect much more traffic, then you should look at Datastore.

Tolbert answered 7/3, 2014 at 17:24 Comment(2)
Actually, my app will serve more than 2 million people and have about 50 thousand entities. Do you think Datastore is the best choice? - Lonergan
As data volumes grow, a distributed key-value system such as Datastore will increasingly outperform SQL in both speed and cost. This applies not only to App Engine but also to Azure and other cloud platforms. If you mean 2 million people with 50 thousand entities per person, that would be 100 billion entities, which definitely favours Datastore. If you mean 2 million people sharing the same 50 thousand entities, then SQL is feasible, and you can use Memcache to improve performance. But then Memcache is more like Datastore than like SQL anyway. - Ephebe
