Google Bigtable vs BigQuery for storing large number of events
Asked Answered
B

7

24

Background

We'd like to store our immutable events in a (preferably) managed service. Average size of one event is less than 1 Kb and we have between 1-5 events per second. The main reason for storing these events is to be able to replay them (perhaps using table scanning) once we create future services that might be interested in these events. Since we're in the Google Cloud we're obviously looking at Google's services as first choice.

I suspect that Bigtable would be a good fit for this but according to the price calculator it'll cost us more than 1400 USD per month (which to us is a big deal):

enter image description here

Looking at something like BigQuery renders a price of 3 USD per month (if I'm not missing something essential):

enter image description here

Even though a schema-less database would be better suited for us we would be fine with essentially storing our events as a blob with some metadata.

Questions

Could we use BigQuery for this instead of Bigtable to reduce costs? For example BigQuery has something called streaming inserts which to me seems like something we could use. Is there anything that'll bite us in the short or long term that I might not be aware of if going down this route?

Butterfield answered 23/12, 2015 at 14:14 Comment(3)
You are not missing essential, BQ is extremely "cheap".Hydropic
BigQuery is optimized for long-term storage and analytics, BigTable for heavy usage by an online appEmiliaemiliaromagna
Not sure but could be limits in terms of operations. E.g think you can only do like 1k append to a table per day (that was one BQ api limit I hit a while ago). Although I think the streaming api is more forgiving. Just might be another dimension to consider.Scleroprotein
B
11

Bigtable is great for large (>= 1TB) mutable data sets. It has low latency under load and is managed by Google. In your case, I think you're on the right track with BigQuery.

Buckthorn answered 23/12, 2015 at 15:20 Comment(0)
T
9

FYI

Cloud Bigtable is not a relational database; it does not support SQL queries or joins, nor does it support multi-row transactions. Also, it is not a good solution for small amounts of data (< 1 TB).

Consider these cases: - If you need full SQL support for an online transaction processing (OLTP) system, consider Google Cloud SQL.

If you need interactive querying in an online analytical processing (OLAP) system, consider Google BigQuery.

If you need to store immutable blobs larger than 10 MB, such as large images or movies, consider Google Cloud Storage.

If you need to store highly structured objects, or if you require support for ACID transactions and SQL-like queries, consider Cloud Datastore.

Taper answered 17/1, 2017 at 8:12 Comment(0)
C
7

The overall cost boils down to how often you will 'query' the data. If it's an backup and you don't replay events too often, it'll be dirt cheap. However, if you need to replay it once daily, you start triggering the 5$/TB scanned too easily. We were surprised too how cheap inserts and storage were, but this is ofc because Google expects you to run expensive queries at some point in time on them. You'll have to design around a few things though. E.g. AFAIK streaming inserts have no guarantue's of being written to the table and you have to poll frequently on tail of list to see if it was really written. Tailing can be done efficiently with time range table decorator, though (not paying for scanning whole dataset).

If you don't care about order, you can even list a table for free. No need to run a 'query' then.

Convex answered 17/1, 2016 at 23:29 Comment(0)
F
6

Google Cloud - GCP database options decision flowchart

This flowchart may help in deciding between different Google cloud storage offerings (Disclaimer! copied this image from Google cloud's page)

If your usecase is a live database(let's say, backend of a website), BigTable is what you need (Still it's not really an OLTP system though) . If it is more of an data analytics/ datawarehouse kind of purpose, then BigQuery is what you need.

Think of OLTP vs OLAP; Or if you are familiar with Cassandra and Hadoop, BigTable roughly equates to Cassandra, BigQuery roughly equates to Hadoop (Agreed, not a fair comparison, but you get the idea)

https://cloud.google.com/images/storage-options/flowchart.svg

Please keep in mind that Bigtable is not a relational database, it's a noSQL solution without any SQL features like JOIN etc. If you want an RDBMS OLTP, you might need to look at cloudSQL (mysql/ postgres) or spanner.

Cloud spanner is relatively young, but is powerful and promising. At least, google marketing claims that it's features are best of both worlds (Traditional RDBMS and noSQL)

enter image description here

Cost Aspect

Cost aspect is already covered nicely here https://mcmap.net/q/205226/-google-bigtable-vs-bigquery-for-storing-large-number-of-events

I know this is very late answer, but adding it anyway incase it may help somebody else in future.

Fullbodied answered 23/8, 2018 at 15:43 Comment(0)
D
3

Hard to summarize better than it is already done by Google.

I think you need to figure out how you are going to use (replay) your data (events) and this can help you in making final decision.

So far, BigQuery looks like a best choice for you

Disembowel answered 23/12, 2015 at 15:49 Comment(0)
I
1

Bigtable is a distributed (run on clusters) database for applications that manage massive data. Its designed for massive unstructured data, scales horizontally and made of column families. It stores data in key value pairs as opposed to relational or structured databases.

BigQuery is a datawarehouse application. That means it provides connection to several data sources or streams such that they can be extracted, transformed and loaded into bigQuery table for further analysis. Unlike Bigtable, It does store data in structured tables and supports SQL queries.

Use cases; If you want to do analytics or business intelligence by deriving insights from collected data on from different sources (applications, research, surveys, feedback, logs etc...) of your organisation , you may want to pull all this information into one location. This location will most likely be a Bigquery data warehouse.

If you have an application that collects Big data, in other words massive information (High data volume) per time at higher speeds (High velocity) and in unstructured inconsistent forms with different data types as audio, text, video, images, etc... ( Variety and veracity), then your probable choice of database application for this app would be Bigtable.

Ingold answered 27/6, 2020 at 8:2 Comment(0)
C
0

Bigtable stores data in massively scalable tables, each of which is a sorted key/value map. The table is composed of rows, each of which typically describes a single entity, and columns, which contain individual values for each row.

On other hand, BigQuery is mostly used for structured data for analysis purpose and additionally it cant handle low latency writes as well as Big table.

Bigtable tables are sparse; if a column is not used in a particular row, it does not take up any space.

If your data is unstructured and if you miss any column in any row In Bigtable all values are stored in blocks of contiguous rows which saves your used space and so cost.

Contorted answered 11/12, 2023 at 9:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.