Best way to import Google Cloud SQL data into BigQuery
C

7

12

I have a database in a Cloud SQL instance. I would like to copy its content into BigQuery in order to perform analysis. It is not a requirement for me to continuously update the BigQuery dataset. It is OK if the export is done only once.

What is the best way to achieve this?

The 'Create Table' dialog in the BigQuery UI does not allow me to import from Cloud SQL (only from File, Cloud Storage, Drive or Bigtable).

Cleveite answered 3/12, 2017 at 22:46 Comment(1)
I would probably just export it to CSV in GCS and load it into BigQuery from there. That would be the easiest. #27785243Patrolman
D
3

As of now, there is no automated tool to import data into BigQuery from Cloud SQL, so the procedure you can follow consists of:

  1. Export the data from the table you want in your Cloud SQL instance in CSV format, as explained in the documentation.
  2. Import the CSV data into the BigQuery table you want, following the procedure also explained in the documentation.

You are done. If your database is large and has many tables, you may want to do the import programmatically, using the API.
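
For the programmatic route, a minimal sketch using the google-cloud-bigquery Python client might look like the following, assuming the CSV export already sits in a Cloud Storage bucket (the bucket, dataset and table names below are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,      # or provide an explicit schema instead
    skip_leading_rows=0,  # bump to 1 if your export includes a header row
)

# Load the CSV file produced by the Cloud SQL export into a BigQuery table.
load_job = client.load_table_from_uri(
    "gs://your-bucket/cloudsql-export/customers.csv",  # placeholder source URI
    "your-project.your_dataset.customers",             # placeholder destination table
    job_config=job_config,
)
load_job.result()  # wait for the load job to complete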

Doralia answered 6/12, 2017 at 8:30 Comment(2)
Note that I get many import errors: "Error while reading data, error message: Error detected while parsing row starting at position: 387. Error: Bad character (ASCII 0) encountered. (error code: invalid)"Cleveite
This answer is now outdated and should no longer be the accepted answer. @CleveiteGuadiana
G
13

BigQuery can directly query Cloud SQL through Cloud SQL federated queries. This introduces a SQL function called EXTERNAL_QUERY(connection_id, external_sql), which runs external_sql in the Cloud SQL database specified by connection_id.

You first need to create a connection in BigQuery, then reference its connection_id in EXTERNAL_QUERY(). The following is a sample query that copies Cloud SQL data into BigQuery.

INSERT
  demo.customers (column1)
SELECT
   * 
FROM
   EXTERNAL_QUERY("project.us.connection",
                  "SELECT column1 FROM mysql_table;");
Grout answered 15/9, 2019 at 19:20 Comment(2)
Note: Federated queries may not perform as well as querying data residing in BigQuery storage.Postprandial
This saves me DAYS of work (instead of implementing some batch or stream approach to loading the data to BigQuery) 🥹💗 Thank you so much!!! (And thanks GCP for this feature!!)Knucklehead
F
7

After creating a connection to your Cloud SQL instance, you can use it to create a table in BigQuery with a single query.

CREATE TABLE your_dataset.CUSTOMER AS
SELECT * FROM EXTERNAL_QUERY("<your_connection_id>", "SELECT * FROM CUSTOMER");
Famish answered 13/2, 2020 at 14:54 Comment(0)
Z
0

Updated solution: now in beta, you can use Cloud Data Fusion to do this very easily (it currently supports MySQL and SQL Server).

Zecchino answered 28/4, 2021 at 19:7 Comment(0)
A
0

GCP now allows simple replication of Cloud SQL data to BigQuery via its Datastream product: https://cloud.google.com/datastream/docs/quickstart-replication-to-bigquery

This is far more straightforward than the other methods mentioned in the answers here.

Acrobatic answered 7/3 at 18:38 Comment(0)
U
0

Using a Cloud SQL connection, you can write a query in BigQuery that pulls all the rows of the specific table you want:

SELECT * FROM EXTERNAL_QUERY("{connection string}",'''SELECT * FROM SQL_DB.users''')

Then use the scheduled queries feature: select a destination table for the generated rows and run the query on a schedule. If you set it to overwrite the destination table, the table is replaced on each run.
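
If you prefer to set up the schedule programmatically rather than in the UI, a rough sketch using the google-cloud-bigquery-datatransfer Python client could look like this (the project, dataset, destination table and schedule values are placeholders; the connection string placeholder is carried over from the query above):

from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()
parent = transfer_client.common_project_path("your-project-id")  # placeholder project

query_string = """
SELECT * FROM EXTERNAL_QUERY("{connection string}", '''SELECT * FROM SQL_DB.users''')
"""

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="your_dataset",            # placeholder destination dataset
    display_name="copy_cloudsql_users",
    data_source_id="scheduled_query",
    params={
        "query": query_string,
        "destination_table_name_template": "users",   # placeholder destination table
        "write_disposition": "WRITE_TRUNCATE",         # overwrite the table on each run
    },
    schedule="every 24 hours",
)

transfer_config = transfer_client.create_transfer_config(
    parent=parent, transfer_config=transfer_config
)
print("Created scheduled query:", transfer_config.name)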

Uniformize answered 23/5 at 16:23 Comment(0)
B
-1

If you're looking to export data from Cloud SQL to BigQuery but are running into limitations with BigQuery's UI options, you might want to explore a tool like Hevo Data. It can streamline the process by directly transferring data from your Cloud SQL instance to BigQuery, even if it's just a one-time migration. This way, you bypass the manual upload steps and ensure a smoother data transfer. If you want to understand more about this process or have specific requirements, Hevo's documentation might offer the guidance you need.

Blamed answered 5/9 at 12:11 Comment(1)
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From ReviewDagney
