Protocol buffer/JSON Schema to SQL DDL
Asked Answered
I

1

7

We use protocol buffers(protobuf) to define our schemas internally and have translators to translate to various other internal representations such as JSON Schema(Protobuf to JSON schema converter) so as to have protobuf schemas to be the source of truth.

We are also planning on creating SQL DDLs from the protobuf schemas and couldn't find a reliable, well supported converter for that. Some of the ones we have found such as jsonutils don't seem to be active.

SQL to protobuf generators are plenty(Example: https://github.com/jackskj/protoc-gen-map/)

We want a simple converter to translate protobuf schemas to CREATE statements on protobuf schema creation and potentially ALTER statements when new columns are added to protobuf schema(we don't support changing field types, or deleting fields in our protobuf schemas) in SnowSQL. Nested messages can be stored as variant and doesn't have to be flattened out. Are there any tools out there or should we write a custom tool for this?

message Foo {
  string uuid = 1;
  MessageType1 message_type1 = 2;
  MessageType2 message_type2 = 3;
  google.protobuf.Timestamp updated_at = 4;
  float some_num = 5;
}

gets translated to

CREATE TABLE FOO (
    uuid string,
    message_type1 variant,
    message_type2 variant,
    updated_at TIMESTAMP_LTZ(9),
    some_num float
);

Update to a schema

message Foo {
  string uuid = 1;
  MessageType1 message_type1 = 2;
  MessageType2 message_type2 = 3;
  google.protobuf.Timestamp updated_at = 4;
  float some_num = 5;
  string new_field = 6;
}

will translate to

ALTER TABLE FOO ADD COLUMN new_field string;

or

CREATE TABLE FOO (
    uuid string,
    message_type1 variant,
    message_type2 variant,
    updated_at TIMESTAMP_LTZ(9),
    some_num float,
    new_field string
);
Inform answered 25/10, 2021 at 18:17 Comment(5)
We are looking to solve this same problem. Were you able to find any existing solutions, or did you end up writing a custom tool? I think writing a custom protoc plugin could accomplish this, and would also allow for custom annotations to designate additional information (primary keys, etc.).Entasis
I ended up writing a custom tool to translate protobuf message to SQL. We use a few message/field options in this case. We can accomplish more via these options, as suggested by you.Inform
@Inform is it open source? How has the experience been so far?Curia
@Pristan since we couldn't find any open source tool/lib, I wrote a custom proto to DDL converter myself, no issues so far since our schema evolutions are append only.Inform
@Inform That's great! You should consider publishing that as an open source tool!Propeller
K
0

There are a few options that I can see:

  1. Use Snowflake Schema change tool
  2. Code a stored procedure which will produce the RECREATE/ALTER DDL You can find an example here

Usually, you would like to control the attributes in your tables/views. A good approach would be to have the JSON as VARIANT in the RAW. Then use dbt to convert into your CLEAN layer where you bring all the wanted columns. If you store your dbt code in git repository, you would have some data governance with this approach.

Kristine answered 20/3, 2024 at 16:27 Comment(1)
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From ReviewShastashastra

© 2022 - 2025 — McMap. All rights reserved.