Convert serialized protobuf output to python dictionary
M

3

6

I have serialized protobuf (protocol buffer) output in string format, and I want to convert it to a Python dictionary.

Suppose this is the serialized protobuf, given as a Python string:

person {
  info {
    name: John
    age: 20
    website: "https://mywebsite.com"
    eligible: True
  }
}

I want to convert the above Python string to a Python dictionary, data, given as:

data = {
  "person": {
    "info": {
      "name": "John",
      "age": 20,
      "website": "https://mywebsite.com",
      "eligible": True,
    }
  }
}

I can write a Python script to do the conversion, as follows:

  • Append commas on every line not ending with curly brackets.
  • Add an extra colon before the opening curly bracket.
  • Surround every individual key and value pair with quotes.
  • Finally, use the json.loads() method to convert it to a Python dictionary.
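
A minimal sketch of such a manual script is shown below. It does not follow the steps above literally: instead of rewriting the text into JSON and calling json.loads(), it builds the dictionary directly with a stack, which avoids the comma and quoting bookkeeping. It assumes one field or one brace per line, as in the sample, and the helper names (parse_scalar, store, text_to_dict) are hypothetical. Without the .proto schema, value types are guessed from how each value looks.

```python
import ast


def parse_scalar(token):
    """Guess the Python value for one text-format token (best effort)."""
    if token[:1] in ('"', "'"):
        return ast.literal_eval(token)  # quoted string
    if token in ("true", "True"):
        return True
    if token in ("false", "False"):
        return False
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token  # bare identifier, e.g. an enum name


def store(target, key, value):
    """Set key, collecting repeated keys into a list."""
    if key not in target:
        target[key] = value
    elif isinstance(target[key], list):
        target[key].append(value)
    else:
        target[key] = [target[key], value]


def text_to_dict(text):
    """Parse one-field-per-line protobuf text output into nested dicts."""
    root = {}
    stack = [root]  # stack[-1] is the dict currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("{"):
            # Start of a nested message: "name {" or "name: {"
            key = line[:-1].strip().rstrip(":").strip()
            child = {}
            store(stack[-1], key, child)
            stack.append(child)
        elif line == "}":
            stack.pop()  # end of the current nested message
        else:
            # Split on the first colon only, so URLs in values survive.
            key, _, value = line.partition(":")
            store(stack[-1], key.strip(), parse_scalar(value.strip()))
    return root


sample = """person {
  info {
    name: John
    age: 20
    website: "https://mywebsite.com"
    eligible: True
  }
}"""

data = text_to_dict(sample)
print(data)
```

One limitation of any schema-free approach like this: a repeated field that happens to contain a single element comes back as a scalar rather than a one-item list, because nothing in the text distinguishes the two cases.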

Apart from the manual scripting described above, is there a simpler or standard method, already available in protocol buffers, to convert the serialized protobuf output to a Python dictionary?

Moor answered 20/6, 2021 at 12:58 Comment(5)
I've not tried this but you should be able to convert a message to a dictionary directly or through JSON. See MessageToDict: googleapis.dev/python/protobuf/latest/google/protobuf/… - Jeremyjerez
@Jeremyjerez I tried it, but it didn't work. The likely reason is that in MessageToJson(message, ...), the message parameter is "The protocol buffer message instance to serialize", so it must be an instance of a protocol buffer message. I need conversion from a plain string to JSON, not from a protocol buffer object. - Moor
Yes, that would be a problem. This would be a better approach but evidently you don't have the Protobuf message and you don't have the proto so you can't ParseFromString either. You have a binary string not a purely human-readable string and so you should take care converting it differently to JSON. See: developers.google.com/protocol-buffers/docs/encoding - Jeremyjerez
Have you found an answer to this problem? - Selene
@Selene No, I didn't find a straightforward solution. One way is to use regex for the conversion; a better, more standard way might be to add functionality to the service returning the protobuf so that it also returns JSON when a flag is passed. E.g. pass a -json flag to get the JSON output, otherwise the protobuf output. - Moor
A
4

You can use the Message class from proto (the proto-plus package).

In [6]: import proto

In [7]: curr
Out[7]:
campaign {
  resource_name: "customers/1234/campaigns/5678"
  id: 9876
  name: "testing 1, 2, 3"
  advertising_channel_type: SEARCH
}
landing_page_view {
  resource_name: "customers/1234/landingPageViews/1234567890"
  unexpanded_final_url: "https://www.example.com/"
}

In [8]: proto.Message.to_dict(
   ...:     curr,
   ...:     use_integers_for_enums=False,
   ...:     including_default_value_fields=False,
   ...:     preserving_proto_field_name=True
   ...: )
Out[8]:
{'campaign': {'resource_name': 'customers/1234/campaigns/5678',
  'advertising_channel_type': 'SEARCH',
  'name': 'testing 1, 2, 3',
  'id': '9876'},
 'landing_page_view': {'resource_name': 'customers/1234/landingPageViews/1234567890',
  'unexpanded_final_url': 'https://www.example.com/'}}

Note that in to_dict, all kwargs default to True.

There's also a to_json method if you just want to immediately serialise the message without having to use json.dumps.

A caveat that's also worth noting is the proto package's recent memory leaks. The thread says that a fix has been issued but my experience when using it on larger datasets suggested otherwise 😅. Just because something works locally doesn't mean that the container you deploy it to can handle the same load.

Affirm answered 5/7, 2023 at 10:24 Comment(0)
P
1

You can use the MessageToDict function from the google.protobuf package to convert a proto3 message to a Python dict.

The relevant arguments are:

use_integers_for_enums: If true, print integers instead of enum names.
descriptor_pool: A Descriptor Pool for resolving types. If None use the
        default.
preserving_proto_field_name: If True, use the original proto field
        names as defined in the .proto file. If False, convert the field
        names to lowerCamelCase.
including_default_value_fields: If True, singular primitive fields,
        repeated fields, and map fields will always be serialized.  If
        False, only serialize non-empty fields.  Singular message fields
        and oneof fields are not affected by this option.

from google.protobuf.json_format import MessageToDict
import test_pb2

python_dict = MessageToDict(
    message,
    preserving_proto_field_name=True,
    including_default_value_fields=True,
    descriptor_pool=test_pb2.test_output)["output_list"]

where test_output is defined in my proto3 file and exposed by the generated test_pb2 module.

Pullover answered 27/3 at 6:39 Comment(1)
This was already suggested in the original question comments, which OP said wouldn't work because "I need conversion from a plain string to JSON, not from a protocol buffer object." - Sobersided
L
0

You can use MessageToDict:

from google.protobuf.json_format import MessageToDict
....
....
message_obj = MessageToDict(protobuf_msg)
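
MessageToDict needs a message instance, not a string. If the output is in protobuf text format and the generated class for the message is available (an assumption; the question does not say it is), the string can first be parsed with text_format.Parse. The sketch below uses FileDescriptorProto only because it ships with the protobuf package; your own generated class would take its place, and the text value is made up for illustration.

```python
from google.protobuf import descriptor_pb2, text_format
from google.protobuf.json_format import MessageToDict

# Stand-in for the string returned by the service (illustrative only).
text = 'name: "demo.proto"\npackage: "demo"\n'

# Parse the text format into a message instance of the matching type.
# Replace FileDescriptorProto with the class generated from your .proto.
message = text_format.Parse(text, descriptor_pb2.FileDescriptorProto())

# Convert the message to a plain dictionary, keeping .proto field names.
result = MessageToDict(message, preserving_proto_field_name=True)
print(result)
```

Note that text_format expects string values to be quoted, so a line such as name: John would only parse if name were an enum field.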
Litterbug answered 6/5 at 4:6 Comment(0)
