Why is the creation of Python protobuf messages so slow?

Say I have a message defined in test.proto as:

syntax = "proto3";

message TestMessage {
    int64 id = 1;
    string title = 2;
    string subtitle = 3;
    string description = 4;
}

And I use protoc to convert it to Python like so:

protoc --python_out=. test.proto

timeit for PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python:

from test_pb2 import TestMessage

%%timeit
tm = TestMessage()
tm.id = 1
tm.title = 'test title'
tm.subtitle = 'test subtitle'
tm.description = 'this is a test description'

6.75 µs ± 152 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

timeit for PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp:

1.6 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
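Note that the environment variable has to be set before google.protobuf is first imported, since it is read once at import time. Which backend actually loaded can be checked at runtime with an internal module (a sketch; api_implementation is not a public, stable API and may change between releases):

```python
import os

# Must be set before the first `google.protobuf` import; the variable is
# read once at import time, so setting it later has no effect.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "cpp"

try:
    # Internal module: not a stable API, but handy for a quick check.
    from google.protobuf.internal import api_implementation
    print(api_implementation.Type())  # e.g. 'cpp', 'python', or 'upb'
except ImportError:
    print("protobuf is not installed in this environment")
```

If the C++ extension is not available, the library falls back to the pure-Python implementation, so the timings above should always be sanity-checked this way.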

Compare that to just a dict:

%%timeit
tm = dict(
    id=1,
    title='test title',
    subtitle='test subtitle',
    description='this is a test description'
)

308 ns ± 2.47 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
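For anyone reproducing these numbers outside IPython, the same measurement can be made with the stdlib timeit module (a sketch using the dict case, since it needs no generated code):

```python
import timeit

def build_dict():
    # The dict stand-in from the comparison above.
    return dict(
        id=1,
        title='test title',
        subtitle='test subtitle',
        description='this is a test description',
    )

n = 100_000
seconds = timeit.timeit(build_dict, number=n)
print(f"{seconds / n * 1e9:.0f} ns per loop")
```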

And that is only for one message; even with the cpp implementation, building the full set of messages in my project takes about 10.6 µs.

Is there a way to make this faster? Perhaps by compiling the generated module (test_pb2)?

Pomiculture asked 13/5, 2020 at 21:49 · Comments (7)
Protocol buffers are widely used and pretty well optimized already, so I doubt it. Also, you don't really "compile" a Python source file; you could use a different interpreter if you needed to (PyPy, etc.). But in any case, do you have reason to believe that serialization specifically is a bottleneck in your application? — Alfie

@Alfie I was thinking there might be a way to output C++ and call those messages from Python by building with setup.py somehow. It's a bottleneck for me because I'm parsing millions of rows of data into proto messages, and it's taking 15+ hours. — Pomiculture

Do you mean write a C++ executable to do the serialization, and then call that from Python? If so, that would be more expensive than what you have (you need to get the data from Python to C++, which is... serialization, plus process overhead). Have you tried the standard tools for parallelizing CPU-bound work, like ProcessPoolExecutor, joblib, or similar? — Alfie
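A minimal sketch of the parallelization suggested here, using only the standard library; serialize_batch is a hypothetical worker standing in for the real message-building code:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

def chunked(rows, size):
    # Yield lists of up to `size` rows so each task amortizes the
    # per-task process and pickling overhead.
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

def serialize_batch(batch):
    # Hypothetical worker: in the real project this would build a
    # TestMessage per row and return its SerializeToString() bytes.
    return [repr(row).encode() for row in batch]

if __name__ == "__main__":
    rows = ({"id": i, "title": f"title {i}"} for i in range(1000))
    with ProcessPoolExecutor() as pool:
        # Large chunks matter: sending rows one at a time would spend
        # more on inter-process transfer than on serialization itself.
        for encoded in pool.map(serialize_batch, chunked(rows, 250)):
            pass  # write `encoded` out here
```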
@Alfie I found this example, which might be what I'm looking for: yz.mit.edu/wp/fast-native-c-protocol-buffers-from-python — Pomiculture

What protobuf and Python versions are you using? — Scrawny

@Scrawny I'm using Python 3.8 and protobuf 3.9.2. — Pomiculture

Hey, @BrendanMartin. Did you solve this issue? — Bane

© 2022 - 2024 — McMap. All rights reserved.