google protobuf maximum size
I have some repeating elements in my protobuf message. At runtime the length of the message could be anything. I see some similar questions already asked, such as: Maximum serialized Protobuf message size.

  1. I have a slightly different question here. If my JMS (Java Messaging Service) provider (in this case my WebLogic or TIBCO JMS server) doesn't impose any limit on the maximum message size, will the protocol buffer compiler complain at all about the message size?
  2. Does the performance of encoding/decoding suffer horribly at large sizes (around 10MB)?
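
For context, here's roughly how the payload travels over JMS in my setup -- a minimal sketch, where MyEvent stands in for my generated protobuf class and session/producer are an already-open JMS Session and MessageProducer:

    import javax.jms.BytesMessage;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;

    // Serialize the protobuf message and ship it as a JMS BytesMessage.
    void sendEvent(Session session, MessageProducer producer, MyEvent event)
            throws JMSException {
        BytesMessage jmsMsg = session.createBytesMessage();
        jmsMsg.writeBytes(event.toByteArray()); // the whole serialized message, whatever its size
        producer.send(jmsMsg);
    }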
Envelope answered 7/12, 2015 at 8:02
10MB is pushing it but you'll probably be OK.

Protobuf has a hard limit of 2GB, because many implementations use 32-bit signed arithmetic. For security reasons, many implementations (especially the Google-provided ones) impose a size limit of 64MB by default, although you can increase this limit manually if you need to.
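
For example, in the Java implementation the limit is enforced by CodedInputStream, and you can raise it before parsing. A minimal sketch (the exact default varies by protobuf version; MyMessage is a placeholder for a generated class):

    import com.google.protobuf.CodedInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    MyMessage parseLarge(InputStream in) throws IOException {
        CodedInputStream coded = CodedInputStream.newInstance(in);
        coded.setSizeLimit(256 * 1024 * 1024); // raise the limit to 256MB for this parse
        return MyMessage.parseFrom(coded);
    }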

The implementation will not "slow down" with large messages per se, but the problem is that you must always parse an entire message at once before you can start using any of the content. This means the entire message must fit into RAM (keeping in mind that after parsing the in-memory message objects are much larger than the original serialized message), and even if you only care about one field you have to wait for the whole thing to parse.

Generally I recommend trying to limit yourself to 1MB as a rule of thumb. Beyond that, think about splitting the message up into multiple chunks that can be parsed independently. However, every application is different -- for some, 10MB is no big deal, while for others 1MB is already way too large. You'll have to profile your own app to find out.
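
In the Java implementation, the built-in length-delimited helpers are a convenient way to do that chunking. A sketch, assuming a smaller Chunk message type and a handler callback (both illustrative names):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.function.Consumer;

    // Writer: many small messages instead of one huge one.
    // writeDelimitedTo prefixes each message with its varint-encoded size.
    void writeChunks(Iterable<Chunk> chunks, OutputStream out) throws IOException {
        for (Chunk c : chunks) {
            c.writeDelimitedTo(out);
        }
    }

    // Reader: each chunk parses independently, so peak memory tracks the
    // chunk size rather than the total stream size.
    void readChunks(InputStream in, Consumer<Chunk> handler) throws IOException {
        Chunk c;
        while ((c = Chunk.parseDelimitedFrom(in)) != null) { // null at end of stream
            handler.accept(c);
        }
    }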

I've actually seen cases where people were happy sending messages larger than 1GB, so... it "works".

On a side note, Cap'n Proto has a very similar design to Protobuf but can support messages up to 2^64 bytes (2^32 segments of 4GB each), and it actually does allow you to read one field from the message without parsing the whole message (if it's in a file on disk, use mmap() to avoid reading the whole thing in).

(Disclosure: I'm the author of Cap'n Proto as well as most of Google's open source Protobuf code.)

Crossways answered 9/12, 2015 at 18:47
What if I have a few ints and one large byte array in the protobuf? Is it still a problem? – Chit
@MuditJain The parser will allocate a second byte array and copy the bytes from your message into that byte array. The copy should be pretty fast, but it's still a copy. If you instead wrote a protobuf followed by your byte array (without putting the byte array into the message itself), you could probably make it faster -- but less convenient. You should measure the performance and decide whether you need it to be faster. – Crossways
I have a large table with 10 million rows of binary blob data. Due to the map-reduce infrastructure, I have to serialize this table on multiple independent machines, e.g. each machine outputs 100K rows. My initial idea was to produce on each machine (row_binary_size_uint64, actual_binary_blob), (row_binary_size_uint64, actual_binary_blob), ... and just concat the data produced on each machine. Now, for future extensibility, I want to convert each row to protobuf format (my original question). So is there a way to detect the boundaries of a protobuf message and send that to the parser? – Chit
Protobuf is not self-delimiting, so you'll still need to write a size before each message. By the way, for the use case of a large binary blob embedded inside a message, Cap'n Proto works a lot better than Protobuf, since it is zero-copy. If there is a good Cap'n Proto implementation for the language you are working in, you might want to try it. – Crossways
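
A sketch of that framing in Java, also keeping the blob outside the protobuf per the earlier suggestion -- RowMeta is an illustrative metadata message, and the length prefixes mirror the (size, blob) layout described above:

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // Writer: length-delimited metadata message, then the raw blob with its own prefix.
    void writeRow(DataOutputStream out, RowMeta meta, byte[] blob) throws IOException {
        meta.writeDelimitedTo(out); // varint size + serialized RowMeta
        out.writeLong(blob.length); // explicit 8-byte length prefix for the blob
        out.write(blob);
    }

    // Reader: the prefixes mark exactly where each piece ends.
    void readRow(DataInputStream in) throws IOException {
        RowMeta meta = RowMeta.parseDelimitedFrom(in);
        byte[] blob = new byte[(int) in.readLong()];
        in.readFully(blob); // the blob never passes through a protobuf parser
    }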
@KentonVarda "For security reasons, many implementations (especially the Google-provided ones) impose a size limit of 64MB by default" - can you elaborate on these security reasons?Whenever
@Whenever First, the memory usage of a parsed object is much larger than the size of the original message. If you let people send you messages of any size, they could send you a very large message which, when parsed, exhausts your memory and crashes your service. Second, the Protobuf implementation uses 32-bit integers in many places, so it's important that sizes not get anywhere near big enough to overflow 32-bit integers.Crossways
@KentonVarda in case the entire message does not fit into the RAM, could we have truncated message? Because maybe I'm facing this issue in my case #67035751Erena
@Erena You could write a protobuf decoder which accepts partial messages, sure. In fact, I think many of them will do that already. E.g. the C++ parser will return false if the message was truncated, but all the data up until the message ended will have been parsed and will be available in the message object. (Note that I haven't worked on Protobuf in 10 years, so if you're asking for a new feature to be added, you're asking the wrong person.)Crossways
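
The Java parser behaves similarly: a truncated input throws, but the exception can carry the partially-parsed message. A sketch (MyMessage is illustrative; whether a partial message is attached can vary by version and code path):

    import com.google.protobuf.InvalidProtocolBufferException;

    MyMessage parseLenient(byte[] possiblyTruncated) {
        try {
            return MyMessage.parseFrom(possiblyTruncated);
        } catch (InvalidProtocolBufferException e) {
            // Fields parsed before the truncation point are preserved here;
            // may be null if the parser didn't attach a partial message.
            return (MyMessage) e.getUnfinishedMessage();
        }
    }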
Hi, could you please share how exactly one would increase the size limit from 64MB to something bigger? – Henni
@EricHua There should be an option for that somewhere in the parser library API, but I don't remember exactly where, sorry. – Crossways
  1. I don't think the protobuf compiler will ever complain about message sizes. At least not until you get to the 18-exabyte maximum of uint64_t.

  2. For most implementations, performance starts to suffer at the point where the message cannot fit into RAM at once. So 10 MB should be fine, 10 GB not. Another possible issue is if you don't need all of the data - protobuf does not support random access, so you need to decode the whole message even if you only need a part of it.

Lomond answered 7/12, 2015 at 15:38
