How to encode protocol buffer string to binary using protoc

I've been trying to encode strings using the protoc CLI utility. I noticed that the output still contains plain text. What am I doing wrong?

osboxes@osboxes:~/proto/bin$ cat ./teststring.proto
syntax = "proto2";
message Test2 {
  optional string b = 2;
}

echo b:\"my_testing_string\"|./protoc --encode Test2 teststring.proto>result.out

result.out contains:

^R^Qmy_testing_string

protoc versions: libprotoc 3.6.0 and libprotoc 2.5.0

Sixpack answered 28/6, 2018 at 16:6 Comment(7)
Are you sure it isn't working fine? Displaying it to the console is bound to cause problems: the console is text, not binary. But pipe it to a file, and it'll probably be right. You can test at protogen.marcgravell.com/decode - just upload your test file there and see what it makes of it. - Petrology
@MarcGravell I think that is exactly what I did in the example above: piping the output of the encoding to the file result.out. - Sixpack
OK, my mistake. Now: you're displaying text. What is the hex of that file? Looking at it as text is doomed to failure. Note that since protobuf encodes strings as UTF-8, it is expected that your text appears "as is". What is interesting to me here is the first 6 (or so) bytes of the file. As hex, not as characters. - Petrology
Note: I think the decode page above will display the hex if you upload a file. - Petrology
@MarcGravell Thanks! You're right. The decode page above does indeed show the expected hex. - Sixpack
I was under the impression that protobuf encoding includes some sort of compression, which makes the transferred binary messages much smaller compared to JSON. - Sixpack
That isn't compression as such, just an efficient framing protocol. But yes, protobuf will always be smaller than JSON - I don't think there's any scenario in which JSON can be smaller for any field. Plus it is computationally efficient, too. Compare the JSON {"b":2}: that's 7 bytes, and probably much more if you have real names (not just b). In protobuf that would usually be 2 bytes: 1 byte for the field header and data type tag, and 1 byte for the value encoded as a "varint". Additionally, a JSON decoder has lots of text parsing to do, which is much more intensive than reading a dense binary protocol. - Petrology
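
To make the size comparison in the last comment concrete, here is a minimal Python sketch that hand-encodes an integer field as a varint and compares it with the compact JSON form. It does not use protoc or any protobuf library; the helper names and the choice of field number 2 are assumptions made purely for illustration.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Field header (field number plus wire type 0) followed by the value."""
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)


# Compact JSON, no whitespace: {"b":2}
json_bytes = json.dumps({"b": 2}, separators=(",", ":")).encode("utf-8")

# Assumption: the integer lives in field number 2 (hypothetical schema).
proto_bytes = encode_varint_field(2, 2)

print(len(json_bytes), json_bytes)
print(len(proto_bytes), proto_bytes.hex(" "))
```

For small field numbers and small values this gives 2 bytes against 7 for the JSON text; larger values or field numbers above 15 need extra varint bytes.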

Just to formalize in an answer:

The command as written should be fine; the output is protobuf binary - it just resembles text because protobuf uses utf-8 to encode strings, and your content is dominated by a string. However, despite this: the file isn't actually text, and you should usually use a hex viewer or similar if you need to inspect it.

If you want to understand the internals of a file, https://protogen.marcgravell.com/decode is a good resource - it rips an input file or hex string following the protocol rules, and tells you what each byte means (field headers, length prefixes, payloads, etc).

I'm guessing your file is actually:

(hex) 12 11 6D 79 5F etc

i.e. 0x12 = "field 2, length prefixed" (field number 2 shifted left by 3, plus wire type 2; this is the ^R in your output), 0x11 = 17 (the payload length, encoded as varint; the ^Q), then "my_testing_string" encoded as 17 bytes of UTF-8.
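
As a rough cross-check of that layout, the following Python sketch builds the same message by hand and then walks the bytes back. It is only an illustration of the wire format for a single string field, not how protoc is implemented, and the helper names are made up for this example. The ^R and ^Q shown in the question are the control characters 0x12 and 0x11, which is what the sketch produces as the first two bytes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number: int, text: str) -> bytes:
    """Field header (wire type 2), varint length prefix, then UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Same content as the question: optional string b = 2;
encoded = encode_string_field(2, "my_testing_string")
print(encoded.hex(" "))

# Walk the bytes back: field header, length prefix, payload.
tag, pos = decode_varint(encoded, 0)
length, pos = decode_varint(encoded, pos)
payload = encoded[pos:pos + length]
print("field", tag >> 3, "wire type", tag & 0x07, "length", length)
print(payload.decode("utf-8"))
```

Comparing the printed hex with a hex dump of result.out is a quick way to confirm that the file really is binary protobuf, even though most of it is readable text.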

Petrology answered 28/6, 2018 at 23:23 Comment(0)
protoc --proto_path=${protobuf_path} --encode=${protobuf_message} ${protobuf_file} < ${source_file} > ${output_file}

and in this case:

protoc --proto_path=~/proto/bin --encode="Test2" ~/proto/bin/teststring.proto < source.txt > ./output.bin

or:

cat b:\"my_testing_string\" | protoc --proto_path=~/proto/bin --encode="Test2" ~/proto/bin/teststring.proto > ./output.bin
Gamic answered 12/11, 2019 at 5:28 Comment(1)
You mean echo b:\"my_testing_string\" not cat b:\"my_testing_string\", right? - Mccray