Protocol Buffers: How to parse a .proto file in Java
D

4

10

I am trying to dynamically parse a given .proto file in Java to decode a Protobuf-encoded binary.

I have the following parsing method, in which the "proto" string contains the content of the .proto file:

public static Descriptors.FileDescriptor parseProto (String proto) throws InvalidProtocolBufferException, Descriptors.DescriptorValidationException {
        DescriptorProtos.FileDescriptorProto descriptorProto = DescriptorProtos.FileDescriptorProto.parseFrom(proto.getBytes());
        return Descriptors.FileDescriptor.buildFrom(descriptorProto, null);
}

However, when executed, this method throws an exception with the message "Protocol message tag had invalid wire type.". I am using the example .proto file from Google, so I assume it is valid: https://github.com/google/protobuf/blob/master/examples/addressbook.proto

Here is the stack trace:

15:43:24.707 [pool-1-thread-1] ERROR com.github.whiver.nifi.processor.ProtobufDecoderProcessor - ProtobufDecoderProcessor[id=42c8ab94-2d8a-491b-bd99-b4451d127ae0] Protocol message tag had invalid wire type.
com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.
    at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:115)
    at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:551)
    at com.google.protobuf.GeneratedMessageV3.parseUnknownField(GeneratedMessageV3.java:293)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet.<init>(DescriptorProtos.java:88)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet.<init>(DescriptorProtos.java:53)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet$1.parsePartialFrom(DescriptorProtos.java:773)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet$1.parsePartialFrom(DescriptorProtos.java:768)
    at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:163)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:197)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet.parseFrom(DescriptorProtos.java:260)
    at com.github.whiver.nifi.parser.SchemaParser.parseProto(SchemaParser.java:9)
    at com.github.whiver.nifi.processor.ProtobufDecoderProcessor.lambda$onTrigger$0(ProtobufDecoderProcessor.java:103)
    at org.apache.nifi.util.MockProcessSession.write(MockProcessSession.java:895)
    at org.apache.nifi.util.MockProcessSession.write(MockProcessSession.java:62)
    at com.github.whiver.nifi.processor.ProtobufDecoderProcessor.onTrigger(ProtobufDecoderProcessor.java:100)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.util.StandardProcessorTestRunner$RunProcessor.call(StandardProcessorTestRunner.java:251)
    at org.apache.nifi.util.StandardProcessorTestRunner$RunProcessor.call(StandardProcessorTestRunner.java:245)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Any idea? Thank you!

Debose answered 4/12, 2017 at 13:57 Comment(0)
E
12

It looks like you're trying to use FileDescriptorSet.parseFrom to populate a FileDescriptorSet. This will only work if the bytes you're providing are the binary protobuf contents, which is to say: a compiled schema. You can get a compiled schema by using the protoc command-line tool with the --descriptor_set_out option. What you're actually passing it right now is the bytes of the text schema itself, which is not what parseFrom expects.
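
To illustrate why text bytes trip the binary parser, here is a minimal sketch that decodes a single byte the way the protobuf wire format defines a tag: the low three bits are the wire type and the remaining bits are the field number. It assumes the schema text starts with a "//" comment line; the class and method names are made up for this example and are not part of the protobuf library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: interprets one byte as a single-byte protobuf tag.
final class WireTagDemo {
    /** Field number carried by a single-byte tag (the bits above the low three). */
    static int fieldNumber(byte tag) {
        return (tag & 0xFF) >>> 3;
    }

    /** Wire type carried by a tag (the low three bits). */
    static int wireType(byte tag) {
        return tag & 0x07;
    }

    /** Only wire types 0 through 5 are defined by the encoding. */
    static boolean isValidWireType(int wireType) {
        return wireType >= 0 && wireType <= 5;
    }

    public static void main(String[] args) {
        // Assumption: the .proto text begins with a "//" comment.
        byte first = "// See README.txt".getBytes(StandardCharsets.US_ASCII)[0];
        System.out.println("field=" + fieldNumber(first)
                + " wireType=" + wireType(first)
                + " valid=" + isValidWireType(wireType(first)));
    }
}
```

The character '/' is 0x2F, which reads as field 5 with wire type 7. No such wire type exists, which is consistent with the "invalid wire type" message in the question, although the exact byte that triggers it depends on the file's contents.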

Without a compiled schema, you would need a runtime .proto parser. I'm not aware of one for Java; protobuf-net includes one (protobuf-net.Reflection), but that is C#/.NET. Without an available runtime .proto parser, you'd need to shell-execute protoc instead.
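
A rough sketch of shelling out to protoc from Java, using only the standard library. It assumes a protoc binary is available on the PATH; the class name, directory, and file names are placeholders chosen for this example. The --include_imports flag is added so that the descriptor set also contains the files the schema imports.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds and runs a protoc command line.
final class ProtocCommand {
    /** Builds the argument list; assumes "protoc" resolves via the PATH. */
    static List<String> build(Path protoDir, String protoFile, Path descriptorOut) {
        List<String> cmd = new ArrayList<>();
        cmd.add("protoc");
        cmd.add("--proto_path=" + protoDir);
        cmd.add("--include_imports");
        cmd.add("--descriptor_set_out=" + descriptorOut);
        cmd.add(protoFile);
        return cmd;
    }

    /** Runs the command, forwarding protoc's output, and returns its exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths; adjust to your project layout.
        List<String> cmd = build(Paths.get("protos"), "addressbook.proto",
                Paths.get("addressbook.desc"));
        System.out.println(String.join(" ", cmd));
        // Only invoke protoc when explicitly asked, since it may not be installed.
        if (args.length > 0 && args[0].equals("--run")) {
            System.out.println("protoc exit code: " + run(cmd));
        }
    }
}
```

The file written by --descriptor_set_out is the binary form that FileDescriptorSet.parseFrom expects.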

Elegy answered 4/12, 2017 at 14:21 Comment(10)
I see, this makes sense. I will try to find a way to compile my proto file. Thank you for your answer! - Debose
@Debose it should just be protoc --descriptor_set_out - Elegy
Okay, thanks for the precision. However if someone has ever heard about a Java proto compiler, I'm still interested :) - Debose
@Debose There probably isn't a Java implementation. The protocol compiler is more complicated than you might expect, and maintaining multiple implementations of the compiler in many languages wouldn't make a lot of sense. What you should try to do is arrange to have your .protos parsed using --descriptor_set_out offline, then send around the compiled descriptor set as needed, rather than try to parse the whole .proto file on-demand. - Garey
Hey there @kenton, great to see you still chiming in on protobuf questions. And yes, I found some of those complications when I finally got around to writing a 100% c# implementation and running it through every .proto I could find including most of the public Google API surface :) - Elegy
@Kenton I understand, I have found a workaround to avoid having to parse the .proto files. Btw, I have also found this project which seems to be a full implementation of Protobuf in Java and is said to support .proto files parsing: github.com/square/wire Maybe I'll give it a look. - Debose
One correction FileDescriptorSet developers.google.com/protocol-buffers/docs/… - Outmaneuver
@GeorgeCampbell ta - Elegy
@MarcGravell can you show how to do it in protobuf-net? Task: given protobuf as a string, get the C# class by making a call in C# code. - Excurvature
@Excurvature protobuf-net.Reflection is the library that has all the parsing and code-gen; here's the main file from protogen, which exposed this in a command-line interface like protoc: github.com/protobuf-net/protobuf-net/blob/main/src/protogen/… - look for .Process and .Generate. the same tools are also available via Roslyn plugins, a website, a "dotnet tool", etc - Elegy
S
2

Drawing from the other answers, here's a snippet of working Kotlin code from a library I'm developing. https://github.com/asarkar/okgrpc

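import com.github.os72.protocjar.Protoc
import com.google.protobuf.DescriptorProtos
import java.io.ByteArrayOutputStream
import java.nio.file.Files
import java.nio.file.Path

// Note: DevNull (used below) is not defined in this snippet; it is presumably an
// OutputStream from the linked library that discards protoc's stdout.

private fun lookupProtos(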
    protoPaths: List<String>,
    protoFile: String,
    tempDir: Path,
    resolved: MutableSet<String>
): List<DescriptorProtos.FileDescriptorProto> {
    val schema = generateSchema(protoPaths, protoFile, tempDir)
    return schema.fileList
        .filter { resolved.add(it.name) }
        .flatMap { fd ->
            fd.dependencyList
                .filterNot(resolved::contains)
                .flatMap { lookupProtos(protoPaths, it, tempDir, resolved) } + fd
        }
}

private fun generateSchema(
    protoPaths: List<String>,
    protoFile: String,
    tempDir: Path
): DescriptorProtos.FileDescriptorSet {
    val outFile = Files.createTempFile(tempDir, null, null)
    val stderr = ByteArrayOutputStream()
    val exitCode = Protoc.runProtoc(
        (protoPaths.map { "--proto_path=$it" } + listOf("--descriptor_set_out=$outFile", protoFile)).toTypedArray(),
        DevNull,
        stderr
    )
    if (exitCode != 0) {
        // Include protoc's captured stderr so the failure can be diagnosed.
        throw IllegalStateException("Failed to generate schema for: $protoFile\n$stderr")
    }
    return Files.newInputStream(outFile).use { DescriptorProtos.FileDescriptorSet.parseFrom(it) }
}

The idea is to use os72/protoc-jar to write out a compiled schema/file descriptor. Then use FileDescriptorSet.parseFrom to read that file, and recurse on its dependencies.

Sweetscented answered 26/11, 2020 at 11:21 Comment(0)
M
1

An alternative to "shelling out" to exec protoc would be to use a .proto parser written in Java. There seem to be a few floating around - Google something like "proto parser in java". (I'm looking for one for an issue in my project).

Mayday answered 12/5, 2023 at 16:44 Comment(1)
Example script using the "wire-schema" library: gist.github.com/jmini/a0241af9a2f13a51532d1f1448ee38e6 - Flyblown
C
-1

Don't use a Java String to hold the protobuf payload. The issue is that String does translations behind the scenes, and makes assumptions about character sets.

Protobuf works on byte arrays, and the exact representation in the array has to be unchanged. Going to and from String does not work.
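
As a small illustration of that point for binary payloads (not for .proto text, which is plain text), the sketch below round-trips a three-byte protobuf message through a String using UTF-8. The bytes 08 96 01 encode field 1 with the varint value 150; the class name is invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: shows that bytes -> String -> bytes is not lossless for binary data.
final class StringRoundTripDemo {
    /** Decodes the bytes as UTF-8 text and encodes the text back to bytes. */
    static byte[] roundTrip(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Protobuf wire bytes for: field 1 (varint) = 150.
        byte[] payload = {0x08, (byte) 0x96, 0x01};
        byte[] after = roundTrip(payload);
        System.out.println(Arrays.toString(payload) + " -> " + Arrays.toString(after));
        System.out.println("unchanged: " + Arrays.equals(payload, after));
    }
}
```

The byte 0x96 is not valid UTF-8 on its own, so the decoder substitutes a replacement character and the re-encoded array no longer matches the original. Pure ASCII text, such as a .proto schema, survives the same round trip unchanged.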

Champlain answered 4/12, 2017 at 14:6 Comment(2)
That depends on whether they're loading the data, vs loading a schema. A schema (in .proto format) is text. - Elegy
As Bob said, I am trying to parse a text file so I guess String should not be a problem. - Debose
