Reading comments from .proto files using a Protocol Buffers descriptor object

I am currently revisiting a project using Google Protocol Buffers.

In the project I want to use the descriptor and reflection features of Protocol Buffers.

The official documentation states that the comments of .proto files can be read:

  1. With the function DebugStringWithOptions(), called on a message or descriptor.
  2. With the function GetSourceLocation(), called on a descriptor.

I am unable to retrieve comments, so either I am doing something completely wrong or the feature isn't fully implemented in Protocol Buffers yet.

Here are some code snippets:

google::protobuf::DebugStringOptions options;
options.include_comments = true;
std::cout << "google::protobuf::Descriptor::DebugStringWithOptions(): "
          << message.descriptor()->DebugStringWithOptions(options) << std::endl
          << std::endl;

const google::protobuf::FieldDescriptor* field_descriptor{
    message.descriptor()->field(1)};

// TODO(wolters): Why doesn't this work?
google::protobuf::SourceLocation* source_location{
    new google::protobuf::SourceLocation};
field_descriptor->GetSourceLocation(source_location);

// if (field_descriptor->GetSourceLocation(source_location)) {
std::cout << "start_line: " << source_location->start_line << std::endl;
std::cout << "end_line: " << source_location->end_line << std::endl;
std::cout << "start_column: " << source_location->start_column << std::endl;
std::cout << "end_column: " << source_location->end_column << std::endl;
std::cout << "leading_comments: " << source_location->leading_comments
          << std::endl;
std::cout << "trailing_comments: " << source_location->trailing_comments
          << std::endl;
// }

I've tried the following two comment syntaxes in the .proto file, but neither of them seems to work:

MessageHeader header = 1;  // The header of this `Message`.

/**
 * The header of this `Message`.
 */
MessageHeader header = 1;

I am using GCC 4.7.1 (with C++11 support enabled) and the latest Protocol Buffers version 3.0.0-alpha-4.1.

Can someone point me in the right direction and/or provide a working example?

EDIT 2015-09-24:

After reading the Self Describing Messages section in the official documentation and testing lots of stuff, I think I have a somewhat better understanding of protobuf descriptors.

Correct me if one or more of the following statements are incorrect:

  1. The SelfDescribingMessage proto is only useful if the other end does not know the .proto definitions.
  2. The only way to access the comments of the proto definition is by creating a .desc file using the protoc application.
  3. To obtain a comment, the GetSourceLocation member function can only be used if the "top" element is either a FileDescriptorSet, a FileDescriptorProto or a FileDescriptor. If this is correct, Protocol Buffers has a poor API design, since the google::protobuf::Message class is a God Class (providing access to the complete file descriptor API, but the values are not provided at all).
  4. The call concrete_message.descriptor()->file() does not (and can not) contain source comments information, since it is not part of the compiled code.

It seems to me that the only way to make this work is:

  1. Invoke protoc for the Message.proto file (which references all other messages) with the arguments:

    --include_imports --include_source_info and --descriptor_set_out=message.desc
    
  2. Ship the message.desc file together with the application/library to be able to read it during runtime (see below).

  3. Create a google::protobuf::FileDescriptorSet from that file.
  4. Iterate over all google::protobuf::FileDescriptorProto of the FileDescriptorSet.
  5. Convert each FileDescriptorProto to a google::protobuf::FileDescriptor using google::protobuf::DescriptorPool::BuildFile().
  6. Lookup the message and/or fields with one of the Find… functions, applied on the FileDescriptor instance.
  7. Call the function GetSourceLocation on the message/field descriptor instance.
  8. Read the comments via google::protobuf::SourceLocation::leading_comments and google::protobuf::SourceLocation::trailing_comments.

This seems pretty complicated to me, so I have two additional questions:

  1. Isn't there a way to include the source information without using a FileDescriptorSet?
  2. Is it possible to "connect"/set the FileDescriptorSet with a concrete Message class/instance, since that would drastically simplify things?

EDIT 2015-09-25: By God Class I mean that the Message class and/or the descriptor classes offer public functions that are more or less useless, since they provide no information when used by a client. Take a "normal" message, for example: the generated code does not contain source comment information, so the GetSourceLocation method in all descriptor classes (e.g. Descriptor and FieldDescriptor) is completely useless. From a logical perspective, separate classes DescriptorLite and FieldDescriptorLite should be provided when dealing with messages, and Descriptor and FieldDescriptor when dealing with information from a FileDescriptorSet (whose source is normally a .desc file generated from a .proto file). A [...]Lite class would then be the parent class of a "normal" class. The argument that protoc will possibly never include source comments underlines my point.

By "connecting", I mean an API function to update the descriptor information in the message with the descriptor information from the .desc file (which is always a superset of the descriptors provided by the message, if I understood correctly).

Smithson answered 23/9, 2015 at 14:40 Comment(1)
Is it possible that it is related to the Protocol Buffers Compiler protoc? I've just stumbled upon the protoc arguments -o and --include_source_info. Do I have to create a FileDescriptorSet in order to retrieve comments? – Smithson

It sounds like you've basically figured it out.

You're getting deep into APIs inside the protocol compiler which were not really designed for public consumption. It gets complicated because no one has written a helper layer to simplify things, because not many people use these features.

I am not sure what you mean about Message being a "God class". Message is merely the abstract interface for a protobuf instance. Descriptors describe types of protobuf instances. Message::GetDescriptor() returns the type of a message, but beyond that there isn't much direct connection between these APIs...

Isn't there a way to include the source information without using a FileDescriptorSet?

The comments are intentionally stripped from the descriptors embedded into the generated code, so you need to run the parser separately, generate a descriptor set, and consume it dynamically.

Is it possible to "connect"/set the FileDescriptorSet with a concrete Message class/instance, since that would drastically simplify things?

Do you mean that you want Message::GetDescriptor() to return a descriptor that includes the comment data from the source file? That would require that the comment data be embedded into generated code, which would be trivial for protoc to implement (it currently intentionally strips them out, so it would just have to not do that) but potentially bloat-y and dangerous (could reveal secrets for people shipping closed-source binaries built with protobufs).

Decalogue answered 25/9, 2015 at 5:40 Comment(2)
Thank you for your comment and the explanation! I've updated my question again for clarification. I will add a working example later, maybe someone can review it then. – Smithson
@FlorianWolters Thanks for the clarification. I disagree with the suggestion, though: source locations are an obscure, rarely-used part of the descriptor interface. It doesn't make sense to add complexity to the class hierarchy specifically to distinguish descriptors with and without source info. Plus, some Message subclasses (like DynamicMessage) could actually have source info in their descriptors. (Of course, I don't work on protobufs anymore so it's not my call in any case.) – Decalogue
