Why were required and optional removed in Protocol Buffers 3?

4

380

I've recently started using gRPC with proto3, and I've noticed that required and optional have been removed in the new syntax.

Could anyone explain why required/optional were removed in proto3? Constraints of this kind seem necessary to make a definition robust.

proto2 syntax:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}

proto3 syntax:

syntax = "proto3";
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}
Shoshana answered 4/8, 2015 at 5:23 Comment(0)
645

The usefulness of required has been at the heart of many a debate and flame war. Large camps have existed on both sides. One camp liked guaranteeing that a value was present and was willing to live with the limitations, while the other camp felt required was dangerous or unhelpful because it can be neither safely added nor safely removed.

Let me explain more of the reasoning why required fields should be used sparingly. If you are already using a proto, you can't add a required field, because old applications won't be providing that field and applications in general don't handle the failure well. You can make sure that all old applications are upgraded first, but it is easy to make a mistake, and it doesn't help if you are storing the protos in any datastore (even a short-lived one, like memcached). The same sort of situation applies when removing a required field.

Many required fields were "obviously" required until... they weren't. Let's say you have an id field for a Get method. That is obviously required. Except, later you might need to change the id from int to string, or int32 to int64. That requires adding a new muchBetterId field, and now you are left with the old id field that must be specified, but eventually is completely ignored.
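To make the id example concrete, here is a minimal proto2 sketch. The message name GetRequest and the snake_case spelling much_better_id are my own assumptions for illustration; only the idea of an old required id plus a newer replacement field comes from the paragraph above.

```proto
syntax = "proto2";

// Hypothetical request message for a Get method (names are illustrative).
message GetRequest {
  // Cannot be dropped: readers built against this schema reject any
  // message that lacks it, so writers must keep sending a value even
  // after newer readers have stopped looking at it.
  required int32 id = 1;

  // The replacement identifier. It has to be optional, because writers
  // built before this field existed never set it.
  optional string much_better_id = 2;
}
```

The old field ends up as dead weight that every writer still has to fill in, which is the situation the answer describes.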

When those two problems are combined, the number of beneficial required fields becomes limited and the camps argue over whether it still has value. The opponents of required weren't necessarily against the idea, but against its current form. Some suggested developing a more expressive validation library that could check required along with something more advanced like name.length > 10, while also making sure to have a better failure model.

Proto3 overall seems to favor simplicity, and removing required is simpler. Perhaps more convincingly, removing required made sense for proto3 when combined with other changes, like the removal of field presence for primitives and the removal of overridable default values.
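As a rough illustration of the two proto2 features mentioned here, the sketch below reuses the SearchRequest message from the question. The default of 10 is an assumption I added for the example, not something from the original schema.

```proto
syntax = "proto2";

message SearchRequest {
  required string query = 1;

  // Field presence: proto2 generates a "has" accessor for this field,
  // so a reader can tell "unset" apart from "explicitly set to 0".
  optional int32 page_number = 2;

  // Overridable default: when the field is unset, the getter returns 10
  // instead of 0. The value 10 is illustrative only.
  optional int32 result_per_page = 3 [default = 10];
}
```

Under syntax = "proto3", protoc rejects the [default = ...] option, and a plain int32 field gets no "has" accessor, so both of these distinctions disappear for primitives.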

I'm not a protobuf developer and am in no way authoritative on the subject, but I still hope that the explanation is useful.

Tetrabasic answered 4/8, 2015 at 16:49 Comment(17)
Yep. See also this extended explanation of things that can go horribly wrong with required fields: capnproto.org/… – Fusil
I don't know whether using 'required' can cause any trouble, but the removal of 'optional' and 'has_field' most certainly does. Even in the simplest case of a bool where both possible values are meaningful, this forces you to reinvent the wheel or add more params to your API, instead of just relying on the infrastructure layer. – Glimpse
Optional isn't removed; everything is optional in proto3. But yes, field presence (has_field) has been removed for primitives. If you need field presence, use wrappers.proto, which has messages like StringValue. Since they are messages, has_field is available. This is effectively "boxing", which is common in many languages. – Tetrabasic
On the contrary, it seems like "optional" was removed in proto3. Every field exists, and is filled in with a default value. You have no way of knowing if the primitive field was filled in by the user, or by default. Message fields, which are basically pointers, are optional, in that they can have a null value. – Suited
Proto2 would default to 0, if not overridden, for primitives when calling the getter. This behaves the same in proto2 and proto3 (in most languages). The optional keyword was removed in proto3, but every field behaves like an optional field in proto2. – Tetrabasic
@Suited "Every field exists" -- but consider: does it exist when serialized? – Erato
int32, uint32, int64, uint64, and bool are all compatible – this means you can change a field from one of these types to another without breaking forwards- or backwards-compatibility. If a number is parsed from the wire which doesn't fit in the corresponding type, you will get the same effect as if you had cast the number to that type in C++ (e.g. if a 64-bit number is read as an int32, it will be truncated to 32 bits). developers.google.com/protocol-buffers/docs/proto#updating – Rosariarosario
@FabianHertwig, int32, uint32, int64, uint64, and bool are wire-compatible, but they aren't necessarily API-compatible. Changing the .proto may cause applications to fail to compile or break at runtime. Thus, unless you have tight control over all the usages and there are few of them, it may be better to introduce a new field instead of changing an existing one. – Tetrabasic
I would love to mark a field as required when creating a new message but not when parsing one. Say I add a new field: I then want to make sure all the code actually sets that new field, while messages from old code or storage can get the default value. – Secretarygeneral
I feel like protobuf is a language designed expressly to start flame wars. – Analogize
Seems like most people don't want to version their APIs. It's easier for them to make everything optional for "backward compatibility". – Havelock
Yeah, it seems that the underlying protocol shouldn't have this kind of validation, which should happen on the server side. – Trafficator
This is all well and good, but how do I tell whether a field was not provided or received the default? For things that are zero-indexed, it's a bit strange that, if an API call fails to provide an index, that index defaults to zero, with absolutely no way to tell it was not provided. – Sachi
Primitives whose value is 0 might not be sent, even if explicitly set to 0 by the user. "Is it present" is no longer a semantically valid question for primitives, and the system does not track or communicate it, on the wire or in the API. That said, opt-in primitive field presence is in the process of being added to proto3. See the release notes for protobuf 3.12. – Tetrabasic
@EricAnderson so primitives can be unboxed (falling back to the default value) or boxed (falling back to null / None); class properties are boxed by nature, and clients aren't even required to specify them (meaning client compilation will succeed after updating to a new version). How can I, as an API developer, express the pb/grpc contract in a way that tells new clients to provide a value (say, by generating constructors for such required fields) while letting old clients keep working without providing that property? In other words, how do I break the API but maintain ABI compatibility? – Warlike
It would be better to let the user decide whether required is useful or not. – Elide
@Havelock depending on how fast your API evolves, adding a new message version each time a required field is added/removed may soon result in a nightmare. OTOH, if you want to check for required fields with proto3, you can still do it yourself after deserialization. – Raper
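For the wrappers.proto "boxing" approach mentioned in the comments above, a minimal proto3 sketch might look like the following. It reuses the SearchRequest message from the question; which fields to wrap is my own choice for illustration.

```proto
syntax = "proto3";

import "google/protobuf/wrappers.proto";

message SearchRequest {
  string query = 1;

  // Wrapped: this is a message-typed field, so generated code can
  // report whether it was set at all, even when the inner value is 0.
  google.protobuf.Int32Value page_number = 2;

  // Plain scalar: an unset field and an explicit 0 look the same
  // to the reader.
  int32 result_per_page = 3;
}
```

Note that an int32 and an Int32Value use different wire types, so swapping one for the other on an existing field number is not a compatible change. This fits new fields better than in-place migration.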
92

You can find the explanation in this protobuf GitHub issue:

We dropped required fields in proto3 because required fields are generally considered harmful and violating protobuf's compatibility semantics. The whole idea of using protobuf is that it allows you to add/remove fields from your protocol definition while still being fully forward/backward compatible with newer/older binaries. Required fields break this though. You can never safely add a required field to a .proto definition, nor can you safely remove an existing required field because both of these actions break wire compatibility. For example, if you add a required field to a .proto definition, binaries built with the new definition won't be able to parse data serialized using the old definition because the required field is not present in old data. In a complex system where .proto definitions are shared widely across many different components of the system, adding/removing required fields could easily bring down multiple parts of the system. We have seen production issues caused by this multiple times and it's pretty much banned everywhere inside Google for anyone to add/remove required fields. For this reason we completely removed required fields in proto3.

After the removal of "required", "optional" is just redundant so we removed "optional" as well.

Womble answered 17/9, 2018 at 2:56 Comment(13)
I don't get it; what's the difference between dropping a message after deserializing and on deserialization? It will get dropped by the older client since it does not contain a field that is needed (e.g. id). – Faveolate
I'm inclined to agree with @ShmuelH. Required fields are going to be a part of an API one way or another. Whether that's supported automatically through syntax given to both parties, or hidden in the backend, it's still there. May as well make it visible in the API definition. – Kurzawa
The problem is you're confusing gRPC with protobuf. "Required fields are going to be a part of an API one way or another" makes sense only for a gRPC API and not for protobuf in general. API messaging is not the only goal for Ser-Des. – Sergei
I totally agree with @ShmuelH. Fields are required in an API one way or another, and it's useful for the customer to know this. This makes me think we just haven't gotten versioning right yet. – Kovar
Another vote for @ShmuelH. If you change your API in a backwards-incompatible way (adding a required field), then surely you want your parser to detect that? Version your APIs! You can even do it completely in Protobuf if you want, using oneof { MessageV1, MessageV2, etc. }. – Uranology
It could not justify having required fields initially. And adding a required field is an incompatible change and usually should be handled by a protocol version change (i.e. a new message type). – Timmy
@ShmuelH. if you think in the context of multiple services it will make more sense. For example, if Amazon had a service that returns a list of products and another service for updating products, you might use the same Product message in both services. For fetching products to render on the home page, the client doesn't need to know the ownerId. For a client that sends a product payload to update, the service may require that the ownerId is present. Each independent service can decide which fields it requires without needing to enforce that at the API level for every client. – Mcinerney
@Kovar Versioning won't help. Backwards compatibility is about existing code, but clients with old code can also have new code added that might want to use new fields without needing to update old code that doesn't need the field. You could add a new field that should be required for one client and hidden from others. After that you could add an even newer field that you want all clients to see. In this case you'd want client code to use fields across three iterations (versions) of the schema at the same time, not just a single version. – Mcinerney
It really boils down to the fact that required/optional forces the bundling of two concerns; the first is serialization and the second is application data validation. This bundling leads to the problems described, meaning that you really want validation to happen somewhere else. It is an error to have the serialization layer include this validation, though ideally you want it to be easily interoperable. – Heid
Good writeup concerning this from a comment in the accepted answer: capnproto.org/… – Panada
@Uranology If you have an application a little more complex than HelloWorld, then you will probably need more advanced validation, and so you had better separate that from parsing. If you have an API evolving fast, then introducing a new version each time you add or remove a required field will soon become a nightmare. For details see the link in the comment above. – Raper
@AlexChe It's best to do as much validation as possible as early and as automatically as possible. This is the sort of validation that the parser should be able to do. You likely need additional validation in most applications, but that's fine. It's particularly tedious to not have the parser handle optionality in languages that encode optionality in the type system, e.g. Rust, because everything becomes Option<>. – Uranology
I agree my proposed solution isn't the best. I think the best solution would be to support field versioning natively, but I have yet to see a message format actually try that. – Uranology
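The oneof-based versioning suggested in the comments above could be sketched roughly as follows. All message and field names here are hypothetical (loosely based on the Product/ownerId example from the same thread), and this is only one way to lay it out.

```proto
syntax = "proto3";

// Hypothetical first version of the payload.
message ProductV1 {
  string name = 1;
}

// Hypothetical second version, which adds a field the service
// now expects callers to provide.
message ProductV2 {
  string name = 1;
  string owner_id = 2;
}

// Envelope: exactly one version is set, and the reader can
// branch on which one it received.
message ProductEnvelope {
  oneof version {
    ProductV1 v1 = 1;
    ProductV2 v2 = 2;
  }
}
```

Even with this layout, proto3 itself does not enforce that owner_id is non-empty in a ProductV2, so that check still has to live in application code after parsing.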
39

Optional fields were reintroduced in protobuf 3.15.

Armelda answered 17/3, 2021 at 13:10 Comment(3)
If everything is optional, then what is the use of bringing back "optional" in the said version? – Witter
@SubinSebastian see github.com/protocolbuffers/protobuf/blob/master/docs/… – Megaphone
@SubinSebastian with optional you get the ability to explicitly check whether a field is set. Let's say you have a field int32 confidence. Currently, when receiving a message with such a field, you cannot tell the difference between confidence = 0 and confidence not set, because default values are optimized away in the serialization. If you mark the field as optional, an explicitly set value is serialized even when it equals the default, and a has_confidence() method is generated so that the receiving end can disambiguate the two. – Canescent
6

Because orthogonal factoring of related concepts is hard and the Protocol Buffers design combined at least 4 separate concepts in a way that became frustrating: Nullability (a.k.a. Presence Tracking), Content Validation, Useful Defaults, and Space Efficiency.

Proto2 allowed a field to be either 'required' or 'optional' and allowed specifying default values, but only for 'optional' fields.

The 'optional' keyword came back in 3.12/3.15 to be used for Presence Tracking. See the app note on Field Presence.
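A minimal sketch of what that looks like in a proto3 file, assuming protoc 3.15 or later (3.12 through 3.14 needed the --experimental_allow_proto3_optional flag). The message name and the second field are hypothetical; confidence is borrowed from the comment on the earlier answer.

```proto
syntax = "proto3";

// Hypothetical message, for illustration only.
message Prediction {
  // Explicit presence: generated code exposes a presence check
  // (has_confidence() in C++, HasField("confidence") in Python),
  // and an explicitly set 0 is still written to the wire.
  optional int32 confidence = 1;

  // Implicit presence: a value of 0 is not serialized, so the
  // reader cannot tell "unset" from "set to 0".
  int32 sample_count = 2;
}
```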

The 'required' keyword had been applied as a validation check and that was just unfortunate because Protobuf isn't up to the task of being a validation tool. It doesn't have min/max value syntax, length, or pattern restrictions. More importantly it has nothing for expressing data value dependencies between fields (apart from relationships intrinsic to basic structure).

Protobuf uses 'default' values for initial construction values and as a mechanism to reduce the size of messages by not sending those values (in proto3). That is a bit unfortunate because being able to specify an initial value for a particular code generation run would be handy for either a producer ("this is the value I want to send in my normal usage") or a consumer ("maybe I should check for null/not present, but falling back as if 'x' was sent will be good enough for now").

OTOH, it does seem like it would be helpful to specify a contractual default assumption for how to proceed if a value is not supplied. However, doing a good job of that often depends on the values (or presence) of other fields, and Protobuf isn't up to that for the same (arguably beautiful) lack of complexity that makes it unsuitable for data validation. So if you want good behavior in the face of missing fields, you are best off combining explicit presence checks with whatever other data checks are appropriate. At least from 3.12/3.15 onward you can, and maybe 0-ish values are good enough for the simplistic cases.

Erythrism answered 15/10, 2022 at 0:36 Comment(0)
