How do Java, C++, C#, etc. get around this particular syntactic ambiguity with < and >?
Asked Answered
R

2

29

I used to think C++ was the "weird" one with all the ambiguities with < and >, but after trying to implement a parser I think I found an example which breaks just about every language that uses < and > for generic types:

f(g<h, i>(j));

This could be syntactically either interpreted as a generic method call (g), or it could be interpreted as giving f the results of two comparisons.

How do such languages (especially Java, which I thought was supposed to be LALR(1)-parsable?) get around this syntactic ambiguity?

I just can't imagine any non-hacky/context-free way of dealing with this, and I'm baffled at how any such language can be context-free, let alone LALR(1)-parsable...

(It's worth noting that even a GLR parser can't return a single parse for this statement with no context!!)

Renayrenckens answered 19/1, 2013 at 8:29 Comment(10)
+1 great question, Just a note, the java compiler has plenty of heuristics to resolve ambiguity, it guesses plenty. They themselves admit that their generic system is broken and there are cases it fails (although they are rate), look at the java 8 features presentation they explain some of the problems and how they'll address them in java 8.Highpitched
@BenjaminGruenbaum: Thanks. :) Regarding heuristics -- they're definitely great for generating good compiler error messages, but unless they're part of the language spec's grammar, they can't be legally used to resolve ambiguities, can they? And if they are part of the grammar, then how can the grammar be context-free (let alone LR(1)-parsable)?Renayrenckens
its not amibugous, it depends on what g is declared asHesiod
@NimChimpsky: Yeah, that's the point -- it's syntactically ambiguous (since it depends on the context), but not semantically ambiguous (since, given the context, it's unambiguous).Renayrenckens
Could this be part of why scala went with []?Postaxial
@KarthikT: Huh, maybe... I don't know Scala well at all unfortunately but that sounds very plausible.Renayrenckens
@NimChimpsky: Also, I just noticed, your name is surprisingly similar to Chomsky! :)Renayrenckens
@Mehrdad lol, its the name of a talking monkey, named after noam.Hesiod
@Mehrdad. It's definitely why D went with !(). dlang.org/templates-revisited.html ("D solves it by noticing that ! is not used as a binary operator")Ikkela
The point of GLR is to give you all the valid syntactic parses of your context free language. It would have no trouble parsing your example, and would produce both parses. To use this result, you'd have to decide which of the ambiguous choices made sense, using something other than this particular form (e.g., actual type information about "g" is). That's how we solve with our tools, and it nicely isolates parsing from name and type resolution. As other have noted, Java was LALR(1) in a galaxy long, long ago, and far far away.Uproot
A
4

a generic method call in java would be <h,i>g(j) so there is no ambiguity :)

Annates answered 19/1, 2013 at 16:25 Comment(2)
Wow, you're right... I thought it would be g<h, i>(j) but apparently it isn't! +1 Great answer, thanks :)Renayrenckens
That was my reaction, too. I think it make it easier for the parsers, and I have to admit it is a really ugly hack. Langauge designers shouldn't listen to the compiler guys complaining when they design; you get stuff like this.Uproot
I
2

I just can't imagine any non-hacky/context-free way of dealing with this, and I'm baffled at how any such language can be context-free, let alone LALR(1)-parsable...

The answer is that they aren't (at least not Java and C++; I know very little about C#). The Java grammar that you link to dates back to 1996, way before generics have been introduced.

For further discussion, see Are C# and Java Grammars LALR(x)?

Irremediable answered 19/1, 2013 at 9:13 Comment(1)
Not even C is fully context-free since the use of a symbol can represent either an identifier or a type name, depending on declarations that can be separated from the use by an arbitrary amount of code.Rigdon

© 2022 - 2024 — McMap. All rights reserved.