Is Vala a sane language to parse, compared to C++?

The problems with parsing C++ are well known. It can't be parsed purely based on syntax, it can't be parsed as LALR (whatever the exact term is; I'm not a language theorist), the language spec is a zillion pages, etc. For that and other reasons I'm deciding on an alternative language for my personal projects.

Vala looks like a good language. Although it provides many improvements over C++, is it just as troublesome to parse? Or does it have a neat formal grammar of reasonable length, or some other logical description, suitable for building parsers for compilers, source analyzers and other tools?

Whatever the answer, does it also apply to Genie, Vala's alternative syntax?

(I also wonder, albeit less intensely, about D and other post-C++ non-VM languages.)

Bullet asked 27/11, 2010 at 4:45 Comment(12)
Features are a good reason to choose a specific language for a project, but why does it matter how difficult that language is to parse? (Unless your personal project is writing a compiler for said language.) On that note, C++ does not have an LR(1) grammar, as Java and C# do, and can potentially require infinite lookahead. – Counselor
Vala 'should' be saner than C++. I know Java is REALLY sane: I've used a Java parser written in Java and generated by a compiler-compiler directly from the grammar in EBNF. – Levan
Err... it can be parsed based on syntax, at least insofar as any language could be considered that way (of course, things like identifiers are technically context-sensitive, but that is common across most languages). Yes, the grammar is not LALR(1), but it's of course parsable. On the other hand, difficulty of parsing really shouldn't be your main criterion for choosing a language; there's a lot to be said for a language's popularity (and therefore the ease of acquiring libraries and such). – Sackcloth
Java is way too "sane" for my taste! – Bullet
My "personal projects" include tools to analyze source code. I've got various crazy ideas in the back of my mind, in which parsing or otherwise analyzing source code plays a part. – Bullet
Parsing is easy. People who want to build programming tools seem to focus on parsing as the problem. It is only one of many problems, and in fact the easiest, since it is well solved. The harder part is acquiring the language semantics and doing something with them. I characterize this as the "climbing Everest" problem: parsing gets you to the foothills, and that step is fairly easy. Going from the foothills to the peak requires a whole new class of technology, engineering and sweat (see my bio for what that might look like). – Forgave
Parsing C++ is in fact undecidable: #794515 – Lobster
For those wondering why parsability might be important, you might enjoy "The Role of the Study of Programming Languages in the Education of a Programmer" by Daniel Friedman (see cs.indiana.edu/~dfried). – Sayer
@jleedev: You seem to be casting FUD. The SO question you mention is more nuanced than your short comment implies. As a practical matter, parsing C++ is decidable, or programmers would stop using C++ compilers. The confusion over local ambiguity vs. templates is a non-issue. Local ambiguity of certain phrases is easily resolved with standard parser technology (I use GLR in our parser and that works fine). Templates computing halting predicates are different, but in practice nobody writes those anyway, and compilers put finite limits on template processing to boot. – Forgave
@Ira My apologies. I am actually curious about IntelliSense, to see what ambiguities it can handle and what it might choke on. – Lobster
@jleedev: If the issue is "what syntax can be predicted to come next", ambiguity in the grammar doesn't matter, because you get the same sequence of surface characters. So a parser that can handle ambiguity in C++ could provide fine IntelliSense-like hints about what can come next. I don't know how IntelliSense actually works, but I hear it has the Edison Design Group parser behind it; that should be good enough to handle the local ambiguity if the left context contains sufficient type information. – Forgave
IntelliSense merely needs to know what names can follow -> or . (dot). This means it has to resolve the name on the left of the -> or dot. In this specific context no lookahead is needed at all, let alone infinite lookahead: the type of the expression preceding -> is entirely independent of the token following ->. – Reluctant
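
The "local ambiguity" discussed in these comments is easiest to see in the classic statement 'a * b;', which declares b as a pointer to type a if a names a type, and multiplies a by b otherwise. A GLR parser, as mentioned above, can keep both readings and let a later semantic pass discard the one that contradicts the symbol table. The Python below is only a toy sketch of that deferred disambiguation, not a real GLR parser: the single 'X * Y ;' statement shape, the tuple parse results and the type_names set are hypothetical simplifications.

```python
# Toy illustration of deferred (GLR-style) disambiguation: the "parser" keeps
# every syntactically valid reading, and a later semantic pass uses the set of
# known type names to pick one. Only the hypothetical shape "X * Y ;" is handled.

def parse_star_statement(tokens):
    """Return all syntactically valid readings of an 'X * Y ;' token list."""
    if len(tokens) != 4 or tokens[1] != "*" or tokens[3] != ";":
        raise ValueError("only 'X * Y ;' is handled in this sketch")
    left, right = tokens[0], tokens[2]
    return [
        ("declaration", {"type": left, "pointer_name": right}),
        ("expression", {"op": "*", "lhs": left, "rhs": right}),
    ]


def disambiguate(readings, type_names):
    """Keep only the readings consistent with the names known to be types."""
    kept = []
    for kind, node in readings:
        if kind == "declaration" and node["type"] in type_names:
            kept.append((kind, node))
        elif kind == "expression" and node["lhs"] not in type_names:
            kept.append((kind, node))
    return kept


if __name__ == "__main__":
    readings = parse_star_statement(["a", "*", "b", ";"])
    print(disambiguate(readings, type_names={"a"}))  # declaration reading only
    print(disambiguate(readings, type_names=set()))  # expression reading only
```

Deferring the decision this way keeps the grammar itself context-free; the cost is that the parser has to carry several candidate trees until the type information is available.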

C++ is one of the most complex (if not the most complex) programming languages to parse in common use. Its name lookup rules and template instantiation rules are particularly difficult. C++ cannot be parsed by an LALR(1) parser (such as those generated by Bison and Yacc), but it is by all means parsable (after all, people use parsers that have no problem parsing C++ every day). (Early versions of G++ were in fact built on top of Bison, though not on its Generalized LR parser framework as I originally wrote; see comments. That parser was later replaced with a hand-written recursive descent parser.)

That said, I'm not sure I see what "improvements" Vala offers over C++; the two languages appear to be attempting to accomplish the same goals. Also, you're probably not going to find much outside of GTK+ with Vala interfaces. You're going to be using C interfaces to everything else, which really defeats the point of using such a language.

If you don't like C++ due to its complexity, it might be a good idea to consider Objective-C instead, because it is a simple extension of C (like Vala), but has a much larger community of programmers for you to draw upon, given that it is the foundation for everything in Mac land.

Finally, I don't see what the difficulty of parsing the language itself has to do with what a programmer should care about in order to use the language. Just my 2 cents.

Sackcloth answered 27/11, 2010 at 5:25 Comment(8)
+1 for that last part; parsability shouldn't even be a concern for the average developer. – Turney
I doubt that g++ was based on Bison's Generalized LR parser infrastructure: that was only added in version 1.5, in 2002. AFAIK, the pre-3.4 parser was LALR-based, with more or less clean hacks to make it handle C++. – Transcendentalistic
@AProgrammer: I'm afraid I don't understand. C++ simply isn't parsable using LALR. How could one "hack" in support for that? – Sackcloth
The early versions of GCC parsed C++ using Bison operating as a pure LALR(1) parser, with some pretty awful hacks involving symbol tables and type information to resolve the ambiguities. (See my answer #243883 for more details on how this works.) Newer versions use a hand-written recursive descent parser... and, I think, the same essential hacks to avoid lookahead. – Forgave
As far as "awful languages to parse" go, lots of them get my vote: PHP for sheer poor definition; mainframe Natural for having keywords everywhere that can sometimes be identifiers and sometimes not; and if you want to go nuts, try parsing dynamic HTML in which people mix fragments of JavaScript with HTML chunks and with code in some third language (C#, JSP, ...) that manufactures bits of JavaScript. (If you want to analyze an HTML page, you need to do stuff like this.) – Forgave
@Billy, I've not looked at the old g++ parser, but I've worked on programs using hacks to make an LALR parser generator handle a language that isn't LALR. Those involve playing with the tokenizer: having it return different things for the same text depending on the context, having it return dummy tokens that come from the parser rather than from the input, and using it to implement backtracking (refeeding the same input after having changed the context so it is tokenized differently). Once you start doing that, you aren't really using a parser generator but a strange PL, and anything is possible. – Transcendentalistic
Also, Vala is a nice language to learn because it presents C using idioms and syntax familiar to Java and .NET programmers. Coming from most languages, I would expect Objective-C syntax to seem bizarre. – Embarkation
+1 for the Objective-C reference; people tend to forget it does actually exist outside the Mac ecosystem. – Lemoine
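
The "hacks involving symbol tables" described in these comments are commonly known as the lexer hack: the tokenizer consults a symbol table that the parser keeps up to date, so the same identifier text comes out either as a type-name token or as an ordinary identifier token, and an LALR(1) grammar can then tell the two readings of 'a * b;' apart without extra lookahead. The Python below is a minimal sketch under that assumption; the TYPE_NAME/IDENTIFIER token kinds, the Lexer class and classify_star_statement are hypothetical names, and real compilers handle far more than one statement shape.

```python
import re

# Toy "lexer hack": identifiers are classified using a set of type names that a
# (hypothetical) parser updates whenever it reduces a type declaration, so the
# same identifier text can yield different token kinds in different contexts.

TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\S")  # identifiers or single non-space chars


class Lexer:
    def __init__(self, type_names):
        # Shared with the parser, not copied, so later additions are visible.
        self.type_names = type_names

    def tokenize(self, text):
        tokens = []
        for lexeme in TOKEN_PATTERN.findall(text):
            if lexeme.isidentifier():
                kind = "TYPE_NAME" if lexeme in self.type_names else "IDENTIFIER"
            else:
                kind = "PUNCT"
            tokens.append((kind, lexeme))
        return tokens


def classify_star_statement(tokens):
    """Classify 'X * Y ;' from token kinds alone; no extra lookahead needed."""
    if len(tokens) != 4 or tokens[1] != ("PUNCT", "*") or tokens[3] != ("PUNCT", ";"):
        return "unknown"
    first, second = tokens[0][0], tokens[2][0]
    if first == "TYPE_NAME" and second == "IDENTIFIER":
        return "declaration"
    if first == "IDENTIFIER" and second == "IDENTIFIER":
        return "expression"
    return "unknown"


if __name__ == "__main__":
    type_names = set()
    lexer = Lexer(type_names)
    print(classify_star_statement(lexer.tokenize("a * b;")))  # expression
    type_names.add("a")  # as if the parser had just reduced "typedef int a;"
    print(classify_star_statement(lexer.tokenize("a * b;")))  # declaration
```

Unlike the deferred GLR-style approach, the decision here is made immediately at tokenization time, which is why the symbol table has to be updated before the next statement is lexed.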

It's pretty simple. You can use libvala to do parsing, semantic analysis and code generation instead of writing your own.

Anikaanil answered 8/12, 2010 at 11:12 Comment(0)
