chomsky hierarchy and programming languages
G

3

18

I'm trying to learn some aspects of the Chomsky Hierarchy which are related to programming languages, and i still have to read the Dragon Book.

I've read that most programming languages can be parsed as a context free grammar (CFG). In term of computational power, it equals the one of a pushdown non deterministic automaton. Am I right?

If it's true, then how could a CFG hold an unrestricted grammar (UG), which is turing complete? I'm asking because, even if programming languages are described by CFGs, they are actually used to describe turing machines, and so via an UG.

I think that's because of at least two different levels of computing, the first, which is the parsing of a CFG focuses on the syntax related to the structure ( representation ? ) of the language, while the other focuses on the semantic ( sense, interpretation of the data itself ? ) related to the capabilities of the programming language which is turing complete. Again, are these assumptions right?

Grimmett answered 28/5, 2010 at 13:47 Comment(0)
W
28

I've read that most programming languages can be parsed as a context free grammar (CFG). In term of computational power, it equals the one of a pushdown non deterministic automaton. Am I right?

Technically yes. Usefully, no.

There are at least two useful ways to think about these questions:

  • If you're thinking of a set of strings, you have a language.
  • If you're thinking about an algorithm to decide whether a string is or is not in a language, you have a decision problem.

The difficulty is that while most programming languages have an underlying structure that is easily described by a context-free grammar (Tcl being an interesting exception), many sentences that are described by the context-free grammar are not actually "in the language," where by "in the language" I mean "a valid program in the language in question." These extra sentences are usually ruled out by some form of static semantics. For example, the following utterance is a sentence in the context-free grammar of C programs but it is not itself in the set of valid C programs:

int f(void) { return n + 1; }

The problem here is that n is not in scope. C requires "declaration before use", and that property cannot be expressed using a context-free grammar.

A typical decision procedure for a real programming language is part of the front end of a compiler or interpreter, and it has at least two parts: one, the parser, is equivalent in decision power to a pushdown automaton; but the second does additional checks which rule out many utterances as invalid. If these checks require any kind of definition-before-use property, they can't be done by a pushdown automaton or context-free grammar.

If it's true, then how could a CFG hold an unrestricted grammar (UG), which is turing complete?

A CFG doesn't "hold" anything—it simply describes a language.

... even if programming languages are described by CFGs, they are actually used to describe turing machines, and so via an UG.

You're skipping past some important levels of indirection here.

I think that's because of at least two different levels of computing, the first, which is the parsing of a CFG focuses on the syntax related to the structure ( representation ? ) of the language, while the other focuses on the semantic ( sense, interpretation of the data itself ? ) related to the capabilities of the programming language which is turing complete. Again, are these assumptions right?

They seem a bit muddled to me, but you're on the right track. A key question is "what's the difference between a language and a programming language?" The answer is that a programming language has a computational interpretation. Computational interpretations come in many fine varieties, and not all of them are Turing-complete. But the magic is in the interpretation, not in the syntax, so the Chomsky hierarchy is not very relevant here.

To prove my point, an extreme example: the regular language [1-9][0-9]* is Turing-complete under the following interpretation:

  • The SK-combinator language is Turing complete.
  • There are only countably many SK programs.
  • They can easily be enumerated uniquely and deterministically.
  • Therefore we can associate each positive integer with an SK program.
  • If we interpret a sequence of digits as a positive integer in the standard way, we can equally well interpret the same sequence of digits as an SK program, and moreover, any SK program can be expressed using a finite sequence of digits.

Therefore the language of integer literals is Turing-complete.

If your head doesn't hurt now, it should.

Wiencke answered 1/6, 2010 at 1:4 Comment(6)
FYI, you can do a BNF for Tcl. It's just less informative than in most languages because the usual recursive terms (if, while, program blocks in general) are defined entirely at the semantic level. That is, they're standard library functions, nothing more. (The flip side of this is that it is really easy to embed foreign syntaxes inside Tcl programs, provided they're parenthetically balanced. Virtually everything is…)Priddy
@Donal: Yes, except any program can add arbitrary new productions to the "grammar", dynamically. Having a parser is not much use in practice---you can't really analyze a Tcl program---and Tcl doesn't have much of a parser. But embedding strangeness is indeed very, very easy.Wiencke
Thx a lot ! It was the kind of response i was looking for. Not sure that everything about this is clear, but it's clearer. And I think I got the point, "the magic is in the interpretation, not in the syntax".Grimmett
Ok, everything is clear now. the SK-combinator programs mapped to integers is such a powerful example, I guess this is more or less a Godel numbering. thank you Ramsey !Grimmett
"A key question is "what's the difference between a language and a programming language?" The answer is that a programming language has a computational interpretation." -- That is what linguists believe about language, but it's wrong. Sentences in a spoken language don't just generate a parse tree or a data structure. They have a computational interpretation, or they would have no effect on their hearers. This is obviously true for imperatives.Trier
There is no BNF for Perl, because it was deliberately designed to be context-sensitive. There are actually parts of the compiler code (like for interpreting the smartmatch operator, and "indirect object" notation), that try to guess what the programmer wants. Unfortunately that doesn't give the language more computational power; it just makes it harder to use.Trier
A
1

This is absolutely not true. Most programming languages have a syntax that can be described by a CFG or BNG, but conforming to the syntax does not guarantee a legal program. There are all sorts of extra conditions such as "variables must be declared before use" or "the types in this expression must be combined in a legal way" that are not covered by the grammar, and that is what makes the languages non-context-free. (This is a bit like XML, which has a formally verifiable definition, but usually also extra constraints that a parser cannot verify.)

Alum answered 28/5, 2010 at 14:3 Comment(0)
C
1

Very good example of a language,that does not have CFG for its syntax is C++. You seem not to understand the UG exactly. The universal grammar is a problem of interpretation described as a language of words which contain code for turing machine and word which is accepted by that turing machine. So you do not encode the language itself (set of words), but the turing machine for it. Now comes the point - you can have a language of infinite words, but you cannot have a word of infinite symbols. This means, that UG as well contains finite words and therefore all descriptions of turing machines are finite. The description of the turing machine (program in a programming language) has therefore finite number of symbols (statements), so the language of descriptions (programming language syntax grammar) can be even regular. Look for example at the Binary Combinatory Logic.

Compony answered 30/5, 2010 at 23:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.