How to make C language context-free?
Asked Answered
L

3

6

I know that C is not a context-free language, a famous example is:

int foo;
typedef int foo;
foo x;

In this case the lexer doesn't know, whether foo in the 3rd line, is an identifier, or typedef.

My question is, is this the only reason that makes C a Context-Sensitive Language?

I mean, if we get rid of typedef, would it become context-free language? Or there are other reasons (examples) that prevent it from being so?

Laundress answered 23/5, 2017 at 17:45 Comment(12)
If you remove such things, you no longer have C, but a completely new language that is similar to (but not compatible with) C.Trichroism
+ depends on context: in 40 + 2 it means integer addition; in 40 + 2.0 it means floating point addition.Shirleeshirleen
@Shirleeshirleen The parser doesn't care what the types of things are.Cossack
@Cossack On the other hand + (and -) are context-sensitive since they can both mean two different operators depending on context: Unary or binary plus/minus. So from the lexers point of view it's impossible to say which operator is meant.Trichroism
@BiteBytes The example doesn't really work. If foo weren't a typedef, foo x; would simply be an error, so there's no syntactic ambiguity there. I mean sure, the lexer can't tell whether foo is an identifier or a typename, but the example doesn't explain why it would need to know that (in fact the lexer doesn't necessarily have to know that - that's just one possible way of solving the problem). The common example would be the expression (foo)(x), which could either be a function call or a cast, depending on whether foo is a type or not.Cossack
@Someprogrammerdude That doesn't depend on context. I can easily parse unary and binary operators with a context free grammar.Cossack
@Someprogrammerdude Still, any construct involving these operators can be described using CFGKeele
@Cossack True, but on the other hand the OP seems more worried about it from a lexical point of view (which doesn't really involve the grammar). Some clarification from the OP would be nice...Trichroism
@Someprogrammerdude "OP seems more worried about it from a lexical point of view" The distinction between unary and infix operators matters even less on a lexical level. A plus sign is just a plus sign. Also the term "context free" doesn't really have a meaning at the lexical level, so I see no way to interpret the question like that. The way I see it, the OP talked about the lexer because the cast-vs.-function-call ambiguity is often solved with a lexer hack that distinguishes typenames from other identifiers, not because the question is about lexing per se.Cossack
I don't think the words you've used accurately reflect the question you really mean to ask. Nothing in the formal definition of a context-free language requires that every valid sentence in the language have a unique production from terminal symbols. You seem to be looking for constructs whose interpretation in terms of grammar rules is unambiguous, but the existence of such constructs does not make the language context-sensitive.Wideawake
@JohnBollinger: It's true (and important) that a CFG can be ambiguous. But the question is whether C is described by a CFG, and for that we need to be clear on what we mean by "C". For example, int foo; int bar = (foo)(3); is not valid C, while typedef int foo; int bar = (foo)(3); is valid. There is an argument (which I don't totally agree with) that we can ignore that error as "semantic", but there is no doubt that the C standard forbids the first excerpt, so that the best we can say is that the (potential) CFG describes a superset of C. (Ignoring the preprocessor.) ...Rations
@JohnBollinger: ... that ends up with the claim that C is (kind of) context free because a superset of C can be described with a CFG. What is the superset? Why, it is exactly the smallest language which includes C and is context-free (because we've eliminated all the "semantic" errors which are nonetheless inherent in the program text). The fact that the smallest context-free superset of C is context-free is tautological. But is it interesting? (Perhaps I should edit that pair of comments into an answer, but I feel like I've written it all before.)Rations
K
4

The post-processor C syntax can be parsed with a classical lex + yacc combo. The lexer definition and the yacc grammar are freely available at

http://www.quut.com/c/ANSI-C-grammar-l-2011.html and http://www.quut.com/c/ANSI-C-grammar-y-2011.html

As you can see from the lex file, it's straightforward except for the context-sensitive check_type() (and comment(), but comment processing technically belongs to the preprocessor), which makes typedef the only source of context-sensitivity there. Since the yacc file doesn't contain any context-sensitivity introducing tricks either, a typedef-less C would be a perfectly context-free syntax.

The subsequent typechecking of C (matching declarations with use sites) is context sensitive, so you could say that overall, C is context sensitive.

Kagera answered 23/5, 2017 at 17:57 Comment(5)
If you ignore the preprocessor :-)Rations
It is wrong answer. In C when you call function you need to check argument list to match declaration. This is context-sensitive. In any CFG of C including cited, call with wrong number of arguments is accepted.Thralldom
@KonstantinVladimirov Yes, typechecking (semantic check) is context sensitive. The syntaxt is almost context-free.Kagera
Even checking number of arguments is context-sensitive.Thralldom
@KonstantinVladimirov That falls under type checking.Kagera
A
3

No. C cannot be a strict context independent language. For that, you should describe a syntax that doesn't allow to use a nondeclared variable (this is context) in a similar way as what you describe in your question. The language authors always describe syntax using some kind of context free grammar, but just to describe the main syntactic constructs of the language. The case you describe (making a type identifier to fit in a different token class to be able to go in places where it shouldn't) is only an example. If you look for example, the freedom in the order for things like static unsigned long long int variable simplifies the syntax remembering by programmers, but complicates things to the compiler authors.

Allineallis answered 24/5, 2017 at 4:50 Comment(0)
E
1

As per my knowledge and research there are two basic reasons that make C context sensitive language. These are:

  1. Variable is declared before it is used.
  2. Matching the formal and actual parameters of functions or sub-routines.

These two can't be done by PushDown Automata (PDA) but Linear Bounded Automata (LBA) can do thes two.

Ervinervine answered 21/8, 2019 at 5:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.