Is the C99 preprocessor Turing complete?
Asked Answered
S

4

92

After discovering the Boost preprocessor's capabilities I found myself wondering: Is the C99 preprocessor Turing complete?

If not, what does it lack to not qualify?

Sipes answered 28/6, 2010 at 22:45 Comment(2)
The missing thing in CPP for turing completeness is essentially recursion, since it can't loop without it (and really has a fairly limited conditional since you can't conditionally expand portions of a macro)Bethelbethena
If you're truly curious you should check it out for yourself, but just a warning to anyone seeing the arguments that CPP is turing complete: all the attempts to do so seem to depend on defining a LARGE NUMBER of macros, and them using them to facilitate looping / recursion. This is not theoretically turing-complete, although you could argue that it's practically sufficient.Countryman
S
42

Here is an example of abusing the preprocessor to implement a Turing machine. Note that an external build script is needed to feed the preprocessor's output back into its input, so the preprocessor in and of itself isn't Turing complete. Still, it's an interesting project.

From the description of the afore-linked project:

the preprocessor is not Turing complete, at least not if the program is preprocessed only once. This is true even if the program is allowed to include itself. (The reason being that for a given program, the preprocessor has only a finite number of states, plus a stack consisting of the places which the file has been included from. This is only a push-down automaton.)

The answer by Paul Fultz II is quite impressive and certainly closer than I thought the preprocessor could ever get, but it's not a true Turing machine. The C preprocessor has certain limits that prevent it from executing an arbitrary program like a Turing machine could, even if you had infinite memory and time. Section 5.2.4.1 of the C spec gives the following minimum limits for a C compiler:

  • 63 nesting levels of parenthesized expressions within a full expression
  • 63 significant initial characters in an internal identifier or a macro name
  • 4095 macro identifiers simultaneously defined in one preprocessing translation unit
  • 4095 characters in a logical source line

The counter mechanism below requires a macro definition per value, so the macro definition limit will limit how many times you can loop (EVAL(REPEAT(4100, M, ~)) would yield undefined behavior). This essentially puts a cap on the complexity of the program that you can execute. The nesting and complexity of the multi-level expansions may hit one of the other limits as well.

This is fundamentally different than the "infinite memory" limitation. In this case, the spec specifically says that a standards-conforming C compiler is only required to conform to these limits, even if it has infinite time, memory, etc. Any input file exceeding these limits can be processed in an unpredictable or undefined manner (or outright rejected). Some implementations may have higher limits, or no limits at all, but that's considered "implementation-specific" and not part of the standard. It may be possible to use Paul Fultz II's method to implement something like a Turing machine on some specific compiler implementation that has no finite limits, but in a general sense of "can this be done on any arbitrary, standards-conforming C99 pre-processor", the answer is no. Since the limit here is built into the language itself and not simply a side-effect of our inability to construct an infinite computer, I say that breaks Turing completeness.

Simdars answered 28/6, 2010 at 23:10 Comment(4)
If you mean the 115 point answer by Paul Fultz II below: it confirms this answer.Ashes
The limit is in the language itself, but not because of the spec, but because we must write the scans to evaluate the algorithm in the language itself, which there is no mechanism to apply an infinite number of scans.Rossuck
To add to the lack of recursiveness, that seems to be (under)specified by the standard. §6.10.3.5 paragraph 4 shows an example of two macros calling each other and states: "There are cases where it is not clear whether a replacement is nested or not [...]. Strictly conforming programs are not permitted to depend on such unspecified behavior. "Astrosphere
@Ashes technically, the 217 point answer (now above smh) doesn't actually confirm this answer, as someone else could still prove that CPP is TC. However, it seems like no-one has, and all who attempt use LARGE NUMBERS of macros. The fact that the algorithm is designed to always terminate is enough for me (see alinsoar's answer).Countryman
R
165

Well macros don't directly expand recursively, but there are ways we can work around this.

The easiest way of doing recursion in the preprocessor is to use a deferred expression. A deferred expression is an expression that requires more scans to fully expand:

#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__

#define A() 123
A() // Expands to 123
DEFER(A)() // Expands to A () because it requires one more scan to fully expand
EXPAND(DEFER(A)()) // Expands to 123, because the EXPAND macro forces another scan

Why is this important? Well when a macro is scanned and expanding, it creates a disabling context. This disabling context will cause a token, that refers to the currently expanding macro, to be painted blue. Thus, once its painted blue, the macro will no longer expand. This is why macros don't expand recursively. However, a disabling context only exists during one scan, so by deferring an expansion we can prevent our macros from becoming painted blue. We will just need to apply more scans to the expression. We can do that using this EVAL macro:

#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__

Now if we want to implement a REPEAT macro using recursion, first we need some increment and decrement operators to handle state:

#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define INC(x) PRIMITIVE_CAT(INC_, x)
#define INC_0 1
#define INC_1 2
#define INC_2 3
#define INC_3 4
#define INC_4 5
#define INC_5 6
#define INC_6 7
#define INC_7 8
#define INC_8 9
#define INC_9 9

#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3
#define DEC_5 4
#define DEC_6 5
#define DEC_7 6
#define DEC_8 7
#define DEC_9 8

Next we need a few more macros to do logic:

#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0,)

#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 ~, 1,

#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0

#define BOOL(x) COMPL(NOT(x))

#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t

#define IF(c) IIF(BOOL(c))

#define EAT(...)
#define EXPAND(...) __VA_ARGS__
#define WHEN(c) IF(c)(EXPAND, EAT)

Now with all these macros we can write a recursive REPEAT macro. We use a REPEAT_INDIRECT macro to refer back to itself recursively. This prevents the macro from being painted blue, since it will expand on a different scan(and using a different disabling context). We use OBSTRUCT here, which will defer the expansion twice. This is necessary because the conditional WHEN applies one scan already.

#define REPEAT(count, macro, ...) \
    WHEN(count) \
    ( \
        OBSTRUCT(REPEAT_INDIRECT) () \
        ( \
            DEC(count), macro, __VA_ARGS__ \
        ) \
        OBSTRUCT(macro) \
        ( \
            DEC(count), __VA_ARGS__ \
        ) \
    )
#define REPEAT_INDIRECT() REPEAT

//An example of using this macro
#define M(i, _) i
EVAL(REPEAT(8, M, ~)) // 0 1 2 3 4 5 6 7

Now this example is limited to 10 repeats, because of limitations of the counter. Just like a repeat counter in a computer would be limited by the finite memory. Multiple repeat counters could be combined together to workaround this limitation, just like in a computer. Furthermore, we could define a FOREVER macro:

#define FOREVER() \
    ? \
    DEFER(FOREVER_INDIRECT) () ()
#define FOREVER_INDIRECT() FOREVER
// Outputs question marks forever
EVAL(FOREVER())

This will try to output ? forever, but will eventually stop because there are no more scans being applied. Now the question is, if we gave it an infinite number of scans would this algorithm complete? This is known as the halting problem, and Turing completeness is necessary to prove the undecidability of the halting problem. So as you can see, the preprocessor can act as a Turing complete language, but instead of being limited to the finite memory of a computer it is instead limited by the finite number of scans applied.

Rossuck answered 10/5, 2012 at 0:44 Comment(17)
...Wow. Very impressive! Here I thought that the C99 preprocessor was definetely not turing complete.. +1 for thinking out of the boxAlbertoalberts
+1 Quite a creative way to show that preprocessor can scan symbols on a tape ;-) (Thanks to the mod for accepting the flagging to remove the wiki!).Huynh
I like how this uses O(log(N)) macros to recurse N times. This is better than Boost Preprocessor which uses O(N) macros.Sisterhood
@Sisterhood The EVAL macro applies over 250 scans to the algorithm even it is not needed. Whereas Boost.PP only uses the number of scans that is needed by the algorithm, so Boost.PP is much more efficient.Rossuck
@Paul I guess that it is a sort of a macro source size vs speed trade off. This does make me wonder if there is a way to get both log(N) macro size and Boost.PP's reduced number of scans. Maybe it would be possible to quit at the first power of two number of scans after the algorithm finishes, so it would only have up to N extra scans.Sisterhood
Paul, without checking the details of how you did the for(;;) {} loop, I can say that you cannot do an infinite loop in cpp, simply because the maximum number of iterations depend on the number of identifiers inside the input file, whatever you do. You should create new symbols inside the eval(), but it is not possible to create an infinite number of symbols...Kandicekandinsky
I am confused as to why this qualifies as Turing completeness. It seems we are guaranteed that the program terminates because there are only a finite number of scans. If you could force infinite scans, then it would be Turing complete, but you can't force infinite scans.Exorable
The algorithm is not guaranteed to terminate, however, the program is guaranteed to always terminate just like recursion is always guaranteed to terminate in C due to stack overflow. The difference comes from the "machinery", which is why the PP acts as a Turing complete language whereas C is a Turing complete language.Rossuck
This does not establish Turing completeness. The essential limitation is the need to fix the number of scans being done.Ashes
The limited number of scans is analogous is not analogous to a computer having limited memory. The essence of Turing completeness is that the specification of the computations itself carries no limitation, even when it is actually made to run on a machine that does have limited capacity.Ashes
I tested it. The FOREVER() is not infinite, for me it displays only 365 ?. This function is not mu-recusive operator but sigma-recursive. The only way to try and make it turing-complete is by defining an infinite number of macros, as I explained in my answer. C preprocessor does not allow such a feature.Kandicekandinsky
@Kandicekandinsky The Ackerman function is considered mu-recursive and not sigma-recursive and can be implemented in the preprocessor: gist.github.com/pfultz2/80391e8b18abf3225da2242dcc570cecRossuck
And interestingly enough, the Ackerman function in the PP will usually hit a stack overflow before running out of scans.Rossuck
@Paul Fultz II It is because you know a priori about this function that it is going to finish, it was proved. So you need limited time to finish. The mu recursive functions , like searching for a solution of an arbitrary equation, cannot be computed by cpp. I said that cpp cannot compute the class of mu recursive, not some particular example of function about you know already lots of things.Kandicekandinsky
I figured out a way to use just the number of scans you need, at least in some cases, rather than using a macro like this EVAL which throws a large hard-coded amount of scans at it. It works if you have a parenthetical sequence to drive it. The first set of iteration macros generates text in the form F ( (where the F is whatever macro you want to call some amount of times, and ( is the result of a deferred expansion), and the second set generates the closing ). When all that is finished, you have F ( F ( F ( ... ) ) ), which only needs a small constant number of scans to expand.Distefano
You can see an example of the "optimization" I am describing in action in my C tagged union library. I couldn't bring myself to use a large hard-coded number of scans, because most use cases need less scans (and in principle I don't like having a hard-coded upper bound on the number of scans anyway, I'd rather such limits be the configurable business of the pre-processor or OS).Distefano
@Sisterhood You can even have O(1) macro definitions to expand/recurse N times, if you have a situation where you want to iterate/recurse on N inputs and you can express each input as a parenthetical expression. E.g. FOO((a)(b)(c)(...)(n)) can be expanded to something like BAR(BAR(BAR(...(BAR(whatever, a), b), c) ...), n) with just eleven macro definitions or so, and then that whole result can be expanded the rest of the way with just one more.Distefano
S
42

Here is an example of abusing the preprocessor to implement a Turing machine. Note that an external build script is needed to feed the preprocessor's output back into its input, so the preprocessor in and of itself isn't Turing complete. Still, it's an interesting project.

From the description of the afore-linked project:

the preprocessor is not Turing complete, at least not if the program is preprocessed only once. This is true even if the program is allowed to include itself. (The reason being that for a given program, the preprocessor has only a finite number of states, plus a stack consisting of the places which the file has been included from. This is only a push-down automaton.)

The answer by Paul Fultz II is quite impressive and certainly closer than I thought the preprocessor could ever get, but it's not a true Turing machine. The C preprocessor has certain limits that prevent it from executing an arbitrary program like a Turing machine could, even if you had infinite memory and time. Section 5.2.4.1 of the C spec gives the following minimum limits for a C compiler:

  • 63 nesting levels of parenthesized expressions within a full expression
  • 63 significant initial characters in an internal identifier or a macro name
  • 4095 macro identifiers simultaneously defined in one preprocessing translation unit
  • 4095 characters in a logical source line

The counter mechanism below requires a macro definition per value, so the macro definition limit will limit how many times you can loop (EVAL(REPEAT(4100, M, ~)) would yield undefined behavior). This essentially puts a cap on the complexity of the program that you can execute. The nesting and complexity of the multi-level expansions may hit one of the other limits as well.

This is fundamentally different than the "infinite memory" limitation. In this case, the spec specifically says that a standards-conforming C compiler is only required to conform to these limits, even if it has infinite time, memory, etc. Any input file exceeding these limits can be processed in an unpredictable or undefined manner (or outright rejected). Some implementations may have higher limits, or no limits at all, but that's considered "implementation-specific" and not part of the standard. It may be possible to use Paul Fultz II's method to implement something like a Turing machine on some specific compiler implementation that has no finite limits, but in a general sense of "can this be done on any arbitrary, standards-conforming C99 pre-processor", the answer is no. Since the limit here is built into the language itself and not simply a side-effect of our inability to construct an infinite computer, I say that breaks Turing completeness.

Simdars answered 28/6, 2010 at 23:10 Comment(4)
If you mean the 115 point answer by Paul Fultz II below: it confirms this answer.Ashes
The limit is in the language itself, but not because of the spec, but because we must write the scans to evaluate the algorithm in the language itself, which there is no mechanism to apply an infinite number of scans.Rossuck
To add to the lack of recursiveness, that seems to be (under)specified by the standard. §6.10.3.5 paragraph 4 shows an example of two macros calling each other and states: "There are cases where it is not clear whether a replacement is nested or not [...]. Strictly conforming programs are not permitted to depend on such unspecified behavior. "Astrosphere
@Ashes technically, the 217 point answer (now above smh) doesn't actually confirm this answer, as someone else could still prove that CPP is TC. However, it seems like no-one has, and all who attempt use LARGE NUMBERS of macros. The fact that the algorithm is designed to always terminate is enough for me (see alinsoar's answer).Countryman
K
11

To be Turing complete, one needs to define recursion that may never finish -- one calls them mu-recursive operator.

To define such an operator one needs an infinite space of defined identifiers (in case that each identifier is evaluated a finite number of times), as one cannot know a priori an upper limit of time in which the result is found. With a finite number of operators inside the code one needs to be able to check an unlimited number of possibilities.

So this class of functions cannot be computed by the C preprocessor because in C preprocessor there is a limited number of defined macros and each one is expanded only once.

The C preprocessor uses the Dave Prosser's algorithm (written by Dave Prosser for the WG14 team in 1984). In this algorithm a macro is painted blue in the moment of the first expansion; a recursive call (or mutual recursive call) does not expand it, as it has already been painted blue in the moment when the first expansion starts. So with a finite number of preprocessing lines it is impossible to make infinite calls of functions(macros), which characterizes the mu-recursive operators.

The C preprocessor can compute only sigma-recursive operators .

For details see the course of computation of Marvin L. Minsky (1967) -- Computation: Finite and Infinite Machines, Prentice-Hall, Inc. Englewood Cliffs, N.J. etc.

Kandicekandinsky answered 26/5, 2016 at 14:11 Comment(4)
The Ackerman function, which is mu-recursive only, can be implemented in the PP, so the C preprocessor is not limited to sigma-recursive operators only: gist.github.com/pfultz2/80391e8b18abf3225da2242dcc570cecRossuck
As I said in the other comment, it's a difference between what you did, to check a large (but limited) number of inputs and to search an infinte number. The Ackerman function is known to finish, this is why cpp can find its value. It is impossible to compute any mu operator and make cpp turing complete.Kandicekandinsky
@PaulFultzII You need to modify your code in order to check for a larger space of solutions, while a mu-operator is fixed and will check for an infinite space (as large as the hardware resources allow).Kandicekandinsky
The algorithm(ie the A macro) does not need to be modified, only the evaluation needs to be updated to add more scans.Rossuck
S
4

It's Turing complete within limits (as are all computers since they don't have infinite RAM). Check out the kinds of things you can do with Boost Preprocessor.

Edit in response to question edits:

The main limitation on Boost is the maximum macro expansion depth which is compiler-specific. Also, the macros that implement recursion (FOR..., ENUM..., etc.) aren't truly recursive, they just appear that way thanks to a bunch of near-identical macros. In the big picture, this limitation is no different than having a maximum stack size in an actually recursive language.

The only two things that are really necessary for limited Turing-completeness (Turing-compatibility?) are iteration/recursion (equivalent constructs) and conditional branching.

Supportable answered 28/6, 2010 at 22:52 Comment(9)
hi. That was actually what prompted my question, I have been using preprocessor for a while.Sipes
digging around in BOOST_PP's source code is the best way to figure out how it's done.Supportable
I believe that macros can't do recursion. Boost just seems to simulate them by having hardcoded macros named like macro0, macro1 .. macro255. I'm not sure whether that counts as "turing complete". The preprocessor has an explicit rule that forbids going from macro255 back to macro0 :( It seems like trying to build a verifyer for fully parenthesized expressions using a finite state automaton. It can work for a limited number of parentheses, but that's not a general verifyer anymore. I've no clue about boost.pp inner workings though, so i could likely be wrong on this.Bosomy
@Johannes Schaub: yes you're right. I had confused that with vararg when i wrote this initially. I updated the answer.Supportable
@Johannes: The chaos preprocessor doesn't have any macros like that. Check it out here: sourceforge.net/projects/chaos-ppBluster
Or you can take a look into M4 preprocessor which supports recursion :-)Probst
To the reader: be aware that Turing complete within limits means not Turing complete.Ashes
@Ashes by that logic, nothing is turing complete, because there are always finite resourcesVarden
@Varden No. Turing completeness is a property of computational systems. Such systems describe an abstraction of reality. Programming languages do that. When I write while :; do echo 1; done in my Linux shell, I've specified an unbounded loop. It is actually unbounded, as far as that programming language is concerned. The fact that all such programs must run in an environment that is finite and will at some point stop any such loop, is irrelevant. We're not discussing that environment, we're only discussing the language itself.Ashes

© 2022 - 2024 — McMap. All rights reserved.