After discovering the Boost preprocessor's capabilities I found myself wondering: Is the C99 preprocessor Turing complete?
If not, what does it lack to not qualify?
Here is an example of abusing the preprocessor to implement a Turing machine. Note that an external build script is needed to feed the preprocessor's output back into its input, so the preprocessor in and of itself isn't Turing complete. Still, it's an interesting project.
From the description of the afore-linked project:
the preprocessor is not Turing complete, at least not if the program is preprocessed only once. This is true even if the program is allowed to include itself. (The reason being that for a given program, the preprocessor has only a finite number of states, plus a stack consisting of the places which the file has been included from. This is only a push-down automaton.)
The answer by Paul Fultz II is quite impressive and certainly closer than I thought the preprocessor could ever get, but it's not a true Turing machine. The C preprocessor has certain limits that prevent it from executing an arbitrary program like a Turing machine could, even if you had infinite memory and time. Section 5.2.4.1 of the C spec gives the following minimum limits for a C compiler:
- 63 nesting levels of parenthesized expressions within a full expression
- 63 significant initial characters in an internal identifier or a macro name
- 4095 macro identifiers simultaneously defined in one preprocessing translation unit
- 4095 characters in a logical source line
The counter mechanism below requires a macro definition per value, so the macro definition limit will limit how many times you can loop (EVAL(REPEAT(4100, M, ~)) would yield undefined behavior). This essentially puts a cap on the complexity of the program that you can execute. The nesting and complexity of the multi-level expansions may hit one of the other limits as well.
This is fundamentally different than the "infinite memory" limitation. In this case, the spec specifically says that a standards-conforming C compiler is only required to conform to these limits, even if it has infinite time, memory, etc. Any input file exceeding these limits can be processed in an unpredictable or undefined manner (or outright rejected). Some implementations may have higher limits, or no limits at all, but that's considered "implementation-specific" and not part of the standard. It may be possible to use Paul Fultz II's method to implement something like a Turing machine on some specific compiler implementation that has no finite limits, but in a general sense of "can this be done on any arbitrary, standards-conforming C99 pre-processor", the answer is no. Since the limit here is built into the language itself and not simply a side-effect of our inability to construct an infinite computer, I say that breaks Turing completeness.
Well, macros don't directly expand recursively, but there are ways we can work around this.
The easiest way of doing recursion in the preprocessor is to use a deferred expression. A deferred expression is an expression that requires more scans to fully expand:
#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__
#define A() 123
A() // Expands to 123
DEFER(A)() // Expands to A () because it requires one more scan to fully expand
EXPAND(DEFER(A)()) // Expands to 123, because the EXPAND macro forces another scan
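As a quick compile-time check of the extra-scan rule, here is a minimal sketch built from the macros above. The function names deferred_once and deferred_twice are made up for this illustration. The idea is that DEFER needs one additional scan (one EXPAND), while OBSTRUCT defers twice and so needs two.

```c
/* Deferral helpers, as defined in the answer. */
#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__

#define A() 123

/* DEFER(A)() needs one extra scan, which a single EXPAND supplies. */
int deferred_once(void) { return EXPAND(DEFER(A)()); }

/* OBSTRUCT(A)() needs two extra scans, so EXPAND is applied twice. */
int deferred_twice(void) { return EXPAND(EXPAND(OBSTRUCT(A)())); }
```

If one of the EXPAND wrappers were dropped, the leftover tokens would be A () with the macro A left unexpanded, and the compiler would treat that as a call to an undeclared function.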
Why is this important? When a macro is being scanned and expanded, it creates a disabling context. This disabling context causes any token that refers to the currently expanding macro to be painted blue. Once a token is painted blue, it will no longer expand. This is why macros don't expand recursively. However, a disabling context only exists during one scan, so by deferring an expansion we can prevent our macros from being painted blue. We just need to apply more scans to the expression, which we can do with this EVAL macro:
#define EVAL(...) EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__
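To illustrate that EVAL simply supplies a generous budget of scans, the sketch below defers an expansion by three scans and lets EVAL resolve it. DEFER3 and eval_result are hypothetical names introduced only for this example; they are not part of the original answer.

```c
#define EMPTY()
#define A() 123

/* Hypothetical helper for this sketch: each scan removes one EMPTY (),
   so `id` only meets its argument list after three extra scans. */
#define DEFER3(id) id EMPTY EMPTY EMPTY () () ()

#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__

/* EVAL applies far more scans than the three that are needed here. */
int eval_result(void) { return EVAL(DEFER3(A)()); }
```

The trade-off is that EVAL always pays for its full scan budget, whether the expression needs three scans or three hundred.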
Now if we want to implement a REPEAT macro using recursion, first we need some increment and decrement operators to handle state:
#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__
#define INC(x) PRIMITIVE_CAT(INC_, x)
#define INC_0 1
#define INC_1 2
#define INC_2 3
#define INC_3 4
#define INC_4 5
#define INC_5 6
#define INC_6 7
#define INC_7 8
#define INC_8 9
#define INC_9 9
#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3
#define DEC_5 4
#define DEC_6 5
#define DEC_7 6
#define DEC_8 7
#define DEC_9 8
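The lookup-table counters can be checked at compile time. The sketch below shortens the tables to the range 0..3 (an assumption made only to keep the example small) and keeps the saturating behaviour of the original tables, where INC_9 stays at 9 and DEC_0 stays at 0. The function names are hypothetical.

```c
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

/* Shortened table for this sketch: the top entry maps to itself. */
#define INC(x) PRIMITIVE_CAT(INC_, x)
#define INC_0 1
#define INC_1 2
#define INC_2 3
#define INC_3 3

/* Shortened table for this sketch: the bottom entry maps to itself. */
#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2

/* The argument is expanded first, so counters can be nested. */
int inc_of_one(void)     { return INC(1); }
int inc_after_dec(void)  { return INC(DEC(2)); }
```

Because every value needs its own macro definition, the range of the counter is bounded by how many definitions the author is willing (and the implementation is required) to accept.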
Next we need a few more macros to do logic:
#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0,)
#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 ~, 1,
#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0
#define BOOL(x) COMPL(NOT(x))
#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t
#define IF(c) IIF(BOOL(c))
#define EAT(...)
#define EXPAND(...) __VA_ARGS__
#define WHEN(c) IF(c)(EXPAND, EAT)
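The logic macros can also be exercised at compile time. This is a small sketch using the definitions above unchanged; only the three function names at the end are made up for the illustration. NOT detects 0 through the NOT_0 probe, BOOL normalises any other token to 1, and IF and WHEN select between alternatives.

```c
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0,)

#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 ~, 1,

#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0

#define BOOL(x) COMPL(NOT(x))

#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t

#define IF(c) IIF(BOOL(c))

#define EAT(...)
#define EXPAND(...) __VA_ARGS__
#define WHEN(c) IF(c)(EXPAND, EAT)

/* A non-zero condition selects the first alternative. */
int pick_nonzero(void) { return IF(7)(10, 20); }
/* A zero condition selects the second alternative. */
int pick_zero(void)    { return IF(0)(10, 20); }
/* WHEN(0) swallows its body, so only the leading 1 remains. */
int guarded(void)      { return 1 WHEN(0)(+ 41); }
```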
Now with all these macros we can write a recursive REPEAT macro. We use a REPEAT_INDIRECT macro to refer back to itself recursively. This prevents the macro from being painted blue, since it will expand on a different scan (and using a different disabling context). We use OBSTRUCT here, which will defer the expansion twice. This is necessary because the conditional WHEN applies one scan already.
#define REPEAT(count, macro, ...) \
WHEN(count) \
( \
OBSTRUCT(REPEAT_INDIRECT) () \
( \
DEC(count), macro, __VA_ARGS__ \
) \
OBSTRUCT(macro) \
( \
DEC(count), __VA_ARGS__ \
) \
)
#define REPEAT_INDIRECT() REPEAT
//An example of using this macro
#define M(i, _) i
EVAL(REPEAT(8, M, ~)) // 0 1 2 3 4 5 6 7
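For readers who want to compile the whole thing, here is a consolidated sketch of the pieces above in one translation unit. The callbacks ELEM and PLUS and the array repeat_indices are hypothetical names added for this example; they show one possible use of REPEAT, namely generating an array initializer.

```c
#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__
#define EAT(...)

#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__

#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3
#define DEC_5 4
#define DEC_6 5
#define DEC_7 6
#define DEC_8 7
#define DEC_9 8

#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0,)
#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 ~, 1,
#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0
#define BOOL(x) COMPL(NOT(x))
#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t
#define IF(c) IIF(BOOL(c))
#define WHEN(c) IF(c)(EXPAND, EAT)

#define REPEAT(count, macro, ...) \
    WHEN(count) \
    ( \
        OBSTRUCT(REPEAT_INDIRECT) () \
        ( \
            DEC(count), macro, __VA_ARGS__ \
        ) \
        OBSTRUCT(macro) \
        ( \
            DEC(count), __VA_ARGS__ \
        ) \
    )
#define REPEAT_INDIRECT() REPEAT

/* Hypothetical callbacks for this sketch. */
#define ELEM(i, _) i,   /* one array element per iteration */
#define PLUS(i, _) + i  /* one addend per iteration */

/* Expands to { 0, 1, 2, 3, 4, 5, 6, 7, } */
const int repeat_indices[] = { EVAL(REPEAT(8, ELEM, ~)) };
```

The callback receives the current index and the extra argument (here the placeholder ~), so the same REPEAT can produce list elements, addends, or declarations.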
Now this example is limited to 10 repeats because of limitations of the counter, just like a repeat counter in a computer would be limited by its finite memory. Multiple repeat counters could be combined to work around this limitation, just like in a computer. Furthermore, we could define a FOREVER macro:
#define FOREVER() \
? \
DEFER(FOREVER_INDIRECT) () ()
#define FOREVER_INDIRECT() FOREVER
// Outputs question marks forever
EVAL(FOREVER())
This will try to output ? forever, but will eventually stop because there are no more scans being applied. Now the question is, if we gave it an infinite number of scans, would this algorithm complete? This is known as the halting problem, and Turing completeness is necessary to prove the undecidability of the halting problem. So, as you can see, the preprocessor can act as a Turing-complete language, but instead of being limited by the finite memory of a computer, it is limited by the finite number of scans applied.
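One way to observe that the expansion really does stop is to stringify the result and inspect it. The sketch below is an illustration under assumptions: STR, XSTR, forever_text and count_question_marks are hypothetical helpers, and the exact number of question marks depends on how many scans EVAL ends up applying, so the code only counts them rather than assuming a figure.

```c
#define EMPTY()
#define DEFER(id) id EMPTY()

#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__

#define FOREVER() \
    ? \
    DEFER(FOREVER_INDIRECT) () ()
#define FOREVER_INDIRECT() FOREVER

/* Hypothetical helpers for this sketch: turn an expansion into a string. */
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

/* The text that is left over once EVAL has run out of scans. */
const char forever_text[] = XSTR(EVAL(FOREVER()));

/* Counts the question marks emitted before the scans ran out. */
unsigned count_question_marks(void)
{
    unsigned n = 0;
    for (const char *p = forever_text; *p != '\0'; ++p)
        if (*p == '?')
            ++n;
    return n;
}
```

The string ends with the unexpanded FOREVER_INDIRECT () () tokens, which is the point where one more scan would have produced one more question mark.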
EVAL macro applies over 250 scans to the algorithm even if it is not needed, whereas Boost.PP only uses the number of scans needed by the algorithm, so Boost.PP is much more efficient. – Rossuck
for(;;) {} loop, I can say that you cannot do an infinite loop in cpp, simply because the maximum number of iterations depends on the number of identifiers inside the input file, whatever you do. You should create new symbols inside the eval(), but it is not possible to create an infinite number of symbols... – Kandicekandinsky
?. This function is not a mu-recursive operator but a sigma-recursive one. The only way to try to make it Turing-complete is by defining an infinite number of macros, as I explained in my answer. The C preprocessor does not allow such a feature. – Kandicekandinsky
EVAL which throws a large hard-coded amount of scans at it. It works if you have a parenthetical sequence to drive it. The first set of iteration macros generates text in the form F ( (where the F is whatever macro you want to call some amount of times, and ( is the result of a deferred expansion), and the second set generates the closing ). When all that is finished, you have F ( F ( F ( ... ) ) ), which only needs a small constant number of scans to expand. – Distefano
FOO((a)(b)(c)(...)(n)) can be expanded to something like BAR(BAR(BAR(...(BAR(whatever, a), b), c) ...), n) with just eleven macro definitions or so, and then that whole result can be expanded the rest of the way with just one more. –
Distefano
To be Turing complete, one needs to be able to define recursion that may never finish; such constructs are called mu-recursive operators.
To define such an operator one needs an infinite space of defined identifiers (in case each identifier is evaluated a finite number of times), as one cannot know a priori an upper limit on the time in which the result is found. With a finite number of operators inside the code, one needs to be able to check an unlimited number of possibilities.
So this class of functions cannot be computed by the C preprocessor, because the C preprocessor has only a limited number of defined macros and each one is expanded only once.
The C preprocessor uses Dave Prosser's algorithm (written by Dave Prosser for the WG14 team in 1984). In this algorithm a macro is painted blue at the moment of its first expansion; a recursive call (or mutually recursive call) does not expand it, as it has already been painted blue when the first expansion starts. So with a finite number of preprocessing lines it is impossible to make infinite calls of functions (macros), which is what characterizes the mu-recursive operators.
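The painting rule can be seen directly in a small sketch. All names here (LIMIT, PING, PONG, twice and the wrapper functions) are hypothetical and chosen only for illustration. Each macro refers to itself, directly or through a partner, yet expansion stops after one step, and the painted token is left for the compiler to resolve as an ordinary C identifier.

```c
/* Direct self-reference: the inner LIMIT is painted blue, so it is
   left alone and ends up naming the enum constant. */
enum { LIMIT = 5 };
#define LIMIT (LIMIT + 1)

/* Mutual reference: PING -> PONG -> PING, and the second PING is
   painted blue, so expansion stops there. */
enum { PING = 1, PONG = 2 };
#define PING PONG
#define PONG PING

/* Function-like case: the inner twice is painted blue and therefore
   calls the real function instead of expanding again. */
int twice(int x) { return 2 * x; }
#define twice(x) (twice(x) + 1)

int painted_limit(void) { return LIMIT; }     /* (LIMIT + 1) with LIMIT == 5 */
int painted_call(void)  { return twice(3); }  /* (twice(3) + 1), a real call */
```

None of these definitions loops, which is exactly why the deferred-expansion tricks need a fresh scan for every recursive step.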
The C preprocessor can compute only sigma-recursive operators.
For details, see Marvin L. Minsky (1967), Computation: Finite and Infinite Machines, Prentice-Hall, Englewood Cliffs, N.J.
A macro) does not need to be modified, only the evaluation needs to be updated to add more scans. – Rossuck
It's Turing complete within limits (as are all computers since they don't have infinite RAM). Check out the kinds of things you can do with Boost Preprocessor.
Edit in response to question edits:
The main limitation on Boost is the maximum macro expansion depth which is compiler-specific. Also, the macros that implement recursion (FOR..., ENUM..., etc.) aren't truly recursive, they just appear that way thanks to a bunch of near-identical macros. In the big picture, this limitation is no different than having a maximum stack size in an actually recursive language.
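The "bunch of near-identical macros" approach can be sketched as follows. This is not Boost.PP's actual implementation; REP, REP_0 to REP_4, ELEM, PLUS and rep_indices are hypothetical names, and the chain is cut off at four levels to keep the example short. Each level calls the level below it, so no macro ever needs to expand itself.

```c
/* One macro per repetition count; each delegates to the previous one. */
#define REP_0(m)
#define REP_1(m) REP_0(m) m(0)
#define REP_2(m) REP_1(m) m(1)
#define REP_3(m) REP_2(m) m(2)
#define REP_4(m) REP_3(m) m(3)

/* Dispatch on a literal count by pasting it onto the REP_ prefix. */
#define REP(n, m) REP_ ## n(m)

/* Hypothetical callbacks for this sketch. */
#define ELEM(i) i,
#define PLUS(i) + i

/* Expands to { 0, 1, 2, 3, } */
const int rep_indices[] = { REP(4, ELEM) };
```

The maximum repetition count is fixed by how many REP_n macros are written out, which is the stack-size-like limit described above.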
The only two things that are really necessary for limited Turing-completeness (Turing-compatibility?) are iteration/recursion (equivalent constructs) and conditional branching.
macro0, macro1 .. macro255. I'm not sure whether that counts as "turing complete". The preprocessor has an explicit rule that forbids going from macro255 back to macro0 :( It seems like trying to build a verifier for fully parenthesized expressions using a finite state automaton. It can work for a limited number of parentheses, but that's not a general verifier anymore. I've no clue about boost.pp inner workings though, so I could likely be wrong on this. – Bosomy
while :; do echo 1; done in my Linux shell, I've specified an unbounded loop. It is actually unbounded, as far as that programming language is concerned. The fact that all such programs must run in an environment that is finite and will at some point stop any such loop, is irrelevant. We're not discussing that environment, we're only discussing the language itself. –
Ashes