You can compare the source code below, Recursive function vs Reimplemented as "iterative" to intuitively understand the approach without reading this whole answer.
Some conversions are trivial and not nearly as involved as what's described here. But for non-trivial cases, this methodology should cover converting even the most complex recursive cases. Even those where there are multiple recursive call points in the code and return values used within the code itself.
Sometimes it may be better not to convert a complex recursive algorithm; for instance, where the recursive depth is predictable and minimal. Some sorting algorithms are good examples of this where the max depth of recursion is close to log2(n)
.
A case where conversion would be beneficial is where the recursive depth can go beyond the allocated call stack space. The stack of an iterative solution can scale to even unpredictably large data sets without necessitating stack allocation tuning. Unexpected bugs can be avoided as datasets continue to grow year after year.
With compiled languages, there may be a slight boost in speed of execution from conversion. However, with interpreted bytecode languages like Python, instead of the native engine managing the call stack, bytecode is managing an explicit stack, which may slightly impact performance.
The approach
A general way to convert a recursive function to an iterative solution that will apply to any case is to mimic the process natively compiled code uses during a function call and the return from the call.
Taking an example that requires a somewhat involved approach, we have the multi-key quicksort algorithm. This function has three successive recursive calls, and after each call, execution begins at the next line.
The state of the function is captured in the stack frame, which is pushed onto the execution stack. When sort()
is called from within itself and returns, the stack frame present at the time of the call is restored. In that way all the variables have the same values as they did before the call - unless they were modified by the call.
Recursive function
def sort(a: list_view, d: int):
if len(a) <= 1:
return
p = pivot(a, d)
i, j = partition(a, d, p)
sort(a[0:i], d)
sort(a[i:j], d + 1)
sort(a[j:len(a)], d)
Taking this model, and mimicking it, a list is set up to act as the stack. In this example tuples are used to mimic frames. If this were encoded in C, structs could be used. The data can be contained within a data structure instead of just pushing one value at a time.
Reimplemented as "iterative"
# Assume `a` is view-like object where slices reference
# the same internal list of strings.
def sort(a: list_view):
LEFT, MID, RIGHT = 0, 1, 2
stack = []
stack.append((LEFT, a, 0)) # Initial frame.
while stack:
frame = stack.pop()
if len(frame[1]) <= 1: # Guard.
continue
stage = frame[0] # Where to jump to.
if stage is LEFT:
_, a, d = frame # a - array/list, d - depth.
p = pivot(a, d)
i, j = partition(a, d, p)
stack.append((MID, a, i, j, d)) # Where to go after "return".
stack.append((LEFT, a[0:i], d)) # Simulate function call.
elif stage is MID: # Picking up here after "call"
_, a, i, j, d = frame # State before "call" restored.
stack.append((RIGHT, a, j, d)) # Set up for next "return".
stack.append((LEFT, a[i:j], d + 1)) # Split list and "recurse".
elif stage is RIGHT:
_, a, j, d = frame
stack.append((LEFT, a[j:len(a)], d)
else:
raise ValueError("Unreachable. Fix your code.")
When a function call is made, information on where to begin execution after the function returns is included in the stack frame. In this example, if/elif/else
blocks represent the points where execution begins after return from a call. In C this could be implemented as a switch
statement.
In the example, the blocks are given labels; they're arbitrarily labeled by how the list is partitioned within each block. The first block, "LEFT" splits the list on the left side. The "MID" section represents the block that splits the list in the middle, etc.
With this approach, mimicking a call takes two steps. First a frame is pushed onto the stack that will cause execution to resume in the block following the current one after the "call" "returns". A value in the frame indicates which if/elif/else
section to fall into on the loop that follows the "call".
Then the "call" frame is pushed onto the stack. This sends execution to the first, "LEFT", block in most cases for this specific example. This is where the actual sorting is done regardless which section of the list was split to get there.
Before the looping begins, the primary frame pushed at the top of the function represents the initial call. Then on each iteration, a frame is popped. The "LEFT/MID/RIGHT" value/label from the frame is used to fall into the correct block of the if/elif/else
statement. The frame is used to restore the state of the variables needed for the current operation, then on the next iteration the return frame is popped, sending execution to the subsequent section.
The structure of the frame tuple could vary for each branch of the switch or if-elif
block to minimize the amount of data on the stack. Only place in a certain arm's frame what it needs, if possible, and overall space usage reduced.
Return values
I've found that the best approach to handling return values used within the converted code is to create a separate stack for return values. Any section of code that returns a value pushes it on this results
stack, and any section of code that expects to receive a return value from a call, pops it from the results
stack. For an example of this approach, you can check out this conversion of a recursive algorithm.