lru_cache vs dynamic programming: stack overflow with one but not with the other?

I'm doing this basic dp (Dynamic Programming) problem on trees (https://cses.fi/problemset/task/1674/). Given the structure of a company (hierarchy is a tree), the task is to calculate for each employee the number of their subordinates.

This:

import sys
from functools import lru_cache  # noqa
sys.setrecursionlimit(2 * 10 ** 9)

if __name__ == "__main__":
    n: int = 200000
    boss: list[int] = list(range(1, 200001))  
    # so in my example it will be a tree with every parent having one child
    graph: list[list[int]] = [[] for _ in range(n)]

    for i in range(n-1):
        graph[boss[i] - 1].append(i+1)  # directed so neighbours of a node are only its children

    @lru_cache(None)
    def dfs(v: int) -> int:
        if len(graph[v]) == 0:
            return 0
        else:
            s: int = 0
            for u in graph[v]:
                s += dfs(u) + 1
            return s

    dfs(0)

    print(*(dfs(i) for i in range(n)))

crashes (I googled the exit code and it means stack overflow):

Process finished with exit code -1073741571 (0xC00000FD)

HOWEVER

import sys
sys.setrecursionlimit(2 * 10 ** 9)

if __name__ == "__main__":
    n: int = 200000
    boss: list[int] = list(range(1, 200001))
    # so in my example it will be a tree with every parent having one child
    graph: list[list[int]] = [[] for _ in range(n)]

    for i in range(n-1):
        graph[boss[i] - 1].append(i+1)  # directed so neighbours of a node are only its children

    dp: list[int] = [0 for _ in range(n)]
    def dfs(v: int) -> None:
        if len(graph[v]) == 0:
            dp[v] = 0
        else:
            for u in graph[v]:
                dfs(u)
                dp[v] += dp[u] + 1

    dfs(0)

    print(*dp)

doesn't crash, and it's exactly the same complexity, right? The DFS goes exactly as deep in both cases, too? I tried to make the two pieces of code as similar as I could.

I tried 20000000 instead of 200000 (i.e. a graph 100 times deeper) and the second option still doesn't overflow the stack. Obviously I could write an iterative version of it (sketched below), but I'm trying to understand the underlying reason why there is such a big difference between these two recursive options, so that I can learn more about Python and its inner workings.
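
For completeness, here is a minimal sketch of what that iterative version could look like (an explicit-stack DFS collects a preorder, then a reverse pass fills dp so children are finished before their parents):

def subordinates(graph: list[list[int]]) -> list[int]:
    n = len(graph)
    dp: list[int] = [0] * n
    order: list[int] = []
    stack: list[int] = [0]  # start at the root, employee 0
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(graph[v])  # parents are visited before their children
    for v in reversed(order):  # so here children come before their parents
        for u in graph[v]:
            dp[v] += dp[u] + 1
    return dp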

I'm using Python 3.11.1.

Thibeault answered 15/2 at 17:30 Comment(12)
I don’t understand @chepner. Using dynamic programming I compute F(1000) by doing F(999) which uses F(998) etc… in my code.Thibeault
DP would look more like f = {} (using a dict just to simplify code in the comment), f[0] = 1; f[1] = 1; for i in range(2, 1000): f[i] = f[i-1] + f[i-2]. That's not recursion; that's using a lookup table that you fill in "just in time", as it were. The stack never grows, unlike a call to def F(n): return 1 if n == 0 else 1 if n == 1 else F(n-1) + F(n-2), which uses O(n) stack space whether or not you use a cache to reduce the complexity from O(2**N) to O(n).Sold
@Sold You seem to be saying that only bottom-up DP is DP, but top-down DP (memoization) is DP, too. (I'm not sure what to call the question's second program. It's a rather odd mix. But it suffices to demonstrate what the question is about.)Distributive
The recursion limit is there specifically to prevent a stack overflow. You disabled that protection by setting a ridiculously high recursion limit, then doing deep recursion. Why?!?Cathicathie
@Cathicathie That's specifically what setrecursionlimit is for. Quote from its doc: "A user may need to set the limit higher when they have a program that requires deep recursion and a platform that supports a higher limit."Distributive
@KellyBundy: Yeah, there's a huge difference between the default 1000 and a 2 billion depth limit. No platform in the world would allow the stack to grow anywhere close to that level without significant modification.Cathicathie
@Cathicathie I suspect they meant that as "infinity", an attempt to say "no limit". It's not the depth they're actually trying. I'm not so sure that no platform would allow that much. Up to how much do you think realistic platforms do allow?Distributive
Yes I set it super high so that I wouldn’t encounter a recursion limit @shadowrangerThibeault
@Sold and yes I was talking about top-down dp / memoisation, not bottom-up dp. Maybe I could’ve been more precise than just saying dp, apologies.Thibeault
@FluidMechanicsPotentialFlows: Yes, but now the recursion limit is the stack, which you overflowed, and got a much less helpful response to. That's the point.Cathicathie
Oh ok, I now understand your comment and that you meant "the recursion limit is there to avoid overflowing the stack"!Thibeault
@Cathicathie Did you see my question "Up to how much do you think realistic platforms do allow"? (Lately I'm sometimes only notified about the last comment, so now I'm never sure whether others have seen replies before the last.)Distributive

lru_cache is implemented in C; its calls are interleaved with your function's calls, so your recursion is deep in C code as well, and that C recursion is what crashes. Your second program only has deep Python code recursion, not deep C code recursion, avoiding the issue.
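
You can see that interleaving from the wrapper's type. On CPython with the C accelerator present, lru_cache returns a C-implemented wrapper object rather than a Python function, so every memoized call pushes a C stack frame in addition to the Python frame. A small check (assuming CPython):

import functools

@functools.lru_cache(None)
def f(x):
    return x

# On CPython with the C accelerator this prints the C wrapper type,
# <class 'functools._lru_cache_wrapper'>, not <class 'function'>:
print(type(f))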

In Python 3.11 I get a similar bad crash:

[Execution complete with exit code -11]

In Python 3.12 I just get an error:

Traceback (most recent call last):
  File "/ATO/code", line 34, in <module>
    dfs(0)
  File "/ATO/code", line 31, in dfs
    s += dfs(u) + 1
         ^^^^^^
  File "/ATO/code", line 31, in dfs
    s += dfs(u) + 1
         ^^^^^^
  File "/ATO/code", line 31, in dfs
    s += dfs(u) + 1
         ^^^^^^
  [Previous line repeated 496 more times]
RecursionError: maximum recursion depth exceeded

That's despite your sys.setrecursionlimit(2 * 10 ** 9).

What’s New In Python 3.12 explains:

sys.setrecursionlimit() and sys.getrecursionlimit(). The recursion limit now applies only to Python code. Builtin functions do not use the recursion limit, but are protected by a different mechanism that prevents recursion from causing a virtual machine crash

So in 3.11, your huge limit is applied to C recursion as well, Python obediently attempts your deep recursion, its C stack overflows, and the program crashes. Whereas in 3.12 the limit doesn't apply to C recursion, Python protects itself with that different mechanism at a relatively shallow recursion depth, producing that error instead.
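
Here is a small way to observe that difference yourself (my sketch; the exact depths and failure modes depend on the Python version and platform, as described above):

import sys
from functools import lru_cache

sys.setrecursionlimit(100_000)

def plain(n):
    return 0 if n == 0 else plain(n - 1)

@lru_cache(None)
def cached(n):
    return 0 if n == 0 else cached(n - 1)

print(plain(50_000))  # pure Python recursion honors the raised limit

try:
    cached(50_000)  # recursion interleaved with the C wrapper
except RecursionError as e:
    # Reached on 3.12, where the separate C-level guard triggers long
    # before the Python limit; on 3.11 the process may instead crash
    # with a C stack overflow before this handler ever runs.
    print("RecursionError:", e)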

Let's avoid that C recursion. If I use a (simplified) Python version of lru_cache, your first program works fine in both 3.11 and 3.12 without any other change:

def lru_cache(_):              # the maxsize argument is accepted and ignored
    def deco(f):
        memo = {}              # the cache, kept in the closure
        def wrap(x):
            if x not in memo:  # compute each argument's result only once
                memo[x] = f(x)
            return memo[x]
        return wrap
    return deco
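
Because wrap is an ordinary Python function, the memoized recursion now consists only of Python frames, which (as discussed in the comments below) don't grow the C stack since Python 3.11.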

See CPython's GitHub issue "3.12 setrecursionlimit is ignored in connection with @functools.cache" for more details and the current progress on this. There are efforts to increase the C recursion limit, but it looks like it will remain only in the thousands, which is not enough for your super deep recursion.

Miscellaneous further information I might expand on later:

If you want the Python version of the regular lru_cache, you could try the hack of importing and then deleting the C version of its wrapper before importing the Python version (a technique from this answer, which did the same with heapq):

import _functools
del _functools._lru_cache_wrapper
from functools import lru_cache
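
A quick way to check that the hack took effect (it must run before anything else imports functools, since Python caches imported modules):

# With the C wrapper deleted, functools falls back to its pure-Python
# implementation, so the decorated object is a plain Python function:
print(type(lru_cache(None)(lambda x: x)))  # <class 'function'>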

Or you could remove the import that replaces the Python implementation with the C implementation. But that's an even worse hack and I wouldn't do it. Maybe I would instead copy and rename the functools module and modify the copy. But I mostly mention these hacks to illustrate what's happening, or as a last resort.

Distributive answered 24/2 at 14:43 Comment(8)
So the C stack overflows, but the Python stack doesn’t? Even though traditionally C has better memory management?Thibeault
C doesn't have "better" memory management; it has manual memory management. But this isn't about memory management; it's about stack growth. Python monitors the stack and raises an exception if the stack gets "too big"; C simply lets the stack grow until you run out of memory for it.Sold
@FluidMechanicsPotentialFlows Yes. The C stack limit appears to usually be ~8 MB nowadays and to overflow with C recursion depth in the thousands or tens of thousands. Whereas pure Python recursion since Python 3.11 now mostly doesn't involve the C stack and is only limited by the limit you can control with sys.setrecursionlimit and by how much (heap) memory you have available. Which can be much higher. If you have Linux, you might be able to increase your C stack with ulimit -s sizeInKB.Distributive
@FluidMechanicsPotentialFlows See the added "Miscellaneous" section for more references/explanation.Distributive
So in pure Python recursion, the recursive calls use the heap, which is a lot bigger than the stack, and the lru_cache (C cache) uses a stack (faster but smaller), which explains the difference?Thibeault
@FluidMechanicsPotentialFlows Yes. Python's devguide talks about The call stack of its interpreter: up to Python 3.10, "each call would require a heap allocation for the stack frame" (though with some reuse optimization). It also describes the new mechanism, though not quite as clearly, but that surely still uses the heap. That's slower than a single fixed preallocated stack like C's would be, but as Python is relatively slow anyway, the extra cost of using the heap is much less significant than it would be for C.Distributive
@FluidMechanicsPotentialFlows Thanks for the bounty, although a thought for the future: There's no need to award it early, and it's beneficial to wait until it's over. An active bounty attracts attention, which leads to more answers and more readers. Good for everybody. I don't care about reputation points, but it makes me happy when more people read my work. Awarding a bounty stops that increased readership, which is a bummer. Also, maybe an even better answer would've come along and you could've awarded the bounty to that one then.Distributive
@KellyBundy Oh I understand, I did not think about the fact that awarding it would decrease visibility.Thibeault
