What's the modern way to solve Plone deadlock issues?

I currently have a Plone 4.3.8 site where editing a portlet causes a deadlock.

I'm looking for tools to track this down, but most deadlock tools don't work, and the ones that do run aren't giving me useful information (IMO).

I've tried:

  • z3c.deadlockdebugger => can't get to a stacktrace
  • ZopeHealthWatcher => can't see the results on command line (or webpage)
  • Products.LongRequestLogger => perhaps the best so far: it gives me some log output, but its stack traces focus on Diazo code, and the problem still occurs when Diazo isn't in scope (running against 127.0.0.1)
  • gdb attach - just landed me in C code
  • winpdb => it can't attach to running processes in the same way that gdb can (only to processes started with the intention of attachment by winpdb)
  • Products.signalstack (or Products.signalstacklogger) => the USR1 signal just shuts down the Zope process!

Note: z3c.deadlockdebugger (and anything that depends on it) has to be installed from a source checkout so that its threadframe dependency can be removed.
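For reference, Python 2.5 and later ship sys._current_frames() in the standard library, which covers what threadframe was used for. A minimal sketch of dumping every thread's stack with it follows; the function name dump_all_thread_stacks is made up for illustration, and where you call it from (a debug view, a signal handler, a pdb prompt) is up to you.

```python
import sys
import threading
import traceback


def dump_all_thread_stacks():
    """Return one formatted stack trace per running thread (hypothetical helper)."""
    # Map thread ids to names so the output is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    # sys._current_frames() maps each thread id to its current top frame.
    for thread_id, frame in sys._current_frames().items():
        header = "Thread %s (%s):\n" % (thread_id, names.get(thread_id, "unknown"))
        chunks.append(header + "".join(traceback.format_stack(frame)))
    return "\n".join(chunks)
```

Threads that are stuck waiting on a lock should show the same frame every time the dump is taken, which is usually the quickest way to spot where they are blocked.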

My situation seems to be linked to product upgrades, probably plone.app.contenttypes, plone.app.multilingual, or both. An empty site doesn't have this issue, but obviously I need my site data!

What should I do to progress this?

EDIT:

I believe Maurits's answer to be the most correct one, but it didn't work in my case. What I ended up doing was using pdb to track down the point at which the code was hanging (in plone.app.debugtoolbar, as it happens).

Icily answered 14/4, 2016 at 17:25 Comment(3)
About the threadframe dependency: it has long been deprecated, but you can still include it in your buildout by adding it to the eggs section. As it's not published on PyPI, you must also add an extra find-links entry (https://majid.info/python/threadframe/) and the corresponding allow-hosts entry. - Adachi
@Adachi The article on majid's site about threadframe says: "Python 2.5 and later include a function sys._current_frames()", so I'm taking that advice. - Icily
Yes, that's true, but if you don't want to modify source code you can still use threadframe (although it is deprecated and no longer needed). - Adachi

You say that using the USR1 signal shuts down Zope when using Products.signalstack. But no special packages should be necessary, so I wonder if adding signalstack has this side effect of shutting down Zope.

At least for me, a few weeks ago, this worked fine on a Plone 4.3.something site:

kill -USR1 $(cat var/zeoclient.pid)
Wheat answered 14/4, 2016 at 20:20 Comment(2)
OK, I've removed signalstack (but kept Products.LongRequestLogger). That does give me a stack trace before I try to edit the portlet, but sadly not afterwards, and I'm confused as to why. I caught this in development, so I'm debugging a standalone instance rather than a ZEO client; that shouldn't matter, right? - Icily
Indeed, Products.signalstack was integrated into Zope 2.12.5 back in 2010. - Elayneelazaro
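To illustrate what a USR1 stack dump does in general, here is a minimal sketch of a signal handler that writes every thread's stack to a stream. It is not Zope's actual implementation; install_stack_dump_handler and its parameters are hypothetical names, and SIGUSR1 is only available on Unix-like systems.

```python
import signal
import sys
import traceback


def install_stack_dump_handler(stream=None, signum=signal.SIGUSR1):
    """Register a handler that dumps all thread stacks (hypothetical helper)."""
    out = stream if stream is not None else sys.stderr

    def handler(received_signum, interrupted_frame):
        # One traceback per thread, taken from the interpreter's frame table.
        for thread_id, frame in sys._current_frames().items():
            out.write("Thread %s:\n" % thread_id)
            out.write("".join(traceback.format_stack(frame)))
        out.flush()

    # Must be called from the main thread; CPython refuses otherwise.
    signal.signal(signum, handler)
```

One possible explanation for getting no trace once the hang has started, not confirmed anywhere in this thread: CPython runs Python-level signal handlers only in the main thread, between bytecode instructions. If the main thread is itself blocked and never returns to the interpreter loop, the handler never gets a chance to run.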

Although @maurits's answer is right (and the simplest one), I have sometimes had trouble finding the traceback produced by the kill command: sometimes it ends up in the event log, sometimes on the shell.

I prefer to integrate haufe.requestmonitoring into the buildout and also configure its feature for monitoring long-running requests. You will then see the deadlocked traceback in your event log, and you also gain a tool for tracking down slow requests in your Plone site.

Adachi answered 15/4, 2016 at 7:46 Comment(3)
Thanks, but this didn't work for me either: I can't see a traceback in either location. - Icily
If you don't see the traceback, you probably haven't configured the long-running request feature as described in the link I gave you. - Adachi
I did do this, but then @Wheat's method didn't produce a stack trace for me either. Not to worry, pdb got me there in the end. Thanks! - Icily
