How to resolve deadlock issue in ColdFusion 9: coldfusion.util.AbstractCache$Lock
Asked Answered
L

2

2

I have been attempting to resolve an issue where the execution of certain scripts causes a deadlock, putting all subsequent requests into limbo, using up 99.9% of the CPU, and eventually effectively crashing the server.

Here is an example stack trace for one of the requests which has been put in limbo (waiting eternally):

Thread Stack Trace
Trace Time: 21:00:44.463 06-Jun-2012
Request ID: 6131
Script Name: http://www.example.com/allreviews.cfm
Started: 21:00:21.225 06-Jun-2012
Exec Time: 23238ms
Memory Used: (24%)230,667KB
Memory Free: 701,428KB
Thread ID: 0x191e (6430)
Thread Name: jrpp-494
Priority: 5
Hashcode: 1081611879
State: WAITING

"jrpp-494" prio=5 in Object.wait()

java.lang.Object.wait(Object.java:???)[Native Method]
- waiting on <0x9253305> (a coldfusion.util.AbstractCache$Lock)
java.lang.Object.wait(Object.java:485)
coldfusion.util.AbstractCache.fetch(AbstractCache.java:46)
coldfusion.util.SoftCache.get_statsOff(SoftCache.java:133)
coldfusion.util.SoftCache.get(SoftCache.java:81)
coldfusion.runtime.TemplateClassLoader.findClass(TemplateClassLoader.java:609)
coldfusion.runtime.RuntimeServiceImpl.getFile(RuntimeServiceImpl.java:785)
coldfusion.runtime.RuntimeServiceImpl.resolveTemplatePath(RuntimeServiceImpl.java:766)
coldfusion.tagext.lang.CustomTag.setName(CustomTag.java:21)
cfApplication2ecfm456206189._factor0(/srv/www/htdocs/www.example.com/www/Application.cfm:28)
cfApplication2ecfm456206189.runPage(/srv/www/htdocs/www.example.com/www/Application.cfm:1)
coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:231)
coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:416)
coldfusion.filter.CfincludeFilter.invoke(CfincludeFilter.java:65)
coldfusion.filter.CfincludeFilter.include(CfincludeFilter.java:33)
coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:279)
coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)
coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)
coldfusion.filter.PathFilter.invoke(PathFilter.java:94)
coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:70)
coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:46)
coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
coldfusion.filter.CachingFilter.invoke(CachingFilter.java:62)
coldfusion.CfmServlet.service(CfmServlet.java:200)
coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
com.intergral.fusionreactor.filter.FusionReactorCoreFilter.doHttpServletRequest(FusionReactorCoreFilter.java:503)
com.intergral.fusionreactor.filter.FusionReactorCoreFilter.doFusionRequest(FusionReactorCoreFilter.java:337)
com.intergral.fusionreactor.filter.FusionReactorCoreFilter.doFilter(FusionReactorCoreFilter.java:246)
com.intergral.fusionreactor.filter.FusionReactorFilter.doFilter(FusionReactorFilter.java:121)
jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
jrun.servlet.FilterChain.service(FilterChain.java:101)
jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

If you're interested, you can see the full stack trace with what I'll call 'the locking script' at the top and all the others waiting on it.

When I first encountered this issue, I didn't have the stack traces. I posted the question, "When ColdFusion is maxing out the CPU, how do I find out what it's chewing/choking on?". I received many helpful responses and by looking at the stack traces I was able to determine that it was the same three scripts that were causing this deadlock issue over and over again.

In each case the top line in 'the locking script' reads:

coldfusion.compiler.ClassReader.skipFully(ClassReader.java:79)

And all the other requests are clogged up behind it have the following line in their respective stack traces:

- waiting on <0x9253305> (a coldfusion.util.AbstractCache$Lock)

One thing that was bothering me, was why my request timeout was not being respected; these scripts would just hang forever and never die. WTF, right? So I had to do it myself. So then when I kill 'the locking script', the others are released from limbo. At that point, if they are below the request timeout, they finish processing, and if they are over it (which most all of them usually are) then they simply proceed to timeout. But they won't timeout on their own and the requests just pile up until the active threads are used and the thread queue is full and everything seizes up.

Manually killing these each time they're requested is obviously not a solution, so, as my wife is always reminding me, "debug, debug, debug". Using a conditional <cfabort> I stepped through and found that it was making it all the way through the Application.cfm, through my header.cfm, and right up to the <cfinclude> of the problem script. If I put the <cfabort> inside the problem script (even at the very top), it would not abort and the deadlock issue would occur. If I put it just before the include, the request would abort and the deadlock issue would be avoided. Bizarre.

There's no code between these two places, right? Just before the include and just inside the include ought to be functionally equivalent, no? Probably not, because clearly something is going on in there.

I'm not using any <cflock> tags. The locking that is happening is appears to be happening at the template cache level. This same behavior is observed regardless of whether the 'Trusted Cache', 'Cache Template In Request', or "Component Cache' options are checked in admin (in any combination of checked/unchecked). I've cleared the template cache and the component cache each more than once. I restarted the CF server over and over...all to no avail.

During troubleshooting, I read this article describing a similar issue with a compiler cache lock in CF8 (8.0.1) along with instructions for applying a patch to fix it. But that isn't CF9...so obviously I can't apply their patch.

What to do? Has anybody else encountered this issue? ...And have a solution?

Lubin answered 8/6, 2012 at 1:22 Comment(3)
I already mentioned this, but just in case it went unnoticed, this investigation started out in this thread: When ColdFusion is maxing out the CPU, how do I find out what it's chewing/choking on?. If you're having this issue, the conversation there may prove handy as a starting point.Lubin
ClassReader extends ByteArrayInputStream, which does have bugs in Java 1.6 which could cause the deadlock you're seeing. Your code hangs at review.cfm line 13. Is review.cfm or the file it's including particularly large?Sufficient
Neither file is particularly large. I found, as I described, that an abort just prior to the the include on line 13 prevented the deadlock, but that an abort at the top of the included file did not. I have now also discovered that if a make a special exception for review 1872 and instead of including 1872.cfm, I include an exact duplicate of that file, 1872new.cfm, then the abort at the top of the include does work, and deadlock is avoided. This strongly suggests some sort of caching problem, and yet the problem occurs regardless of the state of the various caching toggles in admin.Lubin
L
2

It turns out that sometimes the class files are corrupted and need to be regenerated. When you experience this issue, it will manifest as described above, with a deadlock while attempting to load the corrupted class file.The steps to regenerate the class files are simple, as follows:

  1. Go to the ColdFusion Administrator > Server Settings > Caching

  2. Uncheck the following option:

    [  ] Save class files When you select this option, the class files generated by ColdFusion are saved to disk for reuse after the server restarts. Adobe recommends this for production systems. During development, Adobe recommends that you do not select this option.

  3. Click the [Submit Changes] button

  4. Restart the ColdFusion server

  5. Go to the ColdFusion Administrator > Server Settings > Caching

  6. Check the following option:

    [x] Save class files When you select this option, the class files generated by ColdFusion are saved to disk for reuse after the server restarts. Adobe recommends this for production systems. During development, Adobe recommends that you do not select this option.

  7. Click the [Submit Changes] button

  8. And your done! The problem disappears entirely, and all is once again as it should be. ;-)

Why and how do the class files become corrupted? I don't know. Perhaps this could be the subject of another question. All I know is, this solves the problem. I'm usually hesitant to accept my own answers as authoritative, so, if someone else has a better explanation of this problem and its solution, please feel free to post it.

Lubin answered 9/7, 2012 at 20:34 Comment(1)
Challenging problem, and thanks for all your thoughts here. But I would think what "fixed" things in this case was more just the restart of CF. Turning off that setting, and restarting, and turning it on really does nothing to the saved class files (cfusion/WEB-INF/cfclasses). If you said instead you had stopped CF and DELETED that folder, that would have also cleared out whatever bad file (java class compiled by CF from CFML source) was causing this. But you didn't, so turning that off and on did nothing, I'd say. Did you ever learn more?Cesarcesare
T
0

As a workaround, you could try disabling template caching entirely by setting "Maximum number of cached templates" to 0. Not ideal for production .. but might be better than crashing ..

Tales answered 2/7, 2012 at 1:42 Comment(1)
Thank you for your suggestion Jared. But yes, not caching is less than ideal. Fortunately, I was able to work out another solution. Thanks again. And cheers!Lubin

© 2022 - 2024 — McMap. All rights reserved.