Java thread blocking
Asked Answered
S

6

6

i have a problem with my java environement. I'm running Solr 1.3 (search engine) since more then a year now and suddenly i got alot of trouble with it. All my thread pool (250) got randomly blocked once or twice a day. I did not make any change on my solr application or my tomcat server.

I'm running tomcat 5.5.25 and Solr 1.3. I got a thread dump when the system is totally overloaded :

igot like 240 thread like this one :

"http-8080-Processor1" daemon prio=10 tid=0x0000000000b2e000 nid=0x193 waiting for monitor entry [0x000000004066c000..0x000000004066cb20]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.util.logging.StreamHandler.publish(StreamHandler.java:174)
    - waiting to lock <0x00007fe37e72b340> (a java.util.logging.ConsoleHandler)
    at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:88)
    at java.util.logging.Logger.log(Logger.java:472)
    at java.util.logging.Logger.doLog(Logger.java:494)
    at java.util.logging.Logger.log(Logger.java:517)
    at java.util.logging.Logger.info(Logger.java:1036)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1212)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
    at java.lang.Thread.run(Thread.java:619)

We can see that this thread is blocked and waiting on : <0x00007fe37e72b340>

The thread who actually own the <0x00007fe37e72b340> is this one :

"http-8080-Processor156" daemon prio=10 tid=0x0000000000df2000 nid=0x1e52 runnable [0x0000000044521000..0x0000000044521c20]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:260)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    - locked <0x00007fe37e3abcd8> (a java.io.BufferedOutputStream)
    at java.io.PrintStream.write(PrintStream.java:430)
    - locked <0x00007fe37e3abca0> (a java.io.PrintStream)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
    - locked <0x00007fe37e72cd90> (a java.io.OutputStreamWriter)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
    at java.util.logging.StreamHandler.flush(StreamHandler.java:225)
    - locked <0x00007fe37e72b340> (a java.util.logging.ConsoleHandler)
    at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:89)
    at java.util.logging.Logger.log(Logger.java:472)
    at java.util.logging.Logger.doLog(Logger.java:494)
    at java.util.logging.Logger.log(Logger.java:517)
    at java.util.logging.Logger.info(Logger.java:1036)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1212)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
    at java.lang.Thread.run(Thread.java:619)

and this is the last part of my thread dump :

"ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon prio=10 tid=0x00007f6510349800 nid=0xbff waiting on condition [0x0000000041d8d000..0x0000000041d8dd20]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1548)
    at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-1" prio=10 tid=0x0000000000c26400 nid=0xbfe waiting on condition [0x000000004200e000..0x000000004200eca0]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00007f651b275510> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:946)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:906)
    at java.lang.Thread.run(Thread.java:619)

"Low Memory Detector" daemon prio=10 tid=0x00007f6510004400 nid=0xbfa runnable [0x0000000000000000..0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x00007f6510001000 nid=0xbf9 waiting on condition [0x0000000000000000..0x0000000040d5e340]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x00000000006bc400 nid=0xbf8 waiting on condition [0x0000000000000000..0x0000000040c5d2d0]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00000000006bb000 nid=0xbf7 runnable [0x0000000000000000..0x0000000040b5da30]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x0000000000690c00 nid=0xbf6 in Object.wait() [0x000000004065e000..0x000000004065ed20]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00007f651aa10258> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0x00007f651aa10258> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x000000000068f400 nid=0xbf5 in Object.wait() [0x000000004055d000..0x000000004055dca0]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00007f651aa10338> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0x00007f651aa10338> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0000000000622400 nid=0xbeb runnable [0x0000000000000000..0x00007fff69fcbba0]
   java.lang.Thread.State: RUNNABLE

"VM Thread" prio=10 tid=0x000000000068a000 nid=0xbf4 runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000000062cc00 nid=0xbec runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x000000000062e000 nid=0xbed runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x000000000062f400 nid=0xbee runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000000630400 nid=0xbef runnable 

"GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000000631800 nid=0xbf0 runnable 

"GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000000632c00 nid=0xbf1 runnable 

"GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000000634000 nid=0xbf2 runnable 

"GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000000635400 nid=0xbf3 runnable 

"VM Periodic Task Thread" prio=10 tid=0x00007f6510006800 nid=0xbfb waiting on condition 

JNI global references: 1201

I know this is not a thread deadlock prob since one thread is actually runnning with all the ressource every other thread want.

Anyone got an idea of what can cause this prob ?

Sorbose answered 20/8, 2010 at 14:13 Comment(0)
V
5

All your Threads are logging things. They all need to write on the disk from time to time. Everytime one of your 240 threads hits a logging line there will be disk access issues.

It baffles me that the Thread having the lock is in the RUNNABLE state.

I think it might be waiting for some external ressource to be released (like disk access for example)

Are you running low on disk space? Have you recently changed something in your storage system?

Verbalism answered 20/8, 2010 at 14:29 Comment(3)
I think this is the right track. I'd look at external factors. Also, if it's writing not to a traditional file but to a Unix named pipe, make sure that somebody is reading the other end of the pipe at a sufficient pace. If the buffer fills up you'll just block.Muskogean
The disk space is fine and we did not changed anything on the system. We tried to switch the system on another server and we got the same prob.Sorbose
I don't think this is an issue with Thread safety. It would have broken much earlier. 240 Threads is not a problem for the JVM. 240 Threads editing a single file could be. An ugly fix would be to reduce the amount of logged messages. Try tweaking the thing using getLoggerNames(), getLogger() and setLevel() from download-llnw.oracle.com/javase/1.4.2/docs/api/java/util/… and download-llnw.oracle.com/javase/1.4.2/docs/api/java/util/…Verbalism
J
5

If you are running under Windows and the java application starts a console, be careful not to click in the DOS box. Window's crappy mark and copy "feature" blocks output to the ConsoleHandler. So any logger trying to write to the screen will block. Writing to the Console is done in a native call and so the java thread will appear to be in a RUNNING state when in fact it is blocked, it's just there's no way to feed that blocked status back to the application (because you are in native space).

If the application is blocked (you have clicked in the DOS box) press escape to continue.

Jehoshaphat answered 20/6, 2012 at 13:21 Comment(0)
G
0

I never used java.util.logging, so I don't know whether my suggestion is useful, but netherless:
try to use different instance of java.util.logging.Logger, so not all 240 threads will be blocked on the same monitor
(it will help if different instances of Logger use different instances of java.util.logging.ConsoleHandler) .

Grapnel answered 20/8, 2010 at 14:26 Comment(0)
O
0

It seems that the thread that own "0x00007fe37e72b340" is blocked at the IO level. Maybe a disk (raid?) issue?

can you do a thread dump 5 minutes later the see if the same thread is still blocked?

Oversexed answered 20/8, 2010 at 14:27 Comment(1)
Thanks for the tips, ill try to monitor thread activity during the next crash in a couple of hour ;)Sorbose
A
0

Flushing after each log record is going to expensive if you have very verbose logs.

A quality fix would be to clean up the logging, probably based around auditing.

As a quick fix, override StreamHandler.flush or OutputStream.flush to not do so immediately. Only flush once every so often. Note, however that you could potentially lose logging data immediately before a crash if you do this.

Antiquate answered 20/8, 2010 at 14:35 Comment(0)
S
0

According to your log, the issue concerns the use of java.util.logging.ConsoleHandler.

First try to disable the console handler by removing it from the 'handlers' and '.handlers' list in '${TOMCAT_HOME}/conf/logging.properties'. See whether the problem still occurs.

If that helps, than it is definitely an issue with the output of the ConsoleHandler. Try checking whether there is are issues concerning the 'catalina.out' file. This is the file to where tomcat redirects its console output.

Sassoon answered 11/12, 2013 at 20:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.