AEM performance issues (slow memory leak) org.slf4j.helpers.BasicMarker and org.slf4j.helpers.BasicMarkerFactory
Asked Answered
D

1

16

I am currently using the Adobe Experience Manager (AEM also known as CQ) for a Client's site (Java platform). It uses OpenJDK:

java version "1.7.0_65" 
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17) 
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

It is running on Rackspace with the following:

vCPU: 4 
Memory: 16GB 
Guest OS: Red Hat Enterprise Linux 6 (64-bit)

Since it has been in production I have been experiencing very slow performance on the part of the application. It goes like this I launch the app, everything is smooth then 7 to 10 days later the CPU usage spikes to 400% (~4000 users/day hit the site). The site becomes exceptionally slow and never becomes an OOM exception.

Since I am a novice at Java Memory management I started reading about how it works and found tools like jstat and jmap. When the system was overwhelmed the second time around, I got a heap dump and dug into it.

It all seems to be pointing out at org.slf4j.helpers.BasicMarkerFactory and org.slf4j.helpers.BasicMarker as when I analyze it with MAT eclipse I see that the biggest retained object by retained size is:

org.slf4j.helpers.BasicMarkerFactory @ 0x6021a4f00
Shallow Size: 16 B Retained Size: 6.8 GB

and

When I run a Leak suspects report I get the following result:

Description
One instance of "org.slf4j.helpers.BasicMarkerFactory" loaded by "org.apache.felix.framework.BundleWiringImpl$BundleClassLoaderJava5 @ 0x60219a878" occupies 7,263,024,848 (96.71%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "<system class loader>".

Keywords
java.util.concurrent.ConcurrentHashMap$Segment[]
org.apache.felix.framework.BundleWiringImpl$BundleClassLoaderJava5 @ 0x60219a878
org.slf4j.helpers.BasicMarkerFactory

and

Shortest Paths To the Accumulation Point

Class Name                                                                                  Shallow Heap    Retained Heap

java.util.concurrent.ConcurrentHashMap$Segment[16] @ 0x6021a4f40                                80              7,263,024,784
.
... segments java.util.concurrent.ConcurrentHashMap @ 0x6021a4f10                               48              7,263,024,832
    .
    ... markerMap org.slf4j.helpers.BasicMarkerFactory @ 0x6021a4f00                            16              7,263,024,848
        .
        ... markerFactory org.slf4j.impl.StaticMarkerBinder @ 0x6021d3970                       16                         16
            .
            ... SINGLETON class org.slf4j.impl.StaticMarkerBinder @ 0x6021d38f8                 8                          24
                .
                ... [328] java.lang.Object[640] @ 0x6021d2ee8                                   2,576                   9,592
                    .
                    ... elementData java.util.Vector @ 0x6021d0fe0                              32                      9,624
                        .
                        ... classes org.apache.felix.framework.BundleWiringImpl$
                            BundleClassLoaderJava5 @ 0x6021c32e0                                96                     26,888
                            .
                            ... <classloader> class ch.qos.logback.classic.Logger @ 
                                0x600be4310                                                     16                         16
                            .   .
                            .   ...<class> ch.qos.logback.classic.Logger @ 0x600282a78          48                          48
                            .
                            ... <Java Local> java.lang.Thread @ 0x60077b450
                                pool-9-thread-1 Thread                                          104                     3,344
                                .
                                ... <class> ch.qos.logback.classic.Logger @ 0x60025b850         48                          48
                                .
                                ... <class> ch.qos.logback.classic.Logger @ 0x604b0a708         48                          48
                                .
                                ... <class> ch.qos.logback.classic.Logger @ 0x604b0a6d8         48                          48
                                .
                                ... <class> ch.qos.logback.classic.Logger @ 0x6049debe0         48                          48
                                .
                                ... <class> ch.qos.logback.classic.Logger @ 0x604535228         48                          48
                                .
                                ... <class> ch.qos.logback.classic.Logger @ 0x604124248         48                         48

Also when I run:

$ sudo -u aem jmap -histo PID


 num     #instances         #bytes  class name
----------------------------------------------
   1:      11460084      950827248  [C
   2:      10740160      257763840  java.lang.String
   3:       7681495      245807840  java.util.concurrent.ConcurrentHashMap$HashEntry
   4:       7563527      181524648  org.slf4j.helpers.BasicMarker
   5:        217007      173568376  [I
   6:        177602      158721184  [B
   7:         60611       69739136  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
   8:       1147481       69348496  [Ljava.lang.Object;
   9:       1797107       43130568  org.apache.jackrabbit.oak.plugins.segment.RecordId
  10:        208912       33824544  <constMethodKlass>
  11:        570143       31928008  org.mozilla.javascript.ast.Name
  12:         22350       27643920  <constantPoolKlass>
  13:        208912       26752544  <methodKlass>
  14:        821217       26278944  java.util.UUID
  15:        793800       25401600  java.util.HashMap$Entry
  16:        532946       21317840  org.mozilla.javascript.Node
  17:        792296       19015104  java.lang.Long
  18:        191294       18335600  [Ljava.util.HashMap$Entry;
  19:         22350       16133328  <instanceKlassKlass>
  20:        173883       15855152  [Ljava.lang.String;
  21:        635690       15256560  org.apache.sling.engine.impl.request.SlingRequestProgressTracker$TrackingEntry
  22:         18509       14662848  <constantPoolCacheKlass>
  23:        911112       14577792  java.lang.Integer
  24:        255426       14303856  org.apache.jackrabbit.oak.plugins.segment.SegmentNodeBuilder
  25:        519324       12463776  java.util.ArrayList
  26:        254643       12222864  org.apache.jackrabbit.oak.core.SecureNodeBuilder
  27:        137703       11016240  java.lang.reflect.Method
  28:        312116        9987712  org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState
  29:         19236        9828448  [Lorg.apache.jackrabbit.oak.plugins.segment.SegmentId;
  30:        242179        9687160  java.util.TreeMap$Entry
  31:        197121        9461808  java.util.HashMap
  32:         15041        9416328  <methodDataKlass>
  33:        387927        9310248  org.apache.jackrabbit.oak.plugins.segment.MapRecord
  34:        250049        8001568  org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder$UnconnectedHead
  35:        248586        7954752  org.apache.jackrabbit.oak.core.MutableTree
  36:        107865        7948112  [S
  37:        191950        7678000  java.util.LinkedHashMap$Entry
  38:        102212        6541568  org.mozilla.javascript.ast.PropertyGet
  39:         37021        6515696  org.mozilla.javascript.ast.FunctionNode
  40:        161905        6476200  org.mozilla.javascript.ScriptableObject$Slot

.....

8210:             1             16   org.slf4j.helpers.BasicMarkerFactory

I noticed:

   4:       7563527      181524648  org.slf4j.helpers.BasicMarker

and

8210:             1             16  org.slf4j.helpers.BasicMarkerFactory

When I go into the documentation of org.slf4j.helpers.BasicMarkerFacotry I see the following that gets my attention:

detachMarker
public boolean detachMarker(String name)
Description copied from interface: IMarkerFactory
Detach an existing marker.
Note that after a marker is detached, there might still be "dangling" references to the detached marker.

Specified by:
detachMarker in interface IMarkerFactory
Parameters:
name - The name of the marker to detach
Returns:
whether the marker could be detached or not

In particular:

Note that after a marker is detached, there might still be "dangling" references to the detached marker.
  1. Hopefully someone will be able to help pinpoint the cause of my issues as I am a little lost here ?
  2. Has anyone seen this before ?
  3. How could I go about troubleshooting this issue furthermore ?
  4. Do you agree that org.slf4j.helpers.BasicMarker and org.slf4j.helpers.BasicMarkerFactory seem to be the root cause of my issues ?
  5. Is my logging configuration a suspect ?
  6. Is this a slow memory leak or a performance tuning issue (in my opinion seem like the memory leaks slowly over a week or so) ?

Any advice is welcomed.

Thanks in advance.

Demetrademetre answered 20/7, 2015 at 18:47 Comment(6)
First, you've done an excellent job of troubleshooting. Congratulations! It sounds like there's clearly a resource leak: 1) perhaps org.slf4 Markers has a bug, 2) or perhaps "org.apache.felix.framework" is failing to free resources, 3) or maybe the problem is at the top of the stack - Adobe Experience Manager (AEM) might be the culprit. SUGGESTION: Open a support ticket with Adobe (if your client purchased AEM), and post to felix.apache.org and slf4j.org.Semiconscious
Thanks a lot for your prompt response ! I'll post to felix.apache.org. I'm subscribed to the slf4j mailing list and have already sent out a lengthy message like this one, yet no response for the time being. Regarding AEM support tickets, I've already posted one with Heap Dump, Thread dump and errlog. Their answer wasn't very useful as they never said anything about slf4j at all. They pointed their finger at one of my components. Anyways I looked at the code again but found nothing out of the ordinary and nothing involving slf4j. But I'll try again posting a support ticket with this message.ThxDemetrademetre
Excellent analysis @nabello. I would suggest the same thing as mentioned, Open a daycare ticket(Adobe support) and at the same time open a ticket in Apache Felix and SL4J. This most likely looks like a GC bug.Cochlea
@Demetrademetre this is an issue in Oak. Opened issues.apache.org/jira/browse/OAK-3476 to fix it on Oak sideRupertruperta
Based on the comments in OAK-3476, is it correct to say that disabling debug logs would remove the problem?Knecht
Nabello, if you haven't done so already, contact Adobe AEM customer care for the latest AEM/Oak related hotfix. Oak 1.0.23 for AEM6.0 or Oak 1.2.8 for AEM6.1 includes the fix for OAK-3476 mentioned by Chetan above.Teri
A
0

Are you using the MarkerFactory to create your Marker instances ?

Sounds obvious that theConcurrentHashMap holding your Marker is ever growing. That would be the case for e.g. if you create a marker with each time a different name (for e.g. using a date). Marker are to mark you log so you can filter them accordingly. You may want to post the part of the code where you are creating your marker.

Albarran answered 23/10, 2015 at 19:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.