How to hunt down obscure HA clustering bug in Wildfly 8.2.0.Final
Asked Answered
G

1

11

The setup

I have a Wildfly 8.2.0.Final application server running a cluster in domain mode using the full-ha profile. The cluster consists of two instances of wildfly, master and slave, each running on its own virtual machine.

The application

My project is deployed as a war-file on the application server. For test purposes my loadbalancer distributes the requests using round-robin.

Anonymous users can use the service provided by this project using a button, which will call in two steps first register then login. Login will use the session that is created during the register phase providing the credentials created during the register call.

The implementation

The login endpoint is a request scoped CDI bean, which among other members has a member which holds user information. The user information is a SessionScopped EJB Bean which is created during session instantiation and which is injected into the login endpoints CDI bean. The user information is supposed to get distributed among cluster members.

Observed behavior

Now the funny part:

  • Using Firefox:
    1. /rest/register -> 200 OK
    2. /rest/login -> 200 OK
  • Using Firefox in private mode:
    1. /rest/register -> 200 OK
    2. /rest/login -> 200 OK
  • Using Chrome:
    1. /rest/register -> 200 OK
    2. /rest/login -> 200 OK
  • Using Chrome in private mode:
    1. /rest/register -> 200 OK
    2. /rest/login -> 200 OK
  • Using Internet Explorer 11:
    1. /rest/register -> 200 OK
    2. /rest/login -> 200 OK
  • Using Internet Explorer 11 in private mode:
    1. /rest/register -> 200 OK
    2. /rest/login -> 500 Internal Server error

The logs

This is the stack trace of the 500:

    2015-05-07 10:05:31,734 ERROR [io.undertow.request] (default task-11) UT005023: Exception handling request to /testtest/rest/login: org.jboss.resteasy.spi.UnhandledException: java.lang.NullPointerException
        at org.jboss.resteasy.core.ExceptionHandler.handleApplicationException(ExceptionHandler.java:76) [resteasy-jaxrs-3.0.10.Final.jar:]
        at org.jboss.resteasy.core.ExceptionHandler.handleException(ExceptionHandler.java:212) [resteasy-jaxrs-3.0.10.Final.jar:]
        at org.jboss.resteasy.core.SynchronousDispatcher.writeException(SynchronousDispatcher.java:149) [resteasy-jaxrs-3.0.10.Final.jar:]
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:372) [resteasy-jaxrs-3.0.10.Final.jar:]
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:179) [resteasy-jaxrs-3.0.10.Final.jar:]
        at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:220) [resteasy-jaxrs-3.0.10.Final.jar:]
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56) [resteasy-jaxrs-3.0.10.Final.jar:]
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51) [resteasy-jaxrs-3.0.10.Final.jar:]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [jboss-servlet-api_3.1_spec-1.0.0.Final.jar:1.0.0.Final]
        at io.undertow.servlet.handlers.ServletHandler.handleRequest(ServletHandler.java:85) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(ServletSecurityRoleHandler.java:61) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(ServletDispatchingHandler.java:36) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
        at org.wildfly.extension.undertow.security.SecurityContextAssociationHandler.handleRequest(SecurityContextAssociationHandler.java:78)
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) [undertow-core-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(SSLInformationAssociationHandler.java:131) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(ServletAuthenticationCallHandler.java:56) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) [undertow-core-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(AbstractConfidentialityHandler.java:45) [undertow-core-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(ServletConfidentialityConstraintHandler.java:63) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(AuthenticationMechanismsHandler.java:58) [undertow-core-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(CachedAuthenticatedSessionHandler.java:70) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.security.handlers.SecurityInitialHandler.handleRequest(SecurityInitialHandler.java:76) [undertow-core-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) [undertow-core-1.1.0.Final.jar:1.1.0.Final]
        at org.wildfly.extension.undertow.security.jacc.JACCContextIdHandler.handleRequest(JACCContextIdHandler.java:61)
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) [undertow-core-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) [undertow-core-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(ServletInitialHandler.java:261) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(ServletInitialHandler.java:247) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.servlet.handlers.ServletInitialHandler.access$000(ServletInitialHandler.java:76) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(ServletInitialHandler.java:166) [undertow-servlet-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.server.Connectors.executeRootHandler(Connectors.java:197) [undertow-core-1.1.0.Final.jar:1.1.0.Final]
        at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:759) [undertow-core-1.1.0.Final.jar:1.1.0.Final]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_65]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_65]
Caused by: java.lang.NullPointerException
        at org.jboss.weld.context.beanstore.http.AbstractSessionBeanStore.getLockStore(AbstractSessionBeanStore.java:113) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weld.context.beanstore.AttributeBeanStore.lock(AttributeBeanStore.java:210) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weld.context.AbstractContext.get(AbstractContext.java:90) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weld.context.PassivatingContextWrapper$AbstractPassivatingContextWrapper.get(PassivatingContextWrapper.java:76) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weld.bean.proxy.ContextBeanInstance.getInstance(ContextBeanInstance.java:98) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:78) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at ru.exampl.testtest.lobby.UserController$Proxy$_$$_WeldClientProxy.toString(Unknown Source) [classes:]
        at org.apache.log4j.spi.LoggingEvent.<init>(LoggingEvent.java:106) [log4j-jboss-logmanager-1.1.0.Final.jar:1.1.0.Final]
        at org.apache.log4j.Category.forcedLog(Category.java:119) [log4j-jboss-logmanager-1.1.0.Final.jar:1.1.0.Final]
        at org.apache.log4j.Category.debug(Category.java:80) [log4j-jboss-logmanager-1.1.0.Final.jar:1.1.0.Final]
        at ru.exampl.testtest.rest.LobbyEndpoint.login(LobbyEndpoint.java:100) [classes:]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_65]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_65]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_65]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_65]
        at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:137) [resteasy-jaxrs-3.0.10.Final.jar:]
        at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:296) [resteasy-jaxrs-3.0.10.Final.jar:]
        at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:250) [resteasy-jaxrs-3.0.10.Final.jar:]
        at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:237) [resteasy-jaxrs-3.0.10.Final.jar:]
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:356) [resteasy-jaxrs-3.0.10.Final.jar:]

The code

And this is the relevant code which causes the error (or not in those cases where it just works):

package ru.exampl.testtest.rest;

[... lots of imports ...]

/**
 * The implementation of some general webservice endpoints.
 */
@RequestScoped
@Path("rest")
public class LobbyEndpoint {

    private transient static final Logger log4j = Logger.getLogger(LobbyEndpoint.class);

    @Inject
    private UserController userController;

    [... more injections ...]

    @POST
    @Path("/login")
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces(MediaType.APPLICATION_JSON)
    public LoginResponse login(LoginRequest request) {
        log4j.log(userController);
        [... lots of code to handle login ...]
    }
}

Now the funny thing is, that userController is obviously not null, since log4j knows how to handle and log null objects. The object obviously has a reference but Weld can't access it through the proxy. This in itself would be only half as funny without the list of browsers where this very same rest endpoint just does its job and only one specific browser in one specific browser mode causing the webapp to crash.

Additional information

To be honest, I have NO clue how a browser through its requests could provoke such a behavior. But to make things even more funny, I recorded the actual requests with tshark and then modified Firefox to send the very same request the Internet Explorer 11 sends. And now hold your breath: all things being equal Firefox doesn't crash the webapp while Internet Explorer 11 in private mode does 100% of the time.

I recorded both requests to demonstrate (you can view them in wireshark):

The actual question

Now my question: where do I even start to debug this? How can a web browser delivering its requests through a proxy even have the slightest influence on the outcome of my session synchronization in my cluster and thus influencing the request processing on my application server, if the entire request is identical except for the order of the headers? Timing is not involved. Introducing random waits in the requests didn't change anything.

Gennie answered 7/5, 2015 at 8:38 Comment(2)
I can't look it up but I wouldn't guess that userController is null. I would guess that something in the toString() implementation of userController is null and raises this exception.Enlist
No, this can't be the reason. My actual live code didn't have this log message in the first place. I only added it during debug, while throwing out more and more possible complexity. When being called with Firefox for example, the log message will only print the value of the "pointer". So toString() should NEVER fail, unless the address on the heap is itself invalid.Gennie
L
2

I had a similar looking stack trace. But I forgot set the "instance-id" in the undertow section of my wildfly configuration. The problem didn't come up, as long as the server didn't try to replicate the session.

So the solution for me was simply setting an instance-id like this:

<subsystem xmlns="urn:jboss:domain:undertow:1.2" instance-id="${jboss.server.name}">

This can also be set via jboss-cli:

/profile=full-ha/subsystem=undertow:write-attribute(name=instance-id, value="${jboss.server.name}")

[Edit: trying to find explanation]

The NullpointerException happenes with this statement:

lockStore = (LockStore) session.getAttribute(SESSION_KEY);

At this statement the NPE will only occur if "session" is null. The "session" Object is acquired earlier by calling a getSession() Method which is implemented by a child class. I can find two possible candidates:

  • LazySessionBeanStore (org.jboss.weld.context.beanstore.http)
  • LazyCyclicSessionBeanStore (org.jboss.weld.context.beanstore.http)
  • EagerSessionBeanStore (org.jboss.weld.context.beanstore.http)

It's not obvious to me, why those classes could return null for "session". And how this is connected to setting the "instance-id" in the untertow section. What I can see is, that many logging statements are created at these. To go further on this, I would try to set the log level to "TRACE" on "org.jboss.weld".

Liverpudlian answered 19/5, 2015 at 4:40 Comment(3)
I tried the following: /opt/wildfly-8.2.0.Final-DC/bin/./jboss-cli.sh --connect --command="/profile=full-ha/subsystem=undertow:read-attribute(name=instance-id)" { "outcome" => "success", "result" => expression "${jboss.node.name}" } Shouldn't this be sufficient? I'll try your suggestion anyhow.Gennie
I have no idea why your suggestion works. I can't even find documentation that suggests why it should help. However I tried it and it did help. So you obviously fulfilled requirement A): testable hypothesis. So thanks a lot for your time. The 50 rep are yours.Gennie
I tried to start a possible explanation. But failed ;) I can't really tell why this is happening. At the moment, I have no system to reproduce this.Liverpudlian

© 2022 - 2024 — McMap. All rights reserved.