Recently, I hardened my Keycloak deployment to use a dedicated Infinispan cluster as a remote-store for an extra layer of persistence for Keycloak's various caches. The change itself went reasonably well, but afterwards we started seeing a lot of login errors with the expired_code error message:
WARN [org.keycloak.events] (default task-2007) type=LOGIN_ERROR, realmId=my-realm, clientId=null, userId=null, ipAddress=192.168.50.38, error=expired_code, restart_after_timeout=true
This error message is typically repeated dozens of times all within a short period of time and from the same IP address. The cause of this appears to be the end-user's browser infinitely redirecting on login until the browser itself stops the loop.
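To confirm that the errors really arrive in bursts from a single address, the event log can be tallied per IP. The following is a minimal sketch that assumes the log lines have the same key=value layout as the line quoted above; the function name and the regular expression are illustrative, not part of Keycloak.

```python
import re
from collections import Counter

# Matches the key=value pairs in a Keycloak event log line.
PAIR = re.compile(r"(\w+)=([^,\s]+)")


def count_expired_code_by_ip(lines):
    """Count LOGIN_ERROR events with error=expired_code per client IP."""
    counts = Counter()
    for line in lines:
        fields = dict(PAIR.findall(line))
        if fields.get("type") == "LOGIN_ERROR" and fields.get("error") == "expired_code":
            counts[fields.get("ipAddress", "unknown")] += 1
    return counts


# The log line from the question, repeated to imitate a redirect loop.
sample = (
    "WARN [org.keycloak.events] (default task-2007) type=LOGIN_ERROR, "
    "realmId=my-realm, clientId=null, userId=null, ipAddress=192.168.50.38, "
    "error=expired_code, restart_after_timeout=true"
)
print(count_expired_code_by_ip([sample] * 3))
```

Feeding it the lines of the real server log instead of the sample would show whether a handful of addresses account for most of the errors.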
I have seen various GitHub issues (https://github.com/helm/charts/issues/8355) that also document this behavior, and the consensus seems to be that this is caused by the Keycloak cluster not being able to correctly discover its members via JGroups.
This explanation makes sense when you consider that some of the Keycloak caches are distributed across the Keycloak nodes in the default configuration within standalone-ha.xml. However, I have modified these caches to be local caches with a remote-store pointing to my new Infinispan cluster, and I believe I have made some incorrect assumptions about how this works, causing this error to start happening.
Here is how my Keycloak caches are configured:
<subsystem xmlns="urn:jboss:domain:infinispan:7.0">
    <cache-container name="keycloak" module="org.keycloak.keycloak-model-infinispan">
        <transport lock-timeout="60000"/>
        <local-cache name="realms">
            <object-memory size="10000"/>
        </local-cache>
        <local-cache name="users">
            <object-memory size="10000"/>
        </local-cache>
        <local-cache name="authorization">
            <object-memory size="10000"/>
        </local-cache>
        <local-cache name="keys">
            <object-memory size="1000"/>
            <expiration max-idle="3600000"/>
        </local-cache>
        <local-cache name="sessions">
            <remote-store cache="sessions" remote-servers="remote-cache" fetch-state="false" passivation="false" preload="false" purge="false" shared="true">
                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </local-cache>
        <local-cache name="authenticationSessions">
            <remote-store cache="authenticationSessions" remote-servers="remote-cache" fetch-state="false" passivation="false" preload="false" purge="false" shared="true">
                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </local-cache>
        <local-cache name="offlineSessions">
            <remote-store cache="offlineSessions" remote-servers="remote-cache" fetch-state="false" passivation="false" preload="false" purge="false" shared="true">
                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </local-cache>
        <local-cache name="clientSessions">
            <remote-store cache="clientSessions" remote-servers="remote-cache" fetch-state="false" passivation="false" preload="false" purge="false" shared="true">
                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </local-cache>
        <local-cache name="offlineClientSessions">
            <remote-store cache="offlineClientSessions" remote-servers="remote-cache" fetch-state="false" passivation="false" preload="false" purge="false" shared="true">
                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </local-cache>
        <local-cache name="loginFailures">
            <remote-store cache="loginFailures" remote-servers="remote-cache" fetch-state="false" passivation="false" preload="false" purge="false" shared="true">
                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </local-cache>
        <local-cache name="actionTokens">
            <remote-store cache="actionTokens" remote-servers="remote-cache" fetch-state="false" passivation="false" preload="false" purge="false" shared="true">
                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </local-cache>
        <replicated-cache name="work">
            <remote-store cache="work" remote-servers="remote-cache" fetch-state="false" passivation="false" preload="false" purge="false" shared="true">
                <property name="rawValues">true</property>
                <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
            </remote-store>
        </replicated-cache>
    </cache-container>
    <cache-container name="server" aliases="singleton cluster" default-cache="default" module="org.wildfly.clustering.server">
        <transport lock-timeout="60000"/>
        <replicated-cache name="default">
            <transaction mode="BATCH"/>
        </replicated-cache>
    </cache-container>
    <cache-container name="web" default-cache="dist" module="org.wildfly.clustering.web.infinispan">
        <transport lock-timeout="60000"/>
        <distributed-cache name="dist">
            <locking isolation="REPEATABLE_READ"/>
            <transaction mode="BATCH"/>
            <file-store/>
        </distributed-cache>
    </cache-container>
    <cache-container name="ejb" aliases="sfsb" default-cache="dist" module="org.wildfly.clustering.ejb.infinispan">
        <transport lock-timeout="60000"/>
        <distributed-cache name="dist">
            <locking isolation="REPEATABLE_READ"/>
            <transaction mode="BATCH"/>
            <file-store/>
        </distributed-cache>
    </cache-container>
    <cache-container name="hibernate" module="org.infinispan.hibernate-cache">
        <transport lock-timeout="60000"/>
        <local-cache name="local-query">
            <object-memory size="10000"/>
            <expiration max-idle="100000"/>
        </local-cache>
        <invalidation-cache name="entity">
            <transaction mode="NON_XA"/>
            <object-memory size="10000"/>
            <expiration max-idle="100000"/>
        </invalidation-cache>
        <replicated-cache name="timestamps"/>
    </cache-container>
</subsystem>
Note that most of this cache configuration is unchanged when compared to the default standalone-ha.xml configuration file. The changes I have made here are changing the following caches to be local and pointing them to my remote Infinispan cluster:
- sessions
- authenticationSessions
- offlineSessions
- clientSessions
- offlineClientSessions
- loginFailures
- work
- actionTokens
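A quick way to check which caches actually carry a remote-store, and under which element type, is to parse the subsystem XML. The sketch below uses Python's standard xml.etree.ElementTree on an abbreviated copy of the keycloak cache-container from the configuration above; the helper names are illustrative. Note that in the configuration as posted, work is declared as a replicated-cache rather than a local-cache.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the "keycloak" cache-container from the configuration above.
CONFIG = """
<subsystem xmlns="urn:jboss:domain:infinispan:7.0">
  <cache-container name="keycloak">
    <local-cache name="realms"/>
    <local-cache name="sessions">
      <remote-store cache="sessions" remote-servers="remote-cache" shared="true"/>
    </local-cache>
    <replicated-cache name="work">
      <remote-store cache="work" remote-servers="remote-cache" shared="true"/>
    </replicated-cache>
  </cache-container>
</subsystem>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def remote_store_caches(xml_text, container="keycloak"):
    """Return {cache name: element type} for caches that have a remote-store child."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for cc in root.iter():
        if local_name(cc.tag) != "cache-container" or cc.get("name") != container:
            continue
        for cache in cc:
            if any(local_name(child.tag) == "remote-store" for child in cache):
                result[cache.get("name")] = local_name(cache.tag)
    return result


print(remote_store_caches(CONFIG))
```

Pointing the function at the contents of the real standalone-ha.xml subsystem instead of the abbreviated string should list all eight caches.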
Here is the configuration for my remote-cache server:
<socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
    <!-- Default socket bindings from standalone-ha.xml are not listed here for brevity -->
    <outbound-socket-binding name="remote-cache">
        <remote-destination host="${env.INFINISPAN_HOST}" port="${remote.cache.port:11222}"/>
    </outbound-socket-binding>
</socket-binding-group>
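For readers unfamiliar with the expression syntax in this binding: ${env.INFINISPAN_HOST} is taken from the environment variable INFINISPAN_HOST, and ${remote.cache.port:11222} is the system property remote.cache.port with a default of 11222. The sketch below mimics that resolution in Python and adds a plain TCP reachability check; both function names are hypothetical, the props dictionary merely stands in for JVM system properties, and an open port only shows that something is listening, not that Hot Rod is working.

```python
import os
import socket


def resolve_endpoint(env=os.environ, props=None):
    """Mimic ${env.INFINISPAN_HOST} and ${remote.cache.port:11222}."""
    props = props or {}                      # stand-in for JVM system properties
    host = env["INFINISPAN_HOST"]            # no default: KeyError if unset
    port = int(props.get("remote.cache.port", 11222))
    return host, port


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# "infinispan.example" is a placeholder host name used only for illustration.
print(resolve_endpoint({"INFINISPAN_HOST": "infinispan.example"}))
```

Run from inside a Keycloak pod, is_reachable(*resolve_endpoint()) gives a first indication of whether the outbound binding points at a listening endpoint.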
Here is how my caches are configured on the Infinispan side:
<subsystem xmlns="urn:infinispan:server:core:9.4" default-cache-container="clustered">
    <cache-container name="clustered" default-cache="default">
        <transport lock-timeout="60000"/>
        <global-state/>
        <replicated-cache-configuration name="replicated-keycloak" mode="SYNC">
            <locking acquire-timeout="3000"/>
        </replicated-cache-configuration>
        <replicated-cache name="work" configuration="replicated-keycloak"/>
        <replicated-cache name="sessions" configuration="replicated-keycloak"/>
        <replicated-cache name="authenticationSessions" configuration="replicated-keycloak"/>
        <replicated-cache name="clientSessions" configuration="replicated-keycloak"/>
        <replicated-cache name="offlineSessions" configuration="replicated-keycloak"/>
        <replicated-cache name="offlineClientSessions" configuration="replicated-keycloak"/>
        <replicated-cache name="actionTokens" configuration="replicated-keycloak"/>
        <replicated-cache name="loginFailures" configuration="replicated-keycloak"/>
    </cache-container>
</subsystem>
I believe I have made some incorrect assumptions about how local caches with remote stores work, and I was hoping someone would be able to clear this up for me. My intention was to make the Infinispan cluster the source of truth for all of Keycloak's caches. By making every cache local, I assumed that data would be replicated to each Keycloak node through the Infinispan cluster, such that a write to the local authenticationSessions cache on keycloak-0 would be synchronously persisted to keycloak-1 through the Infinispan cluster.
What I believe is happening is that the write to a local cache on Keycloak is not synchronous with respect to persisting that value to the remote Infinispan cluster. In other words, when a write is performed to the authenticationSessions cache, it does not block while waiting for this value to be written to the Infinispan cluster, so an immediate read for this data on another Keycloak node results in a cache miss, locally and in the Infinispan cluster.
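The suspected race can be illustrated with a toy model. This is only a sketch of the hypothesis described above, not of how Infinispan implements its stores: the RemoteStore and Node classes, the write_through flag, and the key and value are all invented for illustration.

```python
class RemoteStore:
    """Stands in for the shared Infinispan cluster."""

    def __init__(self):
        self.data = {}


class Node:
    """Stands in for one Keycloak node with a local cache in front of the store."""

    def __init__(self, store, write_through):
        self.store = store
        self.write_through = write_through
        self.local = {}
        self.pending = []  # writes not yet pushed to the store

    def write(self, key, value):
        self.local[key] = value
        if self.write_through:
            self.store.data[key] = value       # caller waits for the store
        else:
            self.pending.append((key, value))  # pushed later by flush()

    def flush(self):
        """Push queued writes to the store (the asynchronous case)."""
        for key, value in self.pending:
            self.store.data[key] = value
        self.pending.clear()

    def read(self, key):
        if key in self.local:
            return self.local[key]
        return self.store.data.get(key)  # None models a cache miss


def login_across_nodes(write_through):
    """Write on keycloak-0, then immediately read on keycloak-1."""
    store = RemoteStore()
    keycloak_0 = Node(store, write_through)
    keycloak_1 = Node(store, write_through)
    keycloak_0.write("auth-session-1", "code")
    return keycloak_1.read("auth-session-1")


print(login_across_nodes(write_through=False))  # miss: the write is still queued
print(login_across_nodes(write_through=True))   # hit: the store was updated first
```

In this model the second node misses only when the first node's write has not yet reached the shared store, which is the behavior I suspect is producing expired_code.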
I'm looking for some help with identifying why my current configuration is causing this issue, and some clarification on the behavior of a remote-store. Is there a way to make writes to a local cache backed by a remote-store synchronous? If not, is there a better way to do what I'm trying to accomplish here?
Some other potentially relevant details:
- Both Keycloak and Infinispan are deployed to the same namespace in a Kubernetes cluster.
- I am using KUBE_PING for JGroups discovery.
- Using the Infinispan console, I am able to verify that all of the caches replicated to all of the Infinispan nodes have some amount of entries in them; they aren't completely unused.
- If I add a new realm to one Keycloak node, it successfully shows up on other Keycloak nodes, which leads me to believe that the work cache is being propagated across all Keycloak nodes.
- If I log in to one Keycloak node, my session remains on other Keycloak nodes, which leads me to believe that the session-related caches are being propagated across all Keycloak nodes.
- I'm using sticky sessions for Keycloak as a temporary fix for this, but I believe fixing these underlying cache issues is a more permanent solution.
Thanks in advance!
authenticatedSessions cache is replicated as quickly as I would like it to. – Synecology