Keycloak: Persisted Cache Sync Timeouts to Infinispan Cluster on Rebuild From Embedded to Remote Infinispan
Asked Answered
C

0

11

I'm working on switching keycloak (v.3.4.0 final) from using embedded infinispan to a dedicated remote infinispan cluster (v8.2.8.final). I've gone through the upgrade process to use infinispan cluster as remote-store in lower environments without issues. In my production setting I am running into a timeout exception on InfinispanCacheInitializer

Where error is happening on Keycloak: https://github.com/keycloak/keycloak/blob/3.4.2.Final/model/infinispan/src/main/java/org/keycloak/models/sessions/infinispan/initializer/InfinispanCacheInitializer.java#L117

ERROR [org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer] (ServerService Thread Pool -- 54) ExecutionException when computed future. Errors: 13: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutExc
eption                                                                                                                                                                                                                                                                      
        at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.get(DefaultExecutorService.java:850)                                                                                                                                                          
        at org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer.startLoading(InfinispanCacheInitializer.java:102)                                                                                                                                 
        at org.keycloak.models.sessions.infinispan.initializer.DBLockBasedCacheInitializer.startLoading(DBLockBasedCacheInitializer.java:75)                                                                                                                                
        at org.keycloak.models.sessions.infinispan.initializer.CacheInitializer.loadSessions(CacheInitializer.java:41)                                                                                                                                                      
        at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory$2.run(InfinispanUserSessionProviderFactory.java:150)                                                                                                                                
        at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:227)                                                                                                                                                                    
        at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory.loadPersistentSessions(InfinispanUserSessionProviderFactory.java:137)                                                                                                               
        at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory$1.onEvent(InfinispanUserSessionProviderFactory.java:108)                                                                                                                            
        at org.keycloak.services.DefaultKeycloakSessionFactory.publish(DefaultKeycloakSessionFactory.java:68)                                                                                                                                                               
        at org.keycloak.services.resources.KeycloakApplication$2.run(KeycloakApplication.java:165)                                                                                                                                                                          
        at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:227)                                                                                                                                                                    
        at org.keycloak.services.resources.KeycloakApplication.<init>(KeycloakApplication.java:158)                                                                                                                                                                         
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)                                                                                                                                                                                            
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)                                                                                                                                                                     
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)                                                                                                                                                             
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)                                                                                                                                                                                                  
        at org.jboss.resteasy.core.ConstructorInjectorImpl.construct(ConstructorInjectorImpl.java:150)                                                                                                                                                                      
        at org.jboss.resteasy.spi.ResteasyProviderFactory.createProviderInstance(ResteasyProviderFactory.java:2298)                                                                                                                                                         
        at org.jboss.resteasy.spi.ResteasyDeployment.createApplication(ResteasyDeployment.java:340)                                                                                                                                                                         
        at org.jboss.resteasy.spi.ResteasyDeployment.start(ResteasyDeployment.java:253)                                                                                                                                                                                     
        at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.init(ServletContainerDispatcher.java:120)                                                                                                                                                   
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.init(HttpServletDispatcher.java:36)                                                                                                                                                              
        at io.undertow.servlet.core.LifecyleInterceptorInvocation.proceed(LifecyleInterceptorInvocation.java:117)                                                                                                                                                           
        at org.wildfly.extension.undertow.security.RunAsLifecycleInterceptor.init(RunAsLifecycleInterceptor.java:78)                                                                                                                                                        
        at io.undertow.servlet.core.LifecyleInterceptorInvocation.proceed(LifecyleInterceptorInvocation.java:103)                                                                                                                                                           
        at io.undertow.servlet.core.ManagedServlet$DefaultInstanceStrategy.start(ManagedServlet.java:250)                                                                                                                                                                   
        at io.undertow.servlet.core.ManagedServlet.createServlet(ManagedServlet.java:133)                                                                                                                                                                                   
        at io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:565)                                                                                                                                                                            
        at io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:536)                                                                                                                                                                            
        at io.undertow.servlet.core.ServletRequestContextThreadSetupAction$1.call(ServletRequestContextThreadSetupAction.java:42)                                                                                                                                           
        at io.undertow.servlet.core.ContextClassLoaderSetupAction$1.call(ContextClassLoaderSetupAction.java:43)                                                                                                                                                             
        at org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction.lambda$create$0(SecurityContextThreadSetupAction.java:105)                                                                                                                              
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)                                                                                                       
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)                                                                                                       
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)                                                                                                       
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)                                                                                                       
        at io.undertow.servlet.core.DeploymentManagerImpl.start(DeploymentManagerImpl.java:578)                                                                                                                                                                             
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentService.startContext(UndertowDeploymentService.java:100)                                                                                                                                             
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentService$1.run(UndertowDeploymentService.java:81)                                                                                                                                                     
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)                                                                                                                                                                                          
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)                                                                                                                                                                                                         
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)                                                                                                                                                                                  
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)                                                                                                                                                                                  
        at java.lang.Thread.run(Thread.java:748)                                                                                                                                                                                                                            
        at org.jboss.threads.JBossThread.run(JBossThread.java:320)                                                                                                                                                                                                          
Caused by: java.util.concurrent.TimeoutException                                                                                                                                                                                                                            
        at java.util.concurrent.FutureTask.get(FutureTask.java:205)                                                                                                                                                                                                         
        at org.infinispan.commons.util.concurrent.NotifyingFutureImpl.get(NotifyingFutureImpl.java:88)                                                                                                                                                                      
        at org.infinispan.distexec.DefaultExecutorService$LocalDistributedTaskPart.getResult(DefaultExecutorService.java:1083)                                                                                                                                              
        at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.innerGet(DefaultExecutorService.java:868)                                                                                                                                                     
        at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.get(DefaultExecutorService.java:848)                                                                                                                                                          
        ... 44 more

Overview

  • Keycloak version: 3.4.0.final (aware this is an older version - and in use with custom implementation - not easy to upgrade)
    • startup script: ExecStart={{ keycloak_jboss_home }}/bin/standalone.sh -b using standalone.xml
  • Infinispan version: 8.2.8.final
  • Switching from embedded local cache to remote store configuration for the following caches:
    • users (distributed)
    • sessions (replicated)
    • authenticationSessions (replicated)
    • offlineSessions (replicated)
    • loginFailures (replicated)
    • authorization (replicated)
  • offline_user_session count: ~3 million

In an attempt to test out this sized cache sync from database, I've updated the configuration for timeouts in standalone.xml and standalone.conf files

keycloak/keycloak-3.4.0.Final/standalone/configuration/standalone.xml Updated coordinator timeout to be 3 hours and commented out query-timeout

<subsystem xmlns="urn:jboss:domain:transactions:4.0">                                                                                                                                                                                             
    <core-environment>                                                                                                                                                                                                                            
        <process-id>                                                                                                                                                                                                                              
            <uuid/>                                                                                                                                                                                                                               
         </process-id>                                                                                                                                                                                                                             
    </core-environment>                                                                                                                                                                                                                           
    <recovery-environment socket-binding="txn-recovery-environment" status-socket-binding="txn-status-manager"/>                                                                                                                                  
    <object-store path="tx-object-store" relative-to="jboss.server.data.dir"/>                                                                                                                                                                    
    <coordinator-environment default-timeout="10800"/>                                                                                                                                                                                            
 </subsystem> 

....
....
..

<!-- <timeout>                                                                                                                                                                                                                        
    <query-timeout>15</query-timeout>                                                                                                                                                                                                 
</timeout>-->   

/keycloak/keycloak-3.4.0.Final/bin/standalone.conf add blocking timeout to JAVA_OPTS

JAVA_OPTS="$JAVA_OPTS -Djboss.modules.system.pkgs=$JBOSS_MODULES_SYSTEM_PKGS -Djava.awt.headless=true -Djboss.as.management.blocking.timeout=10800"  

I'd like to note that after the first few attempts at getting this to work, when reverting back to using the embedded cache on the keycloak nodes, the data synced fine without any timeout errors in around 1.5 hours.

After startup of keycloak it takes about 60 minutes before it starts to sync offline user sessions. Taking a look at the queries being run by keycloak, I can see that the timeout errors occur about 5-10 minutes after it starts syncing offline_user_session records to offlineSessions cache

enter image description here

The queries being run before it times out are:

  1. delete from OFFLINE_CLIENT_SESSION where not (exists (select persistent1_.USER_SESSION_ID from OFFLINE_USER_SESSION persistent1_ where persistent1_.USER_SESSION_ID=OFFLINE_CLIENT_SESSION.USER_SESSION_ID))

  2. update OFFLINE_USER_SESSION set LAST_SESSION_REFRESH=$1

  3. DELETE FROM JGROUPSPING WHERE own_addr=$1 AND cluster_name=$2

  4. select count(persistent0_.OFFLINE_FLAG) as col_0_0_ from OFFLINE_USER_SESSION persistent0_ where persistent0_.OFFLINE_FLAG=$1

  5. select userrolema0_.ROLE_ID as col_0_0_ from USER_ROLE_MAPPING userrolema0_ where userrolema0_.USER_ID=$1

  6. select userentity0_.ID as ID1_76_, userentity0_.CREATED_TIMESTAMP as CREATED_2_76_, userentity0_.EMAIL as EMAIL3_76_, userentity0_.EMAIL_CONSTRAINT as EMAIL_CO4_76_, userentity0_.EMAIL_VERIFIED as EMAIL_VE5_76_, userentity0_.ENABLED as ENABLED6_76_, userentity0_.FEDERATION_LINK as FEDERATI7_76_, userentity0_.FIRST_NAME as FIRST_NA8_76_, userentity0_.LAST_NAME as LAST_NAM9_76_, userentity0_.NOT_BEFORE as NOT_BEF10_76_, userentity0_.REALM_ID as REALM_I11_76_, userentity0_.SERVICE_ACCOUNT_CLIENT_LI

  7. select attributes0_.USER_ID as USER_ID4_72_0_, attributes0_.ID as ID1_72_0_, attributes0_.ID as ID1_72_1_, attributes0_.NAME as NAME2_72_1_, attributes0_.USER_ID as USER_ID4_72_1_, attributes0_.VALUE as VALUE3_72_1_ from USER_ATTRIBUTE attributes0_ where attributes0_.USER_ID=$1

  8. select persistent0_.OFFLINE_FLAG as OFFLINE_1_47_, persistent0_.USER_SESSION_ID as USER_SES2_47_, persistent0_.DATA as DATA3_47_, persistent0_.LAST_SESSION_REFRESH as LAST_SES4_47_, persistent0_.REALM_ID as REALM_ID5_47_, persistent0_.USER_ID as USER_ID6_47_ from OFFLINE_USER_SESSION persistent0_ where persistent0_.OFFLINE_FLAG=$1 order by persistent0_.USER_SESSION_ID limit $2 offset $3

I setup Infinispan WebConsole UI so I can see the progress of the cache sync. Every time it gets 15k entries in (out of the ~3 million)

enter image description here

I am not positive the issue here since the syncing of offline sessions from database works fine for the embedded version, but for the remote infinispan startup it seems to have an issue either with the batching of the queries or there's another configuration I am missing either on keycloak of infinispan side.


Update - further testing

Setup a test environment with snapshot of database with 3.5 million OUS/OCS. The RDS instance was provisioned with 5500 IOPS. Upgrading to keycloak version 5.0, there were timeouts happening but running vacuum analyze on entire database resolved the issue and we were able to successfully standup remote infinispan. However, after successful runs, in our live environment we ran into the same timeouts and vacuum analyze did not fix the issue.

Catkin answered 5/4, 2021 at 21:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.