Wednesday, 4 February 2015

Keycloak clustering - Improving availability and scalability with Infinispan

One of the major features introduced in Keycloak 1.1.0.Final is improved clustering capabilities. By clustering Keycloak you can provide high availability and scalability, which is obviously important if you rely on Keycloak to log in to critical applications.

We could have chosen to build this clustering capability from the ground up, but we're in the business of developing an identity and access management solution, not a clustering solution. A second option could have been the excellent JGroups, but that's still fairly low-level stuff. Luckily at JBoss we have no shortage of middleware, and we obviously have a solution for this. Unless you've been living under a rock for the last few years you should have heard about it: it's called Infinispan!

Infinispan is a distributed in-memory key/value data grid and cache. It's a mature project and loaded with features, which fits our needs perfectly.

In Keycloak we have 3 different types of data:
  • Realm and application meta-data
  • Users, credentials and role-mappings
  • User sessions

Each has different needs when it comes to clustering.

Realm and application meta-data are frequently read, but unless your sysadmin really loves reconfiguring things, not so frequently changed. There are also only so many realms and applications in an organization, so the size is limited. This means we can save it all in a database and cache it all in memory as it's accessed. If a change is made it's written directly to the database. To make sure all nodes pick up the updates we use an invalidation cache. An invalidation cache simply sends a message to the cluster that invalidates an entry in the cache. The next time the data is requested it won't be in the cache, so the updated version is loaded from the database. This is beneficial for us compared to a replicated cache, as it doesn't send sensitive data throughout the cluster and it also reduces network traffic.
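To make the invalidation idea a bit more concrete, here is a minimal single-process sketch in plain Java. It is not the Infinispan API and not how Keycloak is implemented: the class `InvalidationSketch`, its `Node` type and the `realm:demo` key are all hypothetical, and a plain method call stands in for the message that would travel over the network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of an invalidation cache. Not the Infinispan API.
class InvalidationSketch {
    // Stands in for the shared database that every node reads from.
    final Map<String, String> database = new HashMap<>();
    // Stands in for cluster membership.
    final List<Node> nodes = new ArrayList<>();

    Node addNode() {
        Node node = new Node();
        nodes.add(node);
        return node;
    }

    class Node {
        final Map<String, String> localCache = new HashMap<>();
        int databaseReads = 0;

        // Serve from the local cache; on a miss, load from the database.
        String get(String key) {
            String value = localCache.get(key);
            if (value == null) {
                databaseReads++;
                value = database.get(key);
                if (value != null) {
                    localCache.put(key, value);
                }
            }
            return value;
        }

        // Write straight to the database, then ask every node to drop its copy.
        // Only the key is "sent" to the other nodes, never the value.
        void update(String key, String value) {
            database.put(key, value);
            for (Node node : nodes) {
                node.localCache.remove(key);
            }
        }
    }

    public static void main(String[] args) {
        InvalidationSketch cluster = new InvalidationSketch();
        cluster.database.put("realm:demo", "v1");
        Node a = cluster.addNode();
        Node b = cluster.addNode();
        System.out.println(b.get("realm:demo")); // loaded from the database, then cached
        a.update("realm:demo", "v2");            // invalidates the entry on every node
        System.out.println(b.get("realm:demo")); // cache miss, reloaded from the database
    }
}
```

Note that `update` never ships the new value to the other nodes. They only learn that their copy is stale, which is the property that keeps sensitive data off the wire in this model.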

Users are by default handled the same way as realm and application meta-data. It's good practice to change your password once in a while, but certainly not every time you log in! So users are also frequently read, but not so frequently changed. There can be a lot more users than realms though, so we set a maximum number of users that are cached. This results in active users being held in memory, while inactive users are purged from memory.
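The effect of capping the number of cached users can be sketched with nothing more than `java.util.LinkedHashMap`. This assumes least-recently-used eviction, which is an assumption made for illustration and not necessarily the policy Infinispan applies. The class name `BoundedUserCache` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded cache with least-recently-used eviction.
// Illustration only; not the Infinispan API.
class BoundedUserCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedUserCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedUserCache<String, String> users = new BoundedUserCache<>(2);
        users.put("alice", "alice@example.org");
        users.put("bob", "bob@example.org");
        users.get("alice");                       // alice is now the most recently used
        users.put("carol", "carol@example.org");  // over the limit: bob is evicted
        System.out.println(users.keySet());
    }
}
```

An evicted user is not lost. As with realm meta-data, the next lookup would simply miss the cache and go back to the database.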

User sessions are very different, as they are frequently updated. Every time a user logs in a new user session is created. We also have a mechanism to expire idle sessions, which means that every time a user session is accessed it's also updated. If we stored user sessions in a database and used an invalidation cache, performance and scalability wouldn't be very good. Also, user sessions are not critical data, so they don't have to be persisted in a database. In the worst case, if a user session is lost the user has to log back in. For user sessions we use a distributed cache. This provides good performance as the sessions are not persisted, and good scalability as they're split into segments, with each node holding only a subset of the sessions in memory. Finally, if you really need higher availability for user sessions, we recommend that you configure each segment to be replicated to more than one node rather than persisting the sessions.
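Here is a rough sketch of how segments and owners interact, again in plain Java and again hypothetical. Infinispan assigns segments to nodes with its own consistent-hash implementation, while this toy version uses simple modulo arithmetic. The names `DistributedSessionsSketch`, `numSegments` and `numOwners` are chosen for the example only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of a distributed cache. Not the Infinispan API.
class DistributedSessionsSketch {
    final int numSegments;
    final int numOwners;
    // One in-memory map per node.
    final List<Map<String, String>> nodes = new ArrayList<>();

    DistributedSessionsSketch(int numNodes, int numSegments, int numOwners) {
        if (numSegments < 1 || numOwners < 1 || numOwners > numNodes) {
            throw new IllegalArgumentException("need numSegments >= 1 and 1 <= numOwners <= numNodes");
        }
        this.numSegments = numSegments;
        this.numOwners = numOwners;
        for (int i = 0; i < numNodes; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Every session id maps to exactly one segment.
    int segmentOf(String sessionId) {
        return Math.floorMod(sessionId.hashCode(), numSegments);
    }

    // Toy assignment: a segment lives on numOwners consecutive nodes.
    List<Integer> ownersOf(int segment) {
        List<Integer> owners = new ArrayList<>();
        for (int i = 0; i < numOwners; i++) {
            owners.add((segment + i) % nodes.size());
        }
        return owners;
    }

    // Store the session on each owner of its segment, and nowhere else.
    void put(String sessionId, String session) {
        for (int owner : ownersOf(segmentOf(sessionId))) {
            nodes.get(owner).put(sessionId, session);
        }
    }

    // Read from the first owner that is still alive; pass -1 if no node has failed.
    String get(String sessionId, int failedNode) {
        for (int owner : ownersOf(segmentOf(sessionId))) {
            if (owner != failedNode) {
                return nodes.get(owner).get(sessionId);
            }
        }
        return null; // every owner is gone: the user has to log in again
    }
}
```

With `numOwners` set to 1 each session is held by a single node, so losing that node loses the session. With `numOwners` set to 2 a session survives the failure of either owner, at the cost of holding two copies in memory.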