I've been running Kafka on Kubernetes without any major issue for a while now; however, I recently introduced a cluster of Cassandra pods and started having performance problems with Kafka.
Even though Cassandra doesn't use page cache like Kafka does, it does make frequent writes to disk, which presumably effects the kernel's underlying cache.
I understand that Kubernetes pods are managing memory resources through cgroups, which can be configured by setting memory requests and limits in Kubernetes, but I've noticed that Cassandra's utilization of page cache can increase the number of page faults in my Kafka pods even when they don't seem to be competing for resources (i.e., there's memory available on their nodes).
In Kafka more page faults leads to more writes to disk, which hamper the benefits of sequential IO and compromise disk performance. If you use something like AWS's EBS volumes, this will eventually deplete your burst balance and eventually cause catastrophic failures across your cluster.
My question is, is it possible to isolate page cache resources in Kubernetes or somehow let the kernel know that pages owned by my Kafka pods should be kept in the cache longer than those in my Cassandra pods?