Sieve, LRU and Caching
Came across this blog about a new caching algorithm, Sieve, which aims to be simpler than LRU. It turns out to be an NSDI'24 paper, so there must be some truth to that claim. See the HN entry, where I stumbled upon it.
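As I understand the paper, Sieve is a FIFO queue with one "visited" bit per entry and a moving "hand": hits only set the bit (no list reordering, unlike LRU), and eviction sweeps the hand from the oldest entry toward the newest, sparing visited entries while clearing their bit. A rough sketch of that idea (my own reading of the algorithm, not the authors' code):

```python
class _Node:
    __slots__ = ("key", "value", "visited", "prev", "next")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.visited = False
        self.prev = self.next = None


class SieveCache:
    """Sketch of Sieve: FIFO list + visited bits + a moving hand."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.table = {}
        self.head = self.tail = None   # head = newest, tail = oldest
        self.hand = None

    def get(self, key):
        node = self.table.get(key)
        if node is None:
            return None
        node.visited = True            # the only work on a cache hit
        return node.value

    def put(self, key, value):
        if key in self.table:
            node = self.table[key]
            node.value, node.visited = value, True
            return
        if len(self.table) >= self.capacity:
            self._evict()
        node = _Node(key, value)       # new entries enter at the head
        node.next = self.head
        if self.head:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node
        self.table[key] = node

    def _evict(self):
        # Sweep from the hand (or the tail) toward the head, clearing
        # visited bits, and evict the first unvisited entry. Retained
        # entries stay in place; the hand keeps its position across calls.
        node = self.hand or self.tail
        while node.visited:
            node.visited = False
            node = node.prev or self.tail   # wrap around to the tail
        self.hand = node.prev               # resume here next time
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        del self.table[node.key]
```

The appeal is that a hit touches only one bit, so reads need no locking or list surgery, which is where the "simpler than LRU" claim comes from.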
This led me to a HN tour, where I found the following things:
- The `O_DIRECT` flag to `open`, which tries to minimize cache usage.
- The `posix_fadvise` stdlib function, a wrapper on the `fadvise64` syscall, which lets you inform the system about the access patterns you are going to use with an fd, so it can cache the data better.
- This hacky util: https://github.com/Feh/nocache. More useful for learning from than for using directly.
- `vmstat` output does include cache. (Funny, I have run that command many, many times and never really looked at that column.)
- All the different ways of implementing LRU: https://www.geeksforgeeks.org/lru-cache-implementation/
- This Go package for LRU, which seems to be used by other major packages: https://github.com/hashicorp/golang-lru
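Python exposes `posix_fadvise` directly in the `os` module (POSIX-only), so the hint is easy to try. A small sketch: advise sequential access before a front-to-back read, then ask the kernel to drop the cached pages afterwards, which is roughly the trick the nocache utility plays around ordinary programs.

```python
import os
import tempfile

# Throwaway 1 MiB file to read back with cache hints.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(1 << 20))
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # Hint: we'll read front-to-back, so read-ahead can be aggressive.
    # len=0 means "the whole file".
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    data = os.read(fd, 1 << 20)
    # Hint: we're done with these pages; the kernel may drop them.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
finally:
    os.close(fd)
    os.unlink(path)

print(len(data))
```

Both calls are advisory: the kernel is free to ignore them, which is why `nocache` is best treated as a demonstration rather than a guarantee.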
Also came across this paper from Micro'19 which talks about a hardware SVM based approach to caching. A large part of the paper is devoted a an offline Attention based LSTM approach which while accurate is too unweildly obviously to use in hardware. The paper extracts some value out of the sparse attention learned to claim that you don't actually need sequence ordering to learn the needed patterns and a simple SVM would do the trick. The paper claims all this a learning, but it seems a bit of an anti-climax. Still, a Micro paper, so what can I say! (This paper is in a field you can term as "ML for Systems").
The listing on Wikipedia of Cache Replacement policies is worth scanning once.
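The baseline policy in that listing, LRU, is usually implemented as a hash map plus a doubly linked list; in Python, `collections.OrderedDict` bundles both, giving a minimal sketch:

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU: OrderedDict keeps entries in recency order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)     # a hit makes the entry newest
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the oldest entry
```

Note that unlike Sieve, every hit here mutates the list order, which is exactly the bookkeeping Sieve avoids.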
The earlier papers on caching from the group behind Sieve may also be useful.
On this tangent, I re-stumbled on the classic "Power of Two Random Choices" work. The idea goes like this: when placing balls in bins, choosing 1 bin at random leads to a larger variance in packing than picking 2 bins at random and placing the ball in the less packed of the two. This idea comes up again and again in hashing, caching, and load balancing, and though the work goes into the mathematical proof, the intuitive idea behind it is that picking two at random is a simplified, O(1) random approach towards approximating LRU.
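The gap is easy to see in a quick simulation. With n balls in n bins, one random choice gives a maximum load of about log n / log log n, while best-of-two drops it to about log log n; the sketch below just measures both:

```python
import random


def max_load(n, choices, rng):
    """Throw n balls into n bins; each ball samples `choices` bins
    and lands in the least loaded one. Returns the fullest bin."""
    bins = [0] * n
    for _ in range(n):
        idx = min((rng.randrange(n) for _ in range(choices)),
                  key=lambda i: bins[i])
        bins[idx] += 1
    return max(bins)


rng = random.Random(0)
n = 100_000
print(max_load(n, 1, rng), max_load(n, 2, rng))
```

Running it, the two-choice maximum load comes out noticeably smaller than the one-choice one, and the gap widens as n grows.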
This image from this blog, or this other one, can help visualize what has happened.
There is follow-up theoretical work that goes one step further with the original balls-and-bins model: the second (or kth) random choice for comparison is chosen neither uniformly from the set nor independently of the first choice. This leads to some surprising results that I have not fully explored.