Quitsies is a distributed, disk-persisted caching system that implements a subset of the Memcached text protocol. It was built as a minimal drop-in replacement for Memcached and has been running in our production pipelines for over a year.
This post explains why we needed Quitsies, and how we went about building it. Quitsies is open source, so you can try it yourself.
The Pipeline
Our team was tasked with consuming and combining several streams of real-time event data by resolving non-arbitrary joins between them. Each stream carried a single event type, and the relation of each event to events in other streams had to be determined by analysing its structure. The process was complicated by the fact that a joined event could itself have joins to resolve, creating hierarchical chains of joins that had to be constructed and resolved in tandem. Events could also relate to other events within their own stream.
The streams were unordered, with a >1% rate of duplicates and no bound on how long an event might arrive before its own dependencies.
Our solution was to create a pipeline that could be deployed per stream and would resolve dependencies according to a configurable hydration ruleset: a definition of how event types relate, such that the contents of one can be added to another for extra context. For each dependency, the pipeline would query a cluster of key/value datastores and hydrate the payload with the result.
If all dependencies were resolved the payload would be added to the key/value store such that it could be referenced as a dependency by other events, and then continue through our pipeline. Otherwise, it would enter a separate pipeline for retrying after an interval.
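The resolve-or-retry step described above can be sketched roughly as follows. This is a minimal illustration rather than our production code: the Event class, the process function and the context field are hypothetical names, a plain dict stands in for the key/value cluster, a list stands in for the retry pipeline, and the hydration ruleset is reduced to a precomputed list of dependency keys.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    key: str
    payload: dict
    # Keys of the events this one depends on. In the real pipeline these
    # would be derived from the hydration ruleset (assumption).
    dependencies: list = field(default_factory=list)


def process(event, store, retry_queue):
    """Hydrate an event from the store, or park it for a later retry."""
    resolved = {}
    for dep_key in event.dependencies:
        dep = store.get(dep_key)
        if dep is None:
            # At least one dependency has not arrived yet: retry later.
            retry_queue.append(event)
            return None
        resolved[dep_key] = dep

    # All dependencies found: add their contents for extra context.
    hydrated = dict(event.payload, context=resolved)
    # Store the result so other events can reference it as a dependency.
    store[event.key] = hydrated
    return hydrated
```

Because the write is a plain overwrite keyed by the event, a duplicate event simply stores the same payload again, which is one way the duplicate rate could be tolerated.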
Choosing a Key/Value Store
Based on stream volumes, we calculated that we were likely to reach up to 500,000 reads/s and up to 10,000 writes/s, and our cached payloads could be substantial (up to roughly 10MB). Our store would also need to be distributed, since our pipelines would be distributed.
We initially chose Memcached for this purpose, as we expected its performance to meet our needs and the team was already familiar with it. However, Memcached came with a caveat: since the data was only stored in memory, we would need to migrate the caches whenever a service or machine restarted.
The number of machines we had access to was too low for us to create permanent redundant pipelines. Therefore, our strategy for migrating caches during planned restarts was to duplicate the caches to new machines. Unplanned restarts would require replaying streams of data in order to rebuild the caches from scratch.
Unfortunately, the version of Memcached we were using didn’t have the lru_crawler command, meaning it was only possible to walk a limited number of keys within a cache for the purposes of copying the data. In order to move all of the data within a store we had to build a custom tool that removed copied entries as it walked, meaning the original cache would be destroyed in the process.
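The copy-and-remove walk can be illustrated with a small sketch. It is not the tool we built: the drain function and its batch_size parameter are hypothetical, dicts stand in for the two caches, and slicing the key list stands in for a key listing that only ever returns a limited number of keys.

```python
def drain(source, destination, batch_size=2):
    """Copy every entry from source to destination, deleting as it goes."""
    moved = 0
    while True:
        # Stand-in for a listing that only exposes a limited batch of keys.
        visible = list(source)[:batch_size]
        if not visible:
            return moved
        for key in visible:
            destination[key] = source[key]
            # Removing the copied entry lets the next listing surface
            # keys that were previously out of reach.
            del source[key]
            moved += 1
```

The loop only terminates once the source is empty, which is exactly why the original cache does not survive the migration.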
Then we were asked to restart our servers. All of them.
To restart all of our servers we had to migrate the caches across machines as we restarted them in stages. The process was manual and required a coordinated effort from multiple engineers over the space of a week or so. Having done this once, we were very eager not to repeat it, and so we searched for a more permanent and passive solution.
Building a Minimal Persisted Memcached Replacement with RocksDB
We were already aware of the promising performance benchmarks behind RocksDB, which appeared to satisfy our requirements whilst also providing disk persistence. However, RocksDB is an embedded library rather than a networked service, and our architecture was heavily dependent on the pattern of distributing caches over network IO. After failing to find a “Memcached with RocksDB backend” we decided to create our own.
This effort led to Quitsies, named after the rule in marbles allowing you to quit the game without penalty. Quitsies had the modest goal of implementing a small subset of the Memcached protocol around a RocksDB backend. This would allow us to drop it into the existing pipeline and benefit from disk persistence whilst leaving the architecture and other components unchanged.
There are a number of feature incompatibilities between RocksDB and Memcached, the main one being key-specific TTLs. Memcached let us specify a TTL for each event, but in practice we only ever used one TTL per event type. RocksDB supports a single TTL across the whole dataset, which was good enough for us. We also had no use for the flags parameter of set commands or for compare-and-swap atomic operations, and opted to ignore them in the Memcached text protocol.
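To make the protocol subset concrete, here is a minimal sketch of parsing the header line of a Memcached set command, which has the form "set <key> <flags> <exptime> <bytes> [noreply]". It is an illustration, not code from the Quitsies source: SetHeader and parse_set_header are hypothetical names, and the choice to validate the flags and exptime fields before discarding them is an assumption.

```python
from typing import NamedTuple


class SetHeader(NamedTuple):
    key: str
    size: int       # length in bytes of the data block that follows
    noreply: bool


def parse_set_header(line: bytes) -> SetHeader:
    """Parse 'set <key> <flags> <exptime> <bytes> [noreply]'."""
    parts = line.decode("ascii").split()
    if len(parts) not in (5, 6) or parts[0] != "set":
        raise ValueError("not a valid set command")
    _, key, flags, exptime, size = parts[:5]
    # flags and exptime must be well-formed numbers, but their values are
    # discarded: the backing store applies one TTL to the whole dataset.
    int(flags)
    int(exptime)
    noreply = len(parts) == 6 and parts[5] == "noreply"
    return SetHeader(key, int(size), noreply)
```

Accepting and discarding the unused fields, rather than rejecting them, is what would let an existing Memcached client keep sending the same commands unchanged.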
Once we deployed Quitsies in our production system, we were surprised to find that the latency hit of the RocksDB backend, compared to holding data purely in memory as Memcached does, was negligible for our use case.
If a minimal persisted Memcached replacement sounds useful to you, feel free to grab Quitsies and try it for yourself. We would love to hear your feedback in the comments below; for specific requests, please file an issue on the Quitsies GitHub repository.