Benthos is a dull and resilient stream processor that solves mundane streaming tasks. Its development is driven by our desire to defer these tasks to a common tool wherever sensible, allowing us to focus on solving more interesting business problems within our stream pipelines. This post explains why we chose to build it.
At Meltwater we maintain and integrate with a wide variety of data streaming pipelines. These pipelines naturally diverge in the queue systems they use, the format of their documents, the encoding of their payloads, etc. As a result, a disproportionate amount of our development effort was spent building glue any time an architectural change was needed.
After growing frustrated with solving the same problems over and over, and accumulating technical debt in the process, we decided to focus these efforts on a common streaming solution instead. To make this tool useful to others and generally viable it would need to be simple to understand, deploy and maintain. We call this combination “dull”.
What does a dull streaming tool look like?
Benthos was conceived as a tool that could be dropped into an existing stream pipeline, connecting to whichever queue systems are already in use. It would then apply an arbitrary list of dull, stateless processors to the messages passing through. It would need to be high performance, able to scale those processors vertically and to scale horizontally in whatever way the queue systems allow.
The most important feature, however, was that it would need to be as resilient as the protocols being used. This means that when using at-least-once protocols on both the source and the sink there should never be a possibility that a crash, lost connection or disk corruption could cause a lost message. Without this guarantee every deployment would be adding more risk, and risks aren’t dull. This guarantee would be the main distinction between it and similar tools such as Logstash, which rely on an internal buffer.
The mechanism within Benthos for guaranteeing resiliency without a buffer is outlined in this talk: https://youtu.be/NM7X4PIUQB0.
To summarise, it involves binding each message batch that passes through the service to a transaction, allowing an acknowledgement from the target sink to be propagated directly back to the source. Acknowledgements or offsets for a message are therefore never sent or committed until that same message has been successfully sent onwards.
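The idea can be illustrated with a small sketch. This is not Benthos code: the names `Transaction`, `run_source`, `run_sink` and `run_pipeline` are hypothetical, and an in-process queue stands in for real queue systems. It only aims to show the ordering described above, where the source commits nothing until the sink has responded.

```python
import queue
import threading


class Transaction:
    """Pairs a message batch with a one-shot channel for the sink's response."""

    def __init__(self, batch):
        self.batch = batch
        self.response = queue.Queue(maxsize=1)


def run_source(batches, transactions, acked):
    """Sends batches onwards and acks each one only after the sink succeeds."""
    for offset, batch in enumerate(batches):
        tran = Transaction(batch)
        transactions.put(tran)
        error = tran.response.get()  # block until the sink responds
        if error is None:
            acked.append(offset)  # stand-in for committing an offset upstream
        else:
            break  # nothing is acked, so the batch can be redelivered later
    transactions.put(None)  # tell the sink that no more transactions follow


def run_sink(transactions, write):
    """Writes each batch and reports success or failure back to the source."""
    for tran in iter(transactions.get, None):
        try:
            write(tran.batch)
            tran.response.put(None)
        except Exception as err:
            tran.response.put(err)


def run_pipeline(batches, write):
    """Wires a source and a sink together and returns the acked offsets."""
    transactions = queue.Queue(maxsize=1)
    acked = []
    sink = threading.Thread(target=run_sink, args=(transactions, write))
    sink.start()
    run_source(batches, transactions, acked)
    sink.join()
    return acked
```

If `write` raises for the second batch, only offset 0 ends up in the returned list. The failed batch is never acknowledged, so an at-least-once source would deliver it again after a restart.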
What if we don’t need delivery guarantees?
Sometimes we might want to sacrifice some resiliency in favour of higher throughput and lower latencies. In that case Benthos can be hooked up with at-most-once protocols and given a memory buffer, effectively decoupling the sources and sinks and allowing messages to flow freely through the processing pipelines.
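To make the trade-off concrete, here is a minimal sketch of a buffered setup. The names `MemoryBuffer` and `consume` are hypothetical and the behaviour is an assumption for illustration rather than a description of Benthos internals: the source treats a message as handled as soon as it sits in memory, before any sink has seen it.

```python
import collections


class MemoryBuffer:
    """A bounded in-memory FIFO that sits between the source and the sink."""

    def __init__(self, limit):
        self.limit = limit
        self.items = collections.deque()

    def push(self, message):
        if len(self.items) >= self.limit:
            return False  # buffer full: the source has to wait
        self.items.append(message)
        return True

    def pop(self):
        return self.items.popleft() if self.items else None


def consume(messages, buffer):
    """Acknowledges each message as soon as it is buffered, before delivery."""
    acked = []
    for message in messages:
        if not buffer.push(message):
            break
        acked.append(message)
    return acked
```

Calling `consume(["a", "b", "c"], MemoryBuffer(limit=2))` acknowledges `"a"` and `"b"` even though neither has reached a sink. The source never waits on the sink, which is where the throughput comes from, but a crash at that point would lose both messages.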
What counts as a dull processor?
Benthos processors are stateless operations that satisfy the following criteria:
- They solve a common streaming problem
- Their behaviour is simple to express
- Their behaviour is simple to monitor
The full list of processors can be found at https://github.com/Jeffail/benthos/blob/master/docs/processors/README.md. It includes operations such as encoding, compression, archiving, content-based filtering, JSON document manipulation, etc.
The goal is to include processors that offer generally useful behaviour without diving too deep into the domain of problems that ought to have a bespoke solution.
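As a rough illustration of what "stateless" means here, the sketch below models processors as plain functions from a list of messages to a list of messages, applied in order. The function names are hypothetical and do not correspond to the actual Benthos processors or their configuration; they only mimic three of the operation types mentioned above.

```python
import gzip
import json


def filter_by_content(needle):
    """Builds a processor that drops messages not containing `needle`."""
    def process(messages):
        return [m for m in messages if needle in m]
    return process


def set_json_field(key, value):
    """Builds a processor that sets one top-level field on each JSON document."""
    def process(messages):
        out = []
        for m in messages:
            doc = json.loads(m)
            doc[key] = value
            out.append(json.dumps(doc, sort_keys=True).encode("utf-8"))
        return out
    return process


def compress(messages):
    """Gzip-compresses every message payload."""
    return [gzip.compress(m) for m in messages]


def run_processors(processors, messages):
    """Applies each processor in order; no state survives between batches."""
    for process in processors:
        messages = process(messages)
    return messages
```

A pipeline is then just a list, for example `[filter_by_content(b'"error"'), set_json_field("env", "prod"), compress]`. Because each step depends only on the batch it is given, the same list can be run by any number of parallel workers without coordination.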
Final words
Wrapping mundane stream tasks in this way has resulted in wide adoption across our teams. The service is equally viable in pipelines where delivery guarantees are critical and in those where resiliency can be sacrificed in favour of performance. The flexibility of its sources, sinks and processors makes it easy to fit within an existing pipeline, and its boring nature makes it easy to monitor and run in production.
We will continue developing Benthos with a focus on preserving its operational simplicity, opting only to add new features we believe are both common and truly deserving of the label “dull”.