Enriching 450M Docs Daily With a Boring Stream Processor

For our fairhair.ai platform we enrich over 450 million documents per day, such as news articles and social posts, running each one through a dependency tree of more than 20 syntactic and semantic NLP enrichment tasks. We ingest these documents as a continuous stream and guarantee delivery of enriched documents within 5 minutes of ingestion.

This technical feat required tight collaboration between two specialised teams: data science and platform engineering. Enabling both teams to work together efficiently around a common workflow execution engine was another problem we needed to solve. Hopefully that description piqued your interest, because our solution (Benthos) is totally boring.
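
Benthos pipelines are wired together declaratively in configuration rather than in application code, so the snippet below is purely a conceptual sketch: a toy Go program (not Benthos's API, and with made-up task names) showing the core idea of running a document through a small dependency tree of enrichment tasks, each step firing only once the steps it depends on have finished.

```go
// Hypothetical sketch only: a toy dependency tree of enrichment tasks executed
// in dependency order. The task names, Document type, and scheduling loop are
// illustrative assumptions, not Benthos's API or our production task graph.
package main

import "fmt"

// Document carries the raw text plus enrichments added by each task.
type Document struct {
	Text        string
	Enrichments map[string]string
}

// task is one enrichment step; deps names the tasks that must complete first.
type task struct {
	name string
	deps []string
	run  func(*Document)
}

func main() {
	tasks := []task{
		{name: "language", run: func(d *Document) { d.Enrichments["language"] = "en" }},
		{name: "tokenize", deps: []string{"language"}, run: func(d *Document) { d.Enrichments["tokens"] = "4" }},
		{name: "entities", deps: []string{"tokenize"}, run: func(d *Document) { d.Enrichments["entities"] = "ACME Corp" }},
		{name: "sentiment", deps: []string{"tokenize"}, run: func(d *Document) { d.Enrichments["sentiment"] = "positive" }},
	}

	doc := &Document{Text: "ACME Corp beats expectations", Enrichments: map[string]string{}}

	// Run every task whose dependencies are satisfied, looping until all are
	// done. A real engine would parallelise independent branches and enforce
	// the delivery deadline; this loop assumes the graph has no cycles.
	done := map[string]bool{}
	for len(done) < len(tasks) {
		for _, t := range tasks {
			if done[t.name] {
				continue
			}
			ready := true
			for _, dep := range t.deps {
				if !done[dep] {
					ready = false
					break
				}
			}
			if ready {
				t.run(doc)
				done[t.name] = true
			}
		}
	}

	fmt.Println(doc.Enrichments)
}
```

The real graph has more than 20 such tasks and must meet the five-minute delivery guarantee described above.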

Deep Learning Models for Sentiment Analysis

Meltwater has been providing sentiment analysis powered by machine learning for more than 10 years. In 2009 we deployed our first models for English and German. Today, we support in-house models for 16 languages.

In this blog post we discuss how we use deep learning and feedback loops to deliver sentiment analysis at scale to more than 30,000 customers.

Micro Pipelines: Analyzing Big Data with Tiny Apps

One of our teams at Meltwater was recently faced with a problem that required applying relatively simple tasks to a large volume of data. To solve it we experimented with a pattern we call micro-pipelines: a sequence of microservices that work together to form an efficient, fault-tolerant system.

This post provides an example of how we designed and built such a micro-pipeline.
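
Before you read on, here is a minimal, hypothetical sketch of the pattern in Go, assuming stages connected by in-process channels rather than whatever transport the real system uses: each stage is a tiny, single-purpose unit, and the pipeline is just the stages chained together.

```go
// Hypothetical sketch only: a micro-pipeline modelled as tiny stages chained
// by channels. Stage names and transformations are illustrative; a production
// micro-pipeline would connect separate services via a queue or message bus.
package main

import (
	"fmt"
	"strings"
	"sync"
)

// stage runs one simple transformation over every message it receives and
// forwards the result downstream, closing its output when the input ends.
func stage(in <-chan string, fn func(string) string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for msg := range in {
			out <- fn(msg)
		}
	}()
	return out
}

func main() {
	source := make(chan string)

	// Chain three tiny stages: normalise, tag, and format for output.
	normalised := stage(source, strings.ToLower)
	tagged := stage(normalised, func(s string) string { return "doc:" + s })
	formatted := stage(tagged, func(s string) string { return fmt.Sprintf("[enriched] %s", s) })

	// Sink: consume the end of the pipeline.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for msg := range formatted {
			fmt.Println(msg)
		}
	}()

	// Feed a couple of documents through and shut down cleanly.
	for _, doc := range []string{"First Article", "Second Article"} {
		source <- doc
	}
	close(source)
	wg.Wait()
}
```

Each stage knows nothing about the others beyond its input and output, which is what makes the individual pieces easy to test, scale, and replace.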