Meltwater Blog

Inside Meltwater Engineering.

We build the platforms that help comms teams see around corners. Expect practical engineering lessons, data pipelines at scale, and product thinking from the people behind Meltwater.

Saving the Planet, one Brotdose at a time

berlin

We Germans are known for our love of renewable energy, energy-saving lamps, and recycling. I am allowed to say that, as I am one of “them” :) Our office in Berlin had another idea to reduce waste: lunch boxes (“Brotdose” being one possible translation of that). Most of us like to...

Employee Surveys: Our Journey, Approach and Learnings

eNPS employee engagement survey

At Meltwater, we have been running a quarterly employee survey with 350+ people in the Product & Engineering group for three years. This post explains our journey, our approach and what we have learnt. Read on if you are wondering how an employee survey can help you understand your organization better and...

A Data Science Workflow for Developers

Data Science AutoML Machine Learning NLP

Our team was challenged with a project that involved performing actions based on plain-text requests. Having little experience in Data Science, Machine Learning (ML) and Natural Language Processing (NLP), our initial approach amounted to nothing more than “AI based on if-else statements”. To improve our approach, we invited our Data Science team...

Locality-sensitive Hashing in Elixir

elixir erlang profiling locality-sensitive hash simhash deduplication near-duplicate detection LSH

My team and I have built a solution that mines a stream of online articles for real-time insights for our customers. This component’s logic could be dramatically simplified if we could assume that it never receives near-duplicates of articles. While deduplication of identical documents is simple, detection of near-duplicates (i.e. “same thing,...

Monitoring your System’s Heartbeat using Cloudwatch

AWS Cloudwatch logging heartbeat monitoring

Have you implemented a system that is supposed to perform tasks at regular intervals? Does the repeated failure of such a system pose a threat to your quality of service? If so, I am sure you would want to be alerted if your system suddenly stops performing these tasks. We at Meltwater’s...

JUGRI: The JUpyter - GRemlin Interface

gremlin jupyter knowledge graph python data science

Jupyter is a popular web framework used with Python to easily visualize and manipulate data. It can display the results of many databases using the Pandas library, but the popular Gremlin graph query language hasn’t been supported. To solve this problem we created and open-sourced JUGRI to show your Gremlin query results...

Risk-free Deployments with Immutable Web Apps

open source web apps immutablewebapps

Today we are excited to share our Immutable Web Applications methodology with you. Immutable Web Applications is a framework-agnostic methodology for building and deploying static, single-page applications that minimizes the complexity of live releases and enables continuous delivery through simple, flexible, atomic deployments. If you care about building web applications, and want...

Hosting the Elixir Berlin Meetup

elixir berlin

In Meltwater’s Berlin office, we are enthusiastic users and advocates of Elixir and Ruby. Hence we were excited to get the chance to host the Elixir Berlin meetup for the first time this November. It was already the 53rd edition of Elixir Berlin. What a great streak! Besides hosting the event...

Optimal Shard Placement in a Petabyte Scale Elasticsearch Cluster

elasticsearch linear optimization load balancing fairhair.ai

At the heart of Meltwater’s and Fairhair.ai’s information retrieval systems lies a collection of Elasticsearch clusters containing billions of social media posts and editorial articles. The index shards in our clusters vary greatly in their access patterns, workload and size, which presents some very interesting challenges. This blog post describes how we...