May 11, 2023

Promoting replica shards to primary in Elasticsearch, and how it saves us $12k during rolling restarts

At Meltwater, Elasticsearch is at the heart of our product, so we’re constantly looking for ways to improve how we use it and make it more performant. During a recent routine rolling restart, we noticed that the first backup taken after the restart took up to 7 hours instead of the normal 30 minutes. We also noticed that our snapshot storage suddenly grew by about 500TB. Elasticsearch snapshots are incremental and only upload newly indexed data, so both observations were unexpected: the restart should not have caused any sudden change in the data. We took a closer look and were able to figure out what the problem was.

March 16, 2023

How to Communicate Effectively in a Software Development Setting

Effective communication is a crucial aspect of success in business and software development settings. It can help build better relationships with colleagues, increase productivity, and achieve better outcomes. In this blog post, we’ll discuss some key strategies to help you communicate effectively in business settings.

January 20, 2023

How we upgraded an old, 3PB large, Elasticsearch cluster without downtime. Part 7 - Final Architecture & Learnings

This is the seventh and final part of our blog post series on how we upgraded our Elasticsearch cluster without downtime and with minimal user impact. In this post, we focus on several of the benefits we have seen since the upgrade and provide more details on what our architecture looks like today.

December 16, 2022

How we upgraded an old, 3PB large, Elasticsearch cluster without downtime. Part 6 - Testing & Rollout strategy

Welcome to the sixth part of our adventure in upgrading our Elasticsearch cluster. So far, we have explained how we structured our work, how we improved our system to make this migration possible, how we took advantage of the opportunity to make otherwise hard changes, and how we made sure the system kept performing well under load. All of these changes were the result of hard work and planning, but in the end, we all knew that one day we would be faced with the ultimate question: when can we flip the switch and start using our new and shiny cluster? We don’t think anybody would want to be the single person who snaps their fingers and makes the decision to switch. We didn’t want that either, so we decided to let the data guide us.

December 09, 2022

How we upgraded an old, 3PB large, Elasticsearch cluster without downtime. Part 5 - Running two Elasticsearch clients in the same JVM

This is part 5 in our series on how we upgraded our Elasticsearch cluster without any downtime and with minimal user impact.

Due to the large scope of this upgrade, it was clear from the beginning that this project was going to last for at least one year, if not more. This blog post describes how we reasoned about our development process and how we managed to support multiple Elasticsearch client libraries in our Java code bases for a long time.
