Meltwater processes many millions of posts per day, so we need a search and storage technology that can handle that volume.
We have been pretty happy users of Elasticsearch since the 0.11.X days. While we have been through some ups and downs, in the end we think our choice of technology was the right one.
Elasticsearch backs our main media-monitoring application, where customers can search and analyze media data such as news articles, (public) Facebook posts, Instagram posts, blogs and Tweets. We gather this content using a mix of APIs and crawling, enrich it and make it searchable using Elasticsearch.
In this post, we share what we’ve learned, how you can tweak Elasticsearch to improve its performance, and which pitfalls to avoid.