The NLP Search System of Meltwater Press

Meltwater Press

At the end of 2012 I joined the Engineering team of Meltwater Press (mPress) to start working on my master’s thesis on the mPress NLP search system. The title of the thesis is “Qualitative analysis for detecting shifts in system requirements of search systems”, and I am writing it under the supervision of the DAI-Labor of the Technical University of Berlin.

mPress is a web-based media contact database and distribution tool that enables users to identify the most appropriate journalists — based on their current coverage as well as traditional beats — to then build relevant media lists and efficiently distribute media releases, advisories and pitches.

So what do we mean by “shifts in system requirements”? Like any software product, mPress has gone through a number of iterations since its launch. The product has been rolled out in many different markets, one after the other, and each rollout has affected the requirements (e.g. in terms of supported languages). The number of customers and users has also grown noticeably with these expansions.

For example, let’s say that in 2010 we had an mPress system ‘A’, and now in 2012 we have an mPress system ‘B’. In my thesis, I am performing a ‘backwards’ analysis of the search system in order to identify the shift in requirements between ‘A’ and ‘B’. In other words, I test whether the requirements of ‘A’ still fit ‘B’, and if not, which improvements and optimizations should be applied to ‘B’ to improve the overall system performance.

User Lifetime Activity

For this purpose, I analyzed the search logs of the last two years. From the logs I collected information that is crucial to understanding the users’ behavior, their interaction with the system, and their expectations of it. For example, we analyzed the activity of users over their lifetime, as you can see in the chart above. Based on these analyses, we are then able to identify the system’s limitations and improve its design and, accordingly, the quality of the service delivered.
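To make the idea of measuring user activity from search logs more concrete, here is a minimal sketch in Python. It is only an illustration, not the actual mPress analysis: the record layout (user id, ISO timestamp, query), the sample data, and the function name `lifetime_activity` are all assumptions made for this example, since the real log format is not described here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records: (user_id, ISO timestamp, query).
# The layout and values are invented for illustration only.
SAMPLE_LOG = [
    ("u1", "2011-01-10T09:00:00", "technology journalists berlin"),
    ("u1", "2011-03-02T14:30:00", "energy reporters"),
    ("u2", "2012-05-20T11:15:00", "health editors"),
    ("u1", "2012-01-10T09:30:00", "startup coverage"),
]


def lifetime_activity(records):
    """Return {user_id: (search_count, lifetime_days)}.

    lifetime_days is the number of whole days between a user's
    first and last logged search.
    """
    stamps = defaultdict(list)
    for user_id, timestamp, _query in records:
        stamps[user_id].append(datetime.fromisoformat(timestamp))

    result = {}
    for user_id, times in stamps.items():
        lifetime_days = (max(times) - min(times)).days
        result[user_id] = (len(times), lifetime_days)
    return result


activity = lifetime_activity(SAMPLE_LOG)
for user_id, (count, days) in sorted(activity.items()):
    print(f"{user_id}: {count} searches over {days} days")
```

Grouping by user first keeps the sketch independent of the order of the log lines, which matters when logs from several servers are merged.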

If you want to learn more about Search Logs Analysis, I recommend these papers:

Search log analysis (SLA) is a helpful method for analyzing system requirements and accuracy. However, most previous SLA studies cover either general-purpose search engines (Google, Yahoo, Bing, etc.) or site-specific search engines, or compare the two. Only a few studies discuss niche search engines that have a specific purpose and a business-oriented focus, such as the mPress NLP search system. This makes my research even more challenging and interesting!