A Data Science Workflow for Developers

Our team was challenged with a project that involved performing actions based on plain-text requests. Because we had little experience in Data Science, Machine Learning (ML) and Natural Language Processing (NLP), our initial approach amounted to nothing more than “AI based on if-else statements”.

To improve our approach, we invited our Data Science team from London to visit us in Berlin. We learned about the services available to us and about best practices, but the biggest takeaway was a methodical workflow, which has become our go-to approach for tackling any data problem.

Read on to learn our Data Science workflow, with a practical example, and see how it can help you in your projects.

Locality-sensitive Hashing in Elixir

My team and I have built a solution that mines a stream of online articles for real-time insights for our customers. Its logic could be dramatically simplified if we could assume that it never receives near-duplicates of articles. While deduplication of identical documents is simple, detection of near-duplicates (i.e. “same thing, just slightly different”) is a complex but well-researched problem space.
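
To make the idea of near-duplicate detection concrete, here is a minimal Python sketch of SimHash, one well-known locality-sensitive hashing scheme. It is an illustration only and not ExLSH's API: the function names, the three-word shingle size and the 64-bit fingerprint width are all assumptions chosen for this example.

```python
import hashlib
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {" ".join(words)}
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def simhash(text, bits=64):
    """Compute a SimHash fingerprint of the text as an integer of `bits` bits."""
    counts = [0] * bits
    for shingle in shingles(text):
        # Hash each shingle to a fixed-width integer (MD5 is used here only
        # as a convenient, stable hash function, not for security).
        digest = hashlib.md5(shingle.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        # Every shingle votes +1 or -1 on each bit position.
        for i in range(bits):
            counts[i] += 1 if (value >> i) & 1 else -1
    # A fingerprint bit is set where the positive votes won.
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Texts that share most of their shingles tend to produce fingerprints with a small Hamming distance, so a pipeline could treat two articles as near-duplicates when `hamming_distance(simhash(a), simhash(b))` falls below a threshold tuned on real data.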

To solve our problem, we ended up building a locality-sensitive hashing library for Elixir. Read on to find out why and how we built and open-sourced ExLSH.

Monitoring your System’s Heartbeat using CloudWatch

Have you implemented a system that is supposed to perform tasks at regular intervals? Does the repeated failure of such a system pose a threat to your quality of service? If so, you will want to be alerted if your system suddenly stops performing these tasks.

We on Meltwater’s Premium Content team had this exact requirement. In this post we share how we used CloudWatch to monitor the heartbeat of our system.

JUGRI: The JUpyter - GRemlin Interface

Jupyter is a popular web-based notebook environment, commonly used with Python to visualize and manipulate data. It can display query results from many databases via the Pandas library, but the popular Gremlin graph query language hasn’t been supported.

To solve this problem, we created and open-sourced JUGRI, which shows your Gremlin query results in the Jupyter Notebook. If you are a Data Scientist using Python and want to visualize your Gremlin graph queries in Jupyter, JUGRI can be a handy addition to your toolbox.

Risk-free Deployments with Immutable Web Apps

Today we are excited to share our Immutable Web Applications methodology with you. Immutable Web Applications is a framework-agnostic methodology for building and deploying static, single-page applications that minimizes the complexity of live releases and enables continuous delivery through simple, flexible, atomic deployments.

If you care about building web applications and want to make deployments easier and less risky, then this blog post is for you.