My team and I have built a solution that mines a stream of online articles for real-time insights for our customers. This component’s logic could be dramatically simplified if we could assume that it never receives near-duplicates of articles. While deduplication of identical documents is simple, detection of near-duplicates (i.e. “same thing,...
Featured story
A Data Science Workflow for Developers
Our team was challenged with a project that involved performing actions based on plain-text requests. Having little experience in Data Science, Machine Learning (ML) and Natural Language Processing (NLP), our initial approach amounted to nothing more than “AI based on if-else statements”. To improve our approach, we invited our Data Science team from London to visit us in Berlin. We...