Speaker "David Talby" Details

 

Topic

Applied machine learning & deep learning for text mining

Abstract

Natural language processing is a key component in many data science systems that must understand or reason about text. This talk introduces the NLP library for Apache Spark, which natively extends Spark ML to provide open source, fully distributed & optimized versions of state-of-the-art NLP algorithms.
State-of-the-art NLP relies heavily on machine learning and, recently, deep learning algorithms. We'll present recent academic research results and the challenges of implementing them as part of the Spark NLP library. We'll walk through the libraries and APIs chosen for model training and inference, as well as how to efficiently distribute & cache large trained word embedding & NER models across a distributed Spark cluster.

Profile

David Talby is a chief technology officer at John Snow Labs, helping healthcare & life science companies put AI to good use. David is the creator of Spark NLP – the world’s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK. David holds a PhD in computer science and master’s degrees in both computer science and business administration.