Speaker

David Talby

Topic

NATURAL LANGUAGE UNDERSTANDING AT SCALE ON APACHE SPARK

Abstract

Natural language processing is a key component in many data science systems that must understand or reason about text. Common use cases include question answering, paraphrasing or summarization, sentiment analysis, natural language BI, language modeling, and disambiguation. Building such systems usually requires combining three types of software libraries: NLP annotation frameworks, machine learning frameworks, and deep learning frameworks.

David Talby presents Spark NLP, an open source package for training custom, distributed, machine-learned natural language pipelines on Apache Spark. The library natively extends Spark ML and includes state-of-the-art deep learning models for word embeddings and named entity recognition. The talk walks through the library's goals, design, and APIs, using Jupyter notebooks that will be made publicly available after the talk. It also covers best practices and industry use cases where the library has been applied.

Profile

David Talby is chief technology officer at John Snow Labs, helping healthcare and life science companies put AI to good use. David is the creator of Spark NLP, the world’s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams: in startups, at Microsoft’s Bing in the US and Europe, and scaling Amazon’s financial systems in Seattle and the UK. David holds a PhD in computer science and master’s degrees in both computer science and business administration.