Speaker: David Talby



Applied machine learning & deep learning for text mining


Natural language processing is a key component in many data science systems that must understand or reason about text. This talk introduces the NLP library for Apache Spark, which natively extends Spark ML to provide open source, fully distributed and optimized versions of state-of-the-art NLP algorithms.
State-of-the-art NLP relies heavily on machine learning and, more recently, deep learning algorithms. We'll present recent academic research results and the challenges faced in implementing them as part of the Spark NLP library. We'll walk through the libraries and APIs chosen for model training and inference, as well as how to efficiently distribute and cache large trained word embedding and NER models across a Spark cluster.


David Talby is a chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, Agile, distributed teams. Previously, he was with Microsoft’s Bing Group, where he led business operations for Bing Shopping in the US and Europe, and worked at Amazon both in Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a PhD in computer science and master’s degrees in both computer science and business administration.