Speaker Details: Joy Chakraborty

 

Topic

To Secure and Scale-Out Data Science Notebook for Spark using Docker and Kubernetes

Abstract

This presentation provides technical design and development insights for setting up a Kerberized (secured) JupyterHub notebook for Spark running in a YARN cluster. Joy will show how Bloomberg set up the Kerberos-based notebook for the Data Science community using Docker and Kubernetes by integrating JupyterHub, Sparkmagic, and Livy. Sparkmagic provides the Spark kernels for R, Scala and Python. Livy is one of the most promising open source projects for submitting Spark jobs over an HTTP-based REST interface. The presentation will highlight the capabilities of JupyterHub, Sparkmagic and Livy, along with the gaps and the development required to make the notebook work with a Kerberized HDFS/YARN cluster running Hive, Spark and other services. It will also cover how Docker and Kubernetes support the scale-out design and minimize the complex integration challenges involving networking and isolation, which is essential for such a project. No prior knowledge of any of these technologies is required to understand this presentation.

Profile

Joy is a Distributed System Architect with 17+ years of application software development experience and 10+ years of experience designing, architecting and developing distributed systems. Joy has a special interest in distributed and parallel computing, currently works on Cloud and Big Data technologies, and actively participates in various software architecture organizations. Joy has worked in Bloomberg’s Data Platform team as a Data Engineer since 2014, with responsibility for storing and processing petabytes of data reliably, predictably and securely for Data Science activities such as Machine Learning based prediction.