Speaker "Jules Damji" Details

 

Topic

Workshop: Jumpstart with Apache Spark 2.x on Databricks

Abstract

Apache Spark 2.0 and the subsequent 2.1 and 2.2 releases have laid the foundation for many new features and functionality. The three main themes of Spark 2.x (easier, faster, and smarter) are pervasive in its unified and simplified high-level APIs for structured data. In this introductory workshop, part lecture and part hands-on, you’ll learn how to apply some of these new APIs using Databricks Community Edition.

Agenda:

- Overview of Spark Fundamentals & Architecture
- What’s new in Spark 2.x
- Unified APIs: SparkSessions, SQL, DataFrames, Datasets
- Introduction to DataFrames, Datasets and Spark SQL
- Introduction to Structured Streaming Concepts
- Four Hands-On Labs

You will use Databricks Community Edition, which gives you unlimited free access to a ~6 GB Spark 2.x local mode cluster. In the process, you will learn how to create a cluster, navigate in Databricks, explore a couple of datasets, perform transformations and ETL, save your data as tables and Parquet files, read from these sources, and analyze datasets using the DataFrames/Datasets API and Spark SQL.

Level: Beginner to intermediate; not for advanced Spark users.

Prerequisite: You will need a laptop with at least 8 GB of memory and the Chrome or Firefox browser installed. Introductory or basic knowledge of Scala or Python is required. The notebooks will be in Scala; Python is optional.

Profile

Jules S. Damji is an Apache Spark Community Evangelist and Developer Advocate at Databricks. He is a hands-on developer with over 15 years of experience and has worked at leading companies building large-scale distributed systems. He holds a B.Sc. and an M.Sc. in Computer Science and an M.A. in Political Advocacy and Communication from Oregon State University, Cal State, and Johns Hopkins University, respectively.