Speaker "Carol McDonald" Details



* Streaming Design Patterns, Revolutionizing Architectures using the Kafka API

* Build a Time Series Application With Apache APIs: Kafka, Spark Streaming, and HBase

* Spark GraphX

* Spark Machine Learning


Streaming Design Patterns, Revolutionizing Architectures using the Kafka API 

Building a robust, responsive, secure data service for healthcare is tricky. For starters, healthcare data lends itself to multiple models:

• Document representation for patient profile view or update

• Graph representation to query relationships between patients, providers, and medications

• Search representation for advanced lookups

Keeping these different systems up to date requires an architecture that can synchronize them in real time as data is updated. Furthermore, meeting audit requirements in healthcare requires the ability to apply granular cross-datacenter replication policies to data and to provide detailed lineage information for each record. This talk will describe how stream-first architectures can solve these challenges, and look at how this has been implemented at a Health Information Network provider.

This talk will cover the Kafka API through these design patterns:

• Turning the database upside down

• Event Sourcing, Command Query Responsibility Segregation (CQRS), Polyglot Persistence

• Kappa Architecture

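To make the event-sourcing pattern concrete, here is a minimal hedged sketch using the standard Kafka producer API from Scala. The topic name, key, and event payload are illustrative assumptions, not details from the talk:

```scala
// Sketch: event sourcing with the Kafka producer API.
// Every state change is appended to a topic as an immutable event;
// downstream consumers build the document, graph, and search views from this log.
// Topic and field names below are hypothetical examples.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object PatientEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Keying by patient id keeps all events for one patient in order on one partition.
    val event = new ProducerRecord[String, String](
      "patient-events", "patient-42", """{"type":"AddressUpdated","city":"Nashville"}""")
    producer.send(event)
    producer.close()
  }
}
```

Because the log, not any one database, is the system of record, each downstream store (document, graph, search) can be rebuilt or replaced by replaying the topic.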


Build a Time Series Application With Apache APIs: Kafka, Spark Streaming, and HBase

More and more applications have to store and process time series data; Internet of Things (IoT) applications are a prime example.

This hands-on tutorial will help you get a jump-start on scaling distributed computing by taking an example time series application and coding through different aspects of working with such a dataset. We will cover building an end-to-end distributed processing pipeline using various distributed stream input sources, Apache Spark, and Apache HBase, to rapidly ingest, process, and store large volumes of high-speed data.

Participants will use Scala to work on exercises intended to teach them the features of Spark Streaming for processing live data streams ingested from sources like Apache Kafka, sockets, or files, and storing the processed data in HBase.
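The ingest side of the pipeline described above can be sketched as follows, assuming the spark-streaming-kafka-0-10 integration; the topic, group id, and app name are illustrative, and the HBase write is left as a placeholder comment:

```scala
// Hedged sketch: Kafka -> Spark Streaming, with the HBase write stubbed out.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object SensorStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("SensorStream"), Seconds(2))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[org.apache.kafka.common.serialization.StringDeserializer],
      "value.deserializer" -> classOf[org.apache.kafka.common.serialization.StringDeserializer],
      "group.id" -> "sensor-group")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("sensor-topic"), kafkaParams))

    // Parse each message, then persist per partition to amortize connection cost.
    stream.map(_.value).foreachRDD { rdd =>
      rdd.foreachPartition { events =>
        // open an HBase connection here and issue a Put per event ...
        events.foreach(println)
      }
    }
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Writing inside `foreachPartition` rather than per record is the usual idiom for sinks like HBase, since it reuses one connection per partition.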

Spark GraphX 
This hands-on lab will help you get started using Apache Spark GraphX with Scala. GraphX is the Apache Spark component for graph-parallel computations, built upon a branch of mathematics called graph theory. It is a distributed graph processing framework that sits on top of the Spark core.
  • Use Apache Spark GraphX to Analyze Flight Data
    • Describe GraphX
    • Define a property graph
    • Perform operations on graphs
      • Lab: Apply graph operations
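A property graph, as covered in the lab outline above, can be sketched in a few lines of GraphX; the airports, routes, and distances below are made-up illustrations, not the lab's actual dataset:

```scala
// Hedged sketch: a GraphX property graph of flights.
// Vertices carry airport codes; edges carry route distances.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

object FlightGraph {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("FlightGraph").setMaster("local[*]"))

    val airports: RDD[(Long, String)] =
      sc.parallelize(Seq((1L, "SFO"), (2L, "ORD"), (3L, "DFW")))
    val routes: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(1L, 2L, 1800), Edge(2L, 3L, 800), Edge(3L, 1L, 1400)))

    val graph = Graph(airports, routes)

    // Basic graph operations: counts, and the longest routes via triplets.
    println(s"Airports: ${graph.numVertices}, routes: ${graph.numEdges}")
    graph.triplets
      .sortBy(_.attr, ascending = false)
      .take(3)
      .foreach(t => println(s"${t.srcAttr} -> ${t.dstAttr}: ${t.attr} miles"))
    sc.stop()
  }
}
```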
Spark Machine Learning
Decision trees are widely used for the machine learning tasks of classification and regression. This hands-on lab will help you get started using Apache Spark's MLlib machine learning decision trees for classification.
  • Use Apache Spark MLlib to Predict Flight Delays
    • Describe Spark MLlib
    • Describe a generic classification workflow
    • Describe common terms for supervised learning
    • Use a decision tree for classification 
    • Lab: Create a DecisionTree model to predict flight delays on streaming data
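Training a decision tree classifier with the RDD-based MLlib API might look like the following sketch; the labels, feature choices (day of week, departure hour, distance), and values are illustrative assumptions, not the lab's dataset:

```scala
// Hedged sketch: training an MLlib DecisionTree to classify flights as delayed or not.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree

object FlightDelayModel {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("FlightDelay").setMaster("local[*]"))

    // Label 1.0 = delayed, 0.0 = on time; features: (dayOfWeek, depHour, distance).
    val data = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(5.0, 18.0, 1800.0)),
      LabeledPoint(0.0, Vectors.dense(2.0, 9.0, 400.0)),
      LabeledPoint(0.0, Vectors.dense(3.0, 11.0, 800.0))))

    val model = DecisionTree.trainClassifier(
      data, numClasses = 2, categoricalFeaturesInfo = Map[Int, Int](),
      impurity = "gini", maxDepth = 4, maxBins = 32)

    // Score a new flight with the trained model.
    val prediction = model.predict(Vectors.dense(5.0, 18.0, 1700.0))
    println(s"Predicted label: $prediction")
    sc.stop()
  }
}
```

In the streaming variant from the lab, the same trained model's `predict` would be applied inside the stream's `map` over incoming flight records.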


Carol McDonald: Carol is an HBase and Hadoop instructor at MapR. She has extensive experience as a software developer and architect, building complex mission-critical applications in the banking, health insurance, and telecom industries. Carol has over 15 years of experience working with Java and Java Enterprise technologies in many roles of the software development life cycle, including design, development, and technology evangelism. As a Java Technology Evangelist at Sun Microsystems, Carol traveled worldwide, speaking at Sun Tech Days, JUGs, companies, and conferences. Previously in her career, Carol was a software developer for Shaw Systems, Hoffman-La Roche, and Digital Equipment Corporation. Carol holds a BS in Geology from Vanderbilt University and an MS in Computer Science from the University of Tennessee-Knoxville.