Speaker "Brian Hess" Details
-
Name
Brian Hess
-
Company
DataStax
-
Designation
Product Management
Topic
Big Data Analytics with Cassandra and Spark
Abstract
Apache Cassandra is the leading distributed database, in use at thousands of sites with the world’s most demanding scalability and availability requirements. Cassandra’s bread and butter is serving millions of concurrent transactions (reads, writes, updates) while providing zero downtime and linear scalability. The world is not only transactional, however: the data captured and served by this online transactional system also needs to be analyzed. Apache Spark is a distributed data analytics framework that has gained a lot of traction for processing large amounts of data in an efficient and user-friendly manner. It comes with a suite of tools covering bulk analytics, SQL support, machine learning, graph analytics, and streaming. All it needs is data to process. Together, Spark and Cassandra pair real-time data collection with analytics of that data for deep insight. After a brief overview of Cassandra and Spark, this class will cover various aspects of their integration.