Speaker: Martin Lurie



Monitoring adverse drug reactions with Spark Streaming, Kafka, and Impala


This session will outline how to simulate adverse drug reaction reporting. We'll take the FDA Adverse Drug Reaction data and send it to an Apache Kafka message broker with a Python client, simulating real-time streaming from medical centers around the world. To monitor the events as they occur, we'll use Spark Streaming to subscribe to the Kafka topic and compare current values against a window of prior events.
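The comparison of current values against a window of prior events can be sketched in plain Python. This is only an illustration of the idea, not the session's Spark Streaming code: the function name flag_spikes, the batch layout (one list of drug names per micro-batch), the drug names, and the thresholds window_size, ratio, and min_count are all assumptions made for the example.

```python
from collections import Counter, deque


def flag_spikes(batches, window_size=3, ratio=2.0, min_count=5):
    """Return (batch_index, drug, current_count, baseline_mean) for each spike.

    Each batch is a list of drug names, one entry per reported reaction.
    A drug is flagged when its count in the current batch exceeds `ratio`
    times its mean count over the previous `window_size` batches.
    All names and thresholds here are hypothetical.
    """
    window = deque(maxlen=window_size)  # per-drug counts of prior batches
    alerts = []
    for index, batch in enumerate(batches):
        current = Counter(batch)
        # Only compare once the window of prior batches is full.
        if len(window) == window_size:
            for drug, count in current.items():
                # Counter returns 0 for drugs absent from a prior batch.
                baseline = sum(prior[drug] for prior in window) / window_size
                # Floor the baseline at 1.0 so rare drugs need a real jump.
                if count >= min_count and count > ratio * max(baseline, 1.0):
                    alerts.append((index, drug, count, baseline))
        window.append(current)  # current batch becomes part of the history
    return alerts


if __name__ == "__main__":
    quiet = ["aspirin", "aspirin", "drugX"]
    spike = ["aspirin", "aspirin"] + ["drugX"] * 6
    for alert in flag_spikes([quiet, quiet, quiet, spike]):
        print(alert)
```

In a streaming job the same comparison would typically run over windowed counts rather than an in-memory deque; the sketch only shows the logic of checking a current batch against a baseline built from earlier ones.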

Support for multiple subscribers is a key benefit of a Kafka topic. Flafka (the Flume and Kafka integration) will subscribe to the same topic as Spark Streaming and persist the messages for SQL analysis. For summary reporting of events, we'll fire up Impala and use SQL. To make things look nice, we'll connect Microsoft Excel to Impala to get a histogram of the data. For investigating specific patients, we'll simulate multiple case workers running queries with the JMeter test harness. Since medical data carries auditing and security requirements, we'll illustrate some capabilities of Cloudera Navigator and Cloudera Manager.


Marty Lurie started his computer career generating chads while attempting to write Fortran on an IBM 1130. His day job is Hadoop Systems Engineering at Cloudera, but if pressed he will admit he mostly plays with computers. His favorite program is the one he wrote to connect his Nordic Track to his laptop (the laptop lost two pounds, and lowered its cholesterol by 20%). Marty is a Cloudera Certified Hadoop Administrator, Cloudera Certified Hadoop Developer, an IBM-certified Advanced WebSphere Administrator, Informix-certified Professional, Certified DB2 DBA, Certified Business Intelligence Solutions Professional, Linux+ Certified, and has trained his dog to play basketball. You can contact Marty at