Building Genomic Data Processing and Machine Learning Workflows Using Apache Spark


At Epinomics, we are advancing epigenetic research to drive personalized medicine through epigenomic data analysis. Our goal is to provide an analysis resource to the community that promotes high-quality, replicable, and interpretable results. We work with academic and commercial users to bring their genomic sequencing data and metadata into our system. From the sequenced genome we extract epigenetic features known as "chromatin accessibility", which is indicative of the epigenetic changes instrumental in differential gene expression and disease development. We have a Spark-based pipeline that retrieves chromatin accessibility data from the epigenome, finds overlapping accessibility using GraphX, clusters the data, and runs machine learning algorithms on it. In this talk we will provide a primer on epigenomics, describe how we built a Spark-based data pipeline focused on parallel bioinformatic analysis, and show how we use machine learning models to learn the insights needed to build an epigenomic landscape that can help accelerate the field of personalized immunotherapy.


Anupama is Director for AI Engineering at Target. She leads multiple teams of data scientists and engineers who deliver impactful digital experiences for Target's customers. Her teams power guest personalization for offers and products, personalized search, and the platform that serves millions of recommendations.
Before Target, she was responsible for taking search at Reddit, the 6th most visited site in the USA, to the next level. Prior to Reddit, she spent 3 years leading the engineering efforts for a genomic data startup, Epinomics, which was acquired by 10x Genomics.