
Speaker "Aedin Culhane" Details

 

Topic

ENTER THE MATRIX: UNSUPERVISED FEATURE LEARNING WITH MATRIX DECOMPOSITION TO DISCOVER HIDDEN KNOWLEDGE IN HIGH DIMENSIONAL DATA

Abstract

Supervised learning is among the most powerful tools in data science, but it generally requires a training dataset in which the class of each sample is known a priori. Unsupervised learning is applied when data are unlabeled, when the classes are unknown, or when one seeks to discover new groups or features that best characterize the data. I will provide an overview of unsupervised learning algorithms, including dimension reduction and matrix factorization approaches that learn low-dimensional mathematical representations of high-dimensional data. I will describe, and do my best to demystify, matrix factorization and related dimension reduction approaches, including principal component analysis, correspondence analysis, non-negative matrix factorization, t-SNE, and methods for simultaneously learning the structures of multiple data sets.
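As a rough illustration of what "matrix factorization" means here (this is not code from the talk or from the speaker's Bioconductor packages), the sketch below factors a data matrix X (samples × features) in two ways with NumPy: principal component analysis via the singular value decomposition of the column-centred matrix, and non-negative matrix factorization using Lee-Seung multiplicative updates for the squared Frobenius loss. Both approximate X by a product of two much smaller matrices. The function names `pca` and `nmf`, the toy rank-2 data, and the iteration count are illustrative assumptions.

```python
import numpy as np

def pca(X, k):
    """PCA via SVD of the column-centred data matrix.
    Returns (scores, loadings, explained_variance_ratio) for k components."""
    Xc = X - X.mean(axis=0)                        # centre each feature (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                      # samples in the k-dim space
    loadings = Vt[:k].T                            # feature weights per component
    var = s ** 2
    return scores, loadings, var[:k] / var.sum()

def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """NMF: X ~= W @ H with W, H >= 0, fitted by Lee-Seung multiplicative
    updates that decrease the squared Frobenius error ||X - WH||^2."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))                         # non-negative random start
    H = rng.random((k, m))
    for _ in range(n_iter):
        # Each update multiplies by a ratio of non-negative terms,
        # so W and H stay non-negative; eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: 50 samples x 20 features with hidden non-negative rank-2 structure.
rng = np.random.default_rng(1)
X = rng.random((50, 2)) @ rng.random((2, 20))

scores, loadings, ratio = pca(X, 2)
W, H = nmf(X, 2)
print(scores.shape, loadings.shape)                # shapes: (50, 2) (20, 2)
print("PCA variance explained:", ratio.sum())
print("NMF relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```

PCA gives orthogonal components that may mix positive and negative weights, whereas NMF constrains both factors to be non-negative, which often makes the learned features easier to read as additive "parts" of the data (for example, sets of co-expressed genes).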

Profile

Aedin Culhane, PhD, develops computational approaches to integrate and analyze large-scale cancer genomics data in Biostatistics and Computational Biology at the Dana-Farber Cancer Institute and the Harvard T.H. Chan School of Public Health. She is an R developer and maintains several Bioconductor/R packages for clustering, matrix factorization, and integrative exploratory analysis of big data in genomics. She is a member of the Bioconductor technical advisory board, a founding member of the Boston R/Bioconductor for genomics meetup, and an advocate for reproducibility in academic research.