Speaker "Doohee You" Details
-
Name
Doohee You
-
Company
Hulu
-
Designation
Senior Research And Software Engineer
Topic
Silhouettes; A cheap computational graphical approach to the validation of big data classification.
Abstract
Demand for computationally cheap validation of unsupervised classification is increasing because of the fast-growing need for data-driven business decisions. The most commonly accepted graphical validation method is the silhouette coefficient, which is expensive to compute because it requires distances between every pair of observations. Hence, this paper suggests a computationally cheap and innovative application of a cluster validation method. Method: The mean and SD of each cluster (N=3,777,481, k=5) obtained from an unsupervised classification method are plotted on a one-dimensional graph for each variable, using layered histograms, to identify overlap between clusters, which indicates poor classification. Overlap of histograms is equivalent to a negative silhouette coefficient. Results: Graphical validation of cluster classification performance took less than a quarter (µ=18.03%, SD=0.02) of the time required when using a silhouette coefficient plot. This method provides a graphical aid for understanding the distribution of the mean and SD, showing the overlap between clusters so that classification validity can be assessed using a fraction of the computational resources. Conclusions: A mean- and SD-driven graph of cluster value distributions allows significantly faster classification validation than the silhouette coefficient-driven method. Impact: Reducing computation time and power will allow faster unsupervised classification validation, supporting prompt decision making in fast-moving, data-driven businesses.
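The abstract does not give an implementation, so the following is only a minimal sketch of the idea under stated assumptions: per-cluster mean and SD are computed for one variable, and the overlap the talk inspects visually is approximated here by a histogram intersection on shared bins. The function names (`cluster_summary`, `histogram_overlap`), the bin count, and the toy data are hypothetical and are not taken from the talk.

```python
from collections import defaultdict
from statistics import mean, stdev


def cluster_summary(values, labels):
    """Return {label: (mean, sd)} for one variable, grouping values by cluster label."""
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    return {lab: (mean(vs), stdev(vs)) for lab, vs in groups.items()}


def histogram_overlap(a, b, bins=20):
    """Intersection of two normalized histograms on shared bins.

    0 means the histograms are disjoint, 1 means they are identical.
    Assumption: this stands in for the visual overlap check in the abstract.
    """
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal

    def normalized_counts(xs):
        counts = [0] * bins
        for x in xs:
            idx = min(int((x - lo) / width), bins - 1)  # clamp the maximum into the last bin
            counts[idx] += 1
        return [c / len(xs) for c in counts]

    ha, hb = normalized_counts(a), normalized_counts(b)
    return sum(min(p, q) for p, q in zip(ha, hb))


# Toy data (made up for illustration): one variable, two well-separated clusters.
values = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1, 1.3, 4.8]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]

summary = cluster_summary(values, labels)
cluster_0 = [v for v, lab in zip(values, labels) if lab == 0]
cluster_1 = [v for v, lab in zip(values, labels) if lab == 1]
overlap = histogram_overlap(cluster_0, cluster_1)
```

Both functions make a single pass over the data per variable, which is where the cost advantage over a pairwise-distance computation would come from. To reproduce the graphical step, the same per-cluster values could be drawn as layered semi-transparent histograms with any plotting library.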