Speaker "Connor Carreras" Details

Topic

Best Practices for Effective Large-Scale Data Wrangling

Abstract

In the past few years, "data wrangling" has become a buzzword in data science and analytics. You're probably familiar with the statistic that up to 80% of the time in an analytics project is spent preparing, cleaning, or munging data--in fact, you may have experienced this challenge yourself. For a single user, reducing the time spent wrangling data can be as simple as adopting new tools or profiling the data more deeply. But how can you efficiently scale data wrangling to an entire department or enterprise? An effective large-scale data wrangling practice can save significant time by allowing analysts and data scientists to collaborate and reuse each other's transformation logic. This session will explore the role of data wrangling in a data science and analytics pipeline, and explain how you can make data wrangling a repeatable, reusable part of that pipeline. You will hear examples of how some of the largest organizations are using enterprise-scale data wrangling to accelerate analysis and uncover new sources of business value. This session will also discuss the ideal workflow for data wrangling and the user profiles involved in doing it effectively.

Profile

Connor Carreras is a Senior Customer Success Manager at Trifacta, where she uses cutting-edge data wrangling techniques in support of customers’ big data initiatives. Connor brings her prior experience in the data integration space to help customers understand how to adopt self-service data preparation as part of an analytics process. She is also a co-author of the upcoming O'Reilly book, Fundamentals of Data Wrangling.