Speaker: Sudha Viswanathan



Walmart Customer Identity Graph - Powered by Spark


Overview: Walmart has multiple subsidiaries, and each of them generates its own unique customer ID. Our goal is to identify our customers across channels and provide a 360-degree view of the customer. For example, when a customer shops in a Walmart store and later logs in online, Walmart must be able to identify that customer as an existing store customer. We need the same capability not only between the stores and the online world but also across channels: an active customer of one channel should not be treated as a new customer when he/she logs in to another.

Every interaction and transaction record at Walmart contains some form of customer identity (such as cookies, email IDs, Walmart IDs, 3P IDs, etc.). When such information is embedded within streaming events, we need a platform to identify and link the identities belonging to the same customer. Hence we built the Customer Identity Graph platform on the Spark processing engine; it uses the union-find algorithm with path compression at the back end.

I would like to present the following:
1. The journey of building the graph platform for Walmart customers, which handles 20+ billion vertices and 30+ billion edges, plus an incremental 200M new linkages every day.
2. Why we chose to build our own graph processing framework using Spark instead of using GraphX or other distributed graph databases.
3. How we handle data quality challenges.
4. Optimization strategies implemented to overcome the scalability and performance challenges faced while building and traversing the graph.
5. How the online servable identity graph enables high throughput with low latencies in real-time streaming.

Level of difficulty: intermediate
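For readers unfamiliar with the union-find algorithm with path compression mentioned in the overview, here is a minimal single-machine sketch in Python. It is not the Spark implementation described in the talk; the class name `UnionFind`, the helper `link_identities`, and the sample identifiers are all hypothetical and chosen only for illustration.

```python
class UnionFind:
    """Disjoint-set forest keyed by arbitrary hashable identifiers."""

    def __init__(self):
        self.parent = {}  # identifier -> parent identifier
        self.size = {}    # root identifier -> size of its set

    def find(self, x):
        """Return the representative (root) of the set containing x."""
        if x not in self.parent:
            # First time we see this identifier: it forms its own set.
            self.parent[x] = x
            self.size[x] = 1
            return x
        # Walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a, b):
        """Merge the sets containing a and b (union by size)."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return root_a
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]
        return root_a


def link_identities(edges):
    """Build a UnionFind from (identifier, identifier) linkage pairs."""
    uf = UnionFind()
    for left, right in edges:
        uf.union(left, right)
    return uf


# Hypothetical linkages extracted from events (illustrative values only).
edges = [
    ("cookie:abc", "email:a@example.com"),
    ("email:a@example.com", "store_id:123"),
    ("cookie:xyz", "email:b@example.com"),
]
uf = link_identities(edges)
same_customer = uf.find("cookie:abc") == uf.find("store_id:123")
different_customer = uf.find("cookie:abc") != uf.find("cookie:xyz")
```

Each connected set of identifiers stands for one customer: the cookie and the store ID end up with the same root because both are linked through the same email address, while the second cookie stays in a separate set.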
Who is this presentation for?
Big Data Engineers
Prerequisite knowledge:

What you'll learn?
Takeaway: The feasibility of building your own graph framework using Spark, and the idea of leveraging the graph in real time to achieve high throughput.


I am a lead Big Data Engineer at Walmart Labs, pioneering the building of scalable and reliable platforms that enable Walmart to know its customers better and to make their shopping experience delightful. I have a solid background in the full life cycle of data and systems that enable data-driven decision making. Currently, I am building the Customer Identity Graph, with 30+ billion nodes, to enable Walmart to know its customers irrespective of the channel that brings them to Walmart. Machine learning is helping me solve graph data quality issues, which would otherwise be a near-impossible mission. Previously, I worked at JP Morgan Chase, where I productionised machine learning pipelines that solved critical business challenges.