Speaker Details: Idris Iqbal Tarwala



Predicting Hybrid-Cloud Capacity with Regression Models and Steady-State Metrics


At Walmart, each technology pillar contains hundreds of products. These typically consist of microservices or applications deployed by the thousands into a hybrid cloud (public and private). Managing compute at this scale calls for capacity management that is entirely data driven and backed by machine learning. To solve this problem, a capacity analysis tool (internally codenamed Thanos) was designed specifically for Walmart E-commerce’s needs. It captures both real-time and historical performance and application metrics for applications deployed in a hybrid-cloud environment (private and public datacenters). These metrics, in conjunction with other supplemental data, allow Thanos to autoscale VMs in real time and to forecast future capacity needs. In this talk I will showcase our exploration of the data measurements that help us train a model to infer the best deployment profile for an app under given constraints. The algorithm at the heart of Thanos can be applied to any hybrid-cloud deployment scenario, and to that end we plan to open-source the algorithm and associated tooling in the near future.
Who is this presentation for?
DevOps/SREs/Management/App Developers
Prerequisite knowledge:
What you'll learn:
- How to manage capacity at scale.
- How to capture and store relevant VM and app metrics data.
- Which ML models to use for predictive scaling.


Idris joined the Walmartlabs performance team in 2018, focusing on performance testing Walmart's hybrid-cloud infrastructure to optimize capacity and maximize performance. Prior to Walmartlabs he worked at Xilinx, and he brings 7+ years of experience in silicon chip design, embedded software, and cloud infrastructure. At Xilinx he led the platform team in launching the first FPGA as a service in the cloud for AWS, Alibaba, and Huawei cloud services.