Speaker "Shub Shrivastava" Details
Name
Shub Shrivastava
Company
Google
Designation
Solution Architect
Topic
Optimizing Gen AI Inference: A Path to 30% Savings on Managed Kubernetes
Abstract
Moving generative AI from proof-of-concept to production introduces significant challenges in cost, scalability, and operational complexity. This session dives deep into optimizing your AI infrastructure to address these hurdles effectively. We will explore how a managed Kubernetes platform provides the necessary scaffolding for high-performance model serving. Attendees will learn actionable architectural patterns for reducing overhead, including techniques for maximizing GPU utilization through advanced bin-packing and dynamic scheduling. We will also cover how to implement intelligent autoscaling that tightly matches infrastructure spend to real-time inference demand, and how to strategically utilize spot compute capacity without sacrificing reliability. Join us to discover how these optimizations can streamline your operations and lower your generative AI inference costs by up to 30%.
Who is this presentation for?
Engineers, architects, leaders
Prerequisite knowledge:
Kubernetes, AI Infra, Gen AI Inferencing
Profile
Based in the San Francisco Bay Area, Shub Shrivastava is a Senior Cloud Architect at Google Cloud who guides the world's leading companies in building the next generation of scalable applications. She specializes in the intersection of platform engineering and artificial intelligence, with a deep focus on AI Infra and Agentic AI workflows. As a trusted advisor to major enterprise customers, Shub's real-world insights have directly influenced the product roadmap, helping shape the future of cloud-native and AI platforms. She is a regular speaker at industry events, sharing best practices for building resilient and intelligent AI infrastructure.