Observability and AIOps


Observability and AIOps (Artificial Intelligence for IT Operations) are two related concepts that have emerged to tackle the increasing complexity of modern software systems. Observability means understanding the internal state of a system from the telemetry data it emits, while AIOps applies machine learning to that data to automate tasks such as predicting potential problems and recommending actions to prevent failures. Together, they can help organizations improve the reliability, performance, and efficiency of their operations by surfacing insights and automating work traditionally performed by human operators. In this session, we will take a deep dive into both concepts to understand how observability and AIOps can improve the overall reliability of cloud services and the customer experience.


Shailesh is a senior engineering executive with over 20 years of experience leading large engineering and SRE teams in the areas of networking subsystems, load balancers (ADCs), high availability, observability engineering, multi-cloud, and AI. He has led key AI and ML initiatives at Adobe and prior companies, applying AIOps to detect, resolve, and predict customer issues faster. He speaks at conferences on SRE, observability, AI/ML, and stress management and wellness.