Session: Streaming AIOps on Kubernetes at Scale

AIOps platforms use big data, machine learning, and other analytics technologies to automate IT operations tasks such as anomaly detection, causality determination, and deployment rollback. AIOps can proactively monitor and alert on application health to reduce MTTD, and it can speed up decision making during incidents so that systems are restored sooner, which also reduces MTTR.

In this talk, Derek and Vigith will walk through building a large-scale AIOps platform with open source technologies that performs anomaly detection on Prometheus metrics from applications and automatically rolls back a deployment when a severe anomaly is detected. Attendees will learn:

  • How to run in-cluster stream processing pipelines for anomaly detection and integrate them with application deployment.
  • How to deploy this AIOps platform across hundreds of Kubernetes clusters to serve thousands of applications.

Presenters: