Microsoft researchers released an open-source artificial intelligence (AI) framework for agents that operate in cloud environments. Dubbed AIOpsLab, it is a principled research framework that enables developers to build, test, compare, and improve AIOps agents. The framework is supported by Azure AI Agent Service. AIOpsLab uses an intermediary interface, a workload and fault generator, and an observability layer that exposes a wide array of telemetry data. Notably, the company said that a research paper on the framework was accepted at the annual ACM Symposium on Cloud Computing (SoCC’24).
Microsoft Releases AIOpsLab for Cloud-Based Agents
Cloud-based services and enterprises that leverage them often face significant operational challenges, specifically in fault diagnosis and mitigation. AIOps agents, also known as AI agents for IT operations, are software-based tools that are used to monitor, analyse, and optimise cloud systems and solve these operational challenges.
Microsoft researchers, in a blog post, highlighted that when it comes to incident root cause analysis (RCA) or triaging, these AIOps agents rely on proprietary services and datasets, and use frameworks that only cater to specific solutions. Such setups fail to capture the dynamic nature of real-world cloud services.
To solve this pain point, the company released an open-source standardised framework dubbed AIOpsLab for developers and researchers that will enable them to design, develop, evaluate, and enhance the capabilities of agents. One of the fundamental ways it solves the problem is by strictly separating the agent and the application service using an intermediate interface. This interface can be used to integrate and extend other system parts.
This enables the AIOps agent to address the problem in a step-by-step manner, mimicking real-life scenarios. For instance, the agent can be taught to first read the problem description, then understand the instructions, and then call the available application programming interfaces (APIs) as actions.
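To make the separation between agent and service concrete, here is a minimal Python sketch of that step-by-step loop. It is an illustration only and not AIOpsLab's actual API: the `Problem`, `Interface`, and `run_agent` names, the two sample actions, and the toy rule of stopping at the first observation that mentions an error are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Problem:
    """Hypothetical task given to the agent (not an AIOpsLab class)."""
    description: str
    instructions: str
    # Named actions the agent may call; it never touches the service directly.
    apis: Dict[str, Callable[[], str]] = field(default_factory=dict)


class Interface:
    """Hypothetical intermediary sitting between the agent and the service."""

    def __init__(self, problem: Problem) -> None:
        self.problem = problem
        self.history: List[str] = []  # every action the agent has taken

    def act(self, name: str) -> str:
        """Run one named action and return its observation."""
        if name not in self.problem.apis:
            return f"unknown action: {name}"
        self.history.append(name)
        return self.problem.apis[name]()


def run_agent(interface: Interface) -> str:
    """Toy agent: read the task, then call each action until one reports an error."""
    problem = interface.problem
    print("Problem:", problem.description)        # step 1: problem description
    print("Instructions:", problem.instructions)  # step 2: instructions
    for name in problem.apis:                     # step 3: call APIs as actions
        observation = interface.act(name)
        if "error" in observation.lower():
            return f"{name}: {observation}"
    return "no fault found"


# Made-up scenario: the sample data below is invented for illustration.
problem = Problem(
    description="Checkout requests are failing.",
    instructions="Identify the faulty component.",
    apis={
        "get_metrics": lambda: "cpu 35%, memory 60%",
        "get_logs": lambda: "ERROR: connection refused by payment-db",
    },
)
interface = Interface(problem)
result = run_agent(interface)
print(result)
```

Because the agent only ever goes through `Interface.act`, the same agent could in principle be pointed at a different service by swapping the `apis` mapping, which is the kind of decoupling the article describes.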
AIOpsLab also comes with a workload and fault generator that can be used to train these AI agents. It can simulate both faulty and normal scenarios, so that AIOps agents learn how to handle them and unwanted behaviour can be eliminated.
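As a rough idea of what such a generator might look like, the Python sketch below produces a reproducible mix of normal and faulty scenarios. Everything in it is hypothetical: the `Scenario` type, the fault names in `FAULTS`, the request-rate range, and the `fault_ratio` parameter are assumptions for illustration and are not taken from AIOpsLab.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical fault names, chosen only for this example.
FAULTS = ["network_delay", "pod_failure", "misconfiguration"]


@dataclass
class Scenario:
    requests_per_second: int  # simulated workload intensity
    fault: Optional[str]      # None means a normal, fault-free run


def generate_scenarios(count: int, fault_ratio: float, seed: int = 0) -> List[Scenario]:
    """Return `count` scenarios; roughly `fault_ratio` of them carry an injected fault."""
    rng = random.Random(seed)  # seeded so the same scenarios can be replayed
    scenarios = []
    for _ in range(count):
        rps = rng.randint(10, 500)
        fault = rng.choice(FAULTS) if rng.random() < fault_ratio else None
        scenarios.append(Scenario(rps, fault))
    return scenarios


for scenario in generate_scenarios(count=5, fault_ratio=0.5, seed=42):
    print(scenario)
```

Seeding the generator means an agent can be evaluated repeatedly against an identical set of normal and faulty runs, which makes comparisons between agents fairer.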
AIOpsLab also comes with an extensible observability layer that offers monitoring capabilities to the developer. Although the system collects a wide array of telemetry data, the framework can show only the data relevant to particular agents, giving developers a granular way of making changes.
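The idea of showing an agent only the telemetry it needs can be sketched in a few lines of Python. This is an assumption-laden illustration rather than AIOpsLab's implementation: the record layout, the `kind` values (log, metric, trace), and the `filter_telemetry` helper are invented for this example.

```python
from typing import Dict, Iterable, List, Set


def filter_telemetry(records: Iterable[Dict[str, str]],
                     allowed_kinds: Set[str]) -> List[Dict[str, str]]:
    """Keep only the records whose 'kind' this agent is allowed to see."""
    return [record for record in records if record["kind"] in allowed_kinds]


# Made-up telemetry collected by the (hypothetical) observability layer.
collected = [
    {"kind": "log", "source": "frontend", "value": "ERROR: timeout calling cart"},
    {"kind": "metric", "source": "cart", "value": "cpu=92%"},
    {"kind": "trace", "source": "frontend", "value": "span cart.get took 3.1s"},
]

# An agent configured to work from logs and metrics only.
visible = filter_telemetry(collected, {"log", "metric"})
for record in visible:
    print(record)
```

Changing the set passed as `allowed_kinds` is the granular knob here: a developer could widen or narrow what each agent sees without touching how the data is collected.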
AIOpsLab currently supports four key tasks within the AIOps domain: incident detection, localisation, root cause diagnosis, and mitigation. Microsoft’s open-source AI framework is available on GitHub under the MIT licence for personal and commercial use cases.