Welcome to the fourth post in our series looking at what is on the IT radar and whether you need to have it. In our previous posts we examined:
This post is all about using artificial intelligence (AI) for IT operations, or AIOps, and specifically AIOps for network management. We’ll help you understand what AIOps is, how you can use it and whether you should.
What is AIOps?
AIOps is a term we first heard from Gartner a few years ago[1], and it is now used frequently by tech vendors in their marketing. At its most basic, AIOps is the application of AI and machine learning to IT operations. For network management, this can range from data processing, data monitoring and alert routing all the way up to automating the response. AIOps sorts through the noise to surface the significant alerts and events that need human attention or, in some cases, to trigger an action.
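As a rough illustration of what "sorting through the noise" means, the Python sketch below uses simple rules rather than machine learning: it drops informational alerts and folds repeated alerts from the same device into one incident, so a person sees three incidents instead of five raw alerts. The `Alert` and `Incident` types, the 1 to 5 severity scale and the five-minute window are all assumptions made for this example, not features of any particular AIOps product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alert:
    """Hypothetical alert record; real platforms carry far richer telemetry."""
    device: str
    severity: int      # assumed scale: 1 = informational ... 5 = critical
    timestamp: float   # seconds since epoch
    message: str


@dataclass
class Incident:
    device: str
    first_seen: float
    last_seen: float
    max_severity: int
    alerts: List[Alert] = field(default_factory=list)


def correlate(alerts: List[Alert], min_severity: int = 3,
              window_s: float = 300.0) -> List[Incident]:
    """Drop low-severity noise, then fold alerts from the same device into one
    incident while each arrives within window_s seconds of the previous one."""
    open_incidents = {}            # device -> most recent Incident for that device
    incidents: List[Incident] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if alert.severity < min_severity:
            continue               # informational noise: keep it out of the queue
        current = open_incidents.get(alert.device)
        if current is not None and alert.timestamp - current.last_seen <= window_s:
            current.alerts.append(alert)
            current.last_seen = alert.timestamp
            current.max_severity = max(current.max_severity, alert.severity)
        else:
            current = Incident(alert.device, alert.timestamp, alert.timestamp,
                               alert.severity, [alert])
            open_incidents[alert.device] = current
            incidents.append(current)
    # Most severe incidents first, so they get human attention first.
    return sorted(incidents, key=lambda i: i.max_severity, reverse=True)


if __name__ == "__main__":
    raw = [
        Alert("core-sw1", 4, 0.0, "interface down"),
        Alert("core-sw1", 4, 30.0, "interface down"),
        Alert("ap-12", 1, 45.0, "client roamed"),
        Alert("edge-rtr", 5, 60.0, "BGP session down"),
        Alert("core-sw1", 3, 900.0, "high CPU"),
    ]
    for incident in correlate(raw):
        print(incident.device, incident.max_severity, len(incident.alerts))
    # edge-rtr 5 1
    # core-sw1 4 2
    # core-sw1 3 1
```

A real platform would learn which alerts matter and how they relate (for example, from automated dependency mapping) instead of relying on fixed thresholds, but the goal is the same: fewer, more meaningful items reaching a person.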
AIOps is of interest not only because of the sheer volume of telemetry and operations data generated across cloud and on-premises applications, but also because we are trying to find better ways to handle and use data to make better-informed decisions. We are looking for the hidden gems inside our data.
It seems we simply may not be able to do these things without help.
Getting started with AIOps
Choosing a vendor
AIOps is not baked into every piece of infrastructure, and different vendors approach it in different ways. Some solutions are quite comprehensive, including auto-discovery and automated dependency mapping, while others use pre-planned scripts to log in to devices and run specific commands. With so much variation, true AIOps vendor independence doesn’t really exist yet. Script-based capabilities come close, but they don’t really meet the definition of AIOps that we’re seeking.
Because of this, you do need to look carefully at each vendor to make sure their approach and capabilities will meet your needs moving forward. The reality is you’re likely to end up with various AIOps capabilities in your environment based on the infrastructure vendors you choose to work with.
Network automation
If you want to allow AI to take control of your network, you need it to be able to:
- Identify if there is a problem
- Flag the issue
- Take automated action if appropriate, or
- Raise an alert so that a person can act in place of the automated agent.
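The four capabilities above can be sketched as a single decision function. This is a minimal illustration only, assuming a static threshold per metric in place of a learned model and a hypothetical table of pre-approved actions: every problem is flagged, but only scenarios with an approved action are handled automatically, and everything else goes to a person.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Outcome(Enum):
    NO_PROBLEM = "no problem found"
    AUTO_ACTION = "automated action taken"
    HUMAN_ALERT = "alert raised for a person"


@dataclass
class Reading:
    device: str
    metric: str
    value: float


def handle(reading: Reading,
           thresholds: Dict[str, float],
           approved_actions: Dict[str, Callable[[Reading], str]],
           flagged: List[Reading]) -> Tuple[Outcome, Optional[str]]:
    # 1. Identify: a static threshold stands in for a learned model here.
    limit = thresholds.get(reading.metric)
    if limit is None or reading.value <= limit:
        return Outcome.NO_PROBLEM, None
    # 2. Flag: record every issue, automated or not, for later review.
    flagged.append(reading)
    # 3. Act automatically only when this scenario has a pre-approved action.
    action = approved_actions.get(reading.metric)
    if action is not None:
        return Outcome.AUTO_ACTION, action(reading)
    # 4. Otherwise raise an alert so that a person acts instead.
    return Outcome.HUMAN_ALERT, f"{reading.device}: {reading.metric}={reading.value}"


if __name__ == "__main__":
    def reset_port(r: Reading) -> str:
        # Hypothetical placeholder; a real action would call your device or controller API.
        return f"port reset requested on {r.device}"

    flagged: List[Reading] = []
    thresholds = {"cpu_pct": 90.0, "link_flaps_per_hour": 5.0}
    actions = {"link_flaps_per_hour": reset_port}
    for r in (Reading("sw1", "cpu_pct", 40.0),
              Reading("sw1", "link_flaps_per_hour", 8.0),
              Reading("sw2", "cpu_pct", 95.0)):
        print(handle(r, thresholds, actions, flagged))
```

Keeping the approved-actions table small and explicit is what makes "if appropriate" enforceable: a scenario nobody has planned for falls through to a human by default.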
The natural next step is for AI to lead the automation of responses. While this step is optional for your network management, it is where you really see the benefits of AIOps. Automation is also attractive to executives because it frees existing IT resources for higher-value or mission-critical activities.
The challenge is how IT teams feel about automation taking control and making changes on the network, changes that would normally require change approval and human intervention.
The biggest risk for organisations taking the AIOps path is inadequate automation planning. Thinking through every scenario your automation could encounter is essential if you don’t want automated responses to introduce risk or chaos. All scenarios need to be factored in so that your AIOps platform can recognise them and respond correctly.
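One common safeguard against scenarios nobody planned for is a circuit breaker: if automation makes more changes than expected within a short period, it stops acting and hands control back to people. The sketch below is a hypothetical class, not part of any vendor's platform, and its default of five changes in ten minutes is an assumption chosen purely for illustration.

```python
import time
from collections import deque
from typing import Optional


class AutomationCircuitBreaker:
    """Allows at most max_changes automated changes within window_s seconds.
    Once tripped, it stays open until a person resets it, so every further
    event is escalated to a human instead of being acted on."""

    def __init__(self, max_changes: int = 5, window_s: float = 600.0):
        self.max_changes = max_changes
        self.window_s = window_s
        self._recent = deque()   # timestamps of recent automated changes
        self.tripped = False

    def allow_change(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.tripped:
            return False
        # Forget changes that have fallen outside the sliding window.
        while self._recent and now - self._recent[0] > self.window_s:
            self._recent.popleft()
        if len(self._recent) >= self.max_changes:
            self.tripped = True  # unexpected burst of changes: stop automating
            return False
        self._recent.append(now)
        return True

    def reset(self) -> None:
        """Called by a person after reviewing what the automation did."""
        self._recent.clear()
        self.tripped = False
```

For example, with `max_changes=3` a fourth change inside the window is refused, and the breaker then refuses everything until someone calls `reset()`. Making the reset a deliberate human step keeps the change-approval culture described above in the loop.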
While there are benefits to going down the full automation path, there may still be reasons for your organisation to avoid it or implement it slowly.
Getting started with AIOps
Our suggested approach is to start with network monitoring, then move on to network management and, finally, to testing automation.
A simple way to get started is to use AIOps to filter your data and alerts, so you can separate what is relevant and important from the issues you want to be aware of but don’t need to act on. Start by using AIOps for your network monitoring and build a database of:
- What your alerts are
- Actions taken for specific alerts and scenarios
Once you have compiled your database from actual alerts, you can start to understand the specific scenarios to factor in when you are ready to introduce AIOps into your network management and, later, into response automation.
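The database of alerts and actions does not need to be elaborate. As a minimal sketch, assuming SQLite and a hypothetical `alert_log` table, the code below records each alert together with the action taken (if any), then lists alert types that occur often and are almost always handled the same way. Those are reasonable first candidates for response automation; the thresholds of five occurrences and 90% consistency are assumptions to tune for your own network.

```python
import sqlite3
from collections import Counter
from typing import Dict, List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS alert_log (
    id         INTEGER PRIMARY KEY,
    raised_at  TEXT NOT NULL,   -- ISO 8601 timestamp
    device     TEXT NOT NULL,
    alert_type TEXT NOT NULL,
    action     TEXT             -- NULL when no action was taken
);
"""


def record(conn: sqlite3.Connection, raised_at: str, device: str,
           alert_type: str, action: Optional[str] = None) -> None:
    """Log one alert and the action taken for it (if any)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO alert_log (raised_at, device, alert_type, action) "
            "VALUES (?, ?, ?, ?)",
            (raised_at, device, alert_type, action),
        )


def automation_candidates(conn: sqlite3.Connection, min_occurrences: int = 5,
                          min_consistency: float = 0.9) -> List[Tuple[str, str, int, float]]:
    """Alert types seen at least min_occurrences times where the same action
    was taken in at least min_consistency of cases."""
    per_type: Dict[str, Counter] = {}
    for alert_type, action in conn.execute("SELECT alert_type, action FROM alert_log"):
        per_type.setdefault(alert_type, Counter())[action] += 1
    candidates = []
    for alert_type, actions in per_type.items():
        total = sum(actions.values())
        action, count = actions.most_common(1)[0]
        if (action is not None and total >= min_occurrences
                and count / total >= min_consistency):
            candidates.append((alert_type, action, total, count / total))
    return sorted(candidates)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for i in range(6):
        record(conn, f"2024-01-01T10:0{i}:00", "sw1", "port_flap", "reset_port")
    for i in range(5):
        record(conn, f"2024-01-02T09:0{i}:00", "ap1", "high_client_count")  # no action
    print(automation_candidates(conn))
    # [('port_flap', 'reset_port', 6, 1.0)]
```

Alert types that are usually ignored (the most common "action" is `NULL`) are filtered out here; they are better candidates for suppression in monitoring than for automated responses.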
You need to be aware that AI requires time to learn, and it is this learning process that delivers the benefits. The learning curve your AI engine goes through in its initial monitoring period, before it can even start performing other operational functions, could be lengthy and will depend on:
- The amount of data available to help it learn - Some platforms require large amounts of data before they can learn enough to present you with useful information.
- The complexity of your network - Complex routing, constant configuration changes and backend processes all affect how quickly it learns.
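To see why the amount of data matters, consider a simple statistical baseline standing in for a real learning engine. The sketch below assumes a z-score test over a sliding window of readings for one metric, and it refuses to judge anything until it has collected `min_samples` readings. Because the window only remembers recent readings, a significant configuration change means part of the baseline has to be relearned, one reason a frequently changing network slows learning. The class name and all three parameters are illustrative assumptions.

```python
import math
from collections import deque
from typing import Optional


class LearningBaseline:
    """Learns the normal range of one metric from a sliding window of readings
    and only starts judging new readings once it has seen min_samples of them."""

    def __init__(self, min_samples: int = 30, window: int = 500,
                 z_limit: float = 3.0):
        if min_samples > window:
            raise ValueError("min_samples must fit inside the window")
        self.min_samples = min_samples
        self.z_limit = z_limit
        self.samples = deque(maxlen=window)  # oldest readings drop off automatically

    def observe(self, value: float) -> Optional[bool]:
        """Return None while still learning, otherwise True if value looks anomalous."""
        if len(self.samples) < self.min_samples:
            self.samples.append(value)
            return None                      # not enough data to judge yet
        mean = sum(self.samples) / len(self.samples)
        std = math.sqrt(sum((x - mean) ** 2 for x in self.samples) / len(self.samples))
        anomalous = value != mean if std == 0 else abs(value - mean) / std > self.z_limit
        if not anomalous:
            self.samples.append(value)       # keep outliers out of "normal"
        return anomalous


if __name__ == "__main__":
    baseline = LearningBaseline(min_samples=10, window=100)
    readings = [50, 52, 48, 51, 49, 50, 53, 47, 50, 50, 51, 95]
    print([baseline.observe(r) for r in readings])
    # [None, None, None, None, None, None, None, None, None, None, False, True]
```

With a `min_samples` of thousands rather than ten, and many metrics per device, it is easy to see how the initial learning period stretches out.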
You need to be prepared to give your AI solution time to learn before you can even start to think about quantifying ROI.
Should I be using AIOps?
Despite the obvious long-term benefits, our advice is: not yet.
Given the potentially lengthy learning curve, and the stages you need to go through to implement AIOps successfully without introducing chaos into your network, there is little incentive to get started now. In the meantime, having the right management tool in place will still deliver some of the benefits, such as visibility and automated alerts.
The exception is very large sites with huge volumes of existing data. That data can be used immediately to shorten the learning curve, so AIOps will be most beneficial there and will deliver ROI much sooner.
The lack of vendor-independent solutions could be another reason to wait. If you want to use different vendors for different parts of your network, e.g., wireless and switching, the solutions are not there yet. When on-premises monitoring platforms become available as a solution, that will probably be a good indication that AIOps has reached the tipping point of being something you should have.
[1] Forbes: AIOps: What You Need To Know
What’s the score?
So, we’re giving AIOps a Matrix Importance Score of 3 – Useful: Keep an eye on this, but nothing you need to do now.
For those of you joining us for the first time, and to refresh the memory of returnees, a Matrix Importance Scale (MIS) is used to measure the level of importance of this technology for you:
1. Yawn: Don’t waste your time unless you’re bored
2. Interesting: Could be good to know just in case
3. Useful: Keep an eye on this, but nothing you need to do now
4. Important: You definitely need to know about this - now
5. Critical: What do you mean you haven’t started implementing this?