In recent years, New Relic, the publicly traded observability company, has been adding more machine learning-based tools to its platform for AI-assisted incident response when things don't quite go as planned. Today, it is expanding this feature set with the launch of a number of new capabilities for what it calls its "New Relic Applied Intelligence Service."
This expansion includes an anomaly detection service that is available even to free users; the ability to group alerts from multiple tools when the models determine that a single issue is triggering all of them; and new ML-based root cause analysis to help eliminate some of the guesswork when problems occur. Also new (and in public beta) is the ability to detect patterns and outliers in log data stored in New Relic's data platform.
The main idea here, New Relic’s director of product marketing Michael Olson told me, is to make it easier for companies of all sizes to reap the benefits of AI-enhanced ops.
“It’s been about a year since we introduced our first set of AIops capabilities with New Relic Applied Intelligence to the market,” he said. “During that time, we’ve seen significant growth in adoption of AIops capabilities through New Relic. But one of the things that we’ve heard from organizations that have yet to foray into adopting AIops capabilities as part of their incident response practice is that they often find that things like steep learning curves and long implementation and training times — and sometimes lack of confidence, or knowledge of AI and machine learning — often stand in the way.”
The new platform should be able to detect emerging problems in real time, without the team having to pre-configure alerts. When it does, it will automatically group the related alerts from New Relic and other tools to cut down on alert noise and let engineers focus on the incident.
“Instead of an alert storm when a problem occurs across multiple tools, engineers get one actionable issue with alerts automatically grouped based on things like time and frequency, based on the context that they can read in the alert messages. And then now with this launch, we’re also able to look at relationship data across your systems to intelligently group and correlate alerts,” Olson explained.
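To make the idea of grouping "based on things like time" and "relationship data" concrete, here is a minimal, generic sketch of time-window alert correlation. It is not New Relic's implementation, whose models were not described in detail; the `Alert` and `Issue` types, the `group_alerts` function, the five-minute window and the entity relationship map are all hypothetical. An alert joins an open issue if it fires shortly after that issue's latest alert and touches the same or a related entity; otherwise it opens a new issue.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    source: str       # tool that fired the alert (hypothetical names below)
    entity: str       # affected service or host
    timestamp: float  # seconds since some reference point


@dataclass
class Issue:
    alerts: list = field(default_factory=list)
    entities: set = field(default_factory=set)

    @property
    def last_seen(self) -> float:
        # Alerts are appended in time order, so the last one is the newest.
        return self.alerts[-1].timestamp


def group_alerts(alerts, related, window=300.0):
    """Group alerts into issues (hypothetical sketch).

    An alert joins an existing issue if it arrived within `window` seconds of
    that issue's most recent alert AND its entity is the same as, or related
    to, an entity already in the issue. `related` maps each entity to its
    neighbours and is assumed to be symmetric.
    """
    issues = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        neighbours = related.get(alert.entity, set()) | {alert.entity}
        for issue in issues:
            close_in_time = alert.timestamp - issue.last_seen <= window
            if close_in_time and issue.entities & neighbours:
                issue.alerts.append(alert)
                issue.entities.add(alert.entity)
                break
        else:
            # No matching open issue: this alert starts a new one.
            issues.append(Issue(alerts=[alert], entities={alert.entity}))
    return issues


# Usage with made-up data: a checkout service that depends on a payments database.
related = {"checkout": {"payments-db"}, "payments-db": {"checkout"}}
alerts = [
    Alert("apm", "checkout", 0),
    Alert("infra", "payments-db", 60),  # related and within 5 minutes: same issue
    Alert("logs", "search", 90),        # unrelated entity: separate issue
    Alert("apm", "checkout", 1000),     # too long after the first burst: new issue
]
issues = group_alerts(alerts, related)
for issue in issues:
    print(sorted(issue.entities), len(issue.alerts))
```

Four alerts collapse into three issues here, and the two that share a dependency land together. A production system would presumably weigh many more signals, such as frequency and alert message content, but the shape of the problem is the same.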
Maybe the highlight for the ops teams that will use these new features, though, is New Relic’s ability to pinpoint the probable root cause of a problem. As Guy Fighel, the general manager of applied intelligence and vice president of product engineering at New Relic, told me, the idea here is not to replace humans but to augment teams.
“We provide a non-black-box experience for teams to craft the decisions and correlation and logic based on their own knowledge and infuse the system with their own knowledge,” Fighel noted. “So you can get very specific based on your environment and needs. And so because of that and because we see a lot of data coming from different tools — all going into New Relic One as the data platform — our probable root cause is very accurate. Having said that, it is still a probable root cause. So although we are opinionated about it, we will never tell you, ‘hey, go fix that, because we’re 100% sure that’s the case.’ You’re the human, you’re in control.”
The AI system also asks users for feedback, so the model is refined with every new incident.
Fighel told me that New Relic's tools rely on a variety of statistical analysis methods and machine learning models. Some of those are unique to individual users, while others are used across the company's user base. He also stressed that all of the engineers who worked on this project have a background in site reliability engineering, so they are intimately familiar with the problems in this space.
With today’s launch, New Relic is also adding a new integration with PagerDuty and other incident management tools so that the state of a given issue can be synchronized bi-directionally between them.
“We want to meet our customers where they are and really be data source agnostic and enable customers to pull in data from any source, where we can then enrich that data, reduce noise and ultimately help our customers solve problems faster,” said Olson.