Speed is key to being a digital leader and providing great holistic experiences to customers. The focus on speed of innovation, time to market, and mean time to resolution of problems has changed how software is developed, built, and run.

Traditional approaches can’t keep up with highly dynamic web-scale environments

To keep up with the complex software environments that are required to provide the necessary agility, enterprises are aiming to automate their software operations, leveraging the power of AI. This is necessary because manual approaches don’t work in highly complex and dynamic web-scale environments. Automation also enables teams to focus on innovation rather than operations.

The challenge, though, is that the information required to automate IT operations traditionally comes from multiple disjointed sources. This siloed approach makes it hard to interpret data in an intelligent way.

Is data correlation good enough?

Early AIOps tools tried to resolve this challenge by acting as a “manager of managers,” ingesting and consolidating the data provided by the other data sources. Because this approach relies heavily on data correlation, the analysis typically isn’t precise enough to support automation. Imprecise data can lead to incorrect conclusions. For example, suppose it’s raining and many people are opening umbrellas. With this data alone, data correlation might suggest that the umbrellas are causing the rain.

For a fun read about ridiculous data correlations between things that don’t actually have anything to do with one another, see Spurious Correlations.
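To make the umbrella example concrete, here is a minimal sketch in Python. The observations are made up for illustration. It shows that a correlation coefficient is symmetric: the number is the same whichever variable you treat as the cause, so correlation alone cannot say whether rain leads to umbrellas or the other way around.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up hourly observations (illustrative only):
# rainfall in millimetres and the number of open umbrellas counted.
rain_mm = [0, 0, 2, 5, 8, 3, 0, 1]
umbrellas = [1, 0, 14, 40, 61, 22, 2, 6]

# The coefficient is identical in both directions, so it carries
# no information about which variable drives the other.
r_rain_vs_umbrellas = pearson(rain_mm, umbrellas)
r_umbrellas_vs_rain = pearson(umbrellas, rain_mm)

print(r_rain_vs_umbrellas, r_umbrellas_vs_rain)
```

Both values come out strongly positive and equal to each other. Deciding which way the dependency runs requires extra knowledge about how the two things are related, which the correlation itself does not provide.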

Dynatrace changes the AIOps game by opening up the Davis AI-engine

Dynatrace approaches AIOps differently.

  • Dynatrace can replace multiple other data sources and provide high fidelity data in a consistent data model. With a single OneAgent installation per host, you can collect all relevant metrics along your application-delivery chain.
  • Dynatrace not only sees every component, it also understands how everything is connected—it understands the relationships and interdependencies between each layer, component, and bit of code in your application environment.

This means that external data and events are ingested in context with a Smartscape topology model. In this way, Dynatrace can provide true causation insights and precise root-cause analysis.
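The idea of reading events in the context of a topology can be sketched roughly as follows. This is not how Davis or Smartscape is implemented. The entity names, the `TOPOLOGY` and `EVENTS` structures, and the `events_in_context` function are all hypothetical, chosen only to show why knowing the dependencies narrows down which external events are relevant to a problem.

```python
from collections import deque

# Hypothetical topology: each entity maps to the entities it calls or runs on.
TOPOLOGY = {
    "web-frontend": ["checkout-service"],
    "checkout-service": ["account-db", "host-42"],
    "search-service": ["host-7"],
    "account-db": ["host-42"],
    "host-42": [],
    "host-7": [],
}

# Hypothetical external events, keyed by the entity they were attached to.
EVENTS = {
    "host-42": ["DNS configuration change"],
    "search-service": ["Deployment of build 118"],
}


def events_in_context(affected, topology, events):
    """Walk the dependencies of `affected` breadth-first and collect
    (entity, event) pairs for every entity that is reached."""
    seen = {affected}
    queue = deque([affected])
    found = []
    while queue:
        entity = queue.popleft()
        for event in events.get(entity, []):
            found.append((entity, event))
        for dependency in topology.get(entity, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return found


# Only events on entities that the affected frontend depends on are returned.
candidates = events_in_context("web-frontend", TOPOLOGY, EVENTS)
print(candidates)
```

In this toy model the DNS change on `host-42` is returned as a candidate because the frontend depends on that host through the checkout service, while the unrelated deployment on `search-service` is ignored. A purely time-based correlation would have flagged both.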

“Dynatrace’s deterministic AI allows us to be confident in its answers and automate remediation without human interaction, paving the way for autonomous cloud operations. Knowing that Dynatrace is delivering precise answers in context is enabling us to further evolve self-healing workflows that remove the need for human intervention in the resolution of recurring problems.”

Jonathan Hayes
VP Global IT Service Excellence, Experian

Dynatrace AI root-cause analysis seamlessly integrates custom metrics and third-party events

Here are a few examples of the types of automated insights that you can gain by feeding Davis, the Dynatrace AI engine, with additional external data that isn’t captured by OneAgent:

  • Identify a problem deployment (and who triggered it) when something goes wrong.
  • Configuration events can be identified as the root cause of detected problems (for example, a DNS change event).
  • Ingested events can be correlated with auto-detected peaks during load tests.

How to get started

The Dynatrace platform enables the ingestion of customer-defined metrics and events through custom OneAgent plugins and the REST API (for full details, check out the Dynatrace developer program).

Plugins for third-party integrations can be a great resource for additional root-cause information. For example, they can promote tighter integration with your continuous integration and deployment toolchain by providing information about recent rollouts, responsible product owners, and even remediation actions.

Once you’ve set up ingestion of external data (see this example of how to push deployment events from Jenkins), the new root-cause analysis engine analyzes both the custom metrics and the custom events that are pushed from your third-party integrations.
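As a rough illustration of what pushing a deployment event through the REST API can look like, here is a minimal sketch in Python that only builds the request and does not send it. The endpoint path, the `Api-Token` authorization scheme, and the payload field names follow our recollection of version 1 of the Events API and should be checked against the current API documentation. The base URL, token, tag, and property values are placeholders, and `build_deployment_event_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_deployment_event_request(base_url, api_token, tag, name, version, properties):
    """Build (but do not send) a POST request carrying a deployment event.

    Field names are assumed from the Events API v1; verify them against
    the API reference before relying on this.
    """
    payload = {
        "eventType": "CUSTOM_DEPLOYMENT",
        # Attach the event to every service that carries the given tag.
        "attachRules": {
            "tagRule": [
                {
                    "meTypes": ["SERVICE"],
                    "tags": [{"context": "CONTEXTLESS", "key": tag}],
                }
            ]
        },
        "deploymentName": name,
        "deploymentVersion": version,
        "source": "Jenkins",
        # Free-form key-value pairs that add context to the event.
        "customProperties": properties,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/events",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Api-Token " + api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_deployment_event_request(
    base_url="https://example.invalid/e/ENVIRONMENT_ID",  # placeholder
    api_token="REPLACE_WITH_TOKEN",                       # placeholder
    tag="checkout",                                       # placeholder tag
    name="Checkout release",
    version="1.4.2",
    properties={"Triggered by": "build pipeline", "Git commit": "abc1234"},
)

# Sending is left out on purpose; with real values it would be:
# urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect or log the payload in a pipeline step before anything reaches a live environment.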

Custom metrics as a source for root-cause analysis

Below is an example where a user has defined a specific JMX metric, titled Account creation duration, that reports the time needed to create a new account. Once the JMX metric is registered and monitored, it becomes a first-class citizen within the Dynatrace root-causation engine.

In case of a problem that affects real users of this application, this JMX metric will be analyzed automatically. If the metric reveals abnormal distribution compared to the past, the metric will be referenced within the root cause of the problem, as shown below:

Using third-party events for root cause analysis

Third-party events represent additional external data that the new root-cause analysis collects while traversing the problem-affected topology.

Such events are either semantically predefined (for example, deployments, configuration changes, or annotations) or generic problem events that are triggered by changes in severity level (for example, availability issues, errors, slowdowns, resource saturation, or just informational events). External events can even contain key-value pairs that add contextual information to detected events.

See the following example of a third-party deployment event that was sent through our REST-based Events API and identified within the root-cause analysis process.

Adopt AIOps and start automating with Dynatrace today

During the past four years, the Dynatrace AI has proven that fully automated analysis of problems is the only valid approach, especially in highly dynamic microservices environments where manual root-cause analysis can be nearly impossible.

Now, we’ve not only opened up Davis, the Dynatrace AI engine, we’ve also improved it significantly, making it easier for your organization to adopt automation and AIOps.

All the improvements detailed above have once again pushed the boundaries of automated, AI-based root-cause analysis and opened up Dynatrace as a platform for third-party integrations and numerous AIOps use cases.

Read more about the detailed capabilities within our help on Dynatrace next-generation root-cause analysis.