Dynamic Threshold

Introduction

In the domain of system monitoring, the conventional practice of setting fixed thresholds for alerts has been a long-established standard. However, this approach frequently fails to accommodate the dynamic characteristics of contemporary IT environments. The introduction of thresholdless alerting represents a paradigm shift, offering several notable advantages over traditional fixed thresholds.

Dynamic alerting leverages advanced analytics, machine learning, and other sophisticated techniques to detect anomalies and potential issues in real-time without the need for predefined thresholds. This means that the system can adapt to changing conditions, user behaviors, and varying workload during the day, providing a more accurate and responsive monitoring framework.

One of the primary advantages is the reduction in false positives and negatives. Fixed thresholds can often be either too strict, resulting in a flood of alerts, or too permissive, leading to missed critical issues. Dynamic thresholding, by learning the normal behavior of a system, can better distinguish between normal fluctuations and genuine anomalies, improving the signal-to-noise ratio and ensuring that only actionable alerts are generated.

Another key benefit is the ability to anticipate and predict issues before they escalate. By understanding the patterns and trends in system behavior, dynamic alerting can provide proactive insights that allow for early intervention and preventative maintenance. This not only helps in maintaining system uptime but also in optimizing resource allocation and performance.

In addition, dynamic alerting can enhance operational efficiency. It reduces the need for constant manual adjustments to thresholds, freeing up IT personnel to focus on more strategic tasks. It also provides a more comprehensive view of system health, enabling faster and more effective decision-making.

Beginning with SAP Focused Run 5.0 SP00, the dynamic threshold approach has been integrated into the System Monitoring scenario, offering an alternative to the static threshold method. This new approach leverages additive model time series analysis, which is a feature of the SAP HANA Predictive Analysis Library (PAL). The model is commonly known as Prophet forecasting.
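
For orientation, the additive model popularized by Prophet is usually written as a sum of independent components. The formula below is the general textbook form, not a statement about how PAL or SAP Focused Run parameterize the model internally. The symbols are introduced here for illustration only.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here g(t) is the trend, s(t) the periodic component (for example, daily or weekly patterns), h(t) the effect of irregular events such as holidays, and \varepsilon_t the residual that the model does not explain.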

Prerequisites

The following prerequisites must be fulfilled.

To benefit from the new feature, the PAL must first be deployed on your SAP Focused Run system. This step can be skipped if metric forecasting is already in use. For details, refer to the chapter "Predictive Analytics Setup – Metric Forecasting" of the SAP Focused Run Master Guide.

Also note the upgrade post-processing steps described in SAP Note 3540663.
Alternatively, a new background job can be scheduled via MAI_TOOLS --> Administration --> Schedule all MAI Jobs --> Option 1.

To predict the thresholds, a minimum amount of stored data is required (100 data points by default). For a newly configured system, it therefore takes some time until the required number of data points has been collected.
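
As a rough illustration of the waiting time, the sketch below multiplies the number of missing data points by the collection interval. The function name and the 5-minute interval are assumptions made for this example. The actual collection interval depends on the metric, and the schedule of the calculation job may add further delay.

```python
from datetime import timedelta


def time_until_enough_data(collection_interval: timedelta,
                           min_points: int = 100,
                           existing_points: int = 0) -> timedelta:
    """Estimate how long collection must run until min_points values exist.

    min_points=100 mirrors the default mentioned above; everything else
    is a hypothetical input for illustration.
    """
    if collection_interval <= timedelta(0):
        raise ValueError("collection_interval must be positive")
    missing = max(min_points - existing_points, 0)
    return missing * collection_interval


# Example: a metric collected every 5 minutes (assumed interval).
print(time_until_enough_data(timedelta(minutes=5)))
```

With a 5-minute interval, 100 data points correspond to 500 minutes of collection, so a freshly connected system would need a bit more than eight hours before a model can be built for that metric.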

By default, none of the standard delivered metrics has the dynamic threshold type active, so it must be activated manually for each metric individually. Note that it cannot be enabled for every metric: it can be activated only for numeric threshold types. Moreover, the current model is trained only for absolute numeric values such as "Dialog Response Times in ms". Metrics that report percentage values, such as "CPU Utilization (%)", might therefore not provide the desired result.

 

 

Configuration

To enable the dynamic threshold approach, select a suitable metric in a custom template. On the "Threshold" tab, all numeric threshold types now offer an additional option called "Use Dynamic Threshold". After ticking it, select one of the following options from the drop-down menu.

a) Use Dynamic Threshold Only: In this configuration, only the dynamic threshold is applied to determine metric ratings, provided that a model exists. This option is suitable if you prefer the alerting to rely solely on the model's predictions.

b) Combine Dynamic Threshold with Defined Thresholds: In this mode, both dynamic and standard thresholds are evaluated. The metric rating is determined by the worst-case rule: if either the dynamic or the standard threshold for yellow/red is met, the system applies the corresponding rating. This configuration is recommended if you require a "fallback threshold" to ensure alerts are raised independently of the model's forecast.

When the "Use Dynamic Threshold" option is enabled and the monitoring template has been re-applied, the system uses an Artificial Intelligence (AI) time series algorithm from the SAP HANA PAL to compute thresholds and determine metric ratings. The model is trained on the historical numeric metric data of the configured managed object, which enables it to predict future metric values and establish appropriate thresholds.

In scenarios where a model is not yet available, such as with new managed objects that lack sufficient historical data, the system defaults to using predefined standard thresholds. Once a model is available, the system's behavior depends on the selected configuration option. 
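
The interplay of the two options and the fallback can be sketched as follows. This is a minimal illustration of the rules described above, not the product's implementation: the class and function names are hypothetical, the direction is assumed to be "Exceeds", and whether a value equal to a threshold already counts as a breach is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Rating(IntEnum):
    GREEN = 0
    YELLOW = 1
    RED = 2


@dataclass(frozen=True)
class Thresholds:
    green_to_yellow: float
    yellow_to_red: float

    def rate(self, value: float) -> Rating:
        # Direction "Exceeds": higher values are worse (strict ">" is assumed).
        if value > self.yellow_to_red:
            return Rating.RED
        if value > self.green_to_yellow:
            return Rating.YELLOW
        return Rating.GREEN


def rate_metric(value: float,
                static: Thresholds,
                dynamic: Optional[Thresholds],
                combine: bool) -> Rating:
    """Rate a value; dynamic=None means that no model exists yet."""
    if dynamic is None:
        # No model available: fall back to the predefined thresholds.
        return static.rate(value)
    if combine:
        # Option b): worst-case rule over both threshold sets.
        return max(static.rate(value), dynamic.rate(value))
    # Option a): the dynamic threshold alone decides.
    return dynamic.rate(value)


static = Thresholds(green_to_yellow=2000, yellow_to_red=3000)
loose_model = Thresholds(green_to_yellow=4000, yellow_to_red=5000)
print(rate_metric(2500, static, loose_model, combine=False).name)
print(rate_metric(2500, static, loose_model, combine=True).name)
```

The two calls at the end show why option b) acts as a safety net: if the model has learned a generous band, a value of 2500 ms stays green under option a), while the combined mode still raises a yellow rating through the static threshold.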

 

Usage

The results can be seen in the System Monitoring application. By default, the Metric Monitor displays the calculated dynamic threshold lines in addition to the monitored values.

In general, any numeric metric, whether standard or custom, can be set up for dynamic thresholding if the necessary conditions are fulfilled. However, the usefulness of the threshold values for a specific metric depends on the data being reported. For instance, metrics that mostly report zero as the monitored value, such as the number of errors or long-running processes, will not yield meaningful dynamic thresholds. Therefore, it is important to select suitable metrics.

Below is an example of calculated dynamic thresholds for the ABAP Dialog Response Time metric, where the load is continuously measured.

 

Remark: after a reconfiguration, it might take up to one hour until the switch from the static to the dynamic approach becomes visible.

Under "Metric Details", you can find additional information on whether a dynamic threshold is used and which method is applied:

  • Threshold Type: Numeric Threshold (Green/Yellow/Red)
  • Green to Yellow: 2000 ms
  • Yellow to Red: 3000 ms
  • Use Best rating of Last N: false
  • Use Dynamic Threshold: true
  • Dynamic Threshold Method: Use dynamic threshold only
  • Direction: Exceeds

Best Practice

It's advised to initially enable dynamic thresholding for just a few metrics and apply them to a limited number of systems. Metrics that represent performance with clear numerical values make excellent starting points, such as response time metrics. If the standard thresholds are currently set as counter types, they should be converted to numeric types before applying dynamic thresholding options.

The prerequisites for determining which metrics are suitable for dynamic thresholding generally include:

  • Data Variability: The metric should exhibit variability over time, rather than being consistently static or zero.
  • Sufficient Data: There should be enough historical data to build a meaningful model of the metric's behavior.
  • Relevant Range: The metric should have values within a range that reflects normal and abnormal operating conditions.
  • Predictable Patterns: The metric should display identifiable patterns or trends, such as daily, weekly, or seasonal fluctuations.
  • Impactful Metrics: The metric should be relevant to the performance or health of the system being monitored, so that dynamic thresholds provide meaningful alerts.
  • Normalization: The metric should be normalized or consistently measured under similar conditions to avoid skewing the thresholding algorithm.
  • Response to Workload: Metrics that reflect system load or performance under varying conditions often make good candidates for dynamic thresholding.

These prerequisites help ensure that the dynamic thresholds generated are meaningful and useful for monitoring and alerting purposes.
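
Some of these criteria can be checked mechanically on exported metric values before a metric is switched to dynamic thresholding. The sketch below is a hypothetical pre-screening helper that covers only data volume, the share of zero values, and variability. Apart from the default of 100 data points, all limits are arbitrary assumptions chosen for illustration and are not values used by SAP Focused Run.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def screen_metric(values: Sequence[float],
                  min_points: int = 100,
                  max_zero_share: float = 0.5,
                  min_rel_spread: float = 0.05) -> List[str]:
    """Return reasons why a metric looks unsuitable; an empty list means none found."""
    reasons: List[str] = []
    if len(values) < min_points:
        reasons.append("too few data points")
    if values:
        # Metrics that mostly report zero do not yield meaningful thresholds.
        zero_share = sum(1 for v in values if v == 0) / len(values)
        if zero_share > max_zero_share:
            reasons.append("mostly zero")
        # Compare the spread with the average to detect near-constant metrics.
        avg = mean(values)
        spread = pstdev(values)
        if spread == 0 or (avg != 0 and spread / abs(avg) < min_rel_spread):
            reasons.append("little variability")
    return reasons


# Hypothetical samples: a varying response time and an error counter stuck at zero.
response_times = [800 + 100 * (i % 10) for i in range(200)]
error_counts = [0.0] * 200
print(screen_metric(response_times))
print(screen_metric(error_counts))
```

Such a check cannot judge relevance or the presence of daily and weekly patterns, so it only narrows down the candidates. The final selection still requires a look at the metric history in the Metric Monitor.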

FAQ

All raw monitoring data (up to 28 days) is considered. The aggregated monitoring data scheduled via the Aggregation Framework is not used.

Yes, as of Focused Run 5.0 FP01 this option will be offered.

No, at the moment this is not possible.

Yes, as of Focused Run 5.0 SP00, the best of last N approach is also available for the numeric thresholds.    

There might be several reasons for that.

  1. The dynamic threshold calculation has not yet run for a new metric.
  2. There are not enough data points to calculate the dynamic thresholds, and therefore the static thresholds are used as a fallback.
  3. The combined approach has been applied and the static thresholds would be breached first. In this case, the dynamic ones are ignored.
    Remark: starting with Focused Run 5.0 FP01, both threshold types can be displayed.