Service Availability Management

Service Availability Management in SAP Focused Run provides availability reporting for business-critical systems, databases or services. It calculates the availability based on outages detected by system monitoring and compares it to the defined availability Service Level Agreements (SLAs). Unplanned outages are automatically imported from system monitoring. Planned downtimes are automatically imported from work mode management. All outages must be reviewed and adjusted by system administrators and confirmed by IT service managers or other supervisors before they are taken into account in availability reporting. The adjusted downtime data is called service outages. Service Availability Management can serve as the single source of truth for availability SLA reporting.

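The high-level process is as follows: unplanned outages are imported from system monitoring and planned downtimes from work mode management, system administrators review and adjust them, IT service managers or other supervisors confirm them, and confirmed outages are used in SLA calculations and availability reporting.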


Service Availability Management consists of the following pages:

  • The Overview page shows the calculated availability for the selected entities.
  • The Outages page shows the outages for the selected entities and allows you to create or maintain outages.
  • The Service Availability Definitions page shows the Service Availability Definitions for the selected entities and allows you to maintain additional definitions.
  • The Analytics page provides uptime reporting.

Prerequisites for using Service Availability Management:

  • Simple System Integration was performed for each system, database or service that has to be managed by Service Availability Management

Service Availability Management Configuration:

  • Maintain a Service Definition for each system, database or service that has to be managed by Service Availability Management as explained below in section "Service Availability Definitions"
  • Perform optional Customizing as explained below
  • As of SAP Focused Run 3.0 FP02: Adapt the scheduling of job "SAP_FRN_SAM_SPLIT_OUTAGE" as explained in the master guide section 6.1.3.

Usage

How to open Service Availability Management

  1. Open the Launchpad
  2. Select Advanced System Management or Advanced Application Management
  3. Select Service Availability Management

Overview/Service Reporting

The Service Reporting page shows the calculated availability for the selected systems, databases and services in the selected reporting periods. It gives you a quick overview of whether the availability service level agreement was met or breached.


From the Service Reporting you have the following options:

  • Switch between a monthly or yearly display, depending on the defined reporting period.
  • Select different reporting periods.
  • Select whether the displayed availability is calculated based on confirmed outages only or based on all outages.
  • Select a reporting period to open the Availability Charts for that reporting period and compare the availability of multiple systems/services.
  • Select an entity to open the Availability Charts for this entity and compare current and previous reporting periods.

Availability Charts

From the Availability Charts, you have the following options:

  • Compare the availability for multiple systems in one reporting period or the availability for a system in multiple reporting periods.
  • Select one system to drill down to the availability for months or days and to identify the months or days where outages occurred and the availability dropped.
    (Whether you can drill down to days or drill down to months depends on the reporting period defined in the Service Availability Definition.)


Outage Summary

The Outage Summary view provides an overview of open and confirmed outages and of the Service Level Agreement status.


The following data is shown:

Entity and Entity type

Identifier and type of the system, database or service

Reporting Period

The reporting period for which outages were recorded

SLA Breached

The SLA is breached when the calculated availability based on confirmed outages is below the availability SLA for the reporting period.

Open outages

Number of outages in status New, In Process or To Be Reviewed. They need to be processed and either confirmed or hidden.

Confirmed outages

Number of outages that have been processed and are confirmed

Unplanned Confirmed Downtime

Duration of all confirmed unplanned outages in minutes

Unplanned Total Downtime

Duration of all (confirmed and unconfirmed) unplanned outages in minutes

Remaining Downtime Confirmed

Shows how many minutes are left in the current reporting period until the SLA is breached. If the value becomes negative, the SLA is already breached. Only confirmed outages are considered.

Remaining Downtime All 

Shows how many minutes are left in the current reporting period until the SLA is breached. If the value becomes negative, the SLA is already breached. Both confirmed and unconfirmed outages are considered.

Availability Confirmed (%)

Shows the calculated availability in the current reporting period. It considers only confirmed outages for the calculations.

Availability All (%)

Shows the calculated availability in the current reporting period. It considers both confirmed and unconfirmed outages.

Availability Threshold (%)

Shows the SLA threshold defined in the Service Definition.
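The remaining-downtime fields follow from the availability formula given under "How the Availability is Calculated": assuming the SLA permits AST * (1 - threshold / 100) minutes of downtime per reporting period, the remaining downtime is that budget minus the outage time. The following Python sketch shows this arithmetic; the function names are hypothetical and not part of SAP Focused Run.

```python
def availability_pct(agreed_min: float, outage_min: float) -> float:
    """Availability (%) = (1 - OT / AST) * 100."""
    return (1 - outage_min / agreed_min) * 100


def remaining_downtime(agreed_min: float, threshold_pct: float,
                       outage_min: float) -> float:
    """Minutes of downtime left before the SLA is breached (negative = breached)."""
    allowed_min = agreed_min * (1 - threshold_pct / 100)
    return allowed_min - outage_min


def sla_breached(agreed_min: float, threshold_pct: float,
                 outage_min: float) -> bool:
    """True if the calculated availability is below the SLA threshold."""
    return availability_pct(agreed_min, outage_min) < threshold_pct


# Hypothetical month: 5x8 service time on 21 work days, SLA threshold 99.5 %.
ast = 21 * 8 * 60          # 10080 minutes
confirmed_min = 30         # confirmed outages only
all_min = 90               # confirmed and unconfirmed outages
print(round(remaining_downtime(ast, 99.5, confirmed_min), 1))  # Remaining Downtime Confirmed
print(round(remaining_downtime(ast, 99.5, all_min), 1))        # Remaining Downtime All
```

With these numbers, the confirmed view still has budget left while the "All" view is already negative, which mirrors the difference between the two Remaining Downtime fields.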

The Outage Summary provides the following options:

  • Use the filter button to filter on systems with open outages and on entities where the SLA is breached.
  • Select one or multiple entities and press the mass maintenance button. This opens the Outage Overview view showing all outages for the selected entities in the selected reporting period. From here, you can perform mass or single maintenance of outages.
  • Click on one system. This opens the Outage Overview view showing all outages for the selected system in the selected reporting period. From here, you can perform mass or single maintenance of outages.

Outage Overview

The Outage Overview shows the list of outages for the selected entities in the selected reporting periods. It allows you to edit existing outages or create new ones.

The following data is shown:

Data Content

Entity and Entity type

Identifier and type of the system, database or service

Type

  • Planned - The entry is a planned downtime and therefore normally not SLA relevant
  • Unplanned - It is an unplanned outage and therefore normally SLA relevant

Status

  • New - Initial status
  • In Process - The outage is being processed by the system administrator
  • To Be Reviewed - The outage has been processed by the system administrator and is waiting to be reviewed and confirmed or rejected by service managers or other supervisors
  • Confirmed - The outage has been confirmed by service managers or other supervisors and is used in SLA calculations and availability reporting

Category

The category of the outage as maintained in the Outage Details

SLA relevant 

Whether the outage is SLA relevant or not. Only the duration of SLA-relevant outages during agreed service times is considered in availability reporting. Unplanned outages are usually SLA relevant, while planned downtimes are not, but this can be changed in the outage details.

Downtime (mins)

The duration of the outage in minutes. The duration is adjusted so that it includes only the parts of the outage that lie inside the agreed service time and outside contractual maintenance periods. If the complete outage is outside the agreed service times or within a contractual maintenance period, the duration is shown as 0.
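The duration adjustment can be pictured as interval arithmetic: intersect the outage with each agreed service time window, then subtract any overlap with contractual maintenance periods. The Python sketch below assumes that all windows have already been expanded into concrete (start, end) datetime pairs; the function names are hypothetical, and this is an illustration rather than SAP's implementation.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end), end exclusive


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping ones into disjoint intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap(a: Interval, b: Interval) -> timedelta:
    """Length of the intersection of two intervals (zero if disjoint)."""
    return max(min(a[1], b[1]) - max(a[0], b[0]), timedelta(0))


def effective_downtime_minutes(outage: Interval,
                               service_windows: List[Interval],
                               maintenance_windows: List[Interval]) -> int:
    """Outage minutes inside agreed service time and outside contractual maintenance."""
    maintenance = merge(maintenance_windows)
    total = timedelta(0)
    for window in merge(service_windows):
        segment = (max(outage[0], window[0]), min(outage[1], window[1]))
        if segment[0] >= segment[1]:
            continue  # outage does not touch this service window
        # Maintenance intervals are disjoint after merging, so nothing is subtracted twice.
        total += (segment[1] - segment[0]) - sum(
            (overlap(segment, m) for m in maintenance), timedelta(0))
    return int(total.total_seconds() // 60)


# Friday service time 9:00-17:00, contractual maintenance 16:00-21:00.
day = datetime(2024, 3, 1)
service = [(day.replace(hour=9), day.replace(hour=17))]
maint = [(day.replace(hour=16), day.replace(hour=21))]
print(effective_downtime_minutes((day.replace(hour=15), day.replace(hour=18)), service, maint))  # 60
```

An outage from 18:00 to 19:00 on the same day would yield 0, because it lies entirely outside the agreed service time.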

Start and End time

Start and End time of the outage

Source

  • MAI - The outage has been created from an availability alert for the selected entity after the current status of the alert turned from yellow or red to green.
  • Work Mode - The outage has been created from a planned downtime in IT Calendar and Work Mode Management for the selected entity and transferred to SAM after the planned downtime was completed.
  • Manual - The outage has been created manually in Service Availability Management

Hidden

Whether the outage has been "hidden" or not. Hidden outages are outages that the system administrator wants to exclude from reporting, for example because they are based on false alerts. Hidden outages are only shown if the filter "Show Hidden Outages" is set to "Yes".

Outages are displayed in the outage list if one of the following conditions applies:

  • Planned or unplanned outages were created manually for the selected managed objects in the selected time frame.
  • Unplanned outages were created automatically from an availability alert for the selected managed objects in the selected time frame after the current status of the availability alert turned from yellow or red to green.
  • Planned outages were created automatically from a planned downtime work mode for the selected managed objects in the selected time frame after the work mode was completed.
  • Since SAP Focused Run 3.0 FP02:
    If availability alerts remain open over the month's end, unplanned outages are created automatically on the first day of the next month covering the previous month. The end time stamp of these outages is set to the last day of the month at midnight in the time zone of the service definition. The comments section of the outage contains the text "Outage Created from Batch job at month end". Another outage for this availability alert is created later, when the status changes to green or when another month's end is reached. This ensures that the monthly availability is calculated correctly even for long-lasting outages.
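The month-end split can be sketched as cutting an open outage at each month boundary in the service definition's time zone. The Python example below is an illustration only: it assumes that "midnight" means 00:00 on the first day of the following month, and `split_at_month_end` is a hypothetical name. A fixed-offset `timezone` keeps the example self-contained; a `zoneinfo.ZoneInfo` object could be passed instead.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import List, Tuple


def next_month_start(local: datetime) -> datetime:
    """00:00 on the first day of the month after `local`, same tzinfo."""
    if local.month == 12:
        return local.replace(year=local.year + 1, month=1, day=1,
                             hour=0, minute=0, second=0, microsecond=0)
    return local.replace(month=local.month + 1, day=1,
                         hour=0, minute=0, second=0, microsecond=0)


def split_at_month_end(start: datetime, end: datetime,
                       tz: tzinfo) -> List[Tuple[datetime, datetime]]:
    """Cut an outage [start, end) at every month boundary in time zone `tz`."""
    cursor, end_local = start.astimezone(tz), end.astimezone(tz)
    pieces = []
    while True:
        boundary = next_month_start(cursor)
        if boundary >= end_local:
            pieces.append((cursor, end_local))
            return pieces
        pieces.append((cursor, boundary))  # one outage per completed month
        cursor = boundary


# Alert open from 30 Jan to 2 Mar (UTC); service definition time zone UTC+1.
cet = timezone(timedelta(hours=1))
for piece in split_at_month_end(datetime(2024, 1, 30, 22, tzinfo=timezone.utc),
                                datetime(2024, 3, 2, tzinfo=timezone.utc), cet):
    print(piece[0].isoformat(), "->", piece[1].isoformat())
```

Each resulting piece corresponds to one outage, so each month's availability only sees the part of the outage that fell into that month.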

Mass Maintenance of Outages

You can select one or several outages and perform the following mass maintenance actions for them together:

  • Hide Outage: Set the selected outages to hidden. This hides them from the outage list and excludes them from availability reporting. Hidden outages are shown if "Show Hidden Outages" is set to "Yes". Hidden outages are outages that the system administrator wants to exclude from reporting, for example because they are based on false alerts.
  • Unhide Outage: Remove the hidden flag from the selected hidden outages. Afterwards, they are shown again in the outage list and can be processed. Hidden outages can only be selected if "Show Hidden Outages" is set to "Yes".
  • Approve Outage: Set all selected outages to status Confirmed. Confirmed outages are used in SLA calculations and availability reporting.
  • Reject Outage: Set all selected outages to status In Process so that they must be maintained again by the system administrator.
  • Review: Set all selected outages to status To Be Reviewed.
  • Modify Date Time: Maintain a common start and end time for all selected outages.
  • Set Reason: Maintain a common reason for all selected outages.
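The effect of these mass maintenance actions on outage status can be modeled as a small lookup table. In this Python sketch, the action keys, the `Outage` class and the `counts_in_reporting` rule (confirmed, SLA relevant and not hidden) are assumptions derived from the descriptions in this section, not SAP APIs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Status(Enum):
    NEW = "New"
    IN_PROCESS = "In Process"
    TO_BE_REVIEWED = "To Be Reviewed"
    CONFIRMED = "Confirmed"


@dataclass
class Outage:
    outage_id: str
    status: Status = Status.NEW
    sla_relevant: bool = True
    hidden: bool = False
    reason: str = ""


# Status set by each status-changing mass action (hypothetical keys).
TARGET_STATUS = {
    "approve": Status.CONFIRMED,
    "reject": Status.IN_PROCESS,
    "review": Status.TO_BE_REVIEWED,
}
OTHER_ACTIONS = {"hide", "unhide", "set_reason"}


def mass_maintain(outages: Iterable[Outage], action: str,
                  reason: Optional[str] = None) -> None:
    """Apply one mass maintenance action to every selected outage."""
    if action not in TARGET_STATUS and action not in OTHER_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    for outage in outages:
        if action in TARGET_STATUS:
            outage.status = TARGET_STATUS[action]
        elif action == "hide":
            outage.hidden = True
        elif action == "unhide":
            outage.hidden = False
        else:  # set_reason
            outage.reason = reason or ""


def counts_in_reporting(outage: Outage) -> bool:
    """Confirmed, SLA relevant, visible outages feed the SLA calculation."""
    return (outage.status is Status.CONFIRMED and outage.sla_relevant
            and not outage.hidden)


batch = [Outage("A"), Outage("B")]
mass_maintain(batch, "review")
mass_maintain(batch, "approve")
mass_maintain(batch[:1], "hide")
print([counts_in_reporting(o) for o in batch])  # [False, True]
```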

Create Outage

Normally, unplanned outages are detected automatically by system monitoring and transferred to Service Availability Management. Planned downtimes are imported from work mode management or system monitoring. Manual creation of outages is therefore only necessary in special cases.

Proceed as follows to create a new outage:

  1. Select the Create Outage button
  2. Maintain the following data:

Data Content

Entity

  • The system, database or service for which the outage shall be created

Type

  • Planned - The entry is a planned downtime and therefore normally not SLA relevant
  • Unplanned - It is an unplanned outage and therefore normally SLA relevant

Category

The category of the outage

SLA relevant 

Whether the outage is SLA relevant or not. Only the duration of SLA-relevant outages is considered in availability reporting. Unplanned outages are SLA relevant by default, while planned downtimes are not, but this can be changed in the outage details.

Start and End time

Start and End time of the outage

Reason

Textual description of the downtime reason

Business Impact

Textual description of the business impact

Other Comments

Any other comments on the outage

Click "Save" to create the new outage. Click "Email" to send a notification email about the new outage.

Please note: New outages are in status "New" by default. They need to be reviewed and set to "Confirmed" before they are taken into account in availability calculations.

Edit Outage

  1. Select one Outage. This opens the Outage Details screen. 
  2. Maintain the following data:

Data Content

Type

  • Planned - The entry is a planned downtime and therefore normally not SLA relevant
  • Unplanned - It is an unplanned outage and therefore normally SLA relevant

Category

The category of the outage

SLA relevant 

Whether the outage is SLA relevant or not. Only the duration of SLA-relevant outages is considered in availability reporting. By default, unplanned outages are usually SLA relevant, while planned downtimes are not. You can change this setting in the outage details.

Start and End time

Start and End time of the outage

Reason

Textual description of the downtime reason

Business Impact

Textual description of the business impact

Other Comments

Any other comments on the outage. As of SAP Focused Run 3.0 FP02, this field might contain automatically generated information about the outage. You can keep the information or overwrite it.

Status

  • New - Initial status
  • In Process - The outage is being processed by the system administrator
  • To Be Reviewed - The outage has been processed by the system administrator and is waiting to be reviewed and confirmed or rejected by service managers or other supervisors
  • Confirmed - The outage has been confirmed by service managers or other supervisors and is used in SLA calculations and availability reporting

If the outage is already confirmed, you can only edit the reason, business impact and comments, or revert the status.

Since SAP Focused Run 2.0 FP02, an Alert Details tab is available for unplanned outages reported by MAI. On this tab, you can view details of the originating alert such as Alert Name, Alert Status and Alert Start Time.

Since SAP Focused Run 3.0 FP02, the comments section is in some cases populated with explanatory texts, for example about contributing alerts, about the outage having been split at month's end, or, for planned outages, about the downtime having been scheduled at short notice and therefore being SLA relevant. You can change these texts manually when processing the outages.

Service Availability Definitions

Each system, database or service that has to be managed by Service Availability Management needs to have an active service definition.

Select the Service Availability Definitions page to see the service definitions for the selected systems.

The Service Availability Definitions Overview shows the following data for each service definition:

Data Content

Status

  • Completed - Service definitions whose end date has already passed (today > end date)
  • Active - Service definitions that are currently valid (start date < today < end date). For active service definitions, you can only change the end date.
  • Inactive - Service definitions that start in the future (today < start date). Only inactive service definitions can be deleted.
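The status rule can be expressed as a simple date comparison. This Python sketch treats the start and end dates themselves as part of the Active period, which is an assumption because the boundary days are not specified above; `definition_status` and `can_delete` are hypothetical names.

```python
from datetime import date


def definition_status(start: date, end: date, today: date) -> str:
    """Classify a service availability definition by its validity dates."""
    if today > end:
        return "Completed"
    if today < start:
        return "Inactive"
    return "Active"  # assumption: start and end dates count as active


def can_delete(start: date, end: date, today: date) -> bool:
    """Only inactive (future) definitions may be deleted."""
    return definition_status(start, end, today) == "Inactive"


today = date(2025, 6, 1)
print(definition_status(date(2024, 1, 1), date(2099, 12, 31), today))  # Active
print(can_delete(date(2025, 7, 1), date(2099, 12, 31), today))         # True
```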

Title

Title of the service definition

Entity and type

Identifier and type of the selected system, database or service

Start and end date

First and last validity date of the service definition

Configuration

Service Availability Definitions

Edit Existing Service Availability Definition

Select a service availability definition to see its details. For existing service availability definitions, you can change the end date and add new Contractual Maintenance Periods. The other settings cannot be changed.

 

Add New Service Availability Definition

Select the "Add new service availability definition" button to create a new service availability definition.

To create a new service availability definition, maintain the following data:

General Data:

Data Content

Title

Title of the service definition

Start and end date

First and last validity date of the service definition

Time Zone

The time zone in which availability patterns and contractual maintenance periods are defined

In the Entities tab:

Entity/Entity type: Select the system, database or service for which the new service definition is valid.

In the Availability tab:

Data Content

SLA Threshold (%)

The minimum allowed availability in %.

For example: 99.5%, 95%

Reporting Period

The period for which the availability data is calculated. Possible values are monthly or yearly.

Pattern

Define the daily or weekly pattern for the agreed service time during which the entity must be available per SLA.

Examples:

  • The agreed service time is 7x24 (7 days, 24 hours): Enter a daily pattern with start time 00:00 and a duration of 24 hours 00 minutes.
  • The agreed service time is 5x8 (from 8 am to 4 pm on work days): Select a weekly pattern. Select Monday to Friday with start time 08:00 and a duration of 08 hours 00 minutes.
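To see how a pattern translates into agreed service time per reporting period, the following Python sketch counts the pattern days in a month and multiplies them by the daily duration. It assumes one start time and duration for all selected weekdays and ignores public holidays; `agreed_service_minutes` is a hypothetical helper.

```python
import calendar
from datetime import date
from typing import Set


def agreed_service_minutes(year: int, month: int, weekdays: Set[int],
                           duration_min: int) -> int:
    """Agreed service time in a month for a pattern (Monday = 0 ... Sunday = 6)."""
    days_in_month = calendar.monthrange(year, month)[1]
    pattern_days = sum(1 for day in range(1, days_in_month + 1)
                       if date(year, month, day).weekday() in weekdays)
    return pattern_days * duration_min


# 7x24: daily pattern, 24 h per day. 5x8: Monday to Friday, 8 h per day.
print(agreed_service_minutes(2024, 3, set(range(7)), 24 * 60))   # 44640
print(agreed_service_minutes(2024, 3, {0, 1, 2, 3, 4}, 8 * 60))  # 10080
```

March 2024 has 21 work days, so the 5x8 pattern yields the same 10080 minutes as the worked example at the end of this section.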

In the Contractual Maintenance tab, you can define recurring periods and specific dates during which maintenance is allowed without affecting the SLA. If contractual maintenance periods overlap with the agreed service time, the agreed service time on that day is shortened accordingly.

Contractual maintenance periods are shown in IT Calendar as a separate event type.

Please note: Contractual Maintenance Periods do not suppress alerts while they are active. If you want to perform maintenance activities with a system restart during a contractual maintenance period, you need to schedule a planned downtime in work mode management to suppress alerts.

Best Practices

  • Define service definitions with an end date far in the future (for example, 31-Dec-2099) so that service definitions do not expire unnoticed and availability reporting continues.
  • SLA thresholds and agreed service times cannot be changed in active service definitions. Proceed as follows to change the SLA threshold or the agreed service time for a system with an active service definition:
    1. Change the end date of the active service definition to the end of the current reporting period
    2. Create a new service definition with a start date in the next reporting period

Handling of HANA Replication Scenarios

HANA Replication Scenarios are defined as follows in SAP Focused Run:

  • One virtual database with relations to the related physical databases
  • 2-n physical databases that act either as primary or secondary database in the HANA replication scenario

Before SAP Focused Run 2.0 FP01, HANA replication scenarios were not taken into account:

  • A service availability definition could be created for each physical and virtual database individually.
  • Outages had to be maintained per database.
  • Availability was calculated per database without taking the complete cluster into account.

This has been improved as of SAP Focused Run 2.0 FP01:

  • A service availability definition can be created only on the virtual DB representing the complete replication cluster.
  • Unplanned outages are created automatically for the complete replication cluster and only for the times when no primary database is available.
  • Unplanned outages are maintained for the complete replication cluster.
  • Planned downtimes are imported into SAM only for planned downtimes on the virtual DB.
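The rule that cluster outages cover only times without an available primary database amounts to finding the gaps in the union of the primary databases' availability intervals. The Python sketch below assumes these intervals are already known; `cluster_outages` is a hypothetical name, not part of SAP Focused Run.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def cluster_outages(window: Interval, primary_up: List[Interval]) -> List[Interval]:
    """Gaps in `window` not covered by any interval in which a primary DB was up."""
    window_start, window_end = window
    gaps: List[Interval] = []
    cursor = window_start
    for start, end in sorted(primary_up):
        if end <= cursor:
            continue  # already covered up to the cursor
        if start > cursor:
            gaps.append((cursor, min(start, window_end)))
        cursor = max(cursor, end)
        if cursor >= window_end:
            break
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


# Primary on site A up until 10:00, failover to site B completes at 10:20.
d = datetime(2024, 3, 1)
window = (d, datetime(2024, 3, 2))
up = [(d, d.replace(hour=10)), (d.replace(hour=10, minute=20), datetime(2024, 3, 2))]
print(cluster_outages(window, up))  # one gap: 10:00-10:20
```

If the secondary takes over before the primary goes down (overlapping intervals), no cluster outage results, even though one physical database was unavailable.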

Service Availability Definitions on HANA DBs which have been created before the upgrade to SAP Focused Run 2.0 FP01 will not be affected.

 

Automating the creation of service definitions via Web Services

As of SAP Focused Run 3.0 FP01, you can optionally automate the creation of service definitions using web services instead of defining them manually in the Service Definition UI.

Optional Configuration: Prevent certain availability alerts from generating outages in Service Availability Management

By default, all system monitoring alerts of category Availability and context types Technical System and Technical Instance create an SLA relevant unplanned outage in Service Availability Management as soon as the alert is confirmed or its status turns green.

You might want to exclude some availability alerts from creating outages in Service Availability Management.

As of SAP Focused Run 2.0 FP02, you can achieve this as follows:

  1. In System Monitoring template maintenance, identify the technical names of the availability alerts that should not lead to an outage.
  2. Call transaction SM30 in the Focused Run system.
  3. Enter the table SAM_BL_ALRT_CONF.
  4. Select Maintain.
  5. Enter the technical names of the alerts, one alert name per row.
  6. Save. Depending on your system settings, you might be prompted for a customizing transport request that allows you to transport the table contents to other Focused Run systems.

Based on experience, we recommend excluding the following alerts from generating outages in Service Availability Management (technical names are given):

  • ABAP_INSTANCE_AVAILABILITY 
  • JAVA_INSTANCE_AVAILABILITY
  • SLT_SCHEMA_ALERT
  • HDB_ANOMALY_ALERT
  • HDB_RESTARTED_SERVICES_ALRT_4
  • HDB_REPLICATION_TIME_LAG_1018
  • HDB_REPLICATION_QUEUE_SIZE_1017
  • HDB_AVL_1015_REPLICATION_STATUS_ALERT
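Conceptually, the entries in SAM_BL_ALRT_CONF act as an exclusion set that is checked before an outage is created. The following Python sketch models that lookup with the recommended alert names; `EXCLUDED_ALERTS` and `alerts_creating_outages` are hypothetical names, and using a set also absorbs duplicate rows.

```python
from typing import Iterable, List

# Hypothetical in-memory copy of table SAM_BL_ALRT_CONF (one alert name per row).
EXCLUDED_ALERTS = {
    "ABAP_INSTANCE_AVAILABILITY",
    "JAVA_INSTANCE_AVAILABILITY",
    "SLT_SCHEMA_ALERT",
    "HDB_ANOMALY_ALERT",
    "HDB_RESTARTED_SERVICES_ALRT_4",
    "HDB_REPLICATION_TIME_LAG_1018",
    "HDB_REPLICATION_QUEUE_SIZE_1017",
    "HDB_AVL_1015_REPLICATION_STATUS_ALERT",
}


def alerts_creating_outages(alert_names: Iterable[str]) -> List[str]:
    """Keep only availability alerts that are not on the exclusion list."""
    return [name for name in alert_names if name.strip() not in EXCLUDED_ALERTS]


# MY_SYSTEM_AVAILABILITY is a made-up alert name used only for illustration.
print(alerts_creating_outages(["HDB_ANOMALY_ALERT", "MY_SYSTEM_AVAILABILITY"]))  # ['MY_SYSTEM_AVAILABILITY']
```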

Optional Configuration: Handling of Instance Outages

By default, ABAP and JAVA instance availability alerts create SLA relevant unplanned outages in Service Availability Management and reduce the calculated availability of the affected systems. As instance availability alerts do not necessarily mean that the complete system was unavailable, this behavior is not always desired.

You have the following options:

  • You can exclude ABAP and JAVA instance availability alerts from generating outages in Service Availability Management as explained above.
  • As of SAP Focused Run 3.0 FP02, you can change the behavior so that ABAP and JAVA instance availability alerts are reported in Service Availability Management by default as non-SLA relevant outages instead of SLA relevant outages. This means they still appear in the outage list, but they do not reduce the availability of the affected system. This behavior is inactive by default. You can activate it by maintaining table WMM_CUSTOM in transaction SE16: add an entry with field "SAM_INSTANCE_OUTAGE_NO_SLA" and value "X".
    While the outages are in process, the SLA relevant flag can still be set manually if required.
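The effect of the SAM_INSTANCE_OUTAGE_NO_SLA entry on the initial SLA relevant flag can be sketched as follows. The Python dictionary stands in for table WMM_CUSTOM, and treating ABAP_INSTANCE_AVAILABILITY and JAVA_INSTANCE_AVAILABILITY as the instance availability alerts is an assumption; `default_sla_relevant` is a hypothetical name.

```python
from typing import Dict

# Assumed technical names of the ABAP and JAVA instance availability alerts.
INSTANCE_ALERTS = {"ABAP_INSTANCE_AVAILABILITY", "JAVA_INSTANCE_AVAILABILITY"}


def default_sla_relevant(alert_name: str, wmm_custom: Dict[str, str]) -> bool:
    """Initial SLA relevant flag for an unplanned outage created from an alert."""
    if (alert_name in INSTANCE_ALERTS
            and wmm_custom.get("SAM_INSTANCE_OUTAGE_NO_SLA") == "X"):
        return False  # still listed as an outage, but not SLA relevant
    return True


wmm_custom = {"SAM_INSTANCE_OUTAGE_NO_SLA": "X"}  # stands in for table WMM_CUSTOM
print(default_sla_relevant("ABAP_INSTANCE_AVAILABILITY", wmm_custom))  # False
print(default_sla_relevant("ABAP_INSTANCE_AVAILABILITY", {}))          # True
```

The flag is only a default: an administrator can still set the outage to SLA relevant while it is in process.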

 

Optional Configuration: SLA relevant "emergency" planned downtimes and planned downtime extensions

By default, planned downtimes create non-SLA relevant planned outages in Service Availability Management and do not reduce the calculated availability of the affected systems. In some cases, this is not desired:

  • If a planned downtime is scheduled less than x minutes in advance, for example to restart the system after an unplanned parameter change, it should be treated as an SLA relevant "emergency" planned downtime and reduce the availability of the system.
  • If a planned downtime is in progress and the maintenance activity takes longer than originally planned, it might be necessary to extend the end time of the planned downtime. The originally planned part should be treated as non-SLA relevant and not reduce the calculated availability. The extended part of the planned downtime should be treated as SLA relevant and reduce the availability.

As of SAP Focused Run 3.0 FP02, you can activate this behavior by maintaining table WMM_CUSTOM in transaction SE16. Add an entry with field "WMM_EMERGENCY_PD_MIN" and value <xx>, where <xx> is a value in minutes.

For example, if table WMM_CUSTOM contains an entry with field "WMM_EMERGENCY_PD_MIN" and value 60, the following happens:

  • You schedule a planned downtime in IT Calendar starting 59 minutes from now. When the planned downtime is completed, it appears in Service Availability Management as an SLA relevant planned outage.
  • You schedule a planned downtime in IT Calendar starting 61 minutes from now. When the planned downtime is completed, it appears in Service Availability Management as a non-SLA relevant planned outage.
  • You schedule a planned downtime for tomorrow from 3-4 pm to perform a maintenance activity. Tomorrow at 3:55 pm, you realize that the maintenance activity takes longer than anticipated. You extend the planned downtime in IT Calendar from 4 pm to 5 pm. When the planned downtime is completed, it appears in Service Availability Management as two planned outages. The first planned outage lasts from 3-4 pm and is non-SLA relevant. The second planned outage lasts from 4-5 pm and is SLA relevant. A default text in the comments section explains that this planned downtime was extended while it was in progress.
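The classification in these examples can be sketched in Python. The function and its parameters are hypothetical: `scheduled_at` is when the downtime was scheduled, `extended_at` is when its end time was extended, and `emergency_min` mirrors WMM_EMERGENCY_PD_MIN (None meaning no entry is maintained).

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Piece = Tuple[datetime, datetime, bool]  # (start, end, sla_relevant)


def planned_outages(scheduled_at: datetime, start: datetime, end: datetime,
                    emergency_min: Optional[int] = None,
                    extended_at: Optional[datetime] = None,
                    extended_end: Optional[datetime] = None) -> List[Piece]:
    """Split a completed planned downtime into SLA / non-SLA relevant outages."""
    final_end = extended_end if extended_end is not None else end
    if emergency_min is None:  # no WMM_EMERGENCY_PD_MIN entry: default behavior
        return [(start, final_end, False)]
    if start - scheduled_at < timedelta(minutes=emergency_min):
        return [(start, final_end, True)]  # scheduled at short notice
    extended_in_progress = (extended_at is not None and extended_end is not None
                            and start <= extended_at < end and extended_end > end)
    if extended_in_progress:
        # Originally planned part stays non-SLA relevant; the extension counts.
        return [(start, end, False), (end, extended_end, True)]
    return [(start, final_end, False)]


now = datetime(2024, 3, 4, 10, 0)
print(planned_outages(now, now + timedelta(minutes=59), now + timedelta(hours=2), 60)[0][2])  # True
print(planned_outages(now, now + timedelta(minutes=61), now + timedelta(hours=2), 60)[0][2])  # False
```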

 

Optional Configuration: Outage Customization

On the Outage Customization page of Service Availability Management, you can define the following:

  • Unplanned outages reported by MAI that are shorter than X minutes are automatically hidden, so that no one needs to process them and they do not appear in the outage list.
  • Planned outages reported by work mode management are automatically set to non-SLA relevant and confirmed.
    This setting must not be used in combination with the WMM_CUSTOM parameter WMM_EMERGENCY_PD_MIN.
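A Python sketch of these two customizing options follows. The `Outage` fields, the parameter names and the explicit error for the forbidden combination with WMM_EMERGENCY_PD_MIN are illustrative assumptions, not SAP behavior.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Outage:
    source: str         # "MAI", "Work Mode" or "Manual"
    outage_type: str    # "Planned" or "Unplanned"
    minutes: int
    status: str = "New"
    sla_relevant: bool = True
    hidden: bool = False


def apply_outage_customizing(outage: Outage,
                             hide_shorter_than: Optional[int] = None,
                             auto_confirm_planned: bool = False,
                             wmm_custom: Optional[Dict[str, str]] = None) -> Outage:
    """Apply the two Outage Customization options to a newly created outage."""
    if auto_confirm_planned and "WMM_EMERGENCY_PD_MIN" in (wmm_custom or {}):
        raise ValueError("auto-confirm of planned outages must not be combined "
                         "with WMM_EMERGENCY_PD_MIN")
    if (hide_shorter_than is not None and outage.source == "MAI"
            and outage.outage_type == "Unplanned"
            and outage.minutes < hide_shorter_than):
        outage.hidden = True  # short MAI outage: nobody needs to process it
    if (auto_confirm_planned and outage.source == "Work Mode"
            and outage.outage_type == "Planned"):
        outage.sla_relevant = False
        outage.status = "Confirmed"
    return outage


blip = apply_outage_customizing(Outage("MAI", "Unplanned", 3), hide_shorter_than=5)
print(blip.hidden)  # True
```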

 

Service Availability Management Reporting

Advanced Analytics & Intelligence provides additional reporting capabilities for Service Availability Management.

 

Additional Information

How the Availability is Calculated

The availability is calculated as follows: Availability (%) = (1 - OT / AST) * 100 %

  • AST (Agreed Service Time) is the duration of the agreed service time per reporting period
  • OT (Outage Time) is the duration of all system outages that occurred during the agreed service time

  • aAST (adjusted Agreed Service Time) is the duration of the agreed service time per reporting period, reduced by the overlapping parts of Contractual Maintenance Periods and of non-SLA relevant planned downtimes
  • OT (Outage Time) is the duration of all system outages that occurred during the adjusted agreed service time

Example

  • The agreed service time for a system is on work days from 9 am to 5 pm.
  • The reporting period is monthly. The current month has 21 work days.
  • A Contractual Maintenance Period is scheduled every 1st Friday of the month from 4 pm to 9 pm.
  • The customer has requested an additional planned downtime on the 2nd Friday from 2 pm to 6 pm for a release upgrade. The planned downtime is outside the contractual maintenance period. Despite this, it is not SLA relevant because it was requested by the customer.
  • A system outage occurred on the 2nd Tuesday from 1 pm – 6 pm.

Service Availability Management will calculate the monthly availability as follows:

Data Content

The duration of the agreed service time is AST = 21 d * 8 h * 60 min = 10080 min.

The system outage lasted for 5 hours (300 minutes), but only 4 hours (240 minutes) fall within the agreed service time: OT = 240 min.

The system availability is calculated as follows:

Availability (%) = (1 - 240 min / 10080 min) * 100 % = 97.62 %
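The worked example can be reproduced in a few lines of Python. This sketch models the agreed service time as 9 am to 5 pm on 21 work days in whole hours; the planned downtime and the contractual maintenance period do not overlap the outage, so they do not change OT. The helper name `overlap_hours` is hypothetical.

```python
def overlap_hours(start_h: int, end_h: int, window_start_h: int, window_end_h: int) -> int:
    """Hours of [start_h, end_h) that fall inside [window_start_h, window_end_h)."""
    return max(0, min(end_h, window_end_h) - max(start_h, window_start_h))


work_days = 21
ast_min = work_days * (17 - 9) * 60              # AST = 10080 min
ot_min = overlap_hours(13, 18, 9, 17) * 60       # outage 1 pm-6 pm -> OT = 240 min
availability = (1 - ot_min / ast_min) * 100
print(f"{availability:.2f} %")  # 97.62 %
```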