Events generated by monitors indicate device or resource anomalies. Events can also be produced by diagnostic and third-party tools. Alerting helps you interpret and act on events by providing a single system that aggregates and responds to all events.
Much of the work of interpreting and responding to alerts is automated. Alerting correlates alerts that share a related cause, automatically suppresses redundant alerts, notifies operators, and creates incident tickets for alerts that need attention.
The following figure shows the alert handling workflow:
Alert terminology
The following terms are used in alert management:
Term | Description |
---|---|
ID | Sequential number that uniquely identifies an alert or inference. In the alert list, the ID field also indicates the alert state using color-coding. |
Subject | Alert description summary, which includes metrics associated with the alert. |
Description | Brief description of the alert source and cause. This can include threshold-crossing metrics, the monitor description, device type, template name, group, site, service level, and component. |
Source | Platform or monitoring tool that generated the alert. |
Metric | Service name of a threshold-crossing alert. |
First Alert Time | Time when monitoring started for a resource. An alert is generated to provide notification that monitoring started for the resource. |
Alert Updated Time | Most recent alert time. Updated when an alert is unsuppressed manually or with the alert First Response policy. |
Elapsed Time | Elapsed time since the first alert was generated. |
Action/Status | Current alert status and most recent alert action. |
Last Updated Time | Time when alert status was last updated. |
Device Type | Device type associated with an alert. |
Resource | Resource name associated with the alert. |
Repeated Alerts | Number of duplicate alerts generated by the resource. |
Incident ID | Unique incident ID associated with the alert. Alerts are associated with incidents by: |
Entity Type | Category of the source that generated the alert: |
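To make the terminology concrete, the following Python sketch models an alert record as a dataclass whose fields mirror the table above. The class, the field names, and the `AlertState` enum are hypothetical illustrations, not the product's data model or API. The `elapsed_time` method computes Elapsed Time as the difference between a given time and First Alert Time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class AlertState(Enum):
    """Alert state (severity), which is distinct from lifecycle status."""
    CRITICAL = "Critical"
    WARNING = "Warning"
    OK = "OK"


@dataclass
class Alert:
    # Hypothetical field names that mirror the terminology table.
    id: int
    subject: str
    source: str            # platform or monitoring tool that generated the alert
    resource: str
    device_type: str
    state: AlertState
    first_alert_time: datetime
    alert_updated_time: datetime
    metric: Optional[str] = None       # set only for threshold-crossing alerts
    incident_id: Optional[str] = None  # set once the alert is tied to an incident
    repeated_alerts: int = 0           # count of duplicate alerts from the resource

    def elapsed_time(self, now: datetime) -> timedelta:
        """Elapsed Time: time since the first alert was generated."""
        return now - self.first_alert_time
```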
Alert lifecycle
The alert lifecycle describes alert status transitions, from Open status to Closed status, as a result of actions applied to the alert.
Alert action
The following actions can be applied to an alert:
Action | Description |
---|---|
Acknowledge | Acknowledge a received alert. After you acknowledge the alert, a comment reading Acknowledged is displayed along with your user name. To acknowledge an alert, click Acknowledge from the Actions dropdown on the slide-out. |
Create Incident | Create a ticket for the alert, assign users, and set the priority. After the incident is created, the alert status changes to Ticketed and the incident ID and details are displayed. |
Attach And Update Incident | Map an alert to an existing ticket and update the ticket with the alert contents. This action is typically used to update a single ticket with related alerts. |
Attach Incident | Map an alert to an existing ticket without updating the ticket with the alert contents. |
Suppress | Suppress the current alert and all of its duplicates. A new alert of the same type is displayed as a fresh alert, not as a duplicate. The alert status changes to Suppressed. The Suppress for setting suppresses alerts for a specified time interval. If a repeated alert occurs while the alert is snoozed, the alert's repeat count is incremented and the snooze duration is reset based on the repeated alert's attributes. |
Unacknowledge | Undo an Acknowledge action. For example, if a solution did not address a specific problem, unacknowledge the alert. The alert status changes to Open, or to Ticketed if an incident ID is associated with the alert. |
Unsuppress | Undo a Suppress action. The alert status changes to Open, or to Ticketed if an incident ID is associated with the alert. |
Close | Close an alert after the issue is solved and the alert is in the OK state. The alert status changes to Closed. |
Heal | Create an OK alert with the same properties as the selected alert, so that the alert appears healed. Use Heal to resolve an alert manually. Heal applies to Critical and Warning alerts regardless of the action previously applied to them. Heal can be applied to only one alert at a time; it cannot be applied to multiple alerts simultaneously. |
For correlated alerts, an action can be performed on the entire inference, but not on a single alert.
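The status effects of the actions above can be summarized as a small state-transition function. This is a minimal Python sketch under stated assumptions: the `Status` and `State` enums and `apply_action` are hypothetical names, actions are passed as plain strings, and only actions whose status effect is described in this section are modeled. Attach Incident, Attach And Update Incident, and Heal are omitted.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    OPEN = "Open"
    CORRELATED = "Correlated"
    TICKETED = "Ticketed"
    ACKNOWLEDGED = "Acknowledged"
    SUPPRESSED = "Suppressed"
    CLOSED = "Closed"


class State(Enum):
    CRITICAL = "Critical"
    WARNING = "Warning"
    OK = "OK"


def apply_action(action: str, status: Status, state: State,
                 incident_id: Optional[str] = None) -> Status:
    """Return the status after a manual action (hypothetical model)."""
    if status is Status.CORRELATED:
        # Actions apply to the whole inference, not to one correlated alert.
        raise ValueError("apply the action to the inference alert instead")
    if status is Status.CLOSED:
        raise ValueError("Closed is a final status")
    if action == "Acknowledge":
        return Status.ACKNOWLEDGED
    if action == "Suppress":
        return Status.SUPPRESSED
    if action == "Create Incident":
        return Status.TICKETED
    if action in ("Unacknowledge", "Unsuppress"):
        # Revert to Open, or to Ticketed when an incident is associated.
        return Status.TICKETED if incident_id else Status.OPEN
    if action == "Close":
        if state is not State.OK:
            raise ValueError("only alerts in the OK state can be closed")
        return Status.CLOSED
    raise ValueError(f"unsupported action: {action}")
```

Raising an error for a correlated alert reflects the rule that actions apply to the entire inference; a real system might instead redirect the action to the inference alert.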
Alert status
Alert status describes a logical condition of an alert within the alert lifecycle. Alert status should not be confused with alert state, which can be critical, warning, or OK.
Both automatic and manual alert actions can cause an alert status change, as shown in the following figure:
Status | Description |
---|---|
Open | The initial alert status is Open. |
Correlated | Alert correlation processing changes the alert status to Correlated. Alerts correlated to an inference keep the Correlated status and logically inherit the status of the inference alert; they do not change status independently. For example, when a Suppress or Acknowledge action is applied to an inference alert, each correlated alert logically inherits the resulting status, but its own status remains Correlated. Because Correlated is a final status for alerts that are part of an inference, you do not need to suppress a correlated alert. |
Ticketed | The Create Incident action transitions open alerts to the Ticketed status. A Ticketed alert keeps the Ticketed status even if an Unacknowledge or Unsuppress action is applied. |
Acknowledged | Acknowledged alerts are set to an Acknowledged status. |
Suppressed | Suppressed alerts are set to a Suppressed status. |
Closed | The Closed status is a final alert status. Alerts can be closed manually only when the alert is in the OK state. |
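The inheritance rule for correlated alerts can be sketched as follows. In this hypothetical Python model (the class names and the `correlate` helper are illustrative, not a product API), a correlated alert stores Correlated as its own status permanently and reports its logical status by reading the status of its inference alert.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InferenceAlert:
    # Hypothetical model: an inference groups correlated alerts.
    id: int
    status: str = "Open"
    correlated_ids: List[int] = field(default_factory=list)


@dataclass
class CorrelatedAlert:
    id: int
    inference: InferenceAlert
    status: str = "Correlated"  # final status; never changes independently

    def effective_status(self) -> str:
        """Logical status inherited from the inference alert."""
        return self.inference.status


def correlate(alert_id: int, inference: InferenceAlert) -> CorrelatedAlert:
    """Attach an alert to an inference and return its correlated view."""
    inference.correlated_ids.append(alert_id)
    return CorrelatedAlert(id=alert_id, inference=inference)
```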
You can monitor alert status in the comments section of the Alert Details page.
When the problem is no longer reported as an alert, the alert is placed in the OK state. If the same alert recurs after the alert has reached the OK state, a new alert is created. If it recurs before the alert reaches the OK state, the repeat count of the existing alert is incremented instead.
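The repeat-versus-new-alert rule can be sketched as a small deduplicating store. This Python example is hypothetical: it assumes that "the same alert" means the same resource and metric, which this section does not define, and `AlertStore`, `raise_alert`, and `resolve` are illustrative names. An alert that recurs before reaching the OK state increments its repeat count; one that recurs after reaching OK produces a new alert with a new ID.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # assumed identity of "the same alert": (resource, metric)


@dataclass
class Alert:
    id: int
    key: Key
    state: str  # "Critical", "Warning", or "OK"
    repeat_count: int = 0


class AlertStore:
    """Hypothetical sketch of repeat counting versus new-alert creation."""

    def __init__(self) -> None:
        self._next_id = 1
        self.alerts: List[Alert] = []
        self._latest: Dict[Key, Alert] = {}

    def raise_alert(self, resource: str, metric: str, severity: str) -> Alert:
        key = (resource, metric)
        current = self._latest.get(key)
        if current is not None and current.state != "OK":
            # The problem has not cleared yet: count this as a repeat.
            current.repeat_count += 1
            return current
        # No earlier alert, or the earlier one reached OK: create a new alert.
        alert = Alert(id=self._next_id, key=key, state=severity)
        self._next_id += 1
        self.alerts.append(alert)
        self._latest[key] = alert
        return alert

    def resolve(self, resource: str, metric: str) -> None:
        """Place the latest alert for this key in the OK state."""
        current = self._latest.get((resource, metric))
        if current is not None:
            current.state = "OK"
```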