Monitor-generated events indicate the presence of device or resource anomalies. Alerting helps you interpret and act on events by providing a single aggregation and response system for all events. Events can also be produced by diagnostic and third-party tools.

Much of the work of interpreting and responding to an alert is automated. Alerting correlates related-cause alerts, automatically suppresses redundant alerts, notifies operators, and creates incident tickets for alerts that need attention.

The following figure shows the alert handling workflow:

[Figure: Event Management]

Alert terminology

The following terms are used in alert management:

Term | Description
ID | Sequential number that uniquely identifies an alert or inference. In the alert list, the ID field also indicates the alert state using color-coding.
Subject | Alert description summary, which includes metrics associated with the alert.
Description | Brief description of the alert source and cause. This might include metrics with threshold crossings, monitor description, device type, template name, group, site, service level, and component.
Source | Platform or monitoring tool that generated the alert.
Metric | Service name of a threshold-crossing alert.
First Alert Time | Time when monitoring started for a resource. An alert is generated to provide notification that monitoring started for the resource.
Alert Updated Time | Most recent alert time. Updated when an alert is unsuppressed manually or with the alert First Response policy.
Elapsed Time | Time elapsed since the first alert was generated.
Action/Status | Current alert status and most recent alert action.
Last Updated Time | Time when the alert status was last updated.
Device Type | Device type associated with the alert.
Resource | Resource name associated with the alert.
Repeated Alerts | Number of duplicate alerts generated by the resource.
Incident ID | Unique incident ID associated with the alert. Alerts are associated with incidents by:
  • Manually creating an incident.
  • Escalating an alert as an incident.
Entity Type | Category of the source that generated the alert:
  • Resource: Alerts originating from managed resources.
  • Service: Service mapped to the resources.
  • Integration: Alerts originating from monitoring of the installed integrations.
  • Client: Alerts not generated by monitoring that represent logically clustered alerts, for example, correlated or grouped inference alerts, or RCA alerts based on resource dependency mapping.
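
To make the relationship between these fields concrete, the following is a minimal sketch of an alert record. The class name, field names, and types are assumptions chosen for illustration and are not the product's actual schema. Elapsed Time is derived from First Alert Time, as described in the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class EntityType(Enum):
    """The four entity types listed in the terminology table."""
    RESOURCE = "Resource"
    SERVICE = "Service"
    INTEGRATION = "Integration"
    CLIENT = "Client"


@dataclass
class AlertRecord:
    """Hypothetical alert record; field names are illustrative only."""
    alert_id: int
    subject: str
    source: str
    metric: str
    resource: str
    entity_type: EntityType
    first_alert_time: datetime
    repeated_alerts: int = 0
    incident_id: Optional[str] = None  # set once an incident is associated

    def elapsed_time(self, now: datetime) -> timedelta:
        """Time elapsed since the first alert was generated."""
        return now - self.first_alert_time
```

For example, a record whose first alert was generated at 12:00 reports an elapsed time of 30 minutes when queried at 12:30.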

Alert lifecycle

The alert lifecycle describes alert status transitions, from Open status to Closed status, as a result of actions applied to the alert.

Alert action

The following actions can be applied to an alert:

Action | Description
Acknowledge | Acknowledge a received alert. From the Actions dropdown on the slide-out, click Acknowledge. After you acknowledge the alert, a comment is displayed as Acknowledged and includes the user name.
Create Incident | Create a ticket for the generated alert, assigning users and setting the priority. After an incident is created, the alert status changes to Ticketed and the incident ID is displayed along with the details.
Attach And Update Incident | Map an alert to an existing ticket and update the ticket with the alert contents. This action is generally used to update the same ticket with related alerts.
Attach Incident | Map an alert to an existing ticket without updating the ticket with the alert contents.
Suppress | Suppress the current alert and all duplicate alerts. A new alert of the same type is displayed as a fresh alert and not as a duplicate alert. The alert status changes to Suppressed. The Suppress for setting suppresses alerts for a specified time interval. If a repeated alert occurs while the alert is in the snoozed state, the alert's repeat count is incremented and the snooze duration is reset based on the repeated alert attributes.
Unacknowledge | Undo the Acknowledge action taken on an alert. For example, if a solution did not address a specific problem, unacknowledge the alert. The alert status changes to Open, or to Ticketed if an incident ID is associated with the alert.
Unsuppress | Undo a Suppress action taken on an alert. The alert status changes to Open, or to Ticketed if an incident ID is associated with the alert.
Close | Close an alert when an issue is solved and the alert is resolved. The alert state changes to OK.
Heal | Create an OK alert with properties identical to the selected alert, so that the alert appears healed. This makes it easier for users to resolve an alert manually.

The Heal action can be applied to alerts in the Critical or Warning state, regardless of any action already applied to them. Heal can be applied to only one alert at a time, not to multiple alerts simultaneously.
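
The Heal behavior can be sketched as follows. This is a minimal illustration that assumes a simplified, hypothetical `Alert` type with only a few properties. The function name `heal` and the field names are not part of the product. The sketch shows the two rules stated above: the healing alert copies the original alert's properties with the state set to OK, and only Critical or Warning alerts qualify.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Alert:
    """Hypothetical, simplified alert; field names are illustrative."""
    resource: str
    metric: str
    subject: str
    state: str  # "Critical", "Warning", or "OK"


def heal(alert: Alert) -> Alert:
    """Return a new OK alert with the same properties as the given alert.

    Takes a single alert, mirroring the one-alert-at-a-time rule.
    """
    if alert.state not in ("Critical", "Warning"):
        raise ValueError("Heal applies only to Critical or Warning alerts")
    # Copy every property unchanged except the state.
    return replace(alert, state="OK")
```

Because the sketch returns a new object instead of modifying the original, the original alert remains available for comparison with the OK alert that heals it.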

For correlated alerts, an action can be performed on the entire inference, but not on a single alert.

Alert status

Alert status describes a logical condition of an alert within the alert lifecycle. Alert status should not be confused with alert state, which can be critical, warning, or OK.

Both automatic and manual alert actions can cause an alert status change, as shown in the following figure:

[Figure: Alert status]
Status | Description
Open | The initial alert status is Open.
Correlated | Alert correlation processing changes the alert status to Correlated. Alerts correlated to an inference keep the Correlated status and do not change status independently. Instead, they logically follow the status of the associated inference alert. For example, when a Suppress or Acknowledge action is applied to an inference alert, each correlated alert logically inherits that status while its own status remains Correlated. Because Correlated is a final status for alerts that are part of an inference, you do not need to suppress a correlated alert.
Ticketed | The Create Incident action transitions open alerts to a Ticketed status. A Ticketed alert retains the Ticketed status even if an Unacknowledge or Unsuppress action is applied.
Acknowledged | Acknowledged alerts are set to an Acknowledged status.
Suppressed | Suppressed alerts are set to a Suppressed status.
Closed | The Closed status is a final alert status. Alerts can be closed manually only when the alert is in the OK state.
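
The status transitions described above can be summarized as a small state machine. The following sketch is an interpretation of the tables in this section, not the product's implementation. The function `apply_action` and its parameters are hypothetical, the Attach Incident actions are not modeled, and the sketch assumes that a manual Close is rejected unless the alert state is OK.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    OPEN = "Open"
    CORRELATED = "Correlated"
    TICKETED = "Ticketed"
    ACKNOWLEDGED = "Acknowledged"
    SUPPRESSED = "Suppressed"
    CLOSED = "Closed"


def apply_action(status: Status, action: str,
                 incident_id: Optional[str] = None,
                 state: str = "Critical") -> Status:
    """Return the alert status after a manual action (illustrative only)."""
    if status is Status.CLOSED:
        raise ValueError("Closed is a final status")
    if status is Status.CORRELATED:
        # Actions apply to the whole inference, not to a single alert.
        raise ValueError("apply the action to the inference alert instead")
    if action == "Acknowledge":
        return Status.ACKNOWLEDGED
    if action == "Suppress":
        return Status.SUPPRESSED
    if action == "Create Incident":
        return Status.TICKETED
    if action in ("Unacknowledge", "Unsuppress"):
        # Return to Ticketed when an incident is associated, else to Open.
        return Status.TICKETED if incident_id else Status.OPEN
    if action == "Close":
        if state != "OK":
            raise ValueError("only alerts in the OK state can be closed manually")
        return Status.CLOSED
    raise ValueError(f"action not modeled in this sketch: {action}")
```

For example, unsuppressing an alert that has an associated incident ID returns it to Ticketed, while the same action on an alert without an incident returns it to Open.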

You can monitor alert status in the Alert Details page comments section:

[Figure: Track Actions on an Alert]

When the problem is no longer detected, the alert is placed in the OK state. If the same alert recurs after the alert has reached the OK state, a new alert is created. Otherwise, the repeat count of the existing alert is incremented.
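
The recurrence rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the `AlertStore` class is hypothetical, and identifying "the same alert" by a (resource, metric) pair is an assumption made for the example, not the product's documented matching logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (resource, metric): hypothetical matching key


@dataclass
class Alert:
    alert_id: int
    key: Key
    state: str  # "Critical", "Warning", or "OK"
    repeat_count: int = 0


class AlertStore:
    """Illustrative in-memory store that applies the recurrence rule."""

    def __init__(self) -> None:
        self.alerts: List[Alert] = []
        self._latest: Dict[Key, Alert] = {}

    def ingest(self, resource: str, metric: str, state: str) -> Alert:
        key = (resource, metric)
        current = self._latest.get(key)
        if state == "OK" and current is not None:
            # The problem is no longer detected: place the alert in OK.
            current.state = "OK"
            return current
        if current is None or current.state == "OK":
            # First occurrence, or recurrence after OK: create a new alert.
            alert = Alert(len(self.alerts) + 1, key, state)
            self.alerts.append(alert)
            self._latest[key] = alert
            return alert
        # Recurrence while the alert is still active: count a repeat.
        current.repeat_count += 1
        return current
```

Ingesting the same Critical event twice yields one alert with a repeat count of 1. Ingesting it again after an OK event yields a second alert with a repeat count of 0.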