ITIL Intermediate OSA - Event Management Tutorial

Welcome to lesson 2 ‘Event Management’ of the ITIL Intermediate OSA Tutorial, which is a part of the ITIL Intermediate OSA Certification Course. In this module, we will discuss the purpose, objectives, scope, activities, key concepts, triggers, inputs and outputs, challenges, risks, CSFs and KPIs of the Event Management process.

Let us begin with the objectives of this lesson.

Objectives

By the end of this ‘Event Management’ lesson, you will be able to:

  • Interpret and analyze Event Management principles, techniques and relationships, and explain how they are applied.

  • Explain the design strategy, components, activities, and operations including organizational structure and interfaces with other processes.

In the next section, we will look into the purpose of Event Management.


What is the purpose of Event Management?

The purpose of Event Management is to manage events throughout their lifecycle. This lifecycle of activities, which detects events, makes sense of them and determines the appropriate control action, is coordinated by the Event Management process.

Event Management is, therefore, the basis for operational monitoring and control. If events are programmed to communicate operational information as well as warnings and exceptions they can be used as a basis for automating many routine operations management activities.

We now know the purpose of Event Management. Let us look into the objective and scope.

Objectives of Event Management

The objective of Event Management is to detect all changes of state that have significance for the management of a Configuration Item (referred to as a CI) or IT Service, determine the appropriate control actions for events, and ensure these are communicated to the appropriate functions.

Event Management also provides the trigger or entry point for many Service Operation processes and the means to compare actual operating performance and behavior against design standards and SLAs.

We have looked at the purpose and objectives of event management. Let's learn about its scope.

What is the scope of Event Management?

Event Management can be applied to any aspect of Service Management that needs to be controlled and that can be automated, including:

  • Configuration Items

  • Environmental Conditions

  • Software licenses

  • Security and

  • Normal activity.

Some CIs will be included because they need to stay in a constant state. For example, a router on a network needs to remain active, and Event Management tools confirm this by monitoring responses to “status requests”.

Other CIs will be included because their status changes frequently, and Event Management can be used to automate the detection of these changes and update the CMS (e.g., the updating of a file server). Environmental conditions, such as the detection of fire or smoke, can also be monitored through the Event Management process.

Monitoring software license usage to ensure legal license utilization also comes under the scope of Event Management. A security issue such as a firewall breach can be addressed through Event Management, and normal activities, such as tracking mainframe utilization or batch job completion, also come under its scope.

Let us learn about how event management is different from monitoring in the next section.

Event Management vs. Monitoring

How is Event Management different from Monitoring? The two areas are very closely related, but slightly different in nature.

Event Management is focused on generating and detecting meaningful notifications about the status of the IT infrastructure and services.

While it is true that Monitoring is required to detect and track these notifications, Monitoring is broader than Event Management. For example, Monitoring tools will check the status of a device to ensure that it is operating within acceptable limits, even if that device is not generating events.

In simple terms, Event Management works with occurrences that are specifically generated to be monitored. Monitoring tracks these occurrences, but it also actively seeks out conditions that do not generate events.

Does Event Management add value to the business? Let us find out in the next section.

Event Management - Value to the Business

Event Management provides mechanisms for early detection of incidents. In many cases, it is possible for the Incident to be detected and assigned to the appropriate group for action before any actual service outage occurs.

Event Management makes it possible for some types of automated activity to be monitored by exception, thus removing the need for expensive and resource-intensive real-time monitoring while reducing downtime.

When integrated into other Service Management processes (e.g., Availability or Capacity Management), Event Management can signal status changes or exceptions that allow the appropriate team to perform early response actions, thus improving the performance of the process.

This will allow the business to benefit from more effective and more efficient overall Service management.

Event Management provides a basis for automated operations, increasing efficiencies and allowing expensive human resources to be used for more innovative work, such as designing new or improved functionality or defining new ways in which the business can exploit technology for increased competitive advantage.

Let us now learn about the policies of Event management.

Event Management - Policies

Every process should have a set of policies. Here we will discuss the policies defined for the Event Management process.

Event notifications should go only to those responsible for handling them or for making decisions about them. This avoids needless notifications to those not directly involved in processing events.

Event Management and Support should be centralized as much as reasonably possible. This avoids conflicts in the management of events.

All Application events should utilize a common set of messaging and logging standards and protocols wherever possible. This allows for consistent handling of events. It can also result in faster implementation of new events and their handling actions as well as establish common expectations for how events will be recognized and handled.

Event handling actions should be automated wherever possible. This eliminates potential incidents that could be caused by human error.

A standard classification scheme should be in place that references common handling and escalation processes. This supports a consistent approach for taking actions on events in a manner that supports operational and service level objectives.

All recognized events should be captured and logged. This will provide a means for examining incidents, problems and trends after events have occurred.

In order to analyze the trends and problems, we must have an understanding of different types of events. Let us proceed to learn them.

Event Management - Principles and Basic Concepts

What are the types of events that help us analyze the trends/problems in event handling?

There are different types of events like:

Informational

Events that signify regular operation are considered informational events. Examples include a notification that a scheduled workload has completed, that a user has logged in to use an application, or that an e-mail has reached its intended recipient.

Exception

An exception event is one that signifies that something has failed. Examples:

  • A user attempts to log on to an application with the incorrect password.

  • An unusual situation has occurred in a business process that may indicate an exception requiring further business investigation (e.g., a web page alert indicates that a payment authorization site is unavailable, impacting financial approval of business transactions).

  • A device’s CPU is above the acceptable utilization rate.

  • A PC scan reveals the installation of unauthorized software.

Warning

Events that signify unusual, but not exceptional, operation are considered warnings. These situations may require closer monitoring. In some cases the condition will resolve itself; for example, with an unusual combination of workloads, normal operation is restored as the workloads complete.

In other cases, operator intervention may be required if the situation is repeated or if it continues for too long. These rules or policies are defined in the Monitoring and Control Objectives for that device or service.

For example:

  • A server’s memory utilization reaches within 5% of its highest acceptable performance level or

  • The completion time of a transaction is 10% longer than normal.
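The three event types can be illustrated with a small threshold check. This is only a minimal sketch and not something ITIL prescribes: the function name `classify_utilization`, the reading of "within 5%" as five percentage points, and the 90% limit used in the usage lines are all assumptions made for illustration.

```python
from enum import Enum


class Significance(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


def classify_utilization(value: float, max_acceptable: float,
                         warning_margin: float = 5.0) -> Significance:
    """Classify a utilization reading (in percent) against a limit.

    Assumed rule: above the limit is an exception; within
    `warning_margin` percentage points of the limit is a warning;
    anything lower is regular operation.
    """
    if value > max_acceptable:
        return Significance.EXCEPTION
    if value >= max_acceptable - warning_margin:
        return Significance.WARNING
    return Significance.INFORMATIONAL


# Hypothetical readings against an assumed 90% limit.
high = classify_utilization(95.0, 90.0)
near = classify_utilization(86.0, 90.0)
normal = classify_utilization(50.0, 90.0)
```

In practice the limit and margin would come from the Monitoring and Control Objectives defined for the device or service, not from constants in code.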

Next, we will learn about the key concepts of Event management.

Event Management - Key Concepts

The flow of the Event management process can be described as follows:

Step 1 - Occurrence of an event

Events occur continuously, but not all of them are detected or registered. It is, therefore, important that everybody involved in designing, developing, managing and supporting IT services and the IT infrastructure that they run on understands what types of events need to be detected.

Most CIs are designed to communicate certain information about themselves in one of two ways:

  • A device is interrogated by a management tool, which collects certain targeted data. This is known as polling.

  • The CI generates a notification when certain conditions are met.

The ability to produce these notifications has to be designed and built into the CI. A general principle of event notification is that the more meaningful the data it contains and the more targeted its audience, the easier it is to make decisions about the event.
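The difference between polling and notification can be sketched in a few lines. The `Device` class below is a hypothetical stand-in for a CI; its method names (`get_status`, `subscribe`, `set_status`) are assumptions for illustration and do not correspond to any real management tool.

```python
from typing import Callable, Dict, List


class Device:
    """Hypothetical CI that supports both polling and notification."""

    def __init__(self, name: str, status: str = "active") -> None:
        self.name = name
        self.status = status
        self._listeners: List[Callable[[Dict[str, str]], None]] = []

    def get_status(self) -> str:
        # Polling: the management tool interrogates the device.
        return self.status

    def subscribe(self, listener: Callable[[Dict[str, str]], None]) -> None:
        # Register a callback that receives the device's notifications.
        self._listeners.append(listener)

    def set_status(self, new_status: str) -> None:
        # Notification: the device itself reports when a condition is met
        # (here, the assumed condition is any change of status).
        if new_status != self.status:
            self.status = new_status
            event = {"device": self.name, "status": new_status}
            for listener in self._listeners:
                listener(event)


router = Device("router-1")
received: List[Dict[str, str]] = []
router.subscribe(received.append)

polled = router.get_status()   # the tool asks the device
router.set_status("down")      # the device tells the tool
```

After these lines run, `polled` holds the status the tool asked for, while `received` holds the notification the device pushed on its own.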

Step 2 - Notification Generated

Service Design processes should define which events need to be generated and then specify how this can be done for each type of CI. During Service Transition, the event generation options would be set and tested.

Step 3 - Detected

Once generated, the event will be detected by an agent running on the same system, or transmitted directly to the management tool.

Step 4 - Logged

There should be a record of the event and any subsequent actions. The event can be logged as an event record in the event management tool or it can simply be left as an entry in the system log of the device or application that generated the event.

Step 5 - First Level Event Correlation and Filtering

The purpose of first level event correlation and filtering is to decide whether to communicate the event to a management tool or to ignore it. If ignored, the event will usually be recorded in a log file on the device, but no further action will be taken.
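As a rough illustration of first level filtering, the sketch below splits a batch of events into those forwarded to the management tool and those that are only kept in a local log. The event fields and the `IGNORED_TYPES` set are hypothetical; a real filter would be configured in the agent or device.

```python
from typing import Dict, List, Set, Tuple

Event = Dict[str, str]

# Assumed filter rule: event types that are never communicated onward.
IGNORED_TYPES: Set[str] = {"heartbeat", "debug"}


def first_level_filter(events: List[Event]) -> Tuple[List[Event], List[Event]]:
    """Split events into (forwarded, ignored_but_logged)."""
    forwarded: List[Event] = []
    local_log: List[Event] = []
    for event in events:
        if event["type"] in IGNORED_TYPES:
            # Recorded on the device, but no further action is taken.
            local_log.append(event)
        else:
            # Communicated to the management tool.
            forwarded.append(event)
    return forwarded, local_log


sample_events = [
    {"type": "heartbeat", "source": "router-1"},
    {"type": "link_down", "source": "router-1"},
]
forwarded, local_log = first_level_filter(sample_events)
```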

Step 6 - Significance

Every organization will have its own categorization of the significance of an event, but it is suggested that at least three broad categories be represented: informational, warning, and exception.

Step 7 - Second Level Correlation

Correlation is normally done by a ‘Correlation Engine’, usually part of a management tool that compares the event with a set of criteria and rules in a prescribed order. These criteria are often called Business Rules, although they are generally fairly technical.

The idea is that the event may represent some impact on the business and the rules can be used to determine the level and type of business impact.

Examples of what Correlation Engines will take into account include:

  • Number of similar events (e.g. this is the third time that the same user has logged in with the incorrect password, a business application reports that there has been an unusual pattern of usage of a mobile telephone that could indicate that the device has been lost or stolen)

  • Number of CIs generating similar events

  • Whether a specific action is associated with the code or data in the event

  • Whether the event represents an exception

  • A comparison of utilization information in the event with a maximum or minimum standard (e.g. has the device exceeded a threshold?)

  • Whether additional data is required to investigate the event further, and possibly even a collection of that data by polling another system or database

  • Categorization of the event

  • Assigning a priority level to the event.
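One of the criteria above, the number of similar events, lends itself to a short sketch. The example below counts failed logins per user and flags anyone who reaches a limit. The event fields, the function name and the limit of three (taken from the "third time" example) are assumptions for illustration, not the behavior of any particular correlation engine.

```python
from collections import Counter
from typing import Dict, Iterable, List

FAILED_LOGIN_LIMIT = 3  # assumed threshold, based on the "third time" example


def correlate_failed_logins(events: Iterable[Dict[str, str]],
                            limit: int = FAILED_LOGIN_LIMIT) -> List[str]:
    """Return the users whose failed-login count has reached the limit."""
    counts = Counter(
        event["user"] for event in events if event["type"] == "login_failed"
    )
    return sorted(user for user, count in counts.items() if count >= limit)


login_events = [
    {"type": "login_failed", "user": "alice"},
    {"type": "login_failed", "user": "bob"},
    {"type": "login_failed", "user": "alice"},
    {"type": "login_ok", "user": "alice"},
    {"type": "login_failed", "user": "alice"},
]
flagged = correlate_failed_logins(login_events)
```

A real engine would usually also bound the count by a time window; that is left out here to keep the sketch short.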

Step 8 - Further Action required?

The next step of the process flow determines whether any further action is required. If the second level correlation activity recognizes an event, a response will be required.

Examples of Responses:

  • Generating a record in the incident management system

  • Generating an RFC

  • Scripts that execute specific actions

  • Paging systems that will notify a person or team of the event by mobile phone

  • Database actions that restrict access of a user to specific records or fields.

Step 9 - Response Selection

At this point in the process, there are a number of response options available. It is important to note that the response options can be chosen in any combination. For example, it may be necessary to preserve the log entry for future reference, but at the same time escalate the event to an Operations Management staff member for action.

Two of the options available are auto response, and alert and human intervention.

Auto Response: Some events are understood well enough that the appropriate response has already been defined and automated.

This is normally as a result of good design or of previous experience (usually Problem Management). The trigger will initiate the action and then evaluate whether it was completed successfully. If not, an Incident or Problem Record will be created.
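The auto response pattern (run the action, check whether it succeeded, raise a record if it did not) might look like the sketch below. The `incident_log` list is a stand-in for a real incident management system, and `auto_respond` and the event fields are hypothetical names.

```python
from typing import Callable, Dict, List

# Stand-in for an incident management system (assumption for illustration).
incident_log: List[Dict[str, str]] = []


def auto_respond(event: Dict[str, str],
                 action: Callable[[Dict[str, str]], bool]) -> bool:
    """Run the automated action and open an incident record if it fails."""
    try:
        succeeded = action(event)
    except Exception:
        # An action that crashes is treated the same as one that fails.
        succeeded = False
    if not succeeded:
        incident_log.append({
            "source_event": event["id"],
            "summary": "Automated response failed",
        })
    return succeeded


def failing_restart(event: Dict[str, str]) -> bool:
    # Hypothetical action that cannot complete.
    raise RuntimeError("service did not restart")


ok = auto_respond({"id": "E1"}, lambda event: True)
failed = auto_respond({"id": "E2"}, failing_restart)
```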

Alert and Human Intervention: If the event requires human intervention, it will need to be escalated. The purpose of the alert is to ensure that the person with the skills appropriate to deal with the event is notified.

The alert will contain all the information necessary for that person to determine the appropriate action, including reference to any documentation required (e.g. user manuals).

Incident, problem or change? Some events will represent a situation where the appropriate response will need to be handled through the Incident, Problem or Change Management process.

It is important to note that a single incident may initiate any one or a combination of these three processes:

  • Open an RFC

  • Open an Incident Record

  • Open or link to a Problem Record

Step 10 - Review Actions

Just before closing the loop, review actions should be done. In many cases, this can be done automatically, for example by polling a server that had been rebooted using an automated script to see that it is functioning correctly. In the cases where events have initiated an incident, problem and/or change, the Action Review should not duplicate any reviews that have been done as part of those processes.

Rather, the intention is to ensure that the handover between the Event Management process and other processes took place as designed and that the expected action did indeed take place. This will ensure that incidents, problems or changes originating within Operations Management do not get lost between the teams or departments.

The Review will also be used as input into continual improvement and the evaluation and audit of the Event Management process.

Close Events: Some events will remain open until a certain action takes place, for example, an event that is linked to an open incident. However, most events are not ‘opened’ or ‘closed’. In the case of events that generated an incident, problem or change, these should be formally closed with a link to the appropriate record from the other process.

Although the flow of the Event Management process can be pictured quite simply, it is elaborate in practice. This is why the significance of an event is divided into three categories: informational, warning, and exception.

In the next section, let’s learn about the Triggers of Event Management.

Event Management Triggers

Event Management can be initiated by any type of occurrence. The key is to define which of these occurrences is significant and which need to be acted upon.

Triggers include:

  • Exceptions to any level of CI performance defined in design specifications, OLAs or SOPs (Standard Operating Procedures)

  • An exception to an automated procedure or process, e.g., a routine change that has been assigned to the build team has not been completed in time.

  • An exception to a business process that is being monitored by Event Management

  • The completion of an automated task or job

  • A status change in a device or database record

  • Access to an application or database by a user or automated procedure or job

  • A situation where a device, database or application, etc. has reached a predefined threshold of performance.

So far we have learned about the concepts, events, and triggers of Event Management. Now we will look at the inputs and outputs of Event Management.

Event Management Inputs and Outputs

Every process has inputs as well as outputs. Here we will discuss the inputs and outputs of the Event Management process.

Inputs for this process can be:

  • Operational and Service Level Requirements

  • Alarms, alerts, and thresholds for recognizing events

  • Event correlation tables, codes and automated responses

  • Operational procedures, among many others

Outputs can be:

  • Events that have been communicated and escalated

  • Event Logs

  • Events that indicate an incident has occurred

  • Events that indicate breaching of SLAs

  • Events and alerts that indicate the completion status of deployment

  • Populated SKMS with event information and history.

Next, we will learn about event management interfaces.

Event Management Interfaces

Interfacing with other processes refers to how Event Management, as a process, interacts with other processes. Event Management can interface with any process that requires monitoring and control.

Event Management interfaces with business applications and/or business processes to allow potentially significant business events to be detected and acted upon. Incident, Problem and Change Management have a close relationship with Event Management.

For example, a single Incident may initiate any one or a combination of these three processes: a non-critical server failure is logged as an Incident. However, as there is no workaround, a Problem Record is created to determine the root cause and resolution. Additionally, an RFC is logged to relocate the workload onto an alternative server while the Problem is resolved.

Event Management also interfaces with Capacity and Availability Management that are critical in defining what events are significant, what appropriate thresholds should be and how to respond to them.

Configuration Management can use events to determine the current status of any CI in the infrastructure and that is how they interface with event management.

Asset Management can use Event Management to determine the lifecycle status of assets. Events can also be a rich source of information that can be processed for inclusion in Knowledge Management systems.

We will learn about information management in detail in the next section.

Information Management

Information plays a vital role in the successful functioning of any organization. Communication and management of this information is a process in itself. In this section, we will learn about the information involved in Event Management.

The key information involved in Event Management includes SNMP messages, which are a standard way of communicating technical information about the status of components of an IT infrastructure and Management Information Bases (MIBs) of IT devices.

A MIB is a database on each device that contains information about that device, including its operating system, BIOS version, the configuration of system parameters, etc. The ability to interrogate MIBs and compare them to a norm is critical to being able to generate events.

Vendors’ monitoring tools, agent software and correlation engines contain detailed rules to determine the significance of events and the appropriate response to them. There is no standard Event Record for all types of events.

The exact contents and format of the record depend on the tools being used and what is being monitored (e.g., a server and the Change Management tools will have very different data and probably use a different format).

However, there is some key data that is usually required for each event to be useful in analysis. It should typically include Device, Component, Type of Failure, Date/Time, Parameters in Exception and Value.
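The key data listed above can be modelled as a simple record type. This is a sketch only: since there is no standard Event Record, the class name, field names, field types and sample values are all assumptions chosen to mirror the list (Device, Component, Type of Failure, Date/Time, Parameters in Exception, Value).

```python
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)
class EventRecord:
    """Illustrative event record; not a standard format."""
    device: str
    component: str
    failure_type: str
    timestamp: datetime
    parameter_in_exception: str
    value: float


# Hypothetical sample values for illustration.
record = EventRecord(
    device="fileserver-01",
    component="disk0",
    failure_type="threshold exceeded",
    timestamp=datetime(2024, 1, 15, 9, 30),
    parameter_in_exception="disk_utilization_percent",
    value=97.5,
)
record_as_dict = asdict(record)  # convenient for logging or export
```

Making the record immutable (`frozen=True`) reflects the idea that a logged event is a historical fact and should not be altered after capture.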

Let us now proceed to learn about the Event Management Metrics, Challenges, and Risks.

Event Management - Metrics

It is important to measure and report on individual components, particularly around their availability, reliability, and performance. Hence metrics play a vital role in measuring the performance of any process.

Similarly, Event Management metrics can be used to feed into the overall end-to-end service measurement.

Specialized Event Management software can perform event correlation, impact analysis and root cause analysis for all events. These events can be interpreted to isolate and report on the true cause and impact.

Event Management data provides a cost-effective method to improve the reliability, efficiency, and effectiveness of the IT infrastructure.

Examples of Event Management metrics include:

  • Number of events by category

  • Number of events by significance

  • Number and percentage of events that required human intervention

  • Number and percentage of events that resulted in Incidents or changes

  • Number and percentage of events caused by existing Problem or Known Errors
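A few of these metrics can be computed directly from logged events. The sketch below assumes each event is a dictionary with the hypothetical fields `significance`, `human_intervention` and `raised_incident`; real event management tools will expose this data differently.

```python
from collections import Counter
from typing import Any, Dict, List


def event_metrics(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize a batch of event records (field names are assumed)."""
    total = len(events)

    def pct(count: int) -> float:
        # Percentage of all events, guarded against an empty batch.
        return round(100.0 * count / total, 1) if total else 0.0

    human = sum(1 for e in events if e["human_intervention"])
    incidents = sum(1 for e in events if e["raised_incident"])
    return {
        "total": total,
        "by_significance": dict(Counter(e["significance"] for e in events)),
        "human_intervention": (human, pct(human)),   # (number, percentage)
        "raised_incident": (incidents, pct(incidents)),
    }


sample = [
    {"significance": "informational", "human_intervention": False, "raised_incident": False},
    {"significance": "warning", "human_intervention": True, "raised_incident": False},
    {"significance": "exception", "human_intervention": True, "raised_incident": True},
    {"significance": "informational", "human_intervention": False, "raised_incident": False},
]
summary = event_metrics(sample)
```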

In the next section, we will discuss the challenges and risks faced in event management.

Event Management - Challenges and Risks

Challenges and risks are inherent in all processes. Mitigating them is one of the most critical success factors for any process implementation.

Challenges:

There are a number of challenges that might be encountered with the implementation of Event Management, such as:

  • It may be difficult to obtain the necessary funding for the tools and for the effort required to implement and develop them before any benefits are realized.

  • One of the most common challenges is setting the correct level of filtering. Setting the level of filtering incorrectly can result in either being flooded with relatively insignificant events or not being able to detect relatively important events until it is too late.

  • Rolling out of the necessary monitoring agents across the entire IT infrastructure may be a difficult and time-consuming activity requiring an ongoing commitment over quite a long period of time – there is a danger that other activities may arise that could divert resources and delay the rollout.

  • Acquiring the necessary skills can be time-consuming and costly.

Risks related to Event Management include:

  • Failure to obtain adequate funding

  • Failure to ensure the correct level of filtering

  • Failure to maintain momentum in rolling out necessary monitoring agents across the IT Infrastructure.

In the next section, we will look at the Critical Success Factors (CSFs) and Key Performance Indicators (KPIs) of Event Management.

Event Management - CSFs and KPIs

Each organization should identify appropriate CSFs or Critical Success Factors based on its objectives for the process. Each sample CSF is followed by a small number of typical KPIs or Key Performance Indicators that support the CSF.

These KPIs should be adopted with careful consideration. Each organization should develop KPIs that are appropriate for its level of maturity, its CSFs and its particular circumstances. Achievement against KPIs should be monitored and used to identify opportunities for improvement.

They should be logged in the continual service improvement (CSI) register for evaluation and possible implementation.

The following lists some CSFs and their corresponding KPIs:

CSF: Detecting all changes of state that have significance for the management of CIs and IT services

KPIs:

  • Number and ratio of events compared with the number of incidents

  • Number and percentage of each type of event per platform or application versus the total number of platforms and applications underpinning live IT services

CSF: Providing the trigger or entry point for the execution of many service operation processes and operations management activities

KPIs:

  • Number and percentage of events that required human intervention and whether it was performed

CSF: Providing the means to compare actual operating performance and behavior against design standards and SLAs

KPIs:

  • Number and percentage of incidents that were resolved without impact to the business

  • Number and percentage of events that resulted in incidents or changes

  • Number and percentage of events caused by existing problems or known errors

The next section talks about the Event Management – Design.


Event Management - Design

In general, design involves working out mechanisms for identifying errors and defects while developing software applications that can handle such events. Let us see what this means for Event Management.

Instrumentation is what can be monitored about CIs and the way in which their behavior can be affected. Instrumentation is partly about a set of decisions that need to be made and partly about designing mechanisms to execute these decisions.

Decisions that need to be made include:

  • What needs to be monitored?

  • What type of monitoring is required?

  • When do we need to generate an event?

  • Who are the messages intended for?

Mechanisms that need to be designed include:

  • How will events be generated?

  • Does the CI already have event generation mechanisms as a standard feature and, if so, which of these will be used?

  • Are they sufficient or does the CI need to be customized to include additional mechanisms or information?

  • What data will be used to populate the Event Record?

Error messaging is vital for all components (hardware, software, networks, etc.). It is important that all software applications are designed to support Event Management. This might include the provision of meaningful error messages and/or codes that clearly indicate the specific point of failure and most likely cause.

Detection and alert mechanisms: good Event Management design will also include the design and population of the tools used to filter, correlate and escalate events.

Designing detection and alert mechanisms requires:

  • Knowledge of who is going to be supporting the CI

  • Knowledge of what constitutes normal and abnormal operation of the CI

  • Knowledge of the significance of multiple similar events (on the same CI or various similar CIs).

Thresholds themselves are not set and managed through Event Management. However, unless these are properly designed and communicated during the instrumentation process, it will be difficult to determine which level of performance is appropriate for each CI.

The next section talks about the use of Event rule sets and correlation engines.

Use of Event Rule Sets and Correlation Engines

A rule set consists of several rules that define how the event messages for a particular event will be processed and evaluated.

For example, a warning event may be generated each time a disk log file reaches its capacity, but an exception event will be generated if more than four warning events have been generated.

Rules themselves are typically embedded into monitoring and event handling technologies. They consist of Boolean kinds of algorithms to correlate events that have been generated in order to create additional events that need to be communicated.

These algorithms can be codified into event management software typically referred to as correlation engines.
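The disk log file example can be written as a small rule. This is a sketch under stated assumptions: the function name and the list-of-booleans input are hypothetical, and the rule raises the exception event once, at the fifth warning, which is one possible reading of "more than four".

```python
from typing import List

MAX_WARNINGS = 4  # from the example: "more than four" warning events


def apply_disk_log_rule(capacity_reached: List[bool]) -> List[str]:
    """Turn a series of capacity checks into warning and exception events.

    Each True in `capacity_reached` means the log file was found at capacity.
    """
    events: List[str] = []
    warnings = 0
    for full in capacity_reached:
        if not full:
            continue
        warnings += 1
        events.append("warning")
        # Assumed interpretation: raise the exception once, when the
        # warning count first exceeds the limit.
        if warnings == MAX_WARNINGS + 1:
            events.append("exception")
    return events


four_checks = apply_disk_log_rule([True, True, True, True])
five_checks = apply_disk_log_rule([True, True, False, True, True, True])
```

A production correlation engine would typically also reset or age out the counter over time; that detail is omitted here.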

So, what technology is involved in implementing such a design? Let us look at it in the next section.

Event Management - Technology

Here are the features that are desirable for any Event Management technology:

  • Multi-environmental, open interface to allow monitoring and alerting across heterogeneous services and an organization’s entire IT Infrastructure.

  • Easy to deploy, with minimal set-up costs.

  • “Standard” agents to monitor most common environment/components/systems

  • Open interface to accept any standard (e.g., SNMP) event input and to generate multiple alerts.

  • Centralized routing of all events to a single location, programmable to allow different location(s) at various times.

  • Support for design/test phases – so that new application/services can be monitored during design/test phases and result fed back into the design and transition.

  • Programmable assessment and handling of alerts depending upon symptoms and impact.

  • The ability to allow an operator to acknowledge an alert, and if no response is entered within a defined timeframe, to escalate the alert.

  • Good reporting functionality to allow feedback into the design and transition phases, as well as meaningful management information and a business user “dashboard”.

  • A direct interface into the organization’s Incident Management processes (via entry into the Incident Log), as well as the capability to escalate to support staff, third-party suppliers, engineers, etc. via e-mail, SMS messaging, etc.
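The acknowledge-or-escalate feature in the list above can be sketched as a simple timeout check. The `Alert` class, the `needs_escalation` function and the 300-second default are assumptions for illustration; real tools define their own alert models and timeframes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    raised_at: float                       # seconds on any consistent clock
    acknowledged_at: Optional[float] = None


def needs_escalation(alert: Alert, now: float,
                     ack_timeout: float = 300.0) -> bool:
    """Escalate when nobody has acknowledged the alert within the timeframe."""
    if alert.acknowledged_at is not None:
        return False  # an operator has already responded
    return now - alert.raised_at >= ack_timeout


unanswered = Alert(raised_at=0.0)
answered = Alert(raised_at=0.0, acknowledged_at=120.0)

too_early = needs_escalation(unanswered, now=60.0)
overdue = needs_escalation(unanswered, now=300.0)
handled = needs_escalation(answered, now=600.0)
```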

With this, we come to the end of this module.

Summary

In this module, we focused on the purpose, objectives, scope, activities, key concepts, categories of event, triggers, CSFs and KPIs of Event management.

The next lesson talks about Incident Management.
