How prepared are you for a Major Incident?


15-May-2018 15:19:11

In my last piece, I talked about Problem Management and how it is often confused with Major Incident management. In fact, while the two are linked, they are very different beasts. However strong your Problem Management, you must be ready for business-critical incidents.


Do you know a major incident when you see it?

Often the signs are obvious: the desk is flooded with similar tickets from many users, critical business functions are affected and steam is pouring from the ears of the management team. While these are clear indicators, panic levels are not a helpful definition. ITIL requires the IT team and the business to agree on what constitutes a major incident, based on severity, urgency and impact. ISO 20000 requires the following steps for major incident management:

  1. Agreement on what constitutes a major incident
  2. A distinct and separate procedure for ‘major’ vs other incidents
  3. An outline of responsibilities and responsible parties
  4. A defined review process

Let’s take a closer look at each of these requirements to help you create your Major Incident plan before you need it.


Agreeing on Major Incident definitions

What constitutes a major incident for one business might not for another, and the same goes for business units within a single business. At Plan-Net, we set up client-specific SLAs to govern incident classification. If you run an in-house team, SLAs are just as important: each department will likely need its own resolution times, resources and communication lines according to its needs and business function.
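To make "agreed definitions" concrete, here is a minimal sketch of how per-department major incident criteria might be recorded and checked. It assumes a hypothetical three-point scale for impact and urgency plus a minimum number of affected users; the department names, thresholds and function names are illustrative only, and your real values would come from the SLAs agreed with each part of the business.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-point scale used for both impact and urgency."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class MajorIncidentCriteria:
    """Thresholds one department has agreed with IT (illustrative values only)."""
    min_impact: Level
    min_urgency: Level
    min_affected_users: int


# Hypothetical per-department agreements; real values come from your SLAs.
CRITERIA = {
    "trading": MajorIncidentCriteria(Level.MEDIUM, Level.HIGH, 5),
    "hr": MajorIncidentCriteria(Level.HIGH, Level.HIGH, 50),
}


def is_major_incident(department: str, impact: Level, urgency: Level,
                      affected_users: int) -> bool:
    """Return True if the incident meets every agreed threshold for that department."""
    c = CRITERIA[department]
    return (
        impact >= c.min_impact
        and urgency >= c.min_urgency
        and affected_users >= c.min_affected_users
    )
```

With these example values, an incident with medium impact and high urgency affecting ten users counts as major for the trading desk but not for HR, which is exactly the kind of difference between business units that should be agreed in advance rather than decided mid-crisis.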


Major incident roles & responsibilities

Running around like a headless chicken during a Major Incident is not a good look. Roles and processes should be strictly defined before a Major Incident strikes. When kicking off your MI process, nothing quite beats a war-room-style meeting that gets all relevant parties together and reminds them of their roles.

1) Major Incident Manager

He or she is responsible for overseeing the major incident process, ensuring that the appropriate resources are engaged and that users and the management team are kept informed of progress. Depending on the size of the IT team, this could be a Service Desk analyst or a more senior technical manager with knowledge specific to the incident type.

2) Problem Manager

This resource will need to be involved, but it should be a different person to the Major Incident Manager. A Problem Manager is most useful after resolution, helping with root cause analysis, which can take time. The Major Incident Manager, meanwhile, will be pushing for an immediate fix so that normal business can resume as soon as possible.

3) Service Desk

It perhaps goes without saying, but you must decide how much of your Service Desk to allocate to the Major Incident. In serious cases, you may decide it is all hands on deck and put everything seemingly unrelated on hold.

4) Change Manager

If major changes have to be implemented to restore service, your Change Manager will need to be involved.

5) SLA Manager

Someone needs to be recording downtime and SLA misses so that this can be reported internally and to the customer or management teams.


There is one vital role missing from the above list: the customer!

Whether your customer is within your own business or a paying client, they need to be kept in the loop as much as possible. Your Major Incident Manager should provide a quick, concise summary at least every hour, and more frequently if possible.

Updates to the customer needn’t be War and Peace, but they should cover the following main points:

  • Short description of the cause of the downtime
  • Impact of the downtime
  • Estimated time of resolution

Creating a template in advance will help the MI manager keep to the point and ensure timely delivery of updates.
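As an illustration, such a template could be as simple as the sketch below. The three fields mirror the points listed above, and the default one-hour interval reflects the "at least every hour" guidance; the class name, function name and message layout are hypothetical, and most teams would build the equivalent into their service desk tool rather than a script.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MajorIncidentUpdate:
    """The three points every customer update should cover (field names are illustrative)."""
    cause: str   # short description of the cause of the downtime
    impact: str  # who and what is affected
    eta: str     # estimated time of resolution, or "unknown"


def render_update(update: MajorIncidentUpdate, sent_at: datetime,
                  interval: timedelta = timedelta(hours=1)) -> str:
    """Render a plain-text update and state when the next one is due."""
    next_update = sent_at + interval
    return (
        f"Major incident update ({sent_at:%H:%M})\n"
        f"Cause: {update.cause}\n"
        f"Impact: {update.impact}\n"
        f"Estimated resolution: {update.eta}\n"
        f"Next update by: {next_update:%H:%M}"
    )
```

Including a "next update by" time in every message sets the customer's expectations and holds the MI manager to the agreed cadence.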

Root Cause Analysis

Once the incident has been resolved, you will need to produce a report on how and why it happened and how to prevent it from happening again. This is where your Problem Manager steps in: by working through the incident tickets, they can perform a root cause analysis (RCA). The RCA should be checked against the fix used to restore service so that any loose ends are dealt with, because a patched-together temporary solution may not be sufficient.

If you are experiencing repeated Major Incidents, go back to your Problem Management processes and look for root causes regularly.

If you haven’t had a chance to read my tips on Problem Management yet, you can find them here.

About the author

Pete Canavan

Pete Canavan is Support Services Director at Plan-Net. An accredited ITIL Service Manager, he has a proven track record in IT with special expertise in the Legal & Financial Services industries.

With two decades in the IT field, Pete has acquired extensive experience in business relationship development, service transformation, project and people management, training and client/supplier relations.

Pete's other passions, besides Plan-Net of course, are his family and football.
