Many organisations view Problem Management as a secondary function, and its benefits are often not fully understood.
In the majority of firms you will see responsibility for Incident Management assigned, or indeed a dedicated Incident Manager. However, responsibilities for Problem Management are often still overlooked or watered down.
All too often, IT teams are left firefighting issues after the fact, or dealing with a high number of Incidents, because they don’t have an effective Problem Management process in place.
There is a great deal of confusion over what constitutes a ‘problem’. Confusing Problems with Incidents or Major Incidents is a common cause of inefficiency and business-critical IT failures. At Plan-Net, we follow Best Practice Service Management principles to help define the terms and the processes around both Problem Management and Incident Management.
To help you discern the difference, it is best to think of a Problem as an illness and Incidents as symptoms of that illness. A Major Incident differs further in that it impacts a lot of people at the same time – a system failure, for example. A repeating set of similar Incidents may indicate that something is deteriorating and might eventually fail completely, thereby causing a Major Incident.
Successful Problem Management rests on three essential pillars: people, process and technology. To be ready for Problem resolution, you need to have decided in advance on a framework, on roles and responsibilities, and on how to manage IT assets:
- People: Identify, train and utilise your Infrastructure & Application teams and leaders. These individuals will be instrumental in identifying and ultimately removing Known Errors from your environment, so it is essential they understand the fundamentals and objectives of Problem Management.
- Process: Use a Best Practice aligned Problem Management process that is tried and tested. ITIL remains widely regarded as the de facto standard for Service Management. Ensure that ownership of, and responsibility for, Problems and their resolutions are accurately defined; this is often achieved through a RACI Matrix.
- Technology: Ensure it is sufficient to enable you to accurately identify, manage and report on Problems, including their links into Change / Release Management and Knowledge Management.
For some organisations, not understanding the cost of poor Problem Management is a major factor in why it is overlooked. By analysing the frequency of Problems and looking at Incident root causes, you will not only be able to find areas for improvement in your Problem Management process but can also put financial numbers to the Problems that have caused user downtime or Major Incidents. This, in turn, justifies the time and resources required and demonstrates a clear ROI.
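As a rough illustration of putting financial numbers to Problems, the sketch below groups Incident records by root cause and estimates the cost of the downtime each one caused. The record fields, the sample data and the cost per user-hour are all hypothetical; a real export from your service desk tool will look different, and your finance team may prefer a different cost model.

```python
from collections import defaultdict

# Hypothetical figure: average cost of one hour of downtime for one user.
COST_PER_USER_HOUR = 40.0

# Hypothetical Incident records, e.g. exported from a service desk tool.
incidents = [
    {"root_cause": "mail server disk full", "users_affected": 50, "downtime_hours": 2.0},
    {"root_cause": "mail server disk full", "users_affected": 50, "downtime_hours": 1.5},
    {"root_cause": "expired VPN certificate", "users_affected": 10, "downtime_hours": 3.0},
]


def cost_by_root_cause(records, cost_per_user_hour):
    """Return {root_cause: (incident_count, estimated_cost)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for rec in records:
        entry = totals[rec["root_cause"]]
        entry[0] += 1  # how often this root cause produced an Incident
        entry[1] += rec["users_affected"] * rec["downtime_hours"] * cost_per_user_hour
    return {cause: (count, cost) for cause, (count, cost) in totals.items()}


summary = cost_by_root_cause(incidents, COST_PER_USER_HOUR)

# List the most expensive root causes first.
for cause, (count, cost) in sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{cause}: {count} Incidents, estimated cost {cost:,.2f}")
```

Sorting by estimated cost rather than by Incident count is a deliberate choice here: a root cause that strikes rarely but affects many users can still be the most expensive Problem to leave unresolved.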
Analysts must follow strict categorisation of Incidents in your service desk system. All too often, Incidents are mislabelled. This causes issues when proactively looking for trends that could lead to a Major Incident, and makes retrospective root-cause analysis equally difficult.
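To show why consistent categorisation matters for trend spotting, here is a minimal sketch that counts recent Incidents per category and flags any category that recurs often enough to deserve a Problem record. The ticket fields, category names, 30-day window and threshold of three are assumptions made for the example, not values taken from any particular service desk product.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical tuning values; adjust to your own environment.
WINDOW_DAYS = 30
RECURRENCE_THRESHOLD = 3

# Hypothetical tickets with the category assigned by the analyst.
tickets = [
    {"category": "Email/Delivery", "opened": date(2024, 1, 2)},
    {"category": "Email/Delivery", "opened": date(2024, 3, 1)},
    {"category": "Printing/Queue", "opened": date(2024, 3, 5)},
    {"category": "Email/Delivery", "opened": date(2024, 3, 10)},
    {"category": "Email/Delivery", "opened": date(2024, 3, 20)},
]


def recurring_categories(records, today, window_days, threshold):
    """Return {category: count} for categories seen at least `threshold`
    times within the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    counts = Counter(r["category"] for r in records if r["opened"] >= cutoff)
    return {cat: n for cat, n in counts.items() if n >= threshold}


flagged = recurring_categories(
    tickets, date(2024, 3, 25), WINDOW_DAYS, RECURRENCE_THRESHOLD
)
print(flagged)
```

A check like this only works if the labels are reliable: if one of the three recent email Incidents had been filed under a generic category, the count would fall below the threshold and the trend would go unnoticed.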
All too often, Incidents find themselves ignored in an ‘Incident hospice’. A lack of discipline among analysts can lead to Incidents being excused because they are assumed to belong to a bigger Problem – which itself never gets fixed. Aside from instilling the discipline to act upon recurring Problems, it is important to give analysts the right tools to label tickets so that they do get flagged and recurring issues have a chance to be cured. Once again, ownership is key.
Implementing and sticking to a solid Problem Management process will save your organisation time and money in the long run. In our next piece, we will talk about Incident Management. I hope that the tips above give you ideas on how to reduce the number or frequency of Major Incidents in the first place, but being prepared for Incidents is just as crucial, no matter how robust your Problem Management process.