Problem Management is the teenager in the ITSM household: No one understands it or appreciates it. It’s often left sulking in its bedroom whilst the other more popular activities like Incident and Request and Change are all downstairs laughing, eating dinner, and watching TV together.
As I write this article, I’m thinking of the many people I encounter who are caught up in the ITSM current, not formally trained (and there’s nothing wrong with that), but swept along into a world of ITIL acronyms and words with new meanings. For these good people, a Problem always was, you know, just a problem. Something that needed fixing. A Ticket, needing some work to fix it. Then along came that ITIL/ITSM thing and people started talking about Incidents and Problems as if they were somehow different things.
Now I’m a quiet fan of ITIL even though it’s a bit like the United Nations: big, bureaucratic, with many weaknesses, and some people love to criticize it, but overall it does good and the world would be much worse off without it. Among the many good things ITIL brought us was a formal, agreed, standard set of words, phrases, and definitions for IT. However, the one area of misunderstanding that is repeatedly made worse by ITIL is the use of that word Problem as a distinct and separate item. It’s very confusing to those not deep in ITSM. Now I’m not about to suggest an alternative, but if we are to understand the potential of Problem Management we need to agree on what the words actually mean.
So let’s tackle some definitions. Don’t panic, I’ll make it as gentle as possible, and it’s in my own words and my opinion, without quoting manuals or best practice guides. Sometimes it’s easier that way.
When an end user contacts the Service Desk (or help desk if you prefer) they don’t have a Problem. When IT fixes something that is broken and gets end users/customers working again, they haven’t resolved a problem. Something going wrong is not a problem. Someone being unable to work or having an error or needing help is not a problem. A Major Incident is not a problem (it’s an incident with very high impact). Being unable to get to the Intranet is not a problem. These are all incidents. Incidents are things that stop people working. Resolving them gets people working again.
A Problem is the underlying cause of multiple or repeating or significant incidents. The Problem might be what caused something to go wrong (again), or why someone—and many others—were unable to work, or what led to the major incident that happens every Friday at midnight. An Incident is when someone is unable to connect to the Intranet. The fact the Intranet fails every Monday morning at 9:00 a.m. is a problem.
A Problem is what makes those incidents happen. The purpose of Problem Management is to find the cause, work out what needs to be done to make that cause go away, and stop incidents happening in the future. So Progressing Problems is about identifying an unknown root cause usually of multiple incidents, and identifying the change required to stop those incidents occurring again. The best result of a Problem investigation is a Change Request. But the impact of that Change on IT and the end users can be huge. Changes created from Problem Management can radically alter the work environment.
Solving Problems is about analysis, diagnosis, investigation, and knowledge.
I’m a big advocate of the scientific, skeptical, and evidence-based approach to life. Problem Management is an exciting example of that approach in an IT context. It’s about working through information and identifying trends. Proposing theories and working to produce the evidence to prove the theory. Problem Managers are the scientists of ITSM. And that’s not the most boring of sciences either. A good Problem Manager is the sexy rock star of ITSM, doing IT’s version of building the Large Hadron Collider, innovating and transforming IT knowledge. A good Problem solution will make your IT end users more productive tomorrow.
So Problem Management is research and evidence based. It’s also weird in a sub-atomic particle way in that, no matter how compelling the evidence, a Problem cannot exist until you decide it is a Problem. If you are “happy” that 20% of your end-users have a slow login every Monday morning, then that’s not a Problem because you have decided it is not a Problem. If that slow login is impacting your delivery of IT service and customer satisfaction, then you might decide that it is a Problem after all.
Then, still in a weird way, Problem Management serves no purpose in the present; it’s all about changing the future. The diagnosis of a problem leads to Changes which affect the quality of your IT Service in the future. Work now for a better life tomorrow.
Now let’s summarize: Lots of people misunderstand the meaning of the word Problem. It doesn’t exist unless you say it does, and it only affects the future not the present. No wonder many IT departments feel their resources are better allocated fighting fires and keeping the lights on!
But without scientific research we would have no innovation and improvement in life. Without Problem Management the same can be said of IT. If you will permit me to vent, why would you not want to reduce the number of incidents impacting your end-users? Why would you not want to improve what you do? If you continue to put all your effort into firefighting, you’re eternally putting fires out. But the fire starter is still out there somewhere so the fires will keep occurring.
Now that I’ve explained what I mean by Problem Management and, hopefully, described the value in this work, let’s finish with a few pointers to getting started in Problem Management.
Step 1: Recognize that you have to dedicate some time and resource to working on resolving the root cause of those fires. Make time. Meet with the IT team and explain the objective. Identify one or multiple people who will help make this happen. A journey of a thousand steps starts with the willingness to take one step. Take this one.
Step 2: How can you even see your problems? This one’s a bit tricky. Start with what you’ve already got. You need to take time to analyze the work you have been doing and the incidents you receive over time in order to spot patterns. Some will be really obvious while others are hidden. Your Service Desk data should help you understand your peaks, the most common incidents, and the most frequently impacted services. This leads to an essential question: Are you capturing the right data? I’ve seen many Service Desks designed to capture volumes of data but not extract any meaningful information. Often deciding the right data to capture is the most daunting step. This is why categories are so important, both on creation of incidents and also on resolution and closure.
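To make the idea of spotting patterns a little more concrete, here is a minimal Python sketch. It assumes you can export your incidents as pairs of an opened timestamp and an affected service; the sample records, the timestamp format, and the names `incidents` and `summarize` are all made up for illustration and will differ in your own Service Desk tool.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical export of incidents: (opened timestamp, affected service).
incidents = [
    ("2024-03-04 09:02", "Intranet"),
    ("2024-03-04 09:05", "Intranet"),
    ("2024-03-05 14:30", "Email"),
    ("2024-03-11 09:01", "Intranet"),
    ("2024-03-11 09:10", "Login"),
    ("2024-03-13 11:45", "Email"),
    ("2024-03-18 09:03", "Intranet"),
]

def summarize(records):
    """Count incidents per service and per (weekday, hour) slot."""
    by_service = Counter()
    by_slot = Counter()
    for opened, service in records:
        ts = datetime.strptime(opened, "%Y-%m-%d %H:%M")
        by_service[service] += 1
        by_slot[(WEEKDAYS[ts.weekday()], ts.hour)] += 1
    return by_service, by_slot

by_service, by_slot = summarize(incidents)
print("Most impacted services:", by_service.most_common(3))
print("Busiest weekday/hour slots:", by_slot.most_common(3))
```

A slot that keeps coming out on top week after week, such as the Monday 9:00 a.m. cluster in this invented data, is the kind of pattern worth a closer look as a possible Problem.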
Here’s a handy hint I picked up from one customer: Put one or two simple drop down lists on your incident resolution window to allow the resolver to identify what caused the incident. For example:
1. User Error (indicating education / knowledge / communication)
2. IT Change (indicating IT change procedure weaknesses)
3. Fault (indicating unreliability)
4. Limit of Configuration/Permission/Restriction (indicating a need that extends beyond the available service)
An alternative approach is to add another drop down list to identify, on resolution, what could have stopped the incident happening in the first place. Example:
To stop next time:
1. IT Infrastructure Reliability
2. IT Communication
3. Software stability / fix
4. Software enhancement
5. Training / Knowledge
6. IT service enhancement (offer more, better)
7. Better IT Procedure
Notice that in both cases we are asking the support analyst to analyze and comment beyond just ‘What it Was’ and ‘What I Did’.
Pop this drop down list on your incident resolution window and make it mandatory. Give it a month, then take a look at the data. Soon you’ll be able to see where you could get benefit by focusing your research efforts. Market research typically indicates that a large proportion of incidents are caused by IT change and could be prevented by stronger IT change control. However, every business is different.
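As a rough sketch of what taking a look at the data could mean after that first month, the Python below tallies the values chosen in the cause drop down and shows each as a share of the total. The list `resolved_causes` and the function `cause_breakdown` are hypothetical, and the sample values are invented purely to show the mechanics; they say nothing about the proportions you will find in your own organization.

```python
from collections import Counter

# Hypothetical values picked from the "cause" drop down over one month.
resolved_causes = [
    "IT Change", "User Error", "IT Change", "Fault",
    "IT Change", "Limit of Configuration/Permission/Restriction",
    "User Error", "Fault", "IT Change", "User Error",
]

def cause_breakdown(causes):
    """Return (cause, count, percent) rows, most frequent cause first."""
    total = len(causes)
    if total == 0:
        return []
    counts = Counter(causes)
    return [(cause, n, round(100 * n / total, 1))
            for cause, n in counts.most_common()]

for cause, n, pct in cause_breakdown(resolved_causes):
    print(f"{cause:<50} {n:>3} {pct:>5}%")
```

The cause at the top of the list is a reasonable first candidate for a Problem investigation, although impact matters as much as volume.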
Step 3: How do you diagnose and solve the problems that you decide should be solved? Well, here it’s back to my favorite word: Process. A good Problem Management process, such as the one in LANDesk Service Desk, is simple and easy to understand. It guides the Problem team through Investigation, Diagnosis, and the recommendation of a suitable solution. By making Problem Management another process-driven part of your Service Desk or IT operation, it becomes routine and consistent, and over time you see great benefits.
So, in summary, Problem Management is often misunderstood and frequently neglected. However, it’s well worth the investment of time and effort if you want to actually improve IT business productivity.
What do you think? What experiences have you had with Problem Management? Let me know in the comments below.