Accident investigation and analysis
Introduction
Many organisations and companies take extensive proactive measures to identify, evaluate and reduce occupational risks. However, despite these efforts things still go wrong and unintended events occur. After a major incident or accident, conducting an accident investigation is generally the next step. A thorough accident or incident investigation may uncover a wealth of knowledge about safety management practices in the organisation. Previously overlooked weaknesses of safety management or safety culture may be uncovered, as well as underestimated risks, unknown risks or insufficient control over known risks.
Government agencies may or may not be involved in the investigation and analysis of events. In many countries inspectorates, accident research boards, or criminal prosecutors may investigate some or all of the serious occupational incidents that occur. In this article we focus only on incident investigation conducted (internally) by companies themselves. This article contains an overview of steps in incident investigation and some methods for analysis.
Incidents and accidents
There are many different definitions of what constitutes an incident or accident. In general the focus is on unintended and unforeseen events which have unwanted consequences. In this article we will use the terms incident and incident investigation interchangeably with accident and accident investigation. In addition we focus on the field of occupational health and safety, although we realise that the same investigation techniques may be relevant for the investigation of other types of incidents as well. For a more thorough definition of incidents and accidents see: Reporting and monitoring occupational accidents and diseases in Europe.
Many organisations carry out formal incident investigations. An incident investigation can serve various primary and secondary goals, for example:
· to understand what went wrong (especially the causation and processes that were involved);
· to learn and improve so as to prevent future incidents from occurring;
· to assign blame to whoever is responsible for the event and to punish that company or individual;
· to comply with requirements posed by governing bodies.
From an occupational safety and health (OSH) management perspective the main goals should be to understand the incident and to use that knowledge to improve (to learn) in order to prevent similar incidents from occurring in the future.[1] In addition, an investigation is often an important part of transparency, as it enables the organisation to explain to important stakeholders what happened. It is recommended that the investigation always focus explicitly on understanding, learning and improving. Blame and fear of retribution can seriously complicate any efforts to uncover what really caused an unwanted event. For more on learning from incidents, see: Learning from incidents and accidents. An investigation involves a number of steps, which are described in this article.
There is great variation in the extent, scope and type of incident investigations that are conducted. There is also a great diversity in the methodologies that investigators use to analyse, depict and explain their findings. The remainder of this article is divided into two sections:
· the first section discusses the investigation of an incident. The focus in this section is on preparing for incident investigation and gathering information after it occurs;
· the second section is focused on the analysis of an incident’s causes. In this section some relevant methods and models are discussed which can help an investigator conduct the analysis in a structured manner.
Investigating incidents
In the investigation phase the focus is on gathering information to determine exactly what events occurred before, during and immediately after the incident. Relevant steps include: preparing for an investigation, choosing what to investigate, securing the scene of the accident and gathering information.
The period directly following the occurrence of a serious incident is often a time of great turmoil, confusion and intense emotions. The direct consequences of the event need to be mitigated and it may be necessary to contact internal and external stakeholders, such as emergency services, co-workers, the relatives of the victims, the press and government officials. When a serious accident occurs in the workplace, everyone will be too busy dealing with the emergency at hand to worry about putting together an investigation plan. Therefore it is generally better to develop an effective accident investigation plan before an accident occurs.
These are some aspects that the plan should cover:
· how and when the decision is made to secure the scene of an incident;
· who decides whether an incident needs to be investigated;
· who is responsible for conducting investigations (investigation leader);
· who is assigned to conduct investigations (investigation team);
· how information about the incident will be stored and managed;
· the (mix of) competencies required for accident investigators;
· whether outside assistance should be hired and if so under which conditions;
· the internal reporting lines – who gives orders, receives and acts on investigation reports;
· what equipment is needed to conduct an investigation and where it should be kept (investigation kit);
· the methods that will be used to analyse and report findings.
The investigation plan should be updated to account for experience gained. Some companies also employ personnel who are formally trained in conducting incident investigations.
The initial question is which incidents need to be investigated, and in what way. Often the severity of an incident’s consequences is the main criterion used to determine whether an investigation is initiated. However, focusing only on the severity of an incident should be considered incomplete. An alternative approach is to look not only at the severity of an incident but to also use additional criteria. Some possible reasons for choosing to investigate incidents are, for example:
1. High actual severity of the incident’s consequences such as loss of life, loss of containment or extensive (property) damage;
2. A legal or procedural requirement to do so;
3. The occurrence of similar incidents earlier in time, within the same organisation or sector;
4. Incidents with limited actual consequences but with a high potential for serious consequences;
5. Near misses when an incident is just barely avoided.
The investigation of near misses and incidents with limited actual consequences allows the company to identify and control unforeseen hazards or inadequate control measures before they cause a more serious incident. In this way regular incident investigation can help to improve safety performance continuously.
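The selection criteria listed above can be sketched as a simple screening function. This is only an illustration: the `IncidentReport` fields, the 0 to 5 severity scale and the threshold of 4 are assumptions made for the example, not part of any standard, and an organisation would define its own scales and rules in its investigation plan.

```python
from dataclasses import dataclass


@dataclass
class IncidentReport:
    """Hypothetical record of a reported event (all field names are assumptions)."""
    description: str
    actual_severity: int        # assumed scale: 0 (no harm) to 5 (fatality or major loss)
    potential_severity: int     # same assumed scale, for the worst credible outcome
    legally_required: bool = False
    similar_prior_incidents: int = 0
    near_miss: bool = False


def reasons_to_investigate(report: IncidentReport, severity_threshold: int = 4) -> list[str]:
    """Return every criterion from the list above that the report meets."""
    reasons = []
    if report.actual_severity >= severity_threshold:
        reasons.append("high actual severity")
    if report.legally_required:
        reasons.append("legal or procedural requirement")
    if report.similar_prior_incidents > 0:
        reasons.append("similar incidents occurred earlier")
    if (report.actual_severity < severity_threshold
            and report.potential_severity >= severity_threshold):
        reasons.append("limited consequences but high potential severity")
    if report.near_miss:
        reasons.append("near miss")
    return reasons


# Hypothetical example: nobody was hurt, but the outcome could have been fatal.
report = IncidentReport(
    description="Dropped load landed in an empty walkway",
    actual_severity=0,
    potential_severity=5,
    near_miss=True,
)
print(reasons_to_investigate(report))
```

An empty list means that none of the criteria were met. Returning the reasons rather than a plain yes or no keeps the basis for the decision visible to whoever decides whether to investigate.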
When a serious incident occurs, the emergency response is clearly the most important first step to take. However, when the decision is made to conduct an investigation, the investigation should be commenced quickly as well. One of the first and most important actions to take is to secure the location of the incident. The decision to secure the scene of the incident may best be made by local (management) personnel. Securing the scene may be done even before the formal decision to investigate is taken.
Securing the scene of the incident is important to make sure that (material) evidence is not moved or removed, which can easily happen after an incident. Securing the scene of the incident quickly is a great help for the investigation team in the fact-finding phase. If the incident is quite serious, the public authorities may also order that the scene of the incident be secured.
The next step is to gather useful information about what contributed to the accident directly and indirectly. In this phase the investigation team should be present as planned in the investigation plan.
Some steps they may take are:
· Create a thorough and complete overview of the positions and directions of all relevant physical objects and circumstances at the location of the incident, for example by using photographs, physical samples, position measurements and (air) samples, or by retrieving electronic data such as photographs already taken.
· Access any databases that contain data on technical systems (e.g. process control systems).
· Interview eyewitnesses as soon as possible after the accident.
· Retrieve and review documents related to the incident, such as maintenance records, work procedures, training procedures, training records, permits to work or relevant audit reports.
· Conduct additional interviews with involved personnel to explore the organisational factors relevant for the incident (e.g. usual ways of working, adherence to relevant safety procedures in everyday practice).
There is no prescribed order or sequence in which to gather information. In general one should attempt to retrieve the most vulnerable information first. If it is not yet clear whether some information will eventually be needed to conduct the analysis, the general advice is to retrieve and include the information anyway and evaluate its relevance later. Eyewitness statements in particular can be very vulnerable: recollections can, for example, be altered by discussions with co-workers. Interviews with eyewitnesses should thus be given high priority.
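The advice to retrieve the most vulnerable information first can be expressed as a simple ordering. The sketch below is illustrative only: the `EvidenceSource` type, the 1 to 5 vulnerability scale and the ratings given to each source are assumptions made for the example, and in practice the investigation team would judge vulnerability case by case.

```python
from dataclasses import dataclass


@dataclass
class EvidenceSource:
    """Hypothetical description of one source of information."""
    name: str
    vulnerability: int  # assumed scale: 1 (stable) to 5 (degrades within hours)


def collection_order(sources: list[EvidenceSource]) -> list[str]:
    """Return source names with the most vulnerable first.

    sorted() is stable, so sources with equal ratings keep their input order.
    """
    ranked = sorted(sources, key=lambda s: s.vulnerability, reverse=True)
    return [s.name for s in ranked]


# Ratings below are made up for illustration.
sources = [
    EvidenceSource("Maintenance records", 1),
    EvidenceSource("Positions of objects at the scene", 4),
    EvidenceSource("Eyewitness interviews", 5),
    EvidenceSource("Process control system data", 3),
]

for position, name in enumerate(collection_order(sources), start=1):
    print(position, name)
```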
Analysing accidents and incidents
After information has been gathered, the next step is to conduct the actual analysis and to determine the cause(s) of the incident. In many cases, the gathering of information and the analysis of the incident will need to take place as an iterative process. Starting the analysis may, for example, clarify the need to gather additional information via new interviews.
The first step in conducting an analysis will generally be the development of a detailed timeline. The goal of using a timeline during the investigation is to structure events and actions in time accurately. Doing this helps the investigator to evaluate potential causal pathways. After the timeline is (partly) developed there are many models and methods available that can help the investigator to analyse, structure and communicate the findings.
The information gathered can be used to develop a detailed step-by-step description of the incident and all the events that led up to the incident. The investigator should focus on making an overview of all the (potentially) relevant events, actions taken by people, state of equipment etc. A technique that can be used to depict a basic timeline is a Sequential Timed Event Plot, also known as a STEP diagram. Events, activities, system states etc. can thus be organised into a single overview.
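A basic timeline of this kind can be sketched as a small data structure. STEP diagrams are commonly drawn with one row per actor (a person, machine or system) and events placed along a shared time axis; the sketch below follows that layout in text form. The `Event` type, the helper functions and the example events are all hypothetical and serve only as illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One entry on the timeline: what happened, not why."""
    time: datetime
    actor: str    # person, machine or system involved
    action: str


def build_timeline(events: list[Event]) -> dict[str, list[Event]]:
    """Group events into one row per actor, each row in chronological order."""
    rows: dict[str, list[Event]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e.time):
        rows[event.actor].append(event)
    return dict(rows)


def render(rows: dict[str, list[Event]]) -> str:
    """Render each actor's row as a single line of timed events."""
    lines = []
    for actor, actor_events in rows.items():
        cells = [f"{e.time:%H:%M} {e.action}" for e in actor_events]
        lines.append(f"{actor}: " + " -> ".join(cells))
    return "\n".join(lines)


# Hypothetical events, entered in the order they were discovered.
day = dict(year=2024, month=1, day=15)
events = [
    Event(datetime(hour=9, minute=5, **day), "Operator", "starts pump"),
    Event(datetime(hour=9, minute=2, **day), "Valve V1", "is in closed position"),
    Event(datetime(hour=9, minute=7, **day), "Pump", "pressure alarm sounds"),
    Event(datetime(hour=9, minute=8, **day), "Operator", "stops pump"),
]

rows = build_timeline(events)
print(render(rows))
```

Because events are sorted before grouping, new findings can be added in any order as the investigation proceeds and the timeline is simply rebuilt.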
The timeline should be focused mainly on ‘what’ happened (the events) and less on why things happened (the causes). This is because there may be multiple (interacting) causes for any event on the timeline and causes may not be close together in time or place. In order to represent the causes of an incident other techniques are likely to be more appropriate.
An investigation should aim to uncover not only immediate causes but also underlying or root causes. The rationale is that the same underlying or root causes recur in many incidents, while specific direct causes may occur only once. Different incident investigation tools and methods are available to help investigators uncover the underlying causes. These methods are often strongly related to underlying models of causation. Hollnagel (2004) distinguishes three types of analysis method: simple sequential accident methods, epidemiological accident methods and systemic methods.
The simplest accident methods describe the accident as the end result of a sequence of events that occur in a specific order. In a sequential accident model, the accident is an unexpected, unintentional event leading to an unwanted outcome, caused by one or more preceding events. These models assume clear causality and identifiable cause-effect links. An example of this type of method is a basic event chain model. The well-known domino-effect model (Heinrich, 1931) is also related to sequential analysis methods.
Epidemiological methods posit that an accident is the result of multiple factors which are present simultaneously, some of them clear and others more hidden (latent). These models focus more on performance deviations than on unsafe acts, and they consider the environmental conditions that could lead to a performance deviation, the barriers that could have prevented the accident and the dormant latent conditions that may have been present for a long time before contributing to the occurrence. The Swiss-cheese model of incident causation first proposed by Reason (1990, 1997) can be seen as an example of an epidemiological model of incident causation.
Systemic methods try to describe the characteristic performance of the system as a whole, rather than through a structural decomposition of the system. In this view accidents are emergent phenomena of the normal functioning of a system. In this sense accidents are normal, natural and to be expected. In addition, systemic models attempt to account for so-called non-linear effects, whereby a small input into the system (‘cause’) can have a much larger, amplified effect later. Systemic models have given rise to their own methods for incident investigation, such as: Accimap, FRAM, STAMP and Extended.
The choice of model and method can have consequences for the type of factors that are uncovered in an analysis, and this may result in different conclusions and recommendations. One important difference between methods may be the extent to which they allow and help the investigator to ‘dig deep’ and to find underlying or ‘root’ causes of incidents. Other factors to consider are:
· the complexity of the method;
· ease of use of the method;
· amount of training required for using the method;
· level of use of the method (in the sector);
· the presence of a professional community;
· quality of the method for communicating results;
· maturity of the analysis method.
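One way to make such a comparison explicit is a simple weighted scoring of candidate methods. The sketch below is purely illustrative: the selection of criteria, the weights, the 1 to 5 ratings and the placeholder names "Method A" and "Method B" are all assumptions made up for the example, and they say nothing about any real analysis method. Each organisation would have to choose its own criteria and weights.

```python
# Subset of the factors discussed above; weights are made-up illustrations.
WEIGHTS = {
    "depth_of_analysis": 3,      # how well the method helps to find underlying causes
    "ease_of_use": 2,
    "training_required": 1,      # rated so that a higher value means less training
    "communication_quality": 2,
    "maturity": 1,
}


def weighted_score(ratings: dict[str, int], weights: dict[str, int] = WEIGHTS) -> float:
    """Weighted average of 1-5 ratings; raises KeyError if a criterion is unrated."""
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


# Hypothetical ratings for two placeholder methods.
candidates = {
    "Method A": {"depth_of_analysis": 2, "ease_of_use": 4, "training_required": 4,
                 "communication_quality": 3, "maturity": 5},
    "Method B": {"depth_of_analysis": 5, "ease_of_use": 2, "training_required": 2,
                 "communication_quality": 5, "maturity": 3},
}

ranked = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

The outcome depends entirely on the weights chosen, so a score like this is best used to structure a discussion rather than to settle it.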
There is no general rule for which analysis model is most suitable for any particular enterprise. A full comparison of all the available methods is beyond the scope of this article. There are, however, several sources that describe and compare incident investigation methods, some of which are freely available online. In practice an investigator may do best to combine multiple techniques, such as an event chain or timeline with an epidemiological or systemic method (e.g. Tripod Beta, STAMP). Apart from the type of method used, there are likely to be many other factors that determine the quality and depth of an investigation, for example the training and experience of the investigators and the available resources (see also the discussion above on preparing for an investigation). At the moment there is little empirical evidence on which factors are most important for the development of a good (in-depth) incident analysis.
Summary
For many organisations, incident investigation and analysis is an important step in efforts to learn from mistakes and to improve occupational safety and health. Organisations investigate incidents in order to understand not only what went wrong, but also why it went wrong and what can be done to prevent recurrence. From an Occupational Safety and Health (OSH) management perspective the main goal should be to understand the causes of the incident. Blame and fear can seriously complicate any efforts to uncover what really caused an unwanted event. The period directly following the occurrence of a serious incident is often a time of great confusion. As a result the best time to develop an effective incident investigation procedure is before an incident occurs. Each organisation should consider which criteria determine whether incidents arising from its operations are investigated. Ideally this decision should focus not only on the seriousness of the event (outcomes) but also on the potential for learning from the event. Once an incident has occurred, there are many different methods available which can be used to conduct an analysis. The best safety improvements may be expected when the method selected allows the investigator to dig deep and to uncover ‘root causes’ or ‘underlying causes’ rather than just the immediately apparent causes.