Causes of Incidents and Accidents
INTRODUCTION
Aviation safety experts have realized for some time that aircraft incidents and accidents almost always result from a series of events, each of which is associated with one or more cause factors. Thus, the cause of an accident or incident has many aspects. Some internationally accepted definitions in the context of the investigation of an aircraft accident or incident are listed below (ICAO, 1994):
· Causes are actions, omissions, events, conditions, or a combination thereof, that lead to an accident or incident.
· Accidents are occurrences associated with the operation of aircraft, from the time any person boards an aircraft with the intention of flight until the time all persons have disembarked, that result in one or more of the following:
✓ A person is fatally or seriously injured.
✓ The aircraft sustains damage or structural failure that adversely affects the structural strength, performance, or flight characteristics of the aircraft and would normally require major repair or replacement of the affected component.
✓ The aircraft is missing or completely inaccessible.
· Incidents are occurrences, other than accidents, associated with the operation of aircraft that affect or could affect the safety of operation.
The definition of cause given above takes into account the many events involved in an accident or incident. These events can be viewed as links in a chain. Investigations of some hull loss accidents in the United States have revealed as many as 20 links in the chain; the average is just under 4 links.1 For example, after an exhaustive technical and legal investigation into one controlled flight into terrain (CFIT) accident, an official commission concluded that at least 10 essential cause factors were involved.2 If any one of these 10 cause factors had not been present, or if some of the factors had occurred in a different order, the accident would not have happened. The most effective accident prevention strategy must take into account all the links in the chain of events that lead to incidents and accidents.
1. Accident data in this chapter are primarily related to hull loss accidents. For the purposes of this study, data from all accidents and from fatal accidents are not significantly different from data from hull loss accidents in terms of causes and trends in the accident rate.
2. A CFIT accident occurs when a mechanically sound aircraft collides with the ground, typically because the flight crew loses situational awareness and does not understand the flight path of the aircraft relative to the ground. "Loss of control" accidents include collisions caused by engine failure, icing, stalls, or other circumstances that interfere with the ability of the flight crew to direct the motion of the aircraft.
Subdividing an incident or accident into a chain of events reveals important information. If one more element is added to the chain in an incident, for example, the consequences of the incident might be much more serious, even resulting in an accident. Conversely, removing one link in the accident chain could substantially mitigate the consequences or, possibly, prevent all adverse consequences. In other words, from a safety management viewpoint the only meaningful difference between many incidents and accidents is the consequences. For example, an aircraft may experience several abnormalities involving equipment malfunction, unexpected adverse weather conditions, and loss of situational awareness by the flight crew. As a result, the aircraft may take longer than expected to slow down after landing. If the aircraft happens to be landing at an airport with runways of the minimum required length with water hazards at the end, there could be a catastrophe. The resulting investigation might lead to a comprehensive review of procedures and systems related to approach and landing. If the same sequence of events happened at an airport with runways of the minimum required length but with a grassy field at the end, the aircraft might run off the end of the runway and experience minor damage and no crew or passenger injuries. In that case, there would be a review of the incident within the aviation community but little public notice. But if the same sequence of events happened at an airport with runways that were significantly longer than required for that aircraft to land, the aircraft might still stop well short of the end of the runway. In that case, depending upon the attentiveness of air traffic controllers and the inclination of the flight crew, a report might not be filed, and there might be no examination of how to prevent a similar series of events from happening in the future. In that case, the opportunity to take proactive safety measures before an accident happens would be lost. The challenge of aircraft safety management is identifying and focusing attention on truly hazardous conditions before a potential accident becomes a reality. In the example described above, routine use of flight recorders or quick access recorders (QARs) to monitor stopping distance would provide operators with an independent means of detecting potentially hazardous abnormalities.
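The idea of monitoring stopping distance from recorder data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Landing` record, its field names, the 15 percent margin, and the sample values are all hypothetical and do not come from any actual QAR format or operator program.

```python
from dataclasses import dataclass


@dataclass
class Landing:
    flight_id: str            # hypothetical identifier
    expected_stop_m: float    # hypothetical: predicted stopping distance, meters
    actual_stop_m: float      # hypothetical: distance derived from recorder data


def flag_long_stops(landings, margin=1.15):
    """Return landings whose actual stopping distance exceeds
    the expected distance by more than the given margin."""
    return [x for x in landings if x.actual_stop_m > x.expected_stop_m * margin]


# Made-up records for illustration only.
landings = [
    Landing("A1", expected_stop_m=1500.0, actual_stop_m=1550.0),
    Landing("B2", expected_stop_m=1500.0, actual_stop_m=1900.0),
]
flagged = flag_long_stops(landings)
print([x.flight_id for x in flagged])  # ['B2']
```

A check of this kind flags the long stop whether or not the runway happened to be long enough, which is the case in which no report might otherwise be filed.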
In 1996, accidents involving jet transport airplanes occurred in the United States at the rate of about 1.46 accidents per million flights (or one accident for every 685,000 flights). Except for a few well-publicized tragedies, most passengers involved in commercial aircraft accidents are not killed or injured. However, as shown in Figure 3-1, the rate for hull loss accidents involving large jet transports has improved only slightly in the last 20 years. The same is true for fatal accidents. Accident rates can be computed in terms of accidents per passenger-trip, accidents per passenger-mile, or accidents per passenger-flight hour. Depending upon the method chosen, the accident rates for Part 135 operators, who commonly use turboprop aircraft, are about three to eight times higher than the accident rates for Part 121 operators, who operate most of the large jet transports (FAA, 1994). Rates for Part 135 operators are declining, but slowly. Regulatory changes to establish more uniform safety standards for Part 121 and 135 operators are intended to address the disparity in accident rates.
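The two ways of stating the 1996 rate are reciprocals of each other. The short Python sketch below checks the conversion; the function name is an illustrative choice, and only the figure of 1.46 accidents per million flights comes from the text.

```python
def flights_per_accident(rate_per_million: float) -> float:
    """Convert an accident rate per million flights into flights per accident."""
    if rate_per_million <= 0:
        raise ValueError("rate must be positive")
    return 1_000_000 / rate_per_million


flights = flights_per_accident(1.46)
print(f"{flights:,.0f} flights per accident")  # roughly 685,000
```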
Analyses of the chains of events in accidents are generally useful just for preventing similar accidents. Because there are so few accidents in the United States relative to the number of flights, focusing safety programs on accidents alone addresses only a small fraction of potential accidents and is reactive rather than proactive. A proactive approach that could eliminate risks before they cause accidents requires an effective means of tracking the chains of events in both incidents and accidents. Preventive action (not just remedial action) could then be taken—based on how often individual links in the chain recur and their potential for contributing to future incidents and accidents. Every abnormal event in the incident or accident chain could be examined to identify the cause factors that explain why it happened and to describe the underlying problems and deficiencies that should be corrected.
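One way to picture tracking how often individual links recur is a simple tally across event chains. The Python sketch below is a minimal illustration, not a description of any existing safety database: the three chains are invented, and the link labels merely reuse terms from this chapter.

```python
from collections import Counter

# Hypothetical event chains; each inner list is the sequence of links
# recorded for one incident or accident.
chains = [
    ["equipment malfunction", "adverse weather", "loss of situational awareness"],
    ["adverse weather", "loss of situational awareness"],
    ["deficient maintenance", "equipment malfunction"],
]


def link_recurrence(chains):
    """Count in how many chains each link appears (at most once per chain)."""
    counts = Counter()
    for chain in chains:
        counts.update(set(chain))
    return counts


recurrence = link_recurrence(chains)
for link, n in recurrence.most_common():
    print(f"{link}: appears in {n} of {len(chains)} chains")
```

Links that appear in many chains are natural candidates for preventive action, even if none of them was ever labeled the primary cause.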
FIGURE 3-1 Worldwide hull loss accident rates, 1959 through 1996. The data depicted in all Chapter 3 figures cover commercial jet aircraft heavier than 60,000 pounds. Available data on aircraft manufactured by the states of the former Soviet Union are incomplete and are, therefore, excluded. Data on accidents caused by sabotage, hijacking, military action, experimental test flying, or suicide are also excluded.
FIGURE 3-2 "Spinning disk" view of accident and incident events.
Source: Reason Model (Reason, 1990).
One approach for visualizing an incident or accident is as a chain of events that must occur in a certain sequence. Another is shown in Figure 3-2, which shows a system of disks spinning at random. Each disk contains a hole that must line up precisely with the holes in the other disks before a beam of light can pass through the entire system. The probability that the disks will all line up after being spun "independently" is less than if the positions of two or more disks are linked. This is why primary and backup hydraulic systems are physically separated as much as possible—so that a single damaging event will not disable all hydraulic systems. Assuming that the disks represent events in a chain leading to an accident, corrective action to prevent one or more of the events (i.e., filling in the holes) is one approach to preventing other accidents that might involve the same events.
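The effect of linking disks can be illustrated numerically. The Python sketch below rests on simplifying assumptions that are not part of the Reason model itself: each disk has a fixed probability that its hole is in the aligned position, separate groups of disks are independent, and disks within a group are perfectly linked, so the group lines up as often as its least likely member. The value 0.1 is arbitrary.

```python
from math import prod


def alignment_probability(groups):
    """Probability that every hole lines up at once.

    `groups` is a list of lists of per-disk probabilities. Disks in the same
    inner list are assumed to be perfectly linked, so a group contributes only
    the smallest probability among its disks. Groups are assumed independent,
    so their contributions are multiplied.
    """
    return prod(min(group) for group in groups)


p = 0.1  # arbitrary per-disk probability, for illustration only
independent = alignment_probability([[p], [p], [p], [p]])
two_linked = alignment_probability([[p, p], [p], [p]])
print(independent < two_linked)  # True
```

With four independent disks the holes line up about once in 10,000 spins; linking two of them raises that to about once in 1,000, which is the reasoning behind physically separating primary and backup systems.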
When an official investigator reports the "probable causes" of an accident or incident, consideration should be given to all of the events and cause factors. Cause factors can be grouped into the following categories:
· human factors/personnel error
· malfunction or failure of aircraft structures, engines, or other systems
· deficient maintenance
· hazardous environment involving weather, volcanic ash, birds, etc.
· air traffic management errors
· any combination of the above
Identifying the precise cause factors for each event can be complicated, requiring good judgment and accurate interpretation of the facts. There could be more than one cause factor for each event, and some cause factors naturally overlap. Human factors include mistakes caused by voluntary acts, failure to act, and other factors associated with actions or inaction.3
Cause factors associated with aircraft, engines, and systems include deficiencies in the design, manufacture, maintenance, or operation of the aircraft or its systems. Maintenance-related cause factors include improperly performed maintenance and inadequate maintenance procedures and plans. Environmental cause factors include hazardous weather, volcanic ash, sand, dust, and birds. Cause factors associated with air traffic management include deficiencies in weather reporting, regulations, and the air traffic control system (navigational aids; air traffic control directives; and airport facilities, runways, and taxiways).
Combinations of factors and cascading cause-and-effect sequences must be carefully examined to understand all of the cause factors. For example, to prevent accidents caused by system failure, the system that failed could be modified to prevent similar failures in the future. In addition, understanding if the failure was triggered by the failure of some other system, improper maintenance, abnormal operating environment, etc., may suggest additional corrective action.
PRIMARY CAUSES
The term "primary cause," defined as the most critical cause factor associated with a particular incident or accident, can be deceiving and is often subject to interpretation. One cause factor may contribute more to the consequences of an accident or incident than the others, but making this determination may also depend on one's point of view. Take the case of an accident involving an uncontained engine failure that severed all of the aircraft's hydraulic lines.4 The uncontained engine failure disabled the hydraulic systems needed for conventional flight control; the flight crew controlled the aircraft by varying the thrust asymmetrically on the remaining engines until it was just short of the runway on which it was attempting an emergency landing. The aircraft crash-landed, broke apart, and caught fire. The flight crew and some of the passengers survived the accident.
The official investigation found that the original material from which a large rotating part of the engine was fabricated contained a defect that ultimately resulted in a crack. The crack grew over the life of the part until the part finally fractured, resulting in shrapnel damage to the aircraft and its hydraulic systems. The investigation further disclosed that the part had undergone numerous inspections designed to locate defects like the one that ultimately resulted in the part failure. Inspections were made by the part's material manufacturer, the forging manufacturer, and the engine manufacturer, and by the operator during routine maintenance of the engine. The official report on this accident determined that the probable cause was inadequate consideration given to human factors limitations in the inspection and quality control procedures used by the airline's engine overhaul facility. As a result, a fatigue crack originating from a previously undetected metallurgical defect was not detected. The subsequent catastrophic disintegration of the part produced debris with a pattern of distribution and with energy levels that exceeded the level of protection provided by design features of the hydraulic systems that operated the aircraft's flight controls. In this example, one link in the chain was singled out as being more significant than the others, which included the processes used to produce the basic material for the rotating part, numerous inspections designed to detect the defect before it became a crack, and the susceptibility of the aircraft design to damage by the distribution of debris in this particular failure. The committee believes that a safety management program should have an inclusive view of what constitutes a significant cause to ensure that corrective action addresses multiple cause factors and provides multiple assurances that a similar accident or incident will not occur in the future.
3. An expanded list of cause factors appears in Appendix D.
4. In an uncontained engine failure, a piece of the engine, such as a rotor disk, is ejected from the engine. Commercial jet engines are designed to contain blade failures but not disk failures. The danger of disk failures is, therefore, addressed through stringent manufacturing and inspection procedures.
FIGURE 3-3 Primary cause factors for hull loss accidents involving large commercial jet airplanes worldwide.
Source: Boeing, 1997.
CAUSES OF JET TRANSPORT ACCIDENTS
For the purposes of analyzing the most common causes of accidents, the committee reviewed official accident reports and various summaries of those reports. Figures 3-3 and 3-4 show the primary cause factors cited in official reports of accidents resulting in the loss of aircraft. Data from both the U.S. and worldwide fleets of commercial jet airplanes are shown for two periods: since the beginning of the "jet age" (1959 to 1996) and during the last 10 years (1987 to 1996) (Boeing, 1997).
Figure 3-3 shows that there were 50 hull loss accidents worldwide between 1959 and 1996 in which the airplane was a primary cause factor. Figure 3-4 shows that 15 of these accidents took place in the United States. For the 50 accidents worldwide, Figure 3-5 shows the breakdown by aircraft system.
FIGURE 3-4 Primary cause factors for U.S. hull loss accidents involving large U.S.-registered commercial jet airplanes.
Source: Boeing, 1997.
FIGURE 3-5 Airplane system cause factors for hull loss accidents involving large commercial jet airplanes worldwide, 1959 through 1996.
Source: Boeing, 1997.
Although data on primary cause factors are readily available, these data are often misleading when used to identify detailed trends in accident causes. For example, Figure 3-2 might lead one to believe that maintenance is a growing problem. The percentage of maintenance-related accidents during the 10-year period ending with 1996 is more than three times higher than for the entire 38-year period (1959 to 1996). However, when dealing with small numbers, small changes can produce large changes in percentages. In fact, only three accidents during the 10-year period were attributed primarily to maintenance. As already discussed, the attribution of primary causes is sometimes problematic. If the primary cause of two of those accidents had been attributed to one of the other cause factors associated with those accidents, there would have been no percentage increase in maintenance-related accidents.
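The sensitivity of percentages to small counts can be shown with a short calculation. In the Python sketch below, only the count of three maintenance-related accidents comes from the text; the total of 30 accidents for the 10-year period is a hypothetical placeholder chosen for easy arithmetic and is not taken from the Boeing data.

```python
def percent_share(count, total):
    """Percentage of accidents attributed to one primary cause factor."""
    return 100.0 * count / total


TOTAL_10_YEAR = 30  # hypothetical total, NOT taken from the Boeing data

as_reported = percent_share(3, TOTAL_10_YEAR)   # three maintenance accidents
reattributed = percent_share(1, TOTAL_10_YEAR)  # two of them assigned elsewhere

print(f"as reported:  {as_reported:.1f}%")
print(f"reattributed: {reattributed:.1f}%")
```

Under this assumed total, reattributing just two accidents cuts the maintenance share to a third of its reported value, so an apparent trend can appear or vanish on the strength of two judgment calls.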
One manufacturer examined a large number of past accidents and identified all actions that could have broken the chain of events leading up to the accidents. This examination indicated that operators could significantly reduce accident risks by taking the following measures:
· implementing a comprehensive flight operations quality assurance (FOQA) program such as the British Airways Safety Information System (BASIS) to monitor adherence to standard operating procedures and identify operational irregularities that could foreshadow accidents and incidents.5
· establishing training programs that emphasize basic piloting skills, upset recovery techniques, cockpit discipline, the use of standard operating procedures, and crew coordination and crew resource management
An accident prevention strategy that considers all cause factors involved in incidents and accidents—not just primary cause factors—has a greater potential to prevent accidents by eliminating factors that are common in many incidents and accidents. These common factors serve as "traps" that may be easier to identify and eliminate than a unique, extremely rare factor that may be labeled the "primary cause" in a given accident. For example, if a series of accidents appears to be unrelated, corrective action might focus on the specific circumstances of each accident. A comprehensive review, however, might reveal a fundamental deficiency, such as poor pilot training, safety management, or aircraft maintenance, that is common to the entire series of accidents. Identifying and correcting these fundamental deficiencies is important because they can lead to many types of incidents and accidents.
Trend analysis based on reliability, or mean time between failure, could add another dimension to the safety management process. One could theoretically do trend analyses of aircraft components, structures, etc., to keep faulty parts from becoming causal in the chain of events leading up to an incident or accident, and this would enhance safety. However, because of the redundancies built into the design of aircraft structures and systems, the failure of any single component does not pose a threat to continued safe operation. In fact, FAA-approved minimum equipment lists allow aircraft operation with some equipment out of commission. Also, for economic reasons, airlines and manufacturers already use component reliability analyses to keep their aircraft in the air. For at least the timeframe of this study (the next 10 years), the committee believes that a focused effort to determine mean times between failure, which would require collecting and analyzing vast amounts of data, might not identify specific safety trends and would bog down the safety
CAUSES OF JET TRANSPORT INCIDENTS
Data on incidents involving jet transport airplanes provide a slightly different picture. To begin with, many organizations do not have adequate incident reporting systems, and it is very difficult to obtain complete and consistent records of incidents. Accidents tend to be highly visible, consistently reported, and carefully investigated. Incidents, by contrast, cover a broader range of situations and cause factors and are so numerous that available resources in industry and government are insufficient to investigate most reported incidents thoroughly. Whether an incident is reported at all often depends on the initiative of the personnel involved (who may have a conflict of interest if the report is likely to have negative consequences for them). In addition, broadly accepted definitions of what constitutes an incident are imprecise and, in practical settings, are interpreted differently by different organizations and individuals.
Table 3-1 shows data resulting from an examination of 2,032 incidents worldwide that were reported over a 10-year period for aircraft built by a particular manufacturer. The aircraft included in this examination accounted for about one-fourth of the world's large transport airplanes. The reader should keep in mind that manufacturers have a special interest in preventing incidents and accidents associated with system malfunction. Therefore, a jet transport manufacturer's database may be biased toward incidents in which aircraft system performance is involved. Wherever possible, each incident was broken down into a sequence of events. A total of 1,618 events were identified and categorized by the links in the chain of events and their cause factors. Table 3-1 shows the number and percentage of the cause factors associated with each event. Figure 3-6 shows a breakdown of all cause factors for all events by aircraft system. This analysis gave equal emphasis to all factors in the chains of events.
TABLE 3-1 Causes of Aircraft Incidents
SUMMARY
Because accidents are rare, analyses of accident records can provide guidance on broad areas of concern but are inherently incapable of preventing other types of accidents. Incidents are more frequent and are a rich source of safety data, but the quantity of the data is so large that it is difficult to identify meaningful risks and avoid unfruitful diversions. The process is complicated because some accidents are truly unique and may not be indicative of future hazards, whereas some seemingly inconsequential incidents are disasters waiting to happen.
FIGURE 3-6 Airplane-related cause factors in worldwide incidents involving large commercial jet aircraft produced by a particular manufacturer (about 25 percent of the worldwide commercial jet fleet), 1987 through 1996.
Accidents and serious incidents almost always have multiple causes, although many analyses and safety records focus on "primary" causes. This narrow focus diverts attention from other cause factors that were essential links in the chain of events and that should also stimulate corrective action to prevent future accidents. With careful analysis, however, a safety management process can identify accident prevention strategies that eliminate factors ("traps") that recur in many different accidents. Such a process could effectively reduce many different types of accidents by eliminating the cofactors necessary for their occurrence.
Personnel error (human factors) is the most common cause of both incidents and accidents. CFIT and loss-of-control accidents, which almost by definition involve human factors, account for more than half of all fatal accidents. Similarly, inappropriate crew response and fuel exhaustion, which are also essentially human factors problems, are the major contributors to propulsion-related fatal accidents. Although aircraft system malfunctions are involved in a relatively small fraction of aircraft incidents and accidents, improvements in aircraft systems often improve safety by making aircraft more robust—providing flight crews with more accurate information to improve their situational awareness and reducing the likelihood that a human error will result in an incident or accident.
Finding 3-1. Safety management processes that focus on the primary causes of accidents are reactive and are unlikely to address some important cause factors adequately. Data from investigations of accidents and incidents are essential for planning proactive corrective action, which should address all important cause factors.