What is a Safety Incident?

posted 22 Jul 2012, 11:39 by Micha @SARMobile   [ updated 1 Dec 2013, 16:45 by Support SarMobile ]
safety: noun
  1. The condition of being protected from or unlikely to cause danger, risk or injury
  2. Denoting something designed to prevent injury or damage
So a safety program is a program designed to prevent injury or damage. It does this by promoting activities and behaviours that reduce the likelihood of danger, risk or injury, and by discouraging those that increase it. One of the first things we must do, then, is categorize activities and behaviours according to their impact on safety. In aviation this is not as easy as one might first expect. Obviously, when we see news footage of the broken and smoking wreckage of an airplane crash, we know something has happened to negatively affect safety. Unfortunately it is too late for the victims of that crash. To be effective, a safety program must identify behaviours and activities before injury and damage result, and ideally before there is danger or risk.

This can be difficult, especially with the tools in common use by aviation safety analysts today, particularly as used by those who don't invest the time to learn how to use them. One example is the Swiss Cheese model, published by James T. Reason of the University of Manchester in 1990. Many people focus on the imagery of the cheese slices as a metaphor for an organization's defensive safety barriers, neglecting the other, more salient aspects of the model. For example, an image from the ASTRA Project makes its author's point very well, but it is not really the model presented by Reason, which has three layers of latent error followed by one layer of active errors.

For this reason I use the concept of risk surface when advising clients how to decide if a particular event is implicated in safety or not. The shape of the risk surface is determined by factors that could be considered contributory or direct causes of an accident, had one occurred. If two incidents or events have congruent risk surfaces, and one is implicated in safety, then the other must also be implicated in safety. A case study may help here.

Risk Surface Congruence - A Case Study

I was searching the internet for information on a program of interest to members of the SARMobile team, myself included, when I came upon the web site of a volunteer air search and rescue group. Their web site contains a great deal of information, obviously intended for members, but available to anyone. This shows a degree of openness which I find refreshing, and is a touchstone of a good safety culture.

About a year ago they had an incident which was unambiguously implicated in safety. One of their search aircraft had a loss of separation (a near miss) with an aircraft not involved with the group activity, in the vicinity of an aerodrome listed in the national Aeronautical Information Publication (AIP). I don't have all the details, but reading between the lines it appears that the group was conducting search training near the aerodrome. In most jurisdictions aircraft may not operate in the vicinity of an aerodrome, at altitudes set aside for aircraft arriving at or departing from that aerodrome, unless the intention is to land. An investigation was conducted, meetings were held and new protective policies were promulgated.

Then just last month, during a debriefing, feedback indicated that the communications frequency in use for an aerodrome not listed in the AIP was not presented in the pre-operations briefing. Seeing this, I dashed off a quick email to their published contact address with a recommendation of one way to implement the solution they proposed for this safety issue. I did not expect to receive a reply, but imagined that any reply would be along the lines of "thanks for the suggestion". Instead, the reply I did receive sought to assure me that no safety issues arose from this incident. Actually the wording was strange under the circumstances, but very telling as well. But more on that later. For now let's confine ourselves to the question: is the second event implicated in safety or not?

Let's have a closer look at the shape of the risk surface. First, whether or not the aerodromes are listed is immaterial. The pertinent regulations do not make that distinction. This is wise, since the laws of physics will not be abated in the event that a mid-air collision happens near an unlisted aerodrome. So the new protective policy enacted in response, that aircraft would avoid aerodrome airspace listed in the AIP, was doomed to be ineffective. In the first incident a loss of separation apparently occurred because a pilot did not act appropriately in the vicinity of an aerodrome. There could be many factors that contributed to this incident. A failure to brief the location or other data of this aerodrome with respect to the training task the aircraft was flying would certainly be contributory. In the second incident important information about an unlisted aerodrome was not briefed. If the second event is seen as a safety incident, then the group has an opportunity to improve its protective policy. But since the group believes "no safety issues arose" from this event, its barriers remain porous. These two events have congruent risk surfaces because they both involve search training operations in the vicinity of an aerodrome. Over time other events with more or less similar risk surfaces will occur, until a sufficiently alarming outcome refocuses the attention of the safety program on the issue. We are left to hope that the alarm does not come in the form of injury or death.
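The congruence argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a formal definition of a risk surface: the factor labels, the `Event` class, the `congruent` test (at least two shared factors) and the third, unrelated event are all hypothetical and chosen for this example. The sketch flags the known safety incident, then keeps flagging any event whose factors overlap enough with an already flagged one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    factors: frozenset            # hypothetical labels for contributory factors
    safety_implicated: bool = False


def congruent(a, b, shared_needed=2):
    """Assumed test: events are congruent if they share enough factors."""
    return len(a.factors & b.factors) >= shared_needed


def implicated(events):
    """Return the names of events implicated in safety, directly or by congruence."""
    flagged = {e.name for e in events if e.safety_implicated}
    changed = True
    while changed:                # repeat until no new event gets flagged
        changed = False
        for e in events:
            if e.name in flagged:
                continue
            if any(o.name in flagged and congruent(e, o) for o in events):
                flagged.add(e.name)
                changed = True
    return flagged


near_miss = Event(
    "loss of separation",
    frozenset({"search training", "vicinity of aerodrome", "near miss"}),
    safety_implicated=True,
)
unbriefed = Event(
    "unbriefed frequency",
    frozenset({"search training", "vicinity of aerodrome", "briefing gap"}),
)
unrelated = Event("hangar tour", frozenset({"public relations"}))  # made-up contrast case

result = implicated([near_miss, unbriefed, unrelated])
```

With these assumed factors, `result` contains both the loss of separation and the unbriefed frequency, and leaves out the unrelated event. The threshold of two shared factors is arbitrary; in practice deciding which factors matter is the analyst's judgment, not something the code can supply.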

I mentioned earlier that the wording of the reply I received took me aback. So much so, in fact, that I did a little digging, the same thing I would do before taking on a new client. I searched news reports, publicly accessible government databases, the group's publicly published information and the like. There are limits to what I can learn this way, but it can be surprisingly informative. I was looking for indications of other activities with risk surface shapes similar to the two I have already covered here, except that I went up one more level of abstraction. I wasn't looking for incidents involving operations near aerodromes. Since the first incident likely involved an infringement of regulations, I was looking for events that likewise involved transgressions. I very quickly found two.

The person who replied to me appears to be the ultimate owner (through a numbered company) of an airplane that was involved in a hand starting that resulted in substantial damage to two aircraft and minor injuries to one person. Applicable regulations require that when an aircraft is started, either a person qualified to operate the aircraft is situated at the controls ready to act, or the aircraft is restrained so that it cannot get out of control. According to accident investigation records, neither of these preventative measures was taken. I was not able to establish a definitive link between the aircraft, pilot and the group in question, but the proximity to the group makes this an item of interest that would be on my list for investigation if I were taking the group on as a client.

The second incident seems to have been a case of continued operation under Visual Flight Rules into Instrument Meteorological Conditions. Pilots are licensed to fly in two broad categories of flight depending on weather conditions, often referred to as visual flight and instrument flight. The regulations that apply in each case are called Visual Flight Rules (VFR) and Instrument Flight Rules (IFR), and the weather conditions Visual Meteorological Conditions (VMC) and Instrument Meteorological Conditions (IMC) respectively. So, when a pilot who is only authorized to fly under Visual Flight Rules operates an aircraft into an area of Instrument Meteorological Conditions, it is called continued VFR into IMC. This is a very dangerous activity, often resulting, as in this case, in fatalities and destruction of aircraft. This particular accident does not directly involve the group I am talking about, but another group that falls under the same overarching national body. So again the proximity makes this an item of interest.

This is as far as the information I can gather from public sources can take me. What follows is a description of what I would do if I had access to the people and records of the group, not a description of deficiencies I have found. However, if you notice a similarity between what I describe below and your group, you should take a good long look at your safety program before someone like me or a government agency does that for you.

We now have three incidents involving the breaking of a regulation that resulted in either death, destruction of property or a close call. What do we do with this? If I were advising this group on their safety program, I would conduct interviews with the members to see whether these are isolated events or not. It is unlikely that they would be isolated. It is quite rare for a pilot to actually come to grief during the first incident of continued VFR into IMC, or the first hand start, for example. It is reasonable to assume that there have been multiple events similar to these three, or events that involve other regulatory infractions. I am also always interested in the level of authority given to the people involved. To understand why I feel this is important we have to take a short detour through some safety theory.

Human Error

Most accidents involve some sort of human error. Human errors can be either errors of commission or errors of omission. Errors of commission typically involve failure to follow regulations or procedures, taking short cuts or making incorrect assumptions. Errors of omission usually occur during the evolution of an accident: failing to execute the appropriate check list during an engine failure, for example. Human errors, as discussed by Mostia (2003) [1], may be divided into two categories:
  • Errors of Intent
  • Errors of Action
Errors of intent occur when someone exercises authority to intentionally override or ignore a policy, procedure or regulation, or to violate the intent of a policy. Clearly the three events I am interested in are all errors of intent (another area of risk surface congruence). The pilot must intentionally decide to operate according to the search training task rather than the rules governing operations near aerodromes. One might claim not to be aware of an aerodrome, but then one is not following the regulation requiring the pilot to be familiar with all data pertinent to the flight. Aviation can be a harsh mistress. A pilot may encounter IMC accidentally at night, but continuing on is an intentional error of omission. The appropriate action for a VFR pilot encountering IMC is to turn about and return to where visual conditions were available. Finally, there is no possible excuse for not following regulations for starting aircraft. Errors of intent usually occur when the person with authority does not believe in the purpose or rationale behind the policy or regulation, or because they are willing to override the undesirable implications of the policy or regulation.

Returning to James T. Reason, but going beyond the cheese metaphor, we find that systemic errors of intent fall into the category of latent failures. They can lie dormant for long periods of time without contributing to an accident. A safety program may wait for an accident or incident to implicate these systemic errors and then deal with them. Unfortunately that is often too late for the victims of the accident. It is much better to recognize and deal with latent failures when they are only an inconvenience. An organization with a truly effective safety program will spend as much time and effort, if not more, at this stage as it does when it has a close call or a loss. Often the fact that it invests the time and effort earlier in the process means it never has to deal with a close call or a loss.

So, back to our search and rescue group. With Mostia and Reason in mind, my next step would be to examine non-operational areas, human resources and finance for example. If there is a culture of latent failure involving errors of intent, it will often show up in short cutting or abuse of fiscal and personnel policies. Again, using the idea of risk surface congruence, if an infraction under a policy or regulation in operations is a safety issue, then an infraction under a financial or personnel policy must also have safety implications. If nothing else, a culture of subverting non-operational policies and regulations will eventually bleed over into operations.

So, what is a safety incident? Any occurrence that is similar to (has a risk surface congruent to that of) any other safety incident. This will include events that might not seem to be safety incidents at first look. A safety program should be about preventing accidents, not explaining the ones that have already happened. To do that takes constant vigilance and a willingness to see what others dismiss as not safety related.

Safety is no accident.

[1] Mostia, B. September 2003. Avoid Error. Chemical Processing.