Safety, Health, and Environmental (SHE) disasters occur when numerous seemingly unrelated conditions and events coincide in ways that have received little or no serious attention from maintenance and operations staff. Still, executives and government regulators expect that a logical set of rules and procedures, overseen by responsible management, can prevent the worst from happening.
Yet mishaps keep occurring at an alarming rate in large and small organizations. Postmortem investigations invariably demonstrate that avoidable factors precipitated the event or could have predicted it. That experienced operating personnel failed to mitigate the disaster, in spite of extensive standards and procedures, strains the organization’s credibility in the mind of the public. Surely, countless incidents should by now have provided a reliable model for the prevention of catastrophic events.
James Reason[1] models the anatomy of industrial accidents as an unfortunate alignment of organizational influences, unsafe supervision, preconditions for unsafe acts, and the unsafe acts themselves. In this model, an organization’s defenses against failure are a series of barriers, with individual weaknesses in individual parts of the system. Those weak points vary continually in size and position. The system as a whole fails when all individual barrier weaknesses align, permitting “a trajectory of accident opportunity”, so that a hazard passes through all of the holes in all of the defenses, leading to a failure.
Root cause analysis after the fact often exposes a series of foretelling incidents that preceded the main event by minutes, hours, and often weeks or months. The ubiquitous maintenance department finds itself involved in most such “pre-events”, which are recorded as work orders in the Computerized Maintenance Management System (CMMS). Analysis of the work order database would, seemingly, provide ample opportunity for preemptive action to “plug the holes” as they are discovered. However, the typical estrangement between the CMMS and the Reliability-Centered Maintenance (RCM) knowledge base usually precludes recognizing opportunities to improve the existing preventive maintenance strategy. Likewise, the conclusions and maintenance changes recorded in accident follow-up software applications seldom find their way back to the RCM knowledge base, which then loses synchronization with the plan.
ISO 14001:2004 is intended to provide a framework for a holistic, strategic approach to an organization’s environmental policy, plans, and actions by enabling it to:
- Identify and control the environmental impact of its activities, products, or services;
- Improve its environmental performance continually; and
- Implement a systematic approach to setting environmental objectives and targets, to achieving them, and to demonstrating that they have been achieved.
Living RCM (LRCM) fulfills the requirements of ISO 14001 with regard to the human-machine interaction implicated in virtually all man-made disasters. The most difficult and important aspect is contained in the third requirement, “…demonstrating that they [targets] have been achieved”. The RCM analytical technique, applied day-to-day as a “living” process, promotes the routine examination of each maintenance event with respect to the consequences of the observed failure. In Living RCM, work orders are considered to be instances of “knowledge records”. Each record in the RCM knowledge base consists of the basic elements that describe a failure and its causal event (called a “failure mode”). The RCM knowledge elements are the responses to seven questions:
- What system function was compromised?
- In what way? Was it a partial or complete or potential failure?
- Why? What event caused the failure?
- What happened or could have happened (at the component, system, organizational, and societal levels)?
- Why does the failure matter?
- What maintenance can be done to mitigate or avoid the consequences of the failure? If none, then
- Should the failure cause be designed out or should the failure be permitted to occur?
These questions are revisited in the light of current observations at the moment the work order is closed. Requiring the technician, planner, or engineer to link the executed work order to the appropriate knowledge record invokes the seven-question RCM thought process above. As a result, should the facts warrant a clarification or change, the knowledge record will be updated, particularly the answer to Question 4 (“What happened or could have happened?”). Questions 5, 6, and 7 will then be reconsidered. The key advantage of revising knowledge on the fly is its responsiveness to vivid, relevant facts that are still fresh in the minds of all involved.
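As a rough illustration of this closing step, the sketch below models a knowledge record as the answers to the seven questions and links a work order to it at closure, revising any answers the new observations call into question. It is only a minimal sketch under stated assumptions: the names `KnowledgeRecord`, `WorkOrder`, and `close_work_order`, the field names, and the sample pump data are all hypothetical and do not describe any particular CMMS or LRCM product.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class KnowledgeRecord:
    """One failure mode, held as answers to the seven RCM questions (hypothetical layout)."""
    function: str                  # Q1: what system function was compromised?
    failure: str                   # Q2: in what way (partial, complete, potential)?
    failure_mode: str              # Q3: what event caused the failure?
    effects: str                   # Q4: what happened or could have happened?
    consequences: str              # Q5: why does the failure matter?
    task: Optional[str]            # Q6: maintenance that mitigates or avoids it, if any
    default_action: Optional[str]  # Q7: redesign or run to failure, if no task applies


@dataclass
class WorkOrder:
    number: str
    observations: str
    knowledge_record_id: Optional[str] = None
    closed: bool = False


def close_work_order(order: WorkOrder, record_id: str,
                     knowledge_base: Dict[str, KnowledgeRecord],
                     **revised_answers: Optional[str]) -> KnowledgeRecord:
    """Link the work order to a knowledge record and apply any revised answers."""
    if record_id not in knowledge_base:
        raise KeyError(f"no knowledge record {record_id!r}; one must be created first")
    if revised_answers:
        # replace() builds an updated copy and rejects names that are not record fields
        knowledge_base[record_id] = replace(knowledge_base[record_id], **revised_answers)
    order.knowledge_record_id = record_id
    order.closed = True
    return knowledge_base[record_id]


# Hypothetical example data, for illustration only
knowledge_base = {
    "FM-001": KnowledgeRecord(
        function="Transfer cooling water at the required flow",
        failure="Complete loss of flow",
        failure_mode="Pump bearing seizes",
        effects="Pump stops; standby pump starts",
        consequences="Operational",
        task="Vibration monitoring",
        default_action=None,
    )
}
order = WorkOrder(number="WO-1001", observations="Standby pump failed to start")
close_work_order(order, "FM-001", knowledge_base,
                 effects="Pump stops; standby pump may fail to start, losing cooling")
```

Refusing to close a work order against a missing record is a design choice in this sketch: it forces the seven-question review to take place before the event disappears into the work order history.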
LRCM, pioneered at Cerrejón Coal, provides an audit trail of evolving knowledge. LRCM is the responsible approach to SHE. Not only does the process ensure continuous growth of relevant knowledge, as required by ISO 14001; it also ensures that the state of organizational knowledge at any moment prior to an incident will be available for investigation. Continuous review and refinement of knowledge, as prescribed by LRCM, improves the probability that an accident will be prevented. The ability to assess an organization’s preventive measures given the state of its knowledge at any point in time ensures accountability.
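One way to picture such an audit trail is as an append-only history of revisions to a knowledge record, which an investigator can query for the state of knowledge at a chosen moment. The sketch below is an assumption-laden illustration, not a description of how any LRCM software stores its data: `Revision`, `KnowledgeHistory`, `record`, and `as_of` are hypothetical names, and the answers are held in a plain dictionary for brevity.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Revision:
    recorded_at: datetime
    author: str
    answers: Dict[str, str]  # answers to the RCM questions as understood at that time


@dataclass
class KnowledgeHistory:
    """Append-only revision history for one knowledge record (hypothetical)."""
    revisions: List[Revision] = field(default_factory=list)

    def record(self, revision: Revision) -> None:
        # Earlier revisions are never edited or removed; new ones are only appended.
        if self.revisions and revision.recorded_at < self.revisions[-1].recorded_at:
            raise ValueError("revisions must be appended in chronological order")
        self.revisions.append(revision)

    def as_of(self, moment: datetime) -> Optional[Revision]:
        """Return the latest revision recorded at or before `moment`, if any."""
        times = [r.recorded_at for r in self.revisions]
        index = bisect_right(times, moment)
        return self.revisions[index - 1] if index else None


# Hypothetical example data, for illustration only
history = KnowledgeHistory()
history.record(Revision(datetime(2010, 1, 15), "planner",
                        {"effects": "Pump stops; standby pump starts"}))
history.record(Revision(datetime(2010, 6, 2), "engineer",
                        {"effects": "Pump stops; standby pump may fail to start"}))

# What did the organization know one month before a (hypothetical) incident in April 2010?
known_before_incident = history.as_of(datetime(2010, 3, 1))
```

Because revisions are only ever appended, the answer returned by `as_of` for a past date does not change as later knowledge accumulates, which is the property an investigation would rely on.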
© 2011, Murray Wiseman. All rights reserved.
- [1] Reason, J. (1990). Human Error. Cambridge University Press.