Data or reliability analysis is the main role of the Reliability Engineer (RE) in maintenance. In support of his quest for good data, the RE should organize and teach technicians, planners and supervisors how to complete maintenance work orders so that the data is “good”. This is not as difficult as it sounds. The key is “structured free text”. Drop down menu lists of failure codes won’t give you good data for analysis. Structured free text will. Text commentary provided by the technicians should separate “What I found” and “What I did”.
“What I found” tells the RE how each failure mode life ended: by Failure or by Suspension. The tech should structure his “What I found” information by giving the failure mode in three segments, as follows:
- A part – This could be a part, assembly, system, or component depending on the level of detail needed commensurate with the consequences of failure.
- An action phrase describing a change of state (Fell off, jammed, disintegrated, stopped working, …)
- A due to clause (rust, fatigue, corrosion, dirt, …)
Note that the action phrase and the due-to clause are optional, depending on the consequences of failure. If they are omitted, the RE performs reliability analysis (Weibull, Pareto, top ten, simulation, …) on the part alone, that is, on failure of the part for any reason.
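As an illustration only, the three segments could be captured in a small record type. This is a minimal sketch: the `Finding` class, the `parse_finding` helper and the `|` separator are assumptions made for the example, not part of any particular work order system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """One "What I found" entry, split into its three segments."""
    part: str                      # the part, assembly, system, or component (required)
    action: Optional[str] = None   # change of state, e.g. "seized" (optional)
    due_to: Optional[str] = None   # cause, e.g. "corrosion" (optional)


def parse_finding(text: str, sep: str = "|") -> Finding:
    """Parse 'part | action | due to' text. The separator is hypothetical."""
    # Empty segments become None so that a skipped segment keeps its position.
    segments = [s.strip() or None for s in text.split(sep)]
    if segments[0] is None:
        raise ValueError("a finding needs at least the part")
    if len(segments) > 3:
        raise ValueError("expected at most three segments")
    return Finding(*segments)


full = parse_finding("Drive shaft bearing | seized | lack of lubrication")
part_only = parse_finding("Coupling")
```

With only the part given, `action` and `due_to` stay `None`, which matches the case where the analysis is run on failure of the part for any reason.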
“What I did” provides the failure mode life beginnings. It records how the failure mode was renewed: “I replaced (rebuilt, cleaned, calibrated, adjusted, …)”.
The RE analyzes samples. A sample is a collection of failure mode life cycles, each defined by two major events: a beginning and an ending. With the “What I found” and “What I did” structured text, the RE has almost everything he needs to perform his analysis. One more critical piece of information is needed for each “What I did” recorded in the “structured free text”: the “Event type”, which is one of PF, FF, or S (i.e. Potential Failure, Functional Failure, or Suspension).
If “What I did” was in response to a failure, indicate either PF or FF.
If “What I did” was preventive, indicate S. In that case the failure mode had not failed and was not even failing. The part may have been worn, but it still had an indefinite amount of life left.
Why do we need to discriminate carefully between Suspension and Failure? Because the algorithms of our reliability analysis software need to know whether the failure mode failed or not. If it didn’t fail, the algorithm will understand that the renewed failure mode survived “at least” to that moment. This is the only way that the software can do what it does well. With failure events clearly distinguished from suspension events, the software algorithm can predict failure probabilities and remaining useful life estimates with stated confidence intervals.
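To illustrate why the distinction matters, here is a minimal sketch of a survival estimate that treats suspensions as “survived at least this long”. It uses the Kaplan-Meier estimator, a non-parametric method chosen here for brevity, rather than the Weibull fit that reliability software would typically perform. The `(age, event_type)` tuple layout, the function name and the sample ages are assumptions for the example, as is the choice to count both PF and FF as failure endings.

```python
FAILURE_EVENTS = {"PF", "FF"}  # assumption: both count as failure endings


def kaplan_meier(life_cycles):
    """Return [(age, survival probability)] at each age where a failure occurred.

    life_cycles: iterable of (age_at_ending, event_type), event_type in PF/FF/S.
    """
    ordered = sorted(life_cycles, key=lambda lc: lc[0])
    at_risk = len(ordered)   # life cycles still running just before the current age
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        age = ordered[i][0]
        j = i
        failures = 0
        # Gather every ending that shares this age.
        while j < len(ordered) and ordered[j][0] == age:
            if ordered[j][1] in FAILURE_EVENTS:
                failures += 1
            j += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((age, survival))
        # Failures and suspensions both leave the at-risk set,
        # but only failures reduce the survival estimate.
        at_risk -= j - i
        i = j
    return curve


sample = [(100, "FF"), (150, "S"), (200, "PF"), (250, "S"), (300, "FF")]
curve = kaplan_meier(sample)

# The same ages with every suspension mislabeled as a failure.
mislabeled = kaplan_meier([(age, "FF") for age, _ in sample])
```

In this made-up sample the estimated survival at age 200 is about 0.53 when suspensions are recorded correctly, and 0.40 when they are mislabeled as failures. Mislabeling makes the failure mode look less reliable than the data supports.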
There are valuable added bonuses gained by the Reliability Engineer when he teaches the staff how to deliver structured free text[1] on the work order:
- The drop down menu items selected will be accurate.
- The Event type indicated for each significant “What I did” allows software to keep track of the internal working ages of each important failure mode. Now age-reliability relationship curves can be generated automatically and on demand by software.
- Software can now correlate failure with condition monitoring and operational data in order to determine whether those data have any predictive capability. If so, then it can generate a decision model that the Reliability Engineer can deploy as an automated agent.
- The Reliability Engineer can update the RCM/FMEA knowledge base, particularly the Effects text with new information and insight gained from the structured “What I found” and “What I did” free text comments on the work order. Gradually less free text will be required, since the related RCM knowledge record will contain the necessary details. This implies that each significant work order should be linked to one or more RCM records. Reliability Analysis, the major activity of the Reliability Engineer, is simply the counting up of instances (occurrences) of RCM knowledge records. In other words, a work order should be an instance of one or more failure modes that are fully described in the RCM knowledge base in the context of their Function, Failure, Effects and Consequences. Samples should comprise collections of instances of Failure Modes that the Reliability Engineer targets for analysis.
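The working-age tracking and the counting of instances described in the list above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the work order rows, the record identifiers such as `FM-001`, and the function names are hypothetical. It also assumes a single asset, a cumulative meter reading (for example operating hours) on each work order, and that every failure mode starts new at a reading of zero.

```python
from collections import Counter, defaultdict

FAILURE_EVENTS = {"PF", "FF"}  # assumption: both count as failure endings


def build_samples(work_orders):
    """Turn work order rows into life cycles grouped by RCM record.

    work_orders: iterable of (rcm_record_id, meter_reading, event_type).
    Returns {rcm_record_id: [(working_age_at_ending, event_type), ...]}.
    """
    samples = defaultdict(list)
    last_renewal = defaultdict(int)  # meter reading at the previous renewal
    for record_id, meter, event_type in sorted(work_orders, key=lambda w: w[1]):
        # Each "What I did" ends one life cycle and begins the next.
        working_age = meter - last_renewal[record_id]
        samples[record_id].append((working_age, event_type))
        last_renewal[record_id] = meter
    return dict(samples)


def failure_counts(samples):
    """Count failure endings per RCM record, most frequent first (Pareto order)."""
    counts = Counter()
    for record_id, cycles in samples.items():
        counts[record_id] = sum(1 for _, ev in cycles if ev in FAILURE_EVENTS)
    return counts.most_common()


work_orders = [
    ("FM-001", 1200, "FF"),
    ("FM-002", 1500, "S"),
    ("FM-001", 2000, "S"),
    ("FM-001", 3100, "PF"),
    ("FM-002", 3300, "FF"),
]
samples = build_samples(work_orders)
ranking = failure_counts(samples)
```

Here `samples["FM-001"]` holds three life cycles with working ages 1200, 800 and 1100, and `ranking` places `FM-001` (two failures) ahead of `FM-002` (one failure). Each per-record list is the kind of sample that an age-reliability analysis would take as input.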
© 2011 – 2016, Murray Wiseman. All rights reserved.
- [1] The need for structured free text on the work order is obviated by using a Living RCM work order user interface, such as Mesh. With Mesh, the SAP catalogs are maintained in perfect synchronization with the RCM knowledge base, whose tree view in the work order form eliminates the possibility of an incorrect selection.