The False Idol: An RCM History Lesson

The origin story of RCM that you know is likely missing some important facts.

The year is 1960. Commercial aviation has just transitioned from piston engines to jets. The Boeing 707 and Douglas DC-8 are entering service. The industry is watching the crash rate with growing alarm. And maintenance is eating 30% of airline operating expenses — most of it spent on time-based overhauls that engineers assumed were preventing failures.

They were not.

Four major airlines — United, American, Pan Am, TWA — started pulling their actual in-service failure data. What they found was heretical: scheduled overhaul had little measurable effect on the reliability of complex systems unless a component had a single dominant age-related failure mode. Worse, overhauls themselves were introducing infant-mortality failures in freshly reassembled components. They were spending a fortune to make equipment less reliable.

By 1967, United Airlines engineers Thomas Matteson and F. Stanley Nowlan presented the foundational paper at an AIAA meeting. The methodology that would become RCM was born — not as a philosophy for industrial maintenance, but as a cost-reduction tool for airline regulatory compliance. In 1968, MSG-1 was published. The results were stark. The DC-8's pre-RCM maintenance program required scheduled overhaul of 339 items. The DC-10 — the first major aircraft certified under the new MSG logic — required overhaul of only 7 items. None of them were engines. Structural inspection man-hours told the same story: the DC-8 consumed 4,000,000 hours of structural inspection work during its first 20,000 operating hours; the 747, developed under the same methodology, required 66,000. A 98% reduction, with no degradation in safety or dispatch reliability.
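
The cited figures are easy to verify. A quick sanity check on the two reductions above (a sketch in Python, using only the numbers quoted in this article):

```python
# Sanity-check the two reductions cited above.

# Scheduled-overhaul items: DC-8 (pre-RCM) vs. DC-10 (MSG logic)
dc8_overhaul_items, dc10_overhaul_items = 339, 7
item_reduction = (1 - dc10_overhaul_items / dc8_overhaul_items) * 100  # ~97.9%

# Structural inspection man-hours in the first 20,000 operating hours:
# DC-8 vs. 747
dc8_hours, b747_hours = 4_000_000, 66_000
hour_reduction = (1 - b747_hours / dc8_hours) * 100  # ~98.35%

print(f"Overhaul items cut by {item_reduction:.1f}%")
print(f"Structural inspection hours cut by {hour_reduction:.2f}%")
```

Both reductions land at roughly 98%, consistent with the claim in the original papers.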

This is a genuinely remarkable intellectual achievement. Nowlan and Heap deserve every bit of the reverence they get in the reliability community.

But here's what the hagiographers leave out.

The Context RCM Was Built For

RCM was built to do one thing: help a small number of airlines develop minimum acceptable maintenance programs for new aircraft types under FAA regulatory oversight using highly standardized fleets of carefully documented equipment.

Think about what that actually means. A major airline's 737 fleet might have 100 aircraft. Every single one of those 737s was built to the same design, certified to the same standard, assembled to the same tolerances, and maintained according to the same Boeing documentation. The 737 has roughly 2,000 maintainable items at the level a CMMS would track as discrete records.

When United Airlines' engineers identified failure modes for the 737, they were identifying failure modes for all 737s that would ever fly. The analysis was done once and distributed to the industry through MSG-3, the Maintenance Review Board process, and the FAA's Advisory Circulars. Every airline flying a 737 inherits that analysis. They don't redo it. They don't convene a workshop. They upload the manufacturer's maintenance program and get on with it.

Now consider your refinery.

A typical Gulf Coast refinery has 80,000 to 120,000 equipment records in its CMMS. Large integrated complexes can exceed 200,000. These are not standardized fleets of identical equipment. They are heterogeneous collections accumulated over decades of construction, modification, debottlenecking, and acquisition — pumps from a dozen manufacturers in every metallurgy, heat exchangers with forty years of undocumented modifications, instrumentation ranging from pneumatic transmitters installed in 1974 to HART devices installed last year. No two facilities are the same. There is no equivalent of the FAA telling you what your minimum maintenance program is. There is no 737 MRB Report to upload.

RCM was built for a world of roughly 2,000 carefully documented, standardized equipment items under regulatory oversight. You are operating in a world of 100,000+ poorly documented, inconsistently configured equipment items maintained by a workforce with shrinking experience, following work orders that frequently contain nothing more useful than "CHK PUMP."

This is not a difference in degree. It is a difference in kind.

How We Got Here: The Moubray Metastasis

In 1986, a South African mechanical engineer named John Moubray founded a company called Aladon Ltd in Lutterworth, England. His 1991 book Reliability-centred Maintenance adapted the Nowlan and Heap methodology for universal industrial application. By the early 2000s, Moubray’s RCM II framework had been applied at over 600 sites across 32 countries.

RCM II is a serious intellectual contribution, but what Moubray created, inadvertently or otherwise, was a consulting industry. By the time SAE JA1011 was published in 1999 – the standard defining what "counts" as real RCM – the methodology had become a credentialing system. You could not challenge RCM without being accused of not doing it right. Bad results from RCM weren't evidence that RCM was wrong for your context. They were evidence that you needed a better RCM consultant.

The industry had found its priesthood, and the priesthood was very comfortable.

The Five Lies of Industrial RCM

Lie #1: Your failure modes are unknown.

Your centrifugal pump fails on bearing wear, seal degradation, impeller erosion, coupling misalignment, and cavitation damage. It fails on these modes at your refinery, at the refinery across the highway, and at every refinery in the world. Your pump in slurry service has the same considerations as the pump in slurry service in Europe. Your pump in caustic service has the same considerations as the pump in caustic service in Asia. These failure modes have been catalogued in bearing manufacturer literature, API 610, the Hydraulic Institute standards, and thousands of root cause analyses. Spending twelve months in a workshop re-deriving failure modes that have been documented thousands of times is not analysis. It is ritual.

Ritual exists precisely where people face problems they feel they cannot fully control. The RCM workshop is a safe activity — it almost always produces failure modes and mitigations that everyone can agree on. The analysis no longer confronts uncertainty. It merely performs the appearance of confronting it.

Lie #2: Criticality analysis tells you what to work on.

The first step in most RCM programs is a criticality analysis. This sounds rigorous. In practice, it is the industry's most elaborate form of procrastination.

Criticality scores are driven by opinion, not data. They change depending on who is in the room, what happened at last turnaround, and which failure is still politically radioactive. And even when performed "correctly," what does criticality analysis actually change about daily maintenance execution? Nothing. The scores live in a spreadsheet. The work orders remain identical.

Your plant already knows what matters (and even then, people will still fight over what work needs to be done immediately – a different problem, and one criticality never solves). You don't need a scoring matrix to tell you the crude unit charge pump is more important than the utility air compressor. You need to stop spending four months arguing about a number and start fixing the execution problems on the equipment that everyone already knows matters.

Lie #3: The analysis gets implemented.

One systematic review found that approximately 25% of 1,300 identified changes from an RCM program were actually completed. Another longitudinal study of 204 process improvement projects found that roughly half of all successful projects backslide within one year, two thirds within two years. The analytical work gets done. The results do not make it sustainably into the real world.

Every plant manager reading this knows the pattern. A consultant delivers a beautiful binder of analyses. The plant uploads 30% of it if they're lucky. Technicians read none of it. Reliability does not improve. But because the process was thorough, everyone pretends it was valuable.

Lie #4: Aviation proves RCM works for industrial plants.

Aviation is the perpetual trump card in this argument. "It works for airlines." You’ve seen it on LinkedIn. You've heard it at every reliability conference for thirty years.

Here's what this argument ignores: aviation does not ask every airline to re-analyze the 737's failure modes from first principles. The analysis is done once, globally, by the manufacturer, under FAA oversight, and distributed to all operators through the MSG-3 process. No airline convenes an RCM workshop on a 737. They inherit the work.

Furthermore, aviation has already moved past classical RCM. The DoD formalized this evolution in 2007 with DoD Instruction 4151.22, establishing Condition Based Maintenance Plus — CBM+. The definition is pointed: "maintenance performed based on evidence of need." Not maintenance performed because a workshop said so. Rolls-Royce monitors 20-30+ parameters per flight on thousands of engines worldwide. The Boeing 787 makes 146,000 parameters available for continuous monitoring. Modern aviation maintenance is a data-driven, continuously-updated system. The last thing an airline does is convene a cross-functional team to re-examine failure modes that have been stable for decades.

Finally, the safety and reliability of commercial aviation owe far more to extensive redundancy, zonal separation, online sensors, digital controls, human factors engineering, and safety and quality management systems than to RCM.

Using 1960s aviation logic to justify a consulting engagement at your plant in 2025 is like citing the Wright Brothers to argue for propeller-driven fighters.

Lie #5: The problem is upstream.

This is the deepest lie, and the one I spend the most time on. The assumption embedded in every RCM engagement is that the problem is in the analysis — that if you could only identify the right failure modes and the right tasks with the right intervals, reliability would follow.

It won't.

Because the problem at your plant is not that you have the wrong maintenance strategy. The problem is that your maintenance strategy — whatever it is — is not being executed with discipline. Your technicians are working from work orders that contain no actionable detail. They're using the wrong lubricant because no one told them the right one. They're reinstalling bearings by feel because the correct installation tools are in a cage nobody can access. They're restarting pumps on bad lineups because the operator startup reference doesn't exist. The knowledge of what should happen never reaches the point where the work actually occurs.

You can optimize a strategy that never reaches the technician's hands all you want. It won't help.

What Nowlan and Heap Actually Solved

Here is what RCM genuinely solved, and why Nowlan and Heap deserve their reputation: they proved that most equipment failures are not age-related, and that reflexive time-based overhaul was causing more failures than it prevented. That insight is correct, important, and largely accepted today.

But the answer to "stop doing unnecessary overhauls" is not "spend 18 months doing FMEA workshops." The answer is: deploy the knowledge that already exists, compiled from decades of failure data, at the point of execution.

The failure modes for your pump are solved. The P-F intervals for your bearings are documented by SKF and Timken. The seal installation sequence is in the Flowserve IOM. The problem is not that this knowledge doesn't exist. The problem is that it never reaches the technician's hands in a usable form, with the tools and materials needed to act on it, at the moment the work is being done.


That is an execution problem. And you cannot solve an execution problem with more analysis.

The Monday Morning Question

Here's what I'd ask you to consider before you write another check for an RCM engagement:

When was the last time a PM work order at your plant contained: the specific bearing model and clearance specifications, the correct lubricant with the correct quantity, the torque values for the coupling bolts, the vibration acceptance criteria before restart, and a pre-startup checklist that the operator actually used?

If the answer is "never" or "sometimes" or "I'm not sure" — you don't have an analysis problem. You have an execution problem.

No RCM binder will fix it. No criticality matrix will fix it. No Weibull curve will fix it.

The knowledge exists. The methodology for delivering it exists. What the industry has been missing for forty years is the will to stop performing the ritual and start solving the actual problem.

Stop re-analyzing solved problems. Start delivering the answers you already have to the people doing the work.

