Tear Down this RCM Wall
Reliability-Centered Maintenance is not just the wrong path to world-class maintenance. It is the wall blocking the path.
For forty years, the RCM advice industry has erected an escalating series of analytical prerequisites between plants and the maintenance programs they need — criticality analysis before you can determine equipment strategies or prioritize corrective work, FMEA before you can build strategies, failure-mode-to-task mapping before you can justify a PM, an APM software platform before any of it is usable, and, most likely, master data remediation before you can even start the chain. Each prerequisite is defended as essential over and over in conferences, classes, posts, and proposals. Each one requires a consulting engagement that no plant is staffed to perform internally. Each one consumes budget and patience before anything reaches the floor.
At the end of the chain — often well before the end of the chain — most plants have spent their budget, exhausted their patience, and changed nothing about what happens the next time a technician stands in front of a pump.
RCM is aimed at the analytical rigor that comforts consultants, engineers, and executives, but it misses the entire point of the exercise: proper execution of maintenance tasks that protect and restore equipment function.
Here is the scorecard of the RCM era: Losses up. Spending up. Reliability down. The public longitudinal data — Solomon’s twenty-two-year California refinery cost series, NERC’s forty-year fossil availability record, Marsh’s fifty-year hydrocarbon major-loss tracker — all tell the same story, and none of them tell a story RCM can be proud of. Maintenance costs roughly doubled in real terms while margins stayed flat. Forced outage rates plateaued for a generation and have now reversed. Per-decade major loss frequency climbed from single digits in the 1970s to twenty-nine in the 2010s. This is what forty years of RCM has presided over.
Four decades of implementations. No industry-wide outcome study. No meta-analysis. No published five-year sustainment data. Not one peer-reviewed paper demonstrating that RCM-implemented sites outperform comparable non-RCM sites over time. The evidentiary record the methodology demands from every equipment strategy it reviews — the methodology has never produced for itself.
Every empirical challenge to RCM’s outcomes gets one of two responses. The data is contaminated by external factors. The implementation was insufficient. Point to the Solomon cost curve, and you will be told it’s environmental compliance. Point to IOGP’s nine straight years of “inadequate maintenance, inspection, and testing” atop the causal factor list for global upstream process safety events, and you will be told those operators were cutting spend without doing the analysis. Point to any RCM program that was quietly abandoned after two years, and you will be told the facilitator wasn’t certified, the leadership wasn’t committed, the culture wasn’t ready.
What evidence, if produced tomorrow, would the RCM community accept as showing the methodology has failed on its own terms?
The answer is none. We can stop pretending this is a technical debate. It isn’t. It is the defense of a belief system built and fortified by those who reap its rewards.
Every defense of RCM I have ever heard relocates the failure to the implementer. That is what doctrines do when their promises stop working. And doctrines do not fall to evidence. They fall when the practitioners holding them up get tired of defending them.
There is a different way. Deploy what is already known to work. Focus on closing the execution and operational defects that turn every plant into a reliability sieve. Get the technician in front of the pump doing the right work the first time. We call this the Maintenance Execution Framework, and we have laid it out in detail — but the framework is secondary to the shift it requires, which is this: stop letting the consultants set the prerequisite chain.
The wall has to come down. The wall is why the technicians at the pump never get what they need. That is the only test that matters, and it is the one test the RCM era has consistently failed.
The Prerequisite Chain
When a plant needs help, or is told it needs help, it usually starts with an analysis. The analysis will almost always say the same things, because most plants are stuck in the 1980s, to the point where humans are still writing things on paper for other humans to type into a computer. The results will read as follows, not because the consultants are lying to you, but because almost all plants that ask for help have the same problems:
Your crews are doing productive maintenance work roughly 2-3 hours out of a 10-hour shift. The rest is meetings, travel time, permitting, waiting, getting jerked around by schedule changes, etc.
Your planning and scheduling functions are not meeting the simple precepts of Doc Palmer’s nearly three-decade-old handbook on the subject.
You have inefficient and redundant management processes, KPIs, data entry, etc.
You need PM optimization.
If you’re working with a firm with a reliability practice, you need master data cleanup, criticality analysis, FMEAs, RCAs, equipment strategies, the whole nine yards.
If you’re working with a firm that has people who actually know what a compressor looks like (I seriously know maintenance consultants who do not), you may be told that you need precision maintenance training.
You can Venmo me $100K for that and save yourself a little money and a lot of hassle and time.
All of the maintenance process improvement work is fairly quick, weeks to months, and a lot of plants say they can fix it up themselves. This is not how a consulting partner is made.
The real money is in the months and months of teams doing all the reliability analysis stuff. A standard “focused” approach is going to keep them on your dime for 18 months easily.
But, this is (apparently) more than a consulting ploy. This is industry best practice, enshrined in the pantheon of RCM luminaries, standardized in SAE JA1011, and lionized by every reliability engineer’s analytical little heart.
Here, industry paradigm meets economic incentive to ensure that a plant never feels it has paid the full tithe until it has checked off every block: master data cleanse, rationalized equipment taxonomy and hierarchy, asset criticality analysis, FMEA of every equipment item, equipment strategies based on risk priority number to cover all identified failure modes, RCA for bad actors, all loaded into an APM platform, ticked and tied. (Job plans often get no mention, because the consultants lack the CMMS and planning expertise to do that work.)
Each step is presented as the logical prerequisite for the next. Each step has its own practice or sub-practice, its own software license, its own timeline. A criticality analysis runs three to six months. An FMEA program for a medium refinery runs one to three years. Master data remediation is a permanent engagement that never reaches completion because the data degrades faster than it can be cleaned. The APM implementation is a seven-figure capital project with a two-year timeline.
Add it up. A plant that follows the prescribed sequence — criticality, then FMEA, then APM, then strategy deployment — is looking at three to five years and several million dollars, and that is quite often before a single PM task changes in the CMMS. That timeline assumes nothing goes wrong, no budget gets cut, no project champion transfers to another site, and no organizational restructuring resets the program to zero. In practice, all of those things happen. The 2005 Reliabilityweb survey of 250+ companies found that over 85 percent of completed RCM analyses are never implemented.
The RCM prerequisite chain is a process that, by its own industry’s admission, fails to produce a usable output five times out of six. Here is why, in their own words: “This list of reasons included: We didn't have enough resources to implement. We ran out of funding. Management did not support the effort. It was too difficult.” The answer is not more funding, more time, and more patience; the answer is a shorter chain. The RCM crowd cannot see fit to admit it.
It is a self-sustaining ecosystem of analytical prerequisites, and the only thing it reliably produces is more analysis. The plant pays the toll at every gate. The equipment keeps failing on the same nine-month cycle it has been on since 2014 — because nobody’s budget survived long enough to reach the part where the PM task list actually changes.
Meanwhile, the consultancy publishes the case study. The APM vendor cites the logo. The conference presentation shows the criticality matrix. Everyone declares victory. The pump fails in April.
The Proof That the Wall Was Never Needed
In 1995, EPRI codified a template-first, fast-deployment PM methodology roughly concurrent with John Moubray’s RCM2. The Preventive Maintenance Basis Database began as a binder with 39 equipment templates. Today it covers 250+ component types, more than 19,000 catalogued failure modes, and task-effectiveness data for over 1,800 PM tasks. Plug in a component type, assign a three-letter code for criticality, duty cycle, and service severity, and the template returns recommended tasks, intervals, degradation mechanisms addressed, and technical basis. Deviations are justified against the template baseline — not rebuilt from a blank FMEA worksheet.
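The template-lookup idea is simple enough to sketch in code. The structure below is hypothetical (the real PM Basis Database uses its own component taxonomy, coding scheme, and effectiveness data), but it illustrates how a three-letter code for criticality, duty cycle, and service severity can select tasks and intervals from a library instead of starting from a blank FMEA worksheet:

```python
# Hypothetical sketch of a template-first PM lookup, loosely modeled on the
# EPRI PM Basis approach. Component names, tasks, base intervals, and
# multipliers are all invented for illustration.

TEMPLATES = {
    "centrifugal_pump": [
        # (task, base interval in months, degradation mechanisms addressed)
        ("vibration analysis", 3, ["bearing degradation", "misalignment", "looseness"]),
        ("mechanical seal inspection", 6, ["seal wear"]),
        ("performance test", 12, ["wear ring opening", "performance loss"]),
    ],
}

# Interval multipliers for the three-letter code (criticality, duty cycle,
# service severity): "H" tightens intervals, "L" relaxes them.
FACTORS = {"H": 0.5, "M": 1.0, "L": 2.0}

def pm_strategy(component: str, code: str) -> list[tuple[str, float]]:
    """Return (task, adjusted interval in months) for a component and a code like 'HMM'."""
    multiplier = 1.0
    for letter in code:
        multiplier *= FACTORS[letter]
    # Geometric mean keeps three stacked factors from collapsing the interval.
    multiplier = multiplier ** (1 / len(code))
    return [(task, base * multiplier) for task, base, _ in TEMPLATES[component]]

for task, interval in pm_strategy("centrifugal_pump", "HMM"):
    print(f"{task}: every {interval:.1f} months")
```

The point of the sketch is the shape of the workflow: a deviation from the returned tasks is justified against the template baseline, not rebuilt from scratch.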
The results: US nuclear fleet capacity factor rose from 56% in 1980 to 92%+ by 2022. Refueling outage duration dropped from 80+ days to 25–38 days. Fleet O&M generating cost fell from $52.83/MWh to $31.76/MWh. Participation in EPRI’s PM analytics platform reached 99% of the North American nuclear fleet, with over $200 million in documented annual savings.
John Moubray’s response was to publish “The Case Against Streamlined RCM” in 2000, calling template approaches “indefensible” and warning practitioners they could face prison sentences. Twenty-five years later: zero criminal prosecutions for choosing a template-based methodology.
This screed is telling: it is the work of a zealot who believes his way is the only way. Zealotry peaks when ideology meets economic incentive, and there was a whole circus happy to build and defend a wall against any progress that did not involve them.
Who Built the Wall
RCM did not come from the burning bush. As outlined here, it was developed by the airline industry for a specific use case: regulated, standardized aircraft fleets moving into the jet age and away from time-based overhauls. One of the key forces in applying it to the rest of industry, and in commercializing the approach, was John Moubray.
Moubray and Aladon
Moubray’s Aladon LLC created the economic template: proprietary RCM2 training at $5,000–$15,000 per seat, a certified facilitator credential gating who could run workshops, consulting engagements at $50,000–$500,000 per system analysis, and a book that moved over 100,000 copies. Aladon built a franchise of “exclusive providers” certified to deliver RCM2/RCM3 training and facilitation. The facilitator certification program defines 45 distinct competencies, requires a 10-day advanced course, and mandates biennial technical audits. The network’s revenue depends on the premise that full-rigor RCM is the only defensible approach, and Moubray himself issued a bull in 2000 condemning attempts to shortcut the rite.
JA1011
Moubray’s approach was seemingly knighted by the standards lords when his approach largely became SAE JA1011, “Evaluation Criteria for Reliability-Centered Maintenance Processes.” The standard defined “RCM” as conformance to a seven-question process in a specific sequence. It does not validate outcomes. It validates process conformance.
The aviation lineage, from Nowlan and Heap to JA1011, seems impressive. The truth is that Moubray found entry into an SAE committee whose very specific mission had nothing to do with brownfield plant maintenance. JA1011 was produced at the request of the U.S. Navy as a replacement for its RCM manual, which was a specification for manufacturers to develop and deliver maintenance programs as part of the procurement package for new aircraft, ships, and other weapons systems.
Every single environment where RCM was originally developed and deployed shared six enabling conditions that made seven-question functional FMEA economically rational:
The OEM performed the analysis and priced it into the procurement contract
The fleet was homogeneous (one aircraft type, one ship class)
The engineering data package was a contract deliverable from the manufacturer
The maintenance plan was written and regulator-approved before the asset entered service
The analytical unit was a cleanly bounded, designed-as-a-system asset with well-defined functional interfaces
Regulatory compulsion (FAA Maintenance Review Board approval, DoD Logistic Support Analysis requirements) made the spend non-discretionary
JA1011 was meant to replace MIL-STD-2173, which states on its first page: “This standard is to be used by contractors during development of new systems and equipment.” This RCM approach was not built for operators trying to reverse-engineer what their long-dead vendors should have delivered decades ago.
A process plant has none of those six conditions. A seventy-year-old refinery carries a hundred thousand equipment tags installed by dozens of vendors across six decades of expansions, revamps, and acquisitions. The original OEMs are dead, merged, or indifferent. The engineering data packages — if they ever existed — are a motley mix of scanned PDFs, paper libraries, and even a microfiche archive nobody has looked at since 1987. Fleet homogeneity collapsed the moment unit 2 was debottlenecked in 1994 with a different pump vendor than unit 1. There is no contractor amortizing the FMEA cost across a thousand hulls; there is a maintenance engineer with a shrinking budget and a Monday morning. Asking that engineer to run a Moubray-style RCM workshop on a 1978 bottoms pump is not deploying a proven methodology — it is forcing a burden into a use case it was never meant to address.
JA1011 was drafted primarily by NAVAIR engineers who needed contract-referenceable language to replace a cancelled military standard, with Moubray and other commercial RCM proponents joining late to shape wording. It was a procurement-compliance artifact for the Department of Defense, fit-for-purpose for the environment it was written in. Its subsequent four-decade commercial deployment to process industries — brownfield, heterogeneous, operator-funded, regulator-agnostic — is off-label use by a consulting industry that had every reason not to mention the label. The result is a standard that certifies method fidelity while ignoring whether the method fits the problem. That is not a technical standard. It is a marketing charter.
By JA1011’s definition, EPRI’s template-based, simplified RCM methodology, which had already driven the US nuclear fleet from 56% to 92%+ capacity factor, does not conform and is therefore “not RCM.” The effect was a definitional fence that branded EPRI’s demonstrably successful maintenance methodology as illegitimate, while certifying a methodology with a documented 60%+ failure rate as the gold standard.
Strike Up the Bandwagon
Meanwhile, from the U.S. Navy side of the RCM lineage, Anthony “Mac” Smith and Glenn Hinchcliffe codified a parallel nine-step “Classical RCM” process in RCM: Gateway to World Class Maintenance. Different toll booth, identical fee. Same workshop duration. Same FMEA depth. Same wall. Smith never trademarked his method, never built a franchise network, never threatened critics with prison — but the methodology he defended was formally indistinguishable from Moubray’s. Between them, they established the intellectual consensus that full-rigor FMEA was the only legitimate starting point for maintenance strategy development — and that consensus still holds.
Terry O’Hanlon, whose marketing and self-promotion abilities remind me of the R. Lee Ermey character in Fletch Lives (“Too many pictures of myself?” “Nah, worked for the Ayatollah.”), has built up the Reliabilityweb site, publishing house, certification, and conference series. Navy reliability legend Jack Nicholas, “reliability sherpa” Ramesh Gulati, and other RCM adherents are or were the key icons at the revivals. O’Hanlon’s annual December fete in Florida, like SMRP, Reliable Plant, and others, plows little new ground, bolting analytical and software bells and whistles onto the established consulting canon. The annual parade of the same faces and same topics should be a reminder that things really are not improving. But in the carefully managed confines of Terry’s Florida pilgrimages, everything is sunny.
Around these clusters grew a federation of sub-disciplinary franchises, each one staking its flag in a different upstream bottleneck. Christer Idhammar at IDCON argues reactive culture. John Woodhouse and the Institute of Asset Management argue the management system. Doc Palmer argues planning and scheduling. Jim Fitch at Noria argues lubrication. Mark Paradies argues root-cause investigation. Seiichi Nakajima and the TPM camp argue operator-based care. Heinz Bloch argues precision installation. Accenture and the Big Four wrap all of it in digital-transformation decks and sell it to Fortune 500 CFOs as seven-figure programs. Each is right, to some extent, that its bottleneck matters. But not one of them has pointed at the document that reaches the technician’s hand at the moment of the task and said: this is where reliability is made or lost. Every pole of the reliability influencer world sits upstream of execution. The RCM delusion wears different colors at different conferences. The pump still fails in April.
What Actually Fails — And Why the Wall Doesn’t Help
While the wall consumes years and millions, the actual causes of equipment failure remain untouched. Missed or poorly executed PMs. Installation defects introduced during intrusive maintenance. Absence of acceptance criteria. Lack of performance trending. Misalignment from initial installation. Design mismatches between metallurgy and actual service. None of these are problems that FMEA solves. Every one of them is a problem that FMEA assumes has already been solved.
At one plant, by no means an outlier, PM compliance was reported at 100%. The correlation between PM execution and failure prevention was r = 0.023 — essentially zero. $1.21 million in PM-induced failures was documented within 30 days of PM completion. One pump’s failure rate increased from two failures per year to 3.3 per year after a monthly PM was imposed, with every single post-PM failure occurring within thirty days of the maintenance event.
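A near-zero correlation like that is something any plant can check from its own CMMS extract. A minimal sketch, with invented monthly counts standing in for real PM-completion and failure data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented example: monthly PM completions vs. failures in the following month.
# If PMs were preventing failures, r should be strongly negative; a value
# hovering near zero says the PM program is not moving the failure rate.
pms      = [40, 42, 38, 41, 39, 43, 40, 44, 37, 41, 42, 40]
failures = [ 3,  2,  4,  3,  3,  2,  4,  3,  2,  3,  4,  2]
print(f"r = {pearson_r(pms, failures):.3f}")
```

Two columns and twelve rows per equipment class are enough to ask the question; no APM platform is required.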
Every major Chemical Safety Board process safety investigation since 2000 has traced back to a failure to execute on known damage mechanisms. Chevron Richmond personnel made at least six internal recommendations to inspect or upgrade the sidecut line between 2002 and 2011. Zero were implemented. A 2002 failure at Chevron’s Salt Lake City facility had flagged the identical hazard a decade earlier. BP Texas City had no PM procedures for five instruments the CSB identified as contributory. These were delivery failures, not analytical failures. The knowledge existed. The execution system did not deliver it.
The gap between knowing and doing is more important than the gap between ignorance and knowing. The RCM wall is built entirely on the wrong side of that divide.
What Actually Moves the Needle
EPRI proved the alternative at fleet scale. Deploy a template-based equipment strategy from a failure-experience library. Make the tasks executable with acceptance criteria, conditional logic, and prescribed responses. Learn from execution data and modify for confirmed bad actors. Reserve formal analysis for the genuine tail — the novel, the extreme-service, the safety-critical system where no template exists.
For common equipment, a properly built strategy deploys in a day from a standard template and covers 77% of the failure modes an FMEA would identify. The 23% it misses are dominated by internal surface conditions requiring disassembly — bearing housing bore wear, shaft keyway fatigue, casing weld cracking — none of which are common drivers of pump failure in real service. The modes that drive real failures — bearing degradation, seal wear, misalignment, looseness, performance loss from wear ring opening — are all in the covered set.
The execution reference — structured work instructions that deliver expert knowledge at the point of action — is what turns an equipment strategy into a reliability outcome. The difference between “inspect mechanical seal for leakage and condition” and a three-state acceptance criterion with quantitative thresholds is the difference between a PM that produces results and a PM that produces compliance metrics. No FMEA produces this. No APM platform generates it. It has to be built, and building it is faster, cheaper, and more impactful than every step in the RCM prerequisite chain.
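To make the contrast concrete, here is a sketch of what a three-state acceptance criterion might look like as data rather than as the prose “inspect for leakage and condition.” The thresholds, field names, and prescribed responses are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class AcceptanceCriterion:
    """Three-state criterion: a measured value maps to a state and a prescribed response."""
    parameter: str
    units: str
    accept_max: float   # at or below this: acceptable, record and trend
    monitor_max: float  # above accept_max up to this: plan corrective work
    # above monitor_max: reject, immediate follow-up

    def evaluate(self, value: float) -> tuple[str, str]:
        if value <= self.accept_max:
            return ("accept", "No action; record reading for trend.")
        if value <= self.monitor_max:
            return ("monitor", "Open follow-up work order; schedule seal replacement.")
        return ("reject", "Remove from service; replace seal before restart.")

# Invented thresholds for mechanical seal leakage, in drops per minute.
seal_leakage = AcceptanceCriterion("seal leakage", "drops/min",
                                   accept_max=1.0, monitor_max=5.0)
state, response = seal_leakage.evaluate(3.0)
print(state, "->", response)  # monitor -> Open follow-up work order; ...
```

The technician records a number, the criterion returns a state and a response, and the finding either closes clean or generates follow-up work. That loop, not the analysis upstream of it, is where the reliability outcome is produced.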
Your maintenance and reliability programs are trying to fill a sieve. Fix the sieve before you optimize. Verify that technicians have acceptance criteria. Verify that findings generate follow-up work orders. Verify that the seven enabling conditions of the Maintenance Execution Framework — reference, proper tools, on-spec materials, equipment access, continuity, safe restart, and loop closure — are present when the technician shows up. If any condition is absent, the system has disabled correct execution before the technician touches the equipment. No amount of FMEA effort will fix that.
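The seven-condition check reads naturally as a pre-dispatch gate. A minimal sketch, assuming the condition names from the framework and inventing everything else:

```python
# The seven enabling conditions named by the Maintenance Execution Framework.
# The gate logic below is an invented illustration, not the framework itself.
ENABLING_CONDITIONS = [
    "reference", "proper_tools", "on_spec_materials", "equipment_access",
    "continuity", "safe_restart", "loop_closure",
]

def ready_to_dispatch(work_order: dict) -> tuple[bool, list[str]]:
    """A work order dispatches only when every enabling condition is satisfied."""
    missing = [c for c in ENABLING_CONDITIONS if not work_order.get(c, False)]
    return (len(missing) == 0, missing)

wo = {c: True for c in ENABLING_CONDITIONS}
wo["on_spec_materials"] = False  # e.g., a gasket substitution arrived instead
ok, missing = ready_to_dispatch(wo)
print(ok, missing)  # False ['on_spec_materials']
```

The design choice worth noting is that the gate fails closed: a single absent condition blocks dispatch, because sending the technician anyway just converts the missing condition into wrench-time waste or a PM-induced failure.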
Tear Down the Wall
The wall was not built by one man. It was built by a methodology, a standards committee, a franchise network, a conference circuit, a publishing ecosystem, and a generation of APM vendors and management consultancies who found that selling analysis to organizations that need execution was doable and profitable. Addressing execution is beyond most of this crowd. The wall has been maintained for forty years. The maintenance industry’s performance metrics have been flat for forty years. That is not a coincidence.
Nuclear is the clearest counterexample to the idea that performance comes mainly from doing more failure analysis. The industry improved by standardizing proven maintenance approaches, embedding them in industry guidance and peer oversight, and backing them with regulatory consequences for non-performance. U.S. nuclear capacity factor ultimately rose into the 90% range. The lesson is not that analysis is useless. It is that known answers only matter when they are deployed at scale and sustained through governance.
The RCM crowd will object that you cannot build effective strategies without understanding failure modes. They ignore that the failure modes for 80-90% of industrial equipment are already understood, already published, and already available in every OEM manual, a raft of standards and texts, and every senior mechanic’s mental model. The question was never whether the failure modes are known. The question is whether the organization delivers that knowledge to the technician at the point of action. The wall guarantees that it does not, because the organization exhausts its resources building the wall instead of building the delivery system.
Moubray’s methodology, Smith and Hinchcliffe’s nine-step process, and every derivative RCM program that followed addressed the wrong half of the problem. The analytical rigor they demanded was not the binding constraint. The binding constraint was the translation from analytical output to executable work instruction, the scheduling of that instruction into a craft work package, the craft execution of that package, and the closure of the learning loop. RCM produced thick strategy documents that sat in binders and SharePoint sites while 1976 carbon-steel elbows corroded to credit-card thickness in HF service. The analysis was rigorous. The delivery system did not exist.
It is time to stop paying the toll. Deploy the known answer. Build the execution system. Fix the sieve. Let the data tell you where to analyze deeper. The pump does not need another FMEA. It needs someone to wire the task list to the scheduling engine, write an execution reference with acceptance criteria, and make sure the technician has a torque wrench and a grease gun when he shows up.
That is not a five-year program. That is a Tuesday.