First, Do No Harm: An Oath Your Maintenance Program is Missing

Mar 9

In a process industry plant running a $100 million annual maintenance budget, somewhere between one-quarter and one-half of equipment failures are not random. They are not caused by age, process conditions, or bad luck. They are caused by maintenance-induced defects. The bearing that failed three weeks after the PM. The seal that leaked six days after the overhaul – and was working fine before anyone touched it. More than half of equipment that fails prematurely in general industrial settings does so following a maintenance intervention.

If 25 percent of your failures trace back to maintenance activities — a conservative estimate for a refinery or chemical plant — and corrective work costs three to five times what planned work costs to execute, you are spending a significant fraction of your maintenance budget fixing damage your maintenance program created. A 50 percent reduction in induced failures — not elimination, half — returns roughly 12 to 15 cents of every maintenance dollar you currently spend. On a $100 million budget, that's $12 to $15 million per year. Not from spending less on planned maintenance. Not from cutting headcount. From executing work in a way that stops breaking equipment.
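The arithmetic behind the 12 to 15 cent figure is not spelled out above. A minimal sketch, assuming savings scale in proportion to the share of the budget that is failure-driven, shows how sensitive the result is to the inputs. The `corrective_share` parameter and the proportional model are assumptions for illustration, not numbers or a method taken from this article.

```python
def annual_savings(budget, corrective_share, induced_share, reduction):
    """Dollars recovered per year under a simple proportional model.

    budget           -- total annual maintenance spend
    corrective_share -- ASSUMED fraction of spend that is failure-driven
    induced_share    -- fraction of failures caused by maintenance work
    reduction        -- fraction of induced failures eliminated
    """
    return budget * corrective_share * induced_share * reduction


BUDGET = 100_000_000  # the $100 million budget used in the text

# Sweep the assumed corrective share against the quoted range of
# induced failures (one-quarter to one-half), with a 50 percent reduction.
for corrective_share in (0.5, 0.75, 1.0):
    for induced_share in (0.25, 0.50):
        saved = annual_savings(BUDGET, corrective_share, induced_share, 0.5)
        print(f"corrective {corrective_share:.0%}, induced {induced_share:.0%}: "
              f"${saved / 1e6:.1f}M per year")
```

Running the same sweep with your own plant's corrective share and induced-failure estimate gives a first-order figure to test against your CMMS cost history.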

That is the money on the table if you just stop the harm. Not the upside from catching failures earlier. Not the revenue from improved uptime.

Now here's the hard part: the approaches your plant is most likely considering right now — better RCM, more training, a connected worker platform, a culture program — will not get you there. None of them address the mechanism that is actually causing the damage.

Why Maintenance Work Breaks Equipment

When a technician opens a pump bearing housing, they introduce risk. Every intrusion is an opportunity for contamination, incorrect reassembly, wrong torque, missed alignment. The equipment that goes back together is not the equipment that came apart. Whether it is better or worse than it was depends entirely on whether the technician had the right information, the right tools, the right materials, and enough time and attention to use them correctly.

Most of the time, they don't have all of those things. Not because they are careless or undertrained, but because the system never provided them.

Consider what the typical PM work order delivers to the technician at the equipment. Not the planning system. Not the CMMS configuration. The actual document in the technician's hands when they open the bearing housing. At most plants, it says something like: PM pump. Inspect and lubricate. Sometimes there's a parts list. Sometimes there are safety steps. Rarely is there an acceptance criterion — a number, a threshold, a pass/fail standard — for a single check the technician is about to perform.

The OEM published the bearing installation specifications. The bearing manufacturer documented the temperature limits. The seal vendor specified the flush rates. API standards define the alignment tolerances. Every piece of information needed to perform this task at a precision level that extends equipment life instead of shortening it already exists. None of it is on the work order.

This is the Maintenance Execution Gap. Not a knowledge deficit. Not a training failure. A delivery failure. The knowledge exists and does not reach the technician at the moment of action.

That gap — between what is known and what is delivered — is where equipment failures are manufactured, one PM, one repair at a time.

The Maintenance Execution Framework

We have previously dismissed the three answers that industry sells: more analysis, more training, better culture. These responses share a structural flaw: none of them changes what the technician sees and knows at the equipment, at the moment they open the housing.

The Maintenance Execution Framework starts from a different diagnosis. The problem is not insufficient analysis. The problem is not insufficient training. The problem is not a culture deficit. The problem is that the system does not deliver what is needed to the technician at the moment of action. Seven specific enablers, when present together, close that delivery gap. When any one is absent, the system has disabled correct execution before the technician arrives.

Most of these enablers will not surprise you. Several of them are things your plant is almost certainly trying to do already — inconsistently, without a structural mechanism to ensure they happen every time on every critical task. The point is not that these are new ideas. The point is that we have been funding upstream interventions at full cost while leaving the execution conditions underfunded, unmanaged, and unverified.

The Execution Reference is the one component that most plants do not have in any recognizable form, which is why it is worth treating separately. A standard PM work order assumes that training delivered the knowledge and the procedure serves as a reminder. An Execution Reference assumes the opposite: training has decayed, memory is unreliable under time pressure, and the document itself is the knowledge delivery system.

That is not a philosophical distinction. It is a design specification.

An Execution Reference for a bearing inspection contains the equipment's failure history — not in a separate reliability database, but in the document the technician holds. It contains specific acceptance criteria: bearing temperature below 180°F, vibration below 0.15 inches per second, oil color clear amber. It contains conditional logic: if temperature is above 180°F, generate a corrective notification and request vibration analysis within seven days. It differentiates the critical steps — the ones where an error is immediate and irreversible — from routine steps, so a technician under pressure knows exactly where to slow down. It sequences checks from least invasive to most invasive, so the technician gathers maximum diagnostic information before opening anything.
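The acceptance criteria and conditional logic described above can be sketched as structured data instead of free text. This is an illustration only: the class name, field names, and the action attached to the vibration check are hypothetical, and only the thresholds and the temperature action come from the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    limit: float           # reading must be strictly below this value
    unit: str
    action_if_failed: str  # the conditional "if you find this, do that"


# Thresholds from the bearing inspection example in the text.
# The vibration action is an assumed placeholder; the text only
# specifies the action for the temperature check.
BEARING_INSPECTION = [
    Criterion("bearing temperature", 180.0, "degF",
              "Generate corrective notification; "
              "request vibration analysis within 7 days"),
    Criterion("vibration", 0.15, "in/s",
              "Generate corrective notification (assumed action)"),
]


def evaluate(criteria, readings):
    """Return (name, passed, action) for each criterion.

    Every criterion must have a reading. A reading exactly at the limit
    is treated as a failure here, which is an assumption.
    """
    results = []
    for c in criteria:
        value = readings[c.name]
        passed = value < c.limit
        results.append((c.name, passed, None if passed else c.action_if_failed))
    return results


for name, passed, action in evaluate(
        BEARING_INSPECTION, {"bearing temperature": 186.0, "vibration": 0.09}):
    print(name, "PASS" if passed else f"FAIL -> {action}")
```

Qualitative checks such as oil color would need a different representation (for example, a list of allowed values), which this sketch leaves out.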

The acceptance criterion is the key unit. Structured checklists with explicit numerical thresholds have improved inspection accuracy from 33 to 63 percent in controlled studies. The mechanism is not magic: quantitative thresholds eliminate the judgment call at the equipment. The technician no longer decides whether the temperature seems high. They measure it and compare it to a number. The decision was made by the engineer who wrote the reference, at a desk with the OEM data and the failure history in front of them. The technician at the equipment executes the decision. That is the right division of labor — and it is one almost no PM program is structured to support.

Proper Tools means the precision instruments the job requires, verified and available at the time of execution. Bearing installation requires an induction heater or hydraulic press. The pipe-and-hammer method that substitutes in too many shops transmits force through the rolling elements, creating microscopic raceway dents that become failure initiation sites. The bearing is new. Its operating life is already shorter than the manufacturer specified. This is not a training issue: most experienced technicians learned the correct method, and many of them use it in their own garages on weekend project cars. It is a system issue: the tools were never purchased, or they were purchased and are locked up, or the shop's answer is simply “we don’t use those.” Not very long ago, people didn’t use cut-resistant gloves, face shields, or even safety glasses in many settings. Proper tools, like proper PPE, cannot be optional. The Execution Reference identifies which tools the task requires. The planning process stages them. Without both, the technician improvises.

On-Specification Materials and Fluids means the correct lubricant at the required cleanliness level and components stored to the manufacturers' specifications. A bearing that sat on a shelf in an open Gulf Coast warehouse for fourteen months is not a new bearing. Contaminated lubricant applied at installation defeats the oil film the bearing needs within its first operating hours. These failures are invisible at the time of installation and obvious six weeks later when the bearing fails "prematurely," which is to say, exactly when it was going to fail given the contamination introduced at installation. Most plants have specifications for materials. The question is whether the PM planning process verifies that those specifications are met before the work order is released to the field.

Equipment Access means proper isolation and operating condition verified before work begins. Not as an administrative checkbox but as a prerequisite for correct execution. A technician working on partially isolated equipment is managing risk instead of executing the task. Cognitive resources consumed by uncertainty about isolation are cognitive resources not applied to the PM itself. The Execution Reference specifies the isolation requirements. The safe-work process verifies them. Both must happen before the bearing housing opens.

Continuity means protection from interruption during critical work. A technician pulled off a bearing installation to respond to a production emergency returns to the task with a degraded mental model and a higher probability of error. They may not remember which step they stopped at. They may skip verification steps they mentally marked as "already done." The critical step, the one they cannot undo if they get it wrong, may be exactly where they pick up, in a different cognitive state than when they stopped. Continuity is not about protecting technicians from inconvenience. It is about recognizing that cognitive state at the point of action determines execution quality, and interruption degrades cognitive state in ways that matter at the precision level pump maintenance requires.

Safe Restart means a verified process for returning equipment to service. The most carefully executed bearing installation is one improper startup away from failure. Correct fill level, correct rotation direction, correct warm-up procedure, correct initial condition verification — each is a check that can be designed into the restart sequence or omitted. The plants that have these checks do them. The plants that don't, improvise at startup. Improvisation at startup is how carefully rebuilt pumps fail within days.

Learning by Doing means structured data captured at every execution, fed back to improve the reference. Not upstream failure mode prediction — downstream learning from what actually happened. When the technician records that the drive-end bearing temperature was 186°F on a pump whose reference flags 180°F as the action threshold, that measurement enters the system. The next PM cycle, the reference includes the trending data. The reliability engineer sees a pattern across executions without having to mine raw CMMS data for it. The reference improves with use instead of degrading with drift. This is the closing of the loop that makes the whole system self-correcting.
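The feedback loop described above can be sketched as a small summary computed from the readings recorded at each execution. This assumes one temperature reading per PM cycle; the function name and the earlier readings are hypothetical, and only the 186°F reading and the 180°F threshold come from the text.

```python
from statistics import mean

ACTION_THRESHOLD_F = 180.0  # action threshold from the example in the text


def summarize(history_f, threshold=ACTION_THRESHOLD_F):
    """Summarize drive-end bearing temperatures recorded over successive PMs.

    history_f is ordered oldest to newest, one reading per execution.
    """
    latest = history_f[-1]
    # Average change per PM cycle: a crude trend, not a statistical fit.
    deltas = [b - a for a, b in zip(history_f, history_f[1:])]
    return {
        "latest": latest,
        "exceeds_threshold": latest > threshold,
        "avg_change_per_cycle": mean(deltas) if deltas else 0.0,
    }


# Hypothetical history; only the final 186 degF value appears in the text.
readings = [171.0, 174.0, 179.0, 186.0]
print(summarize(readings))
```

A summary like this, printed on the next cycle's reference, is one way the trending data could reach the technician without anyone mining raw CMMS records.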

The Piecemeal Problem

Here is what plant maintenance budgets actually fund: a significant investment in RCM and other upstream analysis, a significant investment in training programs, an emerging investment in connected worker platforms and digital procedure management, and a diffuse investment in culture initiatives under various names.

When it comes to proper tools, the real question should be: how can we NOT fund this? You are already spending your money on the consequences of not having these tools, and on things that aren’t fixing the problem:

An ultrasonic grease gun costs roughly as much as replacing the drive-end and non-drive-end bearings of a critical centrifugal pump one time, or a catered lunch for your big consulting project kickoff.
A mid-size industrial induction heater costs as much as nine early bearing replacements, or three ruggedized tablets that aren’t helping much if the work isn’t done properly.
A torque wrench costs as much as two bearings on a 150-hp motor, or an hour or two of time for the managing director you are paying to show up and talk about your new RCM project.

The path to spending on the wrong things is the path of least resistance. And almost none of that investment is structured to ensure that all seven execution conditions are present together, on every critical task, every time.

The execution conditions are not independently sufficient. An Execution Reference in the hands of a technician who does not have the induction heater, who is working with contaminated lubricant pulled from open shelf stock, who is going to get pulled off the job in forty minutes to respond to a compressor trip — that Execution Reference does not close the execution gap. It is a better document delivered into the same broken system.

Conversely, staged tools, clean materials, proper isolation, and protected time, applied to a work order that says "PM pump. Inspect and lubricate" — that is a precision execution environment in service of an instruction that cannot take advantage of it. The technician has everything they need to do the job precisely. The job has not told them what precise looks like.
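The "present together" requirement can be sketched as a release gate: a critical task is ready only when every one of the seven enablers is confirmed. The identifiers and function names below are hypothetical labels for the enablers named in this article, not fields from any particular CMMS.

```python
ENABLERS = (
    "execution_reference",
    "proper_tools",
    "on_spec_materials",
    "equipment_access",
    "continuity",
    "safe_restart",
    "learning_by_doing",
)


def missing_enablers(status):
    """Return the enablers not confirmed present, in framework order.

    status maps enabler name -> bool; an absent key counts as not confirmed.
    """
    return [name for name in ENABLERS if not status.get(name, False)]


def ready_to_release(status):
    """A task is ready only when all seven enablers are confirmed."""
    return not missing_enablers(status)


# The scenario from the text: a good reference delivered into a broken system
# (no induction heater, contaminated lubricant, an interruption coming).
status = {name: True for name in ENABLERS}
status.update(proper_tools=False, on_spec_materials=False, continuity=False)
print(ready_to_release(status), missing_enablers(status))
```

The design choice here is that there is no partial credit: six of seven is reported as not ready, along with the specific gap to close.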

The framework is only valuable when its conditions are present as a system. The industry spends heavily on the upstream conditions — analysis, training, culture — and inconsistently on the execution conditions, because the execution conditions are less visible, less glamorous, and less amenable to being sold as a discrete consulting engagement or software platform. You cannot sell a plant manager a “materials cleanliness program.” You can sell them an RCM engagement. You cannot sell them “interruption protection protocols.” You can sell them a culture workshop. You cannot sell them an “equipment availability and safe restart program.”

The result is a substantial and growing investment in approaches that do not address the mechanism causing the failures, and an underfunded, unmanaged set of conditions that determine whether the work being performed every day in the plant is producing or destroying equipment life.

What Asymmetric Sufficiency Means in Practice

The Maintenance Execution Framework is the binding constraint. Remove it and upstream investments flounder — not because RCM is wrong or training is useless or culture doesn't matter, but because none of them can produce reliable outcomes without the delivery system that connects their outputs to field execution.

The reverse is not true. You can close much of the execution gap without completing another RCM study, running another training program, or commissioning another culture assessment. The knowledge already exists. The bearing tolerances are published. The alignment specifications are documented. The failure history exists in your CMMS. The execution conditions — tools, materials, access, continuity, restart — are requirements your engineers already know. None of this requires new analysis. It requires a system that delivers what is known to the person who needs it, at the moment they need it, in a form they can act on.

That is not the same as saying RCM, training, and culture are irrelevant. It is saying that they are not addressing the real constraint. Do not spend the upstream dollar if the downstream system does not exist to use it. Without the Maintenance Execution Framework, the upstream investment funds another binder, another certificate, another consulting report that no work order has ever referenced.

Where to Start

Pick the ten worst-performing assets in your plant by failure cost over the past two years. Pull the PM work orders for those ten assets. Read each one as if you were a technician with two years of experience seeing this equipment for the first time.

Score each work order against four questions: Does it have specific acceptance criteria with numerical thresholds? Does it have conditional logic — if you find this, do that? Does it include the equipment's failure history as context for the inspection? Does it identify which steps are critical?

If the answer to all four is no — and at most plants it will be — you have just identified the execution gap in your program. Not as a theoretical concept. As a document in your hand.
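The four-question audit above can be sketched as a simple score per work order. The asset tags and audit answers are made up for illustration; the four keys are shorthand for the four questions in the text.

```python
QUESTIONS = (
    "numeric_acceptance_criteria",
    "conditional_logic",
    "failure_history",
    "critical_steps_identified",
)


def score(work_order):
    """Count how many of the four questions a work order answers 'yes'."""
    return sum(1 for q in QUESTIONS if work_order.get(q, False))


# Hypothetical audit results for three of the ten assets.
audit = {
    "P-101A": {"failure_history": True},
    "P-102B": {},
    "K-201": {"numeric_acceptance_criteria": True, "conditional_logic": True},
}

# Lowest score first, so the weakest work orders appear at the top.
for tag, answers in sorted(audit.items(), key=lambda kv: score(kv[1])):
    print(tag, f"{score(answers)}/4")
```

A score of 0/4 corresponds to the "all four is no" case described above.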

Rebuild the worst one. Add a failure history box drawn from the equipment's notification history. Add three acceptance criteria with specific thresholds and conditional actions. Mark the critical step with a hold point. Attach it to the next scheduled PM. Walk the first execution with the assigned technician. Read the findings at closeout.

You will not see the full return in one cycle. You will see something more important: a technician who recorded actual measurements instead of "PM complete." A finding that generated a follow-up corrective work order before the equipment failed. An equipment record that is one cycle more accurate than it was.

That is how the loop starts. One asset. One execution reference. One technician who saw their work matter.

The money on the table, the 12 to 15 percent of your maintenance budget you could recover by halving self-inflicted damage, does not require a platform, a consulting engagement, or a culture program to start recovering. It requires an execution framework that starts with the principle: “First, do no harm.”

By Peter J. Munson
