MTBF (mean time between failures) and MTTR (mean time to repair) are the two reliability metrics that appear in every maintenance management conversation. They define the failure frequency and recovery speed of your equipment - the fundamental inputs to maintenance planning, capital replacement decisions, and availability calculations. The problem is that most plants calculate them incorrectly, not because the math is wrong, but because the underlying data has systematic gaps.
This article covers how those gaps arise, how to identify them in your CMMS data, and how sensor data from operational intelligence platforms can supplement CMMS records to produce more accurate reliability metrics.
What MTBF and MTTR Actually Measure
MTBF measures the average operating time between failure events for a piece of equipment. Formally: total operating time divided by the number of failures in a given period. An asset with 4,000 hours of operating time and 4 recorded failures has an MTBF of 1,000 hours.
MTTR measures the average time to restore the equipment to operating condition after a failure. Formally: total maintenance time for corrective work divided by number of failure events. If those 4 failures required a cumulative 24 hours of repair time, MTTR is 6 hours.
Equipment availability, the metric that production planners care about, is derived from both: Availability = MTBF / (MTBF + MTTR). With an MTBF of 1,000 hours and an MTTR of 6 hours: 1,000 / 1,006 = 99.4% availability. A ten-hour increase in MTTR, to 16 hours, drops that to 1,000 / 1,016 = 98.4%. This sensitivity is why MTTR matters as much as failure frequency.
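The three formulas above translate directly into code. The following is a minimal sketch using the worked figures from this section; the function names are illustrative and not taken from any particular CMMS or reliability library.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: total operating time / number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined when no failures are recorded")
    return operating_hours / failures


def mttr(repair_hours: float, failures: int) -> float:
    """Mean time to repair: total corrective repair time / number of failures."""
    if failures <= 0:
        raise ValueError("MTTR is undefined when no failures are recorded")
    return repair_hours / failures


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), returned as a fraction of 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Worked example from the text: 4,000 operating hours, 4 failures, 24 repair hours.
asset_mtbf = mtbf(4000, 4)   # 1000.0 hours
asset_mttr = mttr(24, 4)     # 6.0 hours
print(f"{availability(asset_mtbf, asset_mttr):.1%}")  # 99.4%
```

Calling availability() with a range of MTTR values is a quick way to see how much uptime each additional hour of repair time costs for a given asset.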
The Five Sources of CMMS Data Gaps
CMMS records typically undercount failures and misstate repair times for five systematic reasons:
1. Failures repaired without work orders
In many plants, minor failures and quick repairs are handled by the operator on shift without a CMMS work order being opened. A technician who replaces a blown fuse in ten minutes may not create a work order. Those failures are invisible in the reliability calculation. This is the most common cause of MTBF overstatement.
2. Failure onset vs. failure detection gap
CMMS records the time a work order is created, not the time the failure began. For equipment that degrades gradually - bearing wear, motor insulation breakdown, pump seal leakage - the actual failure onset may precede detection by hours or days. MTBF calculated from work order creation dates counts that degraded period as healthy operating time, so it overstates the time between failures: each failure is dated at detection, not at onset.
3. MTTR starting point ambiguity
MTTR should measure time from failure detection to restored operation. In practice, CMMS work orders are often created at some other point - when the technician starts the job, when the supervisor schedules it, or after the repair is complete - and closed well after the equipment is back in service. Work orders closed by the next shift or the next day introduce multi-hour MTTR errors even for quick repairs.
4. Classification inconsistency
The same failure mode may be recorded under different work order types by different technicians or shifts - corrective maintenance, breakdown maintenance, emergency work order, or production-reported fault. Reliability calculations that pull from only one work order classification will miss failures recorded under the others.
5. Partial failures and restores
Equipment that is returned to operation in a degraded state and then fails again days later may be recorded as one failure event or two, depending on CMMS discipline. Partial restores that let a line keep running but at reduced capacity are often not recorded as failures at all, distorting both MTBF and actual availability.
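The effect of classification inconsistency (gap source 4) is easy to see on a small example. The sketch below uses a hypothetical work order export; the field names, type labels, and asset ID are assumptions for illustration and will differ in a real CMMS.

```python
# Hypothetical CMMS export: field names and type labels are assumptions.
work_orders = [
    {"id": "WO-1", "asset": "PUMP-07", "type": "corrective"},
    {"id": "WO-2", "asset": "PUMP-07", "type": "breakdown"},
    {"id": "WO-3", "asset": "PUMP-07", "type": "preventive"},
    {"id": "WO-4", "asset": "PUMP-07", "type": "emergency"},
    {"id": "WO-5", "asset": "PUMP-07", "type": "corrective"},
    {"id": "WO-6", "asset": "PUMP-07", "type": "production_fault"},
]

# Every classification under which a failure might have been logged.
FAILURE_TYPES = {"corrective", "breakdown", "emergency", "production_fault"}


def count_failures(orders, asset, failure_types):
    """Count work orders for one asset whose type is treated as a failure."""
    return sum(
        1 for wo in orders
        if wo["asset"] == asset and wo["type"] in failure_types
    )


operating_hours = 4000
narrow = count_failures(work_orders, "PUMP-07", {"corrective"})  # one type only
broad = count_failures(work_orders, "PUMP-07", FAILURE_TYPES)    # all failure types

print(operating_hours / narrow)  # 2000.0 hours MTBF from one classification
print(operating_hours / broad)   # 800.0 hours MTBF across all classifications
```

In this made-up data set, a query limited to one work order type reports an MTBF two and a half times higher than the same query across every failure classification.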
How Sensor Data Supplements CMMS Records
Operational intelligence platforms that monitor equipment sensor data continuously can address three of the five gap sources:
Detecting unreported failures
Failures that are never recorded in the CMMS still appear as deviations in the sensor time-series data. If a motor temperature spiked and returned to normal during a quick repair that never generated a work order, the temperature anomaly is still in the historian. Comparing anomaly alert timestamps to CMMS work order records reveals unreported failure events. Plants that do this exercise typically find their MTBF is 15-30% lower than CMMS data alone suggests.
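The timestamp comparison described above can be sketched in a few lines. This is an illustrative example, not a vendor API: the record layout, the asset ID, and the 24-hour matching window are all assumptions, and an unmatched alert is a candidate for investigation rather than a confirmed failure.

```python
from datetime import datetime, timedelta


def unmatched_alerts(alerts, work_orders, window=timedelta(hours=24)):
    """Return alerts with no work order on the same asset within `window` after the alert."""
    gaps = []
    for alert in alerts:
        matched = any(
            wo["asset"] == alert["asset"]
            and timedelta(0) <= wo["created"] - alert["time"] <= window
            for wo in work_orders
        )
        if not matched:
            gaps.append(alert)
    return gaps


# Hypothetical anomaly alerts and CMMS work orders for one motor.
alerts = [
    {"asset": "MOTOR-12", "time": datetime(2024, 3, 4, 8, 15)},
    {"asset": "MOTOR-12", "time": datetime(2024, 3, 11, 14, 40)},
    {"asset": "MOTOR-12", "time": datetime(2024, 3, 20, 2, 5)},
]
work_orders = [
    {"asset": "MOTOR-12", "created": datetime(2024, 3, 4, 9, 30)},
    {"asset": "MOTOR-12", "created": datetime(2024, 3, 20, 7, 0)},
]

gaps = unmatched_alerts(alerts, work_orders)
for alert in gaps:
    print(alert["asset"], alert["time"].isoformat())  # MOTOR-12 2024-03-11T14:40:00
```

The matching window is the main tuning decision: too short and late-created work orders look like gaps, too long and an unrelated work order hides a real one.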
Establishing failure onset with precision
The timestamp of the first statistical anomaly alert for a sensor provides a more accurate failure onset marker than the CMMS work order creation date. For bearing failures, this can be hours or days before the CMMS event. Using anomaly alert timestamps as the failure start point produces MTBF values that better reflect actual equipment reliability, and MTTR values that correctly capture the diagnostic delay (time between failure onset and technician response).
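To make the difference concrete, the sketch below computes repair time two ways for the same events: from work order creation, and from the first anomaly alert. The event records and field names are invented for illustration.

```python
from datetime import datetime

# Hypothetical failure events: onset is the first anomaly alert timestamp.
events = [
    {
        "onset": datetime(2024, 5, 2, 22, 0),
        "wo_created": datetime(2024, 5, 3, 6, 0),
        "restored": datetime(2024, 5, 3, 10, 0),
    },
    {
        "onset": datetime(2024, 6, 14, 3, 0),
        "wo_created": datetime(2024, 6, 14, 7, 0),
        "restored": datetime(2024, 6, 14, 9, 0),
    },
]


def mean_hours(deltas):
    """Average a sequence of timedeltas and return the result in hours."""
    deltas = list(deltas)
    return sum(d.total_seconds() for d in deltas) / 3600 / len(deltas)


# Repair time as the CMMS sees it: work order creation to restore.
mttr_from_wo = mean_hours(e["restored"] - e["wo_created"] for e in events)
# Time the failure went unaddressed: onset to work order creation.
diagnostic_delay = mean_hours(e["wo_created"] - e["onset"] for e in events)
# Full downtime picture: onset to restore.
mttr_from_onset = mean_hours(e["restored"] - e["onset"] for e in events)

print(mttr_from_wo, diagnostic_delay, mttr_from_onset)  # 3.0 6.0 9.0
```

In this example the CMMS view shows a 3-hour MTTR, while the onset-based view shows 9 hours, two thirds of which is diagnostic delay rather than wrench time.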
Cross-validating MTTR with operational status
Equipment that has been repaired but is still running abnormally - elevated temperature, unusual vibration - has not been truly restored. Sensor data after a CMMS work order close provides an objective check: was the equipment actually restored to normal operating baseline, or did the work order close prematurely? Post-repair sensor drift is a documented phenomenon in bearing replacement work where a misalignment or over-tightening creates a new degradation pattern within days.
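One simple way to run this check is to compare post-repair readings against a band around the healthy baseline. The sketch below assumes a mean plus or minus three standard deviations band and made-up temperature readings; a production platform would likely use a more robust anomaly model.

```python
from statistics import mean, stdev


def restored_to_baseline(baseline, post_repair, k=3.0):
    """True if the mean post-repair reading is within k baseline standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return abs(mean(post_repair) - center) <= k * spread


# Hypothetical bearing temperatures in degrees C.
baseline_temps = [61.8, 62.4, 62.0, 61.5, 62.3, 62.1]  # healthy operation
after_good_repair = [62.2, 61.9, 62.5, 62.0]            # readings after work order close
after_bad_repair = [66.9, 67.4, 68.1, 68.8]             # drifting upward after close

print(restored_to_baseline(baseline_temps, after_good_repair))  # True
print(restored_to_baseline(baseline_temps, after_bad_repair))   # False
```

A False result after a work order close is the signal that the repair may have introduced a new degradation pattern and the close should be reviewed.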
A Practical Approach to Improving Your Reliability Data
Three steps that improve MTBF and MTTR accuracy without a full data quality project:
- Require a work order for every equipment intervention. This is a culture and process change, not a technology change. Even a 5-minute repair should generate a CMMS record. Most CMMS platforms support mobile apps that make quick work order creation possible on the plant floor. The discipline of capturing every intervention is the single highest-impact improvement to MTBF data quality.
- Use sensor anomaly alerts as a cross-check. Once a month, compare anomaly alerts to CMMS work orders for the same equipment and time period. Any anomaly alert without a corresponding work order is a gap to investigate. Any work order without a preceding anomaly is an opportunity to validate whether sensor coverage is adequate.
- Track MTTR with sensor-confirmed restore. Close a work order as restored only when the relevant sensors have returned to normal operating baseline, not when the technician signs off. This requires an operational intelligence platform that can flag post-repair anomalies - which is the integration Relynk provides with SAP PM and IBM Maximo.
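The third step, sensor-confirmed restore, can be sketched as finding the first point at which readings stay inside the normal band for several consecutive samples. The band limits, the hold count of three, and the vibration readings below are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def confirmed_restore_time(readings, low, high, hold=3):
    """Return the start of the first run of `hold` consecutive in-band readings, or None."""
    run_start = None
    run_length = 0
    for ts, value in readings:
        if low <= value <= high:
            if run_length == 0:
                run_start = ts
            run_length += 1
            if run_length >= hold:
                return run_start
        else:
            run_length = 0  # an out-of-band reading resets the run
    return None


# Hypothetical vibration readings (mm/s) every 30 minutes after technician sign-off.
detected = datetime(2024, 7, 9, 8, 0)
sign_off = datetime(2024, 7, 9, 10, 0)
values = [0.9, 0.8, 0.4, 0.7, 0.4, 0.35, 0.4]
readings = [(sign_off + timedelta(minutes=30 * i), v) for i, v in enumerate(values)]

confirmed = confirmed_restore_time(readings, low=0.2, high=0.5)
print(sign_off - detected)   # 2:00:00 repair time based on sign-off
print(confirmed - detected)  # 4:00:00 repair time based on sensor-confirmed restore
```

Requiring a sustained in-band run rather than a single good reading avoids closing the work order on a momentary dip, at the cost of confirming the restore slightly later.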
The reliability metric that matters most is not the number on the dashboard - it is whether that number accurately reflects equipment behavior. Plants that find their MTBF is lower than previously calculated are not in a worse position. They are in a more honest position, and a more actionable one. Accurate metrics drive better maintenance investments than flattering metrics do.
Connect your sensor data to your CMMS
Relynk's anomaly detection integrates directly with SAP PM, IBM Maximo, and eMaint to create structured work orders with sensor context. Reliability metrics improve when every anomaly becomes a traceable record.