Every industrial software vendor is now selling AI-powered predictive maintenance. The pitch is compelling: machine learning models trained on your sensor data will predict failures before they happen, with accuracy no human analyst could match. The problem is that these claims rest on conditions most manufacturing plants cannot satisfy - and a simpler statistical approach delivers better practical results for the majority of industrial equipment.
This is a position worth stating clearly: for the majority of predictive maintenance use cases in industrial operations, statistical baseline anomaly detection outperforms machine learning models on deployment timeline, operational cost, and practical detection performance. The cases where ML is genuinely necessary are narrower than the market suggests.
What Machine Learning Predictive Maintenance Actually Requires
A supervised machine learning model for equipment failure prediction requires labeled training data: historical sensor readings tagged with the outcome (failure / no failure) for each time period. The model learns to associate sensor patterns with failure events. To build a useful model, you need enough labeled failure events to train on - typically hundreds of failure instances for each equipment type and failure mode you want to predict.
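To make the data requirement concrete, here is a minimal sketch of what one row of that labeled training set looks like. The field names and values are illustrative, not taken from any real historian; the point is that every positive label corresponds to an actual recorded failure.

```python
# Hypothetical shape of the labeled data a supervised failure-prediction
# model needs. Each row summarizes one time window of sensor readings;
# the label records whether a failure followed within the horizon.
from dataclasses import dataclass

@dataclass
class TrainingExample:
    vibration_rms: float         # mean RMS vibration over the window
    temperature_mean: float      # mean motor temperature over the window
    current_draw_peak: float     # peak current draw over the window
    failed_within_horizon: bool  # did the asset fail in the next N days?

# A well-run plant's historian yields very few positive labels:
examples = [
    TrainingExample(4.2, 61.0, 18.3, False),
    TrainingExample(4.5, 62.1, 18.9, False),
    TrainingExample(9.8, 74.5, 25.1, True),  # one of perhaps a dozen failures
]

positives = sum(1 for e in examples if e.failed_within_horizon)
print(f"{positives} labeled failure(s) out of {len(examples)} windows")
```

A model needs hundreds of rows where `failed_within_horizon` is true, per equipment type and failure mode; the historian at a well-maintained plant simply does not contain them.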
Here is the practical problem: most industrial plants are trying to reduce failures. They do not have hundreds of labeled bearing failures, hundreds of labeled pump seal failures, and hundreds of labeled gearbox faults in their historian. They may have a dozen. If a plant has done maintenance well, they have even fewer. You cannot build a reliable failure prediction model from twelve labeled events. The model will overfit, the confidence intervals will be too wide to be actionable, and the false positive rate will make the system useless.
Unsupervised anomaly detection using deep learning (autoencoders, LSTM networks) avoids the labeled-data requirement but introduces different problems: a long baseline-collection period (typically a minimum of 3-6 months of clean operating data), sensitivity to sensor calibration drift, and poor interpretability. When the model flags an anomaly, it cannot explain why. An operations engineer who cannot interpret an alert will eventually stop acting on alerts altogether.
What Statistical Baseline Detection Requires
Statistical baseline anomaly detection - the approach Relynk uses - requires 14-90 days of historical sensor data and a configurable deviation threshold. That is it. No labeled failure events. No training period. No data scientist.
The approach: for each sensor tag, compute a rolling mean and standard deviation over a configurable lookback window. A reading is flagged as anomalous when it deviates from the rolling mean by more than a configurable number of standard deviations (its z-score). Relynk defaults to 2.5 sigma which, for normally distributed readings, flags roughly the most extreme 1.2% of the historical distribution for that sensor (both tails combined).
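The mechanism above fits in a few lines. This is a minimal sketch assuming pandas is available; the function name, window size, and threshold are illustrative, not Relynk's actual configuration keys.

```python
# Rolling z-score anomaly detection: flag readings that deviate more
# than `sigma` standard deviations from the rolling baseline.
import numpy as np
import pandas as pd

def flag_anomalies(readings: pd.Series, window: int = 60, sigma: float = 2.5) -> pd.Series:
    """Return a boolean Series: True where |z-score| exceeds the threshold."""
    rolling = readings.rolling(window, min_periods=window)
    mean = rolling.mean().shift(1)  # shift so the baseline excludes the current reading
    std = rolling.std().shift(1)
    z = (readings - mean) / std
    return z.abs() > sigma

# Synthetic example: a stable sensor with one sudden excursion.
rng = np.random.default_rng(0)
values = 10.0 + 0.2 * rng.standard_normal(120)  # normal operation around 10.0
values[100] = 14.0                              # excursion far above baseline
s = pd.Series(values)

flags = flag_anomalies(s, window=60, sigma=2.5)
print(int(flags.sum()), "anomalous reading(s) flagged")
```

The `shift(1)` matters: the current reading is compared against a baseline it did not contribute to, so a sudden excursion cannot mask itself by inflating its own window's statistics.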
This approach catches early-stage bearing wear, motor temperature rise, pump cavitation, and similar degradation patterns - because those failure modes produce gradual drift from baseline, which is exactly what z-score deviation detects. The detection is also explainable: "Bearing vibration on Press 4 main drive is reading 3.1 standard deviations above its 60-day baseline, which corresponds to 14.2 g peak acceleration against a normal of 8.7 g." An operations engineer can act on that.
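The arithmetic behind an alert like that is trivial to verify, which is the whole point of explainability. The baseline standard deviation below is not stated in the alert; it is implied by the reported z-score, so treat it as a derived figure.

```python
# Working backwards from the example alert: z = (reading - mean) / std,
# so the implied baseline standard deviation is (reading - mean) / z.
baseline_mean = 8.7  # g, 60-day baseline vibration level
reading = 14.2       # g, current peak acceleration
z = 3.1              # standard deviations, as reported in the alert

baseline_std = (reading - baseline_mean) / z
print(f"implied baseline std: {baseline_std:.2f} g")  # → implied baseline std: 1.77 g
```

An engineer can check any alert against the raw sensor trend with this one-line calculation - something no autoencoder reconstruction error allows.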
The Deployment Timeline Comparison
A typical Relynk deployment - from first sensor connected to production anomaly detection - takes under a week for Starter and Professional plan customers. The baseline builds automatically on historical data from the historian. No model training. No data labeling. No specialist engagement.
ML-based predictive maintenance deployments at comparable facilities typically take 6-18 months to reach production. The timeline breaks down as: 2-3 months for data discovery and quality assessment, 2-4 months for model development and validation, and 2-4 months for integration and deployment, followed by ongoing model maintenance. The total cost including the initial consultant engagement is typically $200,000-$800,000 for a mid-size facility, before licensing fees.
This is not a criticism of machine learning - it is a statement about the economics and timelines of different approaches. A statistical baseline system running today on 500 sensors is more valuable than an ML model that might be ready in 12 months, even if the ML model would eventually be marginally more accurate.
Where ML Predictive Maintenance Is Actually Justified
There are genuine use cases where ML predictive maintenance is worth the investment:
- High-consequence, high-frequency failure environments: If a specific piece of equipment fails 20+ times per year at significant cost and you have years of historian data, a supervised model is feasible and the payback period is shorter.
- Complex multivariate failure signatures: Some failure modes are not visible in any individual sensor but emerge from the interaction of multiple signals. ML excels at detecting these correlations when the training data exists.
- Remaining useful life (RUL) estimation: Statistical methods can detect that something is wrong. Estimating how long until failure requires more sophisticated modeling. For maintenance planning with long lead times (major overhauls, capital equipment), RUL estimation justifies the ML investment.
- Very large fleets: Organizations operating hundreds of identical assets (wind turbines, compressors, pumps) can justify the ML investment because the model trained on one asset class applies across the entire fleet. The economics are fundamentally different from a single-facility manufacturer.
A Practical Recommendation
Start with statistical baseline anomaly detection. Get it running on your sensor data. Measure the alerts. Validate which ones corresponded to real issues. Build the culture of responding to anomaly alerts before failures occur.
After 12-18 months, you will have labeled data: anomaly alerts that were true positives (actual failure precursors) and false positives (equipment behavior that was unusual but not a failure). That labeled data is the foundation for a supervised ML model if you later decide the investment is justified. You will also know which equipment types and failure modes are the highest-priority targets for more sophisticated modeling.
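A hedged sketch of what that accumulated record looks like in practice. The field names are illustrative; the structure is the point: each reviewed alert carries an operator verdict, and those verdicts are exactly the supervision labels an ML model would later need.

```python
# Validated anomaly alerts double as labeled training data.
# Grouping verdicts by tag shows which assets produce reliable labels -
# the best candidates for later supervised modeling.
from collections import defaultdict

alerts = [
    {"tag": "press4.drive.vibration", "z_score": 3.1, "true_positive": True},
    {"tag": "pump2.seal.pressure",    "z_score": 2.7, "true_positive": False},
    {"tag": "press4.drive.vibration", "z_score": 2.9, "true_positive": True},
]

by_tag = defaultdict(lambda: {"tp": 0, "fp": 0})
for alert in alerts:
    verdict = "tp" if alert["true_positive"] else "fp"
    by_tag[alert["tag"]][verdict] += 1

for tag, counts in sorted(by_tag.items()):
    print(tag, counts)
```

Tags with a high true-positive count and a known failure mode are where a supervised model has something to learn from; tags with mostly false positives tell you the threshold needs tuning, not that you need deep learning.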
The sequence matters. Statistical baselines provide value in days. ML models require months of foundation work that statistical baselines help build. Skipping the foundation to go directly to ML typically produces a system that fails to deliver value in production and is abandoned within 18 months.
Statistical baselines, running in under a week
Relynk builds baselines from your existing historian data without labeled failure events or ML training periods. See what it would flag on your own sensor data.
Request a Demo