In October 2021, a press machine at an auto parts manufacturing plant outside Chicago failed unexpectedly. The cause was a bearing that had been degrading for at least 11 days. The data existed. No one saw it. The repair and production recovery took 38 hours. A shipment to a Tier 1 OEM was delayed. The total cost, including overtime, expedited parts, and the contractual late-delivery penalty, came to just under $140,000.
The plant's reliability engineer at the time was Simen Bakken. He had been in industrial operations for over a decade and had seen this pattern before. Equipment with instrumented sensors, control systems generating continuous data, and maintenance teams who only learned about problems after the shift report. The failure was not a data gap. It was an attention gap.
The Sensor Data Was There
The press machine in question had an accelerometer on the main drive bearing. It had been installed as part of a condition monitoring upgrade two years earlier. The vibration readings fed into one of three SCADA systems the plant ran: a Siemens WinCC installation that covered the stamping lines, an older Allen-Bradley system on the legacy machines, and a Wonderware SCADA for utilities.
The WinCC historian had the full vibration time-series. Looking at the data after the failure, the pattern was obvious: a steady upward drift in high-frequency vibration amplitude starting 11 days before the failure, with two acceleration events on days 7 and 9 that should have triggered an alert. But the plant's alarm thresholds in WinCC were configured for emergency shutoffs - hard limits set at values that indicated imminent mechanical failure, not statistical deviation from normal operating baseline. The drift never crossed those hard limits until the moment the bearing seized.
This is the central problem with threshold-based alarming in industrial environments: operators tune hard limits aggressively high to avoid nuisance alarms, which means early-stage degradation is invisible. The bearing failure was not a sensor problem. It was a detection methodology problem.
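To make the gap concrete, here is a minimal sketch in Python, using illustrative numbers rather than the plant's actual readings or thresholds, of how a slow drift can stay under a hard limit for its entire run while a deviation-from-baseline check flags it within days:

```python
import numpy as np

rng = np.random.default_rng(7)

HOURS_PER_DAY = 24
HARD_LIMIT = 5.0      # emergency-shutoff style threshold (illustrative)
Z_THRESHOLD = 2.5     # deviation from baseline, in standard deviations
SUSTAIN_HOURS = 6     # require sustained deviation so one-off noise doesn't alert

# 30 days of "normal" hourly vibration amplitude (arbitrary units)
baseline = rng.normal(loc=2.0, scale=0.15, size=30 * HOURS_PER_DAY)

# 11 days of bearing degradation: the mean drifts upward but never nears the hard limit
drift = np.linspace(0.0, 1.2, 11 * HOURS_PER_DAY)
degrading = rng.normal(loc=2.0, scale=0.15, size=11 * HOURS_PER_DAY) + drift

mu, sigma = baseline.mean(), baseline.std()

first_hard = first_baseline = None
consecutive = 0
for hour, reading in enumerate(degrading):
    if first_hard is None and reading > HARD_LIMIT:
        first_hard = hour
    consecutive = consecutive + 1 if (reading - mu) / sigma > Z_THRESHOLD else 0
    if first_baseline is None and consecutive >= SUSTAIN_HOURS:
        first_baseline = hour

def day_of(hour):
    return f"day {hour // HOURS_PER_DAY + 1}" if hour is not None else "never (in 11 days)"

print("hard-limit alarm:        ", day_of(first_hard))
print("baseline-deviation alert:", day_of(first_baseline))
```

Run against this synthetic drift, the hard limit never fires before the simulated seizure, while the baseline check alerts within the first week. Requiring a few consecutive hours of deviation is one simple way to avoid the nuisance alarms that push operators toward aggressive hard limits in the first place.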
Why Excel Was the Analysis Tool
The plant had three SCADA systems that did not communicate with each other. Aggregating sensor data across production lines meant manually exporting from each historian, loading the files into a shared drive folder, and processing them in Excel - work the reliability team did roughly once a week. By the time an analyst pulled the vibration data, formatted it, and ran trend analysis, the problem was already in the past tense.
This is not an unusual setup for manufacturing facilities built before 2015. SCADA systems were deployed for control and monitoring of individual production cells. They were not architected for cross-system analytics. Integrating them into a unified data layer typically required an OT consulting engagement of six months or more - a project that was always deprioritized behind production demands.
What Was Actually Needed
After the bearing failure, Bakken spent about a month analyzing what would have changed the outcome. Not what would have been ideal - but what was actually achievable without a multi-year modernization program and without touching the control architecture.
The requirements were specific. First, read data from the existing SCADA historians without modifying the control systems or adding agents to the OT network. Second, build statistical baselines for each tag rather than relying on manual threshold configuration. Third, route alerts directly to the maintenance team - not into a dashboard that required someone to be watching it. And fourth, do all of this within the OT security constraints that the plant's IT team would actually approve.
The fourth requirement turned out to be the hardest. The tools available at the time either required agent installation on the SCADA server (a non-starter for the OT security team) or were general-purpose IoT platforms with no native understanding of OPC-UA or SCADA historian formats. The read-only API approach, which Relynk is built around, was the architecture that solved the security review problem.
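What follows is a rough sketch of that read-only pattern from the analytics side: a scheduled poll against a query interface the historian already exposes, with nothing installed on the SCADA server. The endpoint, tag name, and response shape are hypothetical placeholders for illustration, not WinCC's or Relynk's actual API.

```python
import time
import requests

# Hypothetical read-only historian endpoint and tag name, for illustration only
HISTORIAN_URL = "https://historian.plant.example/api/v1/history"
TAG = "PRESS_07.DRIVE_BEARING.VIB_HF"
POLL_SECONDS = 60

def fetch_recent_samples(tag: str, minutes: int = 5) -> list[dict]:
    """Pull the last few minutes of samples for one tag with a read-only GET."""
    resp = requests.get(
        HISTORIAN_URL,
        params={"tag": tag, "lookback_minutes": minutes},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["samples"]  # e.g. [{"ts": "...", "value": 2.31}, ...]

def main() -> None:
    while True:
        try:
            samples = fetch_recent_samples(TAG)
            # Hand the samples to the statistical baseline check (next section)
            print(f"fetched {len(samples)} samples for {TAG}")
        except requests.RequestException as exc:
            print(f"poll failed, will retry next cycle: {exc}")
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()
```

The property that matters for the OT security review is that nothing in this loop writes to the control network: the analytics side only issues reads against an interface the historian already serves.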
Statistical Baseline vs. Hard Threshold
The detection approach Relynk uses is built around the specific failure mode that the bearing incident illustrated. Hard thresholds catch emergency conditions. Statistical baseline deviation catches the early-stage drift that precedes failures. For bearing wear, motor temperature rise, pump cavitation, or gearbox noise, the failure signature is a trend, not a step change.
Relynk ingests 30-90 days of historical data per tag and computes a rolling statistical baseline. Alerts trigger when a sensor's reading deviates from that baseline beyond a configurable z-score threshold - by default, 2.5 standard deviations. In the bearing scenario, the drift that started on day 1 would have generated a warning-level alert on approximately day 4, and a critical-level alert on day 7 when the first acceleration event occurred. With Relynk's average of 47 minutes from anomaly detection to work order creation, the maintenance team would have had a structured work order - with sensor readings, a trend chart, and the asset location - before the end of that shift.
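A simplified sketch of that baseline-deviation check is below. The window handling, statistics, and alert fields are assumptions made for illustration, not Relynk's published implementation; in production the baseline rolls forward continuously and the alert becomes a routed work order rather than a printed object.

```python
from dataclasses import dataclass
from statistics import mean, stdev

Z_DEFAULT = 2.5  # default deviation threshold, in standard deviations

@dataclass
class Alert:
    tag: str
    asset: str
    reading: float
    z_score: float
    baseline_mean: float
    baseline_std: float

def check_reading(tag, asset, history, reading, z_threshold=Z_DEFAULT):
    """Compare a new reading against a baseline built from the tag's history.

    `history` is the tag's recent values (e.g. 30-90 days of samples); an
    Alert is returned only if the reading deviates beyond the z threshold.
    """
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return None  # flat signal, nothing to compare against
    z = (reading - mu) / sigma
    if abs(z) < z_threshold:
        return None
    return Alert(tag=tag, asset=asset, reading=reading, z_score=round(z, 2),
                 baseline_mean=round(mu, 3), baseline_std=round(sigma, 3))

# Example: a reading drifting above a synthetic baseline centered near 2.0 units
history = [2.0 + 0.05 * ((i % 7) - 3) for i in range(500)]
alert = check_reading("PRESS_07.DRIVE_BEARING.VIB_HF", "Press 07", history, reading=2.6)
if alert is not None:
    print(alert)  # in practice this becomes a structured work order, not a print
```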
What This Actually Changes
The 38-hour downtime figure is the headline number, but it understates the operational impact. Unplanned downtime in manufacturing doesn't just lose production hours - it disrupts shift scheduling, consumes maintenance resources that were allocated elsewhere, triggers emergency parts orders at premium pricing, and damages customer relationships when commitments can't be met. The bearing failure also caused secondary damage to the press machine's spindle housing that added another $22,000 to the repair cost.
With 11 days of advance notice, the repair would have been a planned maintenance event. The bearing would have been replaced during a scheduled weekend window. Total labor: 4 hours. Parts cost at standard pricing: under $800. No line downtime. No shipment delay.
That gap between $140,000 and $800 is the business case for operational intelligence. It is also why Relynk exists.
The Lesson That Generalized
After Relynk launched, we talked to operations teams at dozens of manufacturing facilities. The bearing failure scenario - different equipment, different failure modes, the same fundamental pattern of data that was there and alerts that never fired - came up repeatedly. The details changed. A pump cavitation event in a food processing plant. A motor winding temperature anomaly at a paper mill. A conveyor tension drift at a distribution center. In every case, the historian data showed the problem. In every case, no automated process connected that data to the right person in time.
This is the operational intelligence problem at its core. Industrial facilities generate more sensor data than any human team can monitor at the frequency required to catch early-stage failures. The solution is not more dashboards. It is automated statistical analysis that watches every tag, every minute, and routes structured alerts to the person who can act on them - before the bearing seizes, not after.
See how Relynk handles your sensor data
Bring a data export from your current SCADA or historian. We'll run the baseline analysis and show you what Relynk would have flagged in your own facility.
Request a Demo