The EEMUA 191 guidelines for alarm management - the industry standard referenced by most process safety frameworks - specify that a well-managed control room should present operators with no more than one alarm per 10 minutes during normal operations, and no more than 10 alarms in the first 10 minutes of a plant upset. Most industrial facilities generate far more than this in practice. Studies in petrochemical and power generation facilities have documented SCADA systems with 30,000-80,000 active alarms, the majority of which are standing alarms that operators have learned to ignore.
The alarm flood problem is counterintuitive: the plants with the most alarms are not the best-monitored plants. They are the plants where operators have stopped reading alarms because the signal-to-noise ratio has become unmanageable. When an alarm workstation generates 300 alerts per hour, operators develop a conditioned response to dismiss all alarms without reading them. The alarms that matter - the ones that precede actual failures - get dismissed with everything else.
How Alarm Flood Develops
Alarm flood develops incrementally over years of SCADA configuration changes that are never reviewed holistically. The typical pattern:
A new sensor is installed. Someone adds an alarm threshold. An equipment configuration change causes a previously normal operating range to trigger the alarm. The alarm fires repeatedly. Rather than investigating whether the threshold is still correct, the operator acknowledges it on each shift because it is "always there." Within six months, it becomes a standing alarm - an alarm that is always active and always acknowledged without action.
Multiply this pattern across years of operations, across hundreds of sensors, and you get a SCADA alarm database where 60-80% of the alarms are standing alarms that generate no useful information. The remaining 20-40% are buried in the noise.
The EEMUA 191 standard and ISA-18.2 (the American national standard for alarm management) both provide alarm rationalization processes for addressing this accumulation. Alarm rationalization is the systematic review of each alarm in the system: does it require operator action? Is the threshold set correctly? Is it redundant with another alarm? For large SCADA systems, alarm rationalization projects can take 12-18 months and cost $200,000-$500,000 in consultant time.
The Cost of Alarm Desensitization
The Texas City refinery explosion in 2005, which killed 15 people and injured 180, was preceded by multiple alarm activations that operators did not respond to - not because they were negligent, but because those alarms had been standing alarms for so long that the response had become reflexive acknowledgment without action. The CSB investigation noted that operators were "desensitized" to the alarms. This is the safety consequence of alarm flood that process industry standards were written to address.
The economic consequence is more common than the safety consequence but less dramatic: maintenance teams respond slowly to alarms because response rate and alarm volume have become inversely correlated. The more alarms generated, the less each one is acted on. Equipment degradation alarms that should trigger maintenance scheduling get dismissed with the same reflex as the standing alarms that have been wrong for three years.
Why Statistical Anomaly Detection Is Different from SCADA Alarms
The alarm flood problem is specific to threshold-based alarming in SCADA systems. Statistical anomaly detection - the approach used in operational intelligence platforms - addresses the root cause differently.
SCADA alarms fire when a value crosses a fixed threshold. If the threshold is set too low, the alarm fires constantly and becomes standing. If the threshold is set too high, the alarm only fires when the failure is imminent - too late for preventive action.
Statistical baseline anomaly detection fires when a value deviates from its own historical norm - not from a fixed threshold. An anomaly alert from Relynk means: "this sensor is behaving differently from how it has behaved for the past 60 days." The baseline adjusts automatically as equipment ages and operating conditions change. Alarms that would have become standing alarms under a fixed threshold system - because the equipment shifted to a new normal operating range - are automatically recalibrated.
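The contrast between the two approaches can be sketched in a few lines of Python. This is an illustration of the concept only - the 60-sample window, 3-sigma cutoff, and threshold value are assumptions for the example, not Relynk's actual parameters:

```python
from collections import deque
from statistics import mean, stdev

FIXED_THRESHOLD = 80.0  # SCADA-style hard limit (illustrative value)

def fixed_threshold_alarm(value):
    """Fires whenever the reading crosses the configured limit."""
    return value > FIXED_THRESHOLD

class BaselineDetector:
    """Flags readings that deviate from the sensor's own recent history."""
    def __init__(self, window=60, sigma=3.0):
        self.history = deque(maxlen=window)  # trailing baseline window
        self.sigma = sigma

    def anomalous(self, value):
        flagged = False
        if len(self.history) >= 10:  # need enough history for a stable baseline
            mu, sd = mean(self.history), stdev(self.history)
            flagged = sd > 0 and abs(value - mu) > self.sigma * sd
        self.history.append(value)  # the baseline adapts to the new normal
        return flagged

detector = BaselineDetector()
for reading in [70.0] * 30 + [85.0] * 60:
    detector.anomalous(reading)

# The equipment has settled at a new normal of 85, above the 80 hard
# limit: the fixed-threshold alarm is now standing (always active),
# while the recalibrated baseline stays quiet.
print(fixed_threshold_alarm(85.0), detector.anomalous(85.0))
# -> True False
```

The last two lines show the standing-alarm case directly: after the equipment drifts to a new steady state, the fixed threshold fires on every reading, while the rolling baseline has absorbed the shift and only fires on genuine deviations from it.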
The de-duplication and suppression logic further reduces noise: Relynk groups correlated anomalies from related sensors into a single alert, and applies a 15-minute suppression window to prevent the same anomaly from generating repeated alerts during a transient event. The result is an alert volume that is manageable - typically 2-15 alerts per day across a production facility, compared to hundreds of SCADA alarms.
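The suppression window reduces to a simple per-group rate limit. A minimal sketch of the idea (the grouping key and window handling here are illustrative, not Relynk's implementation; only the 15-minute window comes from the description above):

```python
from datetime import datetime, timedelta

SUPPRESSION_WINDOW = timedelta(minutes=15)

class AlertSuppressor:
    """Emits at most one alert per anomaly group inside the window."""
    def __init__(self):
        self.last_emitted = {}  # group key -> time of last emitted alert

    def should_emit(self, group, timestamp):
        last = self.last_emitted.get(group)
        if last is not None and timestamp - last < SUPPRESSION_WINDOW:
            return False  # same group, still inside the suppression window
        self.last_emitted[group] = timestamp
        return True

s = AlertSuppressor()
t0 = datetime(2024, 5, 1, 9, 0)
# Three correlated anomalies from one pump group within 10 minutes
# collapse into a single alert; a recurrence 40 minutes later alerts again.
events = [t0, t0 + timedelta(minutes=4),
          t0 + timedelta(minutes=10), t0 + timedelta(minutes=40)]
print([s.should_emit("pump-7", t) for t in events])
# -> [True, False, False, True]
```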
Complementing Your SCADA Alarms, Not Replacing Them
Operational intelligence anomaly detection is not a replacement for SCADA alarms. SCADA alarms provide safety-critical notifications: high-high pressure, low-low level, emergency stops. These alarms should remain in the SCADA system with their hard thresholds and direct operator notification. They are part of your safety instrumented system and should not be modified by an operational intelligence platform.
What operational intelligence adds is the early-warning layer that SCADA alarming cannot provide: statistically derived anomalies that appear days or weeks before a failure reaches the hard threshold. As discussed in our article about what 11 days of vibration data could have prevented, the bearing failure scenario is exactly this pattern: the SCADA alarm threshold was set for emergency shutoff, not early detection. The anomaly was in the data but not in the alarms.
The combined system: SCADA alarms for emergency response, operational intelligence alerts for early degradation detection. They route to different teams - SCADA alarms to the control room operator, Relynk alerts to the maintenance scheduler - with different response expectations. The separation of concerns reduces alarm flood while maintaining safety coverage.
Practical Steps for Reducing Alarm Flood Without a Full Rationalization Project
If a full ISA-18.2 alarm rationalization project is not in the current budget, these steps address the highest-impact sources of alarm flood:
- Identify and shelve standing alarms: Export your SCADA alarm history for the last 90 days. Any alarm that was active more than 50% of the time during that period is a candidate for shelving - disabling without deleting - pending a proper rationalization review. Most SCADA systems support alarm shelving.
- Disable chattering alarms: An alarm that repeatedly cycles in and out of its alarm state - what ISA-18.2 calls a "chattering" alarm - provides no useful information; a practical cutoff is any alarm firing more than 10 times per 24 hours. Identify the top 20 chattering alarms by frequency and address their root causes - typically misconfigured thresholds, missing deadbands, or sensor calibration issues.
- Route maintenance-relevant alarms separately: Alarms that do not require immediate operator action but indicate a maintenance requirement should be routed to a maintenance ticket queue rather than to the operator workstation. This requires classifying alarms by response type, which is a simplified version of the alarm rationalization process.
- Add an anomaly detection layer for early warning: Deploying statistical baseline detection separates early-warning signals from emergency alarms, which reduces pressure on the SCADA alarm system to serve both purposes. SCADA alarms can be trimmed to genuine emergency thresholds once early-warning detection is operational.
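The first two steps can be automated against a SCADA alarm history export. A sketch in Python, assuming each record carries a tag name plus activation and clear timestamps - the field names, thresholds, and example data below are hypothetical, so adapt them to your historian's export format:

```python
from collections import defaultdict
from datetime import datetime, timedelta

PERIOD = timedelta(days=90)
STANDING_FRACTION = 0.5   # active >50% of the period -> shelving candidate
CHATTER_PER_DAY = 10      # >10 activations per day -> chattering candidate

def review_alarm_history(records, period_end):
    """records: dicts with 'tag', 'activated', 'cleared' (ISO timestamps;
    an empty 'cleared' means still active), e.g. from csv.DictReader."""
    period_start = period_end - PERIOD
    active_time = defaultdict(timedelta)
    activations = defaultdict(int)
    for rec in records:
        start = datetime.fromisoformat(rec["activated"])
        end = datetime.fromisoformat(rec["cleared"]) if rec["cleared"] else period_end
        # Clip each active interval to the 90-day review period.
        start, end = max(start, period_start), min(end, period_end)
        if start < end:
            active_time[rec["tag"]] += end - start
        activations[rec["tag"]] += 1
    standing = sorted(tag for tag, active in active_time.items()
                      if active / PERIOD > STANDING_FRACTION)
    days = PERIOD / timedelta(days=1)
    chattering = sorted((tag for tag, n in activations.items()
                         if n / days > CHATTER_PER_DAY),
                        key=lambda tag: -activations[tag])[:20]
    return standing, chattering

end = datetime(2024, 4, 1)
records = (
    # A standing alarm: active for the entire review period.
    [{"tag": "LT-204", "activated": "2023-11-01T00:00:00", "cleared": ""}]
    # A chattering alarm: ~11 activations per day on average, one minute each.
    + [{"tag": "PT-101",
        "activated": (end - timedelta(days=1, minutes=i)).isoformat(),
        "cleared": (end - timedelta(days=1, minutes=i - 1)).isoformat()}
       for i in range(1, 1000)]
)
print(review_alarm_history(records, end))
# -> (['LT-204'], ['PT-101'])
```

The output is two candidate lists - tags to shelve pending rationalization review, and the worst chattering tags to investigate for threshold or calibration fixes - which maps directly onto the first two steps above.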
Early-warning detection that complements your SCADA alarms
Relynk's anomaly detection generates targeted maintenance alerts based on statistical deviation - separate from SCADA emergency alarms. Less noise, more actionable signals.
See How It Works