“Which button do I press when they all flash at once?”

Alarms and alarm systems are often an integral aspect of a Human-Machine Interface (HMI), along with displays and controls. An effective HMI should support optimal human performance. However, ineffective alarm management has often been cited as a contributory factor in major accidents around the world. Alarm management refers to the processes and practices for designing, operating, monitoring and maintaining alarm systems.

Alarms help personnel to maintain a system or process within a safe operating ‘envelope’ and help them recognise early and respond to faults, malfunctions or abnormal conditions. The Engineering Equipment & Materials Users’ Association (EEMUA) Publication 191 (2013) is recognised by many regulators worldwide as good practice for alarm design and management. This document was my key reference as a UK HSE inspector, when assessing alarm management in the oil, gas and chemical industries.

An alarm can be defined as an audible or visual indication of an equipment malfunction, process deviation, or abnormal condition requiring a response by a person. A key point is that every alarm should have a clearly defined operator response. If a response cannot be defined, then the signal generated by the Human-Machine Interface should not be an alarm.

As a Regulator, my overriding principles were (1) that major hazard safety must not depend on human response to an alarm and (2) that there is a limit to the amount of risk reduction which can be claimed by using alarms. This is because even if the hardware aspects of an alarm system are 100% reliable, human performance may not always be reliable when carrying out the following steps:

  • detecting an alarm;
  • diagnosing the cause of an alarm;
  • working out a response, and
  • carrying out this response in a correct and timely manner.
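
As a rough illustration of why this limits the credit that can be claimed, suppose (purely as an assumption for this sketch) that the four steps succeed or fail independently. The overall chance of a correct response is then the product of the four step probabilities, so it can be no better than the weakest step. The function name and the probabilities below are invented for illustration and are not taken from EEMUA 191 or HSE guidance.

```python
import math


def response_success_probability(detect: float, diagnose: float,
                                 decide: float, act: float) -> float:
    """Chance that all four steps succeed, assuming the steps are independent."""
    steps = (detect, diagnose, decide, act)
    if not all(0.0 <= p <= 1.0 for p in steps):
        raise ValueError("each probability must be between 0 and 1")
    return math.prod(steps)


# Illustrative values only, not measured data.
p = response_success_probability(detect=0.99, diagnose=0.95, decide=0.95, act=0.98)
print(f"overall probability of a correct, timely response: {p:.3f}")
```

Even with each step looking reliable on its own, the combined figure is noticeably lower, and in a real upset the steps are unlikely to be independent at all.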

Common problems

A poorly designed alarm system is often a usability issue:

  • too many alarms created by the designers (made easier by advances in technology);
  • too many alarms generated in an upset or abnormal situation – ‘flood’;
  • inappropriate set points (often leading to ‘nuisance’ alarms);
  • operator actions to alarms not clearly defined;
  • ineffective annunciation (so alarm missed by the operator); and
  • lack of prioritisation of alarms.

The benefits of alarm management

Better alarm handling can have a significant effect on the safety of your business (the cost of not improving alarm handling can literally be your business in some cases). An improved alarm system can bring tighter quality control, improved fault diagnosis and more effective plant management by operators. Effective alarm management can also prevent upsets from happening in the first place, rather than leaving operators to manage a process upset or abnormal situation.

A 3-step approach

Alarm management is essentially a design issue because trying to put matters right later is much more difficult. This is therefore a Human Factors Engineering issue. The HSE Information Sheet ‘Better Alarm Handling’ (CHIS6, 2000) outlines a 3-step approach to better alarm management:

  1. Find out if you have a problem (e.g. by comparing alarm metrics with good practice, talking to operators and reviewing incidents);
  2. Decide what to do and take action (e.g. by forming a team, setting priorities, implementing some quick wins, considering training for abnormal conditions and upsets, and assessing workload);
  3. Check and manage what you have done (e.g. by developing an alarm strategy, and auditing and reviewing your progress).

Alarms in focus: Milford Haven Refinery

This incident prompted a more critical approach to alarm management by UK HSE and led to the production of guidance. On 24th July 1994 there was a major explosion at the Milford Haven oil refinery, jointly owned by Texaco and Gulf. The damage cost about £48 million to repair. There were also two months of lost production from the main facility and four months of lost production from the area that was damaged. The owners were prosecuted and fined a total of £200,000 plus costs.

Milford Haven: Site damage caused by the explosion (HSE, 1997)

One of the causes of the explosion was the number of alarms and the poor design of the alarm systems that obscured safety information from the panel operators. In the last 11 minutes before the explosion the two operators had to recognise, acknowledge and act on 275 alarms. Alarms were presented to operators faster than they could be responded to. Most of the 2040 alarms were displayed as “high” priority, despite many of them being informative only. Safety-critical alarms were not distinguished.

As a result, they missed key information that could have prevented the explosion. A lightning strike caused a significant process upset. For several hours after the lightning strike the operators were heavily loaded with alarms at a rate estimated to be in excess of 1 every 2-3 seconds. During this period several operators failed to identify the build-up of liquid in a knock-out vessel. This eventually overfilled and resulted in the explosion taking place.

A recommendation from the UK HSE report into the explosion was:

“The use and configuration of alarms should be such that:

  • safety critical alarms are distinguishable from other operational alarms;
  • alarms are limited to the number an operator can effectively monitor; and
  • ultimate plant safety should not rely on operator response to a control
    system alarm” (HSE, 1997). 

An HSE Contract Research Report (166/1998) concluded that the problems seen at Texaco were widespread in industry and that they can be prevented.

Some key questions

  • Is the alarm needed at all?
  • What should be the priority of the alarm?
  • What action does the person need to take (if any)?
  • What are the consequences of the alarm being ignored or missed?
  • How much time does the person have to respond?
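
One way to put these questions to work is to record the answers for each alarm and flag any that are missing. The sketch below is a minimal illustration of that idea: the `AlarmReview` record, its field names, the wording of the findings and the example tag are all hypothetical, not a format prescribed by EEMUA 191 or the HSE.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlarmReview:
    """One row of a hypothetical alarm review sheet (field names are illustrative)."""
    tag: str
    needed: bool                            # Is the alarm needed at all?
    priority: Optional[str]                 # What should its priority be?
    operator_action: Optional[str]          # What action must the person take?
    consequence_if_missed: Optional[str]    # What happens if it is ignored or missed?
    time_to_respond_min: Optional[float]    # How much time is there to respond?


def review_findings(alarm: AlarmReview) -> List[str]:
    """Return the open issues for one alarm; an empty list means none were found."""
    if not alarm.needed:
        return ["not needed: candidate for removal"]
    findings = []
    if not alarm.operator_action:
        findings.append("no defined operator response: should not be an alarm")
    if not alarm.priority:
        findings.append("priority not assigned")
    if not alarm.consequence_if_missed:
        findings.append("consequence of a missed alarm not assessed")
    if alarm.time_to_respond_min is None:
        findings.append("time available to respond not assessed")
    return findings


if __name__ == "__main__":
    example = AlarmReview(
        tag="LAH-101",  # made-up tag for illustration
        needed=True,
        priority="high",
        operator_action=None,
        consequence_if_missed="vessel overfill",
        time_to_respond_min=None,
    )
    for finding in review_findings(example):
        print(f"{example.tag}: {finding}")
```

The check on `operator_action` reflects the principle that a signal with no defined operator response should not be an alarm.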

Relationship to other topics

A review of alarms and alarm management cannot be undertaken in isolation from several other key human factors topics. It is also necessary to consider the working environment (lighting, noise, temperature), staffing levels, workload and fatigue. These topics will all influence how effectively people will identify and respond to alarms.

Alarms in focus: Esso Longford (1998)

In his book ‘Lessons from Longford’, Andrew Hopkins discusses alarm overload (or flood) at the Esso gas plant. He states that:

“The alarm problem was compounded enormously by the sheer number of alarms which operators were expected to deal with – at least three or four hundred a day! On one occasion an incident occurred which led Esso incident investigators to count the number of alarms. The figure for a 12-hour shift was 8,500 or 12 alarms every 60 seconds!”
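
The arithmetic behind figures like these is easy to check. The short sketch below (the function name is my own) converts an alarm count and a period into an average rate, using the Longford figures quoted above and the 275 alarms in 11 minutes reported for Milford Haven.

```python
def alarms_per_minute(alarm_count: int, period_minutes: float) -> float:
    """Average alarm rate over a period, in alarms per minute."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return alarm_count / period_minutes


# Longford: 8,500 alarms in a 12-hour shift.
longford = alarms_per_minute(8500, 12 * 60)
# Milford Haven: 275 alarms in the last 11 minutes before the explosion.
milford_haven = alarms_per_minute(275, 11)

for name, rate in (("Longford", longford), ("Milford Haven", milford_haven)):
    print(f"{name}: {rate:.1f} alarms per minute, one every {60 / rate:.1f} seconds")
```

The Longford figures work out at just under 12 alarms a minute, matching the quotation. The Milford Haven figures give 25 a minute, or one alarm every 2.4 seconds, in line with the estimated rate of one every 2-3 seconds.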

How many alarms is enough?

The EEMUA 191 guide (2007) states that in steady-state operations, there should be an average of no more than one alarm every ten minutes. Following a major plant upset, there should be fewer than ten alarms displayed in the first ten minutes. These benchmarks would be a challenge for many major hazard facilities.
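
If alarm timestamps can be exported from the control system, both benchmarks can be checked with a few lines of code. The following is a minimal sketch under that assumption; the function names, the constants' names and the example log are invented for illustration, and a real assessment would follow the metrics as defined in EEMUA 191 itself.

```python
from datetime import datetime, timedelta
from typing import List

STEADY_STATE_MAX_PER_10_MIN = 1.0   # average of no more than one alarm per ten minutes
UPSET_MAX_IN_FIRST_10_MIN = 10      # fewer than ten alarms in the first ten minutes


def average_per_10_min(alarm_times: List[datetime], start: datetime, end: datetime) -> float:
    """Average number of alarms per ten-minute interval between start and end."""
    duration_min = (end - start).total_seconds() / 60
    if duration_min <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for t in alarm_times if start <= t < end)
    return count / (duration_min / 10)


def alarms_after_upset(alarm_times: List[datetime], upset: datetime) -> int:
    """Number of alarms raised in the ten minutes following an upset."""
    window_end = upset + timedelta(minutes=10)
    return sum(1 for t in alarm_times if upset <= t < window_end)


if __name__ == "__main__":
    # Made-up log: one alarm every 30 minutes during a shift, then a burst
    # of 25 alarms at 20-second spacing after a hypothetical upset at hour 8.
    shift_start = datetime(2024, 1, 1, 6, 0)
    upset = shift_start + timedelta(hours=8)
    log = [shift_start + timedelta(minutes=30 * i) for i in range(24)]
    log += [upset + timedelta(seconds=20 * i) for i in range(25)]

    steady = average_per_10_min(log, shift_start, upset)  # period before the upset
    burst = alarms_after_upset(log, upset)
    print(f"steady state: {steady:.2f} per 10 min "
          f"(meets benchmark: {steady <= STEADY_STATE_MAX_PER_10_MIN})")
    print(f"first 10 min after upset: {burst} alarms "
          f"(meets benchmark: {burst < UPSET_MAX_IN_FIRST_10_MIN})")
```

In this made-up log the steady-state period meets the benchmark but the upset does not, which is the pattern described as an alarm ‘flood’.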

More information on alarm management

Better Alarm Handling. UK HSE (2000). A free information sheet (CHIS6) produced by the HSE’s onshore Human Factors Team. This short guidance provides a 3-step approach, and contains key principles from the first edition of the EEMUA guide (1999) which was published with the support of the HSE.

HSE Inspector’s Toolkit – Alarm handling. Short extract from a toolkit produced by the UK HSE’s Human Factors Team for use by non-specialist Inspectors. This 6-page guide introduces the topic, provides some key principles and includes a brief Question Set used by HSE Inspectors, which organisations in any industry will find useful to assess their management of alarms.

Alarm systems: A guide to design, management and procurement (EEMUA Publication 191, Third Edition, 2013). Engineering Equipment & Materials Users’ Association Publication No 191. ISBN 0 85931 076 0. This is a key reference for this topic. I used this guidance to support my inspections on alarms. Since it was first published in 1999, EEMUA 191 has become the globally accepted and leading guide to good practice for all aspects of alarm systems. The guide, developed by users of alarm systems with input from the UK HSE, gives comprehensive guidance on designing, managing and procuring an effective alarm system. The new Third Edition has been comprehensively updated and includes guidance on implementing the alarm management philosophy in practice; applications in geographically distributed processes; and performance metrics and KPIs. Highly recommended reading.

The Management of Alarm Systems: A review of current practice in the procurement, design and management of alarm systems in the chemical and power industries. HSE (1998). Contract Research Report 166/1998. A report based upon visits to 15 chemical and power plants, a survey of 96 control room operators and a review of the literature on alarm systems. This information was analysed to highlight current industry best practice. Recommendations are given on the management of existing alarm systems and the procurement of new ones.