Human reliability in maintenance

What is maintenance?

Maintenance is the performance of tasks that are required to ensure that key structures, systems and components are capable of continuing to perform their intended function. These tasks may include overhaul, inspection, replacement, modification or repair in order to retain an item in (or restore it to) the state in which it can perform its intended functions. Maintenance is a combination of technical, administrative and managerial actions during the lifecycle of an item.

“Imagine a phenomenon that has created huge financial losses each year and, worse, resulted in death and injury throughout the world. . . there is such a phenomenon, yet it receives little attention and rarely makes the headlines. We are referring to maintenance error”.

Reason and Hobbs, Managing Maintenance Error: A practical guide (2003)

There are two main types of maintenance:

Corrective maintenance: this is a reactive type of maintenance, undertaken when an item fails in some way – for example, when your car breaks down and you call a breakdown service or a mechanic for repairs. This type of maintenance is unscheduled, and restores the system from a failed state to a working state. It may involve the repair or replacement of broken components.

Preventive maintenance: this is a planned activity, carried out at set intervals or according to other set criteria; for example, you may take your car for an annual service, or for a service when it has travelled a certain distance. The objective of this type of maintenance is to reduce the probability of a failure (or other degradation) of a required function. Preventive maintenance is scheduled. It is a proactive measure to stop items from deteriorating, and therefore to prevent a system failure. Activities may include replacement of serviceable items, lubrication of moving parts, cleaning or inspection.

Despite technological advances and increased automation, there is still a need for maintenance, even as humans are gradually removed from the direct control of equipment and systems.

Maintenance is a key aspect of managing an asset, whether that asset is an aircraft, an offshore production platform, railway infrastructure or a printing press. For example, the Australian Transport Safety Bureau (ATSB) has estimated that every hour of flight is associated with 12 hours of maintenance.

Why human factors in maintenance?

Maintenance is a key area where there is significant interaction between people and equipment, and so it is necessary to understand the role of human factors in these activities. During every maintenance activity there is a potential for human failures to be introduced. Typically, human reliability issues during maintenance involve either introducing a fault that was not present before the maintenance task was initiated, or failing to detect an unsafe condition during the maintenance (i.e., something is missed). Maintenance failures can remain dormant for many years before having their effect (sometimes called ‘latent failures’).

Structural changes across industry have led to several concerns relating to maintenance:

  • Economic pressures have led to reductions in staffing, and to changes in organisational structures and shift schedules.
  • An increase in maintenance work being undertaken by external contractors. Maintenance is one of the most subcontracted functions in many industries. Clients should maintain Intelligent Customer Capability when contracting out key functions, particularly in safety critical industries.
  • Many industries are facing the retirement of experienced staff, including maintenance specialists.
  • Ageing facilities and equipment are increasing the volume of maintenance activities.

The following quote from the aviation industry outlines how human performance in maintenance can be quite different from other activities that also rely on optimal human performance:

“Despite the extensive documentation that accompanies maintenance, the day-to-day work of maintainers may be less visible to management than the work of pilots or controllers. Pilots work under the constant scrutiny of quick access recorders, cockpit voice recorders and flight data recorders, not to mention passengers and the public. The performance of air traffic controllers is carefully monitored, and their errors tend to become immediately apparent to either fellow controllers or pilots. In contrast, if a maintenance engineer has a difficulty with a maintenance procedure at 3 AM in a remote hangar, the problem may remain unknown to the organisation unless the engineer chooses to disclose the issue. Once a maintenance error has been made, years may elapse before it becomes apparent, by which time it may be difficult to establish how it occurred”.

Australian Transport Safety Bureau (ATSB), 2008

There are two aspects of maintenance that are relevant to human factors:

First, maintenance activities expose people to a variety of hazards. Those exposed include the maintenance technicians, other workers and members of the public. There are physical hazards such as noise, vibrations, excessive heat and cold, radiation, gases, fibres, high physical workload and strenuous movements. Maintenance tasks may involve carrying heavy materials, bending, kneeling, reaching, pushing and pulling, and working in restricted places. There are also psychosocial hazards, such as time pressure placed on maintenance crews to complete the tasks as soon as possible. In many cases, maintenance is completed during “downtime”, when equipment (such as an aircraft or a production line) is unavailable, which can lead to considerable pressure on maintenance crews. Maintenance often involves dealing with complex problems in non-routine situations.

Second, regular maintenance that is correctly planned and executed helps to ensure that equipment and the work environment remain in a safe state for the workforce and other people. Human failures during maintenance have led to deaths and serious injuries to members of the public, for example when using facilities such as lifts, public transport systems, vehicles or fairground rides. The consequences of human failures during maintenance can occur weeks or months after the maintenance activity has been completed. In some cases, maintenance failures may reduce the effectiveness of standby or safety equipment. Preventive maintenance is particularly interesting from a risk management perspective, as it often requires normally functioning systems to be disassembled and inspected, which itself introduces a risk of human failure.

In my career, across different industries, I have seen a focus on the safety of people carrying out maintenance, but less consideration of the second aspect – how human performance issues during maintenance can lead to incidents. The quality of maintenance is heavily reliant on the performance of maintenance staff, but we know that humans are not perfect! And so it is this area of “maintenance error” on which we will focus in this article.

Why is maintenance prone to human failure?

Maintenance is often a complex activity and is characterised by variability in tasks, location, working conditions, the nature of equipment, environmental conditions, resources and time constraints. This, in turn, can lead to variability in human performance.

It may involve a wide range of tasks, including the removal and replacement of a large number of components. These tasks require careful vigilance by those involved. Maintenance is often undertaken in difficult working conditions, with poor postures, insufficient lighting and time constraints. Maintenance often requires coordination and communication between several departments within a company, or between different companies (including other subcontractors). Where maintenance teams are subcontracted, aspects such as the working environment, work organisation and time restrictions may be determined by the client.

Ideally, maintenance would be undertaken in a permanent workplace, such as a dedicated workshop containing the necessary tools and equipment. However, maintenance often takes place in the field, where the breakdown has occurred. This increases the likelihood of human performance issues, as the appropriate tools may not be available and workers may improvise. They may also be working under time pressure and in difficult environmental conditions.

There is usually only one way in which something can be taken apart, but there are many possible ways in which it can be reassembled. Therefore, incorrect replacement of parts is not surprising.

Although equipment is increasingly designed around the user, this tends to mean the user during normal operations. Equipment is rarely designed with maintenance activities in mind. For example, in the aviation sector, a great deal of effort has been spent on cockpit design in order to improve human performance, but less effort has gone into designing aircraft for maintainability.

The remainder of this article aims to provide an understanding of why maintenance failures occur, and how the potential for human failures during maintenance can be managed.

The costs of maintenance error

Poor maintenance quality can affect both safety and commercial performance. For example, the tragic accident on the Piper Alpha oil platform in 1988 involved a series of errors during maintenance, resulting in the loss of 167 lives and over $4 billion in costs.

Human failure during maintenance continues to be cited as a causal factor in major accidents worldwide, such as Macondo and Buncefield. Data from several European countries indicates that around 10-15% of all fatal accidents at work are related to maintenance operations.

A tragic example of the consequences of human failures during maintenance was the Clapham Junction railway crash in 1988, in which 35 people lost their lives. An immediate cause of the crash was a number of wiring errors made during maintenance work, which remained undetected when the work was inspected, tested and commissioned back into service. Although these wiring errors led to the crash, the investigation report outlines a series of underlying causes, some of which are discussed in this article.

Department of Transport Investigation into the Clapham Junction railway accident, HMSO 1989 ISBN 0 10 108202 9

Equipment reliability and production can be impacted if maintenance does not meet the desired standard. Premature equipment failures can have significant commercial consequences, even if no accident occurs. Production may be lost. Maintenance may need to be repeated. Additional work may be required to correct subsequent damage to facilities and equipment.

Good maintenance is therefore good business. And this article aims to help you manage human failures during maintenance.

In October 2021, Facebook, Instagram and WhatsApp went offline globally for more than six hours. Facebook (which owns Instagram and WhatsApp) reported that the outage was due to human error during maintenance. Billions of users across the world were affected, and Facebook reportedly lost between $60 million and $100 million in advertising revenue. During the outage, Facebook had to use Twitter, a rival platform, to communicate with its users, causing reputational damage.

“During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally. Our systems are designed to audit commands like these to prevent mistakes like this, but a bug in that audit tool prevented it from properly stopping the command”.

Santosh Janardhan, VP Infrastructure, Meta (Facebook)

What can go wrong?

Maintenance may introduce the ingredients for an incident that would not have otherwise occurred, or failures during maintenance may result in the subsequent failure of key safety functions at a time when they are most needed.

Understanding the types of human failures in maintenance is important because they tend to lead to different consequences – and they need to be managed differently. When we examine maintenance failures that have occurred in the past, the physical actions of maintenance teams can be placed into one of a few categories, for example:

  • People might fail to do something that they should have done (often called “errors of omission”)
  • People can do something they should not have done (“errors of commission”)
  • Something is performed at the wrong time or in the wrong order
  • Something is performed, but without the correct precision or setting.
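To make these four categories concrete, the Python sketch below compares a record of the steps a technician actually performed against the steps in a procedure, and labels each difference as an omission, a commission, a sequence error or a wrong setting. It is a minimal illustration under stated assumptions, not an established tool: the `Step` type, the `classify_deviations` function, the example task and the 5% tolerance on settings are all hypothetical, and step names are assumed to be unique. Like the categories themselves, it describes only what was done, not why.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    """One step of a maintenance task (hypothetical structure)."""
    name: str
    setting: Optional[float] = None  # e.g. a torque value; None if not applicable


def classify_deviations(
    procedure: List[Step], performed: List[Step], tolerance: float = 0.05
) -> List[Tuple[str, str]]:
    """Label differences between the procedure and the performed steps.

    Assumes step names are unique within each list.
    """
    planned = {s.name: s for s in procedure}
    done = {s.name: s for s in performed}
    findings: List[Tuple[str, str]] = []

    # Error of omission: a required step that was not performed.
    findings += [("omission", name) for name in planned if name not in done]
    # Error of commission: a step performed that the procedure does not call for.
    findings += [("commission", name) for name in done if name not in planned]

    # Wrong order: the steps common to both lists were done in a different sequence.
    expected_order = [s.name for s in procedure if s.name in done]
    actual_order = [s.name for s in performed if s.name in planned]
    if expected_order != actual_order:
        findings.append(("wrong order", " -> ".join(actual_order)))

    # Wrong precision or setting: value outside a relative tolerance of the target.
    for name, target in planned.items():
        actual = done.get(name)
        if actual is None or target.setting is None:
            continue
        if actual.setting is None or abs(actual.setting - target.setting) > tolerance * abs(target.setting):
            findings.append(("wrong setting", name))

    return findings


if __name__ == "__main__":
    # Hypothetical task: the bolt torque is the only step with a setting.
    procedure = [
        Step("isolate"),
        Step("remove cover"),
        Step("replace seal"),
        Step("torque bolts", 50.0),
        Step("refit cover"),
    ]
    performed = [
        Step("isolate"),
        Step("remove cover"),
        Step("torque bolts", 40.0),
        Step("replace seal"),
        Step("adjust linkage"),
    ]
    for category, detail in classify_deviations(procedure, performed):
        print(f"{category}: {detail}")
```

Running the example flags the missing “refit cover” step, the unplanned “adjust linkage” step, the seal and bolt-torquing steps done out of order, and a torque setting of 40 where 50 was specified.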

History shows us that reassembly and installation activities are more vulnerable to human failure than tasks that involve taking something apart. Unfortunately, failures during reassembly are also harder to detect than those that occur during disassembly. It is also worth noting that the human failures that cause injury to personnel may be different from the failures that affect the quality of maintenance work.

It’s relatively easy to identify the physical action that led to maintenance being completed incorrectly, such as a technician fitting the wrong part or not using the correct torque setting. However, this tells us very little about why that physical action occurred or what the technician was thinking at the time.

Besides the physical action, we can also identify the psychological aspects of a maintenance failure. For example, we can consider the person’s intentions at the time of their action. There are several ways of categorising these psychological issues, such as the following:

  1. Perception error: People may comment that “I didn’t see it” or “I didn’t notice the difference”
  2. Memory lapse: “I forgot”
  3. Slip: “I didn’t mean to do that”
  4. Wrong assumption: “I assumed that the situation was X”
  5. Technical misunderstanding: “I tried to do it the right way but I didn’t understand what I had to do”
  6. Procedure violation: “Nobody follows that procedure…” or “I know this is not the right way, but it will be okay this once…”.

Focusing only on the physical actions or thinking patterns of a maintenance technician places too much emphasis on the role of the individual. The root causes of maintenance failures, and of anomalies in human performance, lie in wider issues. When we know the underlying causes of these failures, we can develop countermeasures that are tailored to the root causes of the problem. In the next section, we will look further into why human failures occur during maintenance tasks.

Maintenance failure causes $30 million property damage

On 28 July 2005, 4 months after a devastating incident that killed 15 workers and injured 180, the BP Texas City refinery experienced a major fire that caused a reported $30 million in property damage. The incident investigation determined that an 8-inch diameter carbon steel elbow inadvertently installed in a high-pressure, high-temperature hydrogen line ruptured after operating for only 3 months. The escaping hydrogen gas from the ruptured elbow quickly ignited and a huge fireball erupted in the unit.

Error during maintenance leads to refinery fire at BP Texas City, 28 July 2005

This incident occurred because a maintenance contractor accidentally swapped a carbon steel elbow with an alloy steel elbow during a scheduled overhaul in February 2005. Carbon steel and low alloy steel are visually indistinguishable, and the three elbows removed during maintenance had the same dimensions. When the elbows were refitted, the contractor inadvertently swapped carbon steel elbow 1 with alloy steel elbow 3. The alloy steel elbow was resistant to high temperature hydrogen attack (HTHA), but the carbon steel elbow was not.

Although material identification testing might have detected this human failure before installation, the incident could have been prevented at the design stage by avoiding piping configurations that allow critical alloy components to be interchanged with incompatible piping components.
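As an illustration of the detection barrier mentioned above, the sketch below compares a specification of the material required at each position on a line with material identification results recorded after reassembly. Everything here is hypothetical: the `find_material_mismatches` function, the position labels and the results are invented for illustration and are not taken from the BP Texas City investigation. One design choice is deliberate: a position with no test result is reported as a mismatch, so a skipped test is not silently treated as a pass.

```python
from typing import Dict, List, Tuple


def find_material_mismatches(
    required: Dict[str, str], measured: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (position, required material, measured material) for every
    position that does not match its specification.

    A position with no test result is reported as "not tested", so a skipped
    check is flagged rather than silently passed.
    """
    mismatches = []
    for position, spec in required.items():
        found = measured.get(position, "not tested")
        if found != spec:
            mismatches.append((position, spec, found))
    return mismatches


if __name__ == "__main__":
    # Hypothetical line: two positions specified as carbon steel, one as low alloy steel.
    required = {
        "position 1": "carbon steel",
        "position 2": "carbon steel",
        "position 3": "low alloy steel",
    }
    # Hypothetical test results after refitting, with positions 1 and 3 swapped.
    measured = {
        "position 1": "low alloy steel",
        "position 2": "carbon steel",
        "position 3": "carbon steel",
    }
    for position, spec, found in find_material_mismatches(required, measured):
        print(f"{position}: specified {spec}, found {found}")
```

A check like this only helps if the testing is actually carried out and the results are acted on, which is why designing out the opportunity for interchange is the stronger control.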

Why do maintenance failures occur?

The good news is that although maintenance failures continue to occur, they keep on happening in remarkably similar ways. Far from being entirely unpredictable happenings, maintenance mishaps fall mostly into well-defined clusters, shaped largely by situation and task factors. There are certain situations and conditions that can lead people into the same maintenance failures, regardless of who is doing the job.

“These ‘error traps’ clearly imply that we are dealing primarily with error-provoking tasks and error-inducing situations rather than with error-prone people”

Reason and Hobbs, Managing Maintenance Error: A practical guide (2003)

If we are to reduce maintenance failures, understanding the type of failure is helpful – but clearly only part of the story. And knowing a person’s intentions is another part of the jigsaw. However, in order to prevent a failure from happening again (or preferably to prevent it from happening at all), we really need to understand the situations and conditions that could set someone up to fail. I have described Performance Influencing Factors (PIFs) elsewhere on this website. For example, see the page ‘Human factors and Homer Simpson’, where I define human factors as “making it easy for Homer to do the right thing”. Our actions and decisions are ‘nudged’ and shaped, sometimes without us knowing, by these Performance Influencing Factors. These factors are related to the characteristics of People, the nature of the Work that they are doing and the wider Organisation in which they are working.

The essence of human factors is identifying which of these factors are relevant to an activity – and then making them optimal. Of course, this is simple to say but can be challenging in practice. Let’s look at some of the more common Performance Influencing Factors that are relevant to maintenance. I have written elsewhere about several of these factors (see the links in the left column of the table):

Issue: Explanation and relevance to maintenance

Allocation of resources: For maintenance, the resources required will include people, time, tools and equipment, and procedures. Maintenance is vulnerable to being under-resourced, as it is not always seen to contribute directly to production targets and therefore may not receive the priority it deserves. Compared with many other activities, the effects of any shortfalls may not be readily detected. Maintenance staff may alter their work practices (e.g., by taking shortcuts) to overcome resource difficulties, in the genuine belief that such behaviour will benefit the organisation and that it is expected of them.

Communications: Formal communications should be an essential part of maintenance management. Non-routine maintenance activities and those that span a shift changeover give rise to particular communication demands. The existence of formal communication systems, such as permit-to-work, does not guarantee that the right information is communicated to the right people at the right time, or that it is communicated unambiguously and properly understood by the recipient. Weaknesses in communication systems can cause a lack of co-ordination between different departments (e.g., maintenance and operations) or within a single department. Communication channels should also encourage maintenance staff to raise potential concerns with management.

Competence: As maintenance tasks are often varied, the experience and skills of maintenance staff are important factors in ensuring high standards of performance. Problems may arise when staff are asked to carry out tasks which they are not technically competent to undertake. This is a particular problem in maintenance because tasks are often carried out by single individuals with little support from others. In addition to technical competencies, staff also need good interpersonal skills that support teamwork and communication.

Environmental factors: Maintenance is more susceptible to environmental factors because conditions are rarely ideal for maintenance activities. Maintenance tasks are often carried out in the field rather than in dedicated maintenance areas. Consequently, maintenance staff can be exposed to a range of environmental factors which can affect their performance and increase the likelihood of human error and stress. These include high or low ambient temperatures, high humidity and noise levels, and poor ventilation and lighting. Difficulties created by poor working environments can also increase the likelihood that errors are not detected, e.g., by hindering post-maintenance testing. If the environmental conditions are particularly onerous (e.g., very high ambient temperatures), there may be a need to limit the exposure time of individual workers.

Fatigue: Where shift work is a part of maintenance, there are several concerns. One is the potential for errors during the shift-handover process. Failures in communication at crew or shift handovers are a common contributory factor to accidents associated with maintenance tasks carried out by multiple teams. This is particularly the case when safety systems have been over-ridden or there have been deviations from normal working practice, or the new crew/shift have been absent from work for a lengthy period. Shift work often requires staff to work outside of normal waking hours. This can influence their sleep patterns and their performance. There are also the wider social impacts of shift rotas. Shift schedules which fail to take account of human limitations can adversely affect maintenance performance. Both shift periods and patterns need to be considered.

Facility and equipment design: It is important that plant and equipment are designed so that the required maintenance can be carried out reliably and safely. Many of the situations that can cause poor maintenance performance can be eliminated or alleviated by improved design. Common problems include components that are poorly labelled, not easily accessible and which can be fitted incorrectly (e.g., the wrong way round).

An additional problem is that the positioning of plant and equipment may not provide sufficient working space for maintenance activities. Poor design features are often easy to identify but are frequently left uncorrected by designers, contributing to dissatisfaction among maintenance staff.

Procedures and permits: The role of maintenance procedures is to provide sufficient information to allow the user to carry out tasks correctly, while permits and isolation certificates ensure that the appropriate safeguards are in place to allow the task to be carried out safely. Maintenance tasks are generally very varied, and so access to a comprehensive set of maintenance procedures is needed. The level of detail provided must suit the needs of the user and can vary from full procedures to checklists or job aids.

The reasons often quoted for staff not following maintenance procedures and permits are that they are perceived to be inaccurate, out-of-date, impractical, too time consuming, or that they do not describe the ‘best’ way of carrying out the work. To ensure that the documentation is followed correctly, it also needs to be formatted and presented clearly. Mistakes are often made when procedures are not understandable or easy to use. This issue is particularly significant for maintenance because such mistakes are not always easily detected and corrected.

Management of change: Changes particularly affecting maintenance include the introduction of new technology, increased use of multi-skilling, reduced staffing levels, and increased maintenance intervals (which reduce familiarity with tasks). Any changes have to be properly considered and managed to ensure the desired benefits are achieved. Issues such as the competence requirements of retained or new staff and the effects of the proposed changes on their stress levels and morale need to be considered.

Roles and responsibilities: The maintenance programme should have clearly specified roles, responsibilities and accountabilities. Successful implementation of the maintenance policy requires co-operation between different departments (such as production and maintenance), and between various trades (e.g., fitters and electricians). Problems can arise during maintenance if the responsibilities of maintenance staff are unclear or not well understood, especially where maintenance staff have to interface with other groups.

Supervision: Supervisors have an important role in correcting poor working practices while encouraging good ones. This can be difficult for maintenance supervisors, since maintenance is often carried out with little direct supervision. Supervisors may not be provided with the necessary support and training. Maintenance standards are often governed by the lowest standards tolerated by the supervisor. In cost-conscious environments, supervisors may focus on meeting production targets and allow safety and maintenance standards to decline. Where this occurs over a long period, staff and supervisors come to regard these lower standards as acceptable.

Teamwork: Maintenance tasks need to be co-ordinated with other activities, so it is important that maintenance staff work effectively with other groups, such as operations staff. Maintenance teams are often temporary, comprising people from different maintenance disciplines who may not normally work together. Problems can arise in this case if the teams do not form effectively and quickly. On the other hand, permanent teams typically have detailed knowledge of each other’s strengths, weaknesses and capabilities, which can also allow unauthorised working practices to develop. Teams can suffer when an ‘outsider’ joins, perhaps to cover for leave. That outsider may not understand some of the unwritten rules (or ‘team culture’) that the team has developed.

Work design: The workload of individuals needs to be controlled to avoid excessive stress or tiredness which can lead to poor maintenance performance. Although such issues are relevant for all tasks, it is often more difficult to plan the workload of maintenance staff who are often called upon to respond quickly to unexpected equipment breakdowns. Additional problems arise because maintenance tasks are often carried out during unsocial hours (e.g., nights and weekends). Equally, there is a need to avoid under-utilising staff as this induces boredom and a loss of skills, again leading to poor maintenance performance. Poor work design can have an adverse effect on job performance and occupational health, from factors such as excessive mental or physical stress (e.g., unrealistic timescales), excessive boredom (e.g., poor job variety) and lack of motivation (e.g., poor job satisfaction).
Performance Influencing Factors affecting maintenance (adapted from Improving Maintenance, HSE, 2000)

Maintenance error and ‘blame’

“human error should become a warning flag for regulators and managers, a possible symptom that individual workers have been unable to achieve the system goals because of difficult working environments, flaws in policies and procedures, inadequate allocation of resources, or other deficiencies in the architecture of the system”

Dr Assad Kotaite, President of the International Civil Aviation Organization (ICAO) Council, 2001

Even highly experienced and motivated maintenance staff can experience a human failure. And under certain conditions, they may choose not to follow procedures.

The use of the term “human failure” or “maintenance error” should not be taken to mean that we have a problem with people. In many cases, these failures are symptoms of underlying problems. In the case of Clapham Junction, the official investigation notes that, although the wiring errors were made by an individual, the wider issues must also be considered:

“(The maintenance technician) must and does carry a heavy burden of responsibility for the accident and its consequences. As the evidence at the Investigation developed, however, it became abundantly clear that such a responsibility was not his alone. The evidence which the Investigation heard demonstrated that such responsibility must be shared by the many others who had permitted a situation to exist. . . in which not only could such errors be made in the first place, but they could be permitted to remain undetected when the work was inspected, tested and commissioned back into public service”

Department of Transport Investigation into the Clapham Junction railway accident HMSO 1989 ISBN 0 10 108202 9 (page 63).
(I have removed the name of the maintenance technician from this quotation).

Although maintenance personnel must take responsibility for their actions, managing the threat of maintenance failures requires an understanding of organisational factors.

Wider organisational issues that may have a significant impact on maintenance performance include:

  • Inadequate systems to monitor and learn from maintenance failures.
  • Allowing commercial pressures to influence the quality of maintenance work.
  • Not providing suitable equipment to complete maintenance tasks.
  • Organisational structures that inhibit communication between maintenance and other personnel.

Maintenance workers must be encouraged to report near misses and minor events, since they provide valuable learning opportunities and identify emerging trends in performance. This is only likely to occur if the organisational culture supports such openness.

Improving maintenance (counter-measures)

All is not lost. Human failures in maintenance are largely predictable and, whilst they cannot be eliminated entirely, they can be managed. Because of this, regulators will expect this topic to be addressed in a structured and proactive way. This article aims to help you achieve that goal.

Given that human failure is inevitable, focussing on individuals is not sufficient to manage maintenance issues. We cannot simply give people some human factors training and expect dramatic improvements in performance. The most effective interventions will optimise those factors that can influence human performance, such as fatigue, workload, pressure, experience, design issues, and roles and responsibilities. Optimising Performance Influencing Factors such as these will greatly improve human reliability in maintenance activities. Maintenance failures reflect the interplay of people, work and organisational factors.

In order to assess whether your organisation is susceptible to human performance issues during maintenance, an assessment and improvement approach could include:

  1. Identify potential activities or areas of concern by speaking to a range of stakeholders (e.g., maintenance teams and their management), gathering incident data, or conducting risk assessments.
  2. Assess the Performance Influencing Factors: review incident reports to understand underlying causes, and/or gather workforce perceptions using a survey or questionnaire, and/or conduct workplace audits.
  3. Improvement: prioritise areas for improvement and implement changes that will optimise the Performance Influencing Factors, or other issues identified.
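For step 2, where workforce perceptions are gathered with a survey, a simple starting point for prioritisation is to rank each Performance Influencing Factor by its average rating. The Python sketch below assumes a hypothetical questionnaire rated from 1 (poor) to 5 (good) and a hypothetical threshold of 3.0; the `prioritise_pifs` function and the example responses are illustrative only. Survey scores reflect perceptions rather than measured risk, so in practice they would be weighed alongside incident data and workplace audits.

```python
from statistics import fmean
from typing import Dict, List, Tuple


def prioritise_pifs(
    responses: Dict[str, List[int]], threshold: float = 3.0
) -> List[Tuple[str, float]]:
    """Return PIFs whose mean rating falls below the threshold, worst first.

    Ratings are assumed to be on a 1 (poor) to 5 (good) scale; PIFs with
    no responses are skipped rather than treated as a score of zero.
    """
    averages = {pif: fmean(scores) for pif, scores in responses.items() if scores}
    ranked = sorted(averages.items(), key=lambda item: item[1])
    return [(pif, round(avg, 2)) for pif, avg in ranked if avg < threshold]


if __name__ == "__main__":
    # Hypothetical survey responses from four maintenance technicians.
    responses = {
        "Fatigue": [2, 3, 2, 1],
        "Communications": [4, 3, 4, 5],
        "Procedures and permits": [3, 2, 3, 2],
        "Supervision": [4, 4, 3, 4],
    }
    for pif, average in prioritise_pifs(responses):
        print(f"{pif}: mean rating {average}")
```

With the example responses, fatigue (mean 2.0) and procedures and permits (mean 2.5) fall below the threshold and would be examined first.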

The table below outlines some suggested improvements for the more common Performance Influencing Factors that are relevant to maintenance activities:

Issue: Guidance and suggested improvements

Allocation of resources:
– consider all resources (people, tools, equipment, spare parts, time)
– resources are determined in advance and routinely reviewed
– checks are made of the use of these resources
– review opportunities and consequences of short cuts
– strategies in place for when maintenance demands exceed resources
Communications
– use a range of communication methods, understanding the pros and cons of each
– recognise the needs of the information ‘sender’ and the information ‘receiver’
– clear processes for handing over information from one group or shift to another
– allow sufficient time for shift handovers in the shift system
– ensure greater control for the handover of high-risk maintenance activities
– provide a means to ensure up-to-date maintenance status of plant and equipment
– ensure that communications are received by staff who have multiple workplaces
– provide a means to learn from staff concerns and near-misses etc.
Competence
– determine maintenance competency requirements and assess staff against these
– understand that satisfactory completion of maintenance work does not mean that the work was completed safely or to required procedures
– allocate work according to competence and experience
– ensure opportunities for maintenance staff to develop competencies
– training should include dealing with unusual and unforeseen demands
– increase awareness of human performance to encourage reporting of issues
Environmental factors
– avoid environmental extremes where possible
– schedule maintenance at times when noise and high temperatures are less likely
– understand the impacts of high and low temperatures on dexterity and heat stress
– ensure appropriate lighting levels, reducing shadows and glare
– acknowledge that noise levels can be a distraction as well as impact on hearing
– allow for the limitations that PPE places on maintenance workers
Fatigue
– review the need for shift-based maintenance
– minimise routine work undertaken on a shift
– ensure that working arrangements allow staff to get enough sleep of sufficient quality
– avoid critical maintenance activities between 0200 and 0500
– avoid critical maintenance activities after more than 12 hours at work
Facility and equipment design
– ensure that components and lubrication points are easy to reach
– avoid designs that allow components to be incorrectly fitted or assembled
– ensure components are correctly labelled or identified
– ensure that machinery noise levels do not interfere with communications
– create designs that highlight poor maintenance practices, such as a lack of lubrication
– prevent awkward or uncomfortable working postures
– ensure sufficient space to conduct maintenance
– ensure that maintenance requires only standard tools
Procedures and permits
– ensure that maintenance procedures remain relevant, accurate and practical
– the detail in procedures should reflect the safety or quality requirements
– ensure that the text and diagrams are clear
– ensure easy access to relevant procedures and other information
– the format of procedures should be informed by the working environment
Management of change
– have a process for assessing the impact of proposed changes on maintenance
– track the impact of changes on maintenance
– consider how downsizing can impact on loss of experience and data
– assess the use of less experienced personnel for maintenance activities, such as contractors
– assess whether structural changes can lead to a loss of communication channels
– reduce uncertainty felt by staff and increase the control that staff have
Roles and responsibilities
– clear understanding of who has overall responsibility for the maintenance program
– maintenance program should include all activities, including servicing, overhaul, repair, inspection, testing, surveillance etc.
– senior management oversight of maintenance activities that are contracted out (including quality of work and provision of resources)
– clarity on who can undertake which maintenance activities
– clearly defined interfaces between maintenance dept. and other parts of the organisation
– clarity on who performs routine maintenance (such as testing and minor repairs)
– ensure that the role of contractors is clearly understood
Supervision
– encourage supervisors to demonstrate commitment to high standards
– ensure that supervisors rigorously monitor their staff and take consistent action
– ensure that supervisors are aware of desired working practices
– provide supervisors with training in line management and non-technical skills
– embed a process to monitor and review supervisor performance
Teamwork
– ensure knowledge and expertise is shared within the team
– provide training on team behaviours and communications skills
– ensure there is time allocated to maintain team development and cohesiveness
– provide team members with an understanding of the capabilities of other team members
Work design
– manage workload to suit available resources
– design tasks to reflect the physical and mental capabilities of the workforce
– provide staff with a variety of work to maintain competencies and interest
– be open to suggestions from staff on improving work design
– ensure that work planning allows for interactions with other departments
– ensure that work planning reflects the balance of planned and unplanned maintenance
– design work to avoid the need for incomplete work to be handed over to others
– break down tasks into segments that can be completed within a shift
– avoid having a single individual maintain a series of similar items of equipment
Improving maintenance Performance Influencing Factors (adapted from Improving Maintenance, HSE, 2000)
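The two fatigue criteria in the table (avoid critical maintenance between 0200 and 0500, and after more than 12 hours at work) are specific enough to be checked automatically when critical work is being planned. The sketch below is a minimal illustration, not part of the HSE guidance. The function name `fatigue_flags` and its inputs are hypothetical, and it assumes that "after more than 12 hours at work" means any part of the task falls later than 12 hours after the start of the current shift (it ignores breaks and any hours worked before the shift).

```python
from datetime import datetime, timedelta

# Thresholds taken from the fatigue guidance; everything else is an assumption.
NIGHT_START_HOUR = 2   # 02:00
NIGHT_END_HOUR = 5     # 05:00
MAX_HOURS_AT_WORK = 12


def fatigue_flags(shift_start: datetime, task_start: datetime,
                  task_end: datetime) -> list[str]:
    """Hypothetical helper: list the fatigue criteria a critical task breaches."""
    if not (shift_start <= task_start < task_end):
        raise ValueError("expected shift_start <= task_start < task_end")
    flags = []

    # Check each calendar day the task touches for overlap with 02:00-05:00.
    day = task_start.replace(hour=0, minute=0, second=0, microsecond=0)
    while day < task_end:
        window_start = day + timedelta(hours=NIGHT_START_HOUR)
        window_end = day + timedelta(hours=NIGHT_END_HOUR)
        if task_start < window_end and task_end > window_start:
            flags.append("overlaps 0200-0500")
            break
        day += timedelta(days=1)

    # Assumption: flag the task if it runs past 12 hours from the shift start.
    if task_end > shift_start + timedelta(hours=MAX_HOURS_AT_WORK):
        flags.append("runs beyond 12 hours at work")
    return flags
```

For example, a task from 03:00 to 04:00 on a shift that began at 19:00 the previous evening returns `["overlaps 0200-0500"]`. Returning the reasons rather than a simple yes/no lets a planner see which criterion applies and decide whether to reschedule the task or reallocate it.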

Error recovery

We acknowledge that human performance is at times imperfect. Whenever humans are involved in an activity, it is possible that human failures will occur at some point. As maintenance engineers are human, human failures in maintenance are inevitable. A key role of human factors in maintenance is therefore to ensure that, when failures do occur, they are identified before they lead to any safety or production issues.

Another way of categorising failures is to determine whether they are reversible or irreversible. A well-designed system or procedure should mean that failures by maintenance teams are reversible. For example, if an engineer installs a part incorrectly, the error should be easy to spot and correct before the equipment is released back into service.

If maintenance failures do happen, two key defences help to make the system “error tolerant”:

  1. Ensure that all potential human failures can be detected, and
  2. Ensure that the consequences of any undetected errors are contained.

The importance of good design

Designing equipment -
Some simple measures during design can have a significant impact on reducing maintenance failures throughout the life of the facility or equipment.

Designing for maintainability seeks to reduce human failures and improve performance by considering future maintenance and inspection requirements during the design of new equipment. This should include considering the ease with which future maintenance tasks can be performed. For example:

  • Design the equipment so that the right way of performing a task is the only way (i.e., setting staff up to succeed).
  • Make equipment easily accessible (e.g., not requiring ladder access, ensuring parts to be maintained are front-facing).
  • Ensure that components subject to wear or greater probability of replacement can be easily inspected, accessed, removed and replaced.
  • Use standard layout of systems and components to reduce the likelihood of incorrect re-wiring or reassembly.
  • Require only standard tools for maintenance (the need for specialised tools may lead to a lack of maintenance, or to the fabrication of work-arounds).
  • Allow sufficient working space around equipment (for the person and all that they need).
  • Ensure that labelling supports the easy identification of components and systems.
  • Provide interlocks that prevent incorrect or untimely operation of certain functions.
  • Provide adequate task lighting (consider brightness and the prevention of glare or shadows).

Ideally, as discussed above, the design should be error tolerant. Some predictable human failures in maintenance are difficult to prevent, so when they do occur, the design should allow them to be easily detected and recovered without significant consequences.

Inherent safety

In line with the famous words of the late Professor Trevor Kletz (“what you don’t have can’t leak”), it is worth considering that inherently safe design can remove the need for maintenance in the first place.

Maintenance failures are heavily influenced by the design of the task and the design of the equipment being maintained. Equipment that is difficult to maintain, or components that can be incorrectly fitted, will contribute to maintenance failures. Although the management of Performance Influencing Factors can have a significant impact, it is more effective to prevent human failures at an engineering level by, for example, reducing the complexity of the plant (and therefore reducing maintenance requirements).

Further reading

Human factors in maintenance: A question set designed to be used by HSE Inspectors when carrying out inspections or audits.

Human factors guidance for selecting appropriate maintenance strategies for safety in the offshore oil and gas industry, Research Report 213, RR213, Health and Safety Executive, 2004. The aim of this project was to identify ways in which Human Factors ‘best practice’ may be integrated into an offshore maintenance strategy. Issues identified in the literature and in discussions with industry representatives were translated into a question set, aimed at guiding both HSE Inspectors and the industry in ensuring that maintenance strategies address key human factors issues.

Reason, J. and Hobbs, A., Managing Maintenance Error: A practical guide. Ashgate Publishing, 2003. ISBN 978 0 7546 1591 0.

CSB Safety Bulletin: Positive material verification: Prevent errors during alloy steel systems maintenance, No. 2005-04-B. U.S. Chemical Safety and Hazard Investigation Board, 12 October 2006.
