On March 23rd, 2005, during the startup of an isomerization (ISOM) unit, the BP Texas City Refinery, Texas, experienced an industrial disaster. A massive explosion and fires killed 15 people and injured another 180. A key factor was the siting of temporary occupied trailers close to a process unit handling highly hazardous materials (the CSB noted that all fatalities occurred in or around trailers).
The ISOM unit, part of the refining process, was being restarted after a month-long maintenance operation. Startup of such processes is recognised in the industry as a significantly more hazardous period, when incidents are much more likely. Other refineries have experienced major accidents during startup (including the 1998 Equilon Refinery accident in Anacortes, WA, with six fatalities; the Texaco Milford Haven explosion and fire in the UK in 1994; and the 2000 BP Grangemouth refinery fire).
The incident was investigated by the US Chemical Safety & Hazard Investigation Board (CSB), which reported in March 2007. In August 2005, the CSB recommended that an independent panel conduct a review of BP’s safety culture, safety management systems, and safety oversight at all its U.S. refineries, which led to the Baker Panel Report (January 2007).
The Baker Panel Report concluded that significant process safety issues existed at all five of BP’s U.S. refineries – the issues were not unique to the Texas City refinery.
“The Texas City disaster was caused by organizational and safety deficiencies at all levels of the BP Corporation. Warning signs of a possible disaster were present for several years, but company officials did not intervene effectively to prevent it” (CSB, 2007, page 18).
‘Anatomy of a Disaster’
The CSB produced an excellent video called ‘Anatomy of a Disaster’ which describes the incident and the main contributory factors. The video runs for around 55 minutes. The human factors section starts at 35 minutes and is a useful introduction to key aspects of human factors.
Key human factors issues
Whilst not a comprehensive summary, I have outlined some of the key human factors issues documented in the CSB report, using the topics on humanfactors101.com as a structure. Several of the failures could be reported under a number of human factors headings. This highlights that although the list of key human factors topics is a useful structure, there are many overlaps and interactions between the topics. This is the essence of human factors – the subject is a complex interaction of many factors, which requires a consideration of the three aspects outlined in the introduction (People, Work and Organisation).
In documenting these human factors issues, the CSB investigation is clear that people were ‘set up to fail’ by system deficiencies and it would be inappropriate to focus on individual human errors. For example, it states that:
“Although actions or errors by operations personnel at the BP Texas City site, as described in the preceding section, were immediate causes of the March 23 accident, numerous latent conditions and safety system deficiencies at the refinery influenced their actions and contributed to the accident” (CSB, 2007, p.69).
“The broader aspects of this investigation revealed serious management safety system deficiencies that allowed the operators and supervisors to fail” (CSB, 2007, p.69).
“Numerous underlying latent conditions collectively influenced the decisions and actions of the operations personnel at the AU2/ISOM/NDU complex. These safety system deficiencies created a workplace ripe for human error to occur” (CSB, 2007, p.99).
There was inadequate operator training for abnormal and startup conditions, and although overfilling incidents are well-documented, the hazards of overfilling distillation towers were not well understood by Texas City management and operations personnel. Nor were there effective methods to verify operator competence. The refinery training budget was reduced over the years from 1998, and during the same period Learning and Development personnel were reduced from 28 to eight.
“Inadequate training for operations personnel, particularly for the board operator position, contributed to causing the incident. The hazards of unit startup, including tower overfill scenarios, were not adequately covered in operator training” (CSB, 2007, p.91).
“The hazards of unit startup were inadequately covered in operator training and did not prepare the Board Operator for the tasks he was responsible for on the day of the incident. This insufficient training was compounded by the lack of annual performance appraisals, individual skill development plans, and abnormal situation management simulator training. BP provided only basic general training to its operators” (CSB, 2007, p.294).
A 2001 internal audit identified that several operating procedures were out of date and did not accurately reflect actual working practices on particular units. Procedures did not include experience from previous startups.
“The ISOM raffinate section startup procedure lacked sufficient instructions for the Board Operator to safely and successfully start up the unit” (CSB, 2007, p.75).
This created a culture where procedural deviations were common and routine, and where procedures were seen as guidance, rather than instructions. The CSB noted that in the previous five years, most of the 19 startups involved deviations from written procedures, and these deviations were made without any management of change assessment. Operators relied on knowledge of previous startups and developed informal work practices, partly to avoid delays in startup:
“These deviations were not unique actions committed by an incompetent crew, but were actions operators, as a result of established work practices, frequently took to protect unit equipment and complete the startup in a timely and efficient manner” (CSB, 2007).
In some cases, procedures were not available at all. For example, there were no procedures for the calibration, inspection, testing, maintenance, or repair of the five instruments the CSB considered to be contributory causes in the incident.
“The BP Texas City tragedy is an accident with organizational causes embedded in the refinery’s culture” (CSB, 2007, p.175).
“The disaster at Texas City had organizational causes, which extended beyond the ISOM unit, embedded in the BP refinery’s history and culture” (CSB, 2007, p.139).
In the 30 years prior to the incident, the BP Texas City refinery had 23 fatalities, one of the worst rates of any US workplace in recent history. On average, one worker had died every 16 months. In 2004 alone, three major accidents resulted in three fatalities. The Texas City Refinery’s HSSE Business Plan warned in 2005 that the refinery would be likely to “kill someone in the next 12-18 months”.
The safety culture onsite influenced staff behaviours. For example, it created a work environment that encouraged operations personnel to deviate from procedures. It also enabled an acceptance of faulty equipment: a malfunctioning control valve was reported to a supervisor, who subsequently signed startup documents stating that all control valves had been tested and were operational prior to startup, despite the issue not being addressed. Critical alarms were not checked before startup, yet the supervisor initialed the startup procedure to confirm that those checks had been completed.
“A workplace environment characterized by poor motivation, unclear expectations around supervisory /management behaviors, no clear system of reward and consequences, and high distrust between leadership and the workforce, had developed over a number years within the site” (Mogford Report, 2005, p.153).
Investigations and learning
Incidents (including two previous startup incidents) were often ineffectively investigated and appropriate corrective actions were not taken. The CSB reported that incident investigations too often focused on ‘operator error’ as the root cause (and this may explain why staff were reluctant to report issues or near-misses). Management had failed to create an effective reporting and learning culture.
“BP had not implemented an effective incident investigation management system to capture appropriate lessons learned and implement needed changes. Such a system ensures that incidents are recorded in a centralized record keeping system and are available for other safety management system activities such as incident trending and process hazard analysis” (CSB, 2007, p.100).
Many of the process safety system deficiencies that led to the incident had been previously identified in BP audits:
“Many of the safety problems that led to the March 23, 2005, disaster were recurring problems that had been previously identified in audits and investigations” (CSB, 2007, p.138).
The company failed to learn from previous major events, such as the explosion and fires at the BP UK Grangemouth refinery in 2000.
Various communications issues contributed to the genesis of this disaster. At the morning shift directors’ meeting, the raffinate startup was discussed, and it was concluded that this section would not be started up, but this key information was not communicated to the ISOM operations personnel. Shift handovers were rushed and vague, or didn’t happen at all (for example, the ISOM-experienced Day Supervisor arrived for his shift over an hour late and did not conduct a handover with the night shift).
An entry in the centralized control room logbook: “ISOM: Brought in some raff to unit, to pack raff with” was interpreted by the Day Board Operator to mean that only the tower was filled, whereas in fact the heat exchangers, piping and associated equipment had also been filled during the previous shift.
“BP had no policy for effective shift communication, nor did it enforce formal shift turnover or require logbook/procedural records to ensure communication was clearly and appropriately disseminated among operating crews” (CSB, 2007, p.77).
The CSB reported ineffective supervisory oversight and technical assistance during unit startup. The shift started with two supervisors: one with 20 years’ ISOM experience, the second with none. The Day Supervisor, an experienced ISOM operator, left the plant mid-morning due to a family emergency. After he left, the raffinate startup operators lacked experienced supervision, even though BP’s safety procedures required such oversight.
“There was little investment in supervisory/management training, and an absence of role models within supervision, and, as a result, supervisory /management behaviors were inadequate. There were no clearly documented expectations for supervisors’ roles, including those stepping up to an acting supervisory role” (Mogford Report, 2005, p.153).
ISOM operations personnel experienced a ‘flood’ of alarms (hundreds of alarms registering in a short period) and weren’t able to assess the situation or warn others prior to the explosion.
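As a rough illustration of how an alarm flood might be detected from an alarm log, the sketch below counts alarms in a sliding time window. The function name `flood_alarms` and the threshold of more than 10 alarms in 10 minutes are assumptions made for this example (the figure is a commonly cited alarm-management benchmark, not something taken from the CSB report), and the timestamps are invented.

```python
from collections import deque


def flood_alarms(alarm_times, window=600.0, threshold=10):
    """Return the timestamps (in seconds) at which the number of alarms in the
    trailing `window` seconds exceeds `threshold`.

    Both defaults are assumptions for illustration: more than 10 alarms
    within 10 minutes is treated as a flood.
    """
    recent = deque()
    flagged = []
    for t in sorted(alarm_times):
        recent.append(t)
        # Discard alarms that have dropped out of the trailing window.
        while recent[0] <= t - window:
            recent.popleft()
        if len(recent) > threshold:
            flagged.append(t)
    return flagged


# Invented timestamps: one alarm every 5 seconds versus one every 2 minutes.
burst = [i * 5.0 for i in range(20)]
quiet = [i * 120.0 for i in range(20)]

print(flood_alarms(burst))  # flagged from the 11th alarm onwards
print(flood_alarms(quiet))  # nothing flagged
```

A count like this only shows that operators were overloaded. It says nothing about which alarms mattered, which is why alarm prioritisation and rationalisation are treated as design issues rather than operator issues.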
Not only was there malfunctioning and incorrectly calibrated instrumentation during the startup, but a poorly-designed interface also made it difficult to determine that the tower was overfilling. Panel operators didn’t have visibility of actual process conditions.
Different control screens showed how much liquid raffinate was entering the unit and how much raffinate product was leaving the unit, making it less clear that there was an imbalance between the two readings.
There were insufficient numbers of operators given the high workload during the unit startup, despite previous internal recommendations that any startup required two panel operators.
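The imbalance described above is simple arithmetic once both readings are brought together: whatever enters and does not leave is accumulating inside the unit. The sketch below is a minimal illustration of that material balance, not a description of the actual control system. The class and function names, the units, the limit, and all flow values are hypothetical and are not data from the incident.

```python
from dataclasses import dataclass


@dataclass
class FlowSample:
    minutes: float      # time since feed was introduced
    feed_in: float      # liquid entering the unit (volume units per minute)
    product_out: float  # liquid leaving the unit (same units)


def net_accumulation(samples):
    """Integrate (feed_in - product_out) over time with the trapezoidal rule.

    Returns a list of (minutes, accumulated_volume) pairs, one per interval.
    """
    total = 0.0
    history = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.minutes - prev.minutes
        net_prev = prev.feed_in - prev.product_out
        net_cur = cur.feed_in - cur.product_out
        total += 0.5 * (net_prev + net_cur) * dt
        history.append((cur.minutes, total))
    return history


def first_exceedance(history, limit):
    """Return the first time at which accumulation exceeds `limit`, else None."""
    for minutes, total in history:
        if total > limit:
            return minutes
    return None


# Hypothetical readings: feed runs steadily while almost nothing leaves.
samples = [
    FlowSample(0, 10.0, 0.0),
    FlowSample(30, 10.0, 0.0),
    FlowSample(60, 10.0, 0.0),
    FlowSample(90, 10.0, 2.0),
]

history = net_accumulation(samples)
print(history)                         # [(30, 300.0), (60, 600.0), (90, 870.0)]
print(first_exceedance(history, 500))  # 60
```

Showing a single running total like this on one screen, rather than two flows on separate screens, is one way an interface can make a growing imbalance obvious to the panel operator.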
A Texas City document from 2004 discusses staffing cutbacks:
“In the face of increasing expectations and costly regulations, we are choosing to rely wherever possible on more people dependant and operational controls rather than preferentially opting for new hardware. This strategy [will place] greater demands on work processes and staff to operate within the shrinking margin for error” (CSB, 2007, p.86).
An internal study in 2002 warned of high levels of overtime resulting from reduced staffing levels. The Steelworkers Union also expressed concerns (2000) relating to staffing levels:
“Through the Joint Health and Safety Committee, PACE Union 4-449 is notifying the company, BP, of its concern on the issue of the complement of operators relative to providing adequate staffing levels to assure safe and environmentally sound operations at the Texas City Refinery site. Issues include operator staffing levels below the numbers required for ‘safe off staffing’. This involves the day to day operation of units with less than the minimum numbers of operators required. The situations worsen when staffing of extra board decreases to the extent of operators working excessive amounts of overtime, which adds worker fatigue into potential job performance problems” (CSB, 2007, p.285).
The staffing issues reported above are related to fatigue. BP didn’t have a fatigue-prevention policy for refinery personnel.
On the day of the incident, several key staff (Day Board Operator, Night Lead Operator, Day Lead Operator) had worked between 29 and 37 consecutive 12-hour shifts. Some of these personnel rarely had breaks, and ate meals at the control panel. The CSB analysis is that operator fatigue likely contributed to the incident by impairing operator performance, degrading judgment and causing cognitive fixation:
“the CSB concludes that fatigue of the operations personnel contributed to overfilling the tower” (CSB, 2007, p.289).
The Baker Panel Report (2007) concluded that overtime rates were excessive, would likely compromise safety, and were symptomatic of understaffing.
In the years prior to the 2005 incident, the company underwent many corporate, leadership and organisational changes. Major changes to the organisation, such as staff reductions, changes to the management structure, policy changes and budget reductions, were generally not assessed for their impact on the management of process safety. Many of these changes led to a reduced emphasis on process safety. An external audit of the refinery stated that the auditors hadn’t previously seen so many changes over such a short period.
“The Texas City site had an overly complex and changing organization which was not conducive to good communication and clear accountability” (Mogford Report, 2005, p.153).
Although mistakes were made by front-line staff (such as a failure to follow procedures), the CSB investigation recognises that there are wider issues to be considered:
“Simply targeting the mistakes of BP’s operators and supervisors misses the underlying and significant cultural, human factors, and organizational causes of the disaster that have a greater preventative impact” (CSB, 2007, p.19).
In this case, organisational failures included: cost-cutting, production pressures, an inadequate process for shift handovers, an inadequate operator training programme, outdated procedures, recurring operational problems during startups, a failure to address reported equipment malfunctions, a failure to investigate previous major events, and a failure to assess organisational changes.
Learning the lessons
Following the BP Texas City incident, in my role at the UK HSE, I was responsible for ensuring that the above lessons were communicated to all UK refineries. I produced several documents to communicate key lessons, including this one-page guide containing 9 key questions from the CSB and Baker Panel investigation reports; and visited all refineries to check on their progress against the main recommendations.
I also produced a series of Discussion Questions that can be used to ensure that the key lessons from BP Texas City are learned by your organisation.
Ten years on . . .
The CSB released the following video on the tenth anniversary of the Texas City disaster.
Three key reports are in the public domain:
The CSB Final report, Report No. 2005-04-I-TX, U.S. Chemical Safety and Hazard Investigation Board (CSB), March 2007. The key issues described in this report include safety culture, regulatory oversight, process safety metrics and human factors. It is an excellent example of how to investigate the technical, human and wider organisational issues of a major incident. It also contains significant commentary throughout as to what constitutes good practice in these areas, so I’d recommend this report as good value for those wanting to learn about human and organisational factors generally, not just the specifics of this incident.
The Baker Panel Report: “The report of the BP U.S. refineries independent safety review panel” (2007) – arising from an early recommendation of the CSB. This report majors on safety culture, leadership, safety management systems, and process safety management.
The Panel’s charter was to make ‘a thorough, independent, and credible assessment of the effectiveness of BP’s corporate oversight of safety management systems at its five U.S. refineries and its corporate safety culture’. The Panel stated that the deficiencies identified likely apply more widely and so this work is recommended for all major-hazard industries, as well as other complex organisations.
“People can forget to be afraid” (Baker Panel, 2007, p.i).
The Mogford Report: “Fatal accident investigation report: Isomerization Unit Explosion Final Report” (December 2005). This was BP’s internal investigation, led by J Mogford. It covers procedures, safety culture, organisational change, supervision, management and leadership, competence, communications and audit.