DisasterCast

Summary: Engineers make the news by designing cool things, building great things, or causing spectacular disasters. Apollo 11 is famous for putting astronauts on the Moon - Apollo 13 is famous for putting astronauts in extreme peril. The Curiosity Rover landed on Mars. The Mars Polar Lander crashed into Mars. The Golden Gate Bridge is a spectacular landmark - the Tacoma Narrows Bridge was a spectacular failure. There are place names hardly anyone would know except for the tragic events that happened there: Bhopal, Potters Bar, Chernobyl, Flixborough, Seveso, Fukushima. This is a podcast about how not to be famous.

Podcasts:

 Episode 20 – An Unexpected Risk Assessment | File Type: audio/mpeg | Duration: 39:51

There is a fine line between confidence and stupidity. In the 1970s the London Ambulance Service tried to implement a computer aided despatch system, and failed because they couldn't get the system's users to support the change. In the late 1980s they tried again, but the system couldn't cope with the expected load. Clearly, implementing a system of this sort involved significant managerial and technical challenges. What better way to handle it, then, than to appoint a skeleton management team and saddle them with an impossible delivery timetable? The London Ambulance Service Computer Aided Despatch System and Management Aided Disaster is described in this episode by George Despotou. George also talks about the safety challenges of connected health. Episode 20 transcript is here.

References:
ZDNet News Item about 999 System outage
London Ambulance Service Press Release
Anthony Finkelstein LASCAD page with an academic paper, the full report and case study notes
University of Kent LASCAD case study notes [pdf]
Caldicott Report mentioned in George's Connected Health piece
The Register news article mentioned in George's piece
BBC News article on hacking heart pumps
George's Dependable Systems Blog

 Episode 19 – Star Trek Transporters and Through Life Safety | File Type: audio/mpeg | Duration: 36:21

Have you ever noticed that very few people get hurt during the design of a system? From precarious assemble-at-home microlight aircraft to the world's most awesome super-weapons, the hazards that can actually occur at design time are those of a typical office environment - power sockets, trips, falls and repetitive strain injury. Our safety effort during this time is all predictive. We don't usually call it prediction, but that's what modelling, analysis, and engineering judgement ultimately are. We're trying to anticipate, imagine and control a future world. And even though it's easy to be cynical about the competence and diligence of people in charge of dangerous systems, I really don't think that there are evil masterminds out there authorising systems in the genuine belief that they are NOT safe. At the time a plant is commissioned or a product is released, there is a mountain of argument and evidence supporting the belief of the designers, the testers, the customers and the regulators that the system is safe. Why, then, do accidents happen? That's what this episode is about. We'll look at some of the possible reasons and how to manage them, then discuss an accident, the disaster that befell Alaska Airlines Flight 261. Just in case you've got a flight to catch afterwards, we'll reset our personal risk meters by discussing an alternate way to travel: the transporters and teleportation devices from Star Trek and similar sci-fi experiences. Transcript is available here.

References:
Memory Alpha (Star Trek Wiki) article on Transporters
NTSB Report on the Alaska Airlines 261 Crash

 Episode 18 – Friendly Fire | File Type: audio/mpeg | Duration: 25:23

This episode is about military fratricide accidents, also known as friendly fire, blue-on-blue, and the reason why your allies are sometimes scarier than your enemies. Friendly fire accidents are a prime example of why system safety isn't just an activity for practice and peacetime. When warfighters can't trust their own weapons or their own allies, it puts a serious dent in their operational capability, and that's generally considered a bad thing. There's a reason why Wikipedia has a page dedicated specifically to United States Friendly Fire Incidents with British Victims. It's actually not a long list, but the cultural and strategic impact makes it feel much longer. Blue-on-blue incidents lead to distrust, lack of communication and lack of cooperation. Given that lack of communication and coordination is often cited as a cause of friendly fire, you can probably already picture the cycle of unintentional violence that can spiral from one or two incidents. At a tactical level, friendly fire incidents occur for one of three reasons: 1) misidentifying a friendly unit as a valid target; 2) firing at a location other than intended; or 3) a friendly unit moving into an area where indiscriminate firing is occurring. Since technology is increasingly being used to help identify targets, aim weapons and navigate, it is inevitable that technology will be complicit in a growing number of friendly fire accidents. In some respects the role of technology in these accidents is similar to medical device failures: accidents would occur at a higher rate without the technology, but it isn't a perfect solution. This isn't an excuse not to make the technology better, though. In particular, when friendly fire accidents happen because our electronic devices have unexpected failure modes, that's a sign that better safety analysis has an important role to play. In this episode we are going to look at three friendly fire incidents. Apart from the use of technology and the nationality of the perpetrators, see if you can spot the common thread. The episode transcript is available here.

 Episode 17 – Glenbrook and Waterfall | File Type: audio/mpeg | Duration: 19:47

In 1999, at a place called Glenbrook, just outside of Sydney, Australia, two trains collided, killing seven people. In 2003, at a place called Waterfall, just outside of Sydney, Australia, a train derailed, killing seven people. Same operator, same regulator, same state government, same judge leading the inquiry. Justice Peter Aloysius McInerney was not impressed to find that his first lot of recommendations hadn't been followed. Episode transcript is here.

References:
Special Commission of Inquiry into Glenbrook
Independent Transport Safety Regulator Waterfall Reports

 Episode 16 – Certain Questions | File Type: audio/mpeg | Duration: 28:32

Honesty and humility about uncertainty are an important part of safety. At one end of the spectrum is false certainty about safety, and at the other is dogmatism about particular ways of achieving safety. Both involve overconfidence in designs, methods, and the correctness of the person making the judgement. The main feature of this episode is an interview with senior safety researcher Michael Holloway. The episode also covers the 1971 Iraq Grain Disaster. Episode 16 transcript is here.

References:
Project Syndicate Report on Iraqi Disasters
Science Magazine Article on Iraq Poison Grain Disaster
Bulletin of the World Health Organisation article on the Iraqi Poison Grain Disaster

 Episode 15 – Disowning Fukushima | File Type: audio/mpeg | Duration: 30:53

Sociologist John Downer talks about his recent paper, "Disowning Fukushima: Managing the Credibility of Nuclear Reliability Assessment in the Wake of Disaster". If you're in the business of producing or relying on quantitative risk assessment, what do you do when an event such as Fukushima occurs? Do you say that the event didn't happen? Do you claim that the risk assessment wasn't wrong? Do you say that their risk assessment was wrong, but yours isn't? Maybe you admit that there was a problem, but claim that everything has now been sorted out.

 Episode 14 – Three Mile Island and Normal Accidents | File Type: audio/mpeg | Duration: 34:11

This episode of DisasterCast covers the Three Mile Island nuclear accident, and "Normal Accidents", one possible explanation for why disasters like Three Mile Island occur. Normal Accidents is the brainchild of the sociologist Charles Perrow. If you haven't explicitly heard of him or of Normal Accidents, you've probably still encountered the basic ideas, which often appear in the press when major accidents are discussed. If you read or hear someone saying that we need to think "possibilistically" instead of "probabilistically", it's likely that they've been influenced, at least in part, by Normal Accidents. In particular, there were a number of news articles written after Fukushima which invoked Normal Accidents as an explanation.

Risk assessment is not a science. Whilst we can study risk assessment using scientific methods, just as we can study any human activity, risk assessment itself doesn't make testable predictions. This may seem a bit counterintuitive. Consider nuclear power. We've had lots of nuclear reactors for a long time - isn't that enough to tell us how safe they are? Put simply, no. The probabilities that the reactor safety studies predict are so low that we would need tens of thousands of years of operational evidence to actually test those predictions.

None of this is controversial. Perrow goes a step further, though. He says that the reason we have not had more accidents is simply that nuclear reactors haven't been around long enough to have those accidents. In other words, he goes beyond believing that the risk assessments are unreliable, to claiming that they significantly underestimate the risk. The theory of Normal Accidents is essentially Perrow's explanation of where that extra risk is coming from.

His starting point is not something that we should consider controversial. Blaming the operators for an accident such as Three Mile Island misses the point. Sure, the operators made mistakes, but we need to work out what it was about the system and the environment that caused those mistakes. Blaming the operators for stopping the high-pressure injectors would be like blaming the coolant for flowing out through the open valve. Perrow points to two system features, which he calls "interactive complexity" and "tight coupling", which make it hard for operators to form an accurate mental model of the systems they are operating. The bulk of his book consists of case studies examining how these arise in various ways, and how they contribute to accidents.
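To see roughly why operational experience can't validate very low predicted accident rates, here is a minimal sketch of the underlying arithmetic. It assumes a simple Poisson model with a one-sided 95% confidence target, and uses an illustrative figure of 1e-5 core-damage events per reactor-year and a fleet of 450 reactors; none of these numbers come from the episode or from any particular safety study.

```python
import math

def reactor_years_needed(target_rate, confidence=0.95):
    """Failure-free reactor-years needed to bound the true event rate
    below `target_rate` at the given one-sided confidence, assuming
    events follow a Poisson process.
    P(zero events in T reactor-years) = exp(-rate * T); solve for T."""
    return -math.log(1.0 - confidence) / target_rate

# Illustrative target: 1e-5 core-damage events per reactor-year (hypothetical figure).
T = reactor_years_needed(1e-5)
print(f"~{T:,.0f} failure-free reactor-years needed")                  # ~300,000
print(f"~{T / 450:,.0f} calendar years for a fleet of 450 reactors")   # ~670
```

Even with an entire worldwide fleet accumulating evidence in parallel, centuries of failure-free operation would be needed before the claimed rate could be tested directly, which is the point the episode is making.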

 Episode 13 – Therac-25 and Software Safety | File Type: audio/mpeg | Duration: 34:41

This episode discusses the Therac-25 accidents, and includes an interview with software safety researcher Richard Hawkins. Despite the widespread use of software in critical applications such as aircraft, rail systems, automobiles, weapons and medi...

 Episode 12 – Piper Alpha | File Type: audio/mpeg | Duration: 29:49

Piper Alpha Overview

On the 25th anniversary of the destruction of the Piper Alpha oil platform, everyone is discussing the importance of not forgetting the lessons of Piper Alpha. What are those lessons, though? Hindsight bias can often let us believe that accidents are caused by extreme incompetence or reckless disregard for safety. These simplistic explanations convince us that disaster could never happen to us. After all, we do care about safety. We do try to do the right thing. We have good safety management, don't we? The scary truth is that what we believe about our own organisations, Occidental Petroleum believed about Piper Alpha. As well as the usual description of the accident, this episode separately delves into the design and management of Piper Alpha. In each segment, we extract themes and patterns repeated across multiple systems, multiple procedures, and multiple people.

Design

From a design point of view, there were four major failings on Piper Alpha, all teaching lessons that are still relevant: 1) failure to include protection against unlikely but foreseeable events; 2) an assumption that everything would work, with no backup provision if things didn't work; 3) inadequate independence, particularly with respect to physical co-location of equipment (see the sketch after this section); and 4) a design that didn't support the human activity that the design required to be safe.

Organisation

There are three strong patterns in the management failings of Piper Alpha: 1) a lack of feedback loops, and an assumption that not hearing any bad information meant that things were working; 2) a tendency to seek simple, local explanations for problems, rather than using small events as clues for what was wrong with the system; and 3) an unwillingness to share and discuss information about things that went wrong. Additionally, there were severe problems with the regulator - not a shortage of regulation, but a shortage of good regulation.

Transcript for this episode is here.
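To illustrate the third design failing, why physical co-location undermines nominally redundant equipment, here is a minimal sketch using a simple beta-factor common-cause model. The failure probabilities and the beta value are hypothetical numbers chosen purely for illustration, not figures from the Piper Alpha inquiry.

```python
def redundant_pair_failure_prob(p_single, beta=0.0):
    """Probability that a redundant pair both fail, using a simple
    beta-factor model: a fraction `beta` of each unit's failures are
    common-cause (e.g. both units taken out by the same fire or blast);
    the remainder are treated as independent."""
    p_common = beta * p_single                 # shared-cause failures disable both units
    p_independent = (1 - beta) * p_single      # failures unique to each unit
    return p_common + p_independent ** 2

p = 1e-3  # hypothetical on-demand failure probability of one protection system

print(redundant_pair_failure_prob(p, beta=0.0))   # ~1e-6: the "fully independent" assumption
print(redundant_pair_failure_prob(p, beta=0.1))   # ~1e-4: co-located equipment sharing hazards
```

Two orders of magnitude of claimed protection disappear once even a modest fraction of failures are shared, which is why independence claims need to survive questions about layout, shared utilities and shared environments.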

 Episode 11 – Kegworth and Checklists | File Type: audio/mpeg | Duration: 27:11

This episode examines British Midland Flight BD92 (Kegworth). In the Kegworth accident, the Boeing 737 experienced an engine failure, but the pilots shut down the wrong engine. As usual, it's a bit more complicated than it first sounds. "Tick box" is often used as a criticism of safety, but this may be a bit unfair. Checklists have an important role to play in preventing accidents, and arguably could have made a difference for Kegworth. As they are adopted into other domains, we should consider what we hope to get out of using checklists, and how they can be used wisely. In this episode I also try translating the idea that there are seven basic plots from literature to the world of accidents. I've provided the first four universal accident narratives, but help from listeners is needed to finish the list. The next episode will be about medical devices. I'm looking for someone with expertise in the domain to be interviewed or to contribute a segment, so let me know if that sounds like you.

 Episode 10 – The Value of a Statistical Life | File Type: audio/mpeg | Duration: 33:04

Value of a Statistical Life, the Ford Pinto, and Safety Cases. In this episode I examine the ethics and practicalities of placing a value on human life. I discuss how the value of a human life is determined, how it is used, and how it can be misused. We then delve into probably the most controversial example of a safety versus cost trade-off, the Ford Pinto in the case of Grimshaw v Ford Motor Company, 1978. This episode also features an interview with George Despotou from the University of York. George and I talk about his recent article, "A First Contact with Safety Cases".
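For context on what a value-of-a-statistical-life calculation actually looks like, here is a minimal sketch of the basic expected-cost comparison. All of the numbers are hypothetical placeholders, not the figures from the Pinto case or from the episode; the aim is only to show the structure of the trade-off that the episode discusses and critiques.

```python
def expected_harm_cost(units, p_fatal_per_unit, value_of_statistical_life):
    """Expected societal cost of fatalities across a product fleet."""
    return units * p_fatal_per_unit * value_of_statistical_life

# Hypothetical numbers for illustration only.
units = 10_000_000          # vehicles in the field
p_fatal = 1e-6              # incremental fatality probability per vehicle without the fix
vsl = 10_000_000            # illustrative value of a statistical life, in dollars
fix_cost_per_unit = 15      # cost of the design change per vehicle

harm_avoided = expected_harm_cost(units, p_fatal, vsl)
fix_cost = units * fix_cost_per_unit
print(f"Expected harm avoided by the fix: ${harm_avoided:,.0f}")   # $100,000,000
print(f"Cost of applying the fix:         ${fix_cost:,.0f}")       # $150,000,000
```

The arithmetic itself is trivial; the controversy the episode explores is in how the probability and the value of a life are chosen, and in whether this kind of comparison is an acceptable basis for a safety decision at all.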

 Episode 3 | File Type: audio/mpeg | Duration: 26:13

DisasterCast Episode 3 - Coal mine disasters, risk acceptance, and personal electronic devices on aircraft. When did system safety engineering begin? Why do different risks get regulated differently? Do I really need to switch off all electronic devices? Will they really interfere with navigation?

 Episode 2 | File Type: audio/mpeg | Duration: 22:17

DisasterCast Episode 2 - Blame the Operator: performance shaping factors, BA 5390, and replacing humans. When an accident happens, humans are always at the heart of the story. The first characters we see are the victims - broken bodies, distraught families, dazed survivors. As the narrative grows, we hear about the heroes - the rescue workers running toward the danger, the pilot who performed a miracle, the quick-thinking console operator who stopped things being much worse. But you can't make a good story just with damsels in distress and knights in shining armour. We want to know why disaster struck, and too often, we confuse finding an explanation with finding someone to blame.

 Episode 1 | File Type: audio/mpeg | Duration: 21:42

Hindenburg, Safety jargon, and Rogue Planets. This is the first episode of DisasterCast. DisasterCast is a fortnightly podcast about safety and risk. Each episode features something old (typically a historical accident), something new (an introduction to the language, concepts and techniques of safety), and something out of the blue (a glimpse of disasters in our future). The episode transcript can be found here.
