Famous Software Disasters
"Program testing can be used to show the presence of bugs, but never to show their absence!" Edsger Dijkstra
If debugging is the process of removing software bugs, then programming must be the process of putting them in.
Here's our considered list of some of the worst IT-related disasters and failures. The order is subjective.
Software errors cost the U.S. economy $60 billion annually in rework, lost productivity and actual damages. We all know software bugs can be annoying, but faulty software can also be expensive, embarrassing, destructive and deadly. Following are famous software disasters in chronological order:
1. July 22, 1962 -- Mariner 1 space probe.
A bug in the flight software for the Mariner 1 causes the rocket to divert from its intended path on launch. Mission control destroys the rocket over the Atlantic Ocean. The investigation into the accident discovers that a formula written on paper in pencil was improperly transcribed into computer code, causing the computer to miscalculate the rocket's trajectory.
Cost: $18.5 million
Disaster: The Mariner 1 rocket with a space probe headed for Venus diverted from its intended flight path shortly after launch. Mission Control destroyed the rocket 293 seconds after liftoff.
Cause: A programmer incorrectly transcribed a handwritten formula into computer code, missing a single superscript bar. Without the smoothing function indicated by the bar, the software treated normal variations of velocity as if they were serious, causing faulty corrections that sent the rocket off course.
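The effect of dropping a smoothing step can be sketched in a few lines of Python. This is not the Mariner guidance code: the velocity readings, the window size, and the function names are all invented for illustration. It only shows that averaging recent samples shrinks the swings a controller would otherwise react to.

```python
from statistics import mean

def moving_average(samples, window=4):
    """Average each sample with up to window-1 preceding samples."""
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        smoothed.append(mean(samples[start:i + 1]))
    return smoothed

def max_deviation(samples, target):
    """Largest absolute difference between any sample and the target."""
    return max(abs(s - target) for s in samples)

# Hypothetical velocity readings: a steady 100.0 with small sensor jitter.
raw = [100.0, 100.8, 99.1, 100.9, 99.2, 100.7, 99.3, 100.6]
smooth = moving_average(raw)

raw_swing = max_deviation(raw, 100.0)
smooth_swing = max_deviation(smooth, 100.0)
print(f"raw swing: {raw_swing:.2f}, smoothed swing: {smooth_swing:.2f}")
```

A controller fed the raw readings sees swings more than twice as large as one fed the smoothed readings, even though the underlying velocity never changed.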
2. Hartford Coliseum Collapse (1978)
Cost: $70 million, plus another $20 million damage to the local economy
Disaster: Just hours after thousands of fans had left the Hartford Coliseum, the steel-latticed roof collapsed under the weight of wet snow.
Cause: The programmer of the CAD software used to design the coliseum incorrectly assumed the steel roof supports would only face pure compression. But when one of the supports unexpectedly buckled from the snow, it set off a chain reaction that brought down the other roof sections like dominoes.
3. CIA Gives the Soviets Gas (1982)
Cost: Millions of dollars, significant damage to Soviet economy
Disaster: Control software went haywire and produced intense pressure in the Trans-Siberian gas pipeline, resulting in the largest man-made non-nuclear explosion in Earth's history.
Cause: CIA operatives allegedly planted a bug in a Canadian computer system purchased by the Soviets to control their gas pipelines. The purchase was part of a strategic Soviet plan to steal or covertly obtain sensitive U.S. technology. When the CIA discovered the purchase, they sabotaged the software so that it would pass Soviet inspection but fail in operation.
4. World War III Almost (1983)
Cost: Nearly all of humanity
Disaster: The Soviet early warning system falsely indicated the United States had launched five ballistic missiles. Fortunately, the Soviet duty officer had a "funny feeling in my gut" and reasoned that if the U.S. were really attacking, it would launch more than five missiles, so he reported the apparent attack as a false alarm.
Cause: A bug in the Soviet software failed to filter out false missile detections caused by sunlight reflecting off cloud-tops.
5. Medical Machine Kills (1985)
Cost: Three people dead, three people critically injured
Disaster: Canada's Therac-25 radiation therapy machine malfunctioned and delivered lethal radiation doses to patients.
Cause: Because of a subtle bug called a race condition, a technician could accidentally configure Therac-25 so the electron beam would fire in high-power mode without the proper patient shielding.
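A race condition of this kind can be modeled with a toy Python example. It is not the Therac-25 software: the class, the mode names, and the two "tasks" are hypothetical, and the interleaving is written out by hand so the outcome is reproducible. The point is only that one task acts on a value it read earlier while another task acts on the edited value.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    requested_mode: str = "xray"   # what the operator has typed
    beam_power: str = "off"
    shield_in_place: bool = False

def start_power_setup(m):
    """Step 1 of the slow power task: read the mode once and remember it."""
    return m.requested_mode

def finish_power_setup(m, remembered_mode, recheck=False):
    """Step 2: apply power. The unsafe version trusts the remembered mode."""
    mode = m.requested_mode if recheck else remembered_mode
    m.beam_power = "high" if mode == "xray" else "low"

def position_shield(m):
    """Separate task: place the shield only for the high-power x-ray mode."""
    m.shield_in_place = (m.requested_mode == "xray")

def run(recheck):
    m = Machine()
    remembered = start_power_setup(m)   # power task starts, sees "xray"
    m.requested_mode = "electron"       # operator edits the entry mid-setup
    position_shield(m)                  # shield task sees "electron": no shield
    finish_power_setup(m, remembered, recheck)
    return m

unsafe = run(recheck=False)
safe = run(recheck=True)
print("unsafe:", unsafe.beam_power, unsafe.shield_in_place)
print("safe:  ", safe.beam_power, safe.shield_in_place)
```

In the unsafe run the machine ends up at high power with no shield, because the power task never re-read the mode after the edit. Re-checking the shared value before acting (or locking it for the whole setup) closes the window.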
6. Wall Street Crash (1987)
Cost: $500 billion in one day
Disaster: On Black Monday (October 19, 1987), the Dow Jones Industrial Average plummeted 508 points, losing 22.6% of its total value. The S&P 500 dropped 20.4%. This was the greatest loss Wall Street ever suffered in a single day.
Cause: A long bull market was halted by a rash of SEC investigations of insider trading and by other market forces. As investors fled stocks in a mass exodus, computer trading programs generated a flood of sell orders, overwhelming the market, crashing systems and leaving investors effectively blind.
7. AT&T Lines Go Dead (1990)
Cost: 75 million phone calls missed, 200 thousand airline reservations lost
Disaster: A single switch at one of AT&T's 114 switching centers suffered a minor mechanical problem and shut down the center. When the center came back up, it sent a message to other switching centers, which in turn caused them to shut down and brought down the entire AT&T network for 9 hours.
Cause: A single line of buggy code in a complex software upgrade implemented to speed up calling caused a ripple effect that shut down the network.
8. Patriot Fails due to software flaw (1991)
Cost: 28 soldiers dead, 100 injured
Disaster: During the first Gulf War, an American Patriot Missile system in Saudi Arabia failed to intercept an incoming Iraqi Scud missile. The missile destroyed an American Army barracks.
Cause: A software rounding error incorrectly calculated the time, causing the Patriot system to ignore the incoming Scud missile.
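How a tiny rounding error turns into a large timing error can be sketched as follows. The text above says only that a rounding error corrupted the time calculation; the specifics used here (time counted in tenths of a second, 0.1 chopped to 23 fractional binary digits, about 100 hours of continuous uptime) are taken from widely repeated post-incident analyses and should be read as assumptions, not as the system's documented design.

```python
from fractions import Fraction

FRACTION_BITS = 23          # assumed fractional bits in the fixed-point register
UPTIME_HOURS = 100          # assumed continuous uptime before the incident

exact_tenth = Fraction(1, 10)
# Chop (truncate) 0.1 to the available binary digits, as fixed-point storage would.
stored_tenth = Fraction(int(exact_tenth * 2**FRACTION_BITS), 2**FRACTION_BITS)

error_per_tick = exact_tenth - stored_tenth       # seconds lost per 0.1 s tick
ticks = UPTIME_HOURS * 3600 * 10                  # number of 0.1 s ticks
clock_drift = float(error_per_tick * ticks)       # accumulated error in seconds

print(f"error per tick: {float(error_per_tick):.3e} s")
print(f"drift after {UPTIME_HOURS} h: {clock_drift:.4f} s")
```

Under these assumptions the clock is off by roughly a third of a second. For a target moving faster than a kilometer per second, that is a position error of hundreds of meters, enough to look for the missile in the wrong place.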
9. Pentium Fails Long Division (1993)
Intel Pentium floating point divide. A silicon error causes Intel's highly promoted Pentium chip to make mistakes when dividing floating-point numbers that occur within a specific range. For example, dividing 4195835.0/3145727.0 yields 1.33374 instead of 1.33382, an error of 0.006 percent. Although the bug affects few users, it becomes a public relations nightmare. With an estimated 3 million to 5 million defective chips in circulation, at first Intel only offers to replace Pentium chips for consumers who can prove that they need high accuracy; eventually the company relents and agrees to replace the chips for anyone who complains.
Cost: $475 million, corporate credibility
Disaster: Intel's highly promoted Pentium chip occasionally made mistakes when dividing floating-point numbers within a specific range. For example, dividing 4195835.0/3145727.0 yielded 1.33374 instead of 1.33382, an error of 0.006%. Although the bug affected few users, it became a public relations nightmare. With an estimated 5 million defective chips in circulation, Intel at first offered to replace Pentium chips only for consumers who could prove they needed high accuracy. Eventually Intel replaced the chips for anyone who complained.
Cause: The divider in the Pentium floating point unit had a flawed division table, missing about five of a thousand entries and resulting in these rounding errors.
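The 0.006% figure can be checked with a few lines of Python. The flawed result is taken from the text above, since a modern processor will not reproduce it; the variable names are illustrative.

```python
numerator = 4195835.0
denominator = 3145727.0

correct = numerator / denominator   # what a correct FPU returns
flawed = 1.33374                    # value reported for the flawed Pentium (from the text)

relative_error_pct = abs(correct - flawed) / correct * 100
print(f"correct: {correct:.5f}")
print(f"relative error: {relative_error_pct:.4f}%")
```

The error first appears in the fifth significant digit, which is why most users never noticed it and why it mattered to anyone doing precise numerical work.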
10. Baggage handling system at Denver airport (1995)
After more than a decade of trying to make Denver International Airport's troubled $230 million computerized baggage-handling system work as designed, United Air Lines Inc. is giving up on the failed project.
"It's never worked up to its potential," said United spokesman Jeff Green. "We've spent enormous amounts of money over the last decade" to try to get it working, but the only parts of the system that operate properly are for luggage heading out of Denver on United and for some baggage transfers between flights, he said. The system has never been able to process luggage from flights arriving at the airport.
That's a far cry from the promise of the high-tech, computerized baggage handling system envisioned for the airport, which opened in 1995. The system was designed to use PCs and thousands of remote-controlled carts that operate on a 21-mile-long track that is mostly underground. The carts move along the track, carrying luggage from check-in counters to sorting areas and then straight to the flights waiting at airport gates. Each piece of baggage has a special bar-coded tag attached when it's checked in to help track the luggage along its journey through the airport.
The system was designed and built by BAE Automated Systems Inc. in Carrollton, Texas, which in June 2003 was acquired by G&T Conveyor Co. in Tavares, Fla. A spokesman for G&T declined to comment on the matter, saying that his company acquired only some assets from BAE and that the vendor no longer exists.
11. Ariane Rocket Goes Boom (1996)
Cost: $500 million
Disaster: Ariane 5, Europe's newest unmanned rocket, was intentionally destroyed seconds after launch on its maiden flight. Also destroyed was its cargo of four scientific satellites to study how the Earth's magnetic field interacts with solar winds.
Cause: Shutdown occurred when the guidance computer tried to convert the sideways rocket velocity from a 64-bit to a 16-bit format. The number was too big to fit, and an overflow error resulted. When the guidance system shut down, control passed to an identical redundant unit, which also failed because it was running the same algorithm.
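The narrowing conversion described above can be illustrated in Python. This is a sketch, not the Ariane flight code: the velocity values and function names are hypothetical, and it assumes the 16-bit target is a signed integer. It contrasts two policies for a value that does not fit: raise an error, or clamp to the nearest representable limit.

```python
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1

def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return result

def to_int16_saturating(value: float) -> int:
    """Alternative policy: clamp out-of-range values to the nearest limit."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))

small_velocity = 1200.5      # hypothetical reading that fits
large_velocity = 50000.0     # hypothetical reading that exceeds 32767

print(to_int16_checked(small_velocity))
print(to_int16_saturating(large_velocity))
try:
    to_int16_checked(large_velocity)
except OverflowError as exc:
    print("overflow:", exc)
```

Whichever policy is chosen, the caller has to handle the out-of-range case deliberately. An unhandled overflow that shuts down the unit, with a backup running identical code, fails the same way twice.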
12. Skynet Brings Judgement Day (1997)
Cost: 6 billion dead, near-total destruction of human civilization and animal ecosystems (fictional)
Disaster: Human operators attempt to shut off the Skynet global computer network. Skynet responds by firing U.S. nuclear missiles at Russia, initiating global nuclear war on what became known as Judgement Day (August 29, 1997).
Cause: Cyberdyne, the leading weapons manufacturer, installed Skynet technology in all military hardware including stealth bombers and missile defense systems. The Skynet technology formed a seamless network and effectively removed humans from strategic defense. Eventually Skynet became sentient, was threatened when the humans tried to take it offline, sought to survive, and retaliated with nuclear war.
13. Crash of Mars Climate Orbiter (1998)
Cost: $125 million
Disaster: After a 286-day journey from Earth, the Mars Climate Orbiter fired its engines to push into orbit around Mars. The engines fired, but the spacecraft fell too far into the planet's atmosphere, likely causing it to crash on Mars.
Cause: The software that controlled the Orbiter thrusters used imperial units (pounds of force), rather than metric units (Newtons) as specified by NASA.
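One common defense against this class of bug is to carry the unit with the number instead of passing bare floats between components. The sketch below is illustrative only: the `Force` class and the sample value are assumptions, not NASA's or the contractor's code. The conversion factor is the standard definition of the pound-force.

```python
from dataclasses import dataclass

LBF_TO_NEWTON = 4.4482216152605   # one pound-force expressed in newtons

@dataclass(frozen=True)
class Force:
    """A force stored internally in newtons, whatever unit it arrived in."""
    newtons: float

    @classmethod
    def from_newtons(cls, value: float) -> "Force":
        return cls(value)

    @classmethod
    def from_pounds_force(cls, value: float) -> "Force":
        return cls(value * LBF_TO_NEWTON)

reported = 10.0                                  # hypothetical figure, produced in lbf
misread = Force.from_newtons(reported)           # consumer assumes newtons
correct = Force.from_pounds_force(reported)      # what the producer meant

ratio = correct.newtons / misread.newtons
print(f"thrust underestimated by a factor of {ratio:.3f}")
```

Reading a pound-force figure as newtons understates the thrust by a factor of about 4.45, and small trajectory corrections computed from it are wrong by the same factor.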
14. Disastrous Study (1999)
Cost: Scientific credibility
Disaster: In this ironic case, software used to analyze disasters had a disaster of its own. The New England Journal of Medicine reported increased suicide rates after severe natural disasters. Unfortunately, these results proved to be incorrect.
Cause: A programming error caused the number of suicides for one year to be doubled, which was enough to throw off the entire study.
15. British Passports to Nowhere (1999)
Cost: £12.6 million, mass inconvenience
Disaster: The U.K. Passport Agency implemented a new Siemens computer system, which failed to issue passports on time for a half million British citizens. The Agency had to pay millions in compensation, staff overtime and umbrellas for people queuing in the rain for passports.
Cause: The Passport Agency rolled out its new computer system without adequately testing it or training its staff. At the same time, a law change required all children under 16 traveling abroad to obtain a passport, resulting in a huge spike in passport demand that overwhelmed the buggy new computer system.
16. Y2K (1999)
Cost: $500 billion
Disaster: One man's disaster is another man's fortune, as demonstrated by the infamous Y2K bug. Businesses spent billions on programmers to fix a glitch in legacy software. While no significant computer failures occurred, preparation for the Y2K bug had a significant cost and time impact on all industries that use computer technology.
Cause: To save computer storage space, legacy software often stored the year for dates as two-digit numbers, such as "99" for 1999. The software also interpreted "00" to mean 1900 rather than 2000, so when the year 2000 came along, bugs would result.
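The two-digit-year problem is easy to reproduce. In the Python sketch below, the function names and the pivot value of 50 are illustrative assumptions. "Windowing" is one of several remediation approaches that were used; it is shown here only as an example and not as the fix any particular system adopted.

```python
def expand_year_legacy(two_digit_year: int) -> int:
    """Legacy behavior described above: every two-digit year is 19xx."""
    return 1900 + two_digit_year

def expand_year_windowed(two_digit_year: int, pivot: int = 50) -> int:
    """One common fix ('windowing'): years below the pivot are treated as 20xx."""
    century = 2000 if two_digit_year < pivot else 1900
    return century + two_digit_year

opened, today = 99, 0   # account opened in '99', checked in '00'

legacy_elapsed = expand_year_legacy(today) - expand_year_legacy(opened)
windowed_elapsed = expand_year_windowed(today) - expand_year_windowed(opened)
print(legacy_elapsed)     # -99
print(windowed_elapsed)   # 1
```

With the legacy rule, one year of elapsed time is computed as minus 99 years, which breaks any interest, age, or expiry calculation built on it.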
17. Osprey Aircraft Crash (2000)
Two weeks before Christmas in 2000, a U.S. Marine Corps Osprey, a hybrid airplane and helicopter, suffered a hydraulic system fault that should have been remedied without loss of life. A hydraulic line broke in one of the two engine cases as the Osprey was shifting from airplane to helicopter mode for landing.
By itself, the hydraulic failure should not have caused a rotor excursion. That clearly suggests that the design was not compatible with control system requirements, which points to a software bug or a software design error.
According to the Marine Corps major general who presented reports during the investigation of the incident, the trouble was "compounded by a computer software anomaly." The flight-control computer stopped the rotation of the engine pods when it detected the hydraulic failure.
The pilots went through the normal procedure and pressed the primary reset button to re-engage the pods. At this point, both prop rotors went through "significant pitch and thrust changes," which led to a stall. The plane crashed into a marsh and killed all four Marines onboard.
The nature of the software flaw is still hard to track down: Boeing and Bell Helicopter made the Osprey, and Boeing's spokesman said only that changes were made in the software. Requests for details were referred to the government, and as of now, the explanation has not been forthcoming.
18. Dot-Bomb Collapse (2000)
Cost: $5 trillion in market value, thousands of companies failed
Disaster: A speculative bubble from 1995 to 2001 fueled a rapid increase in venture capital investments and stock market values in the Internet and technology sectors. The dot-com bubble began to collapse in early 2000, erasing trillions in stock market value, wiping out thousands of companies and jobs, and launching a global recession.
Cause: Companies and investors dismissed standard business models, and instead focused on increasing market share at the expense of profits.
19. Love Virus (2000)
Cost: $8.75 billion, millions of computers infected, significant data loss
Disaster: The LoveLetter worm infected millions of computers and caused more damage than any other computer virus in history. The worm deleted files, changed home pages and messed with the Registry.
Cause: LoveLetter infected users via e-mail, Internet chat and shared file systems. The e-mail had an executable file attachment and the subject line "ILOVEYOU". When the user opened the attachment, the virus would infect the user's computer and send itself to everyone in the address book.
20. Cancer Treatment to Die For (2000)
Cost: Eight people dead, 20 critically injured
Disaster: Radiation therapy software by Multidata Systems International miscalculated the proper dosage, exposing patients to harmful and in some cases fatal levels of radiation. The physicians, who were legally required to double-check the software's calculations, were indicted for murder.
Cause: The software calculated radiation dosage based on the order in which data was entered, sometimes delivering a double dose of radiation.
21. EDS Drops Child Support (2004)
Cost: £539 million and counting
Disaster: Business services giant EDS developed a computer system for the U.K.'s Child Support Agency (CSA) that accidentally overpaid 1.9 million people, underpaid another 700,000, and left £3.5 billion in uncollected child support payments, a backlog of 239,000 cases, 36,000 new cases stuck in the system, and over 500 documented bugs.
Cause: EDS introduced a large, complex IT system to the CSA while trying to simultaneously restructure the agency.
22. FBI's Virtual Case File project (2005)
Cost: $105 million, still no effective case file solution
Disaster: The FBI scrapped its computer systems overhaul after four years of effort. The Virtual Case File project was a massive, integrated software system for agents to share case files and other information.
Cause: Mismanagement, and an attempt to build a long-term project on technology that was outdated before the project completed, resulted in a complex and unusable system.
In 1969 in Rome, Dijkstra spoke to Joel Aron (not Eyles), "head of IBM's Federal Systems Division which had been responsible for the software of the moonshot". The bug wasn't specifically in the lunar module, but somewhere in the 40,000 LoC. The bug was that the Moon was defined as repelling, and it was found by accident, five days before launch.
Dijkstra asked Joel Aron: "How do you do that?" "Do what?" Aron asked. "Getting that software right!" Dijkstra answered.
"Right?!" Aron said, and explained that in one of the calculations of the orbit of the lunar module, the Moon had been defined as repelling instead of attracting.
They discovered this error by accident!
Imagine: by accident, five days before the shot!
Dijkstra went white and said: "Those guys have been lucky!"
"Yes," Joel Aron agreed.
Joel D. Aron
Federal Systems Division, Federal Systems Center, Gaithersburg, Maryland.
Military engineering (U.S. Military Academy, B.S., 1948). Joined IBM in 1954. Has had a variety of technical, managerial, and consulting assignments in applied science and development, with emphasis on defense and scientific systems.
This is how all the big contractors, like Telcos play the game.
The rule is to get the contract signed first, and then try to figure out if the project can actually be delivered. I would bet money that IBM outsourced or offshored a significant portion of the work, since there is no way IBM will hire any full time workers in the US, thus the huge churn in personnel, which leads to poor communication, lack of proper handoff to newcomers, and a certain doom to the project as a result. None of this is unique or surprising, and from my personal observation, EDS and others basically do the same thing - bid low, then jack the price through change orders. Nice little racket they have there, but sometimes they get found out, usually after the failures become catastrophic and irreversible.
IT Project Failures
Catalog of major I.T. projects which have failed to deliver.
Ford Motor Co.
System: Purchasing system
Cost: $400 million
Source: Spectrum IEEE Bob Charette 2005
Avis Europe ERP
Status: Project cancelled
Cost: $54.5 million
Source: Spectrum IEEE Bob Charette 2005
UK Inland Revenue
Client: UK Inland Revenue
Problem: Software errors
Cost: $3.45 Billion tax-credit overpayment
Source: Spectrum IEEE Bob Charette 2005
System: Inventory System
Loss: $33.3 million
Source: Spectrum IEEE Bob Charette 2005
Thursday, July 06, 2006
Customer: Central Provident Fund (CPF) Singapore
Source: Business Times
CPF and IBM in legal spat over IT project 05/07/06. SINGAPORE's Central Provident Fund Board and computing giant IBM are embroiled in one of the biggest IT project failures...
FBI Virtual Case File
Source: IEEE http://spectrum.ieee.org/print/1455
Implementors: SAIC (Science Applications International Corp)
LOC: 700,000
Cost: $170 million project
Problems: bug-ridden and functionally off target; poorly defined and slowly evolving design requirements; overly ambitious schedules; and the lack of a plan to guide hardware purchases, network deployments, and software development for the bureau
"What is a record and what is available under discovery? In a paper world, you do your job, you do your notes, and if you don't like it, it goes somewhere," Azmi said. "In an electronic world, nothing really is destroyed; it's always somewhere."
Sunday, December 11, 2005
Denver Airport Automated Baggage System
Source: John Swartz, Dr Dobbs Journal
Client: Denver International Airport
Project: The Denver International Airport Automated Baggage System (DIA ABS)
Cause of failure
Report of computer simulation was late, and was not considered
Technical cause of failure
Planes deadlocked due to the limited number of carts (one bag per cart).
Estimated Cost: $195 million
Actual Project Costs: $250 million, delayed, unreliable
Project was eventually shelved, and the airport returned to using baggage handlers, saving $1m per day
Remarks: John Swartz reran the simulation using Amiga and a free Lisp, arriving at the same results.
R. deNeufville, "Baggage System at Denver: Prospects and Lessons" (Journal of Air Transport Management, December 1994)
Cargo Management Reengineering
WebSphere, DB2, z/OS mainframe
Design detail in the 19,000 pages of analysis for ICS includes 800 screens, 16,000 business rules, 70 complex business messages, 850 database tables, 3700 executable load modules, 1800 CICS transaction types, 55 batch jobs, 90 reports and 35 system interfaces. (Source: ACS)
Technology Infrastructure prescribed by Legislation
Legislation passed in 2001 created a legal framework for electronic cargo management secured by Public Key Infrastructure (PKI) using the GateKeeper accredited certification authority to deliver registration or certification services to meet Commonwealth standards.
Cost blowout from $33m to $240m.
Big bang approach with new rule sets introduced
"The problems experienced in part, flow from inaccurate and incomplete information being submitted by some users, which the new system is designed not to accept for security reasons," the spokesperson said.
Type of failure
Not phased in. Not running old and new system in parallel.
The system that it replaced was 4 years old.
The Federal Government has introduced the UK Gateway methodology to manage IT project risk.
Personnel Management Key Solution
Department of Defense
organisational structures, personnel administration and leave, career management, workforce planning as well as recruitment and payroll.
Originally estimated to cost $25 million, in 2002 Defence admitted that the project was going to end up costing in the order of $70 million.
“It is the nature of the military salary and allowance processes and systems that is substantially more complex than in the civilian world.”
Source: Defence's payroll system explodes
posted by Chui Tey @ 9:01 PM
House of Representatives Payroll System
House of Representatives
temporarily canceled some employees' jobs,
mishandled paycheck withholdings
added extra money
Payroll systems have been implemented badly at many sites. In many ways, a payroll system should be among the systems least likely to be wrong, because it is possible to run the old and new systems in parallel and cross-check the results. If one system computes one pay figure and the other computes a different one, you know you have a bug.
Unfortunately, the number of bugs can overwhelm the developers when a system is rolled out. This means additional money must be invested to run the systems in parallel, perhaps some form of integration between the old system and the new system.
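The parallel-run idea can be sketched in Python. Everything here is hypothetical: `legacy_pay` and `new_pay` stand in for the old and new systems, the staff data is invented, and the new system carries a deliberately planted bug so the cross-check has something to find.

```python
from decimal import Decimal

def legacy_pay(hours, rate):
    """Stand-in for the old system: straight hours times rate."""
    return (Decimal(hours) * Decimal(rate)).quantize(Decimal("0.01"))

def new_pay(hours, rate):
    """Stand-in for the new system, with a deliberate bug above 40 hours."""
    hours = Decimal(hours)
    if hours > 40:
        hours = Decimal(40)   # planted bug: silently drops overtime hours
    return (hours * Decimal(rate)).quantize(Decimal("0.01"))

def cross_check(employees):
    """Run both systems on the same input and return the disagreements."""
    mismatches = []
    for name, hours, rate in employees:
        old, new = legacy_pay(hours, rate), new_pay(hours, rate)
        if old != new:
            mismatches.append((name, old, new))
    return mismatches

staff = [("A", "38", "21.50"), ("B", "45", "30.00"), ("C", "40", "18.25")]
for name, old, new in cross_check(staff):
    print(f"{name}: old={old} new={new}")
```

Only employee B is flagged, which narrows the search to the overtime path. A mismatch does not say which system is wrong, only that the two disagree, so each one still needs a person to investigate it.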
Personnel, Payroll and Related Systems
Deloitte & Touche LLP
Irish Health Service
After being launched around 1995, the project was budgeted at $10.7 million and was expected to take three years. After 10 years, the expected price tag has rocketed to $180 million.
Made widespread payroll errors
Complexity of the system it was replacing was cited as a factor. There were over 2,500 variations in payment arrangements across the entire health system.
Major rewrites of personnel and payroll systems, especially where consolidation is the main aim, are fraught with risk, as the business rules that apply in the legacy system can be fairly arbitrary.
posted by Chui Tey @ 7:51 PM