Famous Software Disasters
"Program testing can be used to show the presence of bugs, but never to show their absence!" Edsger Dijkstra
and
"If debugging is the process of removing software bugs, then programming must be the process of putting them in." Edsger Dijkstra
Here's our considered list of some of the worst IT-related disasters and failures. The selection is subjective.
Software errors cost the U.S. economy an estimated $60 billion annually in rework, lost productivity and actual damages. We all know software bugs can be annoying, but faulty software can also be expensive, embarrassing, destructive and deadly. Following are famous software disasters in chronological order:
1. Mariner 1 Space Probe (1962) Cost: $18.5 million Disaster: The Mariner 1 rocket, carrying a space probe headed for Venus, diverted from its intended flight path shortly after launch on July 22, 1962. Mission Control destroyed the rocket over the Atlantic Ocean 293 seconds after liftoff. Cause: The investigation found that a programmer had incorrectly transcribed a formula, written on paper in pencil, into computer code, missing a single superscript bar. Without the smoothing function indicated by the bar, the software treated normal variations of velocity as if they were serious, causing faulty corrections that sent the rocket off course.
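Why a missing bar matters: the bar marked a smoothed (averaged) value, and a controller fed raw samples reacts to every blip. The sketch below is not the Mariner 1 guidance equation; it assumes a trailing moving average as the smoothing function and a hypothetical proportional correction (`corrections`, `gain`) purely to show the effect.

```python
from collections import deque


def moving_average(samples, window=5):
    """Trailing moving average: an assumed stand-in for the 'bar' smoothing."""
    buf = deque(maxlen=window)
    smoothed = []
    for s in samples:
        buf.append(s)
        smoothed.append(sum(buf) / len(buf))
    return smoothed


def corrections(velocities, target, gain=0.5):
    """Hypothetical proportional controller: correction = gain * (target - v)."""
    return [gain * (target - v) for v in velocities]


# Noisy readings around a velocity that is actually on target.
target = 100.0
raw = [100.0, 103.0, 97.0, 102.0, 98.0, 101.0, 99.0, 100.0]

print(max(abs(c) for c in corrections(raw, target)))                  # 1.5
print(max(abs(c) for c in corrections(moving_average(raw), target)))  # 0.75
```

With smoothing, the largest correction is half as big, and most corrections are near zero, because the noise averages out instead of being treated as a real deviation.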
2. Hartford Coliseum Collapse (1978) Cost: $70 million, plus another $20 million damage to the local economy Disaster: Just hours after thousands of fans had left the Hartford Coliseum, the steel-latticed roof collapsed under the weight of wet snow. Cause: The programmer of the CAD software used to design the coliseum incorrectly assumed the steel roof supports would only face pure compression. But when one of the supports unexpectedly buckled from the snow, it set off a chain reaction that brought down the other roof sections like dominoes.
3. CIA Gives the Soviets Gas (1982) Cost: Millions of dollars, significant damage to the Soviet economy Disaster: Control software went haywire and produced intense pressure in the Trans-Siberian gas pipeline, resulting in the largest man-made non-nuclear explosion in Earth's history. Cause: CIA operatives allegedly planted a bug in a Canadian computer system purchased by the Soviets to control their gas pipelines. The purchase was part of a strategic Soviet plan to steal or covertly obtain sensitive U.S. technology. When the CIA discovered the purchase, they sabotaged the software so that it would pass Soviet inspection but fail in operation.
4. World War III Almost (1983) Cost: Nearly all of humanity Disaster: The Soviet early warning system falsely indicated the United States had launched five ballistic missiles. Fortunately the Soviet duty officer had "a funny feeling in my gut" and reasoned that if the U.S. were really attacking, it would launch more than five missiles, so he reported the apparent attack as a false alarm. Cause: A bug in the Soviet software failed to filter out false missile detections caused by sunlight reflecting off cloud tops.
5. Medical Machine Kills (1985) Cost: Three people dead, three people critically injured Disaster: Canada's Therac-25 radiation therapy machine malfunctioned and delivered lethal radiation doses to patients. Cause: Because of a subtle bug called a race condition, a technician could accidentally configure Therac-25 so the electron beam would fire in high-power mode without the proper patient shielding.
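A race condition of this kind is a check-then-act gap: the software verifies a configuration, the configuration changes, and the action proceeds on stale assumptions. The Therac-25 software itself was PDP-11 assembly with concurrent tasks sharing variables, and it is not reproduced here. This is a minimal sketch of one general defence, assuming a hypothetical `TreatmentConsole` with made-up mode names and a `shield_in_place` flag: run the safety check and the action under the same lock that edits use, and refuse to act if the setup changed after the operator confirmed it (a version counter, in the style of optimistic concurrency control).

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    mode: str              # hypothetical labels: "low_power" or "high_power"
    shield_in_place: bool  # hypothetical flag for the beam-shaping hardware


class TreatmentConsole:
    """Hypothetical console that guards a check-then-act sequence."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._setup = Setup("low_power", shield_in_place=False)
        self._version = 0  # bumped on every edit

    def edit(self, mode: str, shield_in_place: bool) -> None:
        with self._lock:
            self._setup = Setup(mode, shield_in_place)
            self._version += 1

    def verify(self) -> int:
        """Operator confirms the setup; returns the version that was confirmed."""
        with self._lock:
            return self._version

    def fire(self, confirmed_version: int) -> str:
        # Check and act under the lock edits use, so nothing can change in between.
        with self._lock:
            if confirmed_version != self._version:
                raise RuntimeError("setup changed after confirmation; re-verify")
            if self._setup.mode == "high_power" and not self._setup.shield_in_place:
                raise RuntimeError("high-power mode requires shielding")
            return f"firing in {self._setup.mode} mode"


console = TreatmentConsole()
console.edit("high_power", shield_in_place=True)
confirmed = console.verify()                       # operator confirms a safe setup
console.edit("high_power", shield_in_place=False)  # a late edit slips in afterwards
try:
    console.fire(confirmed)
except RuntimeError as err:
    print("refused:", err)
```

The version check catches edits made after confirmation; the interlock check under the same lock catches unsafe setups even when the version matches.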
6. Wall Street Crash (1987) Cost: $500 billion in one day Disaster: On Black Monday (October 19, 1987), the Dow Jones Industrial Average plummeted 508 points, losing 22.6% of its total value. The S&P 500 dropped 20.4%. This was the greatest loss Wall Street ever suffered in a single day. Cause: A long bull market was halted by a rash of SEC investigations of insider trading and by other market forces. As investors fled stocks in a mass exodus, computer trading programs generated a flood of sell orders, overwhelming the market, crashing systems and leaving investors effectively blind.
7. AT&T Lines Go Dead (1990) Cost: 75 million phone calls missed, 200 thousand airline reservations lost Disaster: A single switch at one of AT&T's 114 switching centers suffered a minor mechanical problem and shut down the center. When the center came back up, it sent a message to other switching centers, which in turn caused them to shut down and brought down the entire AT&T network for 9 hours. Cause: A single line of buggy code in a complex software upgrade implemented to speed up calling caused a ripple effect that shut down the network.
8. Patriot Fails Due to Software Flaw (1991) Cost: 28 soldiers dead, 100 injured Disaster: During the first Gulf War, an American Patriot missile system in Saudi Arabia failed to intercept an incoming Iraqi Scud missile. The missile destroyed an American Army barracks. Cause: A software rounding error in the system's time calculation grew the longer the system ran without a restart; after about 100 hours of continuous operation, the accumulated clock error caused the Patriot system to look for the Scud in the wrong place and ignore it.
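The commonly cited analysis of this failure (the 1992 U.S. GAO report and later write-ups) describes the clock counting time in tenths of a second, with 1/10 stored as a truncated binary fraction. Because 1/10 has no exact binary representation, every tick loses a little time. This sketch assumes 1/10 chopped after 23 binary places and 100 hours of continuous operation, the figures usually quoted for the Dhahran battery; treat both as assumptions of the illustration, not a reconstruction of the Patriot code.

```python
from fractions import Fraction

FRACTION_BITS = 23      # assumption: 1/10 chopped after 23 binary places
TICK = Fraction(1, 10)  # the clock counts tenths of a second

# Truncate (chop) 1/10 to a fixed number of binary fraction bits.
stored_tick = Fraction(int(TICK * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = TICK - stored_tick

hours_up = 100                    # assumption: continuous uptime before the attack
ticks = hours_up * 3600 * 10      # tenths of a second in 100 hours
drift_seconds = ticks * error_per_tick

print(f"{float(error_per_tick):.3e} s lost per tick")  # 9.537e-08 s lost per tick
print(f"{float(drift_seconds):.4f} s total drift")     # 0.3433 s total drift
```

Using `Fraction` keeps the arithmetic exact, so the only error in the result is the one the chopped constant introduces. A third of a second matters because a Scud covers hundreds of meters in that time.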
9. Pentium Fails Long Division (1993) Cost: $475 million, corporate credibility Disaster: Intel's highly promoted Pentium chip occasionally made mistakes when dividing floating-point numbers within a specific range. For example, dividing 4195835.0/3145727.0 yielded 1.33374 instead of 1.33382, an error of 0.006%. Although the bug affected few users, it became a public relations nightmare. With an estimated 3 million to 5 million defective chips in circulation, Intel at first offered to replace Pentium chips only for consumers who could prove they needed high accuracy; eventually the company relented and replaced the chips for anyone who complained. Cause: This was a silicon error. The divider in the Pentium floating-point unit had a flawed division table, missing about five of a thousand entries, which produced these division errors.
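The figures quoted above can be checked on any CPU without the flaw. This sketch assumes ordinary IEEE-754 double-precision arithmetic and simply compares the correct quotient with the reported flawed value.

```python
dividend, divisor = 4195835.0, 3145727.0
correct = dividend / divisor  # computed on hardware without the FDIV flaw
flawed = 1.33374              # value reported for affected Pentium chips

rel_error_pct = (correct - flawed) / correct * 100

print(f"{correct:.5f}")         # 1.33382
print(f"{rel_error_pct:.3f}%")  # 0.006%
```

The error is small enough to look plausible, which is why it went unnoticed until someone compared results against an independent calculation.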
10. Baggage handling system at Denver airport (1995) After more than a decade of trying to make Denver International Airport's troubled $230 million computerized baggage-handling system work as designed, United Air Lines Inc. is giving up on the failed project. "It's never worked up to its potential," said United spokesman Jeff Green. "We've spent enormous amounts of money over the last decade" to try to get it working, but the only parts of the system that operate properly are for luggage heading out of Denver on United and for some baggage transfers between flights, he said. The system has never been able to process luggage from flights arriving at the airport. That's a far cry from the promise of the high-tech, computerized baggage handling system envisioned for the airport, which opened in 1995. The system was designed to use PCs and thousands of remote-controlled carts that operate on a 21-mile-long track that is mostly underground. The carts move along the track, carrying luggage from check-in counters to sorting areas and then straight to the flights waiting at airport gates. Each piece of baggage has a special bar-coded tag attached when it's checked in to help track the luggage along its journey through the airport. The system was designed and built by BAE Automated Systems Inc. in Carrollton, Texas, which in June 2003 was acquired by G&T Conveyor Co. in Tavares, Fla. A spokesman for G&T declined to comment on the matter, saying that his company acquired only some assets from BAE and that the vendor no longer exists.
11. Ariane Rocket Goes Boom (1996) Cost: $500 million Disaster: Ariane 5, Europe's newest unmanned rocket, was intentionally destroyed seconds after launch on its maiden flight. Also destroyed was its cargo of four scientific satellites to study how the Earth's magnetic field interacts with solar winds. Cause: Shutdown occurred when the guidance computer tried to convert a value related to the sideways (horizontal) rocket velocity from a 64-bit floating-point number to a 16-bit signed integer. The number was too big, and an overflow error resulted. When the guidance system shut down, control passed to an identical redundant unit, which also failed because it was running the same algorithm.
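A 16-bit signed integer holds only -32768 to 32767, so a narrowing conversion has to decide what happens to anything larger. Python integers never overflow, so this sketch makes the three possible behaviours explicit. The function names and the 40,000 reading are hypothetical; the Ariane flight software was written in Ada, where the unprotected conversion raised an exception that was not handled.

```python
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1


def to_int16_checked(x: float) -> int:
    """Raise if the rounded value does not fit (the failure path on Ariane 5)."""
    n = round(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in a 16-bit signed integer")
    return n


def to_int16_saturating(x: float) -> int:
    """Clamp to the representable range instead of failing."""
    return max(INT16_MIN, min(INT16_MAX, round(x)))


def to_int16_wrapping(x: float) -> int:
    """Keep only the low 16 bits, two's-complement style (silent corruption)."""
    n = round(x) & 0xFFFF
    return n - 0x10000 if n > INT16_MAX else n


horizontal_value = 40_000.0  # hypothetical reading, larger than INT16_MAX
print(to_int16_saturating(horizontal_value))  # 32767
print(to_int16_wrapping(horizontal_value))    # -25536
try:
    to_int16_checked(horizontal_value)
except OverflowError as err:
    print("overflow:", err)
```

None of these is a fix by itself: the redundant unit ran the same code and failed the same way, so the behaviour on overflow has to be chosen deliberately for each conversion.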
12. Skynet Brings Judgement Day (1997) Cost: 3 billion dead, near-total destruction of human civilization and animal ecosystems (fictional) Disaster: Human operators attempt to shut off the Skynet global computer network. Skynet responds by firing U.S. nuclear missiles at Russia, initiating global nuclear war on what became known as Judgement Day (August 29, 1997). Cause: Cyberdyne, the leading weapons manufacturer, installed Skynet technology in all military hardware including stealth bombers and missile defense systems. The Skynet technology formed a seamless network and effectively removed humans from strategic defense. Eventually Skynet became sentient, was threatened when the humans tried to take it offline, sought to survive, and retaliated with nuclear war.
13. Crash of Mars Climate Orbiter (1999) Cost: $125 million Disaster: After a 286-day journey from Earth, the Mars Climate Orbiter fired its engines to push into orbit around Mars. The engines fired, but the spacecraft fell too far into the planet's atmosphere, likely causing it to crash on Mars. Cause: Ground software that calculated the Orbiter's thruster impulse reported its results in imperial units (pound-force seconds) rather than the metric units (newton-seconds) specified by NASA, so the trajectory corrections based on those numbers were wrong.
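One way to make this class of bug harder to write is to give physical quantities a type that stores one canonical unit and forces callers to name the unit at every boundary. The `Impulse` class here is hypothetical; the conversion factor, 1 lbf = 4.4482216152605 N, is the standard definition of the pound-force, and the same factor applies to force (lbf vs N) and impulse (lbf·s vs N·s).

```python
from dataclasses import dataclass

NEWTONS_PER_POUND_FORCE = 4.4482216152605  # exact by definition of the lbf


@dataclass(frozen=True)
class Impulse:
    """Hypothetical impulse type; always stored internally in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        return cls(value * NEWTONS_PER_POUND_FORCE)

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)


reported = 10.0  # hypothetical number written by the producing program, in lbf*s

wrong = Impulse.from_newton_seconds(reported)       # unit silently misread
right = Impulse.from_pound_force_seconds(reported)  # unit stated explicitly

ratio = right.newton_seconds / wrong.newton_seconds
print(f"misreading underestimates the impulse by a factor of {ratio:.2f}")  # 4.45
```

Because the bare constructor is never called with an unlabelled number, a reviewer can see at each call site which unit the data is assumed to be in.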
14. Disastrous Study (1999) Cost: Scientific credibility Disaster: In this ironic case, software used to analyze disasters had a disaster of its own. The New England Journal of Medicine reported increased suicide rates after severe natural disasters. Unfortunately, these results proved to be incorrect. Cause: A programming error caused the number of suicides for one year to be doubled, which was enough to throw off the entire study.
15. British Passports to Nowhere (1999) Cost: £12.6 million, mass inconvenience Disaster: The U.K. Passport Agency implemented a new Siemens computer system, which failed to issue passports on time for half a million British citizens. The Agency had to pay millions in compensation, staff overtime and umbrellas for people queuing in the rain for passports. Cause: The Passport Agency rolled out its new computer system without adequately testing it or training its staff. At the same time, a law change required all children under 16 traveling abroad to obtain a passport, resulting in a huge spike in passport demand that overwhelmed the buggy new computer system.
16. Y2K (1999) Cost: $500 billion Disaster: One man's disaster is another man's fortune, as demonstrated by the infamous Y2K bug. Businesses spent billions on programmers to fix a glitch in legacy software. While no significant computer failures occurred, preparation for the Y2K bug had a significant cost and time impact on all industries that use computer technology. Cause: To save computer storage space, legacy software often stored the year for dates as two-digit numbers, such as "99" for 1999. The software also interpreted "00" to mean 1900 rather than 2000, so when the year 2000 came along, bugs would result.
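The bug and one common remediation, "windowing", fit in a few lines. In this sketch the function names are hypothetical and the pivot of 50 is an assumption; real systems picked a pivot to suit their data. Windowing only moves the problem to a later date; the full fix is storing four-digit years.

```python
def expand_year_legacy(yy: int) -> int:
    """The buggy assumption: every two-digit year belongs to the 1900s."""
    return 1900 + yy


def expand_year_windowed(yy: int, pivot: int = 50) -> int:
    """Windowing fix: years below the pivot are 20xx, the rest 19xx.
    The pivot of 50 is an illustrative assumption."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return (2000 if yy < pivot else 1900) + yy


def years_between(start_yy: int, end_yy: int, expand) -> int:
    return expand(end_yy) - expand(start_yy)


# An account opened in "99" and checked in "00":
print(years_between(99, 0, expand_year_legacy))    # -99
print(years_between(99, 0, expand_year_windowed))  # 1
```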
17. Osprey Aircraft Crash (2000) Two weeks before Christmas in 2000, a U.S. Marine Corps Osprey, a hybrid airplane and helicopter, suffered a hydraulic system fault that should have been remedied without loss of life. A hydraulic line broke in one of the two engine nacelles as the Osprey was shifting from airplane to helicopter mode for landing. A hydraulic failure alone should not have caused a rotor excursion, which suggests that the design was not compatible with the control-system requirements; the fault therefore points to a software bug or a software design error.
According to the Marine Corps major general who presented reports during the investigation of the incident, the trouble was "compounded by a computer software anomaly." The flight-control computer stopped the rotation of the engine pods when it detected the hydraulic failure. The pilots went through the normal procedure and pressed the primary reset button to re-engage the pods. At this point, both prop rotors went through "significant pitch and thrust changes," which led to a stall. The plane crashed into a marsh and killed all four Marines onboard. The nature of the software flaw is still hard to track down: Boeing and Bell Helicopter made the Osprey, and Boeing's spokesman said only that changes were made in the software. Requests for details were referred to the government, and as of now, the explanation has not been forthcoming.
18. Dot-Bomb Collapse (2000) Cost: $5 trillion in market value, thousands of companies failed Disaster: A speculative bubble from 1995 to 2001 fueled a rapid increase in venture capital investments and stock market values in the Internet and technology sectors. The dot-com bubble began to collapse in early 2000, erasing trillions in stock market value, wiping out thousands of companies and jobs, and launching a global recession. Cause: Companies and investors dismissed standard business models, and instead focused on increasing market share at the expense of profits.
19. Love Virus (2000) Cost: $8.75 billion, millions of computers infected, significant data loss Disaster: The LoveLetter worm infected millions of computers and caused more damage than any other computer virus in history. The worm deleted files, changed home pages and modified the Windows Registry. Cause: LoveLetter infected users via e-mail, Internet chat and shared file systems. The email had an executable file attachment and the subject line ILOVEYOU. When the user opened the attachment, the virus would infect the user's computer and send itself to everyone in the address book.
20. Cancer Treatment to Die For (2000) Cost: Eight people dead, 20 critically injured Disaster: Radiation therapy software by Multidata Systems International miscalculated the proper dosage, exposing patients to harmful and in some cases fatal levels of radiation. The physicians, who were legally required to double-check the software's calculations, were indicted for murder. Cause: The software calculated radiation dosage based on the order in which data was entered, sometimes delivering a double dose of radiation.
21. EDS Drops Child Support (2004) Cost: £539 million and counting Disaster: Business services giant EDS developed a computer system for the U.K.'s Child Support Agency (CSA) that accidentally overpaid 1.9 million people and underpaid another 700,000, and left £3.5 billion in uncollected child support payments, a backlog of 239,000 cases and 36,000 new cases stuck in the system; the system still has over 500 documented bugs. Cause: EDS introduced a large, complex IT system to the CSA while trying to simultaneously restructure the agency.
22. FBI's Virtual Case File project (2005) Cost: $105 million, still no effective case file solution Disaster: The FBI scrapped its computer systems overhaul after four years of effort. The Virtual Case File project was a massive, integrated software system for agents to share case files and other information. Cause: Mismanagement, and an attempt to build a long-term project on technology that was outdated before the project completed, resulted in a complex and unusable system.
compiled from http://www.devtopics.com/20-famous-software-disasters/
-----------------------------------------------------------------------------------------
In 1969 in Rome, Dijkstra spoke with Joel Aron (not Eyles), "head of IBM's Federal Systems Division which had been responsible for the software of the moonshot". The bug was not specifically in the lunar module, but somewhere in the 40,000 lines of code, and it was found by accident, five days before launch.
Dijkstra asked Joel Aron, "How do you do that?"
"Do what?" Joel asked.
"Getting that software right!" Dijkstra answered.
"Right?!" Aron replied, and told him that in one of the calculations of the lunar module's orbit, the Moon had been defined as repelling instead of attracting. The error had been discovered by accident. Imagine: by accident, five days before the shot!
Dijkstra went white and said, "Those guys have been lucky!"
"Yes," Joel Aron agreed.
--------------
Joel D. Aron, Federal Systems Division, Federal Systems Center, Gaithersburg, Maryland. Military engineering (U.S. Military Academy, B.S., 1948). Joined IBM in 1954. Has had a variety of technical, managerial, and consulting assignments in applied science and development, with emphasis on defense and scientific systems.
---------------------------------------------------------------------
This is how all the big contractors, like Telcos, play the game.
---------------------------------------------------------------------
The rule is to get the contract signed first, and then try to figure out if the project can actually be delivered. I would bet money that IBM outsourced or offshored a significant portion of the work, since there is no way IBM will hire any full-time workers in the US. Hence the huge churn in personnel, which leads to poor communication, lack of proper handoff to newcomers, and, as a result, certain doom for the project.
None of this is unique or surprising, and from my personal observation, EDS and others basically do the same thing: bid low, then jack up the price through change orders. Nice little racket they have there, but sometimes they get found out, usually after the failures become catastrophic and irreversible.
by thetwonkey
----------------------------------------------------------------------------------------------
IT Project Failures: Catalog of major I.T. projects which have failed to deliver.
http://it-project-failures.blogspot.com/
----------------------------------------------------------------------------------------------
................... Ford Motor Co. ...................
System: Purchasing system
Cost: $400 million
Status: Abandoned
Source: IEEE Spectrum, Bob Charette, 2005
....................... Avis Europe ERP .......................
System: ERP
Status: Project cancelled
Cost: $54.5 million
Source: IEEE Spectrum, Bob Charette, 2005
........................... UK Inland Revenue ...........................
Client: UK Inland Revenue
Problem: Software errors
Cost: $3.45 billion tax-credit overpayment
Source: IEEE Spectrum, Bob Charette, 2005
..................................... Inventory System .....................................
System: Inventory system
Loss: $33.3 million
Source: IEEE Spectrum, Bob Charette, 2005
....................................
Thursday, July 06, 2006
Vendor: IBM
Customer: Central Provident Fund (CPF), Singapore
Source: Business Times, "CPF and IBM in legal spat over IT project", 05/07/06
SINGAPORE's Central Provident Fund Board and computing giant IBM are embroiled in one of the biggest IT project failures...
.............................. FBI Virtual Case File .............................
Source: IEEE Spectrum, http://spectrum.ieee.org/print/1455
Implementer: SAIC (Science Applications International Corp.)
Lines of code: 700,000
Cost: $170 million project
Problems: Bug-ridden and functionally off target
Decision: Scrap
Contributing factors: Poorly defined and slowly evolving design requirements; overly ambitious schedules; and the lack of a plan to guide hardware purchases, network deployments, and software development for the bureau.
Notes: "What is a record and what is available under discovery? In a paper world, you do your job, you do your notes, and if you don't like it, it goes somewhere," Azmi said. "In an electronic world, nothing really is destroyed; it's always somewhere."
................................................................
Sunday, December 11, 2005
Denver Airport Automated Baggage System
Source: John Swartz, Dr. Dobb's Journal (registration required)
Client: Denver International Airport
Project: The Denver International Airport Automated Baggage System (DIA ABS)
Deployment date: 1994
Cause of failure: The report of the computer simulation was late, and was not considered.
Technical cause of failure: Planes deadlocked due to the limited number of carts (one bag per cart).
Estimated cost: $195 million
Actual project costs: $250 million; delayed and unreliable. The project was eventually shelved in favor of a return to baggage handlers, saving $1 million per day.
Remarks: John Swartz reran the simulation on an Amiga using a free Lisp, arriving at the same results.
References: R. de Neufville, "Baggage System at Denver: Prospects and Lessons," Journal of Air Transport Management, December 1994.
.............................................. Cargo Management Reengineering
Customer: Australian Customs
Vendor: IBM
Platform: WebSphere, DB2, z/OS mainframe
Complexity: Design detail in the 19,000 pages of analysis for ICS includes 800 screens, 16,000 business rules, 70 complex business messages, 850 database tables, 3,700 executable load modules, 1,800 CICS transaction types, 55 batch jobs, 90 reports and 35 system interfaces. (Source: ACS)
Technology: Infrastructure prescribed by legislation. Legislation passed in 2001 created a legal framework for electronic cargo management secured by Public Key Infrastructure (PKI), using the GateKeeper accredited certification authority to deliver registration or certification services that meet Commonwealth standards.
Failures: Not performant. Cost blowout from $33m to $240m. Customs blamed users. Usability problems. Big-bang approach with new rule sets introduced. "The problems experienced in part flow from inaccurate and incomplete information being submitted by some users, which the new system is designed not to accept for security reasons," the spokesperson said.
Type of failure: Estimation error
Causes: Not phased in. Old and new systems were not run in parallel. The system that it replaced was 4 years old.
Follow-up: The Federal Government has introduced the UK Gateway methodology to manage IT project risk.
........................................................ Personnel Management Key Solution
Customer: Department of Defense
Vendor: PeopleSoft
Functionality: Organisational structures, personnel administration and leave, career management, workforce planning, as well as recruitment and payroll.
Cost blowout: Originally estimated to cost $25 million; in 2002 Defence admitted that the project was going to end up costing in the order of $70 million.
Reason: "It is the nature of the military salary and allowance processes and systems that is substantially more complex than in the civilian world."
Source: "Defence's payroll system explodes"
posted by Chui Tey @ 9:01 PM
House of Representatives Payroll System
Customer: House of Representatives
Problems: Temporarily canceled some employees' jobs, mishandled paycheck withholdings, added extra money
Comments: Payroll systems have been implemented badly at many sites. In many ways, payroll systems should be among the least likely to go wrong, because it is possible to run parallel systems and cross-check the results: if one system computes one pay and the other computes a different result, you know you have a bug. Unfortunately, the number of bugs can overwhelm the developers when a system is rolled out. This means additional money must be invested to run the systems in parallel, perhaps with some form of integration between the old system and the new system.
Source: ERHMS
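The parallel-run cross-check amounts to comparing each employee's pay from the old and new systems and listing every discrepancy. This is a minimal sketch, assuming both runs can be exported as a mapping from a hypothetical employee ID to a pay amount; amounts use `Decimal` so that cents compare exactly, and the zero default tolerance is an assumption.

```python
from decimal import Decimal


def reconcile(old_pay: dict[str, Decimal], new_pay: dict[str, Decimal],
              tolerance: Decimal = Decimal("0.00")) -> list[tuple[str, str]]:
    """Compare per-employee pay from two payroll runs; return (employee, issue)."""
    issues = []
    for emp in sorted(old_pay.keys() | new_pay.keys()):
        if emp not in new_pay:
            issues.append((emp, "missing from new system"))
        elif emp not in old_pay:
            issues.append((emp, "missing from old system"))
        else:
            diff = new_pay[emp] - old_pay[emp]
            if abs(diff) > tolerance:
                issues.append((emp, f"differs by {diff}"))
    return issues


# Hypothetical results from one pay period on each system.
old = {"E001": Decimal("2150.00"), "E002": Decimal("1875.50"), "E003": Decimal("990.00")}
new = {"E001": Decimal("2150.00"), "E002": Decimal("1925.50"), "E004": Decimal("1200.00")}

for emp, issue in reconcile(old, new):
    print(emp, issue)
```

Employees present in only one system are reported separately from pay differences, since they usually point to data-migration problems rather than calculation bugs.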
....................................................... Personnel, Payroll and Related Systems
Source: Computerworld
Project name: Personnel, Payroll and Related Systems
Project vendor: Deloitte & Touche LLP
Platform: SAP
Customer: Irish Health Service
Failures: After being launched around 1995, the project was budgeted at $10.7 million and was expected to take three years. After 10 years, the expected price tag has rocketed to $180 million. The system made widespread payroll errors.
Complexity: The complexity of the system it was replacing was cited as a factor; there were over 2,500 variations in payment arrangements across the entire health system.
Comments: Major rewrites of personnel and payroll systems, especially where consolidation is the main aim, are fraught with risks, as the business rules that apply in the legacy system can be fairly arbitrary.