An Analysis of Causation in Aerospace Accidents

Kathryn Anne Weiss
[email protected]
http://www.mit.edu/~weissk

Complex Systems Research Laboratory (CSRL)
Department of Aeronautics and Astronautics
Massachusetts Institute of Technology

Tuesday, September 7, 2004

This paper was presented at the Digital Avionics Systems Conference in 2001. This paper and similar papers on accidents, accident modeling and accident reports can be found at http://sunnyday.mit.edu/accidents/index.html

Massachusetts Institute of Technology, 2002

Recent Aerospace Losses

• Ariane 5
• SOlar Heliospheric Observatory
• Mars Climate Orbiter
• Titan/Centaur/Milstar

Ariane 5

• On June 4, 1996, 40 seconds after launch, the launcher veered off its nominal flight path and exploded
• The IRS software from Ariane 4 was reused on the Ariane 5
  – The time sequence of the Ariane 5 lift-off is significantly different from that of the Ariane 4
  – A function was left in the Ariane 5 software for commonality reasons, “based on the view that, unless proven necessary, it was not wise to make changes in software which worked well on Ariane 4”
• An exception was raised, causing the nozzles of the solid rocket boosters to deflect, which subjected the launcher to high aerodynamic loads
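
The slide does not say what the exception was. The inquiry board's report attributes it to converting a 64-bit floating-point value to a 16-bit signed integer that could not hold it. The sketch below illustrates that failure mode only; it is written in Python rather than the flight software's Ada, and the function names and sample values are hypothetical.

```python
# Illustrative only: these names are invented for this sketch and do not
# come from the Ariane flight software.
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_strict(value: float) -> int:
    """Narrow a float to a signed 16-bit integer, raising if it does not fit.

    An unhandled exception here plays the role of the operand error.
    """
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return result


def to_int16_saturating(value: float) -> int:
    """Narrow a float to a signed 16-bit integer, clamping at the range limits."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

A value that stays in range on one trajectory (say 20000.0) converts cleanly, while a larger value from a different flight profile (say 50000.0) raises in the strict version and clamps in the saturating one. Which behavior is appropriate is a system-level decision, and that is the kind of analysis reuse tends to skip.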

Mars Climate Orbiter

• Relied heavily on previous designs of MGS and Pathfinder
• There was an error of nearly 100 km in the spacecraft’s navigation measurements, which resulted in a much lower altitude than expected during MOI and led to the vehicle’s break-up in the atmosphere
• The conversion factor from English to Metric units was erroneously left out of the AMD files
  – The Interface Specification required that the impulse-bit calculations be done using Metric units
  – The software supplied by a vendor used English units
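
The missing conversion can be made concrete with a small sketch. The slide does not give the factor; by definition 1 lbf = 4.4482216152605 N, so an impulse in lbf·s is about 4.45 times smaller numerically than the same impulse in N·s. The `Impulse` type and its field names below are hypothetical and are not the AMD file format; the point is only that carrying the unit with the value makes an omitted conversion fail loudly instead of silently.

```python
from dataclasses import dataclass

# Newtons per pound-force, from the definitions of the pound and standard gravity.
LBF_TO_N = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value tagged with its unit (hypothetical type for illustration)."""

    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        """Return the impulse in N*s, converting from lbf*s when needed."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_TO_N
        raise ValueError(f"unknown impulse unit: {self.unit!r}")
```

With this shape, a consumer that expects Metric values calls `to_newton_seconds()` and never reads a bare number whose unit is only implied by an interface document.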

Titan/Centaur/Milstar

• Mission to place Milstar in a geosynchronous orbit
• Roll rate filter constant should have been entered as –1.992476, but was entered as –0.1992476
  – Centaur/Milstar began experiencing instability about the roll axis during the first burn
  – Instability greatly magnified during Centaur’s second main engine burn, resulting in vehicle tumbling
• The Centaur attempted to compensate with its RCS, which ultimately depleted available propellant
  – The third engine burn terminated early
• Milstar satellite placed in a low elliptical final orbit
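
The two constants above differ by exactly one decimal place, an error that a simple comparison against the specified value would flag. The following is a minimal sketch of such a load-time check, not a description of any check used on the program; the function name and the tolerance are assumptions.

```python
import math

SPECIFIED_ROLL_FILTER = -1.992476   # value the slide says should have been entered
ENTERED_ROLL_FILTER = -0.1992476    # value the slide says was actually entered


def matches_spec(entered: float, specified: float, rel_tol: float = 1e-6) -> bool:
    """Return True if the entered constant agrees with the specified one.

    rel_tol is an assumed tolerance chosen for illustration.
    """
    return math.isclose(entered, specified, rel_tol=rel_tol)


# The mis-entered constant is one tenth of the specified value.
ratio = ENTERED_ROLL_FILTER / SPECIFIED_ROLL_FILTER
```

Here `matches_spec(ENTERED_ROLL_FILTER, SPECIFIED_ROLL_FILTER)` is false and `ratio` is approximately 0.1, which is the signature of a misplaced decimal point.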

SOHO Background

• SOHO, or the SOlar Heliospheric Observatory, is a joint effort between NASA and ESA to perform helioseismology and monitor the solar atmosphere, corona and wind
  – SOHO was launched on December 2, 1995, was declared fully operational in April of 1996, and completed a successful two-year primary mission in May of 1998
  – It then entered into its extended mission phase
  – After roughly two months of nominal activity, contact with SOHO was lost on June 25, 1998

SOHO Loss (1/4)

• The loss was preceded by a routine calibration of the spacecraft's three roll gyroscopes (named A, B and C) and by a momentum management maneuver
• In order to increase the amount of science done during the mission and to increase the gyros’ lifespans, a decision was made to compress the timeline of the operational procedures for momentum management, gyro calibration and science instrument calibration into one continuous sequence
  – The previous process had included a day between completing gyro calibration and beginning the momentum management procedures

SOHO Loss (2/4)

• Because the gyro calibration in the new compressed timeline was immediately followed by a momentum management procedure, despinning the gyros at the end of the gyro calibration and re-enabling the on-board software gyro control function was not required
• However, after the gyro calibration, Gyro A was specifically despun in order to conserve its life, while Gyros B and C remained active

SOHO Loss (3/4)

• The modified predefined command sequence in the on-board control software had an error; it did not contain a necessary function to reactivate Gyro A, which was needed by the Emergency Sun Reacquisition (ESR)
  – This omission removed the functionality of the spacecraft’s normal safe mode, ESR, and ultimately caused the sequence of events that led to the loss of telemetry
• In addition, there was another error in the software that left Gyro B in its high gain setting following the momentum management maneuver
  – This error originally triggered the ESR

SOHO Loss (4/4)

• The first error was contained within a software function called A_CONFIG_N
• ESR requires the use of Gyro A for roll control
• Any procedure that spins down Gyro A must set a flag in the computer to respin Gyro A whenever the safe mode is triggered
• When A_CONFIG_N was modified, the software enable command was omitted due to “a lack of system knowledge of the person who modified the procedure”
• Because the change had not been properly communicated, the operator procedures did not indicate that Gyro A had been spun down
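
The rule stated above, that any procedure spinning down Gyro A must also enable its respin for safe mode, is the kind of invariant that can be checked mechanically when a command sequence is modified. The sketch below is one hypothetical way to express it; the command names are invented for illustration and are not the actual SOHO command set.

```python
from typing import Iterable

# Hypothetical command names, not taken from the SOHO procedures.
SPIN_DOWN_A = "SPIN_DOWN_GYRO_A"
ENABLE_RESPIN_A = "ENABLE_GYRO_A_RESPIN"


def violates_gyro_a_rule(commands: Iterable[str]) -> bool:
    """Return True if the sequence spins down Gyro A and never re-enables its respin afterwards."""
    needs_enable = False
    for command in commands:
        if command == SPIN_DOWN_A:
            needs_enable = True       # Gyro A is now down; the respin flag is owed
        elif command == ENABLE_RESPIN_A:
            needs_enable = False      # obligation discharged
    return needs_enable
```

A sequence such as `["CALIBRATE_GYROS", SPIN_DOWN_A]` violates the rule, while `[SPIN_DOWN_A, ENABLE_RESPIN_A]` does not. A check like this encodes system knowledge in the review process rather than relying on the person making the change to have it.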

Lessons Learned

• We can learn lessons from these and other (all very different) aerospace accidents by examining the factors common among them
• These factors are systemic and indicative of many accidents involving aerospace software systems
• Systemic factors can be grouped into the following categories:
  – Flaws in the Safety Culture
  – Ineffective Organizational Structure
  – Ineffective Technical Activities

Flaws in the Safety Culture

• Overconfidence and Complacency
  – Success is ironically one of the progenitors of accidents
  – In SOHO, this led to inadequate testing and review of changes to ground-issued commands, a false sense of confidence in the team's ability to recover from an ESR, the use of challenging schedules, etc.
• Discounting or Not Understanding Software Risks
  – An engineering culture that has unrealistic expectations about software and the use of computers
  – Changing (SOHO) software without introducing errors or undesired behavior is much more difficult than building correct software initially

Flaws in the Safety Culture (Cont.)

• Assuming Risk Decreases over Time
  – In the Titan/Centaur/Milstar loss, the Titan Program Office decided that because software was “mature, stable, and had not experienced problems in the past,” they could use the limited resources available after the initial development effort to address hardware issues
• Inadequate Emphasis on Risk Management
• Incorrect Prioritization of Changes
• Slow Understanding of the Problems Associated with Human-Automation Mismatch

Ineffective Organizational Structure

• Diffusion of Responsibility and Authority
  – In almost all of the spacecraft accidents, there appeared to be serious organizational and communication problems among the geographically dispersed partners
• Low-Level Status or Missing System Safety Program
  – In the SOHO report, no mention is made of any formal safety program
• Limited Communication Channels and Poor Information Flow

Ineffective Technical Activities

• Flawed or Inadequate Review Process
  – For SOHO, the changes to the ground-generated commands were subjected to very limited review
• Inadequate Specifications
  – Software-related accidents almost always are due to misunderstandings about what the software should do
• Inadequate System and Software Engineering
• Software Reuse Without Appropriate Analysis of its Safety
  – Two of the spacecraft accidents, Titan and Ariane, involved reused software originally developed for other systems

Ineffective Technical Activities (Cont.)

• Unnecessary Complexity and Software Functions
  – The Ariane 5 and Titan IVB-32 accidents clearly involved software that was not needed, but surprisingly the decision to put in or to keep these features (in the case of reuse) was not questioned in the accident reports
• Inadequate System Safety Engineering
• Test and Simulation Environments that do not Match the Operational Environment
  – A general principle in testing aerospace systems is to “fly what you test and test what you fly”

Ineffective Technical Activities (Cont.)

• Deficiencies in Safety-Related Information Collection and Use
• Operational Personnel Not Understanding the Automation
  – The SOHO report says that the software enable function had not been included as part of the modification to A_CONFIG_N due to a lack of system knowledge of the person who modified the procedure
• Inadequate and Ineffective Cognitive Engineering and Feedback
  – SOHO controllers did not have the information they needed about the state of the gyros and the spacecraft in general to make appropriate decisions

Conclusions

• By examining recent, software-related aerospace accidents, we notice similarities, or systemic factors, involved in the losses
• These similarities and parallels should help in focusing efforts to prevent future accidents