CONTENTS

EXECUTIVE SUMMARY

INTRODUCTION
    What Was Supposed to Have Happened
    Mission Operations Planning and Review
    What Was Observed to Happen

DISCUSSION
    Reconstructed Timeline
    Overview of Events
    Simulation Results
    A Walk Through the Fault Tree
        High Actual Momentum (Branch 1.1 of the Fault Tree)
            Momentum Stored in Fluids, or Slosh (Branch 1.1.1)
            Slosh Modeling and Analysis
            Other Torque on Spacecraft (Branch 1.1.2)
        Low Momentum Falsely Reported as High (Branch 1.2 of the Fault Tree)
        Processor and Software Errors (Branch 1.3 of the Fault Tree)
            Processor Hardware Errors (Branch 1.3.1)
            Processor Software Errors (Branch 1.3.2)
            Code Inspection Team

FINDINGS AND RECOMMENDATIONS

APPENDICES
    Appendix A: NEAR Anomaly Review Board Membership and Charter (Not online)
    Appendix B: Overview of NEAR Propulsion System (Not online)
    Appendix C: Overview of NEAR Guidance and Control System (Not online)
    Appendix D: Overview of NEAR Autonomy System (Not online)
    Appendix E: Overview of NEAR Simulation Environment (Not online)
    Appendix F: Upgrades to Simulation to Support NEAR Anomaly Studies (Not online)
    Appendix G: NEAR Mechanical Response to LVA Transient (Not online)
    Appendix H: Reconstructed RND1 Event Timeline (Not online)
    Appendix I: The RND1 Timeline Reconstruction Process (Not online)
    Appendix J: Reconstructed Autonomy Command History (Not online)
    Appendix K: Reconstructed CTP1 and CTP2 Command Histories (Not online)
    Appendix L: Summary of Brassboard Simulations (Not online)
    Appendix M: References
    Appendix N: List of Acronyms

EXECUTIVE SUMMARY

On 20 December 1998 the Near Earth Asteroid Rendezvous (NEAR) spacecraft began the first and largest of a series of rendezvous burns required for capture into orbit around the asteroid Eros. Almost immediately after the main engine ignited, the burn aborted, demoting the spacecraft into safe mode. Less than a minute later the spacecraft began an anomalous series of attitude motions, and communications were lost for the next 27 hours. Onboard autonomy eventually recovered and stabilized the spacecraft in its lowest safe mode (Sun-safe mode). However, in the process NEAR had performed 15 autonomous momentum dumps, fired its thrusters thousands of times, and consumed 29 kg of fuel (equivalent to about 96 m/s in lost delta-v capability). The reduced solar array output during periods of uncontrolled attitude ultimately led to a low-voltage shutdown in which the solid-state recorder was powered off and its data lost. After reacquisition, NEAR was commanded to a contingency plan and took images of Eros as the spacecraft flew past the asteroid on 23 December. The NEAR team quickly designed a make-up maneuver that was successfully executed on 3 January 1999. The make-up burn placed NEAR on a trajectory to rendezvous with Eros on 14 February 2000, 13 months later than originally planned. The remaining fuel is sufficient to carry out the original NEAR mission, but with little or no margin.

A NEAR Anomaly Review Board (NARB) was formed to determine the reason for the rendezvous burn events and to make recommendations for NEAR and for similar programs. The cause of the abort itself was determined within 2 days of the event: the main engine’s normal start-up transient exceeded a lateral acceleration safety threshold that was set too low. Compounding this error was a missing command in the onboard burn-abort contingency command script; this script error started the attitude anomaly. Fault protection software onboard NEAR correctly identified the problem and took the designed, preprogrammed actions. While the fault protection actions did prevent complete battery discharge before the spacecraft recovered its proper Sun-facing orientation, they did not prevent, and they possibly even exacerbated, the protracted recovery sequence.

The Board’s investigation included a painstaking reconstruction of the post-abort timeline from the small amounts of data that remained following solid-state recorder powerdown. More than 128 simulations were run on a NEAR simulator containing ground hardware replicas of all six flight processors running the actual flight code. Additional simulations were run on a software-only simulator. These simulations show that the fault protection actions should have ended the attitude anomaly quickly. Although the simulation fidelity was substantially improved and extended during the course of this investigation, it is clearly deficient in some respect, since we are unable to duplicate the entire sequence of events that occurred in flight. An independent review of the flight code was also conducted, and suspect hardware and circuit elements were reviewed. The investigation established a good understanding of the events during approximately the first 47 min after the abort, but no explanation for the failure of onboard autonomy to quickly correct the problem. The Board found no evidence that any hardware fault or single-event upset contributed to the failure. Although software errors were found that could prolong and exacerbate the recovery, they by no means fully explain it.

The Board is unable to establish a complete explanation for the rendezvous burn events. Nevertheless, we include in this report observations and recommendations that could prevent a recurrence on NEAR or on other programs. These recommendations focus on improving quality control and configuration management within Mission Operations, making better use of NEAR’s simulation capability, and taking certain defensive measures on the spacecraft.

INTRODUCTION

The Near Earth Asteroid Rendezvous (NEAR) spacecraft, after traveling for nearly 3 years, initiated a planned main engine burn on 20 December 1998 to begin the rendezvous with target asteroid 433 Eros. After a short settling burn, the main engine ignited. A fraction of a second into the planned 15-min burn, it aborted. The abort placed the spacecraft in safe mode to await instructions from the ground, as it is designed to do. But what happened next was definitely not expected. Within a few seconds, all communication from the spacecraft was lost; NEAR remained silent for 27 hours. When communications were reestablished, it was found that during the blackout period the spacecraft had encountered an attitude anomaly, experienced a low-voltage condition, and lost 29 kg of fuel.

The main engine abort meant that NEAR would soon fly past Eros rather than rendezvous with it. Mission Operations personnel and science team planners were prepared with a contingency picture-taking sequence to obtain some benefit from the flyby. On 23 December, more than 1000 images were taken as NEAR flew to within 4,000 km of Eros. Concurrently, the NEAR team worked to understand the cause of the abort well enough to safely command another rendezvous burn. The 24-min make-up burn of the main engine on 3 January was successful. It increased NEAR’s speed by 940 m/s to catch up to the faster-moving Eros, which had overtaken NEAR during the flyby. A small clean-up burn followed on 20 January. These two burns placed NEAR on course to rendezvous with Eros 13 months later than originally planned. The make-up burn left NEAR with about one-third the fuel it would have had if the original burn had been successful. The remaining fuel is sufficient to carry out the original NEAR mission, but there is no longer any margin.

Once the spacecraft was safed and the NEAR mission rescued, it became important to fully understand the cause of the anomaly. A NEAR Anomaly Review Board (NARB) was formed to determine the cause of the abort and, particularly, the cause of the subsequent events. The NARB membership and charter are presented in Appendix A. This report sets forth the Board’s findings and recommendations.

What Was Supposed to Have Happened

To catch up and rendezvous with the faster-moving Eros, NEAR must be sped up with a series of four engine burns. The first rendezvous burn, RND1, would have increased the spacecraft’s velocity by 650 m/s using the main, bipropellant large velocity adjust (LVA) engine (see Appendix B; Ref. 1 and Appendices B–E provide tutorial information on NEAR). This 15-min LVA burn was planned to begin at 5 p.m. Eastern Standard Time (EST) on 20 December 1998, when NEAR was 238,000 km from Eros. The LVA burn is preceded by a 3-min settling burn from NEAR’s 22-N monopropellant thrusters to force the liquid oxidizer against its tank outlets. During the LVA burn, the 22-N thrusters provide vernier control.

If the LVA burn had terminated normally, a command sequence (the “clean-up macro”) would have been executed to shut the propellant tanks and make a graceful transition back to attitude control using reaction wheels. The latter action also automatically selects the 4.5-N thrusters (the “B” thruster group) if an autonomous momentum dump is required, instead of using the larger 22-N thrusters (in the “A” thruster group) that had been commanded for use during the LVA burn.

Following the LVA burn on 20 December, a second burn 8 days later was planned. It would have increased NEAR’s velocity by 294 m/s, at a distance of 21,000 km from Eros. This would have reduced NEAR’s speed relative to Eros to less than 30 m/s. On 3 January, a small third burn was planned to reduce relative speed a further 22 m/s, at 5,000 km. At 10 a.m. EST on 10 January, NEAR would have performed one last small burn to begin orbiting Eros at an altitude of 1,000 km.

Mission Operations Planning and Review

This general rendezvous sequence was part of the mission design from inception. In preparation for the rendezvous, NEAR Mission Operations had begun building up its staff from an average of 7 full-time equivalents (FTEs) during the 3-year cruise phase to 21 FTEs just before the rendezvous. Detailed design of the command scripts for the four planned rendezvous burns and for the contingency burns began in September 1998. The designs were based on the successful Deep Space Maneuver (DSM) performed in July 1997, the only previous firing of the LVA in flight. It should be noted that, between the DSM and RND1 burns, Mission Operations introduced a new scripting tool (SEQGEN) that required certain prior scripts from the DSM to be brought manually into the new system.

An outside review board, chaired by the Jet Propulsion Laboratory (JPL), conducted an independent review on 27 October 1998 of NEAR’s readiness to rendezvous with Eros. That board presented its findings to the NEAR team and issued its report on 14 November 1998, declaring NEAR “well on its way to a successful rendezvous and orbit” (Ref. 2). The board did raise 10 key issues, however, with recommendations for each, to “increase the robustness of the already excellent preparation.” APL took positive steps to address each concern and submitted written responses to each issue on 14 December 1998 (Ref. 3).

Mission Operations, working with guidance and control (G&C) and propulsion engineers, began generating the RND1 command script in November 1998. The scripts were internally reviewed for the last time on 7 December during a detailed command review. This review included members of G&C and Mission Operations, but did not include the spacecraft system engineer. The review’s action items were documented by e-mail. The RND1 commands were uploaded on 16 December. This upload also included a contingency burn (RND1A) that could be executed the next day if RND1 was missed for any reason that could be corrected within 24 hours. Several other burn scenarios were planned should the primary and backup dates for RND1 not be feasible for some reason. Multiple 34-m and 70-m antennas from NASA’s Deep Space Network (DSN) were scheduled to cover the real-time execution of RND1 to assure that adequate telemetry and tracking data were available for assessing maneuver performance.

What Was Observed to Happen

The sequence of small settling burns executed nominally beginning at T – 200 s on 20 December (T = 0 is the start time of the LVA burn). This sequence added about 6 m/s to spacecraft velocity, as planned. However, when the LVA burn began at T = 0, it shut down within a fraction of a second. RF communications from the spacecraft were lost at T + 37 s and not regained for 27 hours. The record of downlink Doppler measurements from the spacecraft (Fig. 1) shows these events up to loss of signal.

At 8 p.m. EST on 21 December, 27 hours after loss of signal, the DSN verified positive lock on a downlink signal from NEAR. The spacecraft was in the lowest of its safe modes, Sun-safe-rotate, in which the solar array normal is pointed at the Sun and the spacecraft rotates around the spacecraft–Sun line once every 3 hours (see Appendix D). On the second rotation, a command was sent to stop the rotation (Sun-safe-freeze mode). As data began to arrive at the Mission Operations Center, several things became clear:

• The LVA burn had aborted.
• A low-voltage trip had occurred, and the bus voltage (nominally 33 V) had gotten as low as 24.3 V.
• Detailed data stored in the solid-state recorders had been permanently lost when they were switched off during the low-voltage trip.
• A significant amount of fuel (28.5 ± 1.5 kg) had been lost.
• The active Attitude Interface Unit (AIU) was now AIU2, rather than AIU1, which had been active at RND1 start.
• All three gyros had entered the Whole Angle Mode (WAM), suggesting that high body rates had been experienced in all three axes.

Within a day or two of recovery, the precipitating causes were understood well enough to permit a new burn, which took place successfully on 3 January 1999. Data from thruster firings during this burn showed that no thruster had been damaged by cold firings that might have occurred during the RND1 event. Given the amount of propellant that was used, it can be theorized that there were few cold starts on any thruster because the time between firings was less than the thruster cool-down time of at least 2 hours. Aside from the fuel loss, the only permanent damage suffered by the spacecraft appears to be contamination of the Multispectral Imager’s optics by residues from propellant lost during the anomaly. This contamination degrades the response at the shortest wavelengths. In-flight calibrations and ground processing can partially correct these effects, minimizing the impact on mission science.

The fuel lost during RND1 is equivalent to 96 m/s delta-v. DSN tracking results since RND1 indicate that a net delta-v of 12 m/s was imparted to the spacecraft after the settling burn and before the recovery. This small effect for a large expenditure of fuel is consistent with a tumbling spacecraft.
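The size of that mismatch can be made explicit with one line of arithmetic. The sketch below uses only the two figures quoted above (96 m/s of equivalent capability, 12 m/s net); the function and constant names are illustrative and do not come from any NEAR tool.

```python
def net_delta_v_fraction(net_mps: float, equivalent_mps: float) -> float:
    """Fraction of the expended delta-v capability that showed up as net velocity change."""
    if equivalent_mps <= 0:
        raise ValueError("equivalent delta-v must be positive")
    return net_mps / equivalent_mps


LOST_CAPABILITY_MPS = 96.0   # delta-v equivalent of the fuel lost (from the report)
NET_IMPARTED_MPS = 12.0      # net delta-v seen in DSN tracking (from the report)

fraction = net_delta_v_fraction(NET_IMPARTED_MPS, LOST_CAPABILITY_MPS)
print(f"{fraction:.1%} of the expended capability became net delta-v")
```

Roughly one-eighth of the impulse survived as net velocity change; the rest was spent in directions that largely cancelled, as would be expected if the thrust direction kept changing.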

Figure 1. Spacecraft radial velocity, as measured by the downlink signal Doppler shift, shows the settling burn, the moment of LVA burn abort, and the subsequent loss of signal 37 s later. (Figure adapted from Ref. 4.)

[Figure 1 plot: Doppler velocity (mm/s) versus Universal Time. Time-axis labels: 22:23:19, 22:24:00.5, 22:24:01.5, 22:24:46. Annotations: settling burn (200 s); LVA fires at 22:03:20 at NEAR; one-way light time (OWLT) = 20 min, 42 s; 0.17 m/s; LOS.]

DISCUSSION

Formal NARB activities began at the first meeting on 21–22 January 1999. At this meeting, principals from the NEAR team provided tutorial briefings and answered the Board’s questions. The NARB also conducted private interviews with many of the principals. To provide technical support to the NARB, an internal NEAR Anomaly Group (NAG) was established, headed by Dr. T. Strikwerda, supervisor of the Space Department’s Mission Concept and Analysis Group. The NAG included a G&C analyst, the NEAR system engineer, two members of the Mission Operations team, and the NEAR performance assurance engineer. In late February a code inspection team was established to support NAG and NARB activities. This team, headed by M. White, supervisor of the Space Department’s Flight Software section, was chartered to inspect the AIU and flight computer (FC) codes for possible software errors that could explain the observed events. The code inspection team consisted of three software experts and two additional analysts. This team also provided software support for the brassboard upgrades needed to support NAG simulations. The upgrades are described in Appendix F; the simulations and results are summarized in Appendix L.

Reconstructed Timeline

The initial NAG activities focused on trying to establish the exact timeline of events from the limited data available. Because all of the data stored in NEAR’s solid-state recorders were lost during the low bus voltage event, the timeline had to be reconstructed by deduction and inference from the limited data stored in processor memories (Appendix I describes the process). These data consisted mostly of partial records of commands and autonomy rules executed and of min/max values (and their times of occurrence) for selected telemetry points. Timeline reconstruction took approximately 6 weeks. The detailed reconstructed master timeline is given in Appendix H. An abridged version is shown graphically in Figs. 2 (early events) and 3. The description of events given below is based on these data and on brassboard simulation results.

Figure 2. Early events in the RND1 timeline, measured from time of abort. Uncertainties in the timeline result from having to deduce and infer events from the small amounts of stored data remaining after the solid-state recorder was powered off during LVS.

[Figure 2 chart: NEAR RND1 (aborted), time after LVA shutdown, 0 to 100 min (reference 22:03:16Z, 20 Dec 1998). Rows: power system status (charge, discharge, no Sun, load reduced at LVS trip); safe mode state and pointing (RND1 burn 19 deg off-Sun, Earth-safe, Sun-safe rotate); thrusters available (tanks shut or open); momentum, body + wheels (O.K., high, unacceptable, off-scale); momentum dumps (next at 183 min); gyro mode (normal, whole-angle); thruster activity (A and B thrusters); active Attitude Interface Unit (#1, #2, #1, #2, #1, #2); control by FC or AIU.]

Figure 3. The complete RND1 timeline, derived from Table H-1 in Appendix H. Uncertainties in the timeline result from having to deduce and infer events from small amounts of stored data remaining after the solid-state recorder was powered off during LVS.


Overview of Events

The series of anomalous events began at LVA ignition, when the accelerometers sensed a lateral acceleration probably in excess of 0.15 m/s², which exceeded the threshold of 0.10 m/s² that had been set to detect uncontrolled lateral thrust during any burn. (Groups of four consecutive 100 sample/s accelerometer readings are averaged, then compared to a threshold.) This acceleration represented a reasonable mechanical response to the normally abrupt LVA start-up, and it can be concluded that the threshold was set too tightly. An LVA start-up transient had been observed on the only previous use of the LVA, the DSM burn of July 1997 (Fig. 4); however, the significance of this transient was not appreciated. In addition, NEAR’s structural design, with a separate propulsion module cantilevered from the base of the spacecraft, heightens the mechanical coupling of this transient to the accelerometers (Fig. 5 and Appendix G); this bending response was not appreciated. In retrospect, the correct thing for the G&C software to have done would have been to ignore (blank out) the accelerometer readings during the brief transient period, or at the very least to increase the filtering and/or raise the threshold during that time. It is significant that NEAR is the first of JHU/APL’s 58 spacecraft to use bipropellant propulsion, and RND1 was only the second time the LVA was fired in orbit.
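The averaging-and-threshold check, and the blanking remedy suggested in retrospect, can be sketched in a few lines. This is a minimal illustration and not the flight algorithm: the sample rate, group size, and 0.10 m/s² threshold come from the text, but the non-overlapping grouping, the use of the magnitude of the group mean, the 0.1-s blanking window, and the synthetic trace are all assumptions made for the example.

```python
from typing import Optional, Sequence

SAMPLE_RATE_HZ = 100      # accelerometer sample rate quoted in the report
GROUP_SIZE = 4            # consecutive readings averaged per comparison
THRESHOLD_MPS2 = 0.10     # lateral-acceleration abort threshold used for RND1


def first_abort_time(
    lateral_accel_mps2: Sequence[float],
    threshold_mps2: float = THRESHOLD_MPS2,
    blank_until_s: float = 0.0,
) -> Optional[float]:
    """Return the time (s after ignition) of the first averaged group that
    exceeds the threshold, or None if no group does.

    Assumptions (not from the report): groups do not overlap, the magnitude
    of the group mean is compared, and blanking skips whole groups that end
    at or before blank_until_s.
    """
    for start in range(0, len(lateral_accel_mps2) - GROUP_SIZE + 1, GROUP_SIZE):
        group_end_s = (start + GROUP_SIZE) / SAMPLE_RATE_HZ
        if group_end_s <= blank_until_s:
            continue  # inside the blanking window: ignore these readings
        group_mean = sum(lateral_accel_mps2[start:start + GROUP_SIZE]) / GROUP_SIZE
        if abs(group_mean) > threshold_mps2:
            return group_end_s
    return None


# Hypothetical trace: 80 ms at 0.15 m/s^2 (start-up transient), then 0.02 m/s^2.
trace = [0.15] * 8 + [0.02] * 92

abort_without_blanking = first_abort_time(trace)
abort_with_blanking = first_abort_time(trace, blank_until_s=0.1)
abort_with_raised_threshold = first_abort_time(trace, threshold_mps2=0.20)
```

With this synthetic trace the unmodified check trips on the first averaged group, while either blanking the first 0.1 s or raising the threshold lets the burn continue.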

The abort triggered execution of preprogrammed commands appropriate for a burn abort anomaly, including commands to place the spacecraft in Earth-safe mode (see Appendix D). The command system executed this 34-s script as expected. Meanwhile, the G&C software began a 19° slew from the attitude necessary for LVA burn to the Earth-safe attitude. In Earth-safe mode, the solar panels are aimed toward the Sun and the spacecraft is rolled around the Sun line to place the medium-gain (fan-beam) antenna toward Earth to await further instructions. The slew was initiated by thrusters rather than by reaction wheels, because the G&C system had been configured to use the 22-N thrusters for attitude control during the burn. This instantaneous response by the G&C system to use thrusters for attitude control is perhaps a design weakness, since using thrusters for attitude control is generally to be avoided except during an actual delta-v maneuver. Simulations show that the thrusters imparted significant momentum to the system and established a body rate of about 1°/s toward the Sun.

Figure 4. An acceleration transient similar to that which initiated RND1 occurred during the only previous use of the LVA engine, the Deep Space Maneuver burn in July 1997. The significance of the transient was not appreciated at the time.

[Figure 4 plot: DSM acceleration, 3 July 1997. X-, Y-, and Z-body acceleration (m/s²) versus time after start of settling burn (198 to 204 s), with the X-axis and Z-axis transients at LVA ignition marked.]

Due to insufficient review and testing of the clean-up script, the commands needed to make a graceful transition to attitude control using reaction wheels were missing. The G&C system remained commanded to use the 22-N thruster group for attitude control. The command script did, however, close the fuel tanks and disable power to the thrusters (at T + 24 s). Without thruster power, the G&C system uses the reaction wheels by default. With the large momentum in the system, the reaction wheels could not stop the attempted slew quickly enough, and the spacecraft overshot the Sun. When the solar panels were not pointed at the Sun within 300 s of the initiation of Earth-safe mode, an autonomy rule demoted the spacecraft to Sun-safe mode.

Figure 5. NEAR’s structural design uses a separate propulsion module cantilevered from the aft deck of the main spacecraft structure. The resulting flexure during LVA ignition couples into the accelerometers located in the Inertial Measurement Unit. (Propulsion tanks and two side panels are removed for clarity.)

[Figure 5 drawing labels: Large Velocity Adjust (LVA) thruster; Inertial Measurement Unit; aft deck; propulsion substructure.]

During these Sun acquisition activities, the G&C momentum management software detected the large system momentum and initiated a 30-min warm-up of the thruster catalyst beds in preparation for an autonomous momentum dump (a so-called Red dump; see Appendix C). About 7 min into the warm-up period, a data structure error in the use of wheel speed data for the momentum calculation caused the warm-up timer to be reset. The error involved misreporting a wheel at its maximum speed as being at zero speed, causing the system to falsely compute a momentum below the safe (Green) level. The data structure error explains the “mystery” of the first dump occurring 37 min, rather than 30 min, after the first report of Red momentum. On the basis of brassboard simulation results and the general lack of any contrary indications, we believe that during the final 20–30 min of this 37-min warm-up period, the spacecraft attitude and rate probably stabilized into the desired Sun-safe mode.
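The 37-min figure follows directly from a 30-min timer that restarts about 7 min in. The sketch below is a simplified model under stated assumptions: the timer is taken to restart from zero at the instant of each reset, and `dump_start_time` is a hypothetical helper, not a function of the flight software.

```python
WARMUP_S = 30 * 60  # catalyst-bed warm-up duration quoted in the report


def dump_start_time(reset_times_s, warmup_s=WARMUP_S):
    """Return the time (s after the first Red report) at which warm-up completes.

    reset_times_s lists the moments at which the warm-up timer restarts.
    Assumption: a reset restarts the timer from zero immediately, and resets
    that would occur after warm-up has already completed are ignored.
    """
    timer_start_s = 0
    for reset_s in sorted(reset_times_s):
        if reset_s < timer_start_s + warmup_s:
            timer_start_s = reset_s  # warm-up begins again from this moment
    return timer_start_s + warmup_s


nominal_s = dump_start_time([])            # no reset
observed_s = dump_start_time([7 * 60])     # one reset about 7 min in
print(nominal_s // 60, observed_s // 60)   # minutes until the first dump
```

Under these assumptions a single reset at 7 min moves the first dump from 30 min to 37 min after the first Red report, which matches the reconstructed timeline.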

When the catalyst bed warm-up period expired, the thrusters were enabled and the first momentum dump began (T + 00:37:52). The dump terminated after 285 s, just 15 s short of the time allowed. At the conclusion of the dump, the G&C system signaled the command processor to remove power from the thrusters. A brief period occurred during this handshake when G&C was not dumping momentum, but the thrusters were still powered because the command system had not yet responded to the G&C request for dump termination. This period was lengthened by a design error (bug) in the FC code, discovered during brassboard simulations 22–24. This bug keeps the thruster request high for several seconds too long and can also cause unnecessary AIU switchovers due to time-out. Had G&C been correctly configured to use reaction wheels to control attitude, leaving thrusters powered for this brief period would have had no ill effect. But the RND1 command script error left the G&C system commanded to use the 22-N thrusters to control attitude. During the period of the momentum dump, the spacecraft had drifted off Sun-pointing. With thrusters powered for a few seconds, the G&C system immediately attempted to regain Sun-pointing by firing thrusters. That imparted a “kick” of momentum back into the system, this time at a level high enough to trigger a momentum dump without catalyst bed warm-up (a so-called White dump). At T + 00:42:53, the G&C system began this second momentum dump, but failed to complete it in the allowed 300 s.* An autonomy rule booted the backup AIU and gave it control. The AIU switchover finally reestablished the default operating conditions within G&C: attitude control using reaction wheels and autonomous momentum dumps using the smaller 4.5-N thruster set normally used for attitude control when the LVA is not actually firing.

*The G&C system has an internal timer limiting thruster use requests to 270 s. It was thought that this was long enough to dump momentum under nearly all conditions but still short of the command and telemetry processor’s (CTP’s) 300-s autonomy limit. However, repeated simulations have shown that there is significant “overhead” before G&C starts dumping, making the 270 s marginally too long; that value has since been decreased on the spacecraft.
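The timer margin in the footnote can be illustrated with a small sketch. Only the 270-s and 300-s values come from the report; the overhead values, the function names, and the assumption that overhead simply adds to the G&C timer are hypothetical.

```python
# Sketch of the timer margin described in the footnote.
# Assumption: the thruster request stays high for (overhead + G&C timer).
GNC_TIMER_S = 270.0   # G&C internal limit on a thruster-use request (report)
CTP_LIMIT_S = 300.0   # command processor autonomy limit (report)


def request_duration(overhead_s, timer_s=GNC_TIMER_S):
    """Total time the thruster request is high, in seconds."""
    return overhead_s + timer_s


def trips_autonomy(overhead_s, timer_s=GNC_TIMER_S):
    """True if the request outlasts the CTP limit and would force an AIU switch."""
    return request_duration(overhead_s, timer_s) > CTP_LIMIT_S


for overhead in (10.0, 25.0, 35.0):  # hypothetical overheads, in seconds
    print(overhead, request_duration(overhead), trips_autonomy(overhead))
```

Under this assumption, any overhead above 30 s consumes the whole margin, which is one way to read why the 270-s value was later reduced.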


Given the range of unknown initial conditions shortly after T = 0, simulations have reproduced the observed spacecraft behavior reasonably well for the period up through the first two dumps (about T + 00:47:00), but not the behavior after that. From that point on, all simulations of RND1 that do not introduce some additional fault mode (e.g., a stuck-on thruster) result in the excess momentum being dumped and nominal attitude control being regained relatively quickly. In flight, however, the G&C system continued attempting to dump momentum. Autonomy interrupted this activity at 300-s intervals to switch back and forth between alternate AIUs in an attempt to end the excessive thruster use. After five AIU switches, the self-limiting autonomy rules left backup AIU2 in control and stopped attempting to limit the momentum dump duration. Analysis of the autonomy data indicates there were eight more momentum dumps over the next 7 hours. With autonomy rules no longer limiting their duration, many of these were very long (the longest over 1000 s). It was during this period that most of the fuel was probably consumed.
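The self-limiting behavior described above can be sketched as a counter that disables the rule after a fixed number of firings. The 300-s limit and the count of five switches are from the report; the class, its interface, and the simple two-way toggle between AIUs are illustrative assumptions, not the flight autonomy code.

```python
# Sketch of a self-limiting autonomy rule (hypothetical interface).
class DumpLimiter:
    def __init__(self, limit_s=300.0, max_switches=5):
        self.limit_s = limit_s            # longest dump tolerated while the rule is active
        self.max_switches = max_switches  # rule disables itself after this many firings
        self.switches = 0
        self.active_aiu = 1

    def run_dump(self, needed_s):
        """Return the seconds of thruster use actually allowed for one dump."""
        rule_active = self.switches < self.max_switches
        if rule_active and needed_s > self.limit_s:
            # Interrupt the dump and hand control to the other AIU.
            self.switches += 1
            self.active_aiu = 2 if self.active_aiu == 1 else 1
            return self.limit_s
        return needed_s  # rule satisfied, or exhausted and no longer limiting


limiter = DumpLimiter()
durations = [limiter.run_dump(1000.0) for _ in range(7)]
print(durations, limiter.active_aiu)
```

In this sketch the first five long dumps are cut off at 300 s, and later dumps run to completion however long they take, which mirrors the unlimited dumps reported after the fifth switch.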

At T + 00:59:24, a Low Voltage Sense (LVS) trip occurred, indicating that attitude was not controlled to Sun-pointing for a considerable period. The exact period of time required for a fully charged battery to reach LVS* depends on many variables (such as heater duty cycles, wheel usage, and the exact battery temperature) that are unknown for RND1. To obtain a lower bound on the time, a worst-case analysis was performed. Using a low battery energy capacity and Sun-safe spacecraft loads iterated to match telemetry records (245 W), analysis suggests that at least 9 min of discharging (Sun angle greater than about 50°–60°) is required. This is well within the time (22 min) from the start of the first dump to the LVS, and coincidentally, is identical to the time between when the solar panels turned edge-on to the Sun and the LVS. It should also be noted that the “misreported wheel speed” error already described, which was responsible for extending the 30-min warm-up to 37 min, could also have caused more time to be spent at high Sun angles during the early part of the warm-up period. An example of this effect is seen in simulation 63 (Appendix L). In this simulation the misreported wheel speed error causes the Sun angle to exceed 50° for almost 7 min during the early part of the warm-up period. Since the battery recharges at only about 1/20th the rate of Sun-safe mode discharge, this simulation illustrates how the misreported wheel speed error can shorten the time to LVS.

* LVS occurs at 26-V bus voltage, corresponding to about 27.1 V at the battery.
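The effect of the 1/20 recharge ratio can be illustrated with simple bookkeeping in "discharge minutes." This is a rough linear sketch: the 9-min figure is the report's worst-case lower bound and is treated here as a fixed budget, and the function name and interval format are hypothetical.

```python
# Sketch: how early off-Sun time eats into the time available before LVS.
RECHARGE_RATIO = 1.0 / 20.0  # recharge rate relative to discharge rate (report)
LVS_BUDGET_MIN = 9.0         # worst-case discharge time to LVS (report), used as a budget


def remaining_budget(intervals, budget_min=LVS_BUDGET_MIN):
    """intervals: list of (minutes, discharging) pairs in time order.

    Returns the discharge minutes still available before LVS.
    """
    used = 0.0
    for minutes, discharging in intervals:
        if discharging:
            used += minutes
        else:
            # Charging recovers only a small fraction of the elapsed time.
            used = max(0.0, used - minutes * RECHARGE_RATIO)
    return budget_min - used


# About 7 min off-Sun early in the warm-up, then 30 min on-Sun.
print(remaining_budget([(7.0, True), (30.0, False)]))
```

With these inputs, 30 min of charging recovers only 1.5 min of the 7 min spent discharging, so most of the early deficit is still present when the first dump begins.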

The data also indicate that body rates increased dramatically, with system momentum going off the scale (>25.5 Nms, corresponding to about 4°/s) at T + 00:50:20. Another reason to believe the spacecraft experienced high body rates is that the gyros (all three axes independently) were found in Whole Angle Mode (WAM) when the spacecraft was recovered. This mode is entered when the gyro’s normal strap-down mode (a locked force-to-rebalance loop) cannot be maintained; this occurs when body rates exceed about 11°/s (see Appendix C). This mode was added to NEAR’s gyros specifically to deal with tip-off following launch vehicle separation. Analysis and tests have not established any other credible means of entry to WAM. No data allow us to pinpoint the time of entry into WAM. One data point does show that the gyros were not in WAM at T + 00:00:11, so we know that the abort did not directly cause the entry into WAM. The time of entry into WAM is of interest because our simulations show that once the gyros enter this coarse and noisy mode, momentum dump performance becomes extremely inefficient.
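The figures quoted above allow a rough scaling between momentum and body rate. The sketch below assumes a single scalar effective inertia and that all system momentum appears as body rotation about one axis; both are simplifications introduced here, and the function names are hypothetical.

```python
import math

H_SATURATION_NMS = 25.5  # telemetry full scale (report)
RATE_AT_SAT_DPS = 4.0    # approximate body rate at that momentum (report)
WAM_RATE_DPS = 11.0      # approximate rate for gyro entry into WAM (report)


def effective_inertia(h_nms, rate_dps):
    """Scalar inertia (kg m^2) implied by H = I * omega."""
    return h_nms / math.radians(rate_dps)


def momentum_for_rate(rate_dps, inertia):
    """Momentum (Nms) needed for a given body rate at that inertia."""
    return inertia * math.radians(rate_dps)


inertia = effective_inertia(H_SATURATION_NMS, RATE_AT_SAT_DPS)
h_wam = momentum_for_rate(WAM_RATE_DPS, inertia)
print(round(inertia), round(h_wam, 1))
```

Under these assumptions, reaching the WAM entry rate would take nearly three times the momentum at which the telemetry saturates, so the saturated reading alone cannot show when WAM was entered.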

About T + 01:21:00 the spacecraft entered a quiet period (Fig. 6). The next momentum dump would not occur for another 1.5 hours. The only data point recorded during this period shows that the bus voltage recovered to 28 V at about T + 01:25:00, indicating that the arrays were pointing generally toward the Sun and charging the battery. With Sun-safe loads, charging would begin almost instantaneously when the panels were within 40° of the Sun, but we have no indication of how long this attitude was sustained. The quiet period ended at about T + 03:02:00, when the G&C system began another series of seven momentum dumps.

Over the next 3 hours there was a higher level of computed momentum, resulting in four Red dumps (those preceded by catalyst bed warm-up), three White (immediate) dumps, and one dump that started as Red but switched to White after about 7 min. As described above, with autonomy no longer intervening, some of these dumps were very long. The long periods of high calculated momentum probably resulted from the high WAM noise that was feeding the momentum estimate.

At T + 06:09:46, the bus voltage recorded its minimum value (24.3 V, less than the LVS trigger voltage), indicating continued drain on the battery during this period. There were no autonomy rules that would be triggered by this condition, so no other associated data were captured. However, we can conclude that the attitude was not well-controlled to Sun-pointing.

At about this same time, T + 06:10:00, the spacecraft entered another quiet period lasting more than 2 hours. We have no data points from this quiet period, but from the fact that the bus voltage stopped dropping we know that the solar panels were mainly Sun-pointing. At about T + 08:31:00, G&C carried out one final White momentum dump of about 4.5-min duration. No further unexpected activity occurred.

Figure 6. Extracted from Fig. 3 and Table H-1 in Appendix H, this figure illustrates that momentum dumps occurred in three distinct groupings, separated by quiescent periods up to 2 hours long. When momentum goes Red, a 30-min warm-up of the thruster catalyst bed heaters begins in anticipation of an autonomous momentum dump.

[Plot not reproduced. Horizontal axis: Time (s), 0 to 30000. Traces: momentum state (Momentum Red / Momentum not Red) and dump state (Momentum dump in progress / Not dumping).]

In all, 15 autonomous dumps had occurred, an estimated 7910 s of dump activity. Estimating fuel use by counting “thruster seconds” shows that this thruster activity is about right to explain the 29 kg fuel loss. The details of these later dumps and the intervening quiet periods remain unexplained. At recovery, the spacecraft was in a stable Sun-safe mode attitude and rotation with a fully charged battery.
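A back-of-the-envelope version of the “thruster seconds” estimate is sketched below. The 29 kg, 7910 s, and 4.5-N values are from the report; the specific impulse is an assumed round number for illustration only, and the sketch ignores duty cycle, thruster mix, and the difference between dump time and actual thruster-on time.

```python
G0 = 9.80665  # m/s^2, standard gravity

FUEL_LOST_KG = 29.0    # report
DUMP_TIME_S = 7910.0   # report
THRUST_N = 4.5         # small thruster set named in the report
ASSUMED_ISP_S = 220.0  # hypothetical value; not stated in the report


def mean_flow_kg_s(fuel_kg, seconds):
    """Average propellant flow over the total dump time."""
    return fuel_kg / seconds


def thruster_flow_kg_s(thrust_n, isp_s):
    """Steady-state mass flow of one thruster: mdot = F / (Isp * g0)."""
    return thrust_n / (isp_s * G0)


mean_flow = mean_flow_kg_s(FUEL_LOST_KG, DUMP_TIME_S)
equivalent_thrusters = mean_flow / thruster_flow_kg_s(THRUST_N, ASSUMED_ISP_S)
print(f"{mean_flow * 1000:.2f} g/s, about {equivalent_thrusters:.1f} thrusters firing on average")
```

With the assumed specific impulse, the loss corresponds to roughly two small thrusters firing continuously during dump activity, which is the kind of plausibility check the thruster-seconds count provides.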

The LVS trip at T + 00:59:24 turned off the solid-state recorders, clearing their volatile memory of the detailed telemetry records. From the limited data available to reconstruct events, we have certainty about most of the spacecraft mode transitions and their timing, but no detailed information about thruster firings, G&C sensor data, spacecraft attitude and rates, or which of the two G&C computers (AIU or FC) was in control. Nominally, the G&C system operates with the FC generating the actuator commands and the AIU acting only as an interface unit and a checker. When an AIU is booted it immediately tries to go to this nominal “FC override” state. The G&C system has successfully entered the FC override state after AIU boot in all of our brassboard simulations. Furthermore, those simulations that most resemble the reconstructed timeline (30+ min Red warm-up followed by two dumps) show that the FC usually maintains control during the first and second dumps. Simulations with only one dump before the AIU switch usually have been in AIU control earlier, due to Sun keep-in (SKI) violation; the exact reason for this difference is unclear. At recovery, the AIU reported that it had taken control from the FC because of an SKI violation (an SKI violation occurs if solar panels are not pointed within 8° of the Sun for 300 s). Since the time-out for SKI is the same as the time-out for excessive thruster use that forced the AIU switches, we can be reasonably certain that at least the five momentum dumps following AIU switches were performed under FC control. The low voltage condition recorded in the data (indicating considerable periods spent significantly off Sun-pointing) suggests that the SKI logic probably triggered the AIU to take over control soon after the last switch. This is not certain, however, since the solar panels only had to pass within 8° of the Sun briefly to reset the 300-s SKI timer. All brassboard simulations show periods of control by both the FC and AIU, with the exact timing of the periods dependent on initial conditions.
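The SKI check described above behaves like a resettable timer, which is why a brief pass near the Sun is enough to avoid a violation. The following is a minimal sketch using the 8° and 300-s values from the report; the class name, the update interface, and the choice of "at or beyond 300 s" as the trip condition are assumptions.

```python
# Sketch of a Sun keep-in (SKI) monitor as a resettable timer (hypothetical interface).
class SkiMonitor:
    def __init__(self, limit_deg=8.0, timeout_s=300.0):
        self.limit_deg = limit_deg
        self.timeout_s = timeout_s
        self.off_sun_s = 0.0  # time accumulated outside the keep-in cone

    def update(self, sun_angle_deg, dt_s):
        """Advance by dt_s; return True when a violation is declared."""
        if sun_angle_deg <= self.limit_deg:
            self.off_sun_s = 0.0  # any sample inside the cone resets the timer
            return False
        self.off_sun_s += dt_s
        return self.off_sun_s >= self.timeout_s


monitor = SkiMonitor()
flags = [monitor.update(50.0, 1.0) for _ in range(300)]
print(flags.count(True), flags[-1])
```

A tumbling spacecraft that sweeps the arrays through the Sun line every few minutes would keep resetting `off_sun_s`, so the absence of an SKI trigger does not by itself imply good Sun-pointing.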

Based on extensive simulations (a typical one is described below), high WAM noise in the gyro comes closest to replicating conditions during the periods of continuous momentum dumping. The noise levels required are 7–10 times higher than levels measured on the actual gyros with the spacecraft quiescent. These simulations show poor performance by the G&C system with high WAM noise, so much so that the spacecraft “wallows” significantly, often pointing many tens of degrees from the Sun (highest reported Sun angles by each of the AIUs during RND1 were 129° and 169°). However, in our simulations the system never settles down unless the WAM noise is subsequently reduced; yet we know that NEAR did settle down and that the noise at recovery was quite low.

Simulation Results

An extensive series of simulations was run to provide insight into the spacecraft behavior after the abort. Most of the simulations were conducted on the NEAR brassboard simulator. The simulation process required a substantial effort to repair accumulated brassboard hardware failures, improve fidelity, and add enhancements needed to test candidate causes for the anomaly. A total of 128 simulations were run over a 7-month period. A number of these were designed to explore various postulated branches on the fault tree (see fault tree discussion in the following section and Appendix L). A pattern soon emerged that fit the RND1 behavior up to the first AIU switch. Events after the first switch varied considerably depending on the failure mode tested. We have chosen one of the most recent simulations (number 126) to illustrate the “classic” behavior that seems to most nearly replicate the behavior inferred from the RND1 timeline reconstruction.

In simulation run 126 (shown in Figs. 7, 8, and 9), the event begins with the spacecraft pointed 19° from the Sun during the 200-s settling burn (point a of Fig. 7). The system momentum is highly variable due to the thruster firings (point b). At abort the spacecraft demotes from operational to Earth-safe mode (c). The G&C actuator selection was set to thrusters only for the LVA burn, and the thrusters now slew the spacecraft toward the Sun (d). Meanwhile, the command and telemetry processor (CTP) begins executing the burn-abort macro, which removes power from the thrusters (e). Without powered thrusters, the G&C system ignores the commanded thrusters-only control mode and begins to control attitude with the reaction wheels. Since the G&C system is no longer executing a delta-v maneuver, it declares that momentum is too high (approximately 5 Nms in this case) and signals the command processor to warm the catalyst bed heaters (30-min warm-up) in preparation for a momentum dump (f). The spacecraft continues slewing toward the Sun, but with high system momentum the reaction wheels cannot slow it down quickly enough, and the spacecraft overshoots the Sun, reaching a Sun angle of about 56° (g). As the wheels spin up and one or more exceeds a software limit that had been erroneously set too low, they are declared “invalid” and removed from the momentum calculation (h) until they return within range. The momentum calculated without the “invalid” wheel speeds briefly dips below the Green limit, which clears the momentum dump request (i). At about the same time the AIU demotes the spacecraft to Sun-safe-rotate (j) because of a safe mode SKI violation, i.e., the spacecraft has pointed more than 8° away from the Sun for more than 300 s. As the spacecraft continues to rotate, the wheel speeds change and all four are again used in the momentum calculation, restarting the 30-min catalyst bed warm-up timer (k). This behavior explains the mystery of why the first RND1 momentum dump occurred at 37 min rather than the expected 30 min. This misreported wheel speed syndrome has been observed in several of the later simulations, and a special series of stand-alone, software-only simulations was run to better understand it.

Figure 7. Simulation 126 illustrates the “classic” early timeline behavior. Shown are true and reported Sun angle and total momentum, indicator flags showing which processor is in control and its state, and spacecraft mode. The bottom chart shows the momentum Red flag and the state of the thruster enable line.

[Plot not reproduced. Horizontal axis: Minutes after Abort, 0 to 60. Panels: Sun angle (deg), True and AIU HK traces; Momentum (Nms), True and AIU HK traces, with Gyros WAM marked; event indicators for Active AIU, Mode (2,1,0) = (Op, ES, SSR), and State (4,3) = (FC Oride, AIU); Mom Red (Warm Catbed) and FVC Enable flags. Event points are labeled a through x.]

Figure 8. Expanded view of Fig. 7 at the time of the first momentum dumps. In this simulation, the gyro is manually forced into WAM at about 3100 s (with artificially high noise level).

[Plot not reproduced. Horizontal axis: Seconds after Abort, 2300 to 3400. Panels: Sun angle (deg), True and AIU HK traces; Momentum (Nms), True and AIU HK traces, with Gyros WAM marked; event indicators for Active AIU, Mode (0 = SSR), and State (4,3) = (FC Oride, AIU); Warm Catbed, FVC Request, and FVC Enable flags. Event points are labeled l through x.]

Figure 9. Illustration of the poorly controlled attitude and momentum when the gyros are in WAM. The third panel shows the power system status as simulated by a model within the brassboard (LVS trip point is 26 V). The bottom panel shows G&C processor and thruster status.

[Plot not reproduced. Horizontal axis: Minutes after Abort, 0 to 360. Panels: Sun angle (deg); Momentum (Nms), FC and True traces; Bus voltage (V), 22 to 32; event indicators for Active AIU, LVS Events (LVS2, LVS1), and FVC Enable. Event points are labeled s, y, z, and aa through gg.]
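The misreported wheel speed behavior (points h, i, and k) can be sketched as a momentum sum that drops "invalid" wheels, feeding a Red/Green hysteresis that controls the warm-up timer. Only the 30-min warm-up is from the report. The thresholds, wheel values, speed limit, event times, and the scalar sum (the real calculation is a vector sum) are hypothetical and chosen so the example reproduces a 37-min delay.

```python
RED_NMS = 4.0        # hypothetical Red threshold
GREEN_NMS = 2.0      # hypothetical Green threshold
WARMUP_S = 30 * 60   # catalyst bed warm-up (report)


def computed_momentum(wheel_h, wheel_speed, speed_limit):
    """Sum wheel momenta, skipping wheels whose speed exceeds the software limit."""
    return sum(h for h, s in zip(wheel_h, wheel_speed) if abs(s) <= speed_limit)


class DumpRequest:
    def __init__(self):
        self.warmup_start = None  # time the current warm-up began, or None

    def update(self, t_s, momentum):
        if self.warmup_start is None and momentum >= RED_NMS:
            self.warmup_start = t_s    # momentum goes Red: start warm-up
        elif self.warmup_start is not None and momentum < GREEN_NMS:
            self.warmup_start = None   # falls below Green: request cleared

    def dump_time(self):
        """Time at which the dump may start, or None if no request is pending."""
        return None if self.warmup_start is None else self.warmup_start + WARMUP_S


h = [3.5, 0.5, 0.5, 0.5]  # hypothetical per-wheel momentum, Nms
req = DumpRequest()
req.update(0, computed_momentum(h, [100, 100, 100, 100], 500))    # all wheels valid
req.update(400, computed_momentum(h, [600, 100, 100, 100], 500))  # one wheel "invalid"
req.update(420, computed_momentum(h, [100, 100, 100, 100], 500))  # valid again
print(req.dump_time() / 60)  # minutes after abort
```

The point of the sketch is that the wheel carrying the most momentum is exactly the one most likely to exceed a speed limit, so excluding it makes the computed total look safe at the worst possible moment.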

The spacecraft gradually recovers a stable attitude and rotation rate during the 30-min catalyst bed warm-up period. The momentum dump begins at about 40 min with the FVC_request (fine velocity control thrusters requested) and FVC_enable (FVC thrusters powered) flags raised (l) and the warm-up flag lowered. During the dump, the attitude is not controlled well by the wheels as momentum is transferred from the wheels to the body; body rate is reduced by thrusters. Since the momentum is high and the dumping is inefficient, it takes some time to transfer momentum from the wheels to the body. The spacecraft drifts off the Sun (m) during this time. (Fig. 8 provides an expanded timescale view of events m–v.) The dump terminates after a 270-s time-out expires (n) with the momentum in the Yellow zone, a so-called Yellow-Good-Enough (YGE) dump.

When a dump terminates, the G&C system lowers the FVC_request flag to the command processor, which responds by unpowering the thrusters and lowering the FVC_enable flag (o). This process requires several seconds. Since the burn abort command script used for RND1 lacked the command to restore the reaction wheels as the attitude control actuators, the G&C control laws switch from dump mode to attitude-control mode using thrusters before the command processor can respond. Since the spacecraft is pointing far from the Sun, G&C fires thrusters again to restore Sun-pointing, imparting high momentum once again (p), and a second momentum dump begins, this time an immediate (White) dump because the momentum exceeds 6 Nms. During this second dump, the AIU takes control (q) because the FC has not cleared the SKI violation within the allowed 300 s. This dump also completes as YGE, but under AIU control, which apparently takes slightly longer. This time a command system autonomy rule detects that the dump request has exceeded 300 s. The autonomy rule powers AIU2 and gives it control (r). The boot-up of AIU2 causes a reset of the G&C parameters to nominal; most important, the control actuator setting is defaulted to reaction wheels. With the thrusters no longer used by G&C for attitude control (only for momentum dumping), no more momentum-imparting “kicks” occur.

In the actual RND1 abort on the spacecraft there were also two dumps prior to the first AIU switch. In other simulations with this “classic” behavior, there is only one dump under AIU1 control before the switch to AIU2. The difference cannot be fully understood from the brassboard simulations but may be related to a “race condition” between the AIU and the command processor autonomy rule, since the YGE dump plus overhead appears to be about 300 s, the same as the 300-s time-out of the autonomy rule for AIU switch. Since the purpose of a YGE dump is to terminate a momentum dump with an acceptably low momentum before the autonomy rules switch AIUs, this indicates that the YGE time-out (270 s) is too long.

The timing of the first two momentum dumps and the first AIU switch in the simulation corresponds quite well with what is known about the behavior of the spacecraft following RND1. Some aspects of RND1, however, never occur in the simulation. First, the spacecraft rates never approach the value needed to cause the gyros to enter WAM. This implies either that the simulation is not replicating the actual RND1 behavior, or that some rate-independent means of inducing WAM exists. None has been found in this investigation (see “Low Momentum Falsely Reported as High [Branch 1.2]” in the section on the fault tree later in this report). Second, unless we induce some additional failure mode, the spacecraft always recovers in much less time than that observed in RND1. Without such intervention, the simulations all result in the momentum being reduced to acceptable levels and nominal, Sun-safe attitude and rates being achieved within a relatively short time after the switch to AIU2. Finally, this particular simulation does not encounter the extended time at poor Sun angle early in the warm-up period due to the misreported wheel speed error (simulation 63 in Appendix L shows an example of this effect).

When a failure mode is introduced after the first AIU switch, the simulation results vary widely (Appendix L summarizes all 128 simulations). After exploring a number of possible causes, we have found that only a highly variable and erroneous momentum calculation results in the repeated momentum dumps that characterized RND1. One possible cause of this type of error is noise in the rate measurements obtained from the gyros. Because the gyros were in WAM on recovery, and because the noise level in WAM is considerably higher than in normal mode, we extensively explored the effect of WAM on spacecraft behavior. We found that at noise levels measured in flight on the day NEAR was recovered (0.0014° to 0.0030° rms), the spacecraft was well controlled, and no repeated dumps occurred. At the Litton test values (0.0047° to 0.0135° rms [Ref. 5]), there are no repeated momentum dumps but the attitude does not fully stabilize. By trial and error, we found that a WAM noise level of 0.02° rms is about the minimum level that causes repeated momentum dumping.

In the simulation discussed here and shown in Figs. 7–9, the gyros are forced to WAM soon after the switch to AIU2 (point s on the figures), replicating the approximate time (relative to AIU2 turn-on) when the gyros may have entered this mode (this is approximately the time at which the momentum was reported to be at 25.5 Nms, a saturated value in telemetry). With the gyros in WAM, all subsequent dumps are very inefficient due to the noisy gyro outputs; that is, they waste time and fuel reducing momentum. At start-up an AIU attempts to promote the system to FC-override mode if it can (successfully in every case simulated to date) (v). In this particular simulation, the third dump, under FC control, also exceeds 300-s duration, causing the command system autonomy rule to switch back to AIU1 (t). Since this dump also fails to clear the SKI violation within the allotted 300 s, the AIU again takes control from the FC (u), just before the AIU switch. At this high setting for the WAM noise amplitude, the momentum estimate is very high most of the time. There are frequent reports of high momentum which trigger immediate (White) momentum dumps (w). A total of four dumps are visible in Fig. 7, but there are others beyond the range of the figure. Typically there are almost continuous dumps until the gyro noise is reduced (Fig. 9, point y). Note that the Sun angle is quasi-periodic during the high WAM noise period (x). This is typical of the simulations with high WAM noise and persists until the noise is substantially reduced. This behavior results from a subtle interaction of the unbalanced placement of the thrusters and the control system.

Figure 9 shows the entire time span of this simulation. WAM was initiated under the control of the brassboard operator, since the simulations do not produce rates high enough to trigger WAM. Because momentum dumping never stops when the noise level is high (and after RND1, the repeated momentum dumps did stop), WAM noise levels are manually stepped down to three distinct levels. The difference between true and FC-estimated angular momentum in Fig. 9 is a measure of the level of WAM noise. At initial WAM entry (s) the WAM noise was set artificially high (0.02° rms), a value chosen by trial and error as about the minimum level that causes repeated momentum dumping. This value is about 7–10 times higher than levels actually observed on NEAR’s gyro. At 275 min (point y) the noise level is reduced to the values measured during gyro acceptance testing (Ref. 5). The tumbling rate decreases significantly (z) and repeated momentum dumping stops (aa), but the attitude does not fully stabilize. At 330 min (bb), the WAM noise is further reduced to the values estimated from actual flight data when NEAR was recovered. With this final reduction in noise, the spacecraft at last achieves stable Sun-safe attitude and rate. One postulated mechanism for this variation in noise amplitude is that WAM noise depends on rate or voltage. If true, the dependence must be a strong one, since the simulated rates (about 1°/s) are small. Further, in flight, NEAR had two long quiet periods (1.5 and 2.3 hours) in between repeated momentum dumps. Although this could be simulated by stepping the WAM noise up, as well as down, we can think of no credible mechanism that would cause this to happen in flight.

Also shown in Fig. 9 is the spacecraft bus voltage as modeled in the Bench Test Equipment (BTE) portion of the brassboard (see Appendix E). The bus is at 33 V whenever there is sufficient array current, generally when the Sun angle is less than 90°. But as the spacecraft tumbles, the arrays point away from the Sun and the bus drops to the battery voltage, which continues to decline since the net array current is insufficient to charge the battery. At about 170 min the bus drops below 26 V, triggering the LVS event (LVS2, point cc of Fig. 9). LVS sheds a number of power system loads and also reenables the autonomy rules that cause AIU switching for incomplete momentum dumps. Two more AIU switches are seen shortly thereafter (dd). Many momentum dumps occur after these, but they do not trigger AIU switches because of the self-limiting autonomy (see Appendix D). At about 240 min, the bus drops to 25 V, triggering the LVS1 event (ee). This event does not reenable AIU switching.

The fifth AIU switch (ff) in this simulation is due to an autonomy rule triggered by the bus voltage going below 23 V for more than 40 s. We know for certain that this rule, which initiates numerous drastic actions including power cycling of the gyros and turning off of the FC, did not fire in flight. The large momentum spike (gg) after the autonomy rule fires is because a thruster firing command is erroneously sent to the brassboard truth model repeatedly during the autonomy rule activation, causing the (simulated) thruster to be full ON for over 30 s. As soon as the autonomy rule completes and AIU2 is running normally, the excess momentum is dumped correctly. This thruster event is interesting (even alarming!), but it did not happen in flight because the hardware thruster drivers turn OFF in the absence of a valid AIU. This does not happen on the brassboard, which erroneously responds to stale firing commands. In the actual RND1 event, the fifth and final switch was due to the two command processors being imperfectly synchronized, so that the last AIU switch occurred twice (once from each command processor).

In flight after RND1, the LVS event occurred sooner and the five AIU switches occurred much closer together than in this simulation. From this information, we suspect that the brassboard power model has lower power drain than the real spacecraft, or that the real Sun angle history included less time at low Sun angle, or both.

A Walk Through the Fault Tree

In the course of the anomaly investigation numerous “mysteries” presented themselves. At one time the “Mysteries List” had eight distinct questions. For example, why did the catalyst bed warm-up for the first dump take 37 min instead of the design value of 30 min? This mystery was finally explained by the misreported wheel speed error described earlier. Why was 23 min of FC Sun vector usage reported (suggesting that the G&C system lost Sun sensor output for that long)? One hypothesis, which there was no safe way to test, was that thruster plumes from a tumbling spacecraft might confuse the digital solar aspect detectors (DSADs) when the Sun is far from array normal. Another speculation was that a flap of insulation may have covered a DSAD. Ultimately, this mystery was explained by the effects of DSAD “keyholing” (the DSAD fields of view are ±64° square cones with “keyholed” regions of invisibility up to almost 7° above the array plane). One by one the questions were answered, leaving the principal unexplained mystery: why did we have such an inefficient recovery, involving repeated momentum dumps over such a long time? To attack this question, the fault tree in Fig. 10 was constructed. This section systematically explores the branches of the fault tree.

The repeated momentum dumps (Fig. 10) could result from any combination of the following:

• Spacecraft momentum actually continued to exceed the Red value (branch 1.1 of the fault tree).

• Spacecraft momentum was actually low, but a sensor falsely reported it as exceeding Red (branch 1.2).

• A processor was corrupted (bad algorithms, code, data structures, hardware, or single event upsets) (branch 1.3).

High Actual Momentum (Branch 1.1 of the Fault Tree)

Momentum Stored in Fluids, or Slosh (Branch 1.1.1)

Could high system momentum result from momentum stored in fluids (i.e., slosh)? NEAR’s three fuel tanks have internal diaphragms which, among other things, control slosh. But NEAR’s two oxidizer tanks (OT), located one above the other on the spacecraft Z-axis, have no such internal propellant management device. Furthermore, NEAR’s tanks were approximately half full at RND1 (Table 1), roughly the worst case for slosh. One postulated scenario is: (a) the settling burn “organizes” the 82 kg of oxidizer into a contiguous mass; (b) after RND1 abort, the spacecraft slew sets this mass in motion; (c) something approximating a spherical shell of oxidizer mass continues to rotate, coupling momentum to the body through viscous stresses at the OT interior walls; (d) the peak motions thus induced throw the gyro into WAM.

During the design of NEAR, slosh was analyzed for propulsion subsystem supplier Aerojet by Southwest Research Institute (SWRI) and reported in Ref. 6. This analysis established the parameters used for the slosh model and helped design a fuel-efficient settling burn strategy. The analysis also showed, among other things, that the time for the wall–liquid momentum coupling postulated in (c) above could take on the order of 15 min, which might help explain some of the delayed effects that occurred later in the RND1 timeline. Thus we felt it important to include slosh dynamics in the RND1 simulations and analysis.

Table 1. NEAR’s propellant tanks.

                                        Fuel                  Oxidizer
Number of tanks                         3                     2
Tank nominal radius                     0.279 m (11 in.)      0.243 m (9.56 in.)
Propellant management internal device   diaphragm             none
Ultimate capacity of tanks              276 kg                173 kg
Load at launch                          215 kg (22% ullage)   109 kg (37% ullage)
Load at RND1                            165 kg                82 kg (47% full)


[Figure 10: fault tree for the repeated momentum dumps; graphic not reproduced. Node labels recoverable from the figure are listed below. The connecting And/Or gates are not recoverable.]

Top event: Repeated Momentum Dumps
1.1 Momentum Greater Than Red Limit (Sensors, Computers OK)
    1.1.1 Momentum Stored in Fluids
    1.1.2 Recurring or Continuous Torque
1.2 Faulty Sensors or Data (Momentum, Computers OK)
1.3 Computer Error, Hardware or Software (Momentum, Sensors OK)
    1.3.1 Hardware
    1.3.2 Software
Other node labels: Inadvertent Thrust; Propellant Leak From Plumbing; Unknown; Thruster Defective, Leaking or Stuck (e.g., Valve Seating Failure); Thruster Nominal; Commanded by AIU; Electronic Driver Circuitry Fault; Thrusters Only Mode; Two or More Simultaneous Faults; Attitude Data Erroneous; DSADs; Star Camera; Wheel Speed Data Erroneous; Tachometer Wrong; Tach Data Mishandled; Body Rate Data Erroneous; Glitch Puts Gyros in WAM; EMC; Mechanical Shock; Power Cycle; Low Bus Voltage; Computer Rejects Good Data; Computer Accepts Bad Data; AIU; FC; CTP 1553 Bus; G&C 1553 Bus; Other Electronics; Algorithm; Compiled Code; Data Structures. Additional branch numbers shown: 1.1.2.3.2 and 1.1.2.3.2.2.

Slosh Modeling and Analysis

Propellant slosh is modeled in the simulations by spherical pendulums in each of the three fuel and two oxidizer tanks (OT). This commonly used pendulum model is two-dimensional and does not model rotational slosh. Because the slosh model is computationally intensive, it was usually disabled on the brassboard to ensure that the simulation would run in real time within the allowed time slice. Investigations on the stand-alone, software-only simulator found that at high body rates and low fuel (as after the RND1 abort), the OT pendulum states were diverging numerically. Thus, for most brassboard RND1 abort simulations, revised slosh parameters were used that made these stable for very high body rates, high enough to induce WAM. The fuel tank pendulums were not changed (the upgrades to the brassboard slosh model are further discussed in Appendix F). Various large-momentum test cases were run with these parameters, in which the gyros enter WAM and the system dumps momentum, but slosh dynamics do not numerically diverge.

For reference, Table 2 shows the OT parameters that were changed, listing the old and new values used on the brassboard.

Table 2. Oxidizer tank slosh parameters used in original and revised brassboard models.

| Parameter | Description | Old | New |
|---|---|---|---|
| KS | Spring constant | 0.004 Nm/rad | 0.01 Nm/rad |
| AlfaMax | Max angular acceleration | 10.0 rad/s² | 0.4 rad/s² |
| ZetaOx | Damping ratio (unitless) | 0.004 | 0.01 |
| DminOx | Min damping coefficient | 0.015 Nm/rad/s | 0.020 Nm/rad/s |

It is important to recognize that these parameters, old or new, are somewhat arbitrary. For example, there is no strong physical basis for a restoring spring in the OT pendulums. However, in nearly zero-G, it seems intuitively reasonable that the liquid in the OT will migrate to a preferred location in the tank. A weak spring was added at the pendulum pivot to satisfy this intuitive desire, and its spring constant (KS) is entirely arbitrary (as long as it is small). When parameters were adjusted by trial and error to improve performance, KS was one of the parameters adjusted, and the “new” value listed in Table 2 is simply the final value used. Recent experiments have shown that KS is not critical and can be set to zero without qualitatively changing OT slosh behavior. AlfaMax and DminOx are clipping limits introduced in the code to prevent runaway behavior. It seems intuitively reasonable that the angular acceleration of the liquid cannot be arbitrarily large, but the value of this limit for the pendulum model is strictly ad hoc. The damping ratio, ZetaOx, is traceable to the original slosh report (Ref. 6) and was chosen empirically to produce slosh oscillation time constants in settling burn simulations that were about as predicted by the report. ZetaOx is unitless, however, and the actual damping coefficient (torque per unit rate) is calculated in the code as a function of propellant mass, applied acceleration, and geometry as well as ZetaOx. The clip limit DminOx was introduced to prevent numerical exceptions in these calculations.
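As a rough illustration of how the four OT parameters could enter a pendulum update, the Python sketch below advances one planar pendulum by a single time step. It is not the NEAR simulator code. The point-mass pendulum, the form of the damping calculation, the semi-implicit Euler integrator, and every name other than KS, ZetaOx, DminOx, and AlfaMax are assumptions made for illustration; the default values are the revised ("New") brassboard values.

```python
import math

def ot_pendulum_step(theta, omega, dt, accel, mass, length,
                     ks=0.01, zeta=0.01, d_min=0.020, alfa_max=0.4):
    """Advance a planar slosh pendulum by dt; returns (theta, omega).

    Hypothetical model: point mass on a rigid arm, weak pivot spring (KS),
    viscous pivot damping floored at DminOx, acceleration clipped at AlfaMax.
    """
    inertia = mass * length ** 2                    # point-mass assumption
    gravity_stiffness = mass * accel * length       # from applied acceleration
    stiffness = max(gravity_stiffness + ks, 0.0)
    # Assumed damping law: 2*zeta*sqrt(k*I), never below the clip limit.
    damping = max(d_min, 2.0 * zeta * math.sqrt(stiffness * inertia))
    torque = (-gravity_stiffness * math.sin(theta)
              - ks * theta
              - damping * omega)
    alpha = torque / inertia
    alpha = max(-alfa_max, min(alfa_max, alpha))    # AlfaMax clip
    omega = omega + alpha * dt                      # semi-implicit Euler
    theta = theta + omega * dt
    return theta, omega
```

With a large applied acceleration the computed angular acceleration saturates at AlfaMax, which is the behavior the clip was introduced to enforce; with a small applied acceleration the clip is inactive and the floor on the damping coefficient dominates.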

More recently, after most RND1 brassboard simulation activity had been concluded, the small trajectory correction maneuver scheduled for NEAR on 20 October 1999 (TCM20) was simulated on the stand-alone simulator. It diverged numerically with slosh enabled but worked fine with slosh disabled. On investigation, the latest problem was traced to the fuel tanks, which have diaphragms (“bladders”) to hold the liquid against the outlet even if the tank is inverted. The slosh pendulum characteristics vary with the amount of propellant remaining and the current acceleration vector. The inverted-tank pendulum model (i.e., when acceleration causes fuel to “fall” away from the outlet) was diverging at low fuel load. The fix applied (after much iteration) is to always use the upright tank model whenever the remaining propellant is below a threshold. This is a reasonable compromise, since slosh has diminishing effect on vehicle dynamics as propellant decreases; i.e., larger approximations are reasonable for smaller effects. This worked well for the TCM20 simulation.

With these fixes to the fuel-tank pendulum models, additional simulations were run of RND1 post-abort behavior. We found that slosh dynamics were adequately stable with the original OT pendulum parameters (the “Old” column of Table 2), except for Max Angular Acceleration (AlfaMax). With these values it is still possible to produce a situation that has unstable OT slosh dynamics, resulting in overall system dynamic instability and eventual numerical blowup. The case most thoroughly investigated has an arbitrary high body rate introduced, which ordinarily results in an autonomous momentum dump and recovery. As AlfaMax is increased, slosh-induced oscillations increase, and the dump takes longer to complete, until eventually the simulation diverges. We know the divergence is not due to an unstable interaction between the control laws and the slosh dynamics, because for high enough AlfaMax it diverges even with no control at all from either thrusters or wheels. This result is clearly physically meaningless because it has increasing (in fact diverging) momentum and energy with no applied external torque. In these divergent cases the OT pendulum frequencies are above 2 Hz, much higher than is supported by any known data. As long as the OT slosh pendulums are constrained not to exceed about 1 rad/s² of angular acceleration (AlfaMax), the simulation remains stable and reasonable and can be used to study control performance in the presence of slosh-like disturbances. In fact with AlfaMax = 1, the slosh states slowly diverge with no control at all, but are damped with control enabled. This lends confidence that NEAR’s control laws are not the source of the instability, and in fact can even deal with instability in physically unsupportable slosh dynamics.

Stand-alone simulations of an AIU-only B-thruster autonomous momentum dump case were used to explore the effect of the various slosh parameters. This case introduced an arbitrary tumble rate of about 9°/s about the Y-axis, which triggered an autonomous momentum dump. With slosh disabled in the simulation, the dump completed (momentum Green) about 50 s after it began. With slosh parameters set at values that are reasonable, but believed pessimistic, we found that the dump also returned to Green but took longer, about 125 s. There were noticeably greater oscillations in angular rate and momentum estimates, and hence in thruster activity, for the slosh case. So it is evident that slosh does affect the behavior of momentum dumping and that our simulation does model this. However, there is clearly significant uncertainty in the slosh model. Any simulation results must be considered as qualitatively representative of reality at best.

This incomplete situation for slosh is not entirely comfortable. Some interaction between thruster control and slosh is clearly possible, and we still cannot rule out that slosh was a factor in the RND1 post-abort misbehavior. It is probably not possible to answer definitively and quantitatively how much the effect is or was. Very possibly, slosh contributed to some of the inefficient or apparently ineffective dumping, particularly after the gyros entered WAM. However, with any physically defensible set of slosh model parameters tried, we have not been able so far to demonstrate that control–slosh interaction can produce anything like the body rates necessary to induce WAM. This work is still in progress because of its importance for future missions. The NARB’s report was not held up, however, to await a satisfying conclusion to this particularly difficult modeling problem.

Given the uncertainty and difficulty of analyzing slosh dynamics, we questioned the NEAR methodology for modeling and analyzing slosh and compared it with what other spacecraft programs have done. The consensus from several peer organizations (Orbital Sciences, Litton Amecom, Lockheed Martin Astronautics, the Naval Research Center, Goddard Space Flight Center, Aerojet, and the Jet Propulsion Laboratory) is that the pendulous mass approach used for slosh modeling on NEAR is common in the industry. Furthermore, slosh is generally considered a more critical concern for spinning spacecraft. For 3-axis stabilized spacecraft such as NEAR, most analysts do not consider it a major concern for stability, although it is of interest for jitter.

Flexible body interactions must also be considered. Since the second trajectory correction maneuver (1996), NEAR simulations have modeled these effects (force and torque reactions on the main spacecraft body). Solar array flexure is the principal source of these effects. It is modeled as single-cantilever bending modes of each of the four solar panels, with frequency and damping empirically tuned to match flight data. The resulting effect is seen as sinusoidal oscillations at about 2 Hz, excited by thruster firings and decaying with a time constant of about 10–30 s. These oscillations are evident in both accelerometer and gyro data on all burns. Typical amplitudes of spacecraft reaction are less than about 10 µrad/s of angular rate and about 0.1 µg of acceleration. Flexure modeling was activated in our simulations whenever fuel slosh was enabled, that is, in all RND1 brassboard simulations after number 51. The small flexure effect, like slosh, cannot of course explain one of the principal mysteries of RND1: why did the spacecraft begin dumping momentum again after long quiescent periods (as Fig. 6 shows)?
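To give a feel for how lightly damped such a mode is, the Python sketch below treats the flexure response as a single exponentially decaying sinusoid and converts the quoted frequency and time constant into an equivalent damping ratio. The single-mode form and the function names are assumptions for illustration; the report does not give the actual flexure equations.

```python
import math

def damping_ratio(freq_hz, time_constant_s):
    """Damping ratio of a lightly damped mode whose envelope is exp(-t/tau).

    Uses zeta = 1 / (omega_n * tau), with omega_n approximated by 2*pi*f.
    """
    return 1.0 / (2.0 * math.pi * freq_hz * time_constant_s)

def flexure_rate(t, amplitude, freq_hz, time_constant_s):
    """Decaying sinusoid in the same units as amplitude (e.g., rad/s)."""
    envelope = amplitude * math.exp(-t / time_constant_s)
    return envelope * math.sin(2.0 * math.pi * freq_hz * t)

if __name__ == "__main__":
    for tau in (10.0, 30.0):
        print(f"tau = {tau:4.0f} s -> zeta = {damping_ratio(2.0, tau):.4f}")
```

Under these assumptions, a 2-Hz mode with a 10–30 s time constant corresponds to a damping ratio of well under 1%.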

Other Torque on Spacecraft (Branch 1.1.2)

Could there be a recurring or continual torque on the spacecraft? Absent repeated hits from outside the spacecraft, something that is utterly implausible, such torque could result from either a propellant or a pressurant leak or from propellant actually exiting a thruster nozzle. There has been no evidence of a propellant or pressurant leak before or after RND1, so a leak involving any part of the propulsion system other than a thruster seems most unlikely. There are two ways propellant could erroneously exit a thruster: a thruster valve could actually leak, or a thruster could be erroneously commanded to open.

NEAR’s thruster valves (Wright Components P/N 18207 for the 4.4-N thruster and P/N 18208 for the 22-N thruster) contain two series-redundant, normally closed valves. Each valve has its own seat and return spring and is activated by an independent solenoid coil. The coils are tied together at the propulsion subsystem interface and driven by circuitry in the AIU. A fine mesh (20-µm) absolute filter at the valve inlet protects the upstream valve seat, and the same kind of filter screen on the injector protects the downstream seat. The redundant valves are leak-checked individually prior to integration into the propulsion subsystem, but once their coils are wired in parallel they can no longer be tested independently. Could a thruster valve be held fully or partially open by a particle, possibly trapped between the filters at the time of manufacture? The postulated scenario is that a particle would hold open the valve at some partial thrust level until it was either flushed clear by the propellant flow or eventually pounded into the soft elastomeric seat. Given that both of the series valves would have to be held open by particles at once, and that there has been no evidence of thruster leakage either before or after RND1, this possibility seems most unlikely. Nevertheless, we explored ways such a thing could occur and also explored the possible consequences.

Extensive discussions were held with the supplier of the NEAR thrusters (Primex) regarding the possibility of thruster valve leakage. The same topic was discussed with other thruster valve makers, including Moog and Marotta. The anecdotal evidence indicates that particle-induced leakage has in fact occurred in the past with almost everyone’s thruster valves, including Wright’s. One mechanism is for the particles to shed from an internal component, such as a spring, and become trapped between the protective filters. Such particles seem to be caught by the extensive predelivery ground testing and have not been documented to cause problems in orbit. Another mechanism occurs when a spacecraft is vibration-tested and launched with a thruster facing up (as two NEAR thrusters were). Vibration can cause the iridium-coated alumina catalyst particles to grind against each other; the “fines” so generated can pass through the downstream valve outlet filter into the seat area, where they can agglomerate into a larger particle. Such particles would be flushed clear in the first few thruster firings. In both cases, of course, this would have to affect both series valves, an implausible event.

Despite the unlikelihood of thruster leakage from any cause, we did explore its likely consequences in simulation (Appendix L). A series of runs, beginning with run 44, was conducted in which various FVC thrusters were held open for varying amounts of time and at various levels of thrust, from 100% to 10% of rated thrust. While an A-thruster held 100% open for several seconds would impose sufficient body rate to drive one gyro axis into WAM, no leaky thruster (20% thrust) case caused momentum that was not eventually reduced by the G&C system.

In addition to a thruster valve seat leaking, software or drive electronics failures could erroneously command a thruster ON, which would have essentially the same effect as a leak. Branch 1.1.2.3.2 of the fault tree (Fig. 10) results in one or more FVC thrusters being falsely directed to open, imposing a momentum-raising torque on the spacecraft. The AIU circuitry that drives the thrusters was reviewed. This circuitry consists essentially of digital one-shots that hold a thruster open for only 20 or 40 ms. Holding a thruster open longer than this requires a repeated series of software commands. This approach, which was implemented in digital hardware, was specifically designed to prevent a “Clementine-type” loss-of-propellant failure.* Midway through our investigation, NASA’s WIRE spacecraft was lost due to a pyrotechnic anomaly that was subsequently traced to a misapplication of Actel Field Programmable Gate Arrays (FPGAs). Because such FPGAs are also used in the AIU digital one-shot circuitry, we again had this circuitry carefully reviewed, this time by a senior circuit designer particularly experienced with Actel usage. Neither review revealed any design flaw in the AIU thruster-driving circuitry that might hold a thruster open.
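The protective effect of a one-shot can be sketched in a few lines of Python. This is a behavioral model only, not the AIU design: the class and function names are hypothetical, time is quantized to 1 ms, and the one-shot is assumed to be retriggerable (a new command restarts the pulse), which the report does not state.

```python
class OneShotValveDriver:
    """Hypothetical model: the valve stays open for pulse_ms after each command."""

    def __init__(self, pulse_ms=20):
        self.pulse_ms = pulse_ms
        self.close_at_ms = None

    def command(self, now_ms):
        # Each command (re)starts the pulse; with no new command the valve closes.
        self.close_at_ms = now_ms + self.pulse_ms

    def is_open(self, now_ms):
        return self.close_at_ms is not None and now_ms < self.close_at_ms


def open_time_ms(command_times_ms, pulse_ms=20, horizon_ms=1000):
    """Total valve-open time over the horizon for a list of command times."""
    driver = OneShotValveDriver(pulse_ms)
    pending = sorted(command_times_ms)
    next_cmd = 0
    total = 0
    for t in range(horizon_ms):
        while next_cmd < len(pending) and pending[next_cmd] <= t:
            driver.command(pending[next_cmd])
            next_cmd += 1
        if driver.is_open(t):
            total += 1
    return total
```

In this model a single stray command opens the valve for one pulse width only, whereas a sustained firing needs a fresh command at least once per pulse width. That is the property that distinguishes the hardware one-shot from a software timer, which can fail in the open state.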

The other way that thrusters could be erroneously commanded ON is by software. This branch (1.1.2.3.2.2) is discussed later.

Low Momentum Falsely Reported as High (Branch 1.2 of the Fault Tree)

NEAR’s G&C sensors measure attitude (using DSADs and the star camera), body rate (using the gyros), and wheel speed (using tachometers). “Bad data marked as good” is a known deadly failure mode for any control system. For this reason, the NEAR G&C and C&DH (command and data handling) subsystems contain numerous cross-checks on data quality.

DSAD data were examined up to and through RND1. There was no evidence of wrong vectors being reported. There was some evidence of slight imbalance between the DSAD heads, but the imbalance was still within specifications. A series of simulations with failed DSAD heads did not replicate anything similar to the RND1 behavior. The star camera appears to be irrelevant to the RND1 discussion.

Wheel speed errors can occur from tachometer errors (for which there is no evidence) or through mishandling by the software of a properly reported wheel speed. We know that NEAR was vulnerable to this latter situation due to an improperly set software data structure value. Simulations on the stand-alone, software-only simulator have shown that this error can cause

*The Clementine spacecraft suffered a catastrophic loss of propellant on 7 May 1994 that prevented it from accomplishing its goal of flying by and photographing the asteroid Geographos. Clementine’s hydrazine supply was depleted by thrusters being erroneously held open for 11 min by the spacecraft’s control processor. Instead of hardware one-shots, Clementine used 100-ms thruster protection timers implemented in software (Ref. 7); this software contained an undetected bug.

[Text of the following page is not recoverable from the source.]

least at near-zero body rates, were much lower than the levels required in our simulations to produce repeated momentum dumping.

Processor and Software Errors (Branch 1.3 of the Fault Tree)

Processor Hardware Errors (Branch 1.3.1)

A hardware failure in any of the three processor types (FC, AIU, CTP; two units of each), if it occurred, would have to have been intermittent or transient, because there is no evidence of such failure before or after RND1. One class of hardware-related possibilities explored by the NARB is bus errors, particularly on the G&C and CTP 1553 buses. For example, could instructions to or data from the gyro become bit- or byte-shifted and misinterpreted? Since the FC maintains a quality check by keeping track of the order that data arrive, this might also require the FC’s 1553 message counter to be reset.

The code inspection team examined this issue. The FC resets its 1553 buffer only at start-up. The 1553 bus protocol is word oriented, with 20 bits transmitted per word. Parity bits are checked to assure that each word is transferred correctly. The entire message is also checked, for example, to ensure that the correct number of words is transferred each time. The FC checks the resulting status word for each transfer, and then clears it so as not to be fooled by stale data. The FC software also checks the status word in each message from the inertial measurement units (gyros and accelerometers). If words or bytes were shifted, the FC would reject the entire message. If the entire message is in error, both the AIU and the FC would receive bad data. Although hypothetically possible, there is no evidence that this ever happened; the probability is of about the same order as that of the AIU or FC software getting momentarily corrupted and then spontaneously fixing itself without resetting a processor.
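The per-word and per-message checks described above can be illustrated with a short Python sketch. It assumes the MIL-STD-1553 word layout of 16 data bits plus one odd-parity bit; the function names and the representation of a message as a list of (data, parity) pairs are hypothetical and are not taken from the NEAR flight code.

```python
def odd_parity_bit(data16):
    """Parity bit that makes the total number of 1s (data + parity) odd."""
    ones = bin(data16 & 0xFFFF).count("1")
    return 0 if ones % 2 == 1 else 1

def word_ok(data16, parity):
    """True if the received parity bit matches the received data bits."""
    return odd_parity_bit(data16) == parity

def message_ok(words, expected_count):
    """Accept a message only if the word count and every word's parity are good.

    words: list of (data16, parity) pairs as received.
    """
    if len(words) != expected_count:
        return False
    return all(word_ok(data, parity) for data, parity in words)

if __name__ == "__main__":
    sent = [(d, odd_parity_bit(d)) for d in (0x1234, 0x00FF, 0x8001)]
    print(message_ok(sent, 3))        # intact message
    corrupted = [(sent[0][0] ^ 0x0004, sent[0][1])] + sent[1:]
    print(message_ok(corrupted, 3))   # single flipped bit in the first word
```

A single flipped bit changes the parity of its word, and a dropped or extra word changes the count, so either fault causes the whole message to be rejected in this sketch. Parity alone does not catch an even number of bit errors within one word.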

A final type of hardware error is the single-event upset (SEU). Both the RTX2010 processor used for the CTP and AIU and the Honeywell 1750A FC processor are extremely hard to single-event effects. The FC protects its memory against SEUs with error detection and correction (EDAC); the CTP and AIU memories are inherently SEU-resistant. Prior to launch, APL had calculated an upset rate of one every 300 days for the FC, and at least two orders of magnitude better than that for the RTX2010. The vast majority of SEUs are either corrected on the fly by EDAC or are trapped and cause a reset. No processor resets had been observed in the 3 years prior to RND1, and we know for certain that no processors reset during RND1. It is possible (but not provable) that the FC1 spontaneous reset of February 1999 (see below) was caused by an SEU; this would be consistent with the calculated rate. In any event, it seems highly unlikely that SEUs played any role in the RND1 events.
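A back-of-the-envelope check of the quoted FC upset rate can be written as follows. The sketch assumes upsets arrive as a Poisson process at the calculated rate of one per 300 days and uses roughly 3 years (about 1,095 days) as an illustrative exposure. It counts raw upsets only; it does not model how many are corrected by EDAC rather than causing a reset, so it says nothing about the expected number of resets.

```python
import math

def expected_upsets(days, mean_days_between_upsets=300.0):
    """Mean number of raw upsets over the exposure, for a constant rate."""
    return days / mean_days_between_upsets

def prob_at_least_one(days, mean_days_between_upsets=300.0):
    """Poisson probability of one or more raw upsets over the exposure."""
    return 1.0 - math.exp(-expected_upsets(days, mean_days_between_upsets))

if __name__ == "__main__":
    days = 3 * 365  # illustrative exposure, not a value from the report
    print(f"expected upsets: {expected_upsets(days):.2f}")
    print(f"P(at least one): {prob_at_least_one(days):.3f}")
```

Under these assumptions a few raw upsets per FC over 3 years would be unsurprising, so a single uncorrected event in that period would not contradict the calculated rate.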

Processor Software Errors (Branch 1.3.2)

“Software” error is taken loosely here to mean a number of things: an error in an algorithm; an error (bug) in compiled flight code (in the AIU, FC, or CTP processors); or a “data structure” error in the tables of coefficients that are uploaded to the spacecraft. We have already seen that it was a data structure error (setting an acceleration parameter too tight) that precipitated the entire RND1 event; another data structure error can cause the misreported wheel speed problem. Together, the FC and the AIU comprise about 80,000 lines of source code.

Two main tools were used to search for all three types of software error:

• The series of simulations already described, which ran actual RND1 flight code on ground hardware replicas of the flight AIU, FC, and CTP computers

• Auditing by the code inspection team

Code Inspection Team

The brassboard used for simulations had exact copies of the RND1 flight code in its AIU and FC processors. In theory, if flight software problems contributed to the RND1 behavior, they should reveal themselves through the simulations. However, there is always the possibility that the simulations might not exercise a particular region of code in a reasonable number of tries. So a software inspection team was formed to analyze the G&C flight software for defects that might have contributed to the RND1 behavior deduced from the timeline reconstruction. The team looked for design flaws as well as coding errors such as mishandled exceptions or interfaces, faulty control logic or timing, and so forth. The code inspection team leader was a member of the NAG and attended the NAG meetings. Simulation results provided clues on areas of the code to examine first and what to look for. For example, a suspiciously large spike in Sun angle seen in simulation 92 was analyzed by G&C and the code inspection team and led to the discovery of an AIU code error in the way the solar array Sun vector is handled. In addition, the code inspection team provided valuable support for necessary software fixes and upgrades to the brassboard.

The team initially consisted of three software experts and two G&C-cognizant analysts; it was later reduced to three members. The team had a mix of NEAR developers and others who were completely independent of the NEAR development. Original NEAR code developers provided consultation to the team as needed. “Corporate memory” of the NEAR code development was therefore available, but was kept from “contaminating” an independent evaluation of the architecture, design, and implementation of the G&C software.

The team’s approach was twofold. First, while the rest of the NAG was reconstructing the timeline, conducting simulations, and generating hypotheses, the code team independently reviewed the entire software in detail, in some cases having to reverse engineer parts of the G&C code because of inadequate documentation. Once all team members were fully educated on the G&C software design and implementation, what-if scenarios and use case analyses were performed against the design and code. Second, in response to specific NAG hypotheses, the directly relevant software that might have contributed to the anomaly was scrutinized for defects using code walk-throughs and design reviews. All G&C software was examined, but it should be pointed out that a complete code review of all 80,000 lines of G&C code was beyond the resources of this investigation.

The G&C software consists of two major components: AIU code version 1.06 and FC code version 1.11, the versions active during RND1. The AIU processor is an RTX2010, with software consisting of approximately 21,000 lines of C and 10,000 lines of assembly code. The FC is a 1750A running approximately 42,000 lines of Ada and 7,000 lines of assembly code. In addition to flight code, the inspection team also scrutinized the autonomy rules that safed and operated the spacecraft during the anomaly. Appendices C and D give overviews of NEAR’s Guidance & Control and Autonomy systems, respectively. An area of particular interest to the NARB was the behavior of processors during AIU switches. An AIU requires 7–9 s to boot up; during this time, no processor is in control of the G&C actuators. However, the code inspection team confirmed that software-driven thruster activity cannot take place during this time; this was also confirmed by brassboard simulations.

The code inspection team concluded that the G&C software is sound in design and implementation. That is not to say that no defects were found: the team found a total of 17 errors, including 9 in the compiled code. In addition, certain poor software design or programming practices were identified, although they did not directly impact the health of the spacecraft. Defects ranged from expressions having identical constants to “save” variables being overwritten under certain conditions to poorly designed exception handlers. Each error or defect was analyzed, and many were studied in simulations. For example, two different series of stand-alone simulations were run to force gross errors in momentum estimates so that the effects of the misreported wheel speed error could be studied. Initially, the AIU-only cases erroneously added momentum to a near-zero momentum system, raising it above the White limit. But all cases recovered to a low-momentum, stable safe mode in less than 20 min. This error and others were discussed in the section “Simulation Results” earlier; some of these software errors could have moderately prolonged the post-RND1 recovery or made it less efficient (more wasteful of fuel). However, in none of the simulations did the software errors alone fully account for the 15 autonomous momentum dumps, the prolonged recovery, and the amount of fuel used in the actual RND1 event. Table 3 identifies the defects found and their disposition.

Before leaving the discussion of processor hardware and software, we should note one final point. On the evening of 24 February 1999, Mission Operations discovered that, following completion of a normal pass and return to passive momentum management the day before, NEAR’s FC1 processor (running version 1.11 code) had spontaneously rebooted, demoting the spacecraft to Earth-safe mode. Because Mission Operations had failed to update the orbit stored in EEPROM, the Sun angle was far off. This violated SKI rules, so NEAR soon demoted to Sun-safe mode and AIU2 took over. Fortunately, there was no fuel use or LVS (therefore, solid-state recorder data were preserved). An intensive examination by the code inspection team failed to uncover the cause of this reboot, the first ever for any NEAR processor. Unfortunately, Mission Operations had not, during the 2 months since RND1, downloaded a memory image of suspect processor FC1. With the reboot, that opportunity was now lost, and the team had little data to work with. Included in the team’s investigation was a margin analysis of the FC 1.11 code. Although some timing margins were tight, they all appeared to be sufficient, even in the worst case. Following the reboot, NEAR was operated on FC2 (and its prior 1.10 code) until the team’s investigation was completed. When NEAR was returned to FC1 and the 1.11 code on 4 August, the spacecraft immediately demoted to Sun-safe mode, this time because Mission Operations had failed to activate the new orbit. While this event did not shed much light on possible processor hardware faults, it does illuminate some shortcomings in Mission Operations’ error rate.

1

��;<=��!

Unit Description Resolution Status

1 AIU Yellow momentum is safe, socontinuing a post-abort dumptoo long is undesirable

Change AIU momentum“Yellow-Good-Enough” limittime-out to 120 s from 270 s tofinish dump ASAP

Data structure changedand uploaded

2 AIU When switching to safe modethere is a potential to still beusing thrusters

Force AIU to enable wheelswhen going to safe mode

Software Change Requestgenerated and beingworked

3 FC Green momentum limit of0.5 Nms is unnecessary andtakes too long to reach

Change FC Green limit to0.9 Nms


4 AIU AIU Green limit should behigher than FC’s (changed to0.9 Nms, item 3 above), soAIU will not prolong a dumpthat FC has finished

Change AIU Green limit to1 Nms


5 FC A 30-min wait (Red) dumpafter an abort can causeproblems; immediate (White) ispreferable after abort

Change FC White momentumlimit to 4 Nms for quicker dumpin case of abort

Data structure changedand uploaded (for burnsonly; changed back to6 Nms after burn)

6 AIU A 30-min wait (Red) dumpafter an abort can causeproblems; immediate (White) ispreferable if abort

Change AIU White momentumlimit to 4 Nms for quicker dumpin case of abort

Data structure changedand uploaded (for burnsonly; changed back to6 Nms after burn)

7 AIU FC could drive spacecraftthrough a large angle beforethe AIU takes over

Shorten AIU time for AIU-onlySun-safe-rotate due to SKIviolation to 30 s from 300 s


8 AIU/FC

High wheel speeds can beflagged as invalid and reportas zero speed

Change upper limit on datastructure value controllingwheel speed upper and lowerlimit tolerances


9 FC An incomplete FC-to-AIU HighRate message could be sent

Check the return status ofBuild_Msg and flag outputmessage as error

Problem/Failure Reportand Software ChangeRequest generated andbeing worked

10 FC An invalid Sun vector could besent to the AIU by the FC

FC_Sun_Stat should beinitialized at the beginning ofits processing section toindicate an invalid Sun Vector


11 FC Under some conditions a resetof the Sun-angle Sun filter canerroneously reset theDSAD/FC filter

Do not overwrite the DSAD/FCfilter when the Sun-angle Sunfilter is reset


12 FC A request from the FC to AIUfor demotion to Sun-safe-rotate mode may not begranted

Expression had the samewords (Sun-safe-freeze mode)twice instead of Sun-safe-freeze mode and Sun-safe-rotate mode


13 AIU FVC_Request discrete is beinglowered at an inopportune time

Clear FVC_Request bit at abetter time

Software Change Requestgenerated and beingworked

14 AIU AIU can report a ±1-s error orthe same second twice in arow due to AIU and CTPclocks not being synchronized

Implement AIU clock walk fixdeveloped for TIMED mission


15 FC FC can report a ±1-s error orthe same second twice in arow due to its estimationscheme using MET and UT

Implement the time tagcalculation summarized by thefollowing formula:Current_UT + .040 *minor_frame_number –message_age – TickFudge


16 AIU Autonomy rules require quite a few seconds to execute; 13 s time-out is too short

Change AIU FVC_Request time-out to 60 s from 13 s to prevent its going inactive too soon


17 AIU Violation of compiler vendor data passing method

Copy the global into a local data store and pass the local to the function
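The time tag formula in item 15 can be illustrated with a short sketch. This is not the flight code: it assumes every term is expressed in seconds and that the 0.040 factor is the duration of one minor frame. The function name and the sample values are hypothetical.

```python
# Illustrative sketch of the item 15 time tag formula (not the FC flight code).
# Assumption: all quantities are in seconds; 0.040 is taken to be the
# interval covered by one minor frame.
MINOR_FRAME_FACTOR_S = 0.040


def time_tag(current_ut, minor_frame_number, message_age, tick_fudge):
    """Current_UT + 0.040 * minor_frame_number - message_age - TickFudge."""
    return (current_ut
            + MINOR_FRAME_FACTOR_S * minor_frame_number
            - message_age
            - tick_fudge)


# Hypothetical inputs chosen only to exercise the arithmetic.
tag = time_tag(current_ut=1000.0, minor_frame_number=10,
               message_age=0.25, tick_fudge=0.05)
```

With these made-up inputs the minor-frame term adds 0.4 s and the two corrections remove 0.3 s, so the tag lands 0.1 s after Current_UT.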



FINDINGS AND RECOMMENDATIONS

Despite the loss of spacecraft data when the solid-state recorder was powered off during the LVS event, it was possible to reconstruct a general understanding of the post-RND1 timeline. More than 128 simulations were run on the NEAR brassboard simulator, an independent review of the flight code was conducted, and suspect hardware and circuit elements were reviewed to attempt to explain the protracted RND1 recovery. Although software errors were found that could contribute to the protracted recovery, they do not fully explain it. We were unable to find a complete explanation for the post-RND1 events. Nevertheless, it is possible to make observations and recommendations that might prevent a recurrence on NEAR or on other programs.

Finding 1. The precipitating event for the RND1 anomaly was the burn abort. The burn aborted because the main engine’s normal start-up transient exceeded a lateral acceleration safety threshold that was set too low. A start-up transient very similar to that of the RND1 burn had been observed during the only previous use of the LVA engine, the DSM burn of July 1997. The NEAR team failed to recognize the significance of this transient and its potential increase in importance with decreasing propellant mass. NEAR’s structural design, with a separate cantilevered propulsion module, combined with the placement of the accelerometer package, exacerbates the coupling of the transient into the accelerometers.

An error in the burn-abort command script initiated the spacecraft’s anomalous attitude motions and the protracted recovery events. A formal review of the RND1 scripts had been held, as required, but failed to catch the missing command. RND1 simulations were run on the stand-alone, software-only simulator, but this simulator does not emulate the actions of the autonomy rules and so did not catch the script error. The RND1 command scripts were simulated on the NEAR brassboard simulator, but abort cases were not run. The brassboard is difficult to use and can give “strange” results; this discourages thorough testing of command scripts. When abort cases were simulated after RND1, every simulation exhibited the disastrous effect of the abort command script. It is likely that without this script error, NEAR could have been recovered in time to execute one of the backup rendezvous burns, allowing Eros orbit insertion in January 1999 as planned.

The safing modes on NEAR are overly complex, involving frequent hand-overs of control among multiple computers and too many logical paths and branches. This complexity adds risk to an anomaly such as that of RND1. If a safe mode must be computer-based (analog control might be better), the processor modes and supporting software must be simple and readily debuggable so that the safe modes can be relied upon. Nevertheless, NEAR’s autonomy system performed every action it was designed to perform, and there is no evidence that it did anything that it was not supposed to do. The autonomy system was tested prior to launch (in particular, burn aborts, momentum dumping while in Sun-safe mode, and excessive thruster use scenarios were tested). However, it should be remembered that the autonomy rules were tested with earlier versions of the AIU and FC codes, not those running during RND1. The failure of the autonomy system to prevent the kind of fuel-loss scenario it was designed to guard against demonstrates a clear design flaw.

Recommendation 1a: The NEAR mission cannot survive a repeat of the RND1 events. The NEAR system engineer must be given the responsibility and authority to form a team to (1) create and test a burn abort command macro that minimizes the risk of a future burn abort leaving the spacecraft with high system momentum, and (2) redesign the autonomy response to excessive thruster use.

Recommendation 1b: The RND1 event was precipitated by an incorrect parameter. Parameter (data structure) errors can be every bit as deadly as errors in compiled code, and they need the same careful design, review, testing, and configuration management. G&C data structures should be placed under configuration control that considers the risk/benefit of changed parameters and reviews the adequacy of analysis and testing of new values.

Recommendation 1c: Formal review of command scripts can catch many, but not all, errors (particularly if the reviewers are not independent). For critical events, complete testing of all command scripts on the NEAR brassboard simulator is required for safety. For example, burn abort cases, not just success cases, must be run on the simulator. Future programs must have simulators with sufficient fidelity and user-friendliness to encourage this practice.

Recommendation 1d: The NEAR system engineer should be required to review and sign off all critical scripts. This was once the practice within Mission Operations, but at some point the practice ceased.

Recommendation 1e: Regression testing of the entire autonomy system (including interactions between the autonomy rules and the AIU, FC, and CTP processors) should be carefully considered whenever G&C software or autonomy rules change. The original 21 scripts used for NEAR software acceptance testing should still be available to support regression testing.

Finding 2. There is no evidence that the prolonged and fuel-wasteful momentum dumping was caused by hardware failure.

• The propulsion system (and its drive electronics) did not cause the prolonged recovery. Although in theory a leaking thruster that spontaneously recovered could explain the initiation of the prolonged recovery sequence, there is no evidence that this occurred, and the Board considers it extremely unlikely. Simulated thruster failures do not reproduce the observed RND1 behavior. Furthermore, the Board feels there is no technical reason to avoid using NEAR’s LVA engine again, should that be necessary.

• We found no evidence that DSAD or star camera failures contributed to the prolonged recovery. Simulated DSAD failures do not reproduce the observed RND1 behavior.

• We found no evidence of hardware failures in any of the six processors or in their interconnecting buses.

• We found no evidence that single-event upsets contributed to the prolonged recovery. Given the hardness of the NEAR processors, and the benign radiation environment, the probability of an undetected SEU is extremely small.

Finding 3. The NARB investigation uncovered 17 software errors, taken here to mean errors in algorithms or in uploaded data structures as well as in compiled code. Most of these were minor and do not necessarily need to be corrected. However, some are significant and contributed to the event, while not fully explaining it.

Recommendation 3a: NEAR’s safing logic was not well designed to handle the burn abort, in that it allowed thrusters to fire immediately, before the command system’s autonomy rules could change the attitude control actuators, reconfigure tanks, or change data structures. Although the design was demonstrated to be workable prior to launch, it has undesirable side effects that could have been avoided. Changing this now on NEAR entails unacceptable risk, but future programs with similar attitude control systems should consider this interaction more carefully.

Recommendation 3b: The G&C design for momentum dumping temporarily abandons the requirement for Sun-pointing. The rationale behind this choice (perhaps driven by schedule pressures) was to have a single momentum dumping algorithm handle the immediate post-launch tip-off momentum dump as well as the operational dumps. Under stressed failure conditions this turns out to have been an unwise choice. Changing this design now on NEAR entails unacceptable risk, but this should be carefully evaluated for future missions.

The FC code version 1.11 includes an “enhanced burn” capability that maintains attitude for commanded (but not autonomous) momentum dumps. When redesigning the autonomous momentum dump safing logic, a careful evaluation should be made of which course is least risky: using this capability for autonomous dumps, or relying more heavily on ground commands.

Recommendation 3c: The “misreported wheel speed” data structure error, in which a wheel at maximum speed is reported as zero speed, is serious and has been fixed.
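The failure mode behind this recommendation can be illustrated with a minimal sketch. The report does not give the flight logic, so the check below is an assumption: a reading outside a tolerance window held in a data structure is treated as invalid and reported as zero. The function name, units, and limit values are hypothetical.

```python
# Hypothetical model of a wheel speed validity check (not the NEAR flight code).
# Assumption: readings outside [lower, upper] are flagged invalid and
# reported as zero speed.
def reported_wheel_speed(measured, lower_limit, upper_limit):
    """Return the measured speed if it is within tolerance, else 0.0."""
    if lower_limit <= measured <= upper_limit:
        return measured
    return 0.0  # invalid reading shows up in telemetry as a stopped wheel


MAX_WHEEL_SPEED = 5000.0  # hypothetical saturation speed, arbitrary units

# Tolerance window tighter than the wheel's true range: a saturated wheel
# is reported as zero speed.
misreported = reported_wheel_speed(MAX_WHEEL_SPEED, -4500.0, 4500.0)

# Widening the limits in the data structure lets the true speed through.
corrected = reported_wheel_speed(MAX_WHEEL_SPEED, -5500.0, 5500.0)
```

In this sketch the error lives entirely in the uploaded limits, not in the compiled check, which is consistent with the report's point that data structure errors need the same scrutiny as code.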

Recommendation 3d: Having so many of the autonomy rule and G&C time-outs set to 300 s is confusing and can cause dangerous “race” conditions. Consider changing the time-outs to more easily distinguishable values. Verify by analysis and simulation that no negative effects result for other time-outs or autonomy functions.
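One way a review tool might surface such collisions is sketched below, assuming the time-outs can be exported as a name-to-seconds table. The table entries and names are hypothetical and are not taken from the NEAR data structures.

```python
from collections import defaultdict


def colliding_timeouts(timeouts_s):
    """Group time-out names by value and keep only values shared by 2+ names."""
    groups = defaultdict(list)
    for name, seconds in timeouts_s.items():
        groups[seconds].append(name)
    return {seconds: sorted(names)
            for seconds, names in groups.items() if len(names) > 1}


# Hypothetical time-out table used only to exercise the check.
example_timeouts = {
    "autonomy_rule_a": 300,
    "autonomy_rule_b": 300,
    "gc_mode_wait": 300,
    "request_timeout": 60,
}
collisions = colliding_timeouts(example_timeouts)
```

A check like this only flags identical values. Whether two timers can actually race still has to be established by the analysis and simulation the recommendation calls for.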

Recommendation 3e: The G&C subsystem should incorporate a coarse ephemeris to use in the event that Mission Operations fails to upload a valid orbit. (This recommendation has been partially implemented on NEAR by storing coarse ephemeris data in EEPROM in case of processor reset. The system will also use its last known position when the current orbit expires. This should no longer be a reason for NEAR to drop into Sun-safe-rotate mode.)

Recommendation 3f: Momentum limits (Green, Red, etc.) were set too tight in some cases, causing or contributing to the loss of extra fuel. Adjust these momentum threshold levels to minimize fuel loss (already done on NEAR).

Recommendation 3g: A more robust safing design would have been unaffected by the Mission Operations script error that allowed the thrusters to be used briefly when transitioning between momentum dump and attitude control modes. On NEAR, G&C does not produce the telemetry that would allow the command system autonomy rules to find and correct this error. Such “belt and suspenders” type safety checks must be implemented more intensively on future programs.

Recommendation 3h: The AIU is allowed to go into the FC control state (“FC Override”) on boot, even if the system had been in AIU-only mode prior to boot. This violates the NEAR safing philosophy of no transitions from a lower to a higher operating mode without ground intervention. Again, changing this design now on NEAR entails unacceptable risk, but this should be corrected on future missions having a processor architecture like that of NEAR.

Recommendation 3i: The use of a “Red Teamer” who continually tries to “break” the software design should be considered.

Finding 4. The NEAR team is to be commended on the speed with which they determined the proximate cause of the abort itself. The rapid diagnosis of the precipitating events, coupled with quick work from Mission Operations and the rest of the NEAR team, enabled the successful re-burn 2 weeks later that rescued the mission. The NEAR team won an American Institute of Aeronautics and Astronautics Space Operations and Support Award for the successful recovery effort.

On the other hand, the role Mission Operations played in causing the RND1 failure cannot be ignored. Furthermore, Mission Operations was less than successful in protecting and acquiring spacecraft data needed to support a diagnosis: certain data that might have been valuable to the diagnosis were lost because Mission Operations overwrote spacecraft command memories, failed to downlink processor memory images, or made other operational errors. There was no plan or procedure in place for recovering from an anomalous event, despite several previous entries into safe modes during cruise.

• Although detailed lower-level procedures exist, there is a general lack of top-level procedures, and of adherence to those in place. For example, there is no established and enforced policy of having all critical command loads fully simulated. The requirement that the system engineer sign off all critical command scripts is not enforced.

• Configuration management is inadequate. For example, the version of the burn-abort command script used during the DSM did contain the critical command to return the G&C to reaction wheel attitude control; when or why this command script was changed is unknown. As the NAG simulations began it was discovered that there were two versions of FC code version 1.11. Flight code was stored on a network server in an uncontrolled environment. The tool Mission Operations used to unpack data structures had an error that gave wrong values for certain data structures (the tool had not been fixed, despite having caused a problem in the past). That these errors were not discovered until the NARB investigation suggests that data structure values were not routinely checked, although the NARB finds them to be critical for correct operation of the spacecraft.


• As NEAR entered its 3-year cruise phase, the role of the system engineer declined relative to that of the mission managers. The comments, suggestions, and requests of the system engineer in operational matters carried no particular weight and were often ignored. The former NEAR software lead engineer, now in another department, is not a reviewer of even critical operations. These two individuals are the principal repository of knowledge of NEAR’s overall system design. Not having them fully involved in the decisions and reviews reduced the chances of eliminating errors during the pre-rendezvous activities.

• The conservative “belt and suspenders” mind-set exhibited by the engineers toward flight hardware prior to launch is missing from Mission Operations. Planning does not sufficiently consider how failures or errors would affect the operations. The spacecraft is not always placed in the best possible state to recover from a failed operation. The lowest-risk approach to each operation is not always used, even when there is little or no impact to using that approach. Some of the risk-reduction practices that were established for critical operations prior to launch and were used during early operations had simply been abandoned by the time of RND1.

• Problem/Failure reporting, although generally well-disciplined prior to launch, is not taken as seriously after launch.

Recommendation 4a: Better configuration management is needed in Mission Operations, including configuration management of the brassboard simulator, command scripts, data structures, autonomy rules, ground software tools, and other critical items currently outside the Configuration Control Board’s (CCB’s) purview. At a minimum, the CCB should have cognizance over changes to autonomy rules and certain critical data structures. An individual responsible for maintaining configuration-controlled copies of the flight software and important ground software should be identified (a software librarian function).

Recommendation 4b: NEAR operations require technical oversight by a knowledgeable system engineer. System knowledge and a system viewpoint beyond the level of expertise of the Mission Operations staff must be incorporated into future NEAR operations planning and review.

Recommendation 4c: Protecting diagnostic data must be made a priority. This requires written, tested post-recovery procedures, and staff trained in their use, to assure that valuable spacecraft data are not lost.


Recommendation 4d: Establish and enforce procedures to assure that the spacecraft will be prepared for recovery from failed operations. This includes setting command loss timers and resetting important data capture buffers in the command processor and elsewhere before critical maneuvers. Be fully prepared to capture important diagnostic data if something goes wrong.

Recommendation 4e: Implement a plan to routinely collect and possibly analyze all the G&C telemetry formats of the G&C system, not just the ones requested by G&C engineers. This will assure adequate data for analysis of anomalies.

Recommendation 4f: The post-launch Problem/Failure Reporting system must be adhered to with no exceptions, to assure that important lessons are not lost and that dangerous trends are spotted in time.

Recommendation 4g: A near-current orbit should always reside in EEPROM onboard the spacecraft. As the February 1999 FC1 reboot event demonstrated, failure to do so can cause the spacecraft to drop unnecessarily from Earth-safe to Sun-safe mode.

Recommendation 4h: A general reemphasis on Mission Operations process and quality control is needed.

Finding 5. The NEAR simulation environment is deficient in two respects. First, the most complete simulation of critical G&C actions takes place on the stand-alone, software-only simulator. This simulator is not tied directly to the scripts executed by Mission Operations. The parameters and commands that are in effect on the spacecraft during an operation cannot be guaranteed to be those used during the pre-operation simulation on this simulator. Therefore the simulation results may imply nothing about the outcome of a given operation.

Second, the NEAR brassboard simulator, where the actual effect of Mission Operations scripts can be determined, is inadequate. It was assembled late in the program from breadboards, engineering models, and ground support equipment that would otherwise have been dispersed. Because Mission Operations never accepted “ownership” of the brassboard simulator, its documentation was skimpy, and its configuration management was poor. Its hardware and dynamics do not always faithfully emulate the spacecraft, and its operation is frustrating and difficult. Mission Operations and G&C must share the brassboard equipment, requiring frequent reconfiguration. A high degree of skill and knowledge is required to configure and initialize the brassboard simulator correctly; without this careful setup, “strange” results may be obtained. Strange results were often ignored because more often than not they would turn out, after lengthy analysis, to be caused by the brassboard itself. The brassboard’s user-unfriendliness discouraged Mission Operations from running many cases (such as the abort cases prior to RND1). Still, the brassboard simulator remains the best and only tool for completely debugging critical operations. Its use prior to RND1 would have uncovered the burn-abort command script error.

Recommendation 5a: Guidelines must be developed and adhered to for deciding which scripts must be simulated on the brassboard, and which test cases must be run (for example, which failure mode cases, if any). All strange results for these cases must be analyzed, even if most turn out to be brassboard artifacts.

Recommendation 5b: Procedures and scripts for configuring and initializing the brassboard simulator should be developed to minimize the number of brassboard runs that give strange results, make the most efficient use of the brassboard itself, and encourage its confident use by the Mission Operations team.

Recommendation 5c: The “simulation environment” is of critical importance for reviewing and checking command uploads and diagnosing anomalies. It needs to be considered early in a program, treated (and funded) as seriously as any other subsystem, and then maintained as a mission-level resource.

Recommendation 5d: Careful consideration needs to be given to which parts of the simulator are “hardware in the loop” and which are emulated by software. Software emulations can uncover problems only in those areas anticipated by the software designers, which can limit the simulator’s utility in anomaly investigations.

Recommendation 5e: Subsystems that participate in autonomous safing must be emulated with the highest fidelity.

Recommendation 5f: Consideration should be given to creating a tool to tie the software-only simulation to the actual command scripts used by Mission Operations. This tool could be of value to both NEAR and TIMED.

Finding 6. The gyros are critical to the G&C system operation and performance. At recovery, all three gyro axes were found in Whole Angle Mode (WAM) with an acceptably low noise amplitude. There is no evidence that they entered WAM by any mechanism other than high body rate. However, we were unable to test the gyros for EMI susceptibility or for rate-dependence of the WAM output noise amplitude. An EMI test had been run during acceptance testing of the gyro, but the test data could not be located by either APL or Litton. We could not, therefore, rule out the possibility that the gyros are sensitive to EMI in the presence of low bus voltage.


Recommendation 6a: Future programs should maintain closer oversight of vendors during acceptance testing, carefully specify test deliverables at contract initiation, and maintain better test data reporting and archiving.

Recommendation 6b: Future programs should consider purchasing an engineering model gyro along with the flight unit and retaining it postlaunch. That would have been very helpful to this investigation (as well as possibly useful in the brassboard simulator). It is particularly important when the unit is the first of its type to fly, as the hemispherical resonant gyro was for NEAR.

Finding 7. The design of the G&C software for both attitude control and for momentum management ignores the dynamics of onboard fluids (“slosh” effects). With NEAR’s oxidizer tanks loaded approximately half full at RND1, and with no internal propellant management device, they were nearly at a worst case for slosh. Realistic slosh modeling is particularly difficult, and numerical instabilities can often mask or mimic physical effects. The spherical pendulum model used for NEAR is more or less standard in the industry. With this model, simulations indicate that slosh may have contributed to some of the inefficient or apparently ineffective momentum dumping, particularly after the gyros entered WAM. However, to date we have been unable to demonstrate that control–slosh interaction can produce anything like the body rates necessary to induce WAM using any physically defensible set of slosh model parameters tried. If slosh played a major role in the post-RND1 difficulties it would be fortunate, for as NEAR approaches Eros only 6 kg of oxidizer remain in the tanks, precluding any repeat of oxidizer slosh effects.

Recommendation 7a: APL’s MESSENGER spacecraft propulsion architecture will be similar to NEAR’s: bipropellant, no oxidizer tank propellant management device, and the requirement for settling burns. It will have a somewhat higher initial liquid mass fraction, and the bipropellant thrust will be higher. APL must carefully consider slosh dynamics in the MESSENGER control system design.

Recommendation 7b: A better slosh model is needed for future missions; the NEAR model is too complex, too difficult to code, has numerical problems, and provides little or no physical insight.

Finding 8. Much of the difficulty in diagnosing the RND1 anomaly arose because all of the solid-state recorder data were lost when the recorder was powered off during the LVS event. These recorder data contained detailed thruster firing, G&C sensor, spacecraft attitude and rate, and G&C state information that would have made this investigation immensely easier. Although extremely unfortunate for the NARB, the decision to shed the solid-state recorder was necessary to achieve power balance under LVS conditions. Some features were included in NEAR’s system design to capture data even without the recorder. For example, the onboard processors were programmed to capture in memory autonomy rules and the data that triggered them, commands sent by the command processors, a min–max telemetry summary, and various counters and flags. These features did prove useful in the investigation, although some were rendered less useful by Mission Operations’ failure to reset important buffers prior to RND1.

Recommendation 8a: To preserve diagnostic data, solid-state recorders should have nonvolatile memory or be the last load shed in a low-power situation. Future designs could preserve critical diagnostic data in a truly nonvolatile portion of the recorder, in a region backed up by a small independent battery, or in a lower power mode.

Recommendation 8b: Valuable CTP memory space for diagnostic data could be saved by recording just the first autonomy command in a macro, not the entire macro.

Recommendation 8c: The min–max recording method was inadequate for storing some key G&C data because it operated only on bytes, and the G&C telemetry often packed several single or double-bit status flags into a single telemetry byte. Information was difficult to interpret at best and confusing or useless at worst. The G&C also zeroed all its telemetry upon boot, so that the minimum data points recorded for the G&C carried little information. The min–max data summary can be extremely useful and should be extended to address these inadequacies in future designs.
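The limitation described in this recommendation can be shown with a small sketch. It assumes a telemetry byte holding eight independent 1-bit flags and compares a byte-wise min–max summary with a hypothetical per-bit summary. Neither function is taken from the NEAR software, and the sample histories are invented.

```python
def byte_min_max(samples):
    """Byte-wise summary: smallest and largest byte value seen."""
    return min(samples), max(samples)


def per_bit_summary(samples, width=8):
    """Per-flag summary: masks of bits that were ever 1 and ever 0."""
    mask = (1 << width) - 1
    ever_set = 0
    ever_clear = 0
    for value in samples:
        ever_set |= value & mask
        ever_clear |= ~value & mask
    return ever_set, ever_clear


# Two hypothetical histories of one byte that packs eight 1-bit flags.
history_a = [0b00000001, 0b10000000]
history_b = [0b00000001, 0b01000000, 0b10000000]  # flag 6 also fired here

same_min_max = byte_min_max(history_a) == byte_min_max(history_b)
flags_a = per_bit_summary(history_a)[0]
flags_b = per_bit_summary(history_b)[0]
```

Both histories produce the same byte-wise minimum and maximum, so the summary cannot show that flag 6 was ever raised. The per-bit masks differ, which is the kind of extension the recommendation asks future designs to consider.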

Recommendation 8d: Critical diagnostic data could be preserved in the processor memories to a greater extent than was done on NEAR. With the advances in memory density since NEAR’s design freeze, this should be easier for future programs.

Recommendation 8e: The telemetry system should have a special shortened frame for use during emergency mode operations (any entry into safe modes). This frame would include only data needed for resolution of the anomaly (e.g., spacecraft housekeeping and G&C data), eliminating, for example, instrument suite data. This abbreviated telemetry frame should be stored in the solid-state recorder. Autonomy rule housekeeping snapshots should be transferred to the recorder as soon as the snapshot buffer is full.

Finding 9. Having the gyro in Whole Angle Mode with a high noise level while the G&C system is under AIU control is a dangerous combination because of the AIU’s limited ability to filter the noisy WAM output. At the noise levels measured in flight, the AIU is completely capable of controlling the spacecraft, as demonstrated by its performance at recovery. However, we do not have records of the noise levels during the RND1 periods of high body rates, and the NARB was unable to measure the dependence of noise amplitude on rate or other conditions that might cause it to increase.

Recommendation 9a: Consideration should be given to reducing the risk from this combination by, for example, automatically resetting the gyro from WAM when it is safe to do so, or through changes to control loop gain or other parameters while in WAM. It may be possible to implement this by autonomy rule changes only. Of course, any such changes are risky and the probability that high body rates actually exist must be weighed before such a course is followed. If a change is contemplated, it must be carefully reviewed by the CCB, other concerned parties, and most especially, by the system engineer. Designing, analyzing, and testing changes to these rules is one example of why a full-time postlaunch system engineer is needed.

Finding 10. NEAR’s multiprocessor architecture (two each of three different processors) is cumbersome, and redundancy complicates it even further. Although the Board did not find that the design contributed to the RND1 abort, it adds to the risk of mis-synchronization and other issues that arise as processors shut down, boot up, and transfer control. Having separate AIU and FC processors was felt to be necessary because no single processor was capable of handling the entire job back when NEAR had to make its selection (the NEAR Preliminary Design Review was April 1994).

Recommendation 10a: For future programs, simplify the controller modes, with particular emphasis on all safe modes. Consider consolidating control authority in a single processor, rather than distributing it across multiple processors.

Finding 11. Mission Operations was staffed at 20 people at the time of RND1, one short of the number planned for operations at Eros. In addition, two people in another group supported mission design and G&C aspects of mission operations. During most of the cruise phase, Mission Operations staffing averaged 7 people, several fewer than the number originally planned. While staffing shortfalls did not directly contribute to the RND1 event, they may have indirectly contributed through the inadequacy of procedures, training, and simulation. A general reemphasis on quality control should not require an increase in staff.

Finding 12. Following launch, NEAR returned $3.6 million to NASA, a rare underrun for this first Discovery program that APL was justifiably proud of. In retrospect, however, it might have been wiser for NASA to have redirected this $3.6 million toward activities to further reduce mission risk. These could have included improving the state of the brassboard, replicating the brassboard so that it didn’t have to be shared, regression testing flight software, properly documenting Mission Operations procedures, and so forth.
