April 9, 2018

Margins

How do we estimate the resources we need to cover future unknowns?

In every flight project — indeed in every complex project — the costs tend to go up over time from the first estimates at the beginning of formulation. And the mass goes up. And the power requirements. And…you name it. Seldom do any metrics go down.

After decades of experiencing this growth, the aerospace industry has created systems for anticipating where the budget, the mass, the power requirements, and all the other metrics will end up by the time a new system is built. We need to plan ahead for these changes, and have the resources we need from the beginning. The way we do this is by setting margins, or growth reserves, into every metric we can.

In the NASA Systems Engineering Handbook Rev2, “Margin” is defined as:

The allowances carried in budget, projected schedules, and technical performance parameters (e.g., weight, power, or memory) to account for uncertainties and risks. Margins are allocated in the formulation process based on assessments of risks and are typically consumed as the program/project proceeds through the life cycle.

In other words, unexpected issues arise and need to be dealt with. And sometimes that means things like the cost, or mass, or power requirements will increase. The set margin allows the project to accommodate that growth during the planning and building phases.

Over the decades of building spacecraft and the instruments they carry, the Jet Propulsion Lab (JPL), and Goddard, and the other big builders, have learned processes for building flight projects with teams. Their lessons learned have been codified into principles and practices. JPL’s are the JPL Design Principles, for example, and Goddard has their Gold Rules. They are not all the same.

The JPL Principles codify, from experience, the key metrics to identify, track, and report against, and how much margin you should have as a function of the lifecycle of the project.

For Psyche, we keep a lot of margins. A lot. The big ones are cost (overall for the mission, but broken down into every tiny subsystem), power (again in total, but also broken into the power required by every subsystem to function), and mass, but there are so many more, for example:

performance margins (for example, timing, throughput, storage size, latency, accuracy, and precision),
hardware margins (for example, switches: current capability, voltage level, protected or not, arm-and-disable circuitry, fused or non-fused),
science margins (for example, the cameras need to be able to take images at a higher resolution than we need to reach conclusions about our science hypotheses), and
telecomm system margins (the link parameters to the Deep Space Network, for example).

For many of these metrics, and especially for power, cost, and mass, the standard margin to have in Phase B (Formulation), where we are today, is around 30%. As we dig deeper and deeper into the design, writing requirements (see a future blog!), we learn that something needs additional switches, or a new card, or a new cable, or a bigger model. There goes some of the cost and mass margin.
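To make the arithmetic concrete, here is a minimal sketch in Python. The numbers are invented for illustration and are not Psyche data. The convention is also an assumption: margin is expressed here as the unused part of an allocation divided by the allocation itself, while some organizations divide by the current best estimate instead, which gives a larger percentage for the same numbers. The function name and its parameters are hypothetical.

```python
def margin_fraction(allocation, current_best_estimate, relative_to="allocation"):
    """Return the unused allocation as a fraction of the chosen base.

    relative_to="allocation": (allocation - estimate) / allocation
    relative_to="estimate":   (allocation - estimate) / estimate
    Both conventions are in use; which one applies is an assumption here.
    """
    if allocation <= 0 or current_best_estimate <= 0:
        raise ValueError("allocation and estimate must be positive")
    unused = allocation - current_best_estimate
    if relative_to == "allocation":
        return unused / allocation
    if relative_to == "estimate":
        return unused / current_best_estimate
    raise ValueError("relative_to must be 'allocation' or 'estimate'")


# Hypothetical dry-mass numbers, in kilograms.
allocation_kg = 1000.0
estimate_kg = 700.0

print(f"{margin_fraction(allocation_kg, estimate_kg):.0%}")              # 30%
print(f"{margin_fraction(allocation_kg, estimate_kg, 'estimate'):.0%}")  # 43%

# The design matures and we discover we need a new 50 kg card.
estimate_kg += 50.0
print(f"{margin_fraction(allocation_kg, estimate_kg):.0%}")              # 25%
```

The same allocation and the same estimate give 30% under one convention and 43% under the other, so a margin number only means something once the team has agreed on which definition it is using.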

In this figure we show the Psyche data as of early 2018 for dry mass. The required margin, in yellow, follows the guidelines agreed upon by the project. The current best estimate and the maximum expected value are data we track for the project month by month. Along the top are acronyms for key reviews leading up to launch. As the plan matures, the possible needed margin decreases.

If anyone on the team sees a possible change that would affect cost, in particular, they follow a standard procedure so we can track and decide carefully on these issues. The team member brings the issue, for example, the need to replace a part that we planned for but that is no longer being made, to a committee on threats and liens, which meets every few weeks. We discuss the issue, the actions needed, and the likelihood and size of the impact. We accept it as a “threat” against our margin. And if we decide that spending the extra money is the right call, we move the change from a threat to an official “lien” against the cost margin. Sometimes changes are in our favor and save money! But more often they are not.
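The bookkeeping behind threats and liens can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual tool: the class, the function names, and all the numbers are hypothetical, and weighting each threat by its likelihood is one common way to summarize exposure rather than a rule taken from our process.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    description: str
    likelihood: float  # assumed scale: 0.0 (won't happen) to 1.0 (certain)
    impact: float      # cost in arbitrary units; negative would be a saving


def expected_threat_exposure(threats):
    """Likelihood-weighted sum of open threats (an assumed summary metric)."""
    return sum(t.likelihood * t.impact for t in threats)


def remaining_margin(margin, liens):
    """Margin left after subtracting every approved lien."""
    return margin - sum(liens)


def promote_to_lien(threat, threats, liens):
    """Move an approved threat to the lien list at its full impact."""
    threats.remove(threat)
    liens.append(threat.impact)


# Hypothetical ledger, in arbitrary cost units.
cost_margin = 100.0
obsolete_part = Threat("replace a part no longer being made", 0.5, 8.0)
extra_cable = Threat("additional cable", 0.25, 4.0)
threats = [obsolete_part, extra_cable]
liens = []

print(expected_threat_exposure(threats))     # 5.0
promote_to_lien(obsolete_part, threats, liens)
print(remaining_margin(cost_margin, liens))  # 92.0
print(expected_threat_exposure(threats))     # 1.0
```

Keeping threats and liens in separate lists mirrors the two-step decision: a threat is something we are watching, while a lien is margin we have already agreed to spend.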

I’ve been wondering: Why is the need for 30% margin in Phase B so universal? Why do we all underestimate so dramatically, so predictably? In a discussion with a group of scientists and engineers, we wondered whether human evolution favored the optimistic risk-taker. Perhaps we are wired to be optimistic underestimators.

But a systems engineer offered another opinion. We can only know so much about a complex system in the beginning. We budget based on what we know. And as time goes by and we dig deeper and deeper into subsystems, writing requirements as we go (see figure), we learn about interactions among subsystems that we had not anticipated. He suggests that these unanticipated interactions among subsystems account for the growth that the margin has to cover.

So far, we are doing well on Psyche with the margins we have set. So far, no big surprises. But we are working mainly with heritage hardware (hardware that has been tested and flown before), all of which is undergoing only highly reviewed and specific changes, and experienced builders making a standard chassis. Getting a bigger project with new technology to a point of development where it needs only 30% margin at the beginning of Phase B is a much bigger challenge. So, psychologists, sociologists, biologists, and systems engineers, how do we get better estimates of margin? Or better estimates of the true numbers from the start? Let us know!