DELIVERING QUALITY SOFTWARE IN TWENTY-FOUR HOURS

A. Gerb, G. Curtis, R. Douglas, N. Nigro, V. Berman and L. Swaminathan

The Space Telescope Science Institute
3700 San Martin Drive
Baltimore, MD, 21218

ABSTRACT

Those committed to long-term support and enhancement of critical operations software have become painfully familiar with the associated dilemmas and tradeoffs:

- Frequent Improvements vs. Stability: If we deliver enhancements every time they would be useful, the system changes frequently enough to disrupt operations.
- Rigorous Testing vs. Quick Delivery: Testing processes designed to ensure quality slow down rapid deliveries aimed at serving the customer.
- Early Testing vs. Stable Testing Area: Testing after modifications have been batched into a release guarantees the stable version needed for reliable testing but is antithetical to the practice of finding problems as early as possible when they are cheaper to fix.

At the Space Telescope Science Institute (STScI), we have wrestled with these issues and have recently made improvements in our delivery and testing processes which have allowed us a best-of-both-worlds solution to these tradeoffs. We have been able to realize the following improvements:

- We can place a critical enhancement in the customer's hands in 24 hours without compromising our rigorous testing procedure or circumventing our normal delivery process.
- Deliveries are made on a "Demand-Pull" basis, with releases occurring within 24 hours of user requests.
- Testing occurs in a stable area but before modifications are batched into a release. Problems identified by testers can be fixed on the spot, allowing testing to continue without redelivery being necessary. This has tended to foster a more team-oriented, less adversarial relationship between developers and independent testers.

These improvements were realized while supporting the Transformation (Trans) system.
Trans is an expert system for turning Hubble Space Telescope (HST) proposals into detailed observing plans. Trans has been operational since 1989 and has prepared all of the hundreds of thousands of observations executed on HST since its launch. Trans must be modified whenever HST is used in a new way, new hardware is installed on HST during a Space Shuttle servicing mission, or HST must be operated differently due to degradation of components as they approach the end of their projected operating lifetime. In addition, Trans has been a tool of process improvement efforts aimed at developing its potential to decrease the workload of operations specialists. Trans undergoes hundreds of enhancements every year, ranging from the tweaking of numerical constants to complex multi-month development projects that must be integrated with changes in the rest of the HST ground system.

The quality requirements for Trans are stringent. The uniqueness of HST coupled with its limited lifetime makes HST time especially precious, so the cost of improperly prepared observations is high. The hardware aboard HST is extremely expensive to repair or replace, so considerable care needs to be taken that planned observations cannot damage the satellite in any way. The sheer volume of observations processed through Trans demands an extremely low failure rate.

For these reasons, a two-pronged testing strategy was adopted for Trans. First, STScI is committed to testing of every enhancement by someone other than the developer making the change. Second, every capability must be represented in an automated regression test to ensure that future enhancements do not compromise existing capabilities.

Traditional regression testing strategies of differencing output have proven insufficient for Trans. Each observation produces hundreds of parameters generated from a model built incrementally in many stages.
A small change early in the model generation could cause pages of differences, analysis of which would be prohibitive. Instead, Trans uses an expert-system-style testing tool called the "Looker", which encodes knowledge of the human tester and uses it to examine the results of thousands of test observations. Each Looker test is sensitive to the capability being tested while remaining robust to all other changes, so even changes early in the modelling process don't cause a cascade of problems. The Looker now performs over a million tests nightly, distributed over many processors. Despite this, the number of separate problems it finds (when there are any at all) is small enough for each to be followed up in detail.

This level of rigor in regression testing does come with a cost. Almost every enhancement to Trans requires a modification to the Looker as well.

Under the old delivery procedure, Trans developers made changes in a volatile development version. When the enhancement work to be included in the next release was completed, the release was placed in a stable area for independent testing while development for the next release continued in the development area. Once the independent testers were happy with the new version, it was installed for operational use, at which time the next version was normally ready for delivery to the independent test group. These handoffs appeared necessary: testing the volatile development version would be like road grading during an earthquake. Unless they were critical, problems encountered during testing normally waited until the next release to be fixed.

This delivery procedure suffered many problems, the most serious of which was our need to circumvent it to deliver critical modifications. Such modifications can be prompted by observations that need processing on a short timescale, by expensive manual workarounds, or by problems found during testing.
Under our normal procedures, a critical enhancement would be required to "wait in line" behind those already delivered while their release undergoes independent testing. It would then be batched into a release with those developed but not yet delivered. All such items, regardless of their importance, would have to be tested before customers could receive the critical change.

As a result, any customer request for a change to be delivered quickly would put us in a quandary. Should we keep to our normal procedure, and risk the customer's wrath? Should we forgo some of our testing rigor and trade quality for expedience? Should we place an error-prone patch into the testing, or even the operational version, braving the duplication of effort involved?

In practice, despite the urgency, the customers usually found the quality tradeoffs unpalatable and requested a redelivery of the development version into testing. This would delay the installation of the release being redelivered. When several time-critical changes were received in succession, the earlier ones would be held up by the delays associated with the later ones. Ironically, this resulted in emergency enhancements being delivered later than if we had adhered to our normal schedule!

Our initial solution, arrived at by a Continuous Process Improvement (CPI) team, was to increase the frequency of our deliveries. Since the duration of the testing period is proportional to the number of enhancements (and therefore the development period), we reasoned that more frequent deliveries would have a shorter testing cycle and a smaller minimum delivery time. We shortened our cycle to deliver once every month and then further to deliver once every two weeks.

Although this did shorten the lead time to allow delivery of critical modifications in two weeks, it did not solve all of our problems. A two-week delay still meant costly manual work in many cases. The two-week delivery schedule was fragile.
If testing completed late, not only would that release be late, but the next release would encompass more changes (due to the longer development time), and therefore require longer testing. We found that a slip in the testing schedule placed inordinate pressure on the testers to put the process back on schedule. We also found that, except when critical changes were requested, a release every two weeks was more frequent than customers needed. They began to object to the frequent disruption of operations and the increased need to keep track of modifications.

We began seeking a solution which addressed the root cause: the need for batching modifications into a release in order to create a stable area to allow independent testing. Removing this need required us to fundamentally alter the process by which we develop, test and deliver our software.

Under the new process, instead of incorporating modifications into the development version when the changes are made, they are retained in a "Parallel World", a separate version of some or all of the source code set aside for modification. When development completes, the parallel world is frozen for independent testing. When testing has certified the change, the parallel world is then merged back into the development version. The Looker is run overnight to make sure that the merging process did not compromise the modification, and the software is ready for release the next morning. During the testing period, the developer is free to work on other modifications, since the number of simultaneous parallel worlds is limited only by disk space. Once a day, the development version is in a state of readiness for release, allowing us to respond to customer requests for release in less than 24 hours.

This process was made much easier by the incorporation of a public-domain technology called Concurrent Versions System (CVS) into our software development process.
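The life cycle of a parallel world described above (open for development, frozen for testing, certified, merged) can be sketched as a small state machine. This is only an illustrative model: the class, state and method names below are hypothetical and do not describe the tooling actually used at STScI.

```python
from enum import Enum


class State(Enum):
    OPEN = "open"            # developer is still making changes
    FROZEN = "frozen"        # handed to independent testing
    CERTIFIED = "certified"  # testers have signed off
    MERGED = "merged"        # folded back into the development version


class ParallelWorld:
    """Hypothetical model of one parallel world (names are assumptions)."""

    def __init__(self, name):
        self.name = name
        self.state = State.OPEN
        self.fixes = 0

    def _require(self, expected):
        # Refuse any step taken out of order.
        if self.state is not expected:
            raise RuntimeError(
                f"{self.name}: expected {expected.value}, is {self.state.value}"
            )

    def freeze(self):
        self._require(State.OPEN)
        self.state = State.FROZEN

    def fix_on_the_spot(self):
        # A problem found by a tester is fixed inside the same world,
        # so the world stays in testing and no redelivery is needed.
        self._require(State.FROZEN)
        self.fixes += 1

    def certify(self):
        self._require(State.FROZEN)
        self.state = State.CERTIFIED

    def merge_into(self, development):
        # Only certified worlds may reach the development version.
        self._require(State.CERTIFIED)
        development.append(self.name)
        self.state = State.MERGED


# Two worlds are in testing at once; the urgent one is merged first
# while the larger project simply remains frozen.
development_version = []
urgent = ParallelWorld("urgent-change")
project = ParallelWorld("long-project")
urgent.freeze()
project.freeze()
urgent.fix_on_the_spot()
urgent.certify()
urgent.merge_into(development_version)
print(development_version, project.state.value)
```

The point of the model is that worlds move independently: certifying and merging one world places no requirement on the state of any other, which is what removes the need to batch changes into a release before testing.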
CVS allows multiple versions of a system to be simultaneously under development and is capable of merging them when development is complete. Creating a parallel world or merging one back into the development version can each be accomplished by a single CVS command.

This new process has radically changed the service we are able to provide our customers with no observed decrease in quality. Testers are no longer placed in the position of having to stand in the way of the developers' goal of delivering changes to users rapidly. The ability of developers to fix a problem in a parallel world and have it retested with no redelivery has removed many of the barriers to teamwork between testers and developers.

In the six months since we adopted this new scheme, we have been able to deliver critical modifications in less than 24 hours while development and testing of less important changes waited. The frequency of deliveries has actually gone down, instead of up, since software is now being installed only when customers request it. This has caused a decrease in the effort devoted to interim manual work while waiting for modifications, and customers have been able to stop maintaining the tools they used for that purpose. We are now able to offer them the improvements they need when they need them instead of apologizing for our long testing cycle.
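The paper does not say which CVS commands were used for parallel worlds. As a rough sketch, assuming each parallel world is an ordinary CVS branch, the operations could map onto the standard `cvs tag -b` (create a branch), `cvs update -r` (switch a working copy to a branch) and `cvs update -j` (merge a branch into the working copy) commands. The Python helper names and the branch name below are hypothetical; the helpers only build the command lines and do not run them.

```python
import shlex


def create_world(branch):
    """Command that creates a branch from the checked-out sources."""
    return ["cvs", "tag", "-b", branch]


def work_in_world(branch):
    """Command that switches the working copy onto the branch."""
    return ["cvs", "update", "-r", branch]


def merge_world(branch):
    """Command that merges the branch's changes into the working copy.

    Intended to be run from a working copy of the development version.
    """
    return ["cvs", "update", "-j", branch]


# Hypothetical branch name, chosen only for illustration.
branch = "pw_urgent_change"
for command in (create_world(branch), work_in_world(branch), merge_world(branch)):
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run` to execute the step. Note that `cvs update -j` changes only the working copy, so a `cvs commit` is still needed to record the merged result in the repository.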