A Managerial View of the Multics System Development
1. Introduction
A reasonable question for a software manager might be "What possible insight can I gain from the agonies of someone else's project?" Of course, not much, if one takes too literally every problem of another's experience and the reaction to it. But one of the attributes of a shrewd manager is the ability to abstract from the circumstances of others those aspects of a previous episode which are germane to his own. Moreover, successful large-scale software development efforts are sufficiently rare that their chronicles may still be viewed with the curiosity accorded a traveler to a distant land.
An objection to the study of large-scale projects which is often raised is that the only difficulty is in the word "large." If only one could do it with a small team, so the reasoning goes, one's difficulties would vanish. Such an argument ignores completely those systems which have such vast scope or construction deadlines that they exceed the capacity or technical knowledge of a small group. With time, of course, formerly large efforts become more modest in scale as the sophistication, tools, and engineering knowledge improve but there will always be externally-set pressures to engineer ever more ambitious systems.
The present paper is a case-history look at the development of the Multics System from a management point of view. The goals and features of the Multics System have already been described at length [1-6], and a summary of the technical experience of the project has been presented elsewhere [7]. Further, there have been several books which address the internal structure and organization of the system [8, 9, 10].
Thus for the present purposes only a brief recapitulation is in order. The system planning, begun in 1964, was for a new computer system which could serve as a prototype of a computer utility. Although Multics is now offered as a standard product by Honeywell Information Systems Inc., the system planning and development was a cooperative effort involving three separate organizations: the computer department of the General Electric Company (since acquired by Honeywell Information Systems Inc.), the Bell Telephone Laboratories (1965 to 1969), and Project MAC of M.I.T. (since renamed the M.I.T. Laboratory for Computer Science and then the Computer Science and Artificial Intelligence Laboratory).
The design goals were many but included:
- Convenient remote terminal use
- Continuous operation (i.e., without shutdown) analogous to power and telephone companies
- A wide range of configuration capacity which could be dynamically varied without system or user program reorganization
- An internal file system with apparent reliability high enough for users to entrust their only copies of programs and data to it
- The ability of users to share selectively information among themselves
- The ability to store and create hierarchical structures of information for purposes of system administration and decentralization of user activities
- The ability to support a wide range of applications ranging from heavy numerical production calculations to interactive time-sharing users without inordinate inefficiency
- The ability to allow a multiplicity of programming environments and human interfaces within the same system
- The ability to evolve the system with changes in technology and in user aspirations.
The above goals were ambitious ones, not because of any one particular idea, but because they had never been tied together in a single comprehensive system. Moreover, to achieve the goals, many novel techniques were proposed.
The importance of the above list of goals from a management viewpoint is that it clearly signaled a strong research and development effort added to the ordinary problems of building a large software complex. Not only had an implementation of the goals not been successfully achieved before, but there were large knowledge gaps in the likely behavior of both system and users. In particular the behavior of the system required extrapolating user attitudes to the domain of rapid-access large-memory computing, presuming what typical programming practices would be, and making assumptions of the behavioral properties of virtual memory systems.
Perhaps it is the normal naiveté of those beginning a large project -- most on the project had never done anything like it before and the average age was less than 30 -- but everyone seemed to assume that those aspects of the system development they were not responsible for would be implemented in an orderly, straightforward, clear way without schedule slippage or serious debugging difficulties. Still there were many major problems which were perceived from the beginning and others which only became apparent with time. To structure the presentation we will take in order sections on the problems perceived early, the problems detected later, a discussion of the various tools which were used, and finally some general observations on software development management.
2. Problems Perceived Early in the Project
As already mentioned, the project goals were ambitious and in turn led to technical challenges in three major areas: hardware, languages, and general software ideas. Let us take these topics up:
2.1 Hardware
The basic hardware system chosen was the newly marketed GE 635 which allowed both multiple processors and multiple memory modules. However, to support the fundamental ideas of controlled information sharing and large memory management at the user level, it was necessary to design rather radical changes in the processor architecture which allowed addressing suitable for segmentation of the memory. Moreover to relieve the user of managing the multiple levels of the physical storage hierarchy, further processor modifications had to be introduced to support the management of paging by the supervisor. In addition processor architectural changes were required for the hardware access control mechanisms so that the supervisor could properly control access to all information in the system. To support the supervisor functions, both an interrupt clock and a non-repeating calendar clock had to be designed. To support the projected large number of communication lines to terminals, it was necessary to design a special purpose input-output controller. Finally, the requirements of information storage also dictated the design of a high-transfer-rate drum as a second-level paging buffer and the design of an ultra-high-performance disk.
One property of hardware design is that it must usually be done first and yet hardware problems are the most difficult to correct or change. This results partly from the rigidity of the engineering practices and procedures which must support the maintenance of the hardware and partly from the large fan-out of design decisions which are a consequence of the basic hardware architecture. This property, which emphasizes prudence, along with the number of design problems listed above, had the result of producing unforeseen delay in hardware delivery schedules. Not only was the design time prolonged by careful checking for consistency and coherence but the unfamiliarity of the new ideas often led to communication difficulties among hardware and software designers.
Another cause of delay was confusion over reliability specifications of hardware. Often those responsible for software would expect hardware when delivered to be at the ultimate reliability level; conversely, hardware engineers would expect reliability to follow a learning-time experience curve and reach the final specifications only asymptotically after an extended period of time. The result was unanticipated delay when unreliability seriously hampered the system debugging process. Moreover a large amount of unanticipated software effort was expended in developing software programs to recover from transient hardware failures.
These learning-time difficulties were particularly apparent in the information storage parts of the system and affected the core memories, the paging drum, and the disks. In the first two cases the difficulties were eventually overcome but in the case of the disk design, several unplanned substitutions became necessary using later technologies.
Further complications arose from the specialized input/output controller, which, while effective, was unusual in design and intricate in operation. This resulted in an inordinate amount of hardware and software design time and a substantial engineering burden to support its relatively unique properties. Later in the project, as the overall system design became streamlined and simplified, the controller was replaced by a more conventional one which was fabricated using a later technology.
The significance of the above hardware problems, typical of the industry at the time, lies in their cumulative impact upon the project. In fact the overall set of design problems was anticipated; but it is also fair to say that it was not properly appreciated then how many things could go wrong -- or in effect -- how much the state-of-the-art was being pushed. In retrospect, the major miscalculation was not to have anticipated the normal unreliability of newly developed hardware, for if there had been better prediction there could have been better planning. The result, of course, was delay in the evolution of the project and delay in the discovery of other problem areas.
2.2 Languages
An immediate consequence of a significant departure from previous system design is the obsolescence of all previous software. Not only is an assembler not available but the usual compilers and general software aids and debugging tools are also missing. Partly because the construction of assemblers was felt to be understood, several false steps were taken. Initial attempts to patch over an assembler from the related GE 635 were finally abandoned when the cumulative magnitude of the system architectural changes began to be appreciated. A second effort to develop an elaborate macroassembler was also abandoned in the face of unexpected difficulties which produced an ever receding completion date. Finally a primitive assembler was built to serve both the purpose of direct coding and as a final pass of a compiler. As with many other activities, the simplest form survived.
As unexpected as the assembler difficulty was, it was dwarfed by the problem which arose out of the compiler implementation. Most of the system software was to be programmed in the compiler language, and as a result there was a strong interest in choosing one which was advanced and comprehensive in its design. The language chosen was PL/I, partly because of the richness of its constructs and partly because of the enthusiasm of those planning to implement the compiler. The difficulties in carrying out this implementation are described in great detail elsewhere [11], but one can summarize by noting that the language implementation became an inadvertent research project. Not only were efficient mechanisms unknown for many of the language constructs, partly because of the consequences of their interaction, but major interfaces also had to be designed to the supporting system environment, which was itself under development. One compiler version, to be built by a vendor, never became operational when it belatedly became clear that typical FORTRAN compiler construction techniques did not extrapolate well. There was no contingency plan for this disaster, and massive efforts were required to patch a "quick and dirty" compiler into a useable tool. Only several years after the start of the project was a compiler of the quality anticipated by the PL/I designers finally produced.
It is now obvious that the Multics development would have been much easier if a simpler language had been chosen for implementation. Choosing another of the languages existent in 1965, however, would still have required a very difficult compromise. Alternatively, developing a new language (or subsetting PL/I) would have required a firmness and dogmatism which was incommensurate with the structure of the project organization. Nevertheless, in hindsight, it is believed that one of the two latter courses would have been the more effective development path. However, having been successfully developed, PL/I is now one of the strengths of the system.
2.3 New Software Ideas
In order to implement the system goals described earlier a variety of new software techniques were needed. Notions of user management of his memory and storage space were to be approached with the technique of using segments. Each segment would represent a user-namable region of memory which would maintain its identity even while a part of the user's active memory address space. Further, the sharing of information, data and programs would be done by the user specifying for every segment a list of those allowed each of several classes of access. These lists themselves could be varied by a user at any time. Segments themselves were to be stored in a physically multi-level and logically hierarchical file system which each user could name and structure to suit his filing needs. Both the segmentation and file system were to be supported by system-mediated paging which would dynamically and automatically move information among the physical memory levels in response to execution demands and the duration of reference inactivity. This virtual memory system was to be unique in that the entire file system was to be included in the one-level store that the processor acted upon.
Finally the user environment would include conventions which would allow all procedures to be possibly recursive, to be pure (i.e., not self-modifying), to allow automatic retrieval and binding of sub-programs only upon execution demand, and to support rings of protection as a generalization of the user-supervisor relationship. Most of these ideas were individually understood but never before had there been an attempt to synthesize a coherent system containing all of them.
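A minimal sketch, in present-day C, suggests how a per-segment access control list of the kind just described might be represented and consulted; the structure, names, and access classes here are invented for illustration and do not reflect the actual Multics mechanism.

```c
/* Hypothetical sketch of a per-segment access control list; invented
 * names, not the Multics implementation. */
#include <stdio.h>
#include <string.h>

enum access { ACC_READ = 1, ACC_WRITE = 2, ACC_EXECUTE = 4 };

struct acl_entry {
    char user[32];       /* principal granted access to the segment  */
    unsigned modes;      /* bitwise OR of the access classes allowed */
};

struct segment {
    char name[64];       /* user-chosen name within the hierarchy    */
    struct acl_entry acl[8];
    int acl_count;       /* the list may be varied by its owner at any time */
};

/* Does the named user hold the requested access mode on the segment? */
static int has_access(const struct segment *seg, const char *user, unsigned mode)
{
    for (int i = 0; i < seg->acl_count; i++)
        if (strcmp(seg->acl[i].user, user) == 0)
            return (seg->acl[i].modes & mode) != 0;
    return 0;   /* users not on the list get no access at all */
}

int main(void)
{
    struct segment s = { "report_gen", { { "smith", ACC_READ | ACC_EXECUTE } }, 1 };
    printf("smith may execute: %d\n", has_access(&s, "smith", ACC_EXECUTE));
    printf("jones may write:   %d\n", has_access(&s, "jones", ACC_WRITE));
    return 0;
}
```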
It was in the software synthesis area where the difficulties were best anticipated -- probably because new ground obviously had to be broken. Nevertheless it is fair to say that the iterative nature of the design process was not sufficiently appreciated. The need for iteration develops in part because of the magnitude of the effort and the inability of a single individual to comprehend the effect of a particular module design on the system behavior as a whole. Not only does he usually not know the expected usage pattern of his software path, but it is hard to estimate the impact on the system performance of an occasional exercising of it. A straightforward approach is to create a perhaps crude and incomplete system, begin to use it and to observe the behavior. Then, on the basis of the observed difficulties, one simplifies, redesigns, and refines the system. In the case of Multics, most areas of the system were redesigned as much as half a dozen times in as many years.
Such a redesign is done with hardware where breadboarding and test machines are common occurrences. There it is assumed that design iteration will occur and no one expects to go into production using a prototype design. It is only in recent years that it has begun to be recognized that software designing is similar.
A second reason for underestimating the need for iteration is that systems with unknown behavioral properties require the actual implementation of design iterations which are intrinsic to the design process but which are normally hidden from view. Certainly when a solution to a well-understood problem is synthesized, weak designs are mentally rejected by a competent designer in a matter of moments. On larger or more complicated efforts, alternative designs must be explicitly and iteratively implemented. The designers, perhaps out of vanity, often are at pains to hide the many versions which were abandoned, and if absolute failure occurs, of course one hears nothing. Thus the topic of design iteration is rarely discussed. Perhaps we should not be surprised to see this phenomenon with software, for it is a rare author indeed who publicizes the amount of editing or the number of drafts he took to produce a manuscript.
It is important to recognize too that "state-of-the art" is to be measured with respect to the experience of the team actually doing the work. Thus, even though a particular design already may have been implemented successfully somewhere, a team with little background related to that design should recognize that their activities will contain a large component of uncertainty due to the research they must do to get up to speed. Overlooking this "locality of expertise" can lead to the misjudgment that a task is a routine engineering effort rather than the more realistic characterization as a research and development project.
In summary, when the inevitable design indeterminacy characteristic of a large, state-of-the-art system is coupled with inexperienced project management, scheduling will always be a problem, since predicting the pace of research is at best an educated guess. Planning explicitly for at least one complete design and implementation iteration may lead to some apparently extravagant schedules, but for those developments where the solutions are poorly understood such planning usually proves to be realistic, if not optimistic.
2.4 Geographical Separation
Geographical separation (Massachusetts, New Jersey, Arizona) was an obvious difficulty in pursuing the development of a system, yet one which was felt to be outweighed by the strengths which each organization brought to the project. On a project with a strong developmental flavor it is hard to eliminate a need to interact with fellow designers and implementors on an impromptu basis. Three types of technology were employed but the communication obstacles were never completely overcome. These technologies were the telephone, the airplane, and the xerographic copying machine. The role of the telephone, which included transmission of programs and data, was clearly important, as was the availability of good one-day airline service between two of the sites. Not so obvious was the importance of the modern copying machine industry which just came to maturity as the project began. The ability to disseminate rapidly memoranda describing prototype designs was vital to the success of the project; design interactions proceeded in parallel and at an enlivened pace while ideas were fresh in the heads of the authors and readers. Moreover by allowing the fast distribution of definitive design memoranda, technical control of the project was greatly enhanced. The key point is that early capture in written form of design descriptions is vital for orderly development, and one must be careful not to let formal publication mechanisms be a hindrance.
2.5 Three-Organization Cooperation
The reasons for wanting multiple organization participation were strong. Each organization had unique ingredients contributing to the ambitious project goals. The computer manufacturing, marketing, and maintenance could best be done by a company dedicated to that business. The computational expectations and requirements of a diverse company with needs ranging from engineering to business management could clearly be supplied by the Bell Laboratories. Moreover they had special competence in the communication area and had shown themselves to be inventive and sophisticated users of previous computer systems. Lastly the university, often a source of innovation, could be expected to approach problems without being constrained by past solutions; certainly several years of pioneering development of time-sharing systems gave evidence of this.
And yet there were problems too. Most important was that the project remained a cooperative since none of the organizations felt able to accept the ascendancy of the others. The effect was that of a three-cornered marriage where success depended on each party wanting to make the arrangement work. Such an arrangement was obviously organizationally weak. Decision-making depended on tacit agreements of leadership responsibility and more importantly on a consensus by the half-dozen or so major technical leaders in the three organizations.
Diffuse responsibility along with intra-organizational allegiances required diplomacy and tact. More significantly, though, a loose organizational structure made ruthless decisions difficult and encouraged design by aggregation. These problems were recognized and consciously fought, but decisions to kill a design or, more frequently, an implementation were often postponed until negative evidence was nearly overwhelming. As a result of this weakness in the rejection mechanism, schedules inevitably became prolonged.
3. Problems Which Became Apparent Later
One distinction between those problems just described and those which developed later is that the latter are more systemic in character. Many of these problems were a result of the project stresses produced by the delays resulting from the first set of problems.
3.1 Different Organizational Goals
It is of course unrealistic to expect every organization involved in a project to have the same objectives. In the initial enthusiasm of a project it is easy to ignore differences. In the case of the Multics project the three organizations attached different levels of importance to 1) research and development in computer science, 2) development of a commercial product, and 3) useful computational service in the short term. The inclusion of several objectives in turn left unresolved priorities of time, money, or concept demonstration. The default effect was to favor concept demonstration but not without great stress because of the unplanned delay which resulted.
3.2 More Than a Two-Year Project
As soon as a project extends beyond a couple of years, several new needs gradually appear. Foremost is that turnover of highly trained and specialized staff develops. Although the reasons for departure were usually unrelated to the project, nevertheless replacements had to be recruited and trained. Because of the highly developmental character of the project, bringing a new person "up to speed" usually took 6 to 9 months. More importantly, to maintain this training process, a generous portion of the project personnel resources had to be directed towards documentation and education.
One might argue staff turnover is an obvious consequence of delay in a large project. Not so obvious however is sponsor turnover. In some form, every project has at least one sponsor, and in the case of Multics there were three. The initial sponsors were enthusiastic supporters of the project (or it never would have started!) but also had many other responsibilities. With time promotions, changes, and reorganizations are inevitable and executive replacements are made. There then develops the need to repersuade key persons that project concepts and goals are both desirable and feasible. Such repersuasion represents an overhead requiring the very best project personnel since the life of the project depends upon successfully educating and communicating. One can only conclude that long projects are especially hazardous because of the mandatory success of this "reselling" requirement. Obviously any project segmentation into useful subprojects enhances the likelihood of success since periodic exhibits of progress are more easily demonstrated to sponsors.
A long project also has to address the need for evolution. Fortunately the Multics project had such a requirement as one of its goals, and evolution became a major factor in the survival of the project. Not only were there changes in the level of technology, e.g., from transistors to integrated circuits and from drums to large memories, but also there were changes in the relative importance of conceptual objectives and the techniques used to achieve them. The burdens which evolution requirements place upon a system are many but include 1) careful design of functionally modular domains and the procedures within them, 2) argument passing and control transfer conventions which are fail-safe, 3) enforcement of programming standards, and 4) in general a vigorous adherence to good software engineering practices. Such practices on a first attempt are frequently less effective than artful but gimmicky short-cuts. Moreover it is important to recognize that there is a design-time overhead associated with the discovery of general, but effective, techniques. Nevertheless, if evolution is to be achieved, this design investment cannot be avoided.
3.3 Misestimated Schedules
Almost all large projects face the difficulty of schedule slippage. Partly the difficulty is inexperience in following an unknown path. Some of the trouble is that individuals are often unconsciously wishful. Usually there is no realistic penalty for a slipped schedule and this encourages optimism. Partly too there is miscommunication of the significance of milestones. For example, software modules said to be complete may still have serious bugs that produce system "crashes." Occasionally also individuals estimate too conservatively either out of personal reluctance to follow a particular design strategy or because of an overzealous desire for a "safe" schedule. The Multics project had all of these problems but in the latter phases, as individuals and managers became more experienced, schedule estimations became more and more reliable. No panacea was ever discovered except that of better familiarity with the tasks to be accomplished.
3.4 Imbalanced Resources
Perhaps it was a carryover from the early computer system days when successful production of hardware was the primary problem but the development of Multics software was hampered by a shortage of computer time. This shortage was in part due to unanticipated delays generating a sudden need for more help; clearly, system programmers were available on a shorter time scale than hardware so the response was to increase the programming staff. However these programmers in turn created more demand for computer resources and greater communication and coordination problems. Thus for short-term reasons some long-term delay was probably introduced [12]. More importantly, because the budgets for personnel and for equipment were handled differently, a non-optimum mix of resources was applied. In retrospect it would have been better to add more hardware even at the expense of reducing the programming staff.
3.5 Unnecessary Generality
Because a major requirement of the system design was that it achieve a consensus endorsement, it was always easier to design for generality. Some of these design efforts were so expansive that they bordered on the grandiose and were easily detected, but far more designs suffered from the subtle problem of taking on too many requirements at once. Here the judgment was a pragmatic one, namely, that the resultant mechanisms were either too complicated or too ineffective. Ponderous mechanisms, once programmed, were often allowed to go into the debugging phase; hindsight suggests it would have been better to budget in advance the amount of program that a set of ideas was deemed to be worth and to make their initial implementation as lean as possible.
4. Management Tools and Their Effectiveness
Although system developers often bemoaned the lack of tools for development and management often suspected excessive tool building, it is probably true that most projects could benefit from more attention to the methodology of design, construction, and production. The Multics development was no exception. However, during the history of the project, a significant number of techniques and ideas were used. A list of the more important ideas, with some brief remarks, is:
- High Level System Programming Language: The best comment that can be made is that the system would never have been completed without the use of a high level language for most of the system programming [11].
Although not without problems, the use of the high level language made each programmer a factor of 5 to 10 more productive in a coding sense and more concerned with the semantics than the syntax of modules.
- Structured Programming: Although the phrase had not been coined, structured programming principles were used under the guise of "good engineering practice". The use of the language PL/I, the establishment of call/return conventions, and the conventions for argument passing and intermodule communication were all aspects of this usage. The model of hardware design and engineering served as a major inspiration. Although good engineering practices can never be a complete solution, since they only describe a style, they imposed an order on the project which was a strong sinew of strength.
- Design Review: Perhaps the most difficult problem in designing a complex system is assigning the right person to the right job. Prior experience is not always a good indication of how a person will perform in a new situation. Youth and immaturity make judgment suspect. Thus the design process must be approached warily. In the case of Multics the general strategy was to let design leadership be exposed rather than imposed. Potential designers were first asked to write position papers describing the design problems, their scope, and realistic solutions. If these position papers were persuasive, a design document was next initiated which proposed a particular mechanism (and omitted alternative designs). If after a review of the document by his technical peers a consensus was reached, a set of module designs was prepared by the designer. In turn, the same designer was then expected to implement and debug his ideas, perhaps with assistance but without loss of responsibility.
The above process, although not flawless, was very effective in forcing ideas into written form suitable for design review. By coupling design with implementation responsibility communication problems were minimized. And the written design document became a part of the vital System Programmers' Manual which was the definitive description of the system. This manual became a crucial educational tool when staff turnover developed in the later phases of the project.
The above design process was carefully applied in the early design stages but in retrospect could have been carried further. In particular a rigorous review after programming but before debugging can be of immense value in minimizing waste effort and debugging time. Such a review should include mandatory reading of all code by at least one other peer. One should expect a scrutiny of style, logic, and the overall algorithmic behavior. On those occasions when such practices were applied, either major improvements usually occurred, or in other instances entire design strategies were revised. The principal obstacles to universal application were the absence of a disciplined design tradition among programmers and the occasional unwillingness by managers "to waste two men on the same job."
- Test Environments: In at least two instances, the file system and the communication system for terminals, it was possible to isolate and simulate the input-output behavior of a section of the system. Such isolation was of immense value when the system was being upgraded since it decoupled the debugging of major areas. Not only did such decoupling reduce the number of system integration bugs, but it significantly reduced the total debugging time since integration bugs were usually the most difficult to analyze.
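A minimal sketch, in C with invented names, of the isolation idea in the preceding item: the code under test is written against a small input-output interface so that a simulated device can stand in for the real drum or disk during debugging. It is only a schematic of the technique, not Multics code.

```c
/* Hypothetical sketch: a simulated block device behind a small I/O
 * interface, so a subsystem can be debugged in isolation. */
#include <stdio.h>
#include <string.h>

#define BLOCKS 16
#define BLKSZ  64

struct io_device {
    int  (*read_block)(void *ctx, int block, char *buf);
    int  (*write_block)(void *ctx, int block, const char *buf);
    void *ctx;
};

/* Simulated device: an in-memory array of blocks used only for testing. */
struct sim_disk { char blocks[BLOCKS][BLKSZ]; };

static int sim_read(void *ctx, int block, char *buf)
{
    if (block < 0 || block >= BLOCKS) return -1;
    memcpy(buf, ((struct sim_disk *)ctx)->blocks[block], BLKSZ);
    return 0;
}

static int sim_write(void *ctx, int block, const char *buf)
{
    if (block < 0 || block >= BLOCKS) return -1;
    memcpy(((struct sim_disk *)ctx)->blocks[block], buf, BLKSZ);
    return 0;
}

int main(void)
{
    struct sim_disk disk;
    memset(&disk, 0, sizeof disk);
    struct io_device dev = { sim_read, sim_write, &disk };

    /* The subsystem under test calls only the interface, never real hardware. */
    char out[BLKSZ] = "root directory", in[BLKSZ];
    dev.write_block(dev.ctx, 3, out);
    dev.read_block(dev.ctx, 3, in);
    printf("%s\n", in);
    return 0;
}
```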
- Production Techniques: Most important of all the software tools used for development was the system itself. The rate of system improvement went up dramatically after it was possible to do all software development in a compatible, self-consistent environment. Such development use of the system for itself was not possible until more than four years after the beginning of the project. This long delay strongly suggests that more effort should have been applied to establishing the sub-design of a skeletal subsystem which could have been placed in operation sooner and then evolved into more complete form.
A consequence of using the system itself for development and of the design goals of easy maintenance and evolution was the option of rapid system changes. Mechanically, changes to some parts of the system could be made literally in seconds while more central changes might require 10 minutes. It was rapidly found that the principal obstacle to change was the ability to update and propagate the system documentation information. Gradually a pattern developed where a new system was installed once or twice a week under the direction of a single person acting as an editor (in the magazine sense) who also was accountable for any reliability problems which developed. Gradually, too, the process of submitting and assembling systems was made more and more automatic, thus lessening the chance of human error. The ability to make system changes without long delays was especially efficient since it allowed the consequences of module changes to develop while the details were still fresh in programmers' minds.
- Management and Performance Tools: Various tools which were used in evaluating system performance have already been described elsewhere [13]. Perhaps the most fundamental idea was the decoupling of the gathering of event information from its presentation. Thus module writers were encouraged to record significant events in single memory cell counters; it was then possible later to write programs which would analyze the raw data into more meaningful form.
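A minimal sketch, in C, of the counter discipline just described: modules record events by incrementing single memory cells, and a separate report pass interprets the raw cells later. The counters, names, and numbers are illustrative only, not the Multics instrumentation itself.

```c
/* Hypothetical sketch: single-cell event counters, decoupled from the
 * program that later presents them. */
#include <stdio.h>

static unsigned long page_faults;     /* incremented in line by module code */
static unsigned long segment_faults;
static unsigned long disk_reads;

static void handle_page_fault(void)
{
    page_faults++;                    /* record the event; no formatting here */
    /* ... actual fault handling would go here ... */
}

/* A separate "presentation" program reads the raw cells and derives
 * whatever rates or ratios are meaningful at analysis time. */
static void report(double elapsed_seconds)
{
    printf("page faults:    %lu (%.1f per second)\n",
           page_faults, page_faults / elapsed_seconds);
    printf("segment faults: %lu\n", segment_faults);
    printf("disk reads:     %lu\n", disk_reads);
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        handle_page_fault();
    report(10.0);
    return 0;
}
```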
PERT charts were attempted in early stages of the project but with very mixed results. The difficulty was that the relational structure of the chart would drastically change from month to month as a result of unexpected delays or failures in key tasks. However the exercise of preparing the chart was effective in forcing designers to think through better the problems of system integration and planning.
Inventories of completed modules, their sizes, and the status of their debugging were kept, especially in the early stages. The contribution they made was more valuable in demonstrating lack of progress than in telling how nearly accomplished a task was. Perhaps the most effective way to view an inventory is as a lower bound of the work to be performed.
5. General Observations
Although comments have been made throughout there are further observations possible regarding the overall problems of software development management.
5.1 Size Is a Liability
Traditionally the best software has been produced by individuals and not teams. Not the least of the reasons for this phenomenon is the need to communicate among the team members. Clearly if a project is totally disorganized, an n-person team needs on the order of n^2/2 communication links. If a full hierarchical structure is assumed, one only needs a bit more than n links. The rub is that one has also introduced (assuming a fan-out of 6) on the order of log6 n levels to the team and a new danger of managerial isolation.
Because individuals when reporting upwards in a managerial frame often filter out bad news, for fear failure will be equated with incompetence, it is necessary to conclude that "perfect" hierarchies are especially dangerous in technical management. What is being observed here is not that n should be 1 but rather that enlarging the personnel on a project should be viewed as a major management decision.
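The link counts mentioned above are easy to tabulate. The short C program below, using assumed team sizes, contrasts the roughly n^2/2 links of a fully interconnected team with the roughly n links (and about log6 n levels) of a strict hierarchy with fan-out 6; the team sizes are arbitrary examples.

```c
/* Hypothetical tabulation of communication links versus team size. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    int sizes[] = { 10, 25, 50, 100, 200 };    /* assumed team sizes */
    for (int i = 0; i < 5; i++) {
        int n = sizes[i];
        double all_pairs = n * (n - 1) / 2.0;  /* ~ n^2/2 links, no structure    */
        double tree      = n - 1;              /* roughly n links in a hierarchy */
        double levels    = log(n) / log(6.0);  /* ~ log6(n) levels at fan-out 6  */
        printf("n = %3d: all-pairs links = %6.0f, hierarchy links = %4.0f, "
               "levels ~ %.1f\n", n, all_pairs, tree, levels);
    }
    return 0;
}
```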
5.2 Inhomogeneity of Technical Understanding
The ultimate limitations to the complexity of a large system are the resources required for its fabrication and widespread understanding of how it should function. In an overly ambitious project, managers who do not understand the details of what they are managing are easily blustered and misled by subordinates. Conversely, low-level staff may be unable to appreciate the significance of details and fail to report serious problems. In moderate form such confusion generates extra design iterations.
Such inhomogeneity of system knowledge is hard to quantify but a useful test is to consider how interchangeable all the personnel on a project are. To the extent each manager cannot program the lowliest module and the most junior programmer does not understand the strategic system objectives, the system design process is vulnerable to mishaps of misunderstanding. Of course the converse does not guarantee success, but the likelihood of detecting problems at an early stage is certainly higher.
5.3 Decision Costs
Inevitably in building a large scale system many decisions must be made on the basis of very incomplete information and one might wish to consider contingency planning. Unfortunately contingency planning if it is done with complete redundancy is particularly costly both in requiring major budgeting and in the use of key leadership. Moreover, even the most disciplined professional engineer may have serious ego problems when the subproject he has worked on for many months is deemed second best and scrapped. One can endure these problems on projects of great national importance such as the NASA program placing men on the moon or the war-time Manhattan Project developing nuclear weapons, but ordinarily major contingency planning is not available as a realistic hedge for decisions.
If only partial contingency planning is done, there will be increased project complexity. Moreover unexpected events or developments will inevitably occur. As a result a great deal of the success of a project will depend upon the improvisational skill and the resources available to the technical managers. The perils of insufficient effort being devoted to decision making are obvious, but it is argued that there is a balance between the extreme of "overdeciding" and maintaining the flexibility to react to the unforeseen.
Not all decisions are equally important and deciding the relative importance of a decision is one of the more critical tasks a manager faces. Clearly the impact of some decisions such as the choice of a brand of computer or a particular programming language can permeate a project completely. Not so obvious is the more subtle effect of software conventions such as the choice of a character set or a module interface. Again design understanding can only reinforce the decision making process with the likelihood of wiser choices.
5.4 Psychology
There are several places where psychological issues enter into software projects. In contrast to many engineering projects, one cannot readily "see" progress on a software system. This intangible aspect dictates that a more expert means must be employed to gauge the level of project accomplishment. In such a situation the selection of frequent "milestones" becomes an important management consideration. It is especially important that two properties are met: 1) there is an unequivocal understanding of when the milestone is passed, and 2) the significance of the milestone relative to the final project objective is comprehensible to all levels of personnel.
A second area where psychology enters into a software project is that of the programming staff. Frequently programming teams are composed of imaginative, energetic young staff members who all too often are inexperienced with large projects (not many persons today have worked on two!), naive in a broad sense, and sometimes simply irresponsible. One might wonder why one assigns them to tasks at all. Unfortunately, because of the immaturity of the software profession, the alternative of staffing with older, more mature programmers can just as easily be worse. Too often the field has evolved faster than the ability of individuals to grow technically. As a result excessive inflexibility and dogmatism are frequently observed traits of older programmers, many of whom appear to be trying to relive some programming triumph of a decade ago.
One consequence of a professionally immature staff is that attempts by management to monitor individual performance are frequently resented. This monitoring problem is believed to be easing as it becomes clearer what standards of professional performance are. Today most programmers consider it reasonable that their programs be audited for quality before acceptance whereas only a decade ago such an inspection often precipitated a personnel crisis.
Part of the immaturity problem faced by large projects is traceable to the rapid development of the computer field and the consequent shortage of good programmers. But another part has a more insidious reason. As a project evolves into new areas of development and accomplishment it is inevitable that individuals will develop key knowledge and become genuine experts in subcomponents of the effort. Such key knowledge or "know-how" is not rapidly transferred and may require six to twelve months of effort even with highly qualified and competent programmers. In a highly structured and intricate project, knowledge compartmentalization can lead to serious project difficulties if managers are inhibited (even unconsciously) from exercising effective control over a key person for fear that he may quit. With time the professional expectations of a programmer are rising, so this problem should ease in the future. Nevertheless it will probably always be true that the only prudent course a software manager can take is to have considerable personnel redundancy in all critical knowledge areas. In this way one can keep the project vulnerability to intimidation at an acceptable level and one also has insurance against incapacitating accidents to those playing pivotal roles.
5.5 Evaluation of Progress
One of the more unsatisfying areas in discussions of large software projects is that of evaluating progress. Clearly one can set schedules for major milestones with experienced estimation (and often a good deal of guessing and hedging). And it is easy enough to monitor progress and even debate the reasonableness of such items as only one week for debugging a particular module or two weeks to integrate a pair of large subsystems. What is hard, however, is to prejudge the overall performance and acceptability of the result. Frequently software specifications are diffuse and incomplete (e.g., make the module easy to maintain) and implementations of subsystems can unexpectedly resemble the seaweed in the Sargasso Sea.
Simulation is often suggested as an answer to performance questions, but with large software systems the difficulty of correctly modeling the system is commensurate with that of building the system itself. Just because the performance evaluation question is hard does not mean one should behave in ostrich fashion and ignore it. Rather the emphasis should be on developing crude, quickly evaluated system models and measures which allow one to make rough predictions of system performance.
For example, there should be an expectation for the amount of program required to implement the function of every subsystem so that the input-output and primary memory impact can be computed. Critical software paths should be counted out instruction-by-instruction to get lower bounds of performance. Worst case timing and capacity estimates should be developed. The important thing is that all the models and calculations be kept simple enough so that they can be frequently recomputed as better information develops.
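As an illustration of how simple such a model can be, the following C fragment turns a hand count of instructions along one hypothetical critical path into a lower-bound processor load; every number in it is an assumption to be replaced as better information develops.

```c
/* Hypothetical back-of-envelope performance model; all figures assumed. */
#include <stdio.h>

int main(void)
{
    long   path_instructions    = 2500;  /* counted instruction-by-instruction */
    double usec_per_instruction = 1.0;   /* assumed processor speed            */
    long   events_per_second    = 200;   /* assumed worst-case demand          */

    double usec_per_event = path_instructions * usec_per_instruction;
    double cpu_fraction   = usec_per_event * events_per_second / 1e6;

    printf("lower bound: %.0f usec per event, %.0f%% of one processor\n",
           usec_per_event, cpu_fraction * 100.0);
    return 0;
}
```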
An oversimplified modeling and estimation approach can easily be off by 25 to 50%. However large software projects usually have not failed by these amounts, large as they are, but rather have foundered on orders of magnitude miscalculations.
But if the above is the case, of what value can an early prediction of disaster be? The answer lies in the fact that, with early warning, one has the chance of redirecting a project without fatal compromise of the objectives. Necessity is a great stimulus to seeking out more effective solutions, especially when the survival of the project is at stake. Further, there is a process of trading off performance with features. Just as an experienced hiker filling his backpack may start with a large number of almost "essential" items and then reevaluate their importance and discard some of them as he begins to weigh the total load, so too can the essentialness of program features be reevaluated. Thus the real importance of early performance warnings is that they not only can save people time and computer time but may even allow a project redirection from a disastrous course.
6. Conclusions
If one observed a long, involved military engagement one would not be inclined to form a single conclusion. And so it is with our experience with the Multics system development. One can observe though that despite the unexpectedly large technological jump which was undertaken, the development effort did succeed, and today the system has become a viable commercial product.
There were four major reasons we would single out for the successful development. These are:
- The system was built to evolve. Without this property, one has a ship without a rudder. With it one can revise one's course as the unexpected is encountered or as one's destination changes.
- The system goals were articulated in an extensive body of papers and memoranda. As personnel changes inevitably occurred, the transmittal of philosophical ideas was possible without diverting the more important team members.
- The system was implemented in a higher-level language so that the effectiveness of each programmer was amplified and the project size minimized.
- The system was implemented by a development team whose members were extraordinarily loyal and dedicated to the project goals. By conventional practice, the project management should not have been able to function effectively because of its loose structure. The organizational weakness was overcome by the collective determination of the individual team members who wanted the project to succeed.
Of the above reasons the first, evolvability, is the most important technically. But one cannot discount the last, which one might label inspiration. For without it, no really difficult project can succeed.
7. References
1. Corbató, F. J. and V. A. Vyssotsky, "Introduction and overview of the Multics System," AFIPS Conf. Proc. 27, 1965 FJCC, Spartan Books, Washington, D.C., 1965, pp. 185-196.
2. Glaser, E. L., J. F. Couleur, and G. A. Oliver, "System design of a computer for time sharing applications," AFIPS Conf. Proc. 27, 1965 FJCC, Spartan Books, Washington, D.C., 1965, pp. 197-202.
3. Vyssotsky, V. A., F. J. Corbató, and R. M. Graham, "Structure of the Multics Supervisor," AFIPS Conf. Proc. 27, 1965 FJCC, Spartan Books, Washington, D.C., 1965, pp. 203-212.
4. Daley, R. C. and P. G. Neumann, "A general-purpose file system for secondary storage," AFIPS Conf. Proc. 27, 1965 FJCC, Spartan Books, Washington, D.C., 1965, pp. 213-229.
5. Ossanna, J. F., L. Mikus, and S. D. Dunten, "Communication and input/output switching in a multiplex computing system," AFIPS Conf. Proc. 27, 1965 FJCC, Spartan Books, Washington, D.C., 1965, pp. 231-241.
6. David, E. E., Jr. and R. M. Fano, "Some thoughts about the social implications of accessible computing," AFIPS Conf. Proc. 27, 1965 FJCC, Spartan Books, Washington, D.C., 1965, pp. 243-247.
7. Corbató, F. J., C. T. Clingen, and J. H. Saltzer, "Multics - the first seven years," Proc. SJCC, May 1972, pp. 571-583.
8. Organick, E. I., The Multics System: An Examination of Its Structure, MIT Press, Cambridge, Massachusetts, and London, England, 1972.
9. Watson, R. W., Timesharing System Design Concepts, McGraw-Hill, New York, New York, 1970.
10. Ikeda, Katsuo, Structure of a Computer Utility: Anatomy of Multics (in Japanese), Shokoda Co. Ltd., Tokyo, Japan, 1974; second edition, 1976.
11. Corbató, F. J., "PL/I as a tool for system programming," Datamation 15, May 6, 1969, pp. 68-76.
12. Brooks, Frederick P., Jr., The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, Reading, Massachusetts, 1975. (See especially Chapter 2.)
13. Saltzer, J. H. and J. W. Gintell, "The instrumentation of Multics," Communications of the ACM 13, 8 (Aug. 1970), pp. 495-500.
This research was supported by the Advanced Research Projects Agency of the Department of Defense and was monitored by the Office of Naval Research under contract number N00014-70-A-0362-0006.
Presented at the Conference on Research Directions in Software Technology, Providence, Rhode Island, October 10-12, 1977. The conference proceedings were published as Research Directions in Software Technology edited by P. Wegner, MIT Press, 1979.
(Also published in Tutorial: Software Management, Reifer, Donald J. (ed.), IEEE Computer Society Press, 1979; Second Edition, 1981; Third Edition, 1986.)
(Preprint handed out to Multics project members as MTB-354.)
© Copyright 1979, M.I.T. Press. Posted by permission.