andJOHN W. GINTELL
An array of measuring tools devised to aid in the implementation of a prototype computer utility is discussed. These tools include special hardware clocks and data channels, general purpose programmed probing and recording tools, and specialized measurement facilities. Some particular measurements of interest in a system which combines demand paging with multiprogramming are described in detail. Where appropriate, insight into effectiveness (or lack thereof) of individual tools is provided.KEY WORDS AND PHRASES: instrumentation, performance measurement, multiprogramming systems, measuring tools, system analysis, multics, metering, event tracing, demand paging, script driven measurement
In the construction of a modern, complex computer operating system, sophisticated tools are needed to measure what is going on inside the system as it runs. The list of hardware and software tools and techniques used for the measurement of Multics is interesting both from the point of view of what has proved to be important to measure and what has not. Multics is a project whose intent is to explore the implications of building a comprehensive computer utility. The specific goals of Multics are described in a series of papers written in 1965 [1]; briefly, the objective is to create a computer operating system centered around the ability to share information in a controlled way and permitting application to a wide variety of computational jobs. A spectrum of user services, including a hierarchical file organization, sharing of information in core memory, dynamic linking of subroutines and data, parallel processing, and device independent input output facilities, characterizes the system and contributes to a complexity that makes careful instrumentation mandatory.
Two implementation techniques used in Multics call for specialized measurements. The first of these is a multiprogrammed multiprocessor organization, chosen to facilitate continuous operation of the utility and for ease of system scaling. The second technique is exploitation of an ability to begin executing a program which is not completely loaded into primary memory: this technique, usually named demand paging, is intended to exploit and encourage a tendency of programs to localize their references to primary memory in any given period of time. Since these two techniques when applied simultaneously require interacting multiparameter controlling algorithms, measurements must be made to check on the resulting performance and to allow adjustment of the parameters.
Multics, as a research project, contains new ideas arid new combinations of old ones. As a result, in its design there have been a dismayingly large number of choices to be made: strategies, algorithms, parameter settings, and emphasis on importance of design and speed of individual modules. Since the presumption was made at the start that some wrong choices would be inevitable, there has been an emphasis on integrated instrumentation from the earliest design of the system. The result has been an ability to rapidly identify bottlenecks. In particular, two effects have been observed:
Many of the measurement techniques used on Multics are not new. They are mentioned anyway, because it is the array of techniques used together which has been valuable, and also the relative importance of various techniques, old and new, is different in the Multics environment than elsewhere. One should not presume that all the measurement techniques described here were thought out in advance, though many were. Much of the experience in measuring Multics has been discovering what additional measuring facilities were needed.
We begin by describing three hardware tools which aided in construction of measuring facilities. Then eight general programmed measuring facilities are described. This is followed by a brief discussion of the built-in instrumentation used to monitor multiprogrammed demand paging. After a description of techniques for obtaining controlled measurements, a few observations about measurement of operating systems conclude the discussion.
The calendar clock consists of a 52-bit register which counts microseconds
and is readable as a double-precision integer by a single instruction from
any central processor. This rate is in the same order of magnitude as the
instruction processing rate of the GE-645, so that timing of 10-instruction
subroutines is meaningful. The register is wide enough that overflow requires
several tens of years; thus it serves as a calendar containing the number
of microseconds since 0000 GMT, January 1, 1901.
There are three advantages to this hardware clock design compared,
say, with a software clock simulated by a processor interval timer (a technique
which requires no extra hardware in most present-day computers):
The second hardware tool is simply a modification of the ubiquitous processor interval timer; in the GE-645 this "timer" counts the number of memory references made by the central processor rather than the number of clock ticks. There are at least three reasons for interest in such a measurement:
The third hardware tool is an input/output channel which can run in an endless loop, once initialized, without attention from the operating system. The particular endless loop programmed is a "read into the address part of the next command" followed by a "write out repeatedly the contents of the word whose address was just read". The channel is connected, by a 2400 baud telephone line, to a Digital Equipment Corporation PDP-8/338 programmable display computer. With this channel, the PDP-8/338 program can monitor the contents of any Multics data bases for which it knows the location and format. The data rate involved-less than 60 words per second presents a negligible I/O and memory cycle load to the GE-645 system. Since no GE-645 processor code is executed in obtaining the data (as would be the case if one of the system's users probed periodically into a data base), one can be confident that the act of probing has not significantly affected the measurement. This slow data rate does make it difficult to monitor a rapidly changing data base.
The first, and most elaborate, of these tools is a general metering
package which records time spent executing selectable supervisor modules
while the system is running. For each selected module the metering package
records the number of times the module is invoked and the total execution
time accumulated within each of a number of ranges of execution time.
Four modules associated with implementation of the Multics virtual
memory [5] were intuitively felt to be potential system bottlenecks and
thus were chosen for initial integration with the measurement package.
The first of these is the dynamic linking procedure, which is invoked when
a procedure makes a symbolic reference to another procedure or data. The
second module is the missing segment procedure which is invoked to set
up the environment required for paging. The third module is the missing
page procedure which is invoked when a program refers to a page not in
primary memory. The fourth module is the wall crossing procedure which
is invoked each time the process needs to switch from one protection ring
(domain of access) to another ring. Such a switch occurs, for example,
on each call from a user program to a supervisor procedure.
The measurement of time spent executing a module is complicated by two problems. The first problem is that the central processor is multiplexed among many processes; thus something more than reading the calendar clock at the beginning and end of execution of a module is required to compute time spent executing in the module. Time spent waiting for I/O operations on missing pages or for a lock to clear is not counted as part of the module's execution time whenever the situation permits multiprogramming during the wait. The second problem is that modules may invoke other modules (including themselves) during their operation and provision must be made for this situation. For example, during the handling of a missing segment both missing pages and additional missing segments may occur, each of which must be handled in order to proceed with the original missing segment handling. The rule is that time spent in a nested module is not charged to the nesting module if the nested module is also being metered. By this rule it is possible to perform a pair of experiments to learn the amount of time spent in a nested module as a result of use of the nesting module. For example, if one first meters both missing segment handling and missing page handling and then later meters missing segment handling only, the second experiment will show missing segment handling time increased by just the amount of missing page handling which was triggered by missing segment handling.
A segmented system provides a simple way to detect how time spent in the system is distributed among the various components. The second tool, a segment utilization metering facility, sets the calendar clock to interrupt periodically (typically every 10 milliseconds). When the interrupt occurs, the segment number of the segment which was executing is used to index into an array of per-segment counters and the appropriate counter is incremented by one. After the system has run for a while this table can be sorted, and the resulting distribution of segment utilization can be printed out, listing the most popular segments first. This facility is similar to the one described by Cantrell and Ellison [4].
A related, third tool records on a per-segment basis the number of missing pages and segments encountered during execution in that segment. Both of these measurements can be coupled to the general metering package described earlier. If coupled, the arrays are updated only during execution of metered modules, or possibly only when outside the metered modules but within a specified process. This latter option permits detailed analysis of any user program.
Two examples of the use of these three measurement facilities illustrate their utility. The first significant application of these packages was in the analysis of missing page handling. By obtaining the time distribution function for missing page handling and running segment usage metering during this time only, it was possible to compute how much time was spent in each module of the missing page handler. A heavy imbalance of time spent in the core management module suggested a redesign of that module.
As a second example, a user analyzing his own program can use the packages in several ways. By requesting the timing of all supervisor modules and then running segment utilization metering only during time outside the metered modules, the user can deduce the central processor time expended in each of the procedures which are part of his program. The supervisor module timing permits him to see the cost of the specific types of modules he is using. The per-segment missing page counters allow him to see if one of his own segments encountered an unexpectedly large number of missing pages perhaps because it uses a data structure ineffectively.
A fourth tool, different in scope from the facility described above, counts the number of times procedures are called. A standard call-save-return sequence is used for all interprocedure reference in Multics. An "add-one-to-storage" instruction is 'included in this sequence which increments a counter each time a procedure is entered. This counter enables a programmer to determine later how many times a procedure has been called and to relate that number to the number of calls to other procedures.
A fifth tool is a software package named the Graphic Display Monitor, the subsystem of PDP-8/338 programs that use the previously described synchronous data channel to interrogate locations of memory in the GE-645. Multics obliges this display by building, during system initialization, a table containing pointers to interesting data bases. A set of display generating tools permit the preparation of a new display program in a few hours time. Some standard displays have been developed to observe the traffic controller's queues, the arrays of module execution time distributions, and the use of primary memory. Observations of these displays have been helpful in detecting bottlenecks in the system, and on several occasions have exhibited the system passing through states previously thought to be impossible. A more complete description of this tool and examples of its output can be found in the paper by Grochow [61].
Perhaps the most useful software measurement tool of all is the sixth and simplest one: following the completion of each typed command, the command language interpreter types a "ready message" consisting of three numbers. The first number is the time of day at the preparation for printout of this ready message. The second number is the cpu time used since the previous ready message to the nearest millisecond. The third number is the number of times the process had to wait for a page to be brought in. These pieces of information, which appear automatically, give immediate feedback to the programmer as to the resource utilization of the command just typed. This feedback is a valuable aid to the programmer in seeing the influence of program changes upon performance. For cases where there are several ways to perform the same task, the user is given guidance as to which is more economical. For example, there are two text editors currently available on Multics; the cheaper one is readily apparent and thus generally the chosen editor.
Perhaps the most significant drawback to providing powerful system facilities such as a large virtual memory and a full PL/1 compiler is the ease with which even a sophisticated system programmer can unintentionally trigger unbelievably expensive operations. One of the principal Multics tools to fight back at misuse of virtual memory as though it were real memory is a missing page tracing package. In this package, the missing page handler retains in a ring buffer the segment and page number and the time of day of the last 256 missing pages of the process under measurement. Printing out the contents of the ring buffer following execution of some library program is often very revealing, since it provides evidence of which were the different pages the program touched. This tracing package frequently reveals that a large working set is the result of unnecessary meandering in the path of control of a program. The list of pages touched gives a programmer information on how to reorganize his program to improve its locality of reference.
Finally, a second tracing package monitors the effect of the systems multiprogramming effort on an individual user. The general strategy here is to write a user program which goes into a tight loop repeatedly reading the calendar clock. Normally, successive clock readings differ by the loop transit time. If a larger difference occurs, it is a result of control of the processor having been snatched away from the loop to handle an interrupt or run another process. These larger time differences, as well as the time they were noted, are added to the end of a growing table of interruptions, and the program returns to its loop. When the table is filled, the program prints out the table, showing the time of occurrence of each interrupt and the length of time it took to handle it. This table helps build confidence that the processor scheduling algorithm is working as predicted, and it occasionally discovers a misprogrammed data channel which is producing more frequent interrupts than necessary. It also provides an independent confirmation of the time required to handle each interrupt. This measurement is a good example of one for which a simulated software clock would barely suffice, since the clock simulation itself is likely to interact with the scheduling algorithm and the interrupt handlers, whose functions are being measured. On the CTSS system, a predecessor of Multics for the IBM 7094, the lack of a calendar clock forced this type of measurement to be made using arrival of words from a magnetic tape as a kind of pseudo clock.
Apart from the two techniques just described, Multics does not have built-in general event tracing packages, such as those reported by Campbell and Heffner [3], or Deniston [7], or instruction jump tracing such as that of Roek and Emerson [8]. This lack can probably be attributed to a suspicion that the volume of interesting traceable events in Multics would preclude intelligent analysis with the limited manpower available; nevertheless there have been times when a built-in general trace would have been very handy.
If the primary memory can hold only a few programs, there will be times when all available programs are roadblocked simultaneously. The central processor then must idle, waiting for some program's I 0 requirements to be satisfied. This idle time we will term "multiprogramming idle," to distinguish it from "true idle" time which occurs when, despite space in primary memory for another program, there is simply no customer waiting for the system's services. We thus have two measures of central processor utilization for which instrumentation must be provided.
The fundamental complication introduced by the ability to run a program without all its pages in primary memory is that there is no longer a simple rule to determine whether or not one more program will fit. In fact, with limits which are too generous to be helpful, there is always room for one more program in primary memory. Addition of another program to this "eligible set" may either allow some otherwise idle processor time to be used or cause the programs in the eligible set to fight over the available memory.
Some system designers [9] have partitioned primary memory among the
eligible programs. If a program in one partition is not allowed to steal
space for its pages from a program in another partition, the question of
adding another program to the eligible set is just like that of multiprogramming
without demand paging. This stratagem breaks down when many pages are (and
any page may be) shared among several programs, as required by the objectives
of Multics [10]. We have therefore explored the non-partitioned avenue
by controlling the size of the eligible set. Control implies that there
will generally be multiprogramming idle time; the decision about adding
another program to the eligible set turns on whether or not any additional
paging activity thereby introduced either wipes out recouped idle time
or causes unacceptable job delays.
A variety of special purpose meters are therefore included as an integral
part of the Multics multiprogramming scheduler and the page removal selection
algorithm. Measures of paging activity include total processor time spent
handling missing pages, number of missing pages, average running time between
missing pages, and average length of the grace period from the time a page
goes idle until its space is actually reused.
As a rough measure of response time for a time-sharing console user, an exponential average of the number of users in the highest priority scheduling queue is continuously maintained. The exponential average is computed by a method borrowed from signal data processing [11]. An integrator, I, initially zero, is updated periodically by the formula
where Nq is the measured length of the scheduling queue at the instant of update, and n is an exponential damping constant which determines the average distance into the past over which the average is being maintained. In general, the sample which was taken k samples in the past, whereI <-- I * m + Nq , 0.0 <m < 1.0,
will have 1 / e times the effect of the current sample on the value of the integrator. The average queue length is approximatelyk =l.0 / (l.0 - m)
This averaging technique, which requires only a multiply and an add instruction slipped into a periodic path, is an economical way to maintain an average which does not "remember" conditions too far into the past.Nq = I / k.
If the recent average queue length is multiplied by the average run time in the first queue, an estimate is obtained of the expected response time of the moment. This estimate has been used on CTSS to dynamically control the number of users who may log into the system. In Multics, this estimate, as mentioned above, is also a guide with which to measure effectiveness of dynamic control of the size of the multiprogramming eligible set.
A number of scripts have been developed, but the one most frequently used is one which represents a user typing in and debugging a small FORTRAN program. This script goes as follows:
Because of the sheer logistic problems of extending a telephone line driven technique to more than a few simulated users, an internal driver program for Multics has also been developed. The driver can create as many processes as desired and have them each perform some script (the script described above is usually used) in competition with one another. Only a minimal change to the normal operating conditions is required, because the Multics I/0 system provides the capability of attaching an input / output stream to a file rather than a typewriter. Even so, this technique has the limitation that it does not exactly simulate real users. The I0 path to a file is inescapably different from the path to a typewriter (especially as to number of different pages in the working set), and the driver program itself competes for resources at least at the beginning and end of the test. In addition, the current version of the internal user simulator does not insert "think times" between commands, although addition of this feature is contemplated. Despite these difficulties, the internal user simulator has proven very useful because of its simplicity of operation and repeatability of the measurements taken while it operates. When new Multics systems are installed, they are first required to be "certified" by running the user simulator as a check on both performance and functional capability.
A second conviction, arising from a variety of experiences when an apparent
performance bug turned out to be an instrumentation bug, is that the meter
readings are always suspect. Whenever possible, an independent, perhaps
gross, measurement which can confirm some aspect of a measurement in question
is very worthwhile.
A third observation is that most system programmers are not by training
or temperament scientists, and they often lack the patience to methodically
set up an experiment which is precisely controlled. An alarming number
of "nonexperiments" are performed, with a total useful (?) result of (for
example), "I brought the system up with a shorter scheduling quantum, and
response seemed a little better." Although much useful information and
insight can be gained by on-line monitoring of uncontrolled live users,
use of such measurements for performance comparison must always be suspect
since the particular user population at any instant may be non-"average".
The rules that apply to all scientific measurements also apply to measuring
computer systems:
Contributions to the fault metering package came from Charles Clingen and David Vinograd. Robert Rappaport and Steven Webber contributed to the design of metering for dynamic paging and multiprocessor scheduling. The internal script driver was designed and implemented by David Stone and Richard Feiertag; the external (PDP-8) version, by Howard Greenbaum and Akira Sekino.
Presented at the Second ACM Symposium on Operating System Principles, Princeton, New Jersey, October 1969.
This version was revised and published in Communications of the ACM -- Volume 13, Number 8 -- August, 1970.
It was also published in Peter Freeman, Software Systems Principles, 1975, Science Research Associates, Inc., 1975, pp. 524-536.
* Work reported herein was supported (in part) by Project MAC, an M.I.T. research program sponsored by the Advanced Research Projects Agency, Department of Defense, under Office of Naval Research Contract Number NONR-4102(01).
Home | Changes | Multicians | History | Features | Bibliography | Sites | Chronology
Stories | Glossary | Papers | Humor | Documents | Source | Links | About