Benchmarks

Benchmarks were a big part of the history of Multics in the 1970s and 1980s. They influenced the way the system developed, they consumed a lot of resources, and those that led to sales kept Multics viable as a product.

Multics systems sold for multiple millions of dollars in the 70s and 80s (back when that was a lot of money). Most major procurements started with a thick document from the customer called a Request For Proposals (RFP). The RFP would be supplied to all companies, who would then answer the RFP with bids specifying configurations and prices. For government-related customers, RFPs were often subject to detailed regulations, to prevent unfair competition, bid rigging, and kickbacks. If an RFP appeared to unfairly exclude some competitors, or if a losing vendor felt that they were not fairly treated, the contract award might be protested.

RFPs typically specified a list of required features, and then described what the requested system must be able to do by means of a benchmark. For example, "support 100 terminal users, performing a random mix of text editing and program execution, with specified response time," with detailed specs for what the programs must do. Some benchmarks specified detailed methodology for measuring the results.

When the Honeywell Multics Marketing team received an RFP, they would analyze the specifications to see if Multics could meet them. Often, this required consultation with Engineering, and sometimes requests for clarification from the proposers. If Marketing decided to bid on an RFP, a team would begin preparing the Honeywell response, and a benchmark plan would be put together. Most Multics benchmarks were run in Phoenix: System M in Phoenix had extra hardware so that a separate "System MB" could be configured for benchmarks. System M also had hardware for a subsystem known as CUESTA, which was a script driven terminal simulator that could present a simulated load on the system.

There were cases where Honeywell Marketing bid GCOS for an RFP, then discovered that they could not compete, and brought Multics in at the last minute.

Bencharks were often crash projects that drew many Honeywell Multicians into a team that worked long hours. Benchmark team members sometimes spent weeks in Phoenix, using System M resources at off hours, to prepare and run and re-run benchmark scripts.

USGS (1975) United States Geological Survey

Engineering: Jim Bush (PMDC), Mike Grady (CISL), Bob Mullen (CISL), Allen Berglund (PMDC), Don Mengel (PMDC), Wade Myers (PMDC), Ed Wallman (PMDC), Bob Franklin (PMDC), Ron Riedesel (PMDC), Lacy Johnson (PMDC), Rich Chouinard (PMDC), Jerry Cahoon (PMDC), and Harold Van Sant (PMDC)

Marketing: Ed Rice (FSO), Carl Stanek (FSO)

System improvements: work group scheduler, MCS changes.

Result: Made the sale. Three sites installed in 1976-77.

Mike Grady My fondest memory is USGS of course. I think it was one of the biggest we ever did (maybe it is just me remembering what we went thru...). I think it must have been the fall of 1975 that all of this happened.

Mike Grady As Jim Bush pointed out, I was there a lot, as was Bob Mullen. But it was for USGS, not Carnegie Mellon. We practically lived in Phoenix for about three months. I was home for Xmas with my family for about 48 hours, and then back to Phoenix. We took all of System M every night for months, much to the dismay of many of the users of System M. Our days usually started around 9PM and ended with a pretty bad breakfast at Denny's or some such place.

Mike Grady We made a lot of "tweaks" to the system over that time, and learned a lot about how things really worked. We had a lot of interesting debates on how to make things work better, in all aspects of the system. We also had our share of troubles. I recall being hot on the trail of a problem with the DIA and crashed the whole system just to get a dump of what was going on. I did this without notice and affected a lot of other people. I do believe that lynching was being discussed briefly :-)

Mike Grady We won USGS in the end. I'm sure that the USGS folks (some of them, at least) did not expect this. I recall that USGS was a Burroughs shop, and may have wanted more and better Burroughs stuff. But, I think that USGS was, by and large, a happy customer for a long time.

Mike Grady After that one, there was a lot more focus put on benchmarking in general, and more dedicated resources and equipment were arranged. I don't know if all of System-M was taken again for such an extended period of time for a benchmark. I believe that System-M benefited in the end, as additional gear was added to make it what was needed for the benchmark.

IN (1976) Industrial Nucleonics

Engineering:

Marketing:

Result: Made the sale. Site installed in 1977.

DCC (1976) Data Communications Company

Engineering: Bob Mullen (CISL)

System improvements: printing improvements, scheduler improvements, more open files.

Marketing:

Result: Made the sale. One site installed in 1976.

BELLCAN (1977) Bell Canada

Engineering: Bernie Greenberg (CISL), Bob Mullen (CISL), Richard Barnes (CISL), John Bongiovanni (CISL)

System improvements: Cow's Stomach.

Marketing:

Result: Made the sale. Two sites installed in 1979. Required to re-run the benchmark in Toronto and Montreal for acceptance of systems.

Bob Mullen Bell Canada - that's the cow's stomach (long story, starting with initial mis-run (passed but invalidly) then reruns weeks in Toronto, weekend 48 hr block time in Montreal). First time I remember falling asleep between each word or char typed.

Avon (1978) Avon Universities (Bristol and Bath), UK

Engineering:

Marketing:

Result: Made the sale. System installed in 1979. Story: The State Trooper Story

[marketing newsletter] Closing stages involved benchmark with IBM, CDC to connect 210 simultaneous under defined workloads. Peter Harding-Jomes, recently appointed manager of UK Multics Office, with assistance from Phoenix Benchmark services, completed benchmark with splendid results in three weeks.

Kit Powell I have no documentation on the Avon benchmark, which I think must have been largely put together by the user-facing sides of the two university computer services. As I recall the key requirement was to support an improbably large number of terminals doing program input, editing, compiling and execution, in Fortran. Our technical support in benchmarking came from the government CCTA, who where hot stuff in benchmarking batch systems but had little experience of interactive systems, and particularly not of large ones. So the Honeywell people ran rings round us. As I recall all they arranged that all the (simulated by script of course) users were logged in and in FAST before the benchmark started running, (did they subsequently at at least some points benefit by being in lock step, though I can see that this could work in the opposite sense in some cases). Of course, once the users were let loose on the new system they were like vampires in a blood bank and were quickly exploiting all the attractive and resource-hungry features that Multics offered, with predictable results. But they still all loved it.

ILL (1978) Institut Laüe Langevin, France

Engineering:

Marketing: Daniel Bois (Bull)

Result: First French benchmark. No sale.

Daniel Bois First (Bull) benchmark was for Institut Laüe Langevin, from Grenoble, which is a research center on particles. We lost it, but also had a benchmark at the same time for University of Grenoble and also INRIA and we won both of them. Benchmarks in Phoenix were ALWAYS very good for us because of the attractive place with sun, desert, etc ...etc ... and prospects loved it!

CICG (1978) University of Grenoble, France

Engineering:

Marketing: Daniel Bois (Bull)

Result: Made the sale. One site installed in 1979.

INRIA (1978) Institut National de la Recherche en Informatique et en Automatique, France

Engineering:

Marketing: Daniel Bois (Bull)

Result: Made the sale. One site installed in 1979.

UC (1978) University of Calgary, Canada

Engineering: Bob Mullen (CISL), Allen Berglund (PMDC)

System improvements: scheduler improvements.

Marketing:

Result: Made the sale. One site installed in 1978. Story: Continuing performance problems in 1979-80.

EOP (1979) United States Government, Executive Office of the President

Engineering: Rich Fawcett (PMDC), Allen Berglund (PMDC), Bob Mullen (CISL)

Marketing: Paul Benjamin (FSO), Charlie Spitzer (FSO), Mike Broussard (FSO)

Result: Passed the benchmark and were informed that we had the sale. Then Carter was not re-elected and the sale was canceled by the Reagan administration.

Paul Benjamin I worked for the FSO benchmark group for much of 1977 doing GCOS benchmarks and then transferred to the Multics pre- and post- sales group under Susan Boehm later in the year. I couldn't seem to get away from benchmarks and ended up working on EOP in 1979. There is some information about that benchmark already on the System M page. The FSO benchmark group lead was Wally Hom and the senior technical person from our group was Ed Brunelle. I can't remember who the sales person was. Rich Fawcett created the Cuesta scripts but I was responsible for running them since I had an alleged GCOS background. We worked closely with Allen Berglund's team.

Paul Benjamin As to dates for EOP it was in the middle of the year 1979. I remember being in the fishbowl (the glass viewing room for Systems M and MB) and hearing the news of Three Mile Island while we were preparing for the benchmark. That would have been March. I moved to Phoenix in August, going to work for Allen Berglund's group, in August, and the benchmark had completed before that. It was a bit later when we learned that we had won the bid. The salesman flew me back for the party in McLean.

Paul Benjamin There was a lot of word processing in the mix, causing the creation of format_document. I think we used lister to automate an existing 3x5 card process that Jody Powell used to track the President's appointments. I can't imagine making that up.

Charlie Spitzer I worked about 3 months on the EOP benchmark, which eventually caused my transfer from McLean to Phoenix because Susan wasn't about to keep funding me to live in a hotel in Phoenix for so long. I believe Mike Broussard also worked on this a bit, as I recall him going tubing down the Salt and having such bad sunburns he couldn't wear shoes or anything but shorts for a few days. Lots of nights working in the fishbowl, with dinner at 8am at Denny's. I also remember the satisfaction at winning the contract, and having Marketing around to pick up the tab a lot, and the deep disappointment when it never concluded. I remember Bob Mullen being around late at nights trying to change some of the tuning to affect the timings of the many runs.

Tom Van Vleck I met the person who decided against Multics.. I interviewed at Stanford Research Libraries Group, in the mid 80s, and the director of RLG used to work in the Reagan White house. He was the person who decided to install an IBM PROFS system (remember Oliver North?) instead of Multics.

STC (1982) Standard Telephones and Cables, UK

Engineering:

Marketing:

Result: Made the sale. One site installed in 1982.

DND (1982) Department of National Defence, Canada

Engineering:

Marketing:

Result: Made the sale. Two sites installed in 1982-83.

RAE (1984) Royal Aircraft Establishment, UK

Engineering:

Marketing:

Result: Made the sale. One site installed in 1984. Required to re-run the benchmark for acceptance of systems.

Deryk Barker Although I had no involvement with the original benchmarks which got us (Honeywell) the RAE business, I was involved with the installation and trials. My recollection is that the original concept was 5 Level 68 CPUs, but what was actually installed was 3 (2?) 8/70Ms. When rerunning the original benchmark on differently configured hardware we hit a problem: the system crashed, consistently at the same point, with a message about the system trailer segment. Turned out it was a bug/limitation in the code. I created and ran the security demo for RAE, which was the UK's first MLS (multi-level secure) site.

Other (....) Other benchmarks

Bill Schulz, in his story Marketing Multics in the Midwest, mentions that he "spent considerable time in Phoenix working on a Multics benchmark for Blue Cross out of Chicago I think it was."

Gregory Patterson I worked for Honeywell in the Pittsburgh field office from 1973 to 1990. I recall working on a couple of Multics benchmarks. The timeframe would have been 1976 to 1978. Clients were Carnegie Mellon University and Westinghouse Nuclear Division. We did not win either one.

Jim Bush I also recall these benchmarks, especially Carnegie Mellon. I worked in Phoenix at the Camelback Road Facility in the Multics benchmark support group at the time. Allen Berglund was the leader of the group and other members were Frank Martinson, Chip Lackey, Don Mengel, Bob May and maybe Gary Dixon (but not sure about Gary). For these benchmarks as with many others we had two guys visiting from CISL to help tune the Multics software to fit the particular customers requirements. These two guys were Mike Grady, who was the communications/DN355/Cuesta expert, and Bob Mullen who tuned the kernel and developed the 'Deadline Scheduler' in his spare time. Many long nights were put in to get these benchmarks run (it was always nights...). I don't remember the exact details of why we did not win these benchmarks, but I do recall that at least the Carnegie Mellon benchmark was specified by the customer to greatly favor Multics (as I remember our competition was an IBM OS/370 system).

Bob Mullen There was one that generated boxes of snoopy calendars. CNI? Some insurance company.

Bob Mullen There was PacBell (or the equivalent) with a very sharp analyst/consultant from BellLabs. The guy watched me tune for a day or two and said one AM before we left: "You know no matter how much you tune, if it can't keep up, it can't keep up." That made for a sleepless night (day) since respected his judgement. But the next day I said, "I know what you mean, I can't add mips. But do you agree bad tuning can cost mips?" "Well sure." "Ok, my job is to remove those."

Bob Mullen There was the one with Votrax Box.

Bob Mullen There was one with Swedes in Phx. I could eavesdrop on their phone calls, but my German didn't really work.

Bob Mullen There was one done from NOLA hotel room, after HLSUA evenings on Bourbon Street, I think Germans at the other end of the wire in Phx.

Bob Mullen My recollection, which may just be biblical-counting, is that we won one and lost the next twenty or so. Some we may have passed, but didn't get the biz.

Charlie Spitzer I also remember another big one (NCAR perhaps?) that we eventually didn't get. There was some sort of spreadsheet program that had just come out (VisiCalc) that was going to be useful for that benchmark, and HIS found someone who had written a clone (Megacalc) that could be ported to Multics. Rather than spending the $ for that project, Mike and I wrote a limited, fast, and cheap clone of the clone to get by for the benchmark. We later learned that the developer of Megacalc then went on to murder his partner for stealing some of their company and was in jail for an extended period of time, and we wondered if loss of the HIS $ could have caused that to occur, or what would happen if/when he got out.

Daniel Bois We had many benchmarks done in Phoenix and Paris, and I remember Renault, IRT (Institut de Recherche sur les Transports), Ministry of Finances (DG and INSEE), and many others.

Gary Dixon What about the benchmarks comparing GCOS Simulator results against native GCOS? I don't know if these were done on behalf of a particular customer, or as part of the GCOS/Multics comparisons done by Honeywell/Bull. But they too used up resources... I wasn't directly involved, so cannot contribute details about these; I just heard about them as they occurred. I don't remember dates on which these were conducted. It could have been for Bell Canada. It was when the Multics team was housed in the New Software Building, which would have been in the mid-1980's. I only vaguely remember an outcome, that after sufficient tuning of the GCOS Simulator (and perhaps Multics), the team got a compilation (seems like it was FORTRAN) to run faster on the simulator (in real-time) than it ran on equivalent hardware with GCOS TSS.

Suggestions and corrections are welcome. We have no information on whether there were benchmarks for MDA-TA, Oakland, PRHA, SCSI, SJU, NWGS-SDF, USL, VPI, VWoA, ASEA , BRUNEL, CARDIFF, LUT, RAE, BHAM, Mainz, SOZAWE, CIRIL, CICB, CICT, CICRP, CITI, CNET, CERAM, CCVR, SNEA, Credit Lyonnais, EPSHOM, IN2P3, INRA, MULCULT, Prevision, ONERA, SUNIST, SEP, SNECMA.