Multics Technical Bulletin MTB-647
C Compiler Spec.
To: Distribution
From: Gregory A. Baryza
Date: 23 January 1984
Subject: Multics C Compiler Specification
1. Abstract
This MTB discusses the implementation issues surrounding the
installation of an externally-developed compiler for the "C"
programming language on Multics. The intent is to have a
compiler which accepts a version of the language identical to
that already present on GCOS-III and the DPS6. The compiler will
run native in the Multics environment and produce standard
Multics object segments.
Comments on the nature and content of the supporting run-time
library for C are also included.
Comments on this MTB should be sent to the author -
via Multics mail to:
Baryza.Multics
via posted mail to:
Gregory A. Baryza
Honeywell Information Systems, Inc.
Four Cambridge Center
Cambridge, Massachusetts, U.S.A. 02142
via telephone to:
(HVN)-261-9315,
(617)-492-9315
via forum on System-M to:
>user_dir_dir>Multics>Baryza>mtgs>C_Compiler_Spec
(cc_spec)
________________________________________
Multics project internal documentation; not to be reproduced or
distributed outside the Multics project.
TABLE OF CONTENTS
Section Subject
======= =======
1 Abstract
2 Preface
3 Introduction
3.1 . . Overall Goal
3.2 . . Motivation
3.3 . . Division of Labor
3.4 . . Reference Document for C
4 Identifiers
4.1 . . Characters Allowed in Identifiers
4.2 . . Length of Identifiers
4.3 . . Reserved Identifiers
5 Data Types
5.1 . . Basic Types
5.2 . . Derived Types
5.2.1 . . . . Pointers
5.2.1.1 . . . . . . Pointers to Functions
5.2.2 . . . . Aggregates
5.2.2.1 . . . . . . Arrays
5.2.2.1.1 . . . . . . . . Strings
5.2.2.2 . . . . . . Structures
5.2.2.2.1 . . . . . . . . Fields
5.2.2.3 . . . . . . Unions
5.2.3 . . . . Enumerations
5.3 . . Type Definitions
5.4 . . Storage Classes
6 Constants
6.1 . . Integers
6.1.1 . . . . Decimal
6.1.2 . . . . Octal
6.1.3 . . . . Hexadecimal
6.1.4 . . . . Representation of LONG Values
6.2 . . Floating Point
6.3 . . ASCII Characters
6.4 . . Strings
7 Expressions
8 Keywords
9 Data Type Conversion
9.1 . . Character to Integer
9.2 . . Integer to Character
9.3 . . Floating Point to Double
9.4 . . Double to Floating Point
9.5 . . Floating Point to Integer
9.6 . . Integer to Floating Point
9.7 . . Integer to Unsigned
9.8 . . Pointer to Integer
9.9 . . Integer to Pointer
9.10 . . The Standard Conversion Rules
10 Statements
11 Compiler Directives
11.1 . . #define
11.2 . . #undef
11.3 . . #if
11.4 . . #ifdef
11.5 . . #ifndef
11.6 . . #else
11.7 . . #elseif
11.8 . . #endif
11.9 . . #line
11.10 . . #include
11.11 . . #equate
12 C Programs on Multics
12.1 . . The C Program Model
12.2 . . Symbol Table Requirements
12.2.1 . . . . Descriptor Types
12.2.2 . . . . Other Symbol Table Issues
12.3 . . Probe Changes
12.4 . . Memory Allocation
12.5 . . Use of an Operators Segment
12.6 . . Argument Lists
12.7 . . References to Library Routines
12.8 . . Function Name Resolution
13 Run-Time Library Definition
13.1 . . Input & Output
13.1.1 . . . . fopen
13.1.2 . . . . fclose
13.1.3 . . . . getc
13.1.4 . . . . putc
13.1.5 . . . . fgets
13.1.6 . . . . fputs
13.1.7 . . . . printf
13.1.8 . . . . fprintf
13.1.9 . . . . sprintf
13.1.10 . . . . scanf
13.1.11 . . . . fscanf
13.1.12 . . . . sscanf
13.1.13 . . . . rewind
13.1.14 . . . . open_file
13.1.15 . . . . open_switch
13.1.16 . . . . attach_switch
13.1.17 . . . . detach_switch
13.1.18 . . . . fflush
13.2 . . String Manipulation
13.2.1 . . . . strcat
13.2.2 . . . . strncat
13.2.3 . . . . strcmp
13.2.4 . . . . strncmp
13.2.5 . . . . strcpy
13.2.6 . . . . strncpy
13.2.7 . . . . strlen
13.2.8 . . . . strchr
13.2.9 . . . . strrchr
13.3 . . Memory Allocation
13.3.1 . . . . malloc
13.3.2 . . . . free
13.3.3 . . . . calloc
13.3.4 . . . . realloc
13.4 . . Mathematical Functions
13.4 . . . . abs
13.4 . . . . acos
13.4 . . . . asin
13.4 . . . . atan
13.4 . . . . ceil
13.4 . . . . cos
13.4 . . . . cosd
13.4 . . . . cosh
13.4 . . . . exp
13.4 . . . . floor
13.4 . . . . log
13.4 . . . . log10
13.4 . . . . log2
13.4 . . . . sin
13.4 . . . . sind
13.4 . . . . sinh
13.4 . . . . sqrt
13.4 . . . . tan
13.4 . . . . tand
13.4 . . . . tanh
13.5 . . Miscellaneous
13.5.1 . . . . clock
13.5.2 . . . . vclock
13.5.3 . . . . date
13.5.4 . . . . time
13.5.5 . . . . exit
14 Open Issues
14.1 . . Use of Standard Operators
14.2 . . Mismatch in System Calling Conventions
14.3 . . Unbound Programs and Name Resolution
14.4 . . Support for the Entry Keyword
14.5 . . Linker Support for the MAIN Entrypoint
14.6 . . Content of the Library
14.7 . . UNIX Environment Features
14.7.1 . . . . Enclosing the Main Routine
14.7.2 . . . . Device Nomenclature
14.7.3 . . . . Support for ARGC & ARGV
2. Preface
Developing the specification for anything is a difficult task.
The trade-offs are not always easy and seldom do all go away
feeling satisfied. The writing of this specification for C has
run true to form.
The C language was invented in 1972 by Dennis Ritchie of Bell
Laboratories. Since then, it has become widely accepted as a
major programming language. And, like all major languages, it
exists in a number of dialects (sometimes several, subtly
incompatible versions for a given machine).
However, the evolution of C has been strongly affected by the
features provided by its UNIX(1) host. For better or worse,
these are also features which are found in a number of other
commercially available operating systems. A large body of code,
from the UNIX "shell" to some sophisticated applications
programs, has come to expect their presence. Many of these
programs and systems are viewed as useful adjuncts to the
facilities Multics already provides.
The crux of the matter is that some of the expected "features"
are missing on Multics and the ways of doing things are
different. Providing the features and paths is often difficult
or undesirable. This specification attempts to balance the
expectations of programs written elsewhere and of those Multics
programmers developing code for use only on Multics. Thus,
there are some "un-Multicious" thoughts herein, but I hope they
add to the overall environment rather than subtracting from it.
I want to thank those people who have contributed to this
document either in conversation or as reviewers of early drafts:
Peter Fraser, Steve Herbst, Barry Margolin, Kevin Martin, Dave
Mason, Tom Oke, Ed Ranzenbach, Olin Sibert, Melanie Weaver, and
Brian Westcott.
Gregory A. Baryza
________________________________________
(1) UNIX is a registered trademark of Bell Laboratories. It is
commercially available under license from Western Electric.
3. Introduction
3.1. Overall Goal
The intent of this project is to provide a compiler and run-time
library for the C language on Multics. The compiler is to run as
a standard Multics command and produce Multics object segments,
listings, error messages, and so on, in a style consistent with
other Multics compilers.
The run-time library will provide functions common to most C
implementations. It will interface between C programs and the
services provided by Multics. It should be noted that those
library routines which provide access to operating system
functions peculiar to specific systems (e.g., tasking on UNIX)
will not necessarily be provided for Multics.
The mechanism by which this will be achieved is the installation
of an existing compiler, written in C and developed at the
University of Waterloo, on Multics. No changes in the syntax or
semantics of the language definition for Multics are implied, nor
should any be assumed.
3.2. Motivation
The increasing popularity of the C programming language is
impossible to deny. Many mainframe manufacturers and most mini-
and microcomputers now sport C compilers (and run-time
libraries). As a consequence, much systems and applications
software is now being written in C and, ipso facto, support for
the language is a "requirement" for commercial, general-purpose
systems. Within the present Multics community, Standard
Telephone and Cable (U.K.) has been the strongest proponent of C
on the system.
To satisfy this need, Honeywell has contracted with the
University of Waterloo to produce compilers for GCOS-III and
GCOS-8, and for the DPS-6 product line. In order to take
advantage of this opportunity, MDC would like to utilize the work
already in progress toward providing a Multics C compiler. The
compiler is expected to be source-language compatible with the
other Honeywell offerings.
3.3. Division of Labor
Three parties will be involved in the development of the product
for Multics: the Universities of Waterloo and Calgary, and the
Multics Development Center. The University of Waterloo will be
responsible for the pre-processor, parser, and code generator for
the C language. Mainly, this effort involves changes to their
present compiler necessary to support the interpretation of C on
Multics. The University of Calgary will be responsible for
packaging the output of the code generator into Multics standard
object segments, complete with symbol table and debug
information. They will also provide the run-time library assumed
by C programs and its connection to Multics facilities. The
Multics Development Center will do the overall project
coordination. It will also make the changes in the system
software (probe, symbol table utilities, binder, etc.) necessary
to support the product.
3.4. Reference Document for C
Unless otherwise noted, references to features of the language,
section and page numbers, and examples will be presumed to come
from
The C Programming Language(1)
Kernighan, Brian W. & Ritchie, Dennis M.
Prentice-Hall (1978)
Englewood Cliffs, New Jersey
In particular, Appendix A of this book claims to be a
reference manual for the language. Unfortunately, it suffers
from some ambiguities and omissions. These will be cited when
they are discussed.
One further point deserves mention. Appendix A frequently makes
reference to the H6000 version of the compiler. This is not the
Waterloo product. It is an internally developed compiler for C
which runs (primarily) on the Bell Laboratories' GCOS systems.
________________________________________
(1) This book is also commonly referred to as K&R, or the
"White Book".
4. Identifiers
Identifiers in C are used to name variables and symbolic
constants, functions, structures, type definitions, etc.
4.1. Characters Allowed in Identifiers
Identifiers are sequences of characters constructed from the
following sets of items:
- upper- and lower-case letters
- digits
- the underscore character, "_"
The Multics implementation also adheres to the conventions(1) in
distinguishing upper and lower case letters as different, and
requiring that the initial character of an identifier be either a
letter or an underscore character.
4.2. Length of Identifiers
Identifiers may be as long as storage requirements in the
compiler permit. However, the compiler will use only the first
256 characters given to distinguish identifiers from one another.
NOTE: This is a deviation from the K&R practice of using only
the first 8 characters to distinguish between identifiers.
4.3. Reserved Identifiers
Certain identifiers in C are reserved as having special meaning.
Most of them are keywords and are listed in a section by that
title. In addition, the function named "main" is designated as
the entrypoint at which the system is to begin execution of the C
program.
________________________________________
(1) K&R, Chapter 2, Types, Operators and Expressions; Section
2.1, Variable Names; pg. 33
5. Data Types
The C language has a small number of fundamental data types:
integers, floating point numbers, and characters. In addition,
the declaration rules allow for the construction of a potentially
infinite set of derived types. We will discuss the
representation of each of these classes separately.
5.1. Basic Types
C allows the three fundamental types described above. In
addition, declarations of each of these types also allow
"adjectives" which modify the size of the basic type or its
arithmetic behavior. These adjectives are: "short", "long",
and "unsigned". The following table gives the various base types
and their equivalent representation in machine terms. Equivalent
forms are listed together.
C Declaration          Width in Bits   Sign Bit Present   Boundary Alignment
int                    36              Yes                Word
  short int
long int               72              Yes                Double Word
unsigned int           36              No                 Word
  unsigned short int
  short unsigned int
  unsigned short
  short unsigned
unsigned long int      72              No                 Double Word
  long unsigned int
  unsigned long
  long unsigned
float                  8 & 28          Yes                Word
double                 8 & 64          Yes                Double Word
  long float
char                   9               No                 Character
  unsigned char
The data types "float" and "double" also include a signed, 8-bit,
power-of-two exponent.
NOTE: While K&R(1) permit only one adjective to precede the base
type declaration, the compiler for Multics will allow both a size
and an "unsigned" specifier. They may appear in any order (as
shown above).
5.2. Derived Types
The derived types of C are: pointers to typed objects (as in
Pascal and ALGOL-68), and various kinds of aggregates. Each of
these will be discussed separately.
5.2.1. Pointers
Pointers in C are always pointers to objects of a specific type.
A C pointer is represented as a Multics ITS pointer. Thus, a C
pointer is equivalent to a PL/I "pointer aligned".
5.2.1.1. Pointers to Functions
Since the C language does not allow(2) the definition of
functions within other functions, pointers to functions do not
need an environment pointer as part of their reference
information. Hence, they may also be represented as ITS
pointers. This makes them equivalent to the result of the
Multics PL/I builtin function, codeptr.
5.2.2. Aggregates
The Multics C compiler supports the construction of data
aggregates. Several types are possible: arrays, structures,
and unions. Of course, each of these aggregates may be formed from
data elements which are themselves either basic types or
aggregates.
________________________________________
(1) K&R, Appendix A, Section 8.2, pg. 193
(2) K&R, Chapter 4, Functions and Program Structure; Section 4.8,
Block Structure; pg. 81
5.2.2.1. Arrays
C arrays correspond directly to PL/I arrays. However, the
initial subscript for C arrays is zero so that the C declaration:
int A[8];
is equivalent to the PL/I:
dcl A real fixed binary (35, 0) aligned dimension (0 : 7);
5.2.2.1.1. Strings
Like Pascal and Ada, strings in C are derived types. They are
represented as arrays of characters. The last element in the
string is the ASCII "NUL" character (location 0/0) and forms its
delimiter. Since the address of a C string is the address of its
first element, they can be overlaid with PL/I based character
strings provided the length has been found first by scanning for
the NUL character at the end.
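As a sketch of the overlay step just described, the C functions below find the length by scanning for the NUL and then lay the characters out as a fixed-length, blank-padded field, the way a PL/I "character (n)" value is stored. Both function names are hypothetical; they illustrate the scan, not a Multics library interface.

```c
#include <stddef.h>

/* Count the characters before the terminating NUL; this is the length
 * a PL/I program needs before overlaying a based character string. */
size_t c_string_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical helper: copy a C string into a field of `width`
 * characters, truncating or padding on the right with blanks as PL/I
 * does for a "character (width)" variable. No NUL is stored. */
void to_fixed_field(const char *s, char *field, size_t width)
{
    size_t len = c_string_length(s);
    for (size_t i = 0; i < width; i++)
        field[i] = (i < len) ? s[i] : ' ';
}
```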
5.2.2.2. Structures
Structures in C are also directly analogous to those in PL/I. In
particular, the same alignment rules apply and a C structure will
always be aligned to the most strict boundary required of any of
its components.
5.2.2.2.1. Fields
The Multics C compiler also implements the definition of
fields(1) within a machine word. Fields on Multics are assigned
left-to-right. For example, a C "float" variable is described
as:
struct float_bin_real { unsigned exponent : 8;
unsigned mantissa : 28; };
For purposes of computation, fields in Multics are treated as
unsigned integers.
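Because fields are assigned left-to-right, "exponent" above occupies the 8 high-order bits of the word and "mantissa" the remaining 28. Bit-field ordering is implementation-dependent on other compilers, so the sketch below models the Multics layout with explicit shifts on a 36-bit word held in a uint64_t. The helpers get_field and set_field are hypothetical; offsets count from the left end of the word, and offset plus width is assumed not to exceed 36.

```c
#include <stdint.h>

#define WORD36_MASK ((UINT64_C(1) << 36) - 1)

/* Extract a field of `width` bits starting `offset` bits from the
 * left (high-order) end of a 36-bit word; the result is unsigned. */
uint64_t get_field(uint64_t word, unsigned offset, unsigned width)
{
    unsigned shift = 36 - offset - width;
    return (word >> shift) & ((UINT64_C(1) << width) - 1);
}

/* Store `value` into that field, leaving all other bits unchanged. */
uint64_t set_field(uint64_t word, unsigned offset, unsigned width,
                   uint64_t value)
{
    unsigned shift = 36 - offset - width;
    uint64_t mask = ((UINT64_C(1) << width) - 1) << shift;
    return ((word & ~mask) | ((value << shift) & mask)) & WORD36_MASK;
}
```

For "float_bin_real", the exponent is get_field(w, 0, 8) and the mantissa is get_field(w, 8, 28).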
________________________________________
(1) K&R, Appendix A, Section 8.5, pgs. 196-197
NOTE: Multics deviates from the K&R specifications in allowing a
pointer to point to a bit field. In this case, it points to the
leftmost bit of the field. It is the programmer's responsibility
to ensure that later use of this pointer as a locator does not
violate language boundary assumptions (i.e., if you later use it
to point to a character, it had better be aligned to a character
boundary).
5.2.2.3. Unions
Unions have no single counterpart in PL/I. An object which is a
union may be thought of as a named piece of storage which is
large enough to contain any of the objects defined to be part of
the union. The syntax for defining unions is almost identical to
that of structures, and component access is accomplished in the
same way.
For example, the C declaration:
union tag_name { int a;
float b;
char *c; } var_name;
would have to be represented by a series of PL/I declarations
involving aliasing a properly aligned piece of storage large
enough to contain the largest value. In this example, we are
talking about 72 bits aligned on an even word boundary because
"var_name.c" is a pointer. Note that it is not possible in PL/I
to create a sequence of declarations which preserves both the
storage overlaying and the naming structure of C unions.
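The two properties described above, shared storage and a size large enough for the largest member, can be checked with any C compiler. In the sketch below, the hypothetical function union_members_overlay tests both for the example declaration; on a non-Multics host the member sizes, and so the union size, will differ from the 72 bits quoted above.

```c
#include <stddef.h>

union tag_name {
    int a;
    float b;
    char *c;
};

/* Returns 1 if every member starts at the union's own address and the
 * union is at least as large as its largest member. */
int union_members_overlay(void)
{
    union tag_name u;
    void *base = (void *)&u;
    size_t largest = sizeof u.a;
    if (sizeof u.b > largest)
        largest = sizeof u.b;
    if (sizeof u.c > largest)
        largest = sizeof u.c;
    return (void *)&u.a == base
        && (void *)&u.b == base
        && (void *)&u.c == base
        && sizeof u >= largest;
}
```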
5.2.3. Enumerations
NOTE: This data type is an extension to the language defined by
K&R.
Enumerated data types are user defined data types which are
represented as though they were of type "int". The declaration
consists of the keyword "enum" followed by the tag-name of the
type, a list of values, and an optional list of variable
identifiers. For example, the declaration:
enum day {sun, mon, tue,
wed, thu, fri, sat} start;
defines the variable "start" to be of type "day". The values
which can be assigned to "start" are "sun", "mon", "tue", etc.
However, unlike some languages(1) which do not specify the
mapping between enumerated values and their representations, this
version of C does. The first value in the list is assigned the
value 0, the second 1, and so on. This version of C also allows
arithmetic on the enumerations. This means that:
start = sun + 2; and start = tue;
are equivalent. In fact, they are also equivalent to the
sequence:
#define sun 0
#define mon 1
. . .
int start;
start = sun + 2;
In addition, the values specified in the enumeration list may
also be optionally assigned values. This allows an enumeration
of the form:
enum day {sun = -1, mon = 1, tue, wed,
thu, fri, sat = -1} start;
where "sun" and "sat" have the assigned value -1, "mon" is
assigned the value 1, and the values of "tue" to "fri" are
assigned successive integers starting with the value of
"mon + 1". As shown, overlapping enumerated values are allowed.
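The numbering rule for the second "day" declaration can be checked directly; a current C compiler follows the same rule, so the sketch compiles unchanged. The helper days_with_value is hypothetical and simply counts how many enumerators share a given value.

```c
#include <stddef.h>

enum day {sun = -1, mon = 1, tue, wed,
          thu, fri, sat = -1} start;

/* Count the enumerators of "day" whose value equals v. */
int days_with_value(int v)
{
    static const enum day all[] = {sun, mon, tue, wed, thu, fri, sat};
    int n = 0;
    for (size_t i = 0; i < sizeof all / sizeof all[0]; i++)
        if ((int)all[i] == v)
            n++;
    return n;
}
```

Here tue is 2 and fri is 5, and days_with_value(-1) is 2 because "sun" and "sat" overlap.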
________________________________________
(1) Ada, for instance
5.3. Type Definitions
Type definitions, which allow the user to define synonyms for
existing types to increase portability and readability, are
allowed as in K&R.
NOTE: The tag-name of an enumerated definition may also be used
to define additional data elements of that "type" in the same
manner as with structures and unions. Thus, using the enumerated
type "day", we may define a finish date as another instance of
that type, and even initialize it, by:
enum day finish = fri;
5.4. Storage Classes
K&R defines four storage classes: auto, static, extern, and
register. The correspondence with PL/I storage classes is as
follows:
C Storage Class Equivalent PL/I
auto internal automatic
static internal static
extern external static
register internal automatic
The Multics C compiler does not allow variables to be assigned to
registers. For this implementation, "register" and "auto" are
taken as equivalent. This does not mean, however, that the "&"
operator can be used to take the address of a register variable.
Such use is non-portable and prohibited by the compiler for the
sake of consistency with other implementations.
In addition, future versions of the compiler may make use of the
register declaration, not as a way of dedicating a "fast" machine
register, but as a way of denoting that no pointer de-reference
can legitimately change the value of the variable.
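The contrast between "internal static" and "internal automatic" storage shows up in a counter: the static variable keeps its value between calls, while the automatic one is created and initialized afresh on each call. A minimal sketch; the function name next_ticket is hypothetical.

```c
/* "static" inside a function gives internal static storage,
 * initialized once; "auto" (the default) is internal automatic. */
int next_ticket(void)
{
    static int last = 0;   /* retains its value between calls */
    int step = 1;          /* reinitialized on every call */
    last += step;
    return last;
}
```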
6. Constants
The C language allows several different types of constants for
numeric values and provides representations for ASCII characters
and strings.
6.1. Integers
With the exception of those explicitly designated as long, all
integer constants are represented in storage as "int".
6.1.1. Decimal
Decimal integers are written as whole numbers (no decimal point)
and have no leading zeroes; examples are:
7 34359738637 100 13
6.1.2. Octal
Octal numbers are also written as whole numbers. They are formed
from the digits zero through seven and are always preceded by at
least one leading zero as in
06 0377777777777 0100 013
6.1.3. Hexadecimal
Numbers in hexadecimal are formed from the hexadecimal digits 0
through 9 and A through F. In determining the value of the
constant, the case of the letters is ignored. However, to
distinguish the use of hexadecimal, the first two characters of
all such constants are required to be either "0x" or "0X". The
following are examples of hexadecimal constants in C:
0x6 0X7FfF 0xDead 0x0FF
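Whatever the radix, these constants denote ordinary "int" values. The hypothetical table below pairs several of the octal and hexadecimal examples above with their decimal values, and radix_examples_agree checks that each pair matches.

```c
/* Each row: a constant written in hexadecimal or octal, then the same
 * value written in decimal. */
static const long radix_examples[][2] = {
    { 0x6,    6     },
    { 0X7FfF, 32767 },
    { 0xDead, 57005 },
    { 0x0FF,  255   },
    { 06,     6     },
    { 0100,   64    },
    { 013,    11    },
};

/* Returns 1 if both entries of every row are equal. */
int radix_examples_agree(void)
{
    for (unsigned i = 0;
         i < sizeof radix_examples / sizeof radix_examples[0]; i++)
        if (radix_examples[i][0] != radix_examples[i][1])
            return 0;
    return 1;
}
```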
6.1.4. Representation of LONG Values
A "long" integer constant may be represented in one of two ways.
Either the number may be too large to fit in a 36-bit "int",
in which case the compiler will automatically type it as "long".
Or, the constant may be suffixed with an upper- or lower-case
"L". In the latter instance, the compiler will convert the value
to a "long" representation (including double word alignment, for
example) regardless of the number of digits.
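On Multics an "int" is 36 bits, so "too large to fit in an int" amounts to a range check against -2**35 and 2**35 - 1. The sketch below models that decision for an unsuffixed constant on a host whose "long long" has at least 64 bits; the name needs_long is hypothetical, and the "L" suffix, which forces "long" in any case, is not modeled.

```c
/* Range of a 36-bit two's-complement "int". */
#define INT36_MAX ((1LL << 35) - 1)   /* 34359738367 */
#define INT36_MIN (-(1LL << 35))      /* -34359738368 */

/* Returns 1 if a constant with this value does not fit in an "int"
 * and would therefore be typed "long". */
int needs_long(long long value)
{
    return value > INT36_MAX || value < INT36_MIN;
}
```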
6.2. Floating Point
Numbers written with a decimal point in them are considered to be
floating point values. They may also (optionally) have leading
zeroes, a decimal fraction, and an exponent part in the usual
form. Floating point constants are always represented as being
of type "double".
6.3. ASCII Characters
An ASCII character constant is a sequence of from one to four
ASCII characters enclosed in single quotes. Each character
occupies one byte (9 bits) of storage. Character constants which
contain fewer than four characters are stored right-justified in a
word and the high-order bit of the first character is propagated
to the left end of the word. For the Multics representation of
ASCII, this amounts to zero-fill.
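The packing rule can be modeled on a host with 64-bit integers by holding the 36-bit word in the low-order bits of a uint64_t and each 9-bit byte as an unsigned value below 01000 (octal). This is an illustrative sketch: pack_char_constant is a hypothetical name, n must be between 1 and 4, and the usage check assumes the host encodes letters in ASCII.

```c
#include <stdint.h>

#define WORD36_MASK ((UINT64_C(1) << 36) - 1)

/* Pack n (1 to 4) nine-bit characters right-justified in a 36-bit
 * word. If fewer than four are given, the high-order bit (0400) of
 * the first character is copied into every unused bit to its left. */
uint64_t pack_char_constant(const unsigned chars[], int n)
{
    uint64_t word = 0;
    for (int i = 0; i < n; i++)
        word = (word << 9) | (chars[i] & 0777u);
    if (n < 4 && (chars[0] & 0400u)) {
        uint64_t used = (UINT64_C(1) << (9 * n)) - 1;
        word |= WORD36_MASK & ~used;   /* propagate the high-order bit */
    }
    return word;
}
```

For ASCII text the high-order bit is always zero, so only an octal escape such as 0777 in the first position produces a one-filled word.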
A number of escape sequences are permitted in character constants
to ease representation of certain characters. The backslash (\)
character introduces the sequence. Allowed sequences are:
Escape Sequence    Interpretation
\t                 Horizontal tab
\\                 Backslash
\'                 Single quote
\"                 Double quote
\n                 Newline
\c                 Carriage return
\f                 Form feed
\b                 Backspace
\d{d{d}}           A one- to three-digit octal constant
                   whose value is the value of the character
NOTE: ASCII character constants accepted by this compiler are a
superset of those described by K&R.
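A sketch of how a scanner might decode one of the sequences in the table, written for a host C compiler. The function decode_escape is hypothetical: p points just past the backslash, the decoded value is stored in *out, and the return value is the number of characters consumed (0 for a sequence not in the table). The "\c" entry follows the table as given.

```c
/* Decode one escape sequence; p points just past the backslash. */
int decode_escape(const char *p, unsigned *out)
{
    switch (p[0]) {
    case 't':  *out = '\t'; return 1;
    case '\\': *out = '\\'; return 1;
    case '\'': *out = '\''; return 1;
    case '"':  *out = '"';  return 1;
    case 'n':  *out = '\n'; return 1;
    case 'c':  *out = '\r'; return 1;   /* carriage return, per the table */
    case 'f':  *out = '\f'; return 1;
    case 'b':  *out = '\b'; return 1;
    }
    if (p[0] >= '0' && p[0] <= '7') {
        unsigned v = 0;
        int i = 0;
        /* Take one to three octal digits. */
        while (i < 3 && p[i] >= '0' && p[i] <= '7') {
            v = v * 8 + (unsigned)(p[i] - '0');
            i++;
        }
        *out = v;
        return i;
    }
    return 0;   /* not a sequence listed in the table */
}
```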
6.4. Strings
A string constant is a sequence of zero or more characters
enclosed by double quote marks. Escape sequences are permitted
in character strings also. As noted elsewhere, a string constant
is treated as an array of single characters terminated by an
ASCII NUL byte (0).
It is worth mentioning at this point that long strings can be
continued over several lines of source code. The sequence
"\<nl>", where <nl> is a real newline character, will do this.
When "\<nl>" is encountered while scanning a string during
compilation, the lexical analyzer discards these two characters
and all unescaped leading whitespace from the succeeding line.
This allows long strings to be squeezed into the available space
and also permits indentation at the proper place. Escaped
whitespace characters are included "as is". The following
examples illustrate this.
Input Sequence          Interpretation
"This is \              "This is continued"
     continued"
"This string \          "This string has four blanks"
     has four \
     blanks"
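The continuation rule can be sketched as a pass over the body of a string constant. The function join_continued is hypothetical and handles only this rule: each backslash-newline pair is dropped together with the blanks and tabs that begin the next line, and any other backslash sequence, including an escaped blank, is copied unchanged for later escape processing.

```c
/* Copy a string body from `in` to `out`, applying the continuation
 * rule. `out` must be at least as large as `in`. */
void join_continued(const char *in, char *out)
{
    while (*in != '\0') {
        if (in[0] == '\\' && in[1] == '\n') {
            in += 2;                          /* discard backslash-newline */
            while (*in == ' ' || *in == '\t')
                in++;                         /* discard unescaped indentation */
        } else if (in[0] == '\\' && in[1] != '\0') {
            *out++ = *in++;                   /* keep other escapes intact */
            *out++ = *in++;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```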
7. Expressions
This section deals with expressions in C. The various operations
in an expression are evaluated according to a defined precedence.
Many operations share the same precedence. In that case, each
precedence class will group either left-to-right or right-to-left
depending on the class.
The standard(1) rules for operator precedence and associativity
do not completely determine the order of evaluation of
expressions, however. In the case of "A+B", C says nothing about
whether A or B should be evaluated first. In such situations,
the Multics compiler will evaluate each of the sub-expressions in
an unspecified order, even if they have side-effects that make
the order significant. Parentheses, such as "(A)+B", cannot be used to
force a certain order in such cases. If a particular order is
necessary, the expression will have to be broken down into two
statements with the result of the first stored into a temporary
variable and used in the second.
Since functions in Multics C may be declared to return no result
of consequence ("void"), such functions may only be used in
restricted cases. In general, if the function result would have
had to be used in further evaluating an expression, the "void"
function invocation is prohibited.
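The two-statement form recommended above can be sketched as follows. The functions first and second are hypothetical stand-ins whose side effects record the order in which they run; in "first() + second()" that order is unspecified, but storing the result of first in a temporary ensures it runs before second.

```c
/* Record of the first two calls, in the order they happened. */
static int calls[2];
static int ncalls = 0;

static void record(int who)
{
    if (ncalls < 2)
        calls[ncalls] = who;
    ncalls++;
}

static int first(void)  { record(1); return 10; }
static int second(void) { record(2); return 20; }

int sum_in_order(void)
{
    int temp = first();      /* completes before the next statement */
    return temp + second();
}
```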
With the preceding caveats, the expressions of the Multics C
compiler are those of K&R.
________________________________________
(1) K&R, Appendix A, pgs. 185-192
8. Keywords
There are a number of keywords reserved by the C compiler. They
cannot be used anywhere as identifiers. In keeping with standard
Multics practice, these keywords will only be recognized in
lower-case. Use of identifiers which are identical to keywords
except for case should be avoided for portability. The
table of such keywords is
auto        else        int         switch
break       entry       long        typedef
case        <> enum     register    union
char        extern      return      unsigned
continue    float       short       <> void
default     for         sizeof      while
do          goto        static
double      if          struct
NOTE: Entries marked with "<>" are additions to the list of
keywords defined in K&R. A synopsis of their usage is
Keyword Explanation
enum This keyword is a declarator which introduces
the definition of a user-defined data item.
The values which may "properly" be assigned to
this data type appear in the list which
follows the tag-name associated with this
type.
void The keyword, "void", is used in place of a
data-type specifier in functions. It
indicates that the value returned by the
function is not used and is therefore of no
importance. Hence, functions declared "void"
may not have their return values assigned to
anything, nor may they appear in expressions
involving additional computation. A "void"
function is effectively a subroutine.
9. Data Type Conversion
9.1. Character to Integer
The character object will be converted to an integer value which
represents the character object's value in memory. If the
character object is not declared "unsigned", sign extension of
the left-most character in the character object will take place.
An unsigned character object will be copied "as is". For
example,
int x;
x = '\777a';
will result in x having the value, -415. If x were additionally
declared as "unsigned", then the assignment will result in x
being set to 261729.
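The arithmetic behind this example: the constant holds the character with octal code 777 followed by "a" (octal 141), giving the 18-bit pattern 0777141 (octal), or 261729. Because the left-most character has its high-order bit set, the signed conversion subtracts 2**18, yielding -415. A sketch on a host with 64-bit integers; the function name char_object_to_int is hypothetical.

```c
#include <stdint.h>

/* Convert a character object of `nchars` (1 to 4) nine-bit characters,
 * right-justified and zero-filled in `bits`, to an integer value.
 * For a signed object the high-order bit of the left-most character
 * is extended to the left; an unsigned object is taken as is. */
long long char_object_to_int(uint64_t bits, int nchars, int is_unsigned)
{
    int width = 9 * nchars;
    uint64_t sign = UINT64_C(1) << (width - 1);
    bits &= (UINT64_C(1) << width) - 1;
    if (is_unsigned || (bits & sign) == 0)
        return (long long)bits;
    return (long long)bits - (long long)(UINT64_C(1) << width);
}
```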
9.2. Integer to Character
When an integer is converted to a character, the result is the
low-order nine bits of the integer value. All other bits are
ignored.
9.3. Floating Point to Double
The mantissa of the "float" value is extended on the right to
double precision; this is done before any floating point
arithmetic is carried out.
9.4. Double to Floating Point
The "double" value is rounded when the target precision is
"float".
9.5. Floating Point to Integer
Floating point numbers have their fractional parts
truncated toward zero. If the truncated value is too large to
fit in the target integer, continued execution will yield
undefined results.
9.6. Integer to Floating Point
The conversion occurs in the expected way. However, some loss of
accuracy may result if the floating point target cannot hold all
the significant digits of the source exactly.
9.7. Integer to Unsigned
The result of this conversion is the smallest unsigned integer
congruent to the "int" source mod 2**N (where N is the number of
bits in the unsigned number). The effect of this is that the
actual bit pattern for the number remains unchanged (since the
Multics processor uses two's-complement binary arithmetic).
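A minimal sketch of the rule for N = 36, assuming a host with 64-bit integers; the function name is hypothetical. Converting a negative value to an unsigned 64-bit type is already defined modulo 2**64 in C, so masking to 36 bits gives the value modulo 2**36.

```c
#include <stdint.h>

/* Smallest non-negative value congruent to v modulo 2**36. */
uint64_t int_to_unsigned36(long long v)
{
    return (uint64_t)v & ((UINT64_C(1) << 36) - 1);
}
```

For example, -1 becomes 0777777777777 (octal): the same 36 one-bits, now read as unsigned.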
9.8. Pointer to Integer
Pointers will be stored in "int" data items as Multics packed
pointers. They will be stored in "long" items as Multics ITS
pairs. However, to conform to standard C usage, Multics null
pointers will be stored as the value zero.
9.9. Integer to Pointer
An "int" being converted to a pointer will be assumed to be in
packed pointer format; a "long", in ITS format. An "int" or
"long" whose value is zero will be converted into a Multics null
pointer (in the ring of execution).(1)
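The null-pointer rule for "int" conversions can be modeled as follows. This is a sketch under stated assumptions: struct mptr and the functions are hypothetical, and the packed layout (a 6-bit bit offset, a 12-bit segment number, and an 18-bit word offset, from the high-order end) is assumed here rather than taken from this document and should be checked against the Multics processor documentation. The ITS form used for "long" is not modeled.

```c
#include <stdint.h>

/* Hypothetical model of a Multics pointer. */
struct mptr {
    uint32_t segno;    /* segment number (15 bits in an ITS pair) */
    uint32_t wordno;   /* 18-bit word offset */
    uint32_t bitno;    /* 6-bit bit offset */
};

#define NULL_SEGNO 077777u   /* -1 in a 15-bit field */

static int is_null(struct mptr p)
{
    return p.segno == NULL_SEGNO && p.wordno == 1 && p.bitno == 0;
}

/* Pointer to "int": ASSUMED packed layout, null mapped to zero.
 * In this model word 0 of segment 0 also packs to zero. */
uint64_t ptr_to_int(struct mptr p)
{
    if (is_null(p))
        return 0;
    return ((uint64_t)(p.bitno & 077) << 30)
         | ((uint64_t)(p.segno & 07777) << 18)
         | (uint64_t)(p.wordno & 0777777);
}

/* "int" to pointer: zero becomes the null pointer. */
struct mptr int_to_ptr(uint64_t v)
{
    struct mptr p;
    if (v == 0) {
        p.segno = NULL_SEGNO;
        p.wordno = 1;
        p.bitno = 0;
        return p;
    }
    p.bitno  = (uint32_t)((v >> 30) & 077);
    p.segno  = (uint32_t)((v >> 18) & 07777);
    p.wordno = (uint32_t)(v & 0777777);
    return p;
}
```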
9.10. The Standard Conversion Rules
Many binary operators cause conversion of their operands to other
types by default. The conversions follow the rules given below.
NOTE: Because "unsigned long int" is allowed in this
implementation, these rules differ slightly from those given in
the reference(2) document.
________________________________________
(1) Thus, a Multics null pointer (segno = -1, wordno = 1,
bitno = 0) and an integer value of zero are considered equal.
One will be converted into the other for purposes of
assignment or during "casts". They will be converted into a
common form for comparison.
(2) K&R, Appendix A, Section 6.6, pg. 184
1. Any operands of type "char" are converted to "int".
2. Any operands of type "float" are converted to "double".
3. If either operand is of type "double" the other is converted
to "double", and the result of the operation will be
"double".
4. If either operand has an attribute of "long", the other will
be converted to "long"; and the result will have the
attribute, "long".
5. If either operand has an attribute of "unsigned", the other
will be converted to "unsigned"; and the result will have
the attribute, "unsigned".
6. Otherwise, the two operands are converted to "int", and the
result is "int".
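The rules can be read as a function from the two operand types to the result type. The sketch below assumes rules 4 and 5 are applied cumulatively, which is how "long" combined with "unsigned" yields "unsigned long"; the enumeration and the name result_type are hypothetical, and "unsigned char" is not distinguished from "char".

```c
enum ctype { T_CHAR, T_INT, T_UNSIGNED, T_LONG, T_UNSIGNED_LONG,
             T_FLOAT, T_DOUBLE };

enum ctype result_type(enum ctype a, enum ctype b)
{
    /* Rules 1 and 2: char -> int, float -> double. */
    if (a == T_CHAR)  a = T_INT;
    if (b == T_CHAR)  b = T_INT;
    if (a == T_FLOAT) a = T_DOUBLE;
    if (b == T_FLOAT) b = T_DOUBLE;

    /* Rule 3: double dominates. */
    if (a == T_DOUBLE || b == T_DOUBLE)
        return T_DOUBLE;

    /* Rules 4 and 5: each contributes one attribute; both may apply. */
    int is_long = a == T_LONG || a == T_UNSIGNED_LONG
               || b == T_LONG || b == T_UNSIGNED_LONG;
    int is_unsigned = a == T_UNSIGNED || a == T_UNSIGNED_LONG
                   || b == T_UNSIGNED || b == T_UNSIGNED_LONG;
    if (is_long && is_unsigned)
        return T_UNSIGNED_LONG;
    if (is_long)
        return T_LONG;
    if (is_unsigned)
        return T_UNSIGNED;

    return T_INT;   /* Rule 6 */
}
```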
10. Statements
The statements accepted by the Multics C compiler are those
defined by K&R. In addition, the following general remarks are
in order:
1) Simple statements are terminated by a semi-colon, not by
the end of a line. Thus, C programs are free-form.
2) Whitespace may be inserted as desired to improve program
readability.
3) Whitespace is required to separate identifiers, keywords,
and constants which would otherwise be contiguous.
4) While comments (delimited by "/*" and "*/") are not
strictly statements, it is noteworthy that in Multics C,
comments do not nest.
11. Compiler Directives
This section defines the directives acceptable to the
pre-processor facility. The pre-processor is capable of text and
macro substitution, conditional compilation, and inclusion of
other source files into the compilation unit.
Lines beginning with the character, "#", are considered
directives for this facility. They are not subject to scoping
rules; their effects last from their first use to the end of the
compiled unit.
11.1. #define
This has the same format as in K&R. However, the contents of
strings which are given as part of the #define are also examined
for the presence of formal parameters to be substituted. For
example, given
#define derogation(SLUR) "You SLUR, you!"
the invocation derogation(fink) expands to "You fink, you!".
11.2. #undef
The actions taken by this directive are identical to those in
K&R.
11.3. #if
The actions taken by this directive are identical to those in
K&R.
11.4. #ifdef
The actions taken by this directive are identical to those in
K&R.
11.5. #ifndef
The actions taken by this directive are identical to those in
K&R.
11.6. #else
The actions taken by this directive are identical to those in
K&R.
11.7. #elseif
NOTE: This compiler directive is an addition to those listed in
K&R. The construction:
#elseif constant_expression
may be used in place of the sequence:
#else
#if constant_expression
in nested #if constructions. The advantage to this is that only
one #endif is required to close the selection.
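As an illustration, the two fragments below select the same text. The macro names MODE and LABEL are hypothetical, and the second fragment is accepted only by a compiler that implements the #elseif extension described here.

```c
/* Nested form: each #if needs its own #endif. */
#if MODE == 1
#define LABEL "first"
#else
#if MODE == 2
#define LABEL "second"
#endif
#endif

/* Equivalent form using #elseif: a single #endif closes the selection. */
#if MODE == 1
#define LABEL "first"
#elseif MODE == 2
#define LABEL "second"
#endif
```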
11.8. #endif
The actions taken by this directive are identical to those in
K&R.
11.9. #line
The actions taken by this directive are identical to those in
K&R.
11.10. #include
NOTE: The actions taken by this directive are different from
those described by K&R. The directives:
#include "filename"
and
#include <filename>
both use the Multics standard translator search paths to locate
the referenced files. No bypassing of the working directory
takes place because that is under the control of the programmer.
In addition, the assumed suffix for C include files is ".incl.c".
Therefore, "common" files like "stdio.h" will be mapped to
Multics segment names as "stdio.h.incl.c".
11.11. #equate
NOTE: This compiler directive is an addition to those listed in
K&R. The directive has the form:
#equate identifier text
where "identifier" is a valid C identifier and "text" is any
sequence of characters. The directive states that references to
"identifier" in the list of extern items for the program should
be replaced by a reference to "text".
For example, a C program containing the lines:
#equate BIGGEST_SPACE sys_info$max_seg_size
extern int BIGGEST_SPACE;
allows a program to reference directly the word containing the
system-defined maximum segment size (in words).
12. C Programs on Multics
The "usual" C programming language presumes a static environment
where the entire code segment to be run is linked together into a
single unit before execution begins. In addition, the treatment
of external variables also differs from the standard Multics
paradigm. Finally, the "standard" run-time library has name
conflicts with existing Multics commands and subroutines.
12.1. The C Program Model
There are several points about the paradigm assumed in the
execution-time model of C programs that need to be made explicit
so their difference from or demands on the Multics model can be
discerned. This is not a statement about the way that C programs
on Multics must run, only about the way they usually run on
other implementations.
A) All of the code involved in an application will be
combined into an executable module prior to placing it in
execution. This is at odds with the dynamic linking
features of Multics.
B) Once the executable module has been prepared, only the
entrypoint to the main program (i.e. the "main"
function) is known to the system which puts the module
into execution. All other external definitions and
references made by the various components are
"inaccessible" when execution begins.
C) All C functions are accessible by name when the modules
are linked regardless of what object program they are
contained in. This is in contrast to Multics'
segname$entryname convention.
D) No relationships exist between successive executions of
the same or different executable images. This is not the
normal Multics process view, although it has been
implemented via the run_ facilities.
E) There are no procedures in C. All subroutines are
functions.
F) All arguments are passed by value. Side-effects are
produced by passing a pointer to the object which is
to be modified. Multics argument lists also allow
parameters to be passed by reference.
Various aspects of this model's and Multics' adaptation to each
other will be discussed below.
12.2. Symbol Table Requirements
The present Multics symbol table is inadequate to describe a C
program with sufficient clarity. Therefore, additions and
modifications to the information stored in the symbol section of
the object segment will be required to support C programs.
12.2.1. Descriptor Types
The following table gives the C data items which will have to be
represented in the symbol table for debugging. Some C data types
are already represented in other languages which Multics
supports. Those that are not are identified by the marker, "--".
C Object               Code  Standard Type or Explanation
short int              01    real fixed-point binary short
long int               02    real fixed-point binary long
unsigned short int     34    real fixed-point binary long unsigned
unsigned long int      --    The definition of unsigned types in the
                             standard descriptor type table does not
                             allow the short unsigned integer to have
                             a precision greater than 35 bits. Since
                             C "unsigned short int" variables use all
                             the bits in a machine word, they must be
                             assigned to type 34. There is no
                             descriptor type for a datum having a
                             precision of 72 bits or greater.
float                  03    real floating-point binary short
double                 04    real floating-point binary long
character              21    character string
string                 --    There is presently no Multics datum
                             defined which matches C strings in being
                             delineated by a zero byte.
pointer                13    pointer
structure              17    structure
union                  --    Although Algol-68 unions (type 62) are
                             available, they are not applicable
                             because their data structure always
                             specifies the current contents of the
                             union. In C, this is left to the
                             programmer to keep track of.
enum constant element  --    Pascal enumerated list values (type 71)
                             are restricted to non-negative integers.
                             This is not true in C.
enum variable          --    The reasoning here is the same as for
                             enumerated list constant elements. The
                             corresponding Pascal data type (72) is
                             inapplicable for C.
Very few of these new descriptor types will appear in argument
lists, however, due to the conversion rules.
12.2.2. Other Symbol Table Issues
The following list contains other symbol representation issues
which will have to be resolved before support for C programs can
be considered complete:
A) Symbol nodes for pointers may have to include the
pointer's "base type" (e.g. pointer-to-character) in
order to support correct pointer arithmetic in probe.
B) A symbol node for a union should probably be represented
as the root of a symbol sub-tree of all the possible
constituents of the union.
C) The symbol table will have to include a way to represent
C typedefs resulting from the "tag" on structures and
unions (for example) as separate objects.
12.3. Probe Changes
Probe will have to be extended as expected to handle C
expressions. Some of the needed extensions are:
A) The C comparison operators "==" and "!=" will have to be
allowed in designating conditional breakpoints.
B) The C modulus operator, "%", will be allowed in
expressions.
C) Constants in octal and hexadecimal must be allowed in
expressions.
D) The C form of subscripts, "A[i][j]", must be allowed in
requesting the values of variables and in assigned
values.
E) The address reference, "&A", will be allowed for
obtaining the address of an item.
F) Explicit dereferencing of a pointer via "*A" will be
allowed.
G) Probe should support arithmetic on C pointers.
H) It should be possible to display the contents of a union
in a programmer-chosen format.
I) The probe builtin functions, length, maxlength, and
substr, must be changed to work on C strings (and
arrays).
J) The C operator, "sizeof", should probably be supported.
K) Boolean tests, "if var" and "if !var", should work as
expected as long as "var" can be cast into an int.
12.4. Memory Allocation
Most C implementations place all data (auto, static, extern, and
programmer-allocated) in a single contiguous address space. On
Multics, this is possible, but not desirable. Therefore, the
"standard" place will be used for each type of object: auto
variables will be allocated in the stack; static in the linkage
section; extern in the user_free_area (via *system variables);
and programmer-allocated data in the user_free_area.
The assignment of external and programmer-allocated storage to
the user_free_area will make it possible for programmers to
manage their allocated storage via set_fortran_common.
It should be noted that this separation of storage may cause
difficulties when importing programs which do comparisons of
pointer values. This is because some applications take advantage
of the implicit collection of all data into one unit even though
it is explicitly warned against(1) except where "the pointers
point to objects in the same array." Since Multics allocations
always result in storage blocks contained wholly in a single
segment, programs which observe this portability constraint will
continue to work.
12.5. Use of an Operators Segment
The C compiler will produce object segments which use the
standard pl1_operators_ segment for call/save/return, data value
conversions, intrinsic functions, etc.
12.6. Argument Lists
C programs will use standard Multics calls. That is, they will
produce a list of pointers to the argument values. Because of
the call-by-value(2) requirement, temporary copies will be made
of all non-expression arguments and the addresses of these will
be placed in the argument list.
Whenever possible, descriptor information will be included in the
argument list. However, the utility of this information is
questionable, because the number of different types which can
actually be passed as arguments(3) is rather small. Thus, while
it seems desirable to pass the address of the first character of
a string and to construct a descriptor for it when the copying
process determines its length, this cannot be done. The language
rules require that the address of a (temporary) pointer to the
first character of the string be placed in the argument list.
The Multics descriptor for it cannot describe it as anything
other than an unpacked pointer (at least not without adding
many more descriptors).
________________________________________
(1) K&R, Appendix A, Section 7.6, pg. 189
(2) K&R, Appendix A, Section 7.1, pgs. 185-186
(3) loc. cit.
The following information attempts to illustrate the
correspondence between a C data item and the value actually
passed as the argument in a function invocation. To assist in
this, the actual PL/I attribute list corresponding to the C
argument value is given when possible. Otherwise, the value
passed is described. When necessary, the reason for the set of
attributes is also listed.
C Argument: int
PL/I Attributes: real fixed binary precision(35, 0)
aligned
Explanation: none
C Argument: long int
PL/I Attributes: real fixed binary precision(71, 0)
aligned
Explanation: none
C Argument: unsigned int
PL/I Attributes: bit(36) aligned
Explanation: This could also be described as
"real fixed binary precision(36, 0)
unsigned aligned" in PL/I terms.
However, this raises the spectre of
known bugs with the representation
of "unsigned" items in the present
compiler. This particular
representation at least gives the
proper computational result when
filtered through the "bin" builtin
function into a signed variable of
precision larger than 36 bits.
C Argument: long unsigned int
PL/I Attributes: bit(72) aligned
Explanation: PL/I does not allow precisions of
binary numbers to exceed 71 bits in
length.
C Argument: float
PL/I Attributes: real float binary precision(63)
aligned
Explanation: C conversion rules for arguments
require that all items of type
float be converted to double.
C Argument: double
PL/I Attributes: real float binary precision(63)
aligned
Explanation: none
C Argument: char
PL/I Attributes: real fixed binary precision(35, 0)
aligned
Explanation: C conversion rules for arguments
require that all items of type char
be converted to int.
C Argument: an array name
PL/I Attributes: pointer aligned
Explanation: An array name is treated as a
pointer expression in C. The value
of the pointer is the address of
the first element of the array.
C Argument: string
PL/I Attributes: pointer aligned
Explanation: Strings are arrays in C. The value
of the pointer is the address of
the leftmost character of the
string.
C Argument: pointer
PL/I Attributes: pointer aligned
Explanation: none
C Argument: a structure name
PL/I Attributes: The structure is passed as the
value of the argument. However,
care should be taken in trying to
describe actual arguments which
contain unions.
Explanation: A temporary copy will be made of
the entire structure and the
address of this copy will appear in
the corresponding position of the
actual argument list. To the
receiver, this argument pointer
will, of course, be invisible.
C Argument: a field within a structure
PL/I Attributes: bit(36) aligned
Explanation: Bit fields are coerced into
unsigned integers. Alternatively,
the representation given above for
unsigned int could have been used.
However, for bit fields this
representation seems more
descriptive. The extracted bit
field values are the rightmost bits
of the string.
C Argument: a union name
PL/I Attributes: bit(n) unaligned
Explanation: Unions are treated like structures.
However, PL/I has no way of
describing a union and C provides
no way to indicate the current
format of the data residing in a
union.
C Argument: enum
PL/I Attributes: real fixed binary precision(35, 0)
aligned
Explanation: Instances of variables which are
defined to contain enumerated
values are treated as variables of
type int.
C Argument: enumerated constant
PL/I Attributes: real fixed binary precision(35, 0)
aligned
Explanation: Constants appearing in an
enumeration list are treated as
being of type int.
Some C implementations utilize this call-by-value mechanism in a
different way. Copies of the arguments to be passed are
catenated together into a structure-like format. The address of
this structure is then passed as the argument pointer. The
called program can then have declared only a single input
argument as in
char *my_arg;
which it manipulates to access the various portions of the
argument list values. From the definition above, Multics C does
not support this programming style.
12.7. References to Library Routines
As mentioned above, the assumption that C makes about execution
is that the library routines have been physically incorporated
into the executing program before execution begins. This is
contrary to the normal Multics policy of having one copy of the
library which is dynamically referenced by all users.
The proposed solution to this problem is to make some
modifications to the Multics binder. The nature of the change(1)
is to provide, as part of the binder's input, a list of external
symbol name-pairs of the form
segname_1$entryname_1 segname_2$entryname_2
The idea is that, after all the inputs have been examined for
external symbol definitions, if there are any unresolved
references to segname_1$entryname_1, they are to be replaced with
references to segname_2$entryname_2.
Thus, a name-pair entry like:
fopen standard_C_library_$fopen
would allow us to provide the C library as a unique object in
Multics without forcing larger bound segments than necessary.
Since it works only on unresolved symbols, C programmers will
still be able to replace library routines in the manner they do
now: by writing a function with that name into their program.
________________________________________
(1) The exact mechanism has not yet been defined.
As an additional comment, while this addition is being proposed
to accommodate the C model, I believe it will prove worthwhile in
dealing with imported application systems whose organization
makes similar assumptions about the run-time environment. It
also helps resolve name conflicts when these applications
originate on other systems.
For example, it is common to find routines with names like "date"
and "time" being called by imported programs. It would be very
convenient in the management of this importation to be able to
say
date MVS_library_$julian_date
thereby assuring that the application would not now unexpectedly
transfer to the Multics system's "date" command.
12.8. Function Name Resolution
The binding process on Multics is another area where there are
subtle differences from "conventional" usage vis-a-vis "linkage
editing". The C environment (and many other systems, regardless
of language) disregards the name of the object file being used as
input, and concentrate instead on the external entry(1) names
defined and referenced. In Multics terms, this means that a
reference to the entrypoint "bar"(2) should be satisfied by the
entrypoint which Multics knows as "foo$bar", at least as far as
the binder is concerned.
This presently is not possible, but is another area where a
binder change would not only make C programmers more comfortable,
but would probably have benefits when applications software is
imported to Multics from more conventional systems.
________________________________________
(1) in the Multics sense of being visible from outside the
segment (e.g. operands of an ALM "segdef" or "entry"
pseudo-op)
(2) implicitly transformed by Multics into "bar$bar"
13. Run-Time Library Definition
This section defines the minimum set of library routines to be
made available with the compiler. It also attempts to define the
nature of the structures used by programs desiring to communicate
with or manipulate the Multics run-time environment (e.g.
files).
13.1. Input & Output
All input and output to C programs (except that done by direct
reference to Multics virtual memory using the #equate directive)
is done using library functions. It should be stressed that
these functions are among the most machine dependent and thus are
most likely to differ among implementations of C on various
machines. They depend on various constants, macros, and typedefs
specified in the include file, "stdio.h". A brief summary of
some of the more important ones is given in the following table.
Item Description
FILE A typedef for a structure which contains
information about the file from the
run-time library point of view. It is
not a Multics IOCB pointer, but does
contain a reference to the IOCB which
defines this file for Multics.
BUFSIZ The maximum size of an i/o buffer in
characters.
STRSZ The maximum length of a string.
NULL The defined constant value for a null
pointer value.
stdin The standard file identifiers for the
stdout "default system" input, output, and error
stderr files respectively. They are assigned
the natural correspondence on Multics to
user_input, user_output, and
error_output.
EOF This is an int value which cannot result
from casting ANY character into an int.
Since characters read are treated as
unsigned, the customary value chosen by
most implementations is -1. Thus,
getchar() will return 0777 if the Multics
character with octal code 777 is read, and 0777777777777
(-1 in a 36-bit word) when end-of-file occurs.
The following list of input and output functions presumes the
data definitions given below in discussing the actions performed
by each function.
FILE *fp; /* A pointer to the structure
defining the file */
char c1, c2; /* Characters to be sent or
received */
int N; /* An integer send or receive
length */
int status; /* A Multics system standard
error code */
char *s1, *s2; /* Pointers to strings of
characters to be sent or
received */
13.1.1. fopen
Declaration:
FILE *fopen();
Invocation:
fp = fopen("filename", "mode");
As shown, the function returns a pointer to a structure
describing the relevant data about the file. It takes two
arguments, both strings. The first is an absolute or relative
pathname of the file to be opened. The opening will be attempted
via the vfile_ io module of Multics, using a "stream" mode.
The second argument is a one-character string designating the
intended use for the file. Allowed values are "r" (read), "w"
(write), and "a" (append). Using "r" will cause an attempt to
open for "stream_input", otherwise, the attempt will be made to
open the file in "stream_output".
If the file does not exist, and it is being opened for writing,
it will be created. If the file cannot be opened as requested,
the value NULL will be returned.
13.1.2. fclose
Declaration:
void fclose();
Invocation:
fclose(fp);
The argument to "fclose" is always a file pointer. Files remain
open until explicitly closed by the program or until forced
closed by a "close_files" command or the termination of a
run-unit. Closing a file which is not open is not an error.
13.1.3. getc
Declaration:
int getc();
Invocation:
c = getc(fp);
This function gets a single character from the file whose pointer
is given as its argument. The file must be opened for stream
input. When the input file is exhausted, this function returns
EOF on each invocation.
13.1.4. putc
Declaration:
char putc();
Invocation:
c1 = putc(c2, fp);
The "putc" function writes the character given as its first
argument to the file whose pointer is specified as its second
argument. The file must be opened for stream output at the time
of the invocation. The "putc" function returns as its value the
character it sends to the file.
13.1.5. fgets
Declaration:
char *fgets();
Invocation:
s2 = fgets(s1, N, fp);
This function reads characters from the file whose pointer is
given as the third argument. The second argument, N, limits how
many characters are read: reading stops when a newline
(\n) is encountered(1) or N-1 characters have been read. The
string terminator (\0) is stored after the last character read
into the string given as the first argument. The result of the
function is the value of the first argument.
13.1.6. fputs
Declaration:
void fputs();
Invocation:
fputs(s1, fp);
The first argument must be a pointer to a string of characters
and the second is a pointer to a file structure. Characters are
written from the string up to but not including the null
character marking the end of the string.
________________________________________
(1) If the newline character stops the input, it is still stored
as part of the characters read into the string.
13.1.7. printf
Declaration:
void printf();
Invocation:
printf("fmt string", ... );
This function is used to convert a number of arguments (possibly
none) from their internal representation to ASCII under control
of a format string (given as the first argument). The converted
values are written to the standard output file. The format
controls are those defined by the reference document,
pgs. 145-147.
13.1.8. fprintf
Declaration:
void fprintf();
Invocation:
fprintf(fp, "fmt string", ... );
This function works like printf except that the resultant string
is written to the file given as the first argument. The format
control string is given as the second argument, and the data to
be converted (if any) as the third and succeeding arguments.
13.1.9. sprintf
Declaration:
int sprintf();
Invocation:
sprintf(s1, "fmt string", ... );
This function performs the conversion to ASCII in the manner of
fprintf. However, the first argument designates a string where
the result is to be placed rather than a file to which it is to be
written. No check is made to ensure that the target string,
given as the first argument, is long enough to hold the result.
13.1.10. scanf
Declaration:
int scanf();
Invocation:
scanf("fmt string", &arg1, ... );
This function is the input analog of printf. The first argument
is a control string indicating how to interpret characters
received from the standard input file. The remaining arguments
are pointers to data values which will hold the converted values.
The valid scanning control sequences are given in K&R,
pgs. 148-149.
The result of the function is the number of items which were
successfully converted and assigned to items in the argument
list.
13.1.11. fscanf
Declaration:
int fscanf();
Invocation:
fscanf(fp, "fmt string", &arg1, ... );
This function works like scanf except that the first argument
designates the file which is to be used as the input file.
13.1.12. sscanf
Declaration:
int sscanf();
Invocation:
sscanf(s1, "fmt string", &arg1, ... );
This function works like fscanf except that the first argument
designates a string which is to be used as the source of input
characters, rather than a file.
13.1.13. rewind
Declaration:
void rewind();
Invocation:
rewind(fp);
This function resets the file position for the file whose pointer
is given as its argument to the beginning of the file.
13.1.14. open_file
Declaration:
FILE *open_file();
Invocation:
fp = open_file("Multics attach description",
"Opening Mode");
As shown, the function returns a pointer to a structure
describing the relevant data about the file. It takes two
arguments, both strings. The first argument is a standard
Multics attach description. The second is a standard Multics
opening mode for the target switch. If the file cannot be opened
as requested, a "FILE" structure will still be allocated and a
pointer to it returned. The structure will contain the reason
for the inability to open the file.
13.1.15. open_switch
Declaration:
FILE *open_switch();
Invocation:
fp = open_switch("Multics io switchname",
"Opening Mode");
This function performs like open_file except that the first
argument is the name of an attached and unopened io switch,
rather than an attach description.
13.1.16. attach_switch
Declaration:
int attach_switch();
Invocation:
status = attach_switch("Multics io switchname",
"Attach description");
This function attaches a Multics io switch with the given name
and attach description. It returns zero if it successfully made
the attachment and a standard error code otherwise.
13.1.17. detach_switch
Declaration:
int detach_switch();
Invocation:
status = detach_switch("Multics io switchname");
This function detaches a Multics io switch with the given name.
The switch must be closed for the detach to succeed. It returns
zero if it successfully detached the switch and a standard error
code otherwise.
13.1.18. fflush
Declaration:
void fflush();
Invocation:
fflush(fp);
Any output which is in the C file buffer but has not been sent to
the associated Multics io switch is forced out. The file must be
opened in an output mode.
13.2. String Manipulation
This section describes the library functions available for string
manipulation. In the discussion of the individual functions, the
following definitions are assumed:
char *s1, *s2, *s3; /* Strings */
char c; /* A single character */
int M, N; /* Various character counts */
13.2.1. strcat
Declaration:
char *strcat();
Invocation:
s3 = strcat(s1, s2);
This function appends a copy of the string, s2, to the end of the
string, s1. No check is made on the allocated length of s1; this
is the responsibility of the programmer. The value returned by
the function is the value of s1.
13.2.2. strncat
Declaration:
char *strncat();
Invocation:
s3 = strncat(s1, s2, N);
This function appends at most N characters from s2 to s1. If s2
is less than or equal to N characters in length, it behaves like
"strcat".
13.2.3. strcmp
Declaration:
int strcmp();
Invocation:
N = strcmp(s1, s2);
The two strings are compared lexicographically. If s1 is greater
than s2, the value returned is positive; if less, negative; and
if equal, zero.
13.2.4. strncmp
Declaration:
int strncmp();
Invocation:
M = strncmp(s1, s2, N);
This works like "strcmp" except that no more than N characters
from the front of s1 and s2 are compared.
13.2.5. strcpy
Declaration:
char *strcpy();
Invocation:
s3 = strcpy(s1, s2);
In this function, s2 is copied into s1. The copy ends when the
null character terminating s2 has been moved. No check is made
on the allocated length of s1. The function return value is the
value of the first argument.
13.2.6. strncpy
Declaration:
char *strncpy();
Invocation:
s3 = strncpy(s1, s2, N);
This function always stores exactly N characters into s1. If s2
is N or more characters long, only its first N characters are
copied and no string terminator is stored in s1. If s2 is
shorter than N characters, s1 is padded with trailing null
characters until it is N characters long. The return value of
the function is the value of the first argument.
13.2.7. strlen
Declaration:
int strlen();
Invocation:
N = strlen(s1);
The value of the function is the length (excluding the string
terminator) of s1.
13.2.8. strchr
Declaration:
char *strchr();
Invocation:
s2 = strchr(s1, c);
The return value of the function is a pointer to the first
occurrence of c in s1. If c does not occur in s1, the return
value is a null pointer.
13.2.9. strrchr
Declaration:
char *strrchr();
Invocation:
s2 = strrchr(s1, c);
The return value of the function is a pointer to the last
occurrence of c in s1. If c does not occur in s1, the return
value is a null pointer.
13.3. Memory Allocation
This section describes the library functions available for
allocating and freeing blocks of memory. In the discussion of
the individual functions, the following definitions are assumed:
unsigned N, M;      /* Sizes and amounts to be allocated */
char *loc, *oldloc; /* Addresses of allocated space */
13.3.1. malloc
Declaration:
char *malloc();
Invocation:
loc = malloc(N);
The argument to malloc is the number of bytes which are to be
allocated. It returns a pointer to a block of bytes at least N
long. The return value also points to an address suitable for
use with any data type.
13.3.2. free
Declaration:
void free();
Invocation:
free(loc);
This function returns the space previously allocated by malloc to
the free storage pool. No guarantee is made about the value of
the bits in the allocated block.
13.3.3. calloc
Declaration:
char *calloc();
Invocation:
loc = calloc(N, M);
This function works like malloc except that it returns a pointer
to a block of space sufficient to hold N copies of size M. In
addition, all bytes in the allocated block are guaranteed to be
zero.
13.3.4. realloc
Declaration:
char *realloc();
Invocation:
loc = realloc(oldloc, M);
This function "resizes" the block of storage pointed to by its
first argument to be the size given by its second argument. If
the space is to be shrunk, bytes will be trimmed from the right
end of the block. If the requested size is larger, the new block
will have the old block's value stored left-justified in the new
block padded with 0 bytes to fill out the new size.
In no case, even when the block size does not need to be changed,
should the program expect that loc == oldloc.
13.4. Mathematical Functions
The following list of mathematical functions will be available in
the run-time library. All of these routines take arguments of
type "double" and return "double" values as their results.
     Function    Description
     abs(X)      absolute value of X
     acos(X)     arccosine of X in radians
                 0 <= acos(X) <= pi
     asin(X)     arcsine of X in radians
                 -(pi/2) <= asin(X) <= (pi/2)
     atan(X)     arctangent of X in radians
                 -(pi/2) < atan(X) < (pi/2)
     ceil(X)     smallest integer value greater than or equal to X
     cos(X)      cosine of X in radians
     cosd(X)     cosine of X in degrees
     cosh(X)     hyperbolic cosine of X
     exp(X)      e ** X
     floor(X)    largest integer value less than or equal to X
     log(X)      natural logarithm of X
     log10(X)    logarithm (base 10) of X
     log2(X)     logarithm (base 2) of X
     sin(X)      sine of X in radians
     sind(X)     sine of X in degrees
     sinh(X)     hyperbolic sine of X
     sqrt(X)     square root of X
                 0 <= X
     tan(X)      tangent of X in radians
     tand(X)     tangent of X in degrees
     tanh(X)     hyperbolic tangent of X
13.5. Miscellaneous
The following functions do not fit easily within the preceding
classifications. Many of the functions listed here implicitly
make programs dependent on the Multics environment and should be
avoided in situations where portability is important. The
following definitions are assumed in the discussion of these
functions.
     long int tics;    /* A counter for clock "tics" */
     long int when;    /* A date or time value */
     int code;         /* A Multics system standard error code */
     char flag;        /* A choice indicator */
     char *msg;        /* A pointer to a message string */
13.5.1. clock
Declaration:
long int clock();
Invocation:
tics = clock();
The result is the number of microseconds since 0000 hours,
1 January 1901, GMT.
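Programs exchanging timestamps with UNIX systems, which count seconds from 1 January 1970 GMT, need to convert clock values. The offset between the two epochs is 69 years of 365 days plus 17 leap days (1904 through 1968), that is, 25202 days or 2,177,452,800 seconds. The helper below is a hypothetical sketch; it uses a 64-bit type because the microsecond count is far too large for a 32-bit integer.

```c
#include <stdint.h>

/* Seconds from 0000 GMT, 1 January 1901 to 0000 GMT, 1 January
   1970: 25202 days * 86400 seconds per day. */
#define SECS_1901_TO_1970 INT64_C(2177452800)

/* Hypothetical helper: convert a clock() reading (microseconds
   since 1 January 1901 GMT) to whole seconds since 1 January 1970
   GMT.  Any fraction of a second is discarded. */
int64_t multics_clock_to_unix(int64_t usecs_since_1901)
{
    return usecs_since_1901 / 1000000 - SECS_1901_TO_1970;
}
```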
13.5.2. vclock
Declaration:
long int vclock();
Invocation:
tics = vclock();
The result is the number of microseconds of virtual CPU time used
by the process.
13.5.3. date
Declaration:
long int date();
Invocation:
when = date();
The result is an integer value representing the current date in
the form YYYYMMDD, where YYYY is the four-digit year, MM is the
month within the year, and DD is the day of the month.
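Because the date is packed into decimal digit positions, its fields can be recovered with integer division and remainder. In this sketch, split_date is a hypothetical helper, not part of the proposed library.

```c
/* Hypothetical helper: split a date() value of the form YYYYMMDD
   into its year, month, and day fields. */
void split_date(long when, int *year, int *month, int *day)
{
    *year  = (int) (when / 10000);       /* 19840123 -> 1984 */
    *month = (int) (when / 100 % 100);   /* 19840123 -> 1    */
    *day   = (int) (when % 100);         /* 19840123 -> 23   */
}
```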
13.5.4. time
Declaration:
long int time();
Invocation:
when = time();
The result is an integer value giving the current time in the
form HHMMSSFFFFFF where HH is the hour of the day (00-23), MM is
the minute within the hour, SS is the second within the minute,
and FFFFFF is the microsecond within the second.
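The time value can be decoded in the same way. Note that the largest possible value, 235959999999, does not fit in a 32-bit integer, so a program that moves this representation to another system needs a wider type. The sketch below uses int64_t for that reason; struct tod and split_time are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical broken-down time of day. */
struct tod {
    int hour, minute, second;
    long usec;
};

/* Hypothetical helper: split a time() value HHMMSSFFFFFF. */
struct tod split_time(int64_t when)
{
    struct tod t;

    t.usec   = (long) (when % 1000000);               /* FFFFFF */
    t.second = (int) (when / 1000000 % 100);          /* SS */
    t.minute = (int) (when / 100000000 % 100);        /* MM */
    t.hour   = (int) (when / INT64_C(10000000000));   /* HH */
    return t;
}
```

For example, the value 143005123456 decodes to 14:30:05 plus 123456 microseconds.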
13.5.5. exit
Declaration:
void exit();
Invocation:
exit(code, msg, flag);
This function forces a return to the caller of the "main"
program. All arguments are optional. If code is zero, or if the
function is invoked without arguments, control passes to the
caller of the "main" program.
If code has the value -1, the Multics condition "command_abort"
is signalled.
If code is neither zero nor -1, it is interpreted as a standard
Multics error code, and the system routine sub_err_ is called
with msg as the message. In this case, flag may take only one
of the values acceptable to sub_err_.
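Unlike the single-argument exit found on UNIX, this exit chooses among three actions according to code. The sketch below models only that decision rule as a pure function, so the three cases can be read (and tested) at a glance; the enumeration and function names are hypothetical and are not part of the proposed library.

```c
/* Hypothetical model of the three cases described above. */
enum exit_action {
    CEXIT_RETURN_TO_CALLER,      /* code == 0 (or no arguments) */
    CEXIT_SIGNAL_COMMAND_ABORT,  /* code == -1 */
    CEXIT_CALL_SUB_ERR           /* otherwise: a Multics error code */
};

enum exit_action classify_exit_code(int code)
{
    if (code == 0)
        return CEXIT_RETURN_TO_CALLER;
    if (code == -1)
        return CEXIT_SIGNAL_COMMAND_ABORT;
    return CEXIT_CALL_SUB_ERR;
}
```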
14. Open Issues
This section contains unresolved, important issues related to the
suitability, performance, or "look" of the Multics implementation
of the C compiler and language. Many of them have come from
reviewers of prior drafts of this document. They are listed here
in no particular order. Your comments and concrete suggestions
are welcome on these topics.
14.1. Use of Standard Operators
A suggestion has been made that C programs not use the
pl1_operators_ segment, but instead have a special one of their
own. The reasons in support of this are:
A) The pl1_operators_ segment is too hard to maintain and
modify.
B) The rules for PL/I arithmetic do not match those of C
well enough to make its use profitable in object
segments. One more tailored to C rules would allow more
compact object segments.
C) Given the tendency for C programs to contain many small
functions and make heavy use of function calls during
execution, the pl1_operators_ call/push/return sequence
will be too slow. A more effective one could be written
that takes advantage of C programming style.
D) Additional efficiency may be gained by having the
compiler recognize intrinsic(1) functions that are
implemented efficiently in the operators segment. The
function "strcpy" is a good example of this.
14.2. Mismatch in System Calling Conventions
There is no mechanism to define a function or subroutine external
to the calling program which obeys "native" calling conventions:
argument passing by-reference, use of descriptors, call-by-value
through the use of expressions, etc. Multics FORTRAN provides
this via the declaration:
external foo descriptors
________________________________________
(1) An "intrinsic" function in this context is one which is part
of the standard library supplied with the compiler.
The Maclisp compiler also provides a "defpl1" facility that serves
a similar purpose and, in addition, performs data type conversion
as part of the call. Several reviewers have asked for such an
extension in the Multics implementation of C.
14.3. Unbound Programs and Name Resolution
The design proposes that C programs will have their
inter-function name resolution done by the binder. While this
seems to mimic the approach on other systems which require link
editing compiled programs into executable objects, it leaves
stand-alone C programs on Multics in the lurch.
It has been suggested that the binder name-resolution mechanism
be implemented and that the same facility be added to the
compiler (perhaps through the inclusion of a standard preamble
containing #equate directives). In that case, additional
provisions must allow such names to be redefined by explicitly
including the function in a source program.
14.4. Support for the Entry Keyword
It has been proposed that Multics provide an extension to the
language which allows the creation of multiple-entry functions
via the "entry" keyword. This keyword is presently reserved(1)
for future use in the reference language.
14.5. Linker Support for the MAIN Entrypoint
When an external reference is made to routine "foo" on Multics,
the linker maps that into a reference to a segment whose name is
"foo". Having found the segment, it then looks to see if there
is an entrypoint in that segment called "foo". If there is one,
execution begins at that entrypoint.
In deference to languages like FORTRAN which have "main
programs", if the linker cannot find an entrypoint named "foo",
it will look for one called "main_". The FORTRAN compiler
creates such an entrypoint for main programs to indicate the
point to begin execution.
________________________________________
(1) K&R, Appendix A; Section 2.3, Keywords; Pg. 180
The issue in this case is to decide among the following
possibilities:
A) The C compiler should translate any function defined as
"main", the reserved keyword, into an entrypoint in the
object segment called "main_". An additional keyword,
"main_", will have to be reserved, but the linker does
not need to be changed.
B) The C compiler should add an additional entrypoint,
called "main_", to any object segment it finds which
contains a definition for "main". The "main" entrypoint
will also appear as an external symbol; both will cause
execution to begin at the same point in the compiled
code. The linker will not have to be changed in this
case, but the "main_" keyword must be reserved.
C) The linker should be changed to additionally look for the
entrypoint "main" in the object segment before giving up
and reporting failure. No additional keywords have to be
reserved by the compiler. The linker change would be
almost invisible to most users.
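The search order that the linker would follow under option C can be modeled in a few lines. The sketch below is a hypothetical simplification, not the actual linker: it treats a segment as a list of entrypoint names and, for a reference to "foo", tries "foo", then "main_", then "main", returning the index of the first match or -1 on failure.

```c
#include <string.h>

/* Hypothetical model of entrypoint lookup in the segment found for
   a reference to `name`, with option C's extra step included.
   entries[] lists the segment's entrypoint names (n of them).
   Returns the index of the chosen entrypoint, or -1 on failure. */
int find_entrypoint(const char *name, const char *entries[], int n)
{
    const char *order[3];
    int i, k;

    order[0] = name;      /* entrypoint with the segment's own name */
    order[1] = "main_";   /* existing fallback for main programs */
    order[2] = "main";    /* the proposed addition (option C) */

    for (k = 0; k < 3; k++)
        for (i = 0; i < n; i++)
            if (strcmp(entries[i], order[k]) == 0)
                return i;
    return -1;
}
```

Because "main" is tried last, segments that already define "foo" or "main_" would resolve exactly as before, which is why the change would be nearly invisible to existing users.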
14.6. Content of the Library
The functions defined earlier make up a minimal subset of a
useful programming library for C. Other useful routines, and
suggestions for other libraries, are especially welcome.
14.7. UNIX Environment Features
Unlike many other languages, C was developed in conjunction with
an operating system, UNIX.(1) A consequence of this is that many
C programs are written with the (implicit?) assumption that
certain facilities will be present. Which of these features
should be built into the C compiler/run-time and which should be
included in a larger enclosing environment is also an important
open issue. Some of those which have been raised are included
here.
________________________________________
(1) UNIX is a registered trademark of Bell Laboratories. It is
commercially available under license from Western Electric.
14.7.1. Enclosing the Main Routine
There is no way of automatically providing for pre-execution
preparation of the running environment. This includes providing
files for the standard devices: stdin, stdout, and stderr.
14.7.2. Device Nomenclature
The present proposal provides no way to map between
program-generated device strings commonly used by UNIX (e.g.
/dev/tty6 or /dev/mem) and Multics counterparts. Some reviewers
see this as a desirable feature of the run-time support.
14.7.3. Support for ARGC & ARGV
The present proposal provides no way to identify C main programs
as different from those written in any other language.
Therefore, the suggested argc/argv convention(1) for passing
command-line arguments cannot be used in Multics without some
additional support. Whether this is to be handled by extending
the command processor, providing an easy conversion sequence, or
providing it as part of the encapsulating support for C programs
remains undecided.
________________________________________
(1) K&R, Chapter 5, Pointers and Arrays; Section 5.11,
Command-Line Arguments; pp.110-114.
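Whichever option is chosen, some piece of encapsulating support would have to build the argument vector that a C main(argc, argv) expects. The sketch below assumes, as a hypothesis rather than a statement about the Multics command processor, that each argument arrives as a pointer and a length instead of as a NUL-terminated string. The helper build_argv and its parameters are hypothetical; argv[0] holds the program name and the vector is terminated with a null pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy nargs (pointer, length) arguments into
   NUL-terminated strings and build an argv vector for main.
   On success stores the vector in *argvp and returns argc;
   returns -1 (after freeing any partial allocation) on failure. */
int build_argv(char *prog, const char *args[], const size_t lens[],
               int nargs, char ***argvp)
{
    char **argv;
    int i;

    /* argv[0], nargs arguments, and a terminating null pointer */
    argv = (char **) malloc((nargs + 2) * sizeof(char *));
    if (argv == NULL)
        return -1;
    argv[0] = prog;
    for (i = 0; i < nargs; i++) {
        argv[i + 1] = (char *) malloc(lens[i] + 1);
        if (argv[i + 1] == NULL) {
            while (--i >= 0)          /* release copies made so far */
                free(argv[i + 1]);
            free(argv);
            return -1;
        }
        memcpy(argv[i + 1], args[i], lens[i]);
        argv[i + 1][lens[i]] = '\0';
    }
    argv[nargs + 1] = NULL;
    *argvp = argv;
    return nargs + 1;
}
```

The encapsulating routine could then call the user's main with the returned count and vector, so that C programs written to the K&R convention would need no source changes.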