Multics Technical Bulletin MTB-710
The LALR System
To: Distribution
From: Betty Wong
Date: 23 May 1985
Subject: Multi System and Language Support
1. Abstract
This MTB describes the systems and languages supported by the
LALR system other than Multics and Multics PL/1 and ALM. The
systems supported are GCOS and DPS 6. The languages supported
are GMAP, GCOS PL/1, DPS 6 Assembly Language, Ada/SIL, and C.
Comments on this MTB should be sent to the author -
via Multics mail to:
BWong.Multics on System M
via posted mail to:
Betty Wong
Advanced Computing Technology Centre
The University of Calgary
Foothills Professional Building
Room #301, 1620 - 29th Street N.W.
Calgary, Alberta T2N 4L7
CANADA
via telephone to:
(403)-270-5400
(403)-270-5408
_________________________________________________________________
Multics project internal documentation; not to be reproduced or
distributed outside the Multics project.
TABLE OF CONTENTS
Section   Subject
=======   =======
1         Abstract
2         Introduction
3         Changes to the Control Section of the Grammar
          Source Segment
4         Changes to Handle Embedded Semantics
4.1       . . DPS 6 Assembly Language
4.2       . . Ada/SIL Language
4.3       . . C Language
5         Changes to Handle Separate Semantics
6         Changes to LALR Commands
6.1       . . The 'lalr' Command
6.2       . . The 'make_dpda' Command
6.3       . . The 'dps6_dpda' Command
7         Parse Tables Produced
7.1       . . GMAP Source Segment Parse Tables
7.2       . . DPS 6 Parse Tables Object Files
7.2.1     . . . . DPS 6 Files for Assembly Language Use
7.2.2     . . . . DPS 6 Files for Ada/SIL Use
7.2.3     . . . . DPS 6 Files for C Use
2. Introduction
LALR translates a BNF-like language description into a parser for
the language. The output from LALR is a set of tables that
control the operation of a parser procedure. Because these
tables are lists of signed integers they can be easily
transported to computers other than Multics. The parser
procedure is a simple routine and versions of it have been coded
in PL/1, C and Assembly language. LALR has options which allow
the tables to be generated as a Multics object segment, an ALM
source segment, a GMAP source segment or a DPS 6 Multics Host
Resident System object segment. Declarations for these segments
can be generated in Multics PL/1, GCOS PL/1, DPS 6 Assembly
Language, Ada/SIL, or C. These Multics-generated segments can
then be transferred to other systems.
The work for support on Multics systems, Multics PL/1, and ALM
has already been done. Therefore, this document contains
information describing Multics commands in the LALR system to
produce tables for GCOS and DPS 6 systems and files in the
languages supported on these systems.
Most of the support for the specified systems and languages is
already incorporated into the LALR system. The work remaining is
to:
1) document the systems and languages supported
2) create parser procedures written in the supported
languages that can be tailored to individual
specifications
3) create equivalents of the lalrp command on other
systems so that parts of the translator written
in the supported languages can be tested.
The LALR system was originally created by J. Falksen and Dave
Ward of LISD. The information for this MTB is taken from LALR, a
Translator Construction System, the SLANG Project Technical
Bulletin written by Patrick Prange dated July 2, 1984.
3. Changes to the Control Section of the Grammar Source Segment
Control arguments to the 'lalr' command can be included in the
control section of the grammar source segment. Additional
control lines to support other systems and languages are:
-ada_sil
-asm
-c, -C
-dps6_format
-gmap
-hrs_format
-no_ada_sil
-no_asm
-no_c, -no_C, -noc, -noC
-no_gmap
For further information on the meanings of these control
arguments, see Section 6.1 - The 'lalr' Command.
4. Changes to Handle Embedded Semantics
In the embedded semantics format, the source segment will be able
to contain code for a GCOS PL/1 procedure, a DPS 6 Assembly
Language program, an Ada/SIL program unit, or a C function.
The following sections describe the creation of the semantics
segment from an embedded semantics source segment for DPS 6
Assembly Language, Ada/SIL, and C. A GCOS PL/1 semantics segment
is generated by the same procedure used for a Multics PL/1
semantics segment.
4.1. DPS 6 Assembly Language
If the source segment is a DPS 6 Assembly Language program unit
(as indicated by the -semantics control argument), LALR creates
the assemblable semantics segment from it by the following steps:
1) If the semantics segment is named X.nml or X.nml.MAC, begins
the segment with the title statement:
title X,'yymmdd00'
where "yymmdd" is the current date. If the semantics segment
is named X.incl.nml, it begins the segment with the comment
lines mentioned in step 2 below.
2) Appends comment lines giving the name of the input grammar
segment, the date and time it was translated, the version
of LALR that was used to translate it, and the user_id of
the user who translated it.
3) Appends the following statements defining the semantics
procedure's entry point and transferring control to the
semantics for the current rule:
xdef X
X lab $B4,jtable-1
ldr $R1,$B4.$R1
jmp $B4.$R1
These statements assume the parser passes the rule number
or production number, as appropriate, by value in register
R1.
4) Appends the source segment to the semantics segment making
the following changes:
a) Puts a "*" in front of each line of the control
portion, if present.
b) Puts a "*" in front of each line of each rule. If a
rule does not begin at the beginning of a line or end
at the end of a line, lines are split as necessary to
make the rule do so.
c) If the -production control is not in effect, each
"%%%%" in the semantics is replaced with the 4-digit
number of the rule which it represents. If the
-production control is in effect, each "%%%%"
immediately followed by an unsigned decimal number
representing an alternative number is replaced with
the 4-digit number of the production which it
represents.
5) Appends a DC statement defining the jump table used by the
statements shown in step 3 above. If the -production
control is not in effect the jump table is as follows:
jtable dc R1-jtable+1;
R2-jtable+1;
...
Rn-jtable+1
The jump table contains an entry for each rule of the
grammar. If the i-th rule has a significant semantic, Ri
used in the i-th line of the DC statement is the letter "r"
followed by the value of i as a 4-digit decimal number.
Otherwise, Ri is "no_sem". (The user is assumed to have
defined the tag "no_sem" somewhere in the semantics
segment.)
If the -production control is in effect the jump table is
as follows:
jtable dc P1-jtable+1;
P2-jtable+1;
...
Pn-jtable+1
The jump table contains an entry for each production of the
grammar. If the i-th production has a significant
semantic, Pi used in the i-th line of the DC statement is
the letter "p" followed by the value of i as a 4-digit
decimal number. Otherwise, Pi is "no_sem". (The user is
assumed to have defined the tag "no_sem" somewhere in the
semantics segment.)
6) Appends the following end statement to the semantics segment
if it is named X.nml or X.nml.MAC.
end X
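The jump table of step 5 can be sketched as follows. This is an
illustrative Python sketch, not the code LALR itself uses; the
function name jump_table and its arguments are hypothetical, and
the column alignment of the real DC statement is not reproduced.

```python
def jump_table(significant, prefix="r"):
    """Build the text of the DC statement described in step 5.

    significant -- one boolean per rule (or production); the entry for
                   rule i is True when that rule has a significant semantic.
    prefix      -- "r" for rules, "p" when -production is in effect.
    """
    if not significant:
        raise ValueError("the grammar has no rules")
    entries = []
    for i, has_semantic in enumerate(significant, start=1):
        # Tag is the prefix plus a 4-digit rule number, or "no_sem".
        label = f"{prefix}{i:04d}" if has_semantic else "no_sem"
        entries.append(f"{label}-jtable+1")
    lines = []
    for k, entry in enumerate(entries):
        lead = "jtable dc " if k == 0 else ""
        # Every line but the last ends with ";", as in the layout above.
        tail = ";" if k < len(entries) - 1 else ""
        lines.append(f"{lead}{entry}{tail}")
    return "\n".join(lines)


print(jump_table([True, False, True]))
```

For a three-rule grammar whose second rule has no significant
semantic, the sketch emits entries for r0001, no_sem, and r0003.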
4.2. Ada/SIL Language
If the source segment is an Ada/SIL program unit (as indicated by
the -semantics control argument), LALR creates the compilable
semantics segment from it by the following steps.
1) Begins the semantics segment with a <subprogram
specification> naming the subprogram. If the semantics
segment is named X.ada, the following <subprogram
specification> is generated:
procedure X
(rule_no: in natural;
alt_no: in natural;
lex_stack_ptr: in access;
ls_top: in integer) is
If the semantics segment is named X.incl.ada, the following
<subprogram specification> is generated:
procedure X
(rule_no: in natural;
alt_no: in natural) is
If the -production control is in effect, the formal
parameters rule_no and alt_no in the above <subprogram
specification>s are replaced by a single input formal
parameter "prod_no" of type natural. If the -rule_only
control is in effect, the formal parameter alt_no is
omitted from the above <subprogram specification>s.
2) Appends a sequence of <comment> lines giving the name of the
input grammar segment, the date and time it was translated,
the version of LALR that was used to translate it, and the
user_id of the user who translated it.
3) Appends the source segment to the semantics segment making
the following changes:
a) Puts "--" in front of each line of the control portion,
if present.
b) Puts "--" in front of each line of each LALR rule. If
a rule does not begin at the beginning of a line or
end at the end of a line, lines are split as necessary
to make each rule do so.
c) If the -production control is not in effect, each
"%%%%" in the semantics is replaced with the zero
suppressed number of the rule which it represents. If
the -production control is in effect, each "%%%%"
immediately followed by an unsigned decimal number
representing an alternative number is replaced with
the zero suppressed number of the production which
it represents.
4) Ends the <subprogram body> with the following text:
end X;
NOTE: If the -no_semantics_header control has been given,
steps 1 and 4 above are skipped.
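The choice of <subprogram specification> in step 1 can be
sketched as follows. This is an illustrative Python sketch under
stated assumptions: the function name ada_subprogram_spec is
hypothetical, and giving -production precedence over -rule_only
when both are set is an assumption, since the text above does not
say how the two interact.

```python
def ada_subprogram_spec(segment_name, production=False, rule_only=False):
    """Return the <subprogram specification> for an Ada/SIL semantics segment."""
    # Test the longer suffix first: "X.incl.ada" also ends in ".ada".
    if segment_name.endswith(".incl.ada"):
        name, include = segment_name[: -len(".incl.ada")], True
    elif segment_name.endswith(".ada"):
        name, include = segment_name[: -len(".ada")], False
    else:
        raise ValueError("expected a name ending in .ada or .incl.ada")

    if production:            # assumption: -production wins over -rule_only
        params = ["prod_no: in natural"]
    elif rule_only:
        params = ["rule_no: in natural"]
    else:
        params = ["rule_no: in natural", "alt_no: in natural"]
    if not include:
        # Only the X.ada form receives the lexical stack parameters.
        params += ["lex_stack_ptr: in access", "ls_top: in integer"]

    body = ";\n".join(("(" if k == 0 else " ") + p for k, p in enumerate(params))
    return f"procedure {name}\n{body}) is"


print(ada_subprogram_spec("X.ada"))
```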
4.3. C Language
If the source segment is a C function (as indicated by the
-semantics control argument), LALR creates the compilable
semantics segment from it by the following steps.
1) Begins the semantics segment with a <type specifier> and a
<function declarator> naming the procedure. If the
semantics segment is named X.c, the following <type
specifier> and <function declarator> are generated:
int X (rule_no, alt_no, lex_stack_ptr, ls_top)
If the semantics segment is named X.incl.c or X.h, the
following <type specifier> and <function declarator> are
generated:
int X (rule_no, alt_no)
If the -production control is in effect, the parameters
rule_no and alt_no in the above <function declarator>s are
replaced by the single parameter prod_no. If the
-rule_only control is in effect, the parameter alt_no in
the above <function declarator>s is omitted.
2) Appends a <comment> giving the name of the input grammar
segment, the date and time it was translated, the version
of LALR that was used to translate it, and the user_id of
the user who translated it.
3) Appends a <declaration list> declaring the formal
parameters. If the semantics segment is named X.c, the
<declaration list> is as follows:
int rule_no;
int alt_no;
int *lex_stack_ptr;
int ls_top;
If the semantics segment is named X.incl.c or X.h, the
declaration of lex_stack_ptr and ls_top is omitted.
If the -production control is in effect, the declaration of
the formal parameters rule_no and alt_no in the above
<declaration list>s are replaced by a declaration of a
single int parameter "prod_no". If the -rule_only control
is in effect, the declaration of the formal parameter
alt_no in the above <declaration list> is omitted.
4) Appends a "{" (left brace) to the semantics segment.
5) Appends the source segment to the semantics segment making
the following changes:
a) Puts "/*" and "*/" around the control portion, if
present.
b) Replaces each occurrence of "/*" or "*/" within the
control portion with the four character string "*//*".
c) Puts "/*" and "*/" around each LALR rule.
d) Replaces each occurrence of "/*" or "*/" within an LALR
rule with the four character string "*//*".
e) If the -production control is not in effect, each
"%%%%" in the semantics is replaced with the zero
suppressed number of the rule which it represents. If
the -production control is in effect, each "%%%%"
immediately followed by an unsigned decimal number
representing an alternative number is replaced with
the zero suppressed number of the production which
it represents.
6) Appends a "}" (right brace) to the semantics segment.
NOTE: If the -no_semantics_header control has been given, only
steps 2 and 5 above are performed.
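Steps 5a through 5d rely on the fact that C comments do not nest:
replacing each embedded "/*" or "*/" with "*//*" closes the
current comment and immediately opens a new one. The following
Python sketch illustrates the idea; the function name
wrap_in_c_comment is hypothetical, and the absence of any spacing
between the delimiters and the text is an assumption.

```python
import re

# Matches either C comment delimiter.
_DELIM = re.compile(r"/\*|\*/")


def wrap_in_c_comment(text):
    """Enclose text in /* ... */, neutralising delimiters already inside it."""
    return "/*" + _DELIM.sub("*//*", text) + "*/"


def strip_c_comments(text):
    """Remove non-nesting C comments, as a C compiler would skip them."""
    return re.sub(r"/\*.*?\*/", "", text, flags=re.S)


rule = "<a> ::= b /* note */ !"
wrapped = wrap_in_c_comment(rule)
print(wrapped)
# Nothing of the rule survives outside a comment.
print(repr(strip_c_comments(wrapped)))
```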
5. Changes to Handle Separate Semantics
In separate semantics source segments, the rules have this basic
form:
<var> ::= <prod list> ! <rule semantics>
<prod list>
represents a production list where a production is a
sequence of terminals and variables. If there is a
list of them, they are separated by "|". The
production list may be empty.
If the -production control is in effect, a production
may end with the symbols "=> t : p$e", where t is an
identifier tagging the production and p$e identifies
an entry point in an external procedure to be called
to perform the semantic action. If no tag is needed,
"t" and the ":" following it may be omitted. There
may not be any white-space between "p" and "$" nor
between "$" and "e". If "p" and "e" are the same, the
"$e" may be omitted.
When the tables are produced as a GMAP source segment,
"p" is ignored and "e" is taken to be an external
symbol; i.e., it has been SYMDEF'ed. Each "t"
generates a word, tagged with "t", containing the
corresponding production number. Each "t" is also
SYMDEF'ed.
When the tables are produced as a DPS 6 object unit,
"p" is taken to be the name of an object unit and "e"
is considered to be an entry point defined within that
object unit. If the -asm control is used to request
the object unit, each "t" names an external value
equal to the corresponding production number. If the
-ada_sil control is used to request the object unit,
each "t" generates a variable of type integer which is
initialized with the corresponding production number.
!
represents "end of production list". This must always
be present. If the -rule control is in effect, the
"!" of each rule may be followed by the symbols "=> t
: p$e", where "t", "p", and "e" are as described above
except that they pertain to rules instead of
productions.
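The "=> t : p$e" suffix described above can be recognised with a
short pattern. The following Python sketch is illustrative only:
the function name parse_semantic_tail is hypothetical, and the
identifier syntax (letters, digits, and underscores, not starting
with a digit) is an assumption, since this section does not
define it.

```python
import re

# Assumed identifier syntax; the bulletin does not define it here.
_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
_TAIL = re.compile(
    rf"^\s*=>\s*(?:(?P<t>{_IDENT})\s*:\s*)?(?P<p>{_IDENT})(?:\$(?P<e>{_IDENT}))?\s*$"
)


def parse_semantic_tail(text):
    """Split '=> t : p$e' into (t, p, e); t is None when no tag is given."""
    m = _TAIL.match(text)
    if m is None:
        raise ValueError(f"not a valid semantic action suffix: {text!r}")
    p = m.group("p")
    # When "$e" is omitted, the entry point has the same name as "p".
    e = m.group("e") or p
    return m.group("t"), p, e


print(parse_semantic_tail("=> add : sem$plus"))
print(parse_semantic_tail("=> sem"))
```

White-space on either side of "$" is rejected, matching the rule
stated above.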
6. Changes to LALR Commands
The support for other languages and systems requires changes to
existing commands or the addition of other commands.
6.1. The 'lalr' Command
This command requires additional control arguments and changes to
the interpretation of the -semantics and -table control
arguments.
---------
lalr, lrk
---------
SYNTAX:
lalr path {control_args}
FUNCTION: Invokes the LALR compiler to translate a source
segment containing the text of the LALR source into a set of
tables located in an object segment. The object segment is
given two names consisting of the entryname portion of the
source segment with the suffixes grammar and result. A
listing segment is optionally produced. Packaged forms of
the tables may be requested. These segments are placed in
your working directory.
ARGUMENTS:
path
is the pathname of the LALR source segment containing the
grammar to be processed. The lalr suffix is assumed if not
supplied. This argument may be an archive component
pathname.
CONTROL ARGUMENTS:
-semantics {X}, -sem {X}
produces a semantics file named X. X cannot be an archive
component pathname. The equals convention is applied to the
entryname of X and the entryname (or component name in case
of an archive component) of the source segment. The
suffix(es) of the resultant entryname must be pl1 or incl.pl1
(PL/I source), nml, incl.nml, or nml.MAC (DPS 6 Assembly
Language source), ada or incl.ada (Ada/SIL source), or c,
incl.c, or h (C source). If no suffix is present, incl.pl1
is assumed. If incl is present, it is treated as incl.pl1.
If X is not given, "=_s.pl1" is assumed. This control
argument is meaningless with a separate semantics format
source segment.
-table {X{.incl.pl1}}, -tb {X{.incl.pl1}}
produces a table named X and appropriately named source files.
X may not be an archive component pathname. The equals
convention is applied to the entryname of X and the entryname
(or component name in case of an archive component) of the
source segment. The table is produced as a Multics object
segment unless otherwise specified by the control described
below. If X is not given, "=_t" is assumed. This control
argument implies the -terminals_hash_list, -terminals_list,
-variables_list, -production_names and -synonyms control
arguments.
-ada_sil
produces the table as a DPS 6 Multics Host Resident System
object file named X.object or a DPS 6 native object file X.o
and produces a DPS 6 Ada/SIL package specification named
X.spec.ada. X is the name supplied with the -table control
argument less all suffixes.
-asm
produces the table as a DPS 6 Multics Host Resident System
object file named X.object or a DPS 6 native object file X.o
and produces a DPS 6 Assembly Language include file named
X.incl.nml. X is the name supplied with the -table control
argument less all suffixes.
-c, -C
produces the table as a DPS 6 Multics Host Resident System
object file named X.object or a DPS 6 native object file X.o
and produces a DPS 6 C Language header file named X.h. X is
the name supplied with the -table control argument less all
suffixes.
-dps6_format
causes the DPS 6 object file produced because of the -asm,
-ada_sil, or -C control argument to be generated in 'native'
format. A native format DPS 6 object file may be transmitted
to a DPS 6 running the Mod 400 Operating System via a
network_request l6_ftf command specifying data_type binary.
No intermediate format conversion step is required before
transmitting native format files. Native format is the
default format for DPS 6 object files. This control argument
is meaningless if none of the control arguments -asm,
-ada_sil, or -C are specified.
-gmap
produces the table as a gmap segment X.gmap and a GCOS III
PL/I include file named X.incl.pl1. X is the name supplied
with the -table control argument less all suffixes.
-hrs_format
causes the DPS 6 object file produced because of the -asm,
-ada_sil, or -C control argument to be generated in the
Multics Host Resident System format. This is the format
required by the various HRS tools. This control argument is
meaningless if none of the control arguments -asm, -ada_sil,
or -C are specified.
-no_ada_sil
does not produce the table in the form described above for
the -ada_sil control argument.
-no_asm
does not produce the table in the form described above for
the -asm control argument.
-no_c, -no_C, -noc, -noC
does not produce the table in the form described above for
the -c control argument.
-no_gmap
does not produce the table in the form described above for
the -gmap control argument.
-origin N, -org N
specifies the lower bound, N, to be used with the arrays
generated for DPS 6 format tables. N must be 0 or 1. The
default is 0 if the -c control is present, otherwise it is 1.
For the DPDA, the final state (state zero) is materialized
when the origin is zero, otherwise it is a fictitious state.
For the skip table, a dummy row zero is generated when the
origin is zero. For the effect of this control on the
terminals list structures, see the language specific
discussions below.
Notes:
Options -alm, -gmap and one of -asm, -ada_sil, or -c may occur
together. Options -asm, -ada_sil and -c are mutually exclusive.
If -alm, -gmap, -asm, -ada_sil or -c is in effect but the -table
parameter is not, -table =_t is assumed.
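The interaction rules in these notes, together with the -origin
default, can be sketched as follows. This Python sketch is
illustrative only; the function name resolve_table_options and
the shape of its result are hypothetical and do not describe how
the lalr command is implemented.

```python
DPS6_LANGUAGES = ("asm", "ada_sil", "c")
TABLE_FORMS = ("alm", "gmap") + DPS6_LANGUAGES


def resolve_table_options(forms, table=None, origin=None):
    """Apply the defaulting and exclusion rules for table-form controls.

    forms  -- control names without the leading hyphen, e.g. {"gmap", "c"}
    table  -- value given with -table, or None if -table was not given
    origin -- value given with -origin, or None if it was not given
    """
    forms = {"c" if f == "C" else f for f in forms}   # -C is a synonym of -c
    unknown = forms - set(TABLE_FORMS)
    if unknown:
        raise ValueError(f"unknown table form(s): {sorted(unknown)}")

    chosen = [f for f in DPS6_LANGUAGES if f in forms]
    if len(chosen) > 1:
        raise ValueError("-asm, -ada_sil and -c are mutually exclusive")

    # Any table form implies -table =_t when -table itself is absent.
    if forms and table is None:
        table = "=_t"

    if origin is None:
        origin = 0 if "c" in forms else 1
    elif origin not in (0, 1):
        raise ValueError("-origin must be 0 or 1")

    return {
        "table": table,
        "dps6_language": chosen[0] if chosen else None,
        "origin": origin,   # only meaningful for DPS 6 format tables
    }


print(resolve_table_options({"gmap", "c"}))
```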
6.2. The 'make_dpda' Command
This command requires additional control arguments.
-------------
make_dpda, md
-------------
SYNTAX:
make_dpda result_file_path {table_path} {control_args}
FUNCTION: produces a table containing the DPDA extracted from
the result file of a previous LALR generation. This table is
the same as the one produced by the lalr command when it is
invoked with the -table control argument.
ARGUMENTS:
result_file_path
is the pathname of the result file from a previous LALR
generation from which the DPDA is to be extracted. The
grammar suffix is assumed if not supplied. This argument may
be an archive component pathname.
table_path
is the pathname of the table to be produced. If this argument
is given with the suffix incl.pl1, the suffix is ignored. Any
other suffix is retained as given. The default is "=_t".
CONTROL ARGUMENTS:
-ada_sil
produces the table as a DPS 6 Multics Host Resident System
object file named X.object or a DPS 6 native object file
named X.o and produces a DPS 6 Ada/SIL package specification
file named X.spec.ada.
-asm
produces the table as a DPS 6 Multics Host Resident System
object file named X.object or a DPS 6 native object file
named X.o and produces a DPS 6 Assembly Language include file
named X.incl.nml.
-c, -C
produces the table as a DPS 6 Multics Host Resident System
object file named X.object or a DPS 6 native object file
named X.o and produces a DPS 6 C Language header file named
X.h.
-dps6_format
causes the DPS 6 object file produced because of the -asm,
-ada_sil, or -C control argument to be generated in 'native'
format. A native format DPS 6 object file may be transmitted
to a DPS 6 running the Mod 400 Operating System via a
network_request l6_ftf command specifying data_type binary.
No intermediate format conversion step is required before
transmitting native format files. Native format is the
default format for DPS 6 object files. This control argument
is meaningless if none of the control arguments -asm,
-ada_sil, or -C are specified.
-gmap
produces the table as a gmap segment named X.gmap and a GCOS
III PL/I include file named X.incl.pl1.
-hrs_format
causes the DPS 6 object file produced because of the -asm,
-ada_sil, or -C control argument to be generated in the
Multics Host Resident System format. This is the format
required by the various HRS tools. This control argument is
meaningless if none of the control arguments -asm, -ada_sil,
or -C are specified.
-no_ada_sil
does not produce the table in the form described above for
the -ada_sil control argument.
-no_asm
does not produce the table in the form described above for
the -asm control argument.
-no_c, -no_C, -noc, -noC
does not produce the table in the form described above for
the -c control argument.
-no_gmap
does not produce the table in the form described above for
the -gmap control argument.
-origin N, -org N
specifies the lower bound, N, to be used with the arrays
generated for DPS 6 format tables. N must be 0 or 1. The
default is 0 if the -c control is present; otherwise it is 1.
For the DPDA, the final state (state zero) is materialized
when the origin is zero; otherwise it is a fictitious state.
For the skip table, a dummy row zero is generated when the
origin is zero. For the effect of this control on the
terminals list structures, see the language specific
discussions under the lalr command.
Notes:
As used above, X is the name given, or assumed, for the table.
Options -alm, -gmap and one of -asm, -ada_sil, or -c may occur
together. Options -asm, -ada_sil, and -c are mutually
exclusive. If none of the control arguments -alm, -gmap, -asm,
-ada_sil, or -c are present, the table is produced as a Multics
object segment named X and a Multics PL/I include file named
X.incl.pl1.
The -terminals_hash_list control argument is treated as if it
were the -terminals_list control argument when producing a DPS 6
(Level 6) object file. The -synonyms control argument is
meaningless when producing a DPS 6 object file with the -asm
control argument. The -production_names and -variables_list
control arguments are ignored when producing a DPS 6 object
file. The DPS 6 object file is produced in LAF mode.
6.3. The 'dps6_dpda' Command
This is an additional command.
------------------
dps6_dpda, l6_dpda
------------------
SYNTAX:
dps6_dpda result_file_path {object_file_path} {control_args}
FUNCTION: produces a DPS 6 Multics Host Resident System object
file or a DPS 6 native object file containing the DPDA
extracted from the result file of a previous LALR generation.
This object file is the same as the one produced by the lalr
command when it is invoked with the -table control argument
and either the -asm, -ada_sil, or -c control argument.
ARGUMENTS:
result_file_path
is the pathname of the result file from a previous LALR
generation from which the DPDA is to be extracted. If
result_file_path does not have a suffix of grammar, one is
assumed. However, the suffix grammar must be the last
component of the name of the result segment to be used. This
argument may be an archive component pathname.
object_file_path
is the pathname of the object file to be produced. If
object_file_path does not have a suffix of object, one is
assumed. The default is "=.object".
CONTROL ARGUMENTS:
-ada_sil
produces an Ada/SIL package specification describing the
external variables defined in the object file. This package
specification is stored in the same directory as the object
file. Its entryname is obtained by changing the object
suffix of the object file to spec.ada.
-asm
produces a DPS 6 Assembly Language include file describing the
external variables defined in the object file. This include
file is stored in the same directory as the object file. Its
entryname is obtained by changing the object suffix of the
object file to incl.nml.
-c, -C
produces a DPS 6 C Language header file describing the
external variables and functions defined in the object file.
This header file is stored in the same directory as the
object file. Its entryname is obtained by changing the
object suffix of the object file to h.
-dps6_format
causes the DPS 6 object file to be generated in 'native'
format. A native format DPS 6 object file may be transmitted
to a DPS 6 running the Mod 400 Operating System via a
network_request l6_ftf command specifying data_type binary.
No intermediate format conversion step is required before
transmitting native format files. Native format is the
default format for DPS 6 object files.
-hrs_format
causes the DPS 6 object file to be generated in the Multics
Host Resident System format. This is the format required by
the various HRS tools.
-no_ada_sil
does not produce the table in the form described above for
the -ada_sil control argument.
-no_asm
does not produce the table in the form described above for
the -asm control argument.
-no_c, -no_C, -noc, -noC
does not produce the table in the form described above for
the -c control argument.
-no_terminals_list, -ntl
does not include the terminals list (TL and TC) in the table.
(Default)
-origin N, -org N
specifies the lower bound, N, to be used with the various
arrays generated for parse tables. N must be 0 or 1. The
default is 0 if the -c control is present; otherwise it is 1.
For the DPDA, the final state (state zero) is materialized
when the origin is zero; otherwise it is a fictitious state.
For the skip table, a dummy row zero is generated when the
origin is zero. For the effect of this control on the
terminals list structures, see the language specific
discussions under the lalr command.
-synonyms, -syn
includes the terminal encoding as a field in the terminals
list instead of using the index to the terminals list as the
encoded value. This option is forced if the grammar
contains a -synonyms control. The -synonyms control
argument is meaningless unless the -terminals_list control
argument is also specified.
-terminals_list, -tl
includes the terminals list in the object file.
Notes:
The object file is produced in LAF mode.
The control arguments -asm, -ada_sil, -c are mutually exclusive.
If none are specified, -asm is assumed.
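The naming rules for the object file and its companion
declaration file can be sketched as follows. This Python sketch
is illustrative only; the names object_entryname and
companion_entryname are hypothetical, and directory handling and
the equals convention are left out.

```python
# Suffix that replaces "object" for each language control argument.
COMPANION_SUFFIX = {"ada_sil": "spec.ada", "asm": "incl.nml", "c": "h"}


def object_entryname(name):
    """Assume the 'object' suffix when it is not already present."""
    return name if name.endswith(".object") else name + ".object"


def companion_entryname(object_name, language="asm"):
    """Entryname of the declaration file stored beside the object file.

    language defaults to "asm", the control assumed when none is given.
    """
    obj = object_entryname(object_name)
    stem = obj[: -len("object")]          # keeps the trailing "."
    return stem + COMPANION_SUFFIX[language]


print(companion_entryname("calc"))
print(companion_entryname("calc.object", "c"))
```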
7. Parse Tables Produced
7.1. GMAP Source Segment Parse Tables
The gmap source segment produced by the -gmap control argument is
equivalent to the data described by the following PL/I
declarations. The generated include file X.incl.pl1 contains a
copy of these declarations (unless the -alm control argument is
also in effect). When a separate semantics format source segment
is used, the gmap source segment also contains a transfer vector
with the external name SEMVEC. This vector is used by the parser
to call the various semantic actions. The rule number, or
production number if the -production control is in effect, must
be passed as the n-th argument, where n is the value specified by
the -separate_semantics control argument, in the call to the
transfer vector. Any additional arguments desired may be passed.
The generated include file does not describe the transfer vector.
dcl 1 THL (0:xx) bit (12) unaligned external static;
dcl 1 TL (xx) external static,
2 lk fixed bin (17) unaligned,
2 pt fixed bin (17) unaligned,
2 ln fixed bin (17) unaligned,
2 cd fixed bin (17) unaligned;
dcl TC char (xx) external static;
dcl 1 DPDA (xx) external static,
2 v1 fixed bin (17) unaligned,
2 v2 fixed bin (17) unaligned;
dcl 1 SKIP (xx) external static,
2 v1 fixed bin (17) unaligned,
2 v2 fixed bin (17) unaligned;
dcl PN (xx) fixed bin (17) unaligned external static;
dcl 1 VL (xx) external static,
2 pt fixed bin (17) unaligned,
2 ln fixed bin (17) unaligned;
dcl VC char (xx) external static;
binary(THL(i), 12, 0) is the TL index of the first terminal
symbol whose hash value is i. The function lalr_hash_ (contained
in the include file lalr_hash_.incl.pl1), when invoked by
lalr_hash_ (T, dim (THL, 1)), returns the hash value of the
character string T. The THL structure is generated only when the
-terminals_hash_list control is in effect.
The format shown above is generated when both the
-terminals_hash_list and -terminals_list controls are in effect
and synonyms have been defined. TL(i).lk is the TL index of the
next terminal symbol having the same hash value as the i-th
terminal symbol. substr (TC, TL(i).pt, TL(i).ln) is the
normalized spelling of the i-th terminal symbol. And finally,
TL(i).cd is the encoded value of the i-th terminal symbol.
If the -terminals_hash_list and -terminals_list controls are both
in effect but no synonyms are defined, the following structure is
generated for the terminals list instead of the one shown above.
When this structure is used, the encoded value of the i-th
terminal symbol is i.
dcl 1 TL external static,
2 lk fixed bin (10) unaligned,
2 pt fixed bin (13) unaligned,
2 ln fixed bin (10) unaligned;
If the -terminals_hash_list control is not in effect but the
-terminals_list control is in effect and synonyms are defined,
the following structure is generated for the terminals list
instead of one of those shown above.
dcl 1 TL external static,
2 pt fixed bin (13) unaligned,
2 ln fixed bin (10) unaligned,
2 cd fixed bin (10) unaligned;
If the -terminals_hash_list control is not in effect but the
-terminals_list control is in effect and no synonyms are defined,
the following structure is generated for the terminals list
instead of any of those shown above.
dcl 1 TL external static,
2 pt fixed bin (17) unaligned,
2 ln fixed bin (17) unaligned;
If the -terminals_hash_list control is not in effect, the THL
structure is omitted. If neither the -terminals_hash_list nor
the -terminals_list control is in effect, THL, TL, and TC are all
omitted.
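A terminal lookup through THL, TL, and TC (the full form, with
hash list and synonyms) can be sketched as follows. This Python
sketch is illustrative only. It assumes that a TL index of zero
ends a hash chain, which the text above does not state, and it
takes the hash function as a parameter because lalr_hash_ is not
reproduced here; the toy hash in the example is not lalr_hash_.

```python
from collections import namedtuple

# Mirrors the four-field TL structure: link, TC offset, length, code.
TLEntry = namedtuple("TLEntry", "lk pt ln cd")


def spelling(TC, entry):
    """Equivalent of PL/I substr (TC, pt, ln), which is 1-based."""
    return TC[entry.pt - 1 : entry.pt - 1 + entry.ln]


def lookup_terminal(T, THL, TL, TC, hash_fn):
    """Return the encoded value of terminal T, or None if it is unknown."""
    i = THL[hash_fn(T, len(THL))]
    while i != 0:                 # assumption: 0 terminates a chain
        entry = TL[i]
        if spelling(TC, entry) == T:
            return entry.cd
        i = entry.lk
    return None


def toy_hash(T, n):
    """Stand-in hash for the example only."""
    return len(T) % n


TC = "beginendif"
# TL is 1-based in the PL/I declaration, so slot 0 is a placeholder.
TL = [None, TLEntry(2, 1, 5, 10), TLEntry(0, 6, 3, 11), TLEntry(0, 9, 2, 12)]
THL = [3, 1]

print(lookup_terminal("end", THL, TL, TC, toy_hash))
```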
DPDA is the Deterministic Push Down Automaton implementing the
parsing algorithm, and SKIP holds its associated error recovery
tables. The DPDA and SKIP structures are always generated.
PN is the production names list. PN(i) is the negation of the VL
index for the variable (non-terminal) naming the i-th production
(or the i-th rule if the -rule_only control is in effect). If
the -production_names control is not in effect, the PN structure
is not generated.
VL is the variables list. substr (VC, VL(i).pt, VL(i).ln) is the
normalized spelling of the i-th variable. If neither the
-production_names control nor the -variables_list control is in
effect, PN, VL, and VC are all omitted.
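The relationship between PN, VL, and VC can be sketched in C as
follows. The struct vl_entry and the function production_name are
hypothetical names introduced for illustration, and the arrays are
assumed to be indexed from 1 as in the PL/I description.

```c
#include <string.h>

/* Hypothetical C mirror of one VL entry: spelling is substr (VC, pt, ln). */
struct vl_entry {
    int pt;   /* 1-based offset of the spelling in VC */
    int ln;   /* length of the spelling */
};

/* Copy the name of the variable naming production (or rule) i into buf.
   pn[i] holds the negated VL index, so -pn[i] selects the VL entry.
   Returns the number of characters copied (truncated to fit buf). */
static size_t production_name(const int *pn, const struct vl_entry *vl,
                              const char *vc, int i,
                              char *buf, size_t bufsize)
{
    const struct vl_entry *v = &vl[-pn[i]];
    size_t n = (size_t)v->ln;

    if (bufsize == 0)
        return 0;
    if (n >= bufsize)
        n = bufsize - 1;
    memcpy(buf, vc + v->pt - 1, n);
    buf[n] = '\0';
    return n;
}
```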
7.2. DPS 6 Parse Tables Object Files
The -terminals_hash_list control argument is treated as if it
were the -terminals_list control argument when producing a DPS 6
object file. The -production_names and -variables_list control
arguments are ignored when producing a DPS 6 object file. The
DPS 6 object file is produced in LAF mode.
In the following discussion of DPS 6 Parse Table formats, the
symbols N, R, S, T, U and V are used as extent expressions. In
the generated data and its declarations, they are replaced by the
appropriate constants. N is the lower bound specified by the
-origin control or implied by the language intended for use with
the parse tables. U and V are the upper bounds of the DPDA and
skip recovery tables, respectively. R, S, and T are used as
upper bounds of the various terminals list tables.
If N is zero, the final state (state 0) is materialized in the
DPDA; otherwise it is a fictitious state. A dummy row zero is
generated in the skip table when N is zero. See the language
specific discussions below for the effect of N on the terminals
list tables.
7.2.1. DPS 6 Files for Assembly Language Use
The DPS 6 object file produced by the -asm control argument is
equivalent to the data described by the PL/I declarations below.
When a separate semantics format source segment is used, the
object file also contains a transfer vector with the external
name SEMVEC. The rule number, or production number if the
-production control is in effect, must be passed to the transfer
vector by value in register R1. The transfer vector's code
destroys registers R1 and B4; all other registers are unchanged.
dcl OP1C_n fixed binary (15) internal static
options (constant) initial (R);
dcl OP2C_n fixed binary (15) internal static
options (constant) initial (S);
dcl RSWD_n fixed binary (15) internal static
options (constant) initial (T);
dcl LIT_c fixed binary (15) internal static
options (constant) initial (xx);
dcl INT_c fixed binary (15) internal static
options (constant) initial (xx);
dcl LINT_c fixed binary (15) internal static
options (constant) initial (xx);
dcl NUMB_c fixed binary (15) internal static
options (constant) initial (xx);
dcl REAL_c fixed binary (15) internal static
options (constant) initial (xx);
dcl SYMB_c fixed binary (15) internal static
options (constant) initial (xx);
dcl EOL_c fixed binary (15) internal static
options (constant) initial (xx);
dcl HEXI_c fixed binary (15) internal static
options (constant) initial (xx);
dcl BIT_c fixed binary (15) internal static
options (constant) initial (xx);
dcl NIL_c fixed binary (15) internal static
options (constant) initial (xx);
dcl OP1C_s (N:R) char (1) external static
initial ("x", "x", ... );
dcl OP2C_s (N:S) char (2) external static
initial ("xx", "xx", ... );
dcl 1 RSWD (N:T) aligned external static,
2 RSWD_s char (xx) initial ("xx", "xx", ... ),
2 RSWD_c fixed bin (15) initial (xx, xx, ... );
dcl DPDA_n fixed binary (15) internal static
options (constant) initial (U);
dcl SKIP_n fixed binary (15) internal static
options (constant) initial (V);
dcl 1 DPDA (N:U) external static,
2 v1 fixed binary (15) initial (xx, xx, ... ),
2 v2 fixed binary (15) initial (xx, xx, ... );
dcl 1 SKIP (N:V) external static,
2 v1 fixed binary (15) initial (xx, xx, ... ),
2 v2 fixed binary (15) initial (xx, xx, ... );
The data with internal static options (constant) attributes are
generated as "external value definitions" in the DPS 6 object
file. The data with external static attributes are generated as
"code section" constants with "external location definitions".
OP1C_n and OP1C_s are the index of the last one character
operator (e.g. +) and the one character operators themselves,
respectively. OP2C_n and OP2C_s are the index of the last two
character operator (e.g. >=) and the two character operators
themselves, respectively.
LIT_c is the code for the nonnumeric literal complicated
terminal. This terminal may be specified as <character string>,
<string>, <quoted string>, or <nonnumeric literal>.
INT_c is the code for the integer literal complicated terminal.
This terminal may be specified as <integer>.
LINT_c is the code for the long integer complicated terminal.
This terminal may be specified as <long integer>.
NUMB_c is the code for the fixed-point literal complicated
terminal. This terminal may be specified as <number> or
<fixed-point literal>.
REAL_c is the code for the floating-point literal complicated
terminal. This terminal may be specified as <real> or
<floating-point literal>.
SYMB_c is the code for the identifier complicated terminal. This
terminal may be specified as <identifier> or <symbol>.
EOL_c is the code for the end of line complicated terminal. This
terminal may be specified as <eol>, <end of line>, <nl>, or
<newline>.
HEXI_c is the code for the hexadecimal integer literal
complicated terminal. This terminal may be specified as
<hexadecimal integer> or <hex integer>.
BIT_c is the code for the bit string literal complicated
terminal. This terminal may be specified as <bit string> or
<boolean aggregate>.
NIL_c is the code for the nil symbol terminal. This terminal may
be specified as <nil> or <syntax error>.
For any of the above mentioned complicated terminals not used in
the grammar, a code of zero is used.
If a complicated terminal not listed above is encountered, an
external value definition is generated for it. The symbol so
defined is obtained by removing the enclosing angle brackets from
the complicated terminal. If the resultant symbol is fewer than
five characters in length, it is further modified by appending
"_c".
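The naming rule for such terminals can be sketched in C as follows.
The function name terminal_symbol is hypothetical, and the sketch
covers only the two steps stated above (bracket removal and the
"_c" suffix); any further normalization of the spelling is not
attempted because none is described here.

```c
#include <string.h>

/* Derive the external symbol for a complicated terminal that is not in
   the predefined list: strip the angle brackets, then append "_c" when
   the result is shorter than five characters.  Returns 0 on success,
   or -1 if the input is not of the form <...> or does not fit in out. */
static int terminal_symbol(const char *terminal, char *out, size_t outsize)
{
    size_t len = strlen(terminal);

    if (len < 3 || terminal[0] != '<' || terminal[len - 1] != '>')
        return -1;

    size_t n = len - 2;                      /* length without brackets */
    size_t need = n + (n < 5 ? 2 : 0) + 1;   /* suffix and trailing NUL */
    if (need > outsize)
        return -1;

    memcpy(out, terminal + 1, n);
    out[n] = '\0';
    if (n < 5)
        strcat(out, "_c");
    return 0;
}
```

Under this rule a terminal written as <EE> would be named EE_c,
while a five character name such as <label> would be left as label.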
RSWD_n, RSWD_k, and RSWD are the index of the last reserved word,
the length of each reserved word, and the reserved words
themselves, respectively. All terminal symbols which are not
complicated terminals and are not one or two character operators
as defined above are considered reserved words. In RSWD (i),
RSWD_s is the i-th reserved word padded with spaces and RSWD_c is
the encoding for that reserved word. DPDA_n and DPDA are the
index of the last DPDA entry and the DPDA itself, respectively.
SKIP_n and SKIP are the index of the last SKIP table entry and
the skip tables themselves, respectively.
If the -terminals_list control is not in effect, only the
declarations of DPDA_n, SKIP_n, DPDA and SKIP are generated.
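A scanner might consult the RSWD table along the following lines.
This C sketch is illustrative only: the struct rswd_row, the width
RSWD_K, and the function rswd_code are hypothetical, the search is
linear because no ordering of the reserved words is stated here, and
returning zero for "not a reserved word" is an assumption of the
sketch.

```c
#include <string.h>

#define RSWD_K 8   /* hypothetical width of each padded reserved word */

/* Hypothetical C mirror of one RSWD row. */
struct rswd_row {
    char s[RSWD_K];   /* reserved word, padded on the right with spaces */
    int  c;           /* encoding for that reserved word */
};

/* Return the code of `word` if it appears in rows n..last, else 0. */
static int rswd_code(const struct rswd_row *rswd, int n, int last,
                     const char *word)
{
    size_t len = strlen(word);
    char padded[RSWD_K];

    if (len == 0 || len > RSWD_K)
        return 0;
    memset(padded, ' ', RSWD_K);     /* pad the candidate like the table */
    memcpy(padded, word, len);
    for (int i = n; i <= last; i++)
        if (memcmp(rswd[i].s, padded, RSWD_K) == 0)
            return rswd[i].c;
    return 0;
}
```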
The text of a generated assembly language include file is shown
below.
*
* SCANNER AND PARSER TABLES FROM SEGMENT
* >user_dir_dir>SLANG>LANGUAGE>adasil_rel_0.grammar
* Generated by: Lo.SLANG.a using LALR 7.3
* of Tuesday, December 6, 1983
* Generated at: TCO 68/80 Multics Billerica, Ma.
* Generated on: 12/14/83 1453.2 est Wed
* Generated from: >udd>slang>LANGUAGE>adasil_rel_0.lrk
* >udd>slang>Lo>ada_decl_part.incl.lrk
* >udd>slang>include>ada_statements.incl.lrk
*
xval OP1C_n Index of last one character operator
xval OP2C_n Index of last two character operator
xval RSWD_n Index of last reserved word
xval RSWD_k Length of longest reserved word
*
xval LIT_c Code for nonnumeric literal
xval INT_c Code for integer literal
xval LINT_c No complicated terminal for long integer literal
xval NUMB_c No complicated terminal for fixed-point literal
xval REAL_c No complicated terminal for floating-point literal
xval SYMB_c Code for identifier
xval EOL_c No complicated terminal for end-of-line terminal
xval HEXI_c No complicated terminal for hexadecimal literal
xval BIT_c Code for bit string literal
xval NIL_c Code for nil (syntax error)
xval EE_c No complicated terminal for example element
*
xval DPDA_n Index of last DPDA row
xval SKIP_n Index of last SKIP row
*
xloc OP1C_s The one character operators (2 per word)
xloc OP1C_c The corresponding codes (1 per word)
xloc OP2C_s The two character operators (1 per word)
xloc OP2C_c The corresponding codes (1 per word)
xloc RSWD The reserved word table
*
xloc DPDA The DPDA table
xloc SKIP The SKIP table
*
7.2.2. DPS 6 Files for Ada/SIL Use
The DPS 6 file produced by the -ada_sil control argument
is equivalent to the data described by the PL/I declarations
below. When a separate semantics format source segment is used,
the object file also contains a transfer vector with the external
name SEMVEC. The rule number, or production number if the
-production control is in effect, must be passed to the transfer
vector by value in register R1. The transfer vector's code
destroys registers R1 and B4; all other registers are unchanged.
dcl 1 Terminal aligned based,
2 position fixed binary (15),
2 length fixed binary (15),
2 code fixed binary (15);
dcl 1 T_List (N:R) aligned like Terminal external static;
dcl T_Char char (S) external static init ("xxx ... ");
dcl DPDAv1 (N:U) fixed binary (15) external static
initial (xx, xx, ... );
dcl DPDAv2 (N:U) fixed binary (15) external static
initial (xx, xx, ... );
dcl SKIPv1 (N:V) fixed binary (15) external static
initial (xx, xx, ... );
dcl SKIPv2 (N:V) fixed binary (15) external static
initial (xx, xx, ... );
All of the above external static variables are generated as "code
section" constants to allow them to be shared constants. Because
of this, this object file must be linked (with a LINKN linker
directive) before the object file for any Ada/SIL compilation
unit using the generated package specification.
As used in the above declarations, R is the index of the last
terminal (including complicated terminals) and S is the length of
the T_Char variable. The based variable Terminal describes a
single entry in the terminal list array T_List. The i-th
terminal is substr (T_Char, T_List.position (i), T_List.length
(i)). If the grammar uses synonyms, T_List.code (i) gives the
code for the i-th terminal. Otherwise, the code component is
omitted from the Terminal structure and the code for the i-th
terminal is i. In this case, if N is zero, a dummy row zero with
T_List.position (0) = 1 and T_List.length (0) = 0 is generated.
U and V specify the index of the last entry in the DPDA and SKIP
tables, respectively. DPDAv1 and DPDAv2 are the two columns of
the DPDA. Similarly, SKIPv1 and SKIPv2 are the two columns of
the SKIP tables.
If the -terminals_list control is not in effect, Terminal,
T_List, and T_Char are not generated.
The text of a generated Ada/SIL package specification is shown
below.
-- SCANNER AND PARSER TABLES FROM SEGMENT
-- >user_dir_dir>SLANG>LANGUAGE>adasil_rel_0.grammar
-- Generated by: Lo.SLANG.a using LALR 7.3
-- of Tuesday, December 6, 1983
-- Generated at: TCO 68/80 Multics Billerica, Ma.
-- Generated on: 12/14/83 1453.2 est Wed
-- Generated from: >udd>slang>LANGUAGE>adasil_rel_0.lrk
-- >udd>slang>Lo>ada_decl_part.incl.lrk
-- >udd>slang>include>ada_statements.incl.lrk
package adasil_rel_0_t is
subtype TL_index is Integer range 1..125;
subtype TC_index is Integer range 1..838;
subtype DPDA_index is Integer range 1..3591;
subtype SKIP_index is Integer range 1..77;
type Terminal is record
position: TC_index; -- index into T_Char.
length: Positive; -- length of terminal.
code: Integer; -- code for terminal.
end record;
T_list: array (TL_index) of Terminal;
T_Char: string (TC_index);
DPDAv1: array (DPDA_index) of Integer;
DPDAv2: array (DPDA_index) of Integer;
SKIPv1: array (SKIP_index) of Integer;
SKIPv2: array (SKIP_index) of Integer;
end adasil_rel_0_t;
7.2.3. DPS 6 Files for C Use
The DPS 6 object file produced by the -c control argument is
equivalent to the data described by the PL/I declarations below.
When a separate semantics format source segment is used, the
object file also contains a transfer vector with the external
name SEMVEC. The rule number, or production number if the
-production control is in effect, must be passed as the first
argument in the call to the transfer vector. The transfer vector
assumes B4 is the argument list pointer. It destroys B1 and R7;
all other registers are unchanged.
dcl gtoptb entry returns (pointer);
dcl gtrwtb entry returns (pointer);
dcl opmc (N:R) unaligned unsigned fixed bin (8);
dcl rswd_t (N:S) fixed bin (15);
dcl rswd_s (N:T) unaligned unsigned fixed bin (8);
dcl gtdpda entry returns (pointer);
dcl gtskip entry returns (pointer);
dcl dpda (N:U, N:N+1) fixed bin (15);
dcl skip (N:V, N:N+1) fixed bin (15);
All of the above external static variables are generated as "code
section" constants to allow them to be shared constants. The
external functions gtoptb, gtrwtb, gtdpda and gtskip return
pointers to opmc, rswd_t, dpda and skip respectively.
opmc defines all of the terminal symbols that do not consist
entirely of letters, digits, dollar signs and underscores. These
terminal symbols are ordered by decreasing length. Each is
stored as a byte containing the symbol's encoded value followed
by a NUL terminated string giving the symbol's spelling. This
list of terminal symbols is terminated by a byte containing the
value -1.
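Because the entries are ordered by decreasing length, the first
entry that matches the input is also the longest match. The
following C sketch walks an opmc style table in that way. The
function name opmc_match is hypothetical, and the terminating byte is
assumed to appear as 0xFF when the table is read as unsigned bytes.

```c
#include <string.h>

/* Match the longest operator in an opmc-style table at the start of
   `input`.  Each entry is a code byte followed by a NUL terminated
   spelling; a byte of 0xFF (-1 as a signed byte) ends the table.
   Returns the code and stores the matched length, or returns -1. */
static int opmc_match(const unsigned char *opmc, const char *input,
                      size_t *matched)
{
    const unsigned char *p = opmc;

    while (*p != 0xFF) {
        int code = *p++;
        const char *spelling = (const char *)p;
        size_t len = strlen(spelling);

        if (strncmp(input, spelling, len) == 0) {
            *matched = len;
            return code;   /* entries are ordered longest first */
        }
        p += len + 1;      /* skip the spelling and its NUL */
    }
    return -1;
}
```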
rswd_t and rswd_s define the remaining terminal symbols
(excluding the complicated terminal symbols). rswd_t (N)
contains hbound (rswd_t, 1), i.e. the index of the last symbol
defined. rswd_t (i), for N < i <= rswd_t (N), gives the index
into rswd_s to the definition of a terminal symbol. The entries
in rswd_t are ordered so as to permit a binary search. Each
symbol defined in rswd_s is stored as a byte containing the
symbol's encoded value followed by a NUL terminated string giving
the symbol's spelling.
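The binary search that this ordering permits can be sketched in C as
follows. Several details are assumptions of the sketch rather than
statements of this MTB: N is taken to be zero, the values in rswd_t
are treated as zero-based offsets into rswd_s, the ordering is taken
to be ascending strcmp order of the spellings, and the function name
rswd_find is hypothetical.

```c
#include <string.h>

/* Binary search for `word` in rswd_t/rswd_s-style tables with N = 0.
   rswd_t[0] holds the index of the last entry; rswd_t[1..last] are
   offsets into rswd_s, where each definition is a code byte followed
   by a NUL terminated spelling.  Returns the code, or -1 if absent. */
static int rswd_find(const short *rswd_t, const unsigned char *rswd_s,
                     const char *word)
{
    int lo = 1, hi = rswd_t[0];

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        const unsigned char *def = rswd_s + rswd_t[mid];
        int cmp = strcmp(word, (const char *)(def + 1));

        if (cmp == 0)
            return def[0];        /* code byte precedes the spelling */
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;
}
```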
If a terminal symbol which would normally be defined by rswd_t
and rswd_s is found to be the same as an initial substring of a
terminal defined by opmc, it is placed in opmc instead of the
rswd arrays.
If the -terminals_list control is not in effect, only the
declarations of gtdpda, gtskip, dpda, and skip are generated. If
the -terminals_list control is in effect, a series of #define
preprocessor statements is also generated to name the encoded
value of the various complicated terminals. The names are chosen
as described above for the DPS 6 Assembly Language format parse
tables.
The text of a generated C header file is shown below.
/*
SCANNER AND PARSER TABLES FROM SEGMENT
>user_dir_dir>SLANG>LANGUAGE>adasil_rel_0.grammar
Generated by: Lo.SLANG.a using LALR 7.3
of Tuesday, December 6, 1983
Generated at: TCO 68/80 Multics Billerica, Ma.
Generated on: 12/14/83 1453.2 est Wed
Generated from: >udd>slang>LANGUAGE>adasil_rel_0.lrk
>udd>slang>Lo>ada_decl_part.incl.lrk
>udd>slang>include>ada_statements.incl.lrk
*/
char (*gtoptb ()) [];
int (*gtrwtb ()) [];
#define OPmC_n 41 /* Index of last m character operator */
#define RSWD_n 57 /* Index of last reserved word */
#define LIT_c 107 /* Code for nonnumeric literal */
#define INT_c 108 /* Code for integer literal */
#define LINT_c 0 /* No complicated terminal for
long integer literal */
#define NUMB_c 0 /* No complicated terminal for
fixed-point literal */
#define REAL_c 0 /* No complicated terminal for
floating-point literal */
#define SYMB_c 88 /* Code for identifier */
#define EOL_c 0 /* No complicated terminal for
end-of-line terminal */
#define HEXI_c 0 /* No complicated terminal for
hexadecimal literal */
#define BIT_c 105 /* Code for bit string literal */
#define NIL_c 103 /* Code for nil (syntax error) */
#define EE_c 0 /* No complicated terminal for
example element */
#define DPDA_n 3591 /* Index of last DPDA row */
#define SKIP_n 77 /* Index of last SKIP row */
int (*gtdpda ()) [];
int (*gtskip ()) [];