MULTICS TECHNICAL BULLETIN MTB-621
To: MTB Distribution
From: Roger D. Lackey,
L. A. Dupuis,
Ronald B. Harvey,
Ronald A. Barstad,
Noah S. Davids,
Michael S. Kubicar,
David J. Schimke,
Craig L. Senft,
Paul W. Benjamin
Date: 05/11/83
Subject: An Architecture for End User Data Access
Forum Meeting: >udd>Demo>dbm_test>con>End_User_Data_Access (euda)
ABSTRACT
This paper proposes a long range plan for the implementation of a
facility that will provide an end-user interface to various types
of files, including but not restricted to, MRDS databases, Lister
databases, ASCII text files, and structured files. Major aspects
are:
o A new End User Facility (EUF) to replace LINUS.
o Retrievals across databases and across database
types.
o A selection expression parser that uses the SQL
selection language, and produces Canonical Query
Language (CQL).
o A single uniform subroutine interface to MRDS and
Lister databases, replacing dsl_ and lister_.
_________________________________________________________________
Multics Project internal working documentation. Not to be
reproduced or distributed outside the Multics Project.
MTB-621 End User Data Access
LINUS
Starting from the top, let's look at LINUS. What does it do?
And what are the alternatives?
It processes requests, or at least used to do so. The subsystem
utilities now do that for the most part.
It offers a macro facility via the invoke request. The exec_com
feature supplied by ssu_ does a much better job of that.
It offers LILA. What does LILA do? It performs very crude
editing functions and parses selection expressions. There is a
proposal for a common parser that would support the MRDS, LINUS
and SQL selection languages. Editors such as ted or qedx do a
much better job of editing than LILA could hope to accomplish.
LINUS produces reports, sort of. There is an interface to MRPG,
but MRPG is generally recognized not to be a good interface for
end users. There is a method of creating Lister files, but the
user is left with escaping from LINUS and using Lister commands
directly to produce reports. There is the LINUS print request,
which merely displays the data in an inflexible format. And then
there is the new LINUS report writer, which is much better than
any of these, and fortunately modularized so that it could be
extricated from LINUS and stand on its own.
And finally, LINUS accesses MRDS databases. In some cases, it
communicates properly with the appropriate MRDS subroutine
entrypoints. In others, it does the inexcusable and looks at
MRDS files and structures directly.
The New EUF
We propose a new LINUS, similar in task, simpler in design, and
broader in scope. The standard subsystem tasks would be handled
by ssu_, the selection expression editing by the user's editor of
choice, the selection expression parsing by a common selection
expression parser, report generation by the (formerly LINUS)
report writer and file access by something we will call the View
Manager. Precisely what that EUF would look like will be the
topic of another paper. There are various approaches that could
be taken. One is to have it essentially similar to LINUS.
Another is to mimic IBM's ISQL interface (or MEDC's IQ which is
similar). That point need not be worked out at this stage of the
discussion.
End User Data Access MTB-621
View Manager
At the center of this proposal is something called the View
Manager (VM). VM will act as a traffic cop of sorts for database
accesses. It will have entrypoints such as open, close, get,
put, get_description, get_population, etc. These entrypoints are
named for sake of argument, not as an actual proposal, the actual
implementation should include the full repertoire of database
functions.
VM's task includes determining the type (MRDS, Lister, etc.) of
the database in question and routing the request to the
appropriate database manager.
To simplify VM's task, those routines that want to be supported
by VM will provide a uniform set of entrypoints that is a subset
of the VM entrypoints. This subset will consist of primitives
for database manipulation. For example, VM will have an "open"
entrypoint, as will MRDS, Lister, et al. More exotic
entrypoints, however, such as one that would parse a selection
expression and retrieve a tuple in one call, would be provided
only in VM. There are 2 reasons for this. It isolates as much
functionality as possible in VM, making it unnecessary to add new
features in more than one place. It also makes VM more palatable
as the "approved" interface. The dsl_ interface to MRDS would
continue to be supported for some period of releases, but no work
beyond bugfixes would be applied to them. The direct subroutine
interfaces to MRDS, Lister, etc. would be "internal interfaces".
The VM interface would be the approved, recommended, documented
interface. This is not to say, however, that the Lister
commands, for example, would not continute to be supported.
So, in the simple case, the user asks the EUF to open a MRDS
database, EUF calls, for example, VM$open who sees that the
object in question is MRDSish and in turn calls mrds_$open.
Whenever the task involves only one database and involves nothing
more than the primitive operation, VM acts as a transfer vector
of sorts. Note however, that EUF cares not what type of object
is being manipulated. It can be MRDS, Lister, whatever. EUF
handles it in the same fashion. So, with this simple approach,
the EUF is made much more powerful.
Now, consider the situation where the user has opened more than
one database and requests a retrieval whose selection criterion
involves more than one database. The parser translates the
expression to Canonical Query Language (discussed in more detail
later in this paper) which is returned to the EUF. EUF then
calls VM$get (or whatever) with this CQL. VM determines that the
selection involves more than one database and breaks the CQL down
into appropriate parts. Retrievals are made from both databases,
comparisons are made by VM, and the requested data is returned to
the user.
MTB-621 End User Data Access
Canonical Query Language
A selection expression parser reads user-supplied selection
expressions, e.g. "select foo from bar" and translates it into a
form suitable for processing by the database manager's selection
mechanism. Currently on Multics we have such parsers for MRDS,
LINUS and Lister (plus tr_query but this proposal does not speak
to that). Each accepts a different selection language as input
and creates a different internal representation as output. This
paper proposes that the various parsers produce the same internal
representation as output. This internal representation is known
as Canonical Query Language or CQL.
There are several relational or quasi-relational database
packages under development for various Honeywell operating
systems. Work is underway on DPS6 in Billerica, DPS7 in France,
GCOS 3/8 at the Multi-Environment Development Center (MEDC) in
Phoenix, CP-6 in Los Angeles and at CCSC in Minneapolis. There
has been a fair amount of contact between these developers in
hopes of establishing a Canonical Query Language. The idea was
that any user-visible selection language should/would be reduced
to an internal representation that would be common to all. The
MRDS group in Phoenix was not directly involved in these
discussions. The effort to establish a CQL that spanned product
lines was singularly unsuccessful. One group was not far enough
along in their work to agree to the proposal of another while a
third could not wait for resolution, having to get a product out
the door. While there exists no agreement, there still appears
to be a sincere desire to conform to such a standard should it be
established. The MEDC group in Phoenix, in developing RAM for
GCOS, was unable to wait and established their own protocol. RAM
has been announced and was demonstrated at the Toronto HLSUA
(4/83).
The original intent of this proposal was to attempt to conform to
the Honeywell standard. Since their is no such standard, the
proposal is now to look at the RAM CQL and attempt to use that.
Initial contact with MEDC has indicated a willingness to make
some modifications to their CQL if it was necessary to do so for
MRDS. This CQL could well provide a basis for the hoped-for
company-wide standard.
Company-wide standardization is a lofty ideal and one that should
not get lost in this process. However, if we were unable to come
to an agreement even with the GCOSians, a Multics-only
implementation would still be a win. It would make it easier to
implement new end-user interfaces such as the Query By Example
package that is wanted so badly by Marketing. The Multics
implementation should be structured in a fashion that would allow
modifications if and when a Honeywell standard were chosen.
End User Data Access MTB-621
Parsers
Regardless of what CQL is chosen (even if Multics rolls its own),
there would be a new parser. The language that this parser would
process would be essentially the ISQL language, but the parser
would be equipped to handle both the existing LINUS and MRDS
languages. Our documentation would be upgraded to describe only
the ISQL dialect, as that is rapidly becoming the industry
standard.
This proposal is not new. Jim Gray wrote a paper in 1981
describing it and at least one BNF grammar exists. It would be
implemented using LALR.
The parser would serve 3 purposes:
1) It would be used by the new EUF to process selection
expressions. Since the proposal calls for MRDS and
Lister to be supported under EUF/VM, both MRDS and
Lister would have to be taught to accept CQL. This
would mean that the user would use the ISQL language
for Lister selections within the EUF. Since the
existing Lister command interface would have to be
retained, at least for a period of time, it might
also be necessary to alter lister_compile_select_ to
produce CQL. If there was strong sentiment for the
Lister commands to continue indefinitely, it may be
useful to have them optionally accept either the
existing selection syntax or the new syntax.
2) It would be used by VM itself to process selection
expressions. Currently, the user can call
dsl_$retrieve to get a tuple, passing in an unparsed
selection expression. Although the EUF would not
use the capability, such functionality must be
provided to the user. Therefore, for each function
in VM that required selection criterion (retrieve,
modify and delete operations, whatever they are
called) there would be 2 entrypoints, one that took
CQL as input and one that took an unparsed selection
expression. In the latter case, VM would have to
call the parser.
3) The parser would be available as a system subroutine
for anyone wishing to write an application that
accepted the standard syntax as input and desired
CQL.
MTB-621 End User Data Access
The Database Managers
MRDS and Lister have been used as examples of facilities that
would supply the appropriate entrypoints for access by VM. The
conversion of these modules to provide the entrypoints is a
mandatory part of this proposal. MRDS has a myriad of subroutine
entries, but unless the dsl_ entries are used as a model for the
definition of the VM entries (not likely), new entrypoints will
have to be provided. Lister, for all practical purposes, has no
subroutine interface. You can't do a whole lot with lister_. It
exists only for the purposes of the LINUS create_list request.
Two more examples of system routines for file handling are for
ASCII files and structured files. A manager (with appropriate
entrypoints) could easily be written to allow support of ASCII
files under VM, perhaps having a file defined like card images
(the first 2 positions are field A, the next 7 field B, etc.), or
perhaps using delimiters a la the LINUS write and store requests.
A structured file would be one not limited to ASCII data which
could be described by a PL/1 structure declaration.
The potential for support of other objects is intriguing. An
interface to RDMS or JANUS would be feasible. Anybody who can
provide the appropriate entrypoints can conceivably have their
package and database type supported under VM. Consider a
subsystem that communicated with the Intel database machine. So
long as the interface is correct, it works.
Questions Needing Answers
This paper does not purport to have all of the details worked out
as to how such an architecture would be implemented. We are
aware of several things that need to be worked out. They are
presented here to stimulate discussion and will hopefully be
resolved in future MTBs:
1) When the users have more than one database open and are doing
cross-database activities, what they are conceptually
accessing is a "virtual database" or "view". How should this
view be defined? How should it be established? There are
potential naming conflicts within a view (relations/files
and/or attributes/fields with the same name). How should
these conflicts be resolved?
One approach (best we've come up with this far) is to have a
view identified by some sort of descriptor. This descriptor
could be a separate segment ("open foo.view"). It could be an
entry in a data dictionary (the authors shudder to use that
term since it can mean so many things). Or perhaps a data
dictionary could be just an archive of .view segments. Taking
the single segment approach for sake of argument, it could
created by a create_view command that took as input ASCII
End User Data Access MTB-621
source and "compiled" the view. Any name conflicts would be
flagged as errors. There could also be a command to recreate
the ASCII source from the compiled version. Within the EUF,
perhaps the user could open several separate databases and
then issue the "establish_view" request. That request would
create a view, prompting the user for resolution of naming
conflicts, and then open the view. Another useful request
would be "save_view" request where this dynamically created
view could be preserved in a .view segment (or data dictionary
or whatever we decide on).
2) Let's start thinking of any opening (single database, for
example) as a view. The user can have more than one view open
at a time. How will the view to be acted upon be identified
by the user in a request. Database indices (view indices?)
are not to thrilling from a Schneidermanian viewpoint. How
about reference names of sorts, i.e. "open foo" opens foo and
then the user refers to it as foo in subsequent requests,
whereas "open foo bar" opens foo which is refered to as bar.
Is it cumbersome to have to specify the view name in all
requests? Probably. How about a use request? That speaks to
the EUF, how about the subroutine interface to VM? Should we
use view indices, a la MRDS?
3) As implied earlier, we have no idea what the EUF will look
like. Should it look much like LINUS (and Multics, for that
matter) and have ctl_args, standard request names, etc., for
compatibility with other Multics subsystems or should it
follow a different drummer? A case could certainly be made by
Marketing for it to look like ISQL. When the LINUS Report
Writer design review was held, the consensus was that it
should use ctl_args where the original design said no. Does
that mean that the same logic calls for this new EUF to look
like the rest of Multics? Probably not. A good case for
ctl_args in LRW was that it was part of a subsystem that
already used clt_args. If an ISQL-like or other human factors
oriented interface was chosen, it would only need to be
internally consistent with regard conventions. So, what it
probably boils down to is who the audience for this EUF will
be. Will it be primarily for systems programmers? Engineers?
Clerical people? Seasoned Multicians? And so on. Should it
be a video and/or menu interface? If is not an interface that
requires intelligent video terminals, should it have such
things built into it as options or alternate approaches or
should a video/menu interface be a separate EUF?