Multics Technical Bulletin MTB-554, Revision 1
DM: file_manager_ design
To: Distribution
From: Jeffrey D. Ives
Date: 02/15/84
Subject: Data Management: File Manager Design.
1 ABSTRACT
Data-management files are designed to protect data better than
segments or vfile_ can. The file manager stores data in the
pages of multi-segment files. Before accessing a page, it calls
the lock manager to lock the page against concurrent access.
Before modifying a page, it calls the before-journal manager to
journalize a before-image. When a transaction commits, the file
manager flushes all pages remaining in main-memory to
mass-storage. It also permits data-management files to be
treated as Multics file-system extended-objects.
Comments invited
via Multics Mail:
Pierret.Multics on either MIT Multics or System M.
via US Mail:
Matthew C. Pierret
Honeywell Information Systems, Inc.
4 Cambridge Center - 9th Floor
Cambridge, Massachusetts 02142
via telephone:
(HVN) 261-9338, or
(617) 492-9338
________________________________________
Multics project internal working documentation. Not to be
reproduced or distributed outside the Multics project without the
consent of the author or the author's management.
CONTENTS
1 ABSTRACT
2 INTRODUCTION
2.1 pathnames, UIDs, and OIDs
2.2 control-intervals
2.3 protection and transactions
2.4 locks
2.5 before-journals
2.6 extended-objects
2.7 related documents
3 DATA STORAGE
3.1 multi-segment files (MSFs)
3.2 proposed large-files
3.3 data-management-ring
3.4 control-interval management
3.5 control-interval structure
3.6 file attributes management
3.7 file attributes structure
4 FILE IDENTIFICATION
4.1 pathnames
4.2 component zero FSUID
4.3 component zero segment number
4.4 opening identifiers (OIDs)
4.5 unique identifiers (UIDs)
4.6 UID/pathname table management
4.7 UID/pathname table structure
4.8 old UID/pathname table
5 FILE ACCESS MECHANISM
5.1 keeping OIDs valid for bj
5.2 file access-data management
5.3 file access-data structure
5.4 access-data table management
5.5 access-data table structure
6 BASIC PROTECTION
6.1 modified CI table management
6.2 modified CI table structure
6.3 protection incompleteness
7 CONCURRENCY CONTROL
7.1 lock hierarchy
7.2 lock advice
7.3 fast locks
8 PROTECTION AGAINST FAILURES
8.1 before-images
8.2 rollback-handlers
8.3 postcommit-handlers
8.4 write-sync protocol
8.5 support for before-journals
9 PROPOSED ROLLBACK HANDLERS
9.1 existing rollback protection
9.1.1 create (unprotected)
9.1.2 delete (unprotected)
9.1.3 allocate (unprotected)
9.1.4 free (inefficiently protected)
9.2 proposed full rollback protection
9.2.1 create (protected)
9.2.2 delete (protected)
9.2.3 allocate (protected)
9.2.4 free (protected)
9.3 proposed partial rollback protection
9.3.1 create (unprotected)
9.3.2 delete (unprotected)
9.3.3 allocate (protected)
9.3.4 free (inefficiently protected)
9.4 user environment implications
9.4.1 create (with and without)
9.4.2 delete (with and without)
9.4.3 allocate (with and without)
9.4.4 free (existing and proposed)
9.5 implementing features later as an incompatible change
9.5.1 create and delete
9.5.2 allocate and free
10 PROPOSED FILE DUMPING
11 PROPOSED AFTER JOURNALS
12 THE DATA MANAGEMENT DAEMON
12.1 transaction adoption
12.2 recovery after crash
12.3 daemon access
13 EXTENDED OBJECT SUPPORT
14 ACCESS CONTROL
14.1 ring brackets
14.2 access control lists (ACLs)
14.3 access isolation mechanism (AIM)
15 ERROR HANDLING AND STATUS REPORTING
15.1 status reporting
15.2 error reporting
15.3 condition handling
16 INITIALIZATION
16.1 system initialization
16.2 process initialization
17 FILE MANAGER MODULARIZATION
17.1 fm_attribute_.pl1
17.2 fm_combos_.pl1
17.3 fm_data_.alm
17.4 fm_fetch_.pl1
17.5 fm_get_.pl1
17.6 fm_open_.pl1
17.7 fm_put_.pl1
17.8 fm_read_.pl1
17.9 fm_std_error_handler_.pl1
17.10 fm_validate_.pl1
18 DESCRIPTIONS OF OPERATIONS
18.1 acl_add
18.2 acl_delete
18.3 acl_list
18.4 acl_replace
18.5 add_acl_entries
18.6 adopt
18.7 allocate
18.8 chname_file
18.9 close
18.10 create
18.11 create_open
18.12 delentry_file
18.13 delete
18.14 delete_acl_entries
18.15 delete_close
18.16 end_of_crash_recovery
18.17 fetch
18.18 find_old_uid_pn_table
18.19 flush_consecutive_ci
18.20 flush_modified_ci
18.21 free
18.22 get
18.23 get_ci_header
18.24 get_exclusive
18.25 get_max_length
18.26 get_switch
18.27 get_user_access_modes
18.28 list_acl
18.29 list_switches
18.30 lock_advice
18.31 open
18.32 open_by_uid
18.33 open_by_uid_after_crash
18.34 postcommit_do
18.35 prepare_to_copy
18.36 put
18.37 put_journal
18.38 raw_get
18.39 raw_put
18.40 read
18.41 redo
18.42 replace_acl
18.43 reput
18.44 set_bit_count
18.45 set_max_length
18.46 set_switch
18.47 status
18.48 store
18.49 sub_err_flag_get
18.50 sub_err_flag_set
18.51 suffix_info
18.52 undo
18.53 unput
18.54 validate
18.55 write
19 TESTING AND DEBUGGING TOOLS
19.1 command interface
19.2 create_file and delete_file
19.3 fm_tester
19.4 fm_driver
20 PROPOSED FEATURES
20.1 software ring brackets
20.1.1 reason
20.1.2 performance
20.1.3 effort
20.1.4 priority
20.2 audit hardcore support
20.2.1 reason
20.2.2 performance
20.2.3 effort
20.2.4 priority
20.3 flushing directories
20.3.1 reason
20.3.2 performance
20.3.3 effort
20.3.4 priority
20.4 hardcore support of UID pathnames
20.4.1 reason
20.4.2 performance
20.4.3 effort
20.4.4 priority
20.5 provide a pointer interface
20.5.1 reason
20.5.2 performance
20.5.3 effort
20.5.4 priority
20.6 file manager command interface
20.6.1 reason
20.6.2 performance
20.6.3 effort
20.6.4 priority
20.7 command to list open files
20.7.1 reason
20.7.2 performance
20.7.3 effort
20.7.4 priority
20.8 better validation of msf manager's pathname
20.8.1 reason
20.8.2 performance
20.8.3 effort
20.8.4 priority
20.9 dynamic array of msf component segment numbers
20.9.1 reason
20.9.2 performance
20.9.3 effort
20.9.4 priority
20.10 set ring brackets on msf components to 2 5 5
20.10.1 reason
20.10.2 performance
20.10.3 effort
20.10.4 priority
20.11 make fm_$open_by_uid failsafe
20.11.1 reason
20.11.2 performance
20.11.3 effort
20.11.4 priority
20.12 make fm_$unput failsafe
20.12.1 reason
20.12.2 performance
20.12.3 effort
20.12.4 priority
20.13 handle postponed file closing better
20.13.1 reason
20.13.2 performance
20.13.3 effort
20.13.4 priority
20.14 optimize calls to bjm for new file
20.14.1 reason
20.14.2 performance
20.14.3 effort
20.14.4 priority
20.15 optimize calls to bjm for new CI
20.15.1 reason
20.15.2 performance
20.15.3 effort
20.15.4 priority
20.16 find something to lock before the open operation
20.16.1 reason
20.16.2 performance
20.16.3 effort
20.16.4 priority
20.17 keep modified CI list in persystem storage
20.17.1 reason
20.17.2 performance
20.17.3 effort
20.17.4 priority
20.18 give files a type field
20.18.1 reason
20.18.2 performance
20.18.3 effort
20.18.4 priority
20.19 add a debug switch
20.19.1 reason
20.19.2 performance
20.19.3 effort
20.19.4 priority
20.20 fix sma patch to delete
20.20.1 reason
20.20.2 performance
20.20.3 effort
20.20.4 priority
20.21 make protected the default
20.21.1 reason
20.21.2 performance
20.21.3 effort
20.21.4 priority
20.22 ability to change attributes
20.22.1 reason
20.22.2 performance
20.22.3 effort
20.22.4 priority
20.23 keep opening count per ring
20.23.1 reason
20.23.2 performance
20.23.3 effort
20.23.4 priority
2 INTRODUCTION
In many ways, data-management files imitate Multics segments.
The main purpose of a segment is to store data. Data-management
files also store data, but they are potentially larger and can
provide protection against data inconsistencies that arise from
failures and concurrent access. They are usually less efficient
than segments because segments are accessed by hardware and
data-management files are accessed by software.
2.1 pathnames, UIDs, and OIDs
A data-management file is identified by three kinds of names.
First, its pathnames identify it by its location in the Multics
file-system hierarchy. Second, its 36 bit unique identifier
(UID) distinguishes it from all other data-management files in
the system. Its UID never changes. Third, a 36 bit opening
identifier (OID) is created in each process when it opens the
file; programs in that process use the OID to refer to the open
file.
2.2 control-intervals
The data in the file is divided into equal size
control-intervals. The control-interval (CI) is the unit of data
that is moved between mass-storage and main-memory during file
access. Each control-interval is known by its ordinal number.
The first control-interval is numbered zero, and every file has
a control-interval zero.
2.3 protection and transactions
The meaning of protection is abstract. Protection does not imply
any specific mechanisms such as locking or journalizing. These
are the methods used to implement protection, but they could be
replaced by others. Protection means that data in a file is
protected against inconsistency caused by concurrent access and
system failure. It enables applications to transform data from
one consistent state to another by delimiting groups of changes
into transactions, which are atomic in the sense that either all
the changes take effect or none of them take effect.
Furthermore, only the transaction making the changes can observe
their partial effect. To transactions in other processes, the
changes appear to take place simultaneously.
2.4 locks
The file manager can protect files from becoming inconsistent
when they are used by several uncoordinated processes. This is
achieved by locking the files and control-intervals that
participate in a transaction. The control-intervals that are
accessed during a transaction are locked so that no other
transaction can modify them. The control-intervals that are
modified during a transaction are locked so that no other
transaction can access or modify them.
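The locking rule above amounts to a shared/exclusive
compatibility test. The following Python sketch illustrates the
rule only. The mode names and functions are hypothetical and are
not the lock manager's interface.

```python
# Hypothetical lock modes; the names are illustrative only.
SHARED = "shared"        # taken when a transaction accesses a control-interval
EXCLUSIVE = "exclusive"  # taken when a transaction modifies a control-interval


def compatible(held, requested):
    """Locks of two different transactions can coexist only if both are shared."""
    return held == SHARED and requested == SHARED


def can_grant(held_by_others, requested):
    """Grant a lock only if it is compatible with every lock that
    other transactions already hold on the same control-interval."""
    return all(compatible(held, requested) for held in held_by_others)
```

Under this sketch, several transactions may hold shared locks on
the same control-interval, but an exclusive lock excludes every
other transaction.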
2.5 before-journals
The file manager can almost always protect data from becoming
inconsistent when an application program, a process, or Multics
fails. This is achieved by putting a copy of the data prior to
modification into a file called a before-image journal.
Transactions are used to group the modifications. Any
transaction which can not continue because of a failure is
aborted. Aborting a transaction restores its before-images to
the files. This restoration is called "rollback".
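A minimal sketch of before-image journaling and rollback is given
here in Python. It is a toy model held entirely in memory. The
class and method names are hypothetical and do not correspond to
the before-journal manager's entry points.

```python
class ToyTransaction:
    """Toy model: journal a before-image ahead of every modification."""

    def __init__(self, pages):
        self.pages = pages        # dict: page number -> bytes
        self.before_journal = []  # list of (page number, before-image)

    def modify(self, page_no, new_data):
        # Record the data as it was prior to modification, then modify.
        self.before_journal.append((page_no, self.pages.get(page_no)))
        self.pages[page_no] = new_data

    def commit(self):
        # The changes stand; the before-images are no longer needed.
        self.before_journal.clear()

    def abort(self):
        # Rollback: restore the before-images, most recent first.
        while self.before_journal:
            page_no, image = self.before_journal.pop()
            if image is None:
                self.pages.pop(page_no, None)  # page did not exist before
            else:
                self.pages[page_no] = image
```

Restoring the images in reverse order matters when one page is
modified more than once in the same transaction: the oldest image
is applied last, so the page returns to its original content.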
2.6 extended-objects
Users expect data-management files to have many of the features
of segments, like ACLs, ring brackets, pathnames, etc.
Furthermore, users expect to manipulate them using ordinary
commands, as if they were segments. Therefore, it must be
possible to create them, delete them, copy them, list them,
rename them, etc. The file manager provides these capabilities
through interfaces designed to be called by the Multics
extended-object facility.
2.7 related documents
MTB-508: "Data Management: An Architectural Overview".
MTB-511: "Data Management: Page Access Layer Overview".
MTB-514: "Concurrency Management - Overview".
MTB-553: "Data Management: File Manager Functional Specification".
MTB-560: "Before-journal Manager Design".
MTB-561: "Data Management: After Journal Manager Specification".
MTB-564: "Phasing Page Control and Before Journal".
3 DATA STORAGE
The most basic function of a data-management file is, of course,
to store data. The current implementation of the file manager
stores all the data of a file in a ring-two multi-segment file
(MSF). This section explains how it is done and how it is
expected to be done in the future.
3.1 multi-segment files (MSFs)
An MSF is a directory with a nonzero bit count which contains
segments. The segments are called components. The components
are named with ordinal numbers, starting with "0". The file
manager does not manage the MSFs itself. It uses msf_manager_,
a long-established Multics facility.
The file manager puts one control-interval in each page of each
component. The control-interval fills the entire page.
Currently, the file manager only supports one control-interval
size because Multics has only one page size. The number of
control-intervals in each component is one of the attributes of
the file. The file manager can put up to 255 control-intervals
in each component.
In order to access a particular control-interval, the file
manager calculates its component number and its offset within the
component. It gets the segment number of the component from a
table that it keeps. If the segment number is not yet in the
table, it is obtained from msf_manager_. Using the segment
number and the offset, it manufactures a pointer to the page
which contains the control-interval.
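The address calculation can be sketched in Python as follows.
The sketch assumes that control-intervals are assigned to
components in order, that the blocking factor is the number of
control-intervals per component, and that a page holds 1024
words (4096 bytes at four bytes per word). The function name is
hypothetical.

```python
WORDS_PER_PAGE = 1024  # assumption: 4096 bytes per page, 4 bytes per word


def locate_ci(ci_num, blocking_factor):
    """Return (component number, page within component, word offset)
    for a control-interval, one control-interval per page."""
    if ci_num < 0:
        raise ValueError("control-interval number must not be negative")
    if not 1 <= blocking_factor <= 255:
        raise ValueError("blocking factor must be between 1 and 255")
    component = ci_num // blocking_factor
    page_in_component = ci_num % blocking_factor
    word_offset = page_in_component * WORDS_PER_PAGE
    return component, page_in_component, word_offset
```

For example, with 255 control-intervals per component,
control-interval 300 falls in component 1 at page 45.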
Multi-segment files have certain limitations. A Multics
directory can not contain more than about 4000 branches, so the
maximum number of components is limited to about 4000. The
number is not exact because the limitation results from the fact
that a directory must be contained in a single segment which is
managed like an area. Every branch, name, and ACL term takes up
space. The longer the ACL on each component, the lower the
maximum number of components. This limitation on file size
greatly increases the difficulty of implementing certain
applications with very large data storage requirements.
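The resulting ceiling on file size can be estimated with a few
lines of Python. The figure of 4000 components is the
approximation given above, and the 4072 addressable bytes per
control-interval is the figure given in section 3.4; the result
is therefore only an order-of-magnitude estimate.

```python
MAX_COMPONENTS = 4000      # "about 4000"; the exact limit varies
CIS_PER_COMPONENT = 255    # largest supported number per component
CI_SIZE_BYTES = 4096       # one page
ADDRESSABLE_BYTES = 4072   # per control-interval (fewer in CI zero)

max_cis = MAX_COMPONENTS * CIS_PER_COMPONENT
max_total_bytes = max_cis * CI_SIZE_BYTES
max_addressable_bytes = max_cis * ADDRESSABLE_BYTES

print(max_cis)                # 1020000
print(max_total_bytes)        # 4177920000
print(max_addressable_bytes)  # 4153440000
```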
Another MSF limitation arises from the Multics virtual memory
implementation of the component segments. Each segment is
represented by a table with one word for each page. These page
tables are read into main-memory from the mass-storage volume
tables of contents and are purged from main-memory if they have
not been used recently, because main-memory can only hold a few
thousand of them at once and this capacity is shared by all files
on the system. When pages of a large MSF or a large number of
MSFs are accessed randomly, there can be almost as many page
table reads as page reads, since the likelihood of the page table
being in main-memory can be almost as low as the likelihood of
the page being in main-memory.
3.2 proposed large-files
In order to overcome the main limitations of multi-segment files,
a new type of Multics file, called a "large-file", is envisioned.
In this type of file, the data is not stored in segments.
Instead, it is read directly from mass-storage into main-memory
buffers. Instead of a table of mass-storage page locations, it
has a table of equal size mass-storage extents. Each extent is a
contiguous area of a mass-storage device and contains many pages.
The mass-storage address of a page is determined by calculating
the ordinal number of the extent that contains the page and
calculating the offset of the page within the extent. This
method permits the location of pages on mass-storage devices to
be compactly represented. It allows one directory entry to
describe a very large file.
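The extent calculation can be sketched in Python. Since
large-files are only proposed, everything in this sketch is an
assumption: the extent table is modeled as a list giving the
starting device address of each extent in pages, and the function
name is hypothetical.

```python
def locate_page(page_num, pages_per_extent, extent_table):
    """Return the mass-storage address (in pages) of a file page.

    extent_table[i] is the assumed starting address of extent i.
    All extents hold the same number of contiguous pages.
    """
    if page_num < 0 or pages_per_extent < 1:
        raise ValueError("invalid page number or extent size")
    extent_ordinal = page_num // pages_per_extent
    page_in_extent = page_num % pages_per_extent
    return extent_table[extent_ordinal] + page_in_extent
```

The table needs one entry per extent rather than one per page,
which is what makes the representation compact.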
The amount of effort necessary to implement these files is
expected to be large. The mass-storage allocation is complicated
because the large contiguous extents must be managed so as to
minimize wasted space. The main-memory buffer space must be well
managed in order to efficiently use main-memory. Many difficult
modifications to the highly optimized hardcore supervisor will be
required.
The file manager has been designed to anticipate the transition
to large-files. In fact, the MSF implementation can be regarded
as a prototype. It has not been decided whether large-files will
only be used for data-management files or whether they will be
made general enough for other applications.
3.3 data-management-ring
Multics rings are used to protect the data in the files as well
as the perprocess and persystem data required by the file
manager. The system global variable
sys_info$data_management_ringno contains the number of the
data-management-ring. Currently, ring-two is the
data-management-ring. The file manager runs mostly in the
data-management-ring, and so do the ancillary services (locking
and before-journalizing). Like any inner ring subsystem, the
file manager has a gate to control the access to it. In the
following discussion, the data-management-ring will be
represented by the numeral "2".
An MSF directory is created with directory ring brackets of 2,2
so it can only be accessed through the file manager gate. The
MSF components are created with segment ring brackets 2,5,5 so
that they can be read from the user ring but can be modified only
from the data-management-ring. The file manager's perprocess and
persystem data are only accessible from within the
data-management-ring so that they can not be corrupted by outer
ring programs and so that outer ring programs can not develop a
dependency on their format or content.
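The effect of segment ring brackets such as 2,5,5 can be
illustrated with a short Python sketch. It models only the write
and read brackets (rings 0 through r1 may write, rings 0 through
r2 may read) and ignores ACLs and the call bracket. The function
names are hypothetical, and ring 4 is assumed to be the user
ring.

```python
MSF_COMPONENT_BRACKETS = (2, 5, 5)  # segment ring brackets from the text


def can_write(ring, brackets):
    """Writing is permitted from rings 0 through r1."""
    r1, _r2, _r3 = brackets
    return 0 <= ring <= r1


def can_read(ring, brackets):
    """Reading is permitted from rings 0 through r2."""
    _r1, r2, _r3 = brackets
    return 0 <= ring <= r2
```

With brackets 2,5,5 a ring 4 program can read a component but
only ring 2 or lower can modify it.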
In many cases ring brackets are set implicitly. The ring
brackets on perprocess data are an effect of the fact that they
are allocated in the combined linkage area of the
data-management-ring. The ring brackets of the MSF directories
and components are set by the hardcore to the software validation
level. The file manager always sets the validation level to the
ring of execution when it expects to call a hardcore primitive.
3.4 control-interval management
The first four words of a control-interval are used as a header.
The header has several purposes. It identifies the
control-interval because it contains the file UID and the
control-interval number. It contains some information used to
prevent the control-interval from being written to mass-storage
until its before-image has been written to mass-storage. It
tells how big the control-interval is. It can be used to check
the consistency of the control-interval because the last two
words of the control-interval, which are called the trailer,
have the same form and content as the first two words of the
header.
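The consistency check made possible by the trailer can be
sketched in Python. The control-interval is modeled as a list of
integer words, and the function name is hypothetical; this is an
illustration of the idea, not a file manager routine.

```python
HEADER_WORDS = 4   # the header is the first four words
TRAILER_WORDS = 2  # the trailer is the last two words


def stamp_matches_trailer(ci_words):
    """True if the first two header words equal the two trailer words."""
    if len(ci_words) < HEADER_WORDS + TRAILER_WORDS:
        raise ValueError("too short to be a control-interval")
    return ci_words[:TRAILER_WORDS] == ci_words[-TRAILER_WORDS:]
```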
The control-interval header and trailer are maintained
exclusively by the file manager. They are not in the user
addressable range of the control-interval, so "get" and "put"
operations can not read or write them. The addressable portion
consists of all the bytes between the header and trailer, except
in control-interval zero, where the addressable portion is
between the header and the beginning of the area reserved for
file attributes. The addressable portion is always the same for
get and put. The control-interval sizes of the current
implementation are as follows:
actual size                4096 bytes
addressable size           4072 bytes
addressable size of ci 0   3176 bytes
Throughout the file manager and in all its interfaces,
control-interval numbers are represented by fixed binary (27).
This representation was chosen because it permits very large
files and because it fits evenly in three 9-bit bytes, which can
be efficiently loaded and stored by the CPU. Its value should
never be negative.
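The sizes listed above and the range of control-interval numbers
can be checked with simple arithmetic, shown here in Python. The
sketch assumes four 9-bit bytes per 36-bit word and treats the
control-interval number as an unsigned 27 bit value.

```python
BYTES_PER_WORD = 4
CI_SIZE = 4096                     # actual size in bytes
HEADER_BYTES = 4 * BYTES_PER_WORD  # four-word header
TRAILER_BYTES = 2 * BYTES_PER_WORD # two-word trailer

addressable = CI_SIZE - HEADER_BYTES - TRAILER_BYTES
print(addressable)                 # 4072

ADDRESSABLE_CI_ZERO = 3176
attribute_area = addressable - ADDRESSABLE_CI_ZERO
print(attribute_area)              # 896 bytes reserved in CI zero

max_ci_num = 2 ** 27 - 1
print(max_ci_num)                  # 134217727
print(27 == 3 * 9)                 # True: three 9-bit bytes
```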
When a file is created, it has only one control-interval, which
is control interval zero. It is stored in page zero of component
zero. Control-interval zero can never be freed because the
attributes of the file are stored in it.
The present specification permits the user to allocate and free
control intervals anywhere in the file. The file manager does
not have a list of which control-intervals are allocated and
which are free. It must reference the page in which the
control-interval is stored in order to determine whether it is
allocated.
An unallocated control-interval is represented by a page that
contains nothing but zeros. Multics page-control allocates no
mass-storage for such a page. When the file manager is asked to
allocate a control-interval, it writes the header and trailer,
which are never zero. This causes page-control to assign a page
of mass-storage and increase the quota used by the MSF. When the
file manager is asked to free a control-interval, it zeros the
page, which causes page-control to return the page to the pool of
available pages and decrease the quota used. When the file
manager is asked to allocate a control-interval in an MSF
component which does not exist, it first calls msf_manager_ to
create the component. The file manager has no mechanism for
noticing when all of the control-intervals in a component have
been freed, so it never deletes a component.
When the file manager is asked to put data into a
control-interval that is not allocated, it automatically
allocates one. When the file manager is asked to get data from a
control-interval that is not allocated, it simulates one by
returning all zeros, but does not actually allocate one.
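The allocation rules described in this section can be modeled
with a small Python class. This is a toy in-memory model: a
missing dictionary entry stands for an all-zero page, the header
and trailer contents are stand-in values, the smaller addressable
range of control-interval zero is ignored, and all names are
hypothetical.

```python
CI_SIZE = 4096
HEADER_SIZE = 16
TRAILER_SIZE = 8
ADDRESSABLE = CI_SIZE - HEADER_SIZE - TRAILER_SIZE


class ToyFile:
    def __init__(self):
        self.pages = {}   # ci number -> bytearray; absent = all zeros
        self.allocate(0)  # a new file has only control-interval zero

    def is_allocated(self, ci_num):
        return ci_num in self.pages

    def allocate(self, ci_num):
        page = bytearray(CI_SIZE)
        page[:HEADER_SIZE] = b"\x01" * HEADER_SIZE     # stand-in header
        page[-TRAILER_SIZE:] = b"\x01" * TRAILER_SIZE  # stand-in trailer
        self.pages[ci_num] = page

    def free(self, ci_num):
        if ci_num == 0:
            raise ValueError("control-interval zero can never be freed")
        self.pages.pop(ci_num, None)  # zeroing the page releases it

    def put(self, ci_num, offset, data):
        if offset < 0 or offset + len(data) > ADDRESSABLE:
            raise ValueError("outside the addressable portion")
        if not self.is_allocated(ci_num):
            self.allocate(ci_num)     # put allocates automatically
        start = HEADER_SIZE + offset
        self.pages[ci_num][start:start + len(data)] = data

    def get(self, ci_num, offset, length):
        if offset < 0 or length < 0 or offset + length > ADDRESSABLE:
            raise ValueError("outside the addressable portion")
        if not self.is_allocated(ci_num):
            return bytes(length)      # simulate zeros, do not allocate
        start = HEADER_SIZE + offset
        return bytes(self.pages[ci_num][start:start + length])

    def pages_used(self):
        return len(self.pages)        # stands in for quota used
```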
3.5 control-interval structure
dcl 1 ci aligned based,
      2 header,
        3 stamp fixed bin (71),
        3 id fixed bin (71),
      2 addressable_bytes char (4072),
      2 trailer fixed bin (71);

dcl 1 ci_stamp aligned based,
      2 version bit (9) unal,
      2 bj_idx fixed bin (9) uns unal,
      2 time_modified fixed bin (53) unal;

dcl 1 ci_id aligned based,
      2 uid bit (36),
      2 size_code bit (9) unal,
      2 num fixed bin (27) uns unal;

dcl 1 ci_size_code aligned based,
      2 exponent fixed bin (6) uns unal,
      2 addon fixed bin (3) uns unal;
"ci" is a control-interval. Its subsidiary items are
described in storage order below.
"header" is the first four words of every allocated
control-interval. The stamp and id are declared as
double precision numeric values so that they can be
atomically referenced. This is required when, for
example, the before-journal index and the time modified
fields must be updated simultaneously so that
page-control never sees the old version of one and the
new version of the other. It also aids efficiency.
"stamp" is the two word modification stamp of the
control-interval. It contains the two items in the
header that change. Think of the file manager rubber
stamping this area whenever the data content of the
control-interval is modified.
"version" is the version number of the control-interval
format. Its value was chosen to be as distinctive as
possible. Currently its octal value is 641. It is not
expected to change often. The value recalls the GE 645,
which was the first Multics CPU. The last digit was
changed to 1 to represent version number 1.
"bj_idx" is the index of the before-journal that contains
the before-image taken prior to the last modification.
Page-control uses this index in order to determine if
this page can be purified (written to mass-storage).
It can not be purified until its before-image has been
written to mass-storage.
"time_modified" is the Multics clock reading associated with
the before-image taken prior to the last modification.
Page-control uses this reading in order to determine if
this page can be purified (written to mass-storage).
It can not be purified while the clock reading of the
most recent before-image safely on mass-storage is
earlier than this one.
"id" is the two word control-interval identifier. It never
changes.
"uid" is the 36 bit unique identifier (UID) of the
data-management file to which the control-interval
belongs. It is used to verify that the control
interval being referenced is of the file being
referenced. It could be used to repatriate
control-intervals when the file map is lost.
"size_code" represents the size of the control-interval.
The number of bytes in the control-interval is (64 +
8*addon) * 2**exponent. This representation was chosen
because it can compactly represent a wide range of
sizes. Currently, the only valid size code is octal
060 which means 4096 bytes or one page. Currently, it
is not used for any purpose. It could be very useful
to a repatriation scheme.
"num" is the ordinal number of the control-interval. It is
used to verify that the desired control-interval has
been read from mass-storage. It could be used to
repatriate the control-interval to the correct place in
the file when the file map is lost.
"addressable_bytes" is the user addressable portion of the
control-interval, where the data is stored.
"trailer" is the last two words of a control-interval.
It always matches the control-interval header stamp,
except in the midst of a modification. If a
control-interval modification is interrupted, the
trailer will not match the header stamp. The current
file manager does not detect this mismatch; it is
corrected by the next modification. The trailer could
be used to verify that interrupted operations get
rolled back.
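The field layouts described above can be exercised with a short
Python sketch. It assumes that structure members are laid out
from the high-order bits downward, so that "exponent" occupies
the first six bits of the size code. All function names are
hypothetical, and the purification test only restates the rule
given for "time_modified"; it is not page-control's code.

```python
def decode_size_code(size_code):
    """Return the control-interval size in bytes for a 9 bit size code."""
    exponent = (size_code >> 3) & 0o77  # first (high-order) 6 bits
    addon = size_code & 0o7             # last (low-order) 3 bits
    return (64 + 8 * addon) * 2 ** exponent


def pack_ci_id(uid, size_code, num):
    """Pack uid (36 bits), size_code (9 bits) and num (27 bits)
    into the two 36-bit words of the identifier."""
    if not (0 <= uid < 2 ** 36 and 0 <= size_code < 2 ** 9
            and 0 <= num < 2 ** 27):
        raise ValueError("field out of range")
    return uid, (size_code << 27) | num


def unpack_ci_id(word0, word1):
    """Inverse of pack_ci_id: return (uid, size_code, num)."""
    return word0, word1 >> 27, word1 & (2 ** 27 - 1)


def can_purify(time_modified, newest_before_image_on_storage):
    """A page may be written once the newest before-image safely on
    mass-storage is not earlier than the page's time_modified."""
    return newest_before_image_on_storage >= time_modified


print(decode_size_code(0o060))  # 4096
```

Octal 060 decodes to an exponent of 6 and an addon of 0, giving
(64 + 0) * 2**6 = 4096 bytes, which agrees with the text.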
3.6 file attributes management
The file manager stores the attributes of a file in the upper
part of control-interval zero. This part is inaccessible to the
user. The upper part was chosen because it seemed more
straightforward to special case the bound check on
ci_parts.length_in_bytes than to bump the offset to get over the
attributes. The last two bytes before the trailer are used to
contain the length of the attributes area. Thus, the size of the
area could be variable. In the current implementation, it is
fixed as the value of a declared constant.
The attributes are only used when the file is opened, at which
time some of them are copied into the perprocess file
access-data structure. There is no reason why the attributes can
not be accessed at any time, or even modified, but the current
implementation does not do so. The attributes are set when the
file is created and are never changed thereafter. Eventually, the
file
manager should permit the user to access and modify the file
attributes. If protection could be turned on and off, it would
permit bulk operations such as database loading, database
conversion, restructuring, and long transactions such as
invoicing to be run with potentially expensive protection turned
off.
3.7 file attributes structure
dcl 1 file_attributes aligned,
      2 version bit (36),
      2 unique_id bit (36),
      2 ci_size_in_bytes fixed bin (35),
      2 blocking_factor fixed bin,
      2 date_time_created fixed bin (71),
      2 mbz_1 (2) fixed bin (71),
      2 protected bit unal,
      2 no_concurrency bit unal,
      2 no_rollback bit unal,
      2 mbz_1a fixed bin,
      2 mbz_5 (8) fixed bin (71),
      2 time_last_dumped fixed bin (71),
      2 dump_file_path char (168),
      2 mbz_6 (4) fixed bin (71),
      2 after_journal_path char (168),
      2 mbz_7 (50) fixed bin (71),
      2 mbz_8 fixed bin,
      2 mbz_9 fixed bin (17) unal,
      2 length_of_attributes fixed bin (17) unal;
where:
"version" is the version of the structure.
"unique_id" is the unique identifier (UID) of the file.
"ci_size_in_bytes" is the control-interval length. In the
current implementation, it is always 4096.
"blocking_factor" is the number of control-intervals to put
into each MSF component. The file manager code can
handle any number from 1 to 255, but the create
operation only allows 64 and 255 on the assumption that
these are the only reasonable values.
"date_time_created" is the reading of the Multics clock when
the file was created.
"protected" means that the file is given all the protection
that the file manager is capable of, except for
protections that are explicitly turned off. If this
bit is turned on, the file may only be accessed during
a transaction. "^protected" means that the file is to
be given none of the protection that the file manager
is capable of.
"no_concurrency" is only meaningful if "protected" is on.
It means that the file manager should not protect the
file against inconsistency arising from uncoordinated
access by multiple processes.
"no_rollback" is only meaningful if "protected" is on. It
means that the file manager should not protect the file
against inconsistency arising from application program,
process, or system failures.
"time_last_dumped" is not used because there is no
data-management file dumper.
"dump_file_path" is not used because there is no
data-management file dumper.
"after_journal_path" is not used because after-image
journalization has not been implemented yet.
"length_of_attributes" is the number of bytes in the
attributes structure minus two. The minus two is so
that this number does not include its own two bytes.
This may seem silly, but it permits the possibility of
creating a file with no attributes at all by not
storing anything at all in control-interval zero (i.e.,
zero means no attributes).
4 FILE IDENTIFICATION
4.1 pathnames
Pathnames are used to designate files in the calling sequences
of some file manager primitives. There are also two internal
uses of pathnames. One is in the UID/pathname table which is
used to find a pathname for a file when its UID is given. The
other is in the perprocess access-data of a file. The pathname
is actually stored by the msf_manager_ in its perprocess data.
The file manager keeps a pointer to msf_manager_'s perprocess
data which is used in all calls to msf_manager_.
The main problem with pathnames is that they are not very
reliable, in the sense that they are not guaranteed by Multics to
remain valid. This is because the user can change the name of a
file or any of the directories above it. After such name
changes, the pathname may be invalid or may designate a different
file. This problem is expected to be solved in the large-file
implementation. Nevertheless, it is unfortunate that Multics
does not provide a more reliable way to designate an object in
the hierarchy.
4.2 component zero FSUID
Each Multics segment has a 36 bit file-system unique identifier
(FSUID) assigned to it by the supervisor. The file manager uses
the FSUID of MSF component zero for two purposes. First, it is
the source of a data-management file's UID. Nothing depends on
the file UID being the same as the FSUID; it is just a convenient
way to get a 36 bit UID. Second, the FSUID is used in the
UID/pathname table to verify that two files do not have the
same data-management file UID. This could happen, for example,
if the same file were retrieved twice, with different pathnames.
The FSUID of the MSF directory could have been used instead of
the FSUID of component zero. Component zero was chosen because
there is a simple hardcore primitive to get the FSUID of a
segment.
4.3 component zero segment number
The Multics segment number of MSF component zero is used by the
file manager for two purposes other than manufacturing pointers
to control-intervals in component zero. First, when it opens a
file, it calls msf_manager_ to get a pointer to the base of
component zero from which it extracts the segment number. It
then searches its table of open files, using this segment number,
to see if the file is already open. Second, the segment number
is used to call the supervisor whenever the file manager needs
the pathname of the file.
4.4 opening identifiers (OIDs)
The 36 bit opening identifier is a perprocess nickname for a
file. The user can make as many copies of it as he wants and can
store it wherever he wants but it is only valid in one process.
The file can be opened any number of times in any ring and all
openings will return the same OID. Openings for the same file in
different processes will usually not match.
Opening identifiers have the following structure:
dcl 1 oid aligned,
2 proc_ad_idx fixed bin (17) unal,
2 uid_tail bit (18) unal;
where:
"proc_ad_idx" is the index of the perprocess access-data of
the file in the file manager's table of open files.
"uid_tail" is the last 18 bits of the file UID.
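The packing of the two halves into one 36 bit word can be
sketched as follows. This is an illustrative Python model, not
the file manager's PL/I. The function names are hypothetical, and
the sketch assumes that table indices start at one, so that no
valid OID is a word of zeros.

```python
UID_TAIL_BITS = 18
UID_TAIL_MASK = (1 << UID_TAIL_BITS) - 1


def make_oid(proc_ad_idx, file_uid):
    """Pack a table index and the last 18 bits of a 36 bit UID."""
    # Assumption: indices are positive and fit in fixed bin (17).
    if not 0 < proc_ad_idx < (1 << 17):
        raise ValueError("index does not fit in the upper half word")
    return (proc_ad_idx << UID_TAIL_BITS) | (file_uid & UID_TAIL_MASK)


def split_oid(oid):
    """Return (proc_ad_idx, uid_tail) for a 36 bit OID."""
    return oid >> UID_TAIL_BITS, oid & UID_TAIL_MASK
```

A caller holding a stale OID can be detected by comparing the
uid_tail half with the UID stored in the table entry that the
index half selects.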
A file may be opened in a process by several subsystems that do
not know about each other. When one of the subsystems closes the
file, openings held by other subsystems should continue to
function. In order to achieve this, the file manager maintains a
count of the number of times the file has been opened minus the
number of times it has been closed. The perprocess open data of
a file will not be discarded and therefore the OID can not be
associated with another file until that count drops to zero. If
each subsystem closes a file the same number of times as it opens
it, none will ever find the OID invalid.
A word of zeros is the null value of an OID. Subsystems should
initialize OIDs to zero. When the file manager opens a file, it
increments the opening count before it sets the OID in the
argument list. When it closes a file, it zeros the OID in the
argument list before it decrements the count. Thus, if the
process should be interrupted in between, the count will be too
high and the file manager will err by unnecessarily retaining the
open data. Subsystems that initialize OIDs to zero before
calling the open primitive can assume that, upon return, if the
OID is not zero, it can be safely used.
When a file is deleted, it might seem appropriate to discard all
data about the file. This would be a mistake because some
subsystems may still hold OIDs for the file. The open data must
be retained until all subsystems close the file. Obviously, the
deleted file can not be used. The retained access-data provides
the means for detecting any further attempts to use the deleted
file. The only operation that will return a zero status-code is
close.
4.5 unique identifiers (UIDs)
Each file has a 36 bit UID which uniquely identifies it within
the data-management system. The UID has several uses. Each
before-image for the file contains the UID and it is used to open
the file when rolling back the transaction after a failure. The
header of each control-interval contains the UID of the file to
which it belongs. This can be used to help verify that the
correct control-interval has been read from mass-storage. The
UID is used to identify the file when locking it. When
after-image journalization is implemented, it will be in each
after-image.
4.6 UID/pathname table management
The file manager specification includes an open_by_uid entry
point that is intended for use by the before-journal manager
during the rollback of a transaction. In order for this entry
point to work, the file manager must be able to find the pathname
of the file when given its UID. To this end, it maintains a
UID/pathname table in persystem storage. The table is managed so
as to guarantee that a particular UID can be found in it as long
as there is any chance that the UID is used in a before-image in
an active transaction.
When the data-management system is initialized, the table is
empty. Whenever a file is opened in a process, the file manager
makes an entry for it in the table if it is protected and the
no_rollback bit is not set. If there is already an entry for it
in the table, the file manager adds one to its count. The count
represents the number of processes that have the file open and
therefore could produce before-images of it. When the file is
closed in a process, the file manager decrements the count and,
if the count goes to zero, removes the entry from the table.
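The counting policy just described can be modeled in a few lines.
The sketch below is illustrative Python rather than the actual
implementation. The class and method names are invented, a
dictionary stands in for the fixed-size table, and the caller is
assumed to report only the first open and the last close of a
file in each process.

```python
class UidPathnameTable:
    """Toy model: UID -> [pathname, number of processes with it open]."""

    def __init__(self):
        self.entries = {}

    def note_open(self, uid, pathname, protected, no_rollback):
        # Only files that can produce before-images need an entry.
        if not protected or no_rollback:
            return
        entry = self.entries.setdefault(uid, [pathname, 0])
        entry[1] += 1

    def note_close(self, uid):
        entry = self.entries.get(uid)
        if entry is None:
            return
        entry[1] -= 1
        if entry[1] == 0:
            del self.entries[uid]  # no process can still journal it

    def pathname_of(self, uid):
        entry = self.entries.get(uid)
        return entry[0] if entry else None
```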
The correctness of the above policy depends on another policy
according to which the file manager never closes a file with
rollback protection during a transaction. As far as the user is
concerned, the file is closed, but the file manager postpones
the actual closing until the transaction is over.
The UID/pathname table is a segment in the data-management
per-AIM-level directory. It is force written to mass-storage
whenever a new entry is added to it, in order to guard against
the failure of Multics emergency shutdown which normally writes
all modified pages to mass-storage after a crash. The table has
a lock so that it is accessed by one process at a time.
4.7 UID/pathname table structure
dcl 1 sys_pn_tbl aligned based,
2 h,
3 version bit (36),
3 last_entry fixed bin,
3 lock fixed bin (71),
3 mbz (30) fixed bin (71),
2 e (4096),
3 thread fixed bin,
3 open_count fixed bin,
3 pfuid bit (36),
3 fsuid bit (36),
2 paths (4096) char (168) unal;
where:
"h" is the 64 word header with plenty of room for additional
items.
"version" is the version of the structure.
"last_entry" is the index of the last entry in the table
that is in use.
"lock" is the lock used to prevent simultaneous access by
several processes. It is managed by the lock manager's
"fast lock" facility.
"e" is the array of UID entries.
"thread" was intended to chain entries together into linked
lists. Actually, it is only used to indicate whether
an entry is in use. If it is zero, the entry is free.
If it is minus one, the entry is in use.
"open_count" is the number of processes that have the file
open. When the count is decremented to zero, the entry
is removed from the table by filling the entire entry
with zeros.
"pfuid" is the data-management file UID. Data-management
files used to be called "page files".
"fsuid" is the Multics file-system UID of MSF component
zero. It is used to verify that two openings that have
the same data-management file UID also refer to the
same Multics file-system object.
"paths" is an array of data-management file pathnames. Each
element in this array is logically part of the entry in
the "e" array with the same index. This large item was
put in a separate array to increase the storage density
of the "e" array which is heavily searched. The
separation has the disadvantage that adding an entry
modifies two pages which must both be flushed.
4.8 old UID/pathname table
When a new data-management system is initialized, a new
UID/pathname table is created. If the previous incarnation of
the data-management system ended because Multics crashed, the old
UID/pathname table is kept for the duration of
recovery-after-crash which rolls back the transactions that were
active when Multics crashed. The file manager provides three
special entry points for initiating, using, and terminating the
old UID/pathname table. This is the only case where the
UID/pathname table is used without locking it. Locking is not
necessary because recovery-after-crash is executed by a single
process, the data-management daemon.
5 FILE ACCESS MECHANISM
The most central component of the access mechanism is a block of
data called the access-data. It is perprocess data and there is
one for each open file in a process. It provides an efficient
short term memory for data necessary to access a file. This data
includes MSF component segment numbers, protection switches, the
number of control-intervals in each component, advice about
locking from the application, etc. The access-data blocks are
kept in an array which is allocated in the process directory in
the data-management-ring. The first 18 bits of the OID are an
index into this array.
There is no persystem access-data block, although it was
considered many times. Some of the information in the perprocess
access-data block could have been shared if there were persystem
access-data. Furthermore, it may be required in the future for
the efficient implementation of some desirable features. The
main disadvantage of persystem data is that it must be protected
from concurrent access confusion. This requires either very
intricate programming or a lock which can increase overhead and
reduce concurrency. It also raises complicated questions about
how to recover when a process fails and leaves the data in an
inconsistent state.
5.1 keeping OIDs valid for bj
Each before-image contains the OID of the file to which it
belongs. If a transaction is rolled back in the process that
created it, the before-journal manager uses these OIDs. An
application may close a file at any time. Usually, when a file
is closed, the access-data is discarded and the OID becomes
invalid. Potentially, an application could generate some
before-images, close the file, and abort the transaction, causing
the before-journal manager to use OIDs that are no longer valid.
The file manager has a special mechanism to prevent this
potentiality. When a file is closed before the end of a
transaction, the file manager threads the access-data block onto
a list. After the transaction is over, it walks the list and
frees the access-data blocks. This is why all file manager
primitives that are available to applications check whether the
open count is zero: if it is, the file is closed as far as the
application is concerned, even if the access-data has been kept
around for some reason.
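The deferred discard can be sketched as follows. This is a
simplified Python model with invented names. A Python list stands
in for the threaded list, and the in_transaction flag is an
assumption about how the file manager learns that a transaction
is in progress.

```python
class AccessDataTable:
    """Toy model of access-data blocks that outlive a close."""

    def __init__(self):
        self.blocks = {}           # OID -> {"opens": n}
        self.post_txn_closes = []  # OIDs to free after the transaction
        self.in_transaction = False

    def open(self, oid):
        block = self.blocks.setdefault(oid, {"opens": 0})
        block["opens"] += 1

    def close(self, oid):
        block = self.blocks[oid]
        if block["opens"] == 0:
            raise ValueError("file is already closed")
        block["opens"] -= 1
        if block["opens"] == 0:
            if self.in_transaction:
                self.post_txn_closes.append(oid)  # keep OID valid for rollback
            else:
                del self.blocks[oid]

    def usable_by_application(self, oid):
        block = self.blocks.get(oid)
        return block is not None and block["opens"] > 0

    def end_transaction(self):
        self.in_transaction = False
        for oid in self.post_txn_closes:
            block = self.blocks.get(oid)
            if block is not None and block["opens"] == 0:
                del self.blocks[oid]  # not reopened in the meantime
        self.post_txn_closes.clear()
```

Between the close and the end of the transaction the block is
still in the table, so a rollback can use the OID, while
usable_by_application already reports the file as closed.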
5.2 file access-data management
The most unusual feature of the way access-data is handled is
that instead of getting a pointer to it and referencing it in
place, it is always copied into the stack frame of the procedure
that uses it. Thus, it is important to remember that whenever it
is modified it must be copied back into its place in the table.
The access-data contains a set of switches that control the
ancillary services which are the flushing of modified
control-intervals, locking, before-image journalizing, and
eventually, after-image journalizing. These switches are
arranged in a single word which efficiently serves the need to
selectively disable some of them during the execution of certain
primitives. The calculation of the effective ancillary service
switches is achieved by anding them with a mask that contains the
ancillary services that are permitted by a particular entry
point.
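The masking can be illustrated with a small Python sketch. The
switch names follow the "ass" item of the access-data structure
and the rules for setting them follow its description. The bit
positions and the contents of NO_RTM_MASK are assumptions made
for the example. The text says only that some operations, such as
unput and raw_put, disable rtm.

```python
# One bit per ancillary service (bit positions are assumed).
RTM, TXN, LOCK, BJ, AJ = 0b10000, 0b01000, 0b00100, 0b00010, 0b00001


def switches_at_open(protected, no_concurrency, no_rollback):
    """Derive the switch word from the file's protection attributes."""
    ass = RTM  # record-time-modified is always turned on
    if protected:
        ass |= TXN
        if not no_concurrency:
            ass |= LOCK
        if not no_rollback:
            ass |= BJ
    return ass  # AJ stays off: after-journaling is not implemented


def effective_switches(ass, permitted_mask):
    """AND the file's switches with what an entry point permits."""
    return ass & permitted_mask


# Hypothetical mask for an entry point that must not stamp the
# control-interval with the time of modification.
NO_RTM_MASK = TXN | LOCK | BJ | AJ
```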
One of the most important items in the access-data block is the
array of MSF component segment numbers. It starts with component
one because component zero is handled specially and separately.
A value of zero indicates that the component has not yet been
initiated. When a file is opened, this array is initialized to
zeros. The size of this array is fixed, but it can be changed
relatively easily. It is only necessary to change the include
file dm_fm_proc_ad and recompile the modules which use it. The
file manager calls the MSF manager to initiate components that
are not yet initiated and to obtain the segment numbers of
components whose component numbers are greater than the dimension
of the array.
Component zero is initiated when the file is opened and it is not
terminated until the access-data is discarded. The item in which
its number is stored has a special interpretation. When the file
is deleted, the item is set to zero. That is why all primitives
that access the file check this item to see if the file still
exists.
The file access-data does not have an item that represents the
size of the file or the number of control-intervals in it.
Correct maintenance of this type of information requires that it
be stored in a place where it can be referenced by all processes
that have the file open. This could be achieved in one of two ways.
First, it could be stored in the file header, and a pointer to it
kept in the access-data. Second, it could be stored in a
persystem table if there were one. Currently, only one primitive
returns this kind of information. It is used by the copy command
which must know the number of the last control-interval prior to
commencing its work. The file manager determines the number of
the last control-interval by surveying the MSF components with
hcs_$star_dir_list and then calling hcs_$status_long to get the
current length of the last one.
5.3 file access-data structure
dcl 1 proc_ad aligned,
2 thread fixed bin unal,
2 seg_0_num bit (18) unal,
2 pn_tbl_idx fixed bin unal,
2 blocking_factor fixed bin (8) unal,
2 opens fixed bin (8) unal,
2 uid bit (36),
2 msf_ptr ptr unal,
2 lock_advice fixed bin,
2 last_tid bit (36),
2 seg_nums (27) fixed bin (12) uns unal,
2 ass,
3 rtm bit unal,
3 txn bit unal,
3 lock bit unal,
3 bj bit unal,
3 aj bit unal;
where:
"thread" is used in the management of the access-data table
in which proc_ad is an entry. If it is zero, the entry
is not in use. If it is negative one, the entry is in
use but is either not part of a linked list or the end
of a linked list. If it is positive, the entry is part
of a linked list. It is intentionally the first item
in the structure so that the entry can be removed by
unspecing it to zero, and even if the operation is
interrupted, if it does anything at all, the entry will
have been marked as not-in-use.
"seg_0_num" is the segment number of MSF component zero. A
value of zero means that the file does not exist.
"pn_tbl_idx" is the index of the entry, in the UID/pathname
table, associated with this file. It is non-zero only
for files that are protected by before-image
journalizing.
"blocking_factor" is the number of control-intervals in each
MSF component segment. It is used to calculate the
component and offset of a control-interval.
"opens" is the number of times the file has been opened by
the user minus the number of times it has been closed.
It helps determine when the access-data can be
discarded.
"uid" is the data-management system unique identifier of the
file.
"msf_ptr" points to the perprocess data about the MSF that
is managed by msf_manager_. It is used when calling
msf_manager_.
"lock_advice" is the mode in which to lock the file prior to
locking control intervals in it. Zero means do not
lock the file because no lock advice has been given.
"last_tid" is used to determine when to lock the file in
cases where lock advice is available. If it is not
equal to the current transaction identifier then the
process must have started a new transaction and it is
time to lock the file in the advised mode before
locking any control-intervals in it.
"seg_nums" is the array of MSF component segment numbers. A
value of zero means that the component has not been
initiated yet. Twelve bits are considered sufficient
to represent the segment number because only twelve
bits are allowed for the segment number in the packed
pointer format that is used by the hardware.
"ass" are the ancillary service switches. They are derived
from the protection attributes of the file at the time
it is opened. They are used to calculate the ancillary
services that will be effective during the operation of
file manager primitives.
"rtm" means record time modified. It causes the file
manager to record the time of modification in the
control-interval header stamp. It is always turned on.
Some operations, such as unput and raw_put, disable it.
"txn" means this file can only be accessed when the process
is in a transaction. Its value is the same as the
"protected" switch of the file.
"lock" means the lock manager is to be called before each
access. It is on if the file is protected and
no_concurrency is off.
"bj" means the before-journal manager is to be called with a
before-image prior to each modification. It is on if
the file is protected and no_rollback is off.
"aj" means the after-journal manager is to be called with an
after-image subsequent to each modification. It will
not be set or used until the after-journal manager is
implemented.
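As an illustration of how "blocking_factor" locates a
control-interval, the following Python sketch maps a
control-interval number to an MSF component and a byte offset. It
assumes that control-intervals are numbered from zero and packed
contiguously into components, which the bulletin implies but does
not spell out. The function name is hypothetical.

```python
CI_SIZE_IN_BYTES = 4096  # fixed in the current implementation


def locate_ci(ci_number, blocking_factor):
    """Return (component number, byte offset within the component)."""
    component, index_in_component = divmod(ci_number, blocking_factor)
    return component, index_in_component * CI_SIZE_IN_BYTES
```

With a blocking factor of 255, control-interval 255 is the first
one in component one.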
5.4 access-data table management
The access-data blocks are stored in a table for maximum
efficiency. The table is allocated in the user-free-area in the
data-management-ring by the file manager at perprocess
initialization time. The first 18 bits of the file OID are an
index into the table. A table of pointers to allocated
access-data blocks could have been used, but that would have
required a few more instructions during each access.
The entries in the table are all access-data blocks, but this is
not necessary. Table slots could be used for other data such as
segment number array extensions. The first half word in every
entry is reserved for the purpose of indicating whether the entry
is in use and threading logically related entries together.
Sometimes, access-data is put or left in the table for a file
that does not exist. During the operation of the create
primitive, the access-data is built up while the file is being
created. This permits the file manager to use standard
mechanisms to allocate control-interval zero and to put the
attributes into it. During the delete operation, the access-data
is left in the table until the open count goes to zero and there
is no chance that the before-journal manager will be using the
OID.
The file manager can not expand or contract the table dynamically
but the number of entries in the table can be changed by
modifying the include file dm_fm_proc_ad_tbl and recompiling the
programs that use it.
5.5 access-data table structure
dcl 1 proc_ad_tbl aligned based,
2 h,
3 version bit (36),
3 last_entry fixed bin,
3 mbz_1 fixed bin,
3 post_txn_closes fixed bin,
3 mbz_2 (6) fixed bin (71),
2 e (1024) like proc_ad;
where:
"h" means table header.
"version" is the version of the structure.
"last_entry" is the index of the last table entry in use.
"post_txn_closes" is the head of the list of access-data
blocks that can be discarded at the end of the
transaction. If its value is negative one, the list is
empty.
"e" is the array of table entries.
6 BASIC PROTECTION
Even if concurrency control and protection against failure are
turned off, a protected file is not the same as an unprotected
one for two reasons. First, it can only be accessed when the
process is in a transaction. Second, when the transaction
commits, the modified control-intervals are flushed (written) to
mass-storage.
6.1 modified CI table management
This table contains references to control-intervals that have
been modified during a transaction. The entries refer to
control-intervals by referring to the pages that contain them.
Each entry consists of a segment number and page number. The
entries are kept sorted, first by ascending segment number and
then by ascending page number. Each entry is one word. The
segment number and page number are arranged so that the word can
be treated as a fixed binary single precision number that can be
sorted by its value. This ordering is not required by the
supervisor primitive that takes the list as input and flushes the
pages. The supervisor only requires that the pages of each
segment be grouped and be in ascending order.
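The reason one word can serve as a sort key is that the segment
number occupies the more significant half. A minimal Python
sketch, assuming the two 18 bit halves shown in
mod_page_breakout and using invented function names:

```python
HALF_WORD_BITS = 18
HALF_WORD_MASK = (1 << HALF_WORD_BITS) - 1


def pack_page(segment_number, page_number):
    """Segment number in the upper half word, page number in the lower."""
    return (segment_number << HALF_WORD_BITS) | page_number


def unpack_page(word):
    return word >> HALF_WORD_BITS, word & HALF_WORD_MASK
```

Sorting the packed words numerically gives the same order as
sorting by segment number and then by page number.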
It would be more efficient to keep the table in hash format while
it is accumulating modified pages and then sort it before calling
the supervisor, but it is done a simpler way. It is kept sorted
at all times. There are two stacks of pages, one at the top of
the table and one at the bottom. There are two pointers, one to
the end page in each stack. To insert a new page, the pages
already in the table are moved from one stack to the other until
the new page can be added to the end of one of the stacks. At
the end of the transaction, all the pages are moved to the lower
stack and the supervisor flushing primitive is called with a
pointer to it.
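The two-stack arrangement behaves like a gap in a sorted array
that is moved to the insertion point. The Python sketch below is
one reading of the description above, not the actual code. The
names are invented, the "going_down" optimization is not modeled,
and a full table raises an error where the file manager would
call the supervisor to flush.

```python
class ModifiedPageTable:
    def __init__(self, size=57):
        self.slots = [0] * size
        self.low_idx = -1     # end of the stack growing up from slot 0
        self.high_idx = size  # end of the stack growing down from the top

    def _low_to_high(self):
        self.high_idx -= 1
        self.slots[self.high_idx] = self.slots[self.low_idx]
        self.low_idx -= 1

    def _high_to_low(self):
        self.low_idx += 1
        self.slots[self.low_idx] = self.slots[self.high_idx]
        self.high_idx += 1

    def insert(self, page):
        top = len(self.slots)
        # Move the gap until everything below it is <= page and
        # everything above it is >= page.
        while self.low_idx >= 0 and self.slots[self.low_idx] > page:
            self._low_to_high()
        while self.high_idx < top and self.slots[self.high_idx] < page:
            self._high_to_low()
        if self.low_idx >= 0 and self.slots[self.low_idx] == page:
            return  # page is already in the table
        if self.high_idx < top and self.slots[self.high_idx] == page:
            return
        if self.low_idx + 1 == self.high_idx:
            raise OverflowError("table is full; flush before inserting")
        self.low_idx += 1
        self.slots[self.low_idx] = page

    def pages_for_flush(self):
        """Move everything to the lower stack and return it in order."""
        while self.high_idx < len(self.slots):
            self._high_to_low()
        return self.slots[: self.low_idx + 1]
```

When pages are modified in ascending order the gap never moves
and each insertion is a single store, which is the common case
this layout favors.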
The size of the table is fixed at compile time, so when it fills
up, the supervisor is called to flush the pages. If the file is
protected by rollback, some of the pages to be flushed may have
before-images that have not been flushed, so before the file
manager flushes the pages it flushes the before-journal. There
is no need to flush the before-journal if none of the pages in
the table are protected by before-images, so the before-journal
flushing is mediated by a bit that is turned on only when a page
that is protected by rollback is added to the table. At commit
time, when the transaction manager calls the file manager to
flush the modified pages, the bit is turned off because the
transaction manager flushes the before-journal before calling the
file manager.
During rollback the table accumulates modified pages that must be
flushed when the rollback is complete, but since no before-images
are taken during rollback, the bit that mediates the
before-journal flushing is not turned on. The transaction
manager flushes the before-journal before a rollback, but the bit
may still be on from before the rollback since the transaction
manager does not flush the modified pages before a rollback. The
file manager compensates for this by turning the bit off during
each "unput" operation. This works because during the ill fated
transaction it is turned on after each before-image is written,
so unput will certainly be called if the transaction is rolled
back.
These complications could be eliminated simply by not having the
transaction manager flush the before-journal and instead letting
the file manager do it according to its journal flushing bit.
The complications arise from the need to turn the bit off at the
correct times in order to avoid redundant flushes. There is no
need to flush the journal before rollback.
Before a file is deleted or its access-data is discarded,
modified pages are flushed to eliminate from the table the
segment numbers associated with the file, because the segments to
which they refer will soon be unknown to the process.
6.2 modified CI table structure
dcl 1 proc_txn aligned based,
2 tid bit (36),
2 flush_bj_first bit,
2 going_down bit,
2 low_idx fixed bin,
2 high_idx fixed bin,
2 version fixed bin,
2 n_pages fixed bin,
2 mod_pages (57) fixed bin (33);
dcl 1 mod_page_breakout aligned,
2 segment_number bit (18) unal,
2 page_number fixed bin unal;
where:
"tid" is the transaction identifier of the current
transaction. It is used to make sure that the pages of
one transaction get flushed before the pages of another
are entered into the table.
"flush_bj_first" tells whether to flush the before-journal
before flushing the pages.
"going_down" is used to optimize the insertion algorithm for
the case where the pages are being modified
sequentially. Down is toward the head of the table.
"low_idx" points to the end of the stack originating at the
bottom of the table.
"high_idx" points to the end of the stack originating at the
top of the table.
"version" is the version number required by the supervisor
primitive that flushes the pages. This item is the
beginning of the structure that is passed to the
supervisor.
"n_pages" tells the supervisor how many pages to flush.
"mod_pages" is the array of numbers that represent the
modified pages. They are kept in two stacks
originating at the ends of the array and pointing
toward the center.
"mod_page_breakout" breaks out the internal structure of the
numbers in the array.
"segment_number" is the segment number of the page to be
flushed.
"page_number" is the zero relative ordinal number of the
page within the segment.
6.3 protection incompleteness
Multics directories are not protected in the data-management
sense, so data-management files have the peculiar defect that
although the control-intervals modified by committed transactions
are secure against system failure, a recently created file may
disappear or a recently deleted file may reappear after a system
failure. Furthermore, recently allocated control-intervals and
the data they contain can be lost if the allocation required a
new MSF component to be created. This is true even after the
transaction commits successfully because the file manager can not
force modified directories to be written to mass-storage.
Fortunately, such lossage can only happen when the system crashes
and emergency shutdown (ESD) is not successful, which is rare.
Directory lossages have several other implications which will not
be elaborated on. If the system crashes and ESD fails soon after
data-management system startup, tables in newly created segments,
upon which the file manager and the before-journal manager
depend, may be lost. This would render the data-management
system incapable of undoing the modifications of uncommitted
transactions. The loss of a newly created before-journal would
have the same effect on the files which it was supposed to
protect.
Several solutions to this problem have been proposed. The
supervisor could provide a directory flushing primitive that the
file manager would call after it creates or deletes a file or
component. Each directory could have a switch that, when turned
on, would cause it to be flushed after every modification. Or,
the DIRW card could be added to the config deck. This is an
existing Multics capability, but causes a lot of overhead because
it flushes every directory after every modification.
Another solution is to adopt the orphan segments and directories
after an ESD failure. This would be rather complicated,
especially if ACLs and addnames are to be restored. Since ACLs
and addnames are important in many applications, any scheme that
does not restore them is probably inadequate.
7 CONCURRENCY CONTROL
There are two aspects of concurrency control. One is the
protection which permits users to read and write files without
regard to other users who are accessing them simultaneously. It
is provided by a hierarchical lock manager which is called by the
file manager. The only lock logic in the file manager is there
to permit an application to give the file manager optimization
advice that allows it to skip some calls to the lock manager
without compromising concurrency protection. The other aspect of
concurrency control is the so-called fast lock mechanism that is
used to regulate access to the file manager's internal tables
which are shared by all processes. Currently, there is only one
such table, the UID/pathname table.
7.1 lock hierarchy
The lock manager uses a hierarchical lock model which is intended
to maximize concurrency and minimize overhead. The model has a
two level hierarchy in which files occupy the higher level and
control-intervals occupy the lower level. Locks come in several
degrees of exclusivity, called modes. Some modes are compatible
with themselves or other modes, which means that they can be
granted simultaneously. When incompatible modes are requested
for the same file or control-interval, one of the requesters must
wait. A file may be locked without locking any control-intervals
in it, but no control-intervals may be locked without some mode
of lock on the file. This model is explained briefly here and in
detail in MTB-514 "Concurrency Management - Overview".
Files and control-intervals may be locked in the following modes:
S Share
Let others lock this file or CI in S mode only.
X Exclusive
Let nobody else lock this file or CI in any mode.
IS Intention Share (only for files)
I need to lock at least one CI in S mode.
IX Intention Exclusive (only for files)
I need to lock at least one CI in S or X mode.
SIX Share with Intention Exclusive (only for files)
Let others lock this file in IS mode only.
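
The mode list above can be restated as a compatibility table. The
following Python sketch is illustrative only. It uses the
conventional matrix for this family of hierarchical lock modes,
in which S on a file also coexists with IS. MTB-514 remains the
authority for the real lock manager, and every name in the sketch
(COMPATIBLE, FILE_ONLY_MODES, can_grant) is hypothetical.

```python
# Illustrative only: the conventional compatibility matrix for
# hierarchical lock modes. The real lock manager is specified in
# MTB-514; every name below is hypothetical.
COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}

# Intention modes make sense only at the file level.
FILE_ONLY_MODES = {"IS", "IX", "SIX"}


def can_grant(requested, held_by_others, is_file=True):
    """Return True if `requested` is compatible with every mode that
    other transactions already hold on the same file or CI."""
    if not is_file and requested in FILE_ONLY_MODES:
        raise ValueError(requested + " applies only to files")
    return all(held in COMPATIBLE[requested] for held in held_by_others)
```

A request that can not be granted would cause the requester to
wait, as described above.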
To the lock manager, a file is represented by its UID. A
control-interval is represented by the combination of its file
UID and its ordinal number. Normally, the file manager locks a
control-interval in S mode before it reads from it and in X mode
before it writes into it. It uses X mode to allocate or free a
control-interval. It uses X mode at the file level to create or
delete a file. It does not remember, from operation to
operation, what it has locked.
If several transactions become deadlocked, which means that the
first needs a lock that is held by the second and the second
needs a lock that is held by the third and so on back to the
first, the lock manager signals the transaction_deadlock
condition. This usurps control from the file manager during a
call to the lock manager. Since this signal is not handled in
the data-management-ring, a crawlout occurs which unwinds the
stack. The file manager is prepared for this and has cleanup
handlers where needed.
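
The circular wait described above can be pictured as a cycle in a
"waits for" relation. The Python sketch below is an assumption
about how such a cycle might be found. This bulletin does not say
how the lock manager detects deadlock, and the names
TransactionDeadlock, find_cycle, and check_for_deadlock are
hypothetical.

```python
class TransactionDeadlock(Exception):
    """Stand-in for the transaction_deadlock condition."""


def find_cycle(waits_for):
    """waits_for maps each waiting transaction to the transaction
    that holds the lock it needs. Return the transactions that form
    a cycle, or None if there is no cycle."""
    for start in waits_for:
        seen = []
        txn = start
        # Follow the chain until it ends or revisits a transaction.
        while txn in waits_for and txn not in seen:
            seen.append(txn)
            txn = waits_for[txn]
        if txn in seen:
            return seen[seen.index(txn):]
    return None


def check_for_deadlock(waits_for):
    """Raise TransactionDeadlock if the waits form a cycle."""
    cycle = find_cycle(waits_for)
    if cycle is not None:
        raise TransactionDeadlock(" -> ".join(cycle))
```

In this picture the exception plays the role of the signal that
unwinds the caller's stack and runs its cleanup handlers.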
7.2 lock advice
The two levels of the lock hierarchy can be seen as two
granularities. File locks are coarse and control-interval locks
are fine. One of the properties of this model is that locking
too coarsely or using a more exclusive mode than necessary never
compromises the correctness of the locking protocol. This
permits some optimizations. For example, if the application will
be reading every control-interval in the file, it might as well
lock the entire file in S mode and save the overhead of locking
each control-interval.
There is an entry point by which an application can advise the
file manager about what file level lock mode to use on a
particular file. The file manager uses this advice by calling
the lock manager to lock the file in the advised mode just before
calling it to lock the first control-interval in the file, or the
first control-interval since the lock advice was given.
The lock advice mechanism is designed so that concurrency control
can not be compromised even if the application gives incorrect
advice. If no lock advice is given, the file manager does not
explicitly lock the file before locking the first
control-interval. Instead, it relies on the lock manager which
automatically locks the file in the least exclusive mode that is
sufficient for the control-interval mode. Furthermore, if the
file lock mode resulting from lock advice is insufficiently
exclusive, the lock manager automatically upgrades it.
In the future the file manager will probably take full
responsibility for choosing the correct file lock mode and
locking the file when necessary, because it can do it more
efficiently than the lock manager. For the time being the file
manager uncritically uses the mode supplied by the application
and only locks the file when the operation applies to the whole
file or when lock advice has been given.
Two fields in the perprocess file access-data are used to
implement lock advice. One contains the lock mode and the other
contains the transaction identifier of the last transaction
during which the lock advice for this file was used. When the
file manager gets ready to lock a control-interval, it checks to
see if there is any lock advice. If there is, it compares the
transaction identifier in the access-data to the current
transaction identifier, to see if a new transaction has begun.
If it has, it updates the transaction identifier in the
access-data and locks the file in the advised mode.
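
The check described above can be sketched in Python as follows.
This is a minimal illustration, not the fm_ code. The class
AccessData, its field names, and the functions advise and
apply_lock_advice are hypothetical, and the lock manager is
represented by a caller-supplied lock_file function.

```python
class AccessData:
    """Hypothetical per-process record for one open file."""

    def __init__(self, uid):
        self.uid = uid
        self.advised_mode = None    # None means no advice was given
        self.advice_txn_id = None   # last transaction that used the advice


def advise(access, mode):
    """Record new lock advice; it takes effect at the next CI lock."""
    access.advised_mode = mode
    access.advice_txn_id = None


def apply_lock_advice(access, current_txn_id, lock_file):
    """Called just before a control-interval is locked. Locks the
    file in the advised mode once per transaction. Returns True if
    the file-level lock call was made."""
    if access.advised_mode is None:
        return False
    if access.advice_txn_id == current_txn_id:
        return False                 # advice already used in this transaction
    access.advice_txn_id = current_txn_id
    lock_file(access.uid, access.advised_mode)
    return True
```

Keeping the transaction identifier beside the mode is what lets
the file manager avoid remembering its locks from operation to
operation: one comparison tells it whether the file lock has
already been requested in the current transaction.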
7.3 fast locks
Fast locks are double-word semaphores used in the data-management
system. The file manager only uses one fast lock, on its
UID/pathname table. It is treated as an exclusive lock on the
table which is never accessed without its protection. Whenever
it is locked, a cleanup handler is established to unlock it in
case of a fault and subsequent unwind. If the process should
fail while it holds the lock, the next process to get the lock
will be advised of this by a status-code from the fast lock
manager. This status-code is explicitly ignored, because the
UID/pathname table is carefully updated and is never in an
inconsistent state.
8 PROTECTION AGAINST FAILURES
Protection against three kinds of failures is provided through
the mechanism of before-image journalizing. The three kinds are
application, process, and Multics failures. Journalizing
protects only the existence, content, and some of the attributes
of files. It does not protect perprocess data, file openings,
file ACLs, data communications, non-file data, etc. Most of the
journalizing mechanism is in the before-journal manager which is
described in MTB-560, "Before-journal Manager Design".
During a transaction, the file manager calls the before-journal
manager to journalize before-images, rollback-handlers, and
postcommit-handlers. Each image or handler contains the OID and
UID of the file to which it belongs. Before-images and
rollback-handlers undo file modifications during a rollback.
Postcommit-handlers perform delayed actions after a commit.
During a rollback, the before-journal manager processes
before-images and rollback-handlers in one reverse chronological
pass. After a commit, the before-journal manager processes
postcommit-handlers in reverse chronological order.
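
The single reverse pass can be sketched in Python as follows. The
record layout and the names roll_back and undo are hypothetical.
The real journal format is described in MTB-560. The sketch only
shows why the newest record must be processed first when
modifications overlap.

```python
def roll_back(journal, files, undo):
    """Undo a transaction in one reverse chronological pass.

    journal: list of records, oldest first. A record is either
        ("image", uid, ci_number, offset, old_bytes) or
        ("handler", uid, info).
    files:   maps uid -> {ci_number: bytearray}.
    undo:    function(files, uid, info) that reverses one operation.
    """
    for record in reversed(journal):
        kind, uid = record[0], record[1]
        if kind == "image":
            _, _, ci_number, offset, old = record
            # Put the saved bytes back where they came from.
            files[uid][ci_number][offset:offset + len(old)] = old
        else:
            undo(files, uid, record[2])
```

If two before-images cover the same bytes, restoring the older
one last leaves the control-interval as it was before the
transaction began.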
Multics page-control, the before-journal manager, and the file
manager obey a write-sync protocol, which prevents pages
containing modified control-intervals from being flushed
(written) to mass-storage until their before-images or
rollback-handlers are flushed. The write-sync protocol
guarantees that, if Multics crashes and main-memory can not be
flushed, all modified control-intervals in mass-storage can be
rolled back using before-images and rollback-handlers in
mass-storage.
8.1 before-images
An application puts data into a control-interval by building an
array of descriptors called "parts", and passing it to the file
manager. Each part consists of a byte offset, a byte length, and
a buffer pointer. Before modifying the control-interval, the
file manager copies the parts into a parts-array in its automatic
storage, so that there is no chance that they will change after
they are checked. When it does the copy, it copies the buffer
pointers into a separate array, and puts a pointer to the
location of the part in the control-interval in the automatic
parts-array. It then calls the before-journal manager with this
set of parts, from which the before-journal manager constructs
the before-image.
The automatic parts-array and the separate array into which the
buffer pointers are copied have the same dimension, which has
been chosen so that it is almost always larger than the dimension
of the parts-array built by the application. If the
application's parts-array is larger, it will be processed in
chunks, so there will be more than one before image. If metering
shows that this is happening regularly, the size of this
dimension should be increased by changing it in the declaration
of the two arrays and recompiling the program (fm_put_).
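
The chunked processing described above can be sketched in Python
as follows. This is an illustration under stated assumptions, not
the fm_put_ code: MAX_PARTS stands for the unstated array
dimension, parts are (offset, data) pairs rather than
offset/length/pointer triples, and journalize stands for the call
to the before-journal manager.

```python
MAX_PARTS = 4  # hypothetical dimension of the private parts-array


def put(ci, parts, journalize):
    """Write `parts` into the bytearray `ci`.

    parts:      list of (offset, data) pairs built by the caller.
    journalize: called once per chunk with a list of
                (offset, old_bytes) pairs, i.e. one before-image.
    """
    for start in range(0, len(parts), MAX_PARTS):
        # Take a private copy so the caller can not change a part
        # after it has been checked.
        chunk = [(offset, bytes(data))
                 for offset, data in parts[start:start + MAX_PARTS]]
        for offset, data in chunk:
            if offset < 0 or offset + len(data) > len(ci):
                raise ValueError("part lies outside the control-interval")
        # Journalize the old contents before modifying anything.
        journalize([(offset, bytes(ci[offset:offset + len(data)]))
                    for offset, data in chunk])
        for offset, data in chunk:
            ci[offset:offset + len(data)] = data
```

With MAX_PARTS set to 4, a caller that passes six parts produces
two before-images, one of four parts and one of two.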
During rollback, the before-journal manager processes
before-images by reconstructing the parts-array and calling
the file manager's "unput" primitive. The "unput" primitive
differs from the "put" primitive in that it does not take a
before-image and it does not modify the control-interval stamp.
It does not modify the stamp because the journal is not
necessarily flushed before rollback, so the stamp may still be
holding the control-interval in main-memory (see "write-sync
protocol" below). If the rollback does not take place in the
process that created the transaction, the before-journal manager
opens files using the UIDs in the before-images.
8.2 rollback-handlers
The file manager uses rollback-handlers to protect operations
such as "create", "delete", "allocate", and "free". Before one
of these operations, the file manager journalizes a
rollback-handler record which identifies the operation and
contains the information necessary to undo it.
Some operations such as "allocate" may need to hold many
control-intervals in main-memory using the write-sync protocol.
When the file manager calls the before-journal manager to write a
rollback-handler, it always tells the number of pages that will
be held. The before-journal manager needs this information to
calculate an upper bound on the number of pages that are being
held by the write-sync protocol, so that it can start flushing
journals if it gets near the limit.
During a rollback, the before-journal manager processes
rollback-handlers by calling the file manager's "undo" primitive
and passing it a pointer to the rollback-handler record. Undo
operations are idempotent, so if rollback is interrupted, it can
be restarted from the beginning. If one process starts a
transaction and another rolls it back, during the rollback, the
before-journal manager opens files using the UIDs in the
rollback-handlers. The UID passed to "undo" is used to double
check the correctness of the OID.
Some operations, such as "create" and "free", can not hold their
effects in main-memory. So, they must flush the before-journal
before they act. In order to save an occasional extra call, the
file manager always tells the before-journal manager whether to
flush the journal after a rollback-handler is written.
8.3 postcommit-handlers
Some file modifications are more efficiently protected by using
delayed posting, instead of rollback. In delayed posting,
instead of journalizing a rollback-handler and then performing
the action immediately, the action is postponed by journalizing a
postcommit-handler. Delayed posting is complicated because, for
the remainder of the transaction, the file manager must maintain
the deception that the action was actually performed.
File deletion is a good example of the need for delayed posting.
Delaying the deletion of a file until the transaction commits is
cheaper than copying the whole file into the before journal so
that it can be deleted immediately.
Delayed posting requires a two-phase commit protocol. The first
phase flushes the modified control-intervals, marks the
transaction as "committing", and executes delayed actions
specified by postcommit-handlers in the journal. The second
phase flushes the control-intervals modified during the first
phase, marks the transaction as "committed", discards its
before-journalizations, and unlocks its locks. If the commit is
interrupted during either phase, the recovery action is to begin
again from the beginning of the phase. If a process is
interrupted during the first phase and another process begins it
again, the before-journal manager opens files using the UIDs in
the postcommit-handlers. The UID passed to "postcommit_do" is
used to double check the correctness of the OID.
Two problems must be taken into consideration in the design of a
delayed action. First, the user can request an action and later
request an action that counteracts it. Second, since it is much
easier for the before-journal manager to thread
postcommit-handlers backward than forward, and since it is much
easier to process them in thread order, it presents them to the
file manager in reverse chronological order during commit. These
two problems are solved with two rules. One, a
postcommit-handler must act only if the action is still
postponed. It will always be possible to determine this because
simulating an action requires that there be some way of telling
that it has been postponed. Two, a counteraction must be
prepared to handle the postponed case.
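
The two rules can be illustrated with postponed file deletion.
The Python sketch below is hypothetical throughout: the class
Files, the delete_pending mark, and the method names are not
taken from the file manager. A real re-creation would also have
to reset the file's contents, which the sketch omits.

```python
class Files:
    """Toy model of delayed posting for file deletion."""

    def __init__(self):
        self.files = {}        # name -> {"delete_pending": bool}
        self.postcommit = []   # journalized handlers, oldest first

    def exists(self, name):
        # Maintain the deception: a file whose deletion is
        # postponed looks deleted for the rest of the transaction.
        f = self.files.get(name)
        return f is not None and not f["delete_pending"]

    def create(self, name):
        f = self.files.get(name)
        if f is None:
            self.files[name] = {"delete_pending": False}
        elif f["delete_pending"]:
            f["delete_pending"] = False   # rule two: counteract a postponed delete
        else:
            raise FileExistsError(name)

    def delete(self, name):
        if not self.exists(name):
            raise FileNotFoundError(name)
        self.files[name]["delete_pending"] = True
        self.postcommit.append(("delete", name))

    def commit(self):
        # Handlers are presented in reverse chronological order.
        for op, name in reversed(self.postcommit):
            f = self.files.get(name)
            # Rule one: act only if the action is still postponed.
            if op == "delete" and f is not None and f["delete_pending"]:
                del self.files[name]
        self.postcommit.clear()
```

Because each handler tests the mark before acting, a stale
handler left behind by a counteracted deletion does nothing, and
the order in which handlers are presented does not matter.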
8.4 write-sync protocol
The write-sync protocol is described in MTB-564, "Phasing Page
Control and Before Journal". The MSF component segments of files
protected by rollback are "synchronized-segments". They receive
special handling by page-control which assumes that the first two
words of a "synchronized-page" are formatted as a
control-interval stamp. When page-control intends to purify or
flush a synchronized-page, it first examines the stamp in the
control-interval header. The stamp contains an index into a
table that is maintained by the before-journal manager. Each
table entry represents a before-journal and contains the clock
reading of the latest before-image that has been flushed.
Before-images in a before-journal are flushed sequentially. In
addition to the index, the control-interval stamp also contains
the clock reading of the before-image taken the last time the
control-interval was modified. If the clock reading from the
table is earlier than the clock reading from the stamp, the
synchronized-page must be held in main-memory. It can not be
purified yet, because if it were and Multics crashed and ESD
failed, its before image might be lost.
When the file manager writes a before-image (or rollback-handler)
the before-journal manager reads the clock, puts the reading into
the before-image, and adds the before-image to the end of the
journal. It then returns to the file manager with the index of
the before-journal and the clock reading. The file manager puts
the index and clock reading into the control-interval header and
then proceeds to modify the control-interval. When the
before-journal manager flushes a journal, it records the clock
reading of the last before-image flushed in the table referenced
by page-control.
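
The interplay of the stamp and the table can be sketched in
Python as follows. This is an illustration only. The real
structures are described in MTB-564, and the names
may_write_page and BeforeJournal, the dictionary used for the
table, and the integer clock are all assumptions.

```python
import itertools


def may_write_page(stamp, flushed_table):
    """Page-control's test. `stamp` is (journal index, clock reading)."""
    index, modified_time = stamp
    # Hold the page while the journal's flushed time is earlier
    # than the time of the page's latest before-image.
    return flushed_table[index] >= modified_time


class BeforeJournal:
    """Toy journal that tracks only clock readings."""

    def __init__(self, index, flushed_table, clock):
        self.index = index
        self.flushed_table = flushed_table
        self.clock = clock
        self.pending = []              # readings of unflushed images
        flushed_table[index] = 0       # nothing flushed yet

    def write_image(self, image):
        """Append a before-image; return the stamp for the CI header."""
        reading = next(self.clock)
        self.pending.append(reading)
        return self.index, reading

    def flush(self):
        """Flush sequentially and record the last reading flushed."""
        if self.pending:
            self.flushed_table[self.index] = self.pending[-1]
            self.pending.clear()
```

A page stamped by write_image fails the may_write_page test until
flush has been called, which is the holding behavior the protocol
requires.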
8.5 support for before-journals
Before-journals are unprotected data-management files. The
before-journal manager keeps track of the control-intervals it
has modified and flushes them periodically. The file manager
provides the "flush_consecutive_ci" entry point for this purpose.
It is capable of flushing a consecutive group of
control-intervals.
The before-journal manager depends on the clock reading in the
stamp in the control-interval header. When the file manager
modifies a control-interval in an unprotected file, it sets the
before-journal index to zero and puts the current time in the
clock field. The before-journal manager uses this clock reading
when it is trying to find the end of a journal after a crash.
9 PROPOSED ROLLBACK HANDLERS
There are four file manager operations that are either not
protected against failures or are not protected in the most
efficient way. The four operations are:
create file (unprotected)
delete file (unprotected)
allocate control-interval (unprotected)
free control-interval (inefficiently protected)
9.1 existing rollback protection
9.1.1 CREATE (UNPROTECTED)
A file is created. If the transaction commits, everything is
fine. If the transaction aborts, the file remains.
9.1.2 DELETE (UNPROTECTED)
A file is deleted. If the transaction commits, everything is
fine. If the transaction aborts, the file is lost.
9.1.3 ALLOCATE (UNPROTECTED)
If a control-interval is already allocated, no action is taken.
If a control-interval is free, storage is reserved for it and
quota is taken up. If the transaction commits, everything is
fine. If the transaction aborts, the storage is not freed and
quota is not released. Unwanted control intervals produced in
this way have an all zero data content because the data content
is all zero at creation and subsequent updates are rolled back.
The lack of rollback protection for the allocate operation is
compensated by the collection manager. It keeps a map of
control-intervals that it has allocated. If it allocates a
control-interval in a transaction that is later aborted, the map
is rolled back, and the content of the control-interval is rolled
back, but the control-interval is not freed. The next time the
collection manager tries to allocate this control-interval, it
receives dm_error_$ci_already_allocated. The collection manager
treats this like a zero status-code. The control-interval
contains all zeros, just like a freshly allocated one.
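
The compensation can be sketched in Python as follows. Only the
status-code name dm_error_$ci_already_allocated comes from the
text above. The functions fm_allocate and cm_allocate and the
sets they operate on are hypothetical stand-ins for the file
manager and the collection manager.

```python
CI_ALREADY_ALLOCATED = "dm_error_$ci_already_allocated"


def fm_allocate(allocated, ci_number):
    """File manager side: allocation is not rolled back on abort."""
    if ci_number in allocated:
        return CI_ALREADY_ALLOCATED     # no action is taken
    allocated.add(ci_number)
    return 0


def cm_allocate(allocated, cm_map, ci_number):
    """Collection manager side: a leftover CI is as good as a new one."""
    code = fm_allocate(allocated, ci_number)
    if code not in (0, CI_ALREADY_ALLOCATED):
        raise RuntimeError(code)
    cm_map.add(ci_number)               # the map itself is rolled back on abort
    return 0
```

After an abort the map no longer lists the control-interval but
the storage is still allocated, so the next cm_allocate call for
it succeeds through the tolerated status-code.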
9.1.4 FREE (INEFFICIENTLY PROTECTED)
If a control-interval is already free, no action is taken. If it
is not already free, a before-image of the entire data content of
the control interval is written to the before-journal, the
before-journal is flushed, the storage of the control-interval is
freed, and quota released. Flushing the before-journal is
necessary because the storage is freed by zeroing the page which
eliminates the before-journal synchronization information which
would have held the page until its before-image was written. If
the transaction commits, everything is fine. If the transaction
aborts, the control-interval is automatically reallocated when
the before-image is restored.
9.2 proposed full rollback protection
9.2.1 CREATE (PROTECTED)
A file is created with a shriek name. Then its real name is
added. If the transaction commits, the shriek name is removed.
If the transaction aborts, the file is deleted.
9.2.2 DELETE (PROTECTED)
All the names of a file are replaced with one shriek name. If
the transaction commits, the file is deleted. If the transaction
aborts, the shriek name is replaced with the original names.
9.2.3 ALLOCATE (PROTECTED)
If a control-interval is already allocated, no action is taken.
If a control-interval is "logically free", a before-image of its
data content is written to the before-journal, its data content
is zeroed, and the "logically free" mark is removed. If the
transaction commits, everything is fine because a
postcommit-handler for free will not free a control-interval
unless it is "logically free". If the transaction aborts, the
before-image of the content will be restored and the
rollback-handler for the free operation, which removes the
"logically free" mark, will do nothing because it is idempotent.
If a control-interval is free, storage is reserved for it, quota
is taken up, and a rollback-handler is written in the
before-journal. If the transaction commits, everything is fine.
If the transaction aborts, the rollback-handler frees the storage
and releases the quota.
9.2.4 FREE (PROTECTED)
If a control-interval is free or "logically free", no action is
taken.
If a control-interval is allocated, a rollback-handler is written
to the before-journal, the control-interval is marked as
"logically free", and a postcommit-handler is written to the
before-journal. If the transaction commits and the
control-interval is still "logically free", the
postcommit-handler frees the storage of the control-interval and
releases quota. If the transaction commits and the
control-interval is allocated, the postcommit-handler does
nothing. If the transaction aborts, the rollback-handler removes
the "logically free" mark.
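
The proposed allocate and free operations amount to a small state
machine with three states. The Python sketch below is a
hypothetical model of the proposal, not a design for the code.
The class Txn, the record names, and the representation of a
control-interval as a [state, content] pair are assumptions.

```python
FREE, ALLOCATED, LOGICALLY_FREE = "free", "allocated", "logically free"


class Txn:
    """One transaction over a table of control-intervals."""

    def __init__(self, cis):
        self.cis = cis          # ci number -> [state, content]
        self.journal = []       # before-images and rollback-handlers
        self.postcommit = []    # postcommit-handlers

    def allocate(self, n):
        state, content = self.cis[n]
        if state == ALLOCATED:
            return
        if state == LOGICALLY_FREE:
            self.journal.append(("image", n, content))
        else:
            self.journal.append(("undo_allocate", n))
        self.cis[n] = [ALLOCATED, b""]      # fresh CIs are all zero

    def free(self, n):
        if self.cis[n][0] != ALLOCATED:
            return
        self.journal.append(("undo_free", n))
        self.cis[n][0] = LOGICALLY_FREE
        self.postcommit.append(("free", n))

    def commit(self):
        for _, n in reversed(self.postcommit):
            if self.cis[n][0] == LOGICALLY_FREE:   # still postponed?
                self.cis[n] = [FREE, b""]
        self.journal.clear()
        self.postcommit.clear()

    def abort(self):
        for record in reversed(self.journal):
            kind, n = record[0], record[1]
            if kind == "image":
                self.cis[n][1] = record[2]
            elif kind == "undo_allocate":
                self.cis[n] = [FREE, b""]
            elif self.cis[n][0] == LOGICALLY_FREE:  # undo_free, idempotent
                self.cis[n][0] = ALLOCATED
        self.journal.clear()
        self.postcommit.clear()
```

Freeing and then reallocating the same control-interval in one
transaction exercises both handlers: on commit the
postcommit-handler finds the control-interval allocated and does
nothing, and on abort the before-image restores the content while
the rollback-handler for free finds no mark to remove.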
9.3 proposed partial rollback protection
9.3.1 CREATE (UNPROTECTED)
No change to the existing implementation.
9.3.2 DELETE (UNPROTECTED)
No change to the existing implementation.
9.3.3 ALLOCATE (PROTECTED)
If a control-interval is already allocated, no action is taken.
If a control-interval is free, storage is reserved for it, quota
is taken up, and a rollback-handler is written in the
before-journal. If the transaction commits, everything is fine.
If the transaction aborts, the rollback-handler frees the storage
and releases the quota.
9.3.4 FREE (INEFFICIENTLY PROTECTED)
No change to the existing implementation.
9.4 user environment implications
9.4.1 CREATE (WITH AND WITHOUT)
MRDS creates files during database creation and restructuring.
These operations modify the database model which describes the
files in the database directory. The model is not protected.
After a transaction containing database creation or restructuring
aborts, the model may be out of sync with the contents of the
database directory. This is true regardless of whether file
creations are rolled back or not. If file creations are not
rolled back, their content is rolled back to all zero so they are
not formatted as relations.
9.4.2 DELETE (WITH AND WITHOUT)
Data-management files in MRDS databases are deleted by MRDS
restructuring and by the delete_dir command which is used to
delete entire databases. In the case of restructuring, the
relations described by the database directory can become out of
sync with those described in the model regardless of whether
deletions are rolled back or not, because the model is not
protected. In the case of the delete_dir command, the contents
of the database directory, including the model, are deleted in an
unspecified order. If there is no current transaction, one is
started before and committed after each file deletion. If the
only operation in a transaction is the deletion of one file,
interrupting it and rolling it back has little value. If the
entire database deletion is to be wrapped in one transaction, the
delete_dir command must be modified to commit the transaction
before it deletes the directory, because rollback will not
recreate the containing directory of a file. Furthermore, the
proposed implementation of protected deletion does not delete the
file until the transaction commits.
9.4.3 ALLOCATE (WITH AND WITHOUT)
Not rolling back control-interval allocations has an effect on
the quota taken up by an MRDS database. When a transaction
allocates control-intervals and then aborts, the quota is not
released. These control-intervals waste quota until the
collection manager uses them again, which might be a long time.
There is an unfortunate special case where the transaction aborts
because it has allocated so many control-intervals that it runs
out of quota. The only way to recover this quota is to reload
the database.
9.4.4 FREE (EXISTING AND PROPOSED)
The free operation is fully protected in both the existing
implementation and the proposed one. The only differences are
the amount of space used in the before-journal and the time at
which quota is released. The existing implementation records a
large before-image of the entire control-interval. The proposed
implementation records a small rollback-handler and a small
postcommit-handler. Moreover, each handler may cover a group of
control intervals. The existing implementation releases quota
immediately. The proposed implementation releases quota after
the transaction commits. Neither of these differences is very
significant.
9.5 implementing features later as an incompatible change
9.5.1 CREATE AND DELETE
Without major enhancements, MRDS can not take advantage of the
ability to roll back create and delete. Providing the ability
would not improve MRDS, nor does the lack of the ability injure
it. Non-MRDS applications might be able to use this ability, but
this is uncertain because no such applications are currently
planned.
9.5.2 ALLOCATE AND FREE
Protecting the allocate operation at a future time will be
compatible with the collection manager and thus MRDS. This is
because the collection manager has been designed to work
correctly either way. It is easy for a non-MRDS application to
achieve this compatibility, and it can be described in the
documentation.
The free operation is already fully, but inefficiently,
protected. The amount of before-journal space needed for various
operations is already a concern. Protecting free while using
less bj space seems like a good idea, until the amount of effort
is taken into consideration. The large amount of bj space is not
caused by the current implementation of rollback protection for
the free operation. This is because control-intervals in
relations are rarely freed, because neither the relation manager,
the index manager, nor the collection manager, in the current
implementation, does any garbage collection. The only case that
will cause the freeing of a control interval is when all of the
elements in it have been deleted. The only scenario that would
cause this is a dsl_$delete where the selection expression causes
all the index elements in an index node to be freed. Even if
every tuple in the relation is deleted, the amount of bj space
associated with freeing the control-intervals will be less than
the space used up in deleting the elements.
10 PROPOSED FILE DUMPING
In order to protect data-management files against the failure of
the mass-storage media upon which they reside, they can be
periodically dumped onto archival storage media. While files are
implemented as MSFs, the existing Multics hierarchy and volume
dumpers can serve this need. Their main defect is that they do
not honor the protection provided by the file manager, so there
is no guarantee that the dump will be consistent. Therefore,
extra administrative and operational effort may be required to
get consistent dumps.
The volume dumper exhibits this defect more severely than the
hierarchy dumper. The component segments of a multi-segment file
on a logical volume which is composed of several physical volumes
are likely to reside on different physical volumes. Since the
volume dumper dumps an entire physical volume before proceeding
to the next, there will be a substantial time lag during which
the file may be updated, causing the dumped components to be
inconsistent with the components yet to be dumped. The hierarchy
dumper, on the other
hand, dumps all of the components in succession so that there is
less chance that they will be updated during the dump.
The capability to consistently dump data-management files has not
yet been specified or designed. It must not only be able to dump
files consistently, but must be able to dump groups of files
which constitute databases and other application dependent
assemblages, which must be kept consistent with respect to each
other.
11 PROPOSED AFTER JOURNALS
Dumping only protects files to the extent of restoring their
content as of the time of the last dump. Protection against mass
media failure can be further improved by journalizing all data
updates after a dump. When the files have been restored from the
dump, these after-images can be reapplied to bring the files up
to date.
An after-image journal is very similar to a before-image journal.
It contains images of the data after modification instead of
before. The after-journal manager has been specified in MTB-561
"Data Management: After Journal Manager Specification". It has
not yet been designed or implemented.
The file manager contains some code that anticipates
after-journals. There is a bit in the perprocess access-data
structure, but there is no corresponding bit in the file
attributes structure. This bit is tested by "if" statements
where the after-journal manager would be called, but the "then"
clauses are null.
12 THE DATA MANAGEMENT DAEMON
The file manager has some entry points which are intended to be
called by the data-management daemon. These entry points are
associated with system initialization, rolling back transactions
of dead processes, and recovering after a Multics crash.
12.1 transaction adoption
When the daemon attempts to roll back a transaction, the file
manager's list of modified control-intervals grows. The attempt
may fail, leaving modified control intervals in the table.
Therefore, before the daemon attempts to roll back a transaction,
it calls the "adopt" primitive which discards any contents of the
table.
12.2 recovery after crash
The file manager has three entry points to support recovery after
a Multics crash. The first initiates the old UID/pathname table.
During the rollback of transactions caught by the crash, the
before-journal manager calls "open_by_uid_after_crash" instead of
"open_by_uid". Finally, when the recovery is complete, there is
an entry point to terminate the old UID/pathname table which will
soon be deleted by the recovery program.
12.3 daemon access
The file manager always gives the daemon "rw" access to
data-management files, so that it can roll them back when
necessary. It also tries to give the daemon "sma" to the
containing directory, so that the daemon can roll back the create
and delete operations, but the user does not always have enough
power to do this.
13 EXTENDED OBJECT SUPPORT
The most fundamental requirement for extended-object support is a
"validate" primitive. Since it is heavily used, it should be
efficient, so it is implemented in a way that does not require
the file to be opened or perprocess initialization to be run.
When a file is created, the name "_Data_Management_file_._" is
added to component zero. The "validate" primitive verifies that
component zero has the correct ring brackets and addname. The
"validate" primitive is transferred to directly from the gate so
that control does not pass through the data-management-ring
transfer-vector, which would cause perprocess initialization to
run. It is in a separate program (fm_validate_) that is not
bound with the vector, for the same reason.
14 ACCESS CONTROL
14.1 ring brackets
Ring brackets for data-management files have not been implemented
yet. See "PROPOSED FEATURES".
14.2 access control lists (ACLs)
The file manager uses the ACL primitives of the MSF manager. The
file manager ACL primitives are just "call-throughs" to the
corresponding MSF manager primitives.
14.3 access isolation mechanism (AIM)
The file manager has no explicit code to support AIM. It relies
on Multics and some data-management system primitives for that.
There is a separate data-management system for each AIM level. A
process can only participate in one data-management system and
thus can access data-management files at only one AIM level.
15 ERROR HANDLING AND STATUS REPORTING
Most of the file manager primitives have a status-code as their
last parameter. The code is used both for error reporting and
status reporting. More current thinking is that the status-code
should only be used for status reporting and that errors should
be signaled.
15.1 status reporting
In the file manager, statuses are usually represented by one-bit
flags. Just before a primitive returns, it tests the flag and
returns either a zero status-code or, if the flag is on, some
particular status-code.
15.2 error reporting
In most cases, when the file manager detects an error, it jumps
to an error label which sets the status-code to the appropriate
error-code and returns. It is in the process of being converted
to call sub_err_ before returning an error-code. The modules
which have been converted have sub_err_ handlers that record
useful messages in dm_system_log.
15.3 condition handling
The file manager does not handle any conditions other than
sub_err_, although it does have cleanup handlers where necessary.
Cleanup is not really a condition; it is a recovery action. The
file manager should have seg_fault_error handlers so that it
could return a proper status-code when a file is deleted out from
under it. See "PROPOSED FEATURES".
16 INITIALIZATION
16.1 system initialization
System initialization for the file manager consists only of
creating and initializing the UID/pathname table. It runs in the
data-management daemon's process.
16.2 process initialization
Process initialization consists of initiating the UID/pathname
table, allocating the access-data table, and allocating the table
of modified control-intervals.
17 FILE MANAGER MODULARIZATION
The main consideration governing the modularization of the file
manager was performance. The result is that the file manager is
very unmodularized. The primitives are divided among the
separately compiled programs so as to keep the stack-frame size
of the most heavily used operations as small as possible.
17.1 fm_attribute_.pl1
This module contains most of the primitives that support
extended-objects. In general it deals with attributes of files
such as ACLs, names, switches, etc.
17.2 fm_combos_.pl1
This module contains primitives that do not need the complicated
machinery of fm_open_. It was intended to contain primitives
that were actually combinations of calls to entry points in
fm_open_. Only "create" and "delete" actually work like that.
The "create" primitive is a combination of "create_open" and
"close". The "delete" primitive is a combination of "open" and
"delete_close". It also contains "open_by_uid" and the three
entry points associated with the old UID/pathname table.
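The two combinations described above can be pictured with a toy
model. The following Python sketch is illustrative only: the
class FileManagerModel, its in-memory tables, and the integer
opening ids are assumptions made for the example, and it ignores
transactions, locking, and journalization. It shows only that
"create" behaves as "create_open" followed by "close", and
"delete" as "open" followed by "delete_close".

```python
class FileManagerModel:
    """Toy in-memory stand-in for the file manager (hypothetical)."""

    def __init__(self):
        self.files = set()      # pathnames of existing files
        self.openings = {}      # opening id -> pathname
        self._next_oid = 1

    def open(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)
        oid = self._next_oid
        self._next_oid += 1
        self.openings[oid] = path
        return oid

    def create_open(self, path):
        if path in self.files:
            raise FileExistsError(path)
        self.files.add(path)
        return self.open(path)

    def close(self, oid):
        del self.openings[oid]

    def delete_close(self, oid):
        # Remove the opening and the file it refers to.
        self.files.discard(self.openings.pop(oid))

    # The two combination primitives described in the text.
    def create(self, path):
        self.close(self.create_open(path))

    def delete(self, path):
        self.delete_close(self.open(path))
```

In this model, neither combination leaves an opening behind,
which is the point of packaging the pairs as single primitives.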
17.3 fm_data_.alm
This is an alm data segment. The segdefs that refer to its
linkage section are external static variables used exclusively by
the file manager. The segdefs in its text parameterize a few
file manager characteristics such as the name of the UID/pathname
table and three switches that permit protection, locking, and
before-journalization to be turned off. The switches are used in
the calculation of the ancillary service switches in the
access-data when a file is opened, so they can be used to turn
off protection, locking, or before-journalization for the whole
system. Needless to say, they are intended for use by developers
only.
17.4 fm_fetch_.pl1
This module contains the "fetch" and "store" primitives. If they
prove to be useful, they should be moved to fm_get_ and fm_put_
respectively.
17.5 fm_get_.pl1
This is the most heavily used module. It contains operations
that read control-intervals. It is a small, highly optimized
module that pushes a small stack frame. It contains the most
frequently called primitive, "get". It also contains
"get_ci_header". It has only one subroutine, "INIT", which
just initializes stack frame variables to null values and should
be removed when performance becomes the main consideration,
because the logic of the program does not need it.
17.6 fm_open_.pl1
This module contains the slow file manager operations. It pushes
a large stack frame. In general, it contains primitives that
apply to the file as a whole, such as "create_open", "open",
"close", and "delete_close". Most of the access-data table
machinery is in this program. Also, the machinery for inserting
and removing entries in the UID/pathname table is here.
17.7 fm_put_.pl1
This is the second most heavily used module. It contains
operations that modify control-intervals. It is highly
optimized, pushes a medium size stack frame, and contains the
frequently called "put" primitive. It also contains "allocate"
and "free". It has a large subroutine that manages the list of
modified control-intervals.
17.8 fm_read_.pl1
This module contains the "read" and "write" primitives, which
allow a user to view a data-management file as a continuous array
of bytes.
17.9 fm_std_error_handler_.pl1
This module contains what little condition handling the file
manager currently has. It is intended that all primitives will
establish any_other handlers that invoke this module.
17.10 fm_validate_.pl1
This module contains the highly optimized "validate" primitive.
It is transferred to directly from the gate and is not bound with
the rest of the file manager so that its invocation does not set
off perprocess initialization.
18 DESCRIPTIONS OF OPERATIONS
The names of some file manager entry points are part of a
pattern. The ones marked with a star are not primitives.
OPERATION   ROLLBACK OP   POSTCOMMIT OP   ROLLFORWARD OP
-           undo          postcommit_do   redo
create      uncreate*     -               recreate*
delete      undelete*     postdelete*     redelete*
allocate    unallocate*   -               reallocate*
free        unfree*       postfree*       refree*
put         unput         -               reput
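The naming pattern can be stated mechanically. The following
Python sketch is illustrative only: the dictionaries and the
function "companions" are hypothetical, and it assumes that names
prefixed "un" belong to rollback, "post" to postcommit, and "re"
to roll-forward. It does not model the star marking, and the
generic "undo", "postcommit_do", and "redo" entries do not follow
the prefix rule exactly.

```python
# Prefix used for each kind of companion operation (assumed mapping).
PREFIXES = {"rollback": "un", "postcommit": "post", "rollforward": "re"}

# Kinds of companion operation that the table lists for each operation.
TABLE = {
    "create":   ("rollback", "rollforward"),
    "delete":   ("rollback", "postcommit", "rollforward"),
    "allocate": ("rollback", "rollforward"),
    "free":     ("rollback", "postcommit", "rollforward"),
    "put":      ("rollback", "rollforward"),
}

def companions(operation):
    """Return the companion entry-point names for one operation."""
    return {kind: PREFIXES[kind] + operation for kind in TABLE[operation]}
```

For example, companions("free") yields "unfree", "postfree", and
"refree", matching the corresponding row of the table.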
18.1 acl_add
This is just a call-through to msf_manager_$acl_add.
18.2 acl_delete
This is just a call-through to msf_manager_$acl_delete.
18.3 acl_list
This is just a call-through to msf_manager_$acl_list.
18.4 acl_replace
This is just a call-through to msf_manager_$acl_replace.
18.5 add_acl_entries
This just opens the file and calls "acl_add".
18.6 adopt
This is used by the data-management daemon to discard anything in
the file manager's list of modified control-intervals, so that it
can begin working on a new transaction.
18.7 allocate
Allocate allows applications to reserve mass-storage in large
chunks for better performance. It avoids the frequent updates to
the file map that would occur if the control-intervals were
allocated as needed by the put primitive. Currently, allocate is
not efficiently implemented, because the way to reserve a
mass-storage address for a page is to put something in it. The
semantics of allocate are designed to allow efficient
implementation in the proposed large-files where it should only
reference the file map. There must be no need to perform I/O on
any of the control-intervals being allocated.
18.8 chname_file
This is a call-through to hcs_$chname_file.
18.9 close
This primitive is more complicated than one would expect because
it can not close the file if it is in a transaction. Also it has
to worry about clearing the list of modified control-intervals.
18.10 create
18.11 create_open
18.12 delentry_file
18.13 delete
18.14 delete_acl_entries
18.15 delete_close
18.16 end_of_crash_recovery
18.17 fetch
18.18 find_old_uid_pn_table
18.19 flush_consecutive_ci
18.20 flush_modified_ci
18.21 free
18.22 get
18.23 get_ci_header
18.24 get_exclusive
This is in every way identical to "get" except that it locks the
control interval in exclusive mode. It is intended for use by
applications that expect to do a "put" into a control-interval,
but must get from it first.
18.25 get_max_length
18.26 get_switch
18.27 get_user_access_modes
18.28 list_acl
18.29 list_switches
18.30 lock_advice
18.31 open
18.32 open_by_uid
18.33 open_by_uid_after_crash
18.34 postcommit_do
The name of this primitive is misspelled "post_commit" in the
specification, in the transfer vectors, and in fm_open_.pl1 where
there is a stub for it. The name "postcommit_do" is correct and
consistent with its meaning and the naming convention that also
includes "unput", "undo", and "redo". The description of this
primitive in the specification is obsolete. When the
specification was written, the plan was for the file manager to
keep a list of postcommit actions. Now, the plan is to keep the
postcommit actions in the before-journal as postcommit-handlers.
The parameter list in the specification and stub is all wrong.
It should be:
     call file_manager_$postcommit_do (OID, UID, postcommit_handler_ptr);
     dcl file_manager_$postcommit_do entry (bit (36) aligned,
         bit (36) aligned, ptr);
18.35 prepare_to_copy
18.36 put
18.37 put_journal
18.38 raw_get
18.39 raw_put
18.40 read
18.41 redo
The "redo" entry point is not specified or implemented yet. It
will be called by the after-journal manager during roll-forward.
It is to the after-journal what the "undo" entry point is to the
before-journal. It will probably look something like this:
     call file_manager_$redo (OID, UID, rollforward_handler_ptr);
     dcl file_manager_$redo entry (bit (36) aligned,
         bit (36) aligned, ptr);
18.42 replace_acl
18.43 reput
This entry point is not specified or implemented yet. It is to
the after-journal manager what "unput" is to the before-journal
manager.
18.44 set_bit_count
18.45 set_max_length
18.46 set_switch
18.47 status
18.48 store
18.49 sub_err_flag_get
18.50 sub_err_flag_set
18.51 suffix_info
18.52 undo
This entry point is not in the specification and is not properly
implemented. A version exists in fm_put_.pl1, but it is all
wrong. Even the parameter list is wrong. It should be:
     call file_manager_$undo (OID, UID, rollback_handler_ptr);
     dcl file_manager_$undo entry (bit (36) aligned,
         bit (36) aligned, ptr);
18.53 unput
The before-journal manager needs a special put primitive in the
file manager for use during rollback. Its parameters are the
same as the regular put entry point. Its action differs as
follows:
1. No before-image is taken.
2. The control-interval is not locked.
3. An after-image is taken if the file is so protected.
4. The process need not be in transaction mode.
5. The date/time modified field of the control-interval header
is not updated; nor is it restored. It retains the date/time of
the modification that is being rolled back.
18.54 validate
18.55 write
19 TESTING AND DEBUGGING TOOLS
19.1 command interface
A tool to permit file manager primitives to be invoked from
command level and exec_coms would be useful. See "PROPOSED
FEATURES".
19.2 create_file and delete_file
The commands create_file (crf) and delete_file (dlf) call
file_manager_$create and file_manager_$delete. These commands
wrap the creates and deletes in a transaction if there is no
current transaction. If several files are in the same invocation
of the command, they are all done in one transaction. If any
error occurs, the transaction is aborted. These commands do not
honor the star convention. Create only creates protected files
with default attributes.
create_file (crf)
delete_file (dlf)
usage: create_file file1 file2 file3 ... fileN
usage: delete_file file1 file2 file3 ... fileN
19.3 fm_tester
This useful routine tests the main file manager primitives. It
creates several files and does a lot of "get" and "put"
operations on them. It is a quick test that is very useful when
a small change has been made to the file manager.
19.4 fm_driver
This is intended for wringing out the file manager and the
ancillary services. It copies every segment in a certain
directory into a data-management file. It is designed to be run
from multiple processes in order to test concurrency control. If
a particular segment is already in the file, it compares it to
the segment in the directory to make sure the copy in the file is
correct.
20 PROPOSED FEATURES
This section describes the features that have been proposed for
the file manager. For each feature, it gives the reason,
performance implications, priority, and an estimate of the effort
necessary to provide it. All effort estimates are given in
undiluted time. The actual time can be estimated by doubling the
undiluted time.
20.1 software ring brackets
Access to data-management files would be constrained by two
numbers, called the write bracket and the read bracket. Each
file operation would be classified as either a read operation or
a write operation. Read operations would be permitted only if
the caller's validation level was less than or equal to the read
bracket number. Write operations would be permitted only if the
caller's validation level was less than or equal to the write
bracket number.
The brackets would be specified in the file creation information.
If specified, the read bracket could not be lower than the write
bracket and neither could be lower than the caller's
validation level. If not specified, they would both default to
the caller's validation level. In either case, neither bracket
could be lower than the data-management-ring. The brackets would
constrain the access to all data-management files, regardless of
whether they were "protected" in the data-management sense. In
the initial implementation, it would not be possible to change
the brackets of an existing file.
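The proposed rules can be summarized in a short model. The
following Python sketch is illustrative only and is not the file
manager's code: the function names are hypothetical, the value 2
for the data-management-ring is an assumption, and reporting a
rule violation by raising ValueError is likewise an assumption,
since the proposal does not say how violations are reported.

```python
DM_RING = 2  # assumption: ring number of the data-management-ring

def choose_brackets(validation_level, write_bracket=None,
                    read_bracket=None, dm_ring=DM_RING):
    """Return (write_bracket, read_bracket) for a new file."""
    # Unspecified brackets default to the caller's validation level.
    if write_bracket is None:
        write_bracket = validation_level
    if read_bracket is None:
        read_bracket = validation_level
    if read_bracket < write_bracket:
        raise ValueError("read bracket lower than write bracket")
    if min(write_bracket, read_bracket) < validation_level:
        raise ValueError("bracket lower than caller's validation level")
    if min(write_bracket, read_bracket) < dm_ring:
        raise ValueError("bracket lower than data-management-ring")
    return write_bracket, read_bracket

def operation_permitted(validation_level, is_write,
                        write_bracket, read_bracket):
    """Apply the read or write bracket test to one file operation."""
    bracket = write_bracket if is_write else read_bracket
    return validation_level <= bracket
```

With brackets (4, 5), a caller at validation level 5 could
perform read operations but not write operations, and a caller at
level 4 could perform both.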
Operations that are only used in the data-management-ring would
never be prevented because the brackets could not be lower than
the data-management-ring. As usual, the caller must ensure that
the validation level is set. Operations that affect many files,
like fm_$flush_modified_ci, are only called from the
data-management-ring.
20.1.1 REASON
Data-management files are intended to be data storage building
blocks for Multics subsystems. Access to the data of an inner
ring subsystem must be controllable by the subsystem. Currently,
no such controllability is provided. Brackets would provide the
same type of control that ring brackets provide for segments. A
real need for brackets
already exists in the before-journal manager, which can not
protect its journals from access from outer rings.
20.1.2 PERFORMANCE
The cpu time would be about 60 microseconds on a DPS8 cpu. This
includes the call to cu_$level_get and the comparison between the
level gotten and the appropriate bracket in the opening
information block. It is 4% of the 1500 microsecond average cpu
time of fm_$get, the most cpu time consuming operation in the
mrds_driver test. It is 2.4% of the 2500 microsecond average cpu
time of fm_$put, the second most cpu time consuming operation in
that test. Checking software ring brackets would have increased
the cpu time of the test about 0.4%. The extra storage
requirements are small. They consist of an extra word in the
access-data block and a few extra words in the stack frame of
each operation.
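The per-operation percentages quoted above follow directly from
the stated times. The following Python sketch only reproduces
that arithmetic; the names are hypothetical. The 0.4% figure for
the whole test depends on call counts that are not given in this
section, so it is not reproduced here.

```python
CHECK_US = 60        # estimated cost of the bracket check
AVG_GET_US = 1500    # average cpu time of fm_$get in the test
AVG_PUT_US = 2500    # average cpu time of fm_$put in the test

def overhead_percent(check_us, operation_us):
    """Cost of the check as a percentage of one operation."""
    return 100.0 * check_us / operation_us

get_overhead = overhead_percent(CHECK_US, AVG_GET_US)   # about 4
put_overhead = overhead_percent(CHECK_US, AVG_PUT_US)   # about 2.4
```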
20.1.3 EFFORT
The module fm_get_ contains only operations that would be
classified as file read operations. The module fm_put_ contains
all write and non-file operations, except for fm_$get_uid, which
can be moved to fm_get_. This means that fm_get_ can
unconditionally check the read bracket, because all of its
operations are classified as file read operations, and fm_put_
can check the write bracket whenever the operation is a file
operation, i.e., whenever it gets the access-data for a file.
20.1.4 PRIORITY
High.
20.2 audit hardcore support
Carefully examine the ring zero programs that support
synchronized segments to make sure they fulfill the expectations
of the file manager.
20.2.1 REASON
This would help assure that we are giving files the protection we
claim that we are.
20.2.2 PERFORMANCE
20.2.3 EFFORT
20.2.4 PRIORITY
High.
20.3 flushing directories
If a hardcore primitive were available, the file manager could
flush changes it makes to directories during its creation,
deletion, and allocation operations.
20.3.1 REASON
When a file is created or deleted, the result of a committed
transaction could be lost if the containing directory is not
flushed.
20.3.2 PERFORMANCE
20.3.3 EFFORT
20.3.4 PRIORITY
Low.
20.4 hardcore support of UID pathnames
The hardcore supervisor would provide an entry point that would
return a string of directory UIDs that would represent the
location of the file in the Multics hierarchy. The file
manager would store this string in its UID/pathname table,
instead of the character string representation of the pathname.
The hardcore would also provide an initiate entry point that
accepts such a string of directory UIDs instead of a character
string pathname.
20.4.1 REASON
Protected files would be more reliable because reliability would
not be defeated when the name of a directory is changed.
20.4.2 PERFORMANCE
Performance is not affected because the UID pathnames would only
be used during recovery.
20.4.3 EFFORT
Most of the work would be in modifying the hardcore.
20.4.4 PRIORITY
Low.
20.5 provide a pointer interface
Provide an interface that returns a pointer to the contents of a
control interval so that it can be accessed more efficiently. It
should be done in a way that can be supported when we upgrade to
the proposed large-files. The pointer must point to a segment
whose ring brackets are set so that it can be read from the
caller's ring.
The Multics address space is not large enough to provide a unique
pointer to every control-interval in a very large file, so there
must be ways to remove control-intervals from the address space.
One way is to provide an explicit primitive for this purpose.
Another way is to remove the control-intervals when the file is
closed. In the case of protected files, the control intervals
can be automatically removed at the termination of the
transaction.
Given the current msf implementation, passing out
control-interval pointers is very easy. All we have to do is set
the read bracket on the msf components so that they can be read
from the outer ring, provide an entry point to actually obtain
the control-interval pointer, and provide a no-op entry point to
terminate the pointer. One of the implications of this simple
implementation is that the file manager could never terminate an
msf component, unless the file were closed. This would limit the
number of components that could be accessed to about 3000,
assuming that nothing else was eating up a lot of segment
numbers.
In the large-file implementation, regardless of whether we
provide a control interval pointer interface, one or several PTWs
will have to be reserved for accessing each control-interval,
depending on its size. The control-interval would not
necessarily have to be read into memory at the time the control
interval pointer is created, because the page fault could be
handled by the file manager. Also, the control-interval could be
removed from main-memory on a least recently used basis.
A PTW does not have enough bits to store all the information that
is necessary to implement the access to a control-interval.
Twenty-seven bits will be needed for the control-interval number.
There will probably be before-journal and locking information.
Since several processes could have pointers to the same
control-interval, each control-interval would need a list of the
processes that have pointers to it. When a control-interval
pointer is to be terminated, either explicitly or because the
file is closed or because the transaction is over, the process
must be removed from the list. When a control-interval has no
processes on its list, its PTWs can be reused.
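The bookkeeping described in the last paragraph can be sketched
as follows. This Python model is illustrative only: the class
ControlIntervalPointers and its method names are hypothetical,
process ids are arbitrary values, and "reusing the PTWs" is
represented simply by dropping the control-interval's entry.

```python
class ControlIntervalPointers:
    """Per-control-interval list of processes holding a pointer."""

    def __init__(self):
        self._holders = {}  # control-interval number -> set of process ids

    def get_ci_ptr(self, process_id, ci_number):
        self._holders.setdefault(ci_number, set()).add(process_id)

    def terminate_ci_ptr(self, process_id, ci_number):
        """Drop one pointer; return True when the PTWs become reusable."""
        holders = self._holders.get(ci_number)
        if holders is None:
            return False
        holders.discard(process_id)
        if holders:
            return False            # another process still has a pointer
        del self._holders[ci_number]
        return True

    def terminate_all(self, process_id):
        """File closed or transaction over: drop all of a process's pointers.

        Returns the control-intervals whose PTWs became reusable.
        """
        return sorted(ci for ci in list(self._holders)
                      if self.terminate_ci_ptr(process_id, ci))
```

The model makes the storage cost visible: one entry per
control-interval that has a pointer outstanding, each holding one
item per process.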
20.5.1 REASON
Improve performance by reducing the number of calls to fm_$get.
20.5.2 PERFORMANCE
On October 28, 1983 a full run of mrds_driver was metered. It
showed that 8.4% of the virtual CPU time was spent in fm_$get and
that the average virtual CPU time per call was 1426 microseconds
on a DPS8 processor. Of that 1426 microseconds, 621 was spent in
the lock manager and 805 was spent in the fm_$get code.
The number of calls to fm_$get was 357472. The number of calls
to dsl_$retrieve was 19043. The number of calls to dsl_$store
was 6383. The number of calls to dsl_$define_temp_rel was 210.
So, figuring roughly, there were about 10 fm_$gets per dsl_
operation. In order to determine how much would be
saved by providing a control-interval pointer, we would have to
know how often successive gets were done on the same
control-interval. Normally, getting one element requires two
calls to fm_$get. There were 118734 calls to
cm_setup_buffered_ci, which gets the whole control-interval. An
interesting result of this test was that it took almost exactly
the same amount of cpu time to get one element by doing two gets
as it did to get the element by copying out the entire
control-interval and getting one element out of the copy.
Providing a get_ci_pointer interface would probably reduce the
number of calls to fm_$get to a negligible level. The cost of
fm_$get_ci_ptr would be about the same as an fm_$get that moved
no data. There would be about half as many calls to
fm_$get_ci_ptr as there currently are to fm_$get, and on the
average, each would take about half as long. Therefore, the cpu
time saved would be about 6%. There would be about as many calls
to fm_$terminate_ci_ptr as there were to fm_$get_ci_ptr, but
since this is a no-op, it can be neglected.
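The estimate above can be checked with a few lines of arithmetic.
The following Python sketch only restates the metered numbers;
the variable and function names are hypothetical. It assumes, as
the text does, half as many calls at half the cost each, so that
a quarter of the present fm_$get time remains. Note that the
three dsl_ call counts listed give a ratio nearer 14 than 10; the
text's figure is explicitly a rough one.

```python
GET_SHARE_PERCENT = 8.4            # share of virtual CPU time in fm_$get
LOCK_US, GET_CODE_US = 621, 805    # split of the average call time
FM_GET_CALLS = 357472
DSL_CALLS = 19043 + 6383 + 210     # retrieve + store + define_temp_rel

def estimated_saving(share_percent, call_ratio=0.5, time_ratio=0.5):
    """CPU share saved if calls and per-call time both shrink."""
    return share_percent * (1.0 - call_ratio * time_ratio)

average_call_us = LOCK_US + GET_CODE_US        # 1426 microseconds
gets_per_dsl_call = FM_GET_CALLS / DSL_CALLS   # roughly 14
saving_percent = estimated_saving(GET_SHARE_PERCENT)  # roughly 6
```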
The implications for the large-file implementation are that we
would have to keep a list of processes which have a pointer for
each control-interval. This is probably an acceptable tradeoff
considering the substantial performance gain associated with
hardware addressability.
20.5.3 EFFORT
One month.
Given the current msf implementation, passing out
control-interval pointers is very easy, as described in the
discussion of the simple implementation above: set the read
bracket on the msf components, provide an entry point to obtain
the pointer, and provide a no-op entry point to terminate it.
The limit of about 3000 accessible components applies.
In the large-file case, the effort is more substantial and more
is achieved. Keeping the list of processes that have pointers to
each control-interval will require a large and indeterminate
amount of per-system storage. In addition, users will probably
want complicated optimizations based on concurrency control
assumptions.
20.5.4 PRIORITY
High.
20.6 file manager command interface
This is a tool to access the file manager primitives directly
from command level. The name of this command may be "flmc".
20.6.1 REASON
This command would permit developers to exercise various file
manager features for experimental purposes.
20.6.2 PERFORMANCE
20.6.3 EFFORT
20.6.4 PRIORITY
Low.
20.7 command to list open files
Provide a command to print a list of the files that are open in
the process.
20.7.1 REASON
Users often wish to know what files they have open.
20.7.2 PERFORMANCE
20.7.3 EFFORT
20.7.4 PRIORITY
Medium.
20.8 better validation of msf manager's pathname
Msf manager stores the pathname that a file is opened with in
perprocess storage. This pathname can become invalid if the file
or any of the directories above it are renamed.
20.8.1 REASON
This would make the file manager more robust.
20.8.2 PERFORMANCE
20.8.3 EFFORT
20.8.4 PRIORITY
Low.
20.9 dynamic array of msf component segment numbers
Currently, the file manager must call the msf manager for
pointers to components with numbers greater than 27. This is
because the file manager has fixed size tables for component
pointers. Only files with more than 7140 control-intervals are
affected. The size of these tables should be made variable.
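The two numbers above are related by simple arithmetic. The
following Python sketch is illustrative only: it assumes 255
control-intervals per msf component, which is the value that
makes 28 components (numbers 0 through 27) hold exactly 7140
control-intervals, and it assumes control-intervals are numbered
from zero. The function names are hypothetical.

```python
FIXED_TABLE_COMPONENTS = 28    # components 0 through 27
CIS_PER_COMPONENT = 255        # assumption: control-intervals per component

def component_of(ci_number):
    """msf component that holds a given control-interval."""
    return ci_number // CIS_PER_COMPONENT

def needs_msf_manager_call(ci_number):
    """True when the component falls outside the fixed-size table."""
    return component_of(ci_number) >= FIXED_TABLE_COMPONENTS

covered = FIXED_TABLE_COMPONENTS * CIS_PER_COMPONENT   # 7140
```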
20.9.1 REASON
Calling the msf manager frequently is inefficient.
20.9.2 PERFORMANCE
20.9.3 EFFORT
20.9.4 PRIORITY
Medium.
20.10 set ring brackets on msf components to 2 5 5
Set ring brackets on msf components to 2 5 5. This is irrelevant
if the get_ci_ptr interface is implemented first.
20.10.1 REASON
This would permit examination and dumping of the files by user
ring facilities.
20.10.2 PERFORMANCE
20.10.3 EFFORT
20.10.4 PRIORITY
Very low.
20.11 make fm_$open_by_uid failsafe
When the data-management daemon rolls back a transaction, it
calls file_manager_$open_by_uid as necessary to open the files.
If a file does not exist, fm_$open_by_uid returns et_$noentry or
et_$no_dir. Currently, the rollback portion of the
before-journal manager handles this situation by marking the file
as nonexistent, but it would be more elegant and consistent if
the file manager did this. The file manager already has the
concept of a deleted file and already knows to ignore unputs to a
deleted file. When the file does not exist, the open_by_uid
operation should return a zero status-code, but mark the opening
as that of a deleted file.
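The proposed behavior can be modeled in a few lines. The
following Python sketch is illustrative only: the Opening class,
the set of existing UIDs, and the signatures shown are
assumptions made for the example, not the real calling sequences.

```python
from dataclasses import dataclass, field

@dataclass
class Opening:
    uid: int
    deleted: bool = False                          # opening of a deleted file
    restored: dict = field(default_factory=dict)   # ci number -> before-image

def open_by_uid(uid, existing_uids):
    """Proposed behavior: a missing file is not reported as an error."""
    opening = Opening(uid=uid, deleted=uid not in existing_uids)
    return opening, 0   # zero status-code in both cases

def unput(opening, ci_number, before_image):
    """Restore a before-image, ignoring unputs to a deleted file."""
    if not opening.deleted:
        opening.restored[ci_number] = before_image
    return 0
```

Under this scheme, rollback proceeds identically whether or not
the file still exists, so the before-journal manager needs no
special case for missing files.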
20.11.1 REASON
This would eliminate some spurious error messages from the
data-management log.
20.11.2 PERFORMANCE
20.11.3 EFFORT
20.11.4 PRIORITY
Low.
20.12 make fm_$unput failsafe
If a user creates a file in her process directory that is
protected by rollback, puts some data into it, and then does a
new_proc without ending the transaction, the data-management
daemon is likely to experience a seg_fault_error during rollback
when the initializer deletes the process directory and all the
segments in it. The seg_fault_error is currently handled by the
daemon which considers the rollback a failure and tries it again
later. The rollback will succeed later because the fm_$open
operation will return et_$noentry which is acceptable to
rollback.
20.12.1 REASON
If the file manager handled the seg_fault_error, it would prevent
the rollback from being interrupted so that it would not have to
be retried later.
20.12.2 PERFORMANCE
20.12.3 EFFORT
20.12.4 PRIORITY
Low.
20.13 handle postponed file closing better
The code of lock_manager_$unlock_all contains a call to
fm_open_$post_txn so that the file manager can close any files
whose closing was postponed. The closing of files that are
protected by rollback must be postponed until the transaction is
over because the before-journal manager stores the file opening
id in each before-image and uses it if the transaction is rolled
back in the process that began it. Hopefully, we can find a way
to serve the needs of the before-journal manager without
requiring a call to the file manager at the end of each
transaction.
20.13.1 REASON
This would give better performance by eliminating one call per
transaction.
20.13.2 PERFORMANCE
20.13.3 EFFORT
20.13.4 PRIORITY
Low.
20.14 optimize calls to bjm for new file
No before-images need to be taken for a file that was created in
the current transaction. This optimization can not be made until
the rollback of the create operation is implemented.
20.14.1 REASON
This would reduce the amount of before-journal space that is
used.
20.14.2 PERFORMANCE
20.14.3 EFFORT
20.14.4 PRIORITY
Low.
20.15 optimize calls to bjm for new CI
No before-images need to be taken for a control-interval that was
allocated in the current transaction or for a control-interval
for which a full before-image has already been taken.
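The decision can be expressed as a small per-transaction record.
The following Python sketch is illustrative only: the class
TransactionImages and the "whole_ci" argument, which stands for a
put whose before-image covers the entire control-interval, are
assumptions made for the example.

```python
class TransactionImages:
    """Tracks which control-intervals need no further before-images."""

    def __init__(self):
        self.allocated = set()      # allocated in the current transaction
        self.fully_imaged = set()   # full before-image already taken

    def note_allocate(self, ci_number):
        self.allocated.add(ci_number)

    def put(self, ci_number, whole_ci=False):
        """Return True if a before-image must be journalized for this put."""
        if ci_number in self.allocated or ci_number in self.fully_imaged:
            return False
        if whole_ci:
            self.fully_imaged.add(ci_number)
        return True
```

A partial before-image does not suppress later ones in this
model, because it does not cover the rest of the control-interval.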
20.15.1 REASON
This would reduce the amount of before-journal space that is
used.
20.15.2 PERFORMANCE
20.15.3 EFFORT
20.15.4 PRIORITY
Low.
20.16 find something to lock before the open operation
The fm_$open operation is unprotected from concurrency conflicts.
For example, a file could be in the midst of being created or
deleted at the time it is being opened.
20.16.1 REASON
This would make the file manager more robust.
20.16.2 PERFORMANCE
20.16.3 EFFORT
20.16.4 PRIORITY
Very low.
20.17 keep modified CI list in persystem storage
Keep the list of modified control-intervals in persystem storage,
instead of perprocess storage, where it is now kept.
20.17.1 REASON
So that the data-management daemon would not have to roll back
transactions that are interrupted during the flush of modified
control-intervals.
20.17.2 PERFORMANCE
20.17.3 EFFORT
20.17.4 PRIORITY
Very low.
20.18 give files a type field
Store a 32 character type field in the file attributes (e.g.,
"collection_manager_").
20.18.1 REASON
This would tell what type of object the file represents.
20.18.2 PERFORMANCE
20.18.3 EFFORT
20.18.4 PRIORITY
Low.
20.19 add a debug switch
The file manager calls sub_err_ in many cases and has a handler
for the sub_error condition. A debug switch would permit the
handler to halt and call the command processor.
20.19.1 REASON
This change would make it easier to debug the file manager and
programs that call it incorrectly.
20.19.2 PERFORMANCE
No impact.
20.19.3 EFFORT
20.19.4 PRIORITY
Low.
20.20 fix sma patch to delete
The delete operation was defective because it failed in the case
where the user had enough access to the containing directory to
delete the file, but did not have enough access to the msf
components to delete them. This defect was corrected for the
delete operation, but not for the delete_close operation.
20.20.1 REASON
This would make the operation of delete_close consistent with
delete.
20.20.2 PERFORMANCE
Insignificant effect.
20.20.3 EFFORT
20.20.4 PRIORITY
Low.
20.21 make protected the default
Make fm_$create default to "protected" when create_info_ptr is
null ().
20.21.1 REASON
The user will more often want protected files.
20.21.2 PERFORMANCE
No effect.
20.21.3 EFFORT
One day.
20.21.4 PRIORITY
Low.
20.22 ability to change attributes
None of the file protection attributes can be changed. It
would be nice to have this capability.
20.22.1 REASON
A user might want to load out a file and then turn protection on.
20.22.2 PERFORMANCE
20.22.3 EFFORT
20.22.4 PRIORITY
Low.
20.23 keep opening count per ring
The opening count is to a data-management file what the count of
the number of times initiated is to a reference name. In
Multics, each ring has a separate reference name table and hence
a separate count. Currently, data-management files have just one
opening count for all rings.
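A per-ring opening count might behave as in the following Python
sketch. It is illustrative only: the class OpeningCounts and its
methods are hypothetical, and it models a single file within a
single process.

```python
class OpeningCounts:
    """Opening counts for one file, kept separately for each ring."""

    def __init__(self):
        self._counts = {}   # ring number -> number of openings

    def open(self, ring):
        self._counts[ring] = self._counts.get(ring, 0) + 1

    def close(self, ring):
        """Close one opening made from this ring; False if it has none."""
        if self._counts.get(ring, 0) == 0:
            return False
        self._counts[ring] -= 1
        return True

    def is_open(self):
        return any(count > 0 for count in self._counts.values())
```

With a single shared count, a close from ring 4 would decrement
the count established by an open from ring 2. With separate
counts, the ring 4 close is refused and the file stays open for
the inner ring.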
20.23.1 REASON
Providing a separate opening count for each ring will prevent
outer rings from closing a file that an inner ring has open.
20.23.2 PERFORMANCE
20.23.3 EFFORT
20.23.4 PRIORITY
High.