Multics Technical Bulletin MTB-564
To: Distribution
From: André Bensoussan
Date: 02/04/83
Subject: Phasing Page Control and Before Journal
ABSTRACT
This MTB describes how Page Control and Data Management
cooperate in implementing the protocol known as the "Write Ahead
Log" (WAL) protocol.
When a data management file is modified, a "Before Image" is
logged in a Before Journal; that is, the portion of the file
about to be modified is saved in the journal, and used later to
undo the modification if a rollback is requested. In order for
the rollback to operate properly even after an emergency shutdown
failure, it is necessary to hold the data base modification in
main memory until its associated before image is actually
physically written to disk. This is the essence of the WAL
protocol.
Since the first implementation of data management files will
be done using Multi Segment Files, whose pages are moved to disk
by Page Control, the enforcement of this protocol cannot be done
without Page Control's participation. This MTB describes the
respective responsibilities of Page Control, File Manager and
Before Journal Manager in their contract to enforce the WAL
protocol.
_________________________________________________________________
Multics project internal working documentation. Not to be
reproduced or distributed outside the Multics project.
Comments should be sent to the author:
via Multics Mail:
Bensoussan.Multics on System M.
via US Mail:
André Bensoussan
Honeywell Information Systems, Inc.
575 Tech Square
Cambridge, Massachusetts 02139
via telephone:
(HVN) 261-9334, or
(617) 492-9334
CONTENTS
Abstract
1 Introduction
2 Abbreviations
3 Background information
4 Description of the protocol
   4.1 Before Journal Manager protocol
   4.2 File Manager protocol
   4.3 Page Control protocol
5 Extension to several Before Journals
1 INTRODUCTION
In the first release of the new Data Management, files will
still be implemented as MSF's and their pages will be written out
at page control's discretion.
In order to be able to undo a set of modifications done by a
transaction, the Data Management uses the "Before Journal"
technique: Before modifying any portion of a file, its original
value is recorded in a so-called "Before Image" (BI), appended as
a logical record to a sequential file called the "Before
Journal". If a modified page is written out to disk before its
before image is safe on disk, the rollback mechanism becomes
vulnerable to a system crash with ESD failure.
This MTB describes the method used to make Page Control
cooperate with Data Management in such a way as to have Page
Control write out data pages to disk only after their before
images are safe on disk.
If this can be achieved, it gives the recovery mechanism of
the Data Management an enormous advantage: it can roll back all
unfinished transactions EVEN AFTER A SYSTEM CRASH WITH ESD
FAILURE.
If it could not be achieved, recovery after ESD failure
would require reloading the files that were open at the time of
the crash, using their last dumps, and applying all after images
recorded in the after journal(s). This is a very expensive
procedure compared to rolling back unfinished transactions.
A different proposal to achieve the same goal has been
described in MTB-563: "Data Management: Ordering of disk
I/O's", but has not been implemented. The method implemented in
the Data Management System for MR10 is the method explained in
this memo.
2 ABBREVIATIONS
The following abbreviations are used in this document:
BJM = Before Journal Manager
BI = Before Image
CI = Control Interval
FM = File Manager
ESD = Emergency Shut Down
MSF = Multi Segment File
3 BACKGROUND INFORMATION
When the before journal manager is called to journalize a
before image, it enters the before image information in the
current CI of the journal, but it does not write the BI out to
disk at the time it records it. The CI is, in fact, a page of an
MSF and it will be written out to disk by Page Control. However,
when a transaction commits, the before journal manager causes all
CI's of the before journal to be flushed (written to disk) up to
the CI containing the last BI generated by the committing
transaction, and waits for these I/O's to complete.
The BJM is not informed each time a CI (page) of the before
journal has been written to disk; the interrupt is handled by
Page Control. Each time it requests the journal to be flushed,
however, it can keep track of the control interval up to which
the journal is completely on disk.
4 DESCRIPTION OF THE PROTOCOL
Let us assume that there is only one before journal in the
system; the extension to several journals is simple and is
discussed at the end of this document.
It is convenient, for the description of the protocol, to
use the following definitions:
o A BI is "safe" if it is completely on disk, and all previous
BI's are also safe. A BI is "unsafe" if it is not safe.
o A CI of the before journal is "safe" if it is completely on
disk, and all previous CI's of the journal are also safe. A
journal CI is "unsafe" if it is not safe.
Conceptually, the journal can be broken up into two
contiguous parts: a safe part, which contains all the safe BI's,
followed by an unsafe part, which contains all the other BI's,
still unsafe. The line that separates the two parts may very
well fall in the middle of a safe CI, if it happens that this CI
contains a portion of a still unsafe BI.
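The two definitions above can be illustrated with a minimal sketch. It assumes a journal modeled as a list of per-CI "on disk" flags and a list of BI's given by the first and last CI they occupy; the names BeforeImage, safe_ci_count and safe_bi_count are hypothetical and do not come from the Data Management code.

```python
from dataclasses import dataclass


@dataclass
class BeforeImage:
    first_ci: int  # journal CI in which the BI starts
    last_ci: int   # journal CI in which the BI ends


def safe_ci_count(ci_on_disk):
    """Number of leading journal CI's that are safe."""
    count = 0
    for on_disk in ci_on_disk:
        if not on_disk:
            break  # every CI from here on is unsafe, even if it is on disk
        count += 1
    return count


def safe_bi_count(before_images, ci_on_disk):
    """Number of leading BI's that are safe (BI's listed in journal order)."""
    safe_cis = safe_ci_count(ci_on_disk)
    count = 0
    for bi in before_images:
        if bi.last_ci >= safe_cis:
            break  # this BI ends in an unsafe CI, so it and its successors are unsafe
        count += 1
    return count


# CI's 0 and 1 are on disk, CI 2 is not, CI 3 happens to be on disk already.
ci_on_disk = [True, True, False, True]
journal = [BeforeImage(0, 0), BeforeImage(0, 1), BeforeImage(1, 2), BeforeImage(3, 3)]
print(safe_ci_count(ci_on_disk), safe_bi_count(journal, ci_on_disk))
```

In this example the third BI starts in safe CI 1 but ends in unsafe CI 2, so the line separating the safe part from the unsafe part falls in the middle of a safe CI.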
If each BI were time stamped at the time it is entered in the
journal, the time stamp of the last safe BI would always be
higher than the time stamp of any other safe BI, and always lower
than the time stamp of any unsafe BI. If, in addition, each data
page modified and in main memory had the time stamp of the last
BI associated with its modification, it would be possible to
determine if the data page could be written out to disk or if it
had to be held in main memory, until its BI becomes safe. The
proposed method can be sketched as follows:
o The BJM maintains the time stamp of the last safe BI in a
wired down location available for Page Control to examine.
o The FM stores in the standard header of each file CI the time
stamp of the BI produced the last time the CI was modified.
o Page Control writes out a file CI only if the time stamp in
the CI header is smaller than or equal to the time stamp of
the last safe BI maintained by the BJM.
4.1 Before Journal Manager protocol
a. When recording a BI:
o Record the BI, starting at the current position in the before
journal; the BI may span several CI's.
o Generate a time stamp for this BI (the time stamp need not be
recorded in the BI).
o For each unsafe CI, the BJM remembers the time stamp of the
last BI that will become safe when the CI becomes safe. In
order to do so, the BJM associates the time stamp of this BI
with the CI that happens to contain the end of the BI.
o Return the time stamp of the BI to the caller, i.e., the FM.
b. When committing:
o The BJM remembers the last safe CI from the last commit. It
knows the CI number n in which the committing transaction
produced its last BI. It causes the journal to be flushed up
to CI n, and waits for completion of all I/O's. When all
I/O's are completed, CI n becomes safe, as well as all BI's
entirely contained in the flushed CI's.
o The BJM kept track of the time stamp of the last BI that
would become safe when CI n would become safe. It stores
this time in the wired down location containing the time
stamp of the last safe BI of the journal, to be used by Page
Control.
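The recording and committing steps above can be sketched as follows. This is an illustration under stated assumptions, not the BJM implementation: the class and method names are hypothetical, a counter stands in for the clock, the flush is a placeholder, and a BI is described only by the number of CI's it touches.

```python
import itertools


class BeforeJournalManager:
    """Hypothetical model of the BJM bookkeeping for one journal."""

    def __init__(self):
        self._clock = itertools.count(1)  # stand-in for a clock; strictly increasing
        self.current_ci = 0               # CI in which the next BI will start
        self.ci_last_bi_stamp = {}        # unsafe CI number -> stamp of last BI ending in it
        self.last_safe_ci = -1            # last CI known to be safe
        self.last_safe_stamp = 0          # models the wired down location read by Page Control

    def record_bi(self, ci_span=1):
        """Record a BI touching ci_span CI's (>= 1) and return its time stamp."""
        stamp = next(self._clock)
        end_ci = self.current_ci + ci_span - 1
        # A later BI ending in the same CI overwrites the earlier stamp.
        self.ci_last_bi_stamp[end_ci] = stamp
        self.current_ci = end_ci          # assumption: the CI holding the end is not yet full
        return stamp

    def commit(self, n):
        """Flush the journal up to CI n, then publish the new safe time stamp."""
        self._flush_and_wait(n)
        newly_safe = [ci for ci in self.ci_last_bi_stamp if ci <= n]
        for ci in sorted(newly_safe):     # ascending CI order means ascending stamps
            self.last_safe_stamp = self.ci_last_bi_stamp.pop(ci)
        self.last_safe_ci = max(self.last_safe_ci, n)

    def _flush_and_wait(self, n):
        pass  # placeholder for the real I/O requests and the wait for their completion


bjm = BeforeJournalManager()
t1 = bjm.record_bi()           # ends in CI 0
t2 = bjm.record_bi(ci_span=2)  # starts in CI 0, ends in CI 1
t3 = bjm.record_bi()           # ends in CI 1
t4 = bjm.record_bi(ci_span=2)  # starts in CI 1, ends in CI 2
bjm.commit(1)                  # the committing transaction's last BI is in CI 1
print(bjm.last_safe_stamp)
```

After the commit, the published stamp is t3: the fourth BI begins in the flushed CI 1 but ends in CI 2, so it remains unsafe and its stamp stays associated with CI 2.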
4.2 File Manager protocol
o Before modifying a CI of a protected file, the FM calls the
BJM to record the necessary BI information and gets back the
time stamp of the BI generated by the BJM.
o It then stores this time stamp in the standard header of the
CI about to be modified.
o Only then can it start modifying the control interval.
Note -- The standard CI header contains the time the CI was last
modified. The BI time stamp can also serve as the time last
modified.
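The ordering of the three File Manager steps can be sketched as follows. The sketch assumes a stub journal that hands out increasing time stamps and a control interval reduced to a one-field header plus a byte array; StubJournal, ControlInterval and modify_ci are hypothetical names used only for illustration.

```python
import itertools


class StubJournal:
    """Hypothetical stand-in for the BJM: logs BI's and returns their stamps."""

    def __init__(self):
        self._clock = itertools.count(1)
        self.images = []

    def record_bi(self, old_bytes):
        stamp = next(self._clock)
        self.images.append((stamp, old_bytes))
        return stamp


class ControlInterval:
    def __init__(self, size):
        self.bi_stamp = 0            # header field: stamp of the BI for the last modification
        self.data = bytearray(size)


def modify_ci(journal, ci, offset, new_bytes):
    end = offset + len(new_bytes)
    old = bytes(ci.data[offset:end])
    stamp = journal.record_bi(old)   # 1. log the before image and get its time stamp
    ci.bi_stamp = stamp              # 2. store the time stamp in the CI header
    ci.data[offset:end] = new_bytes  # 3. only then modify the control interval
    return stamp


journal = StubJournal()
ci = ControlInterval(8)
modify_ci(journal, ci, 0, b"ab")
modify_ci(journal, ci, 1, b"XY")
print(ci.bi_stamp, bytes(ci.data), journal.images)
```

Because the header is stamped before the data is touched, any copy of the page that Page Control sees with modified data also carries the stamp of the BI that protects it.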
4.3 Page Control protocol
Page Control must be able to know that a page is a CI of a
protected file. The FM, when creating an MSF component for a
protected file, will set the "protected file switch" (a new
switch) in the VTOC entry. At segment activation, this switch is
copied into the ASTE. With this assumption, Page Control would
have to do the following:
o When Page Control decides to write out a page, it should now
check in the ASTE if the page is part of a protected file.
If not, it proceeds as it does today.
o If the page does belong to a protected file, Page Control
compares the time stamp stored in the CI header with the time
stamp of the last safe BI. If the CI time stamp is greater,
the page must not be written out because its BI is not yet
safe; otherwise the page may be written out, but first its PTW
must be faulted to prevent any new modification of the page
while it is being written out.
This protocol must be followed by all programs that write out
pages to disk, that is:
- by Page Control in the normal case
- by the ESD procedure, and
- by the program that flushes memory every 15 minutes.
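The decision each of these writers has to make can be sketched as a single function. The Page record, the Action values and write_decision are hypothetical; the protected flag models the protected file switch held in the ASTE, and bi_stamp models the time stamp in the CI header.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WRITE = "write"                                # unprotected page: proceed as today
    FAULT_PTW_THEN_WRITE = "fault PTW, then write" # protected page whose BI is safe
    SKIP = "skip"                                  # protected page whose BI is not yet safe


@dataclass
class Page:
    protected: bool    # models the protected file switch in the ASTE
    bi_stamp: int = 0  # models the time stamp stored in the CI header


def write_decision(page, last_safe_stamp):
    if not page.protected:
        return Action.WRITE
    if page.bi_stamp > last_safe_stamp:
        return Action.SKIP
    return Action.FAULT_PTW_THEN_WRITE


last_safe_stamp = 10
for page in (Page(False), Page(True, 11), Page(True, 10)):
    print(page, write_decision(page, last_safe_stamp))
```

A page whose stamp equals the last safe stamp may be written, since the comparison in the protocol is "smaller than or equal to".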
Since Page Control decides to defer the writing out of a page on
the basis of non-ring-zero information, it must rely on some kind
of safety valves to prevent the pressure on main memory from
becoming too high.
o First, it could validate time stamps found in data pages as
well as the time stamp associated with the before journal;
all time stamps must be smaller than the current time.
o Next, Page Control could inform BJM each time it has to skip
a page by adding 1 to a count associated with the before
journal. This causes the BJM to flush the journal when the
count becomes "too high," instead of waiting until a
transaction commits to do it.
o Finally, if it happens that the BJM has not been invoked for
a long time, the count may increase beyond its threshold
value without triggering any corrective action. In this
case, page control should have a way to force the invocation
of the BJM to flush the journal.
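The three safety valves can be sketched as follows. The memo does not give threshold values, does not say when the count is reset, and does not say how Page Control decides to force a flush; the threshold and force_limit fields, the reset on flush, and all function names below are therefore assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class JournalPressure:
    threshold: int       # assumed "too high" value, checked by the BJM when it is invoked
    force_limit: int     # assumed value at which Page Control forces a BJM invocation
    skip_count: int = 0  # count associated with the before journal


def stamps_plausible(page_stamp, safe_stamp, now):
    """First valve: every time stamp must be smaller than the current time."""
    return page_stamp < now and safe_stamp < now


def note_skipped_page(pressure):
    """Second valve: Page Control adds 1 to the count each time it skips a page.

    Returns True when the third valve applies, i.e. the BJM should be forced to run.
    """
    pressure.skip_count += 1
    return pressure.skip_count >= pressure.force_limit


def bjm_should_flush(pressure):
    """Checked by the BJM whenever it is invoked."""
    return pressure.skip_count >= pressure.threshold


def journal_flushed(pressure):
    pressure.skip_count = 0  # assumption: a flush resets the count


pressure = JournalPressure(threshold=2, force_limit=4)
forced = [note_skipped_page(pressure) for _ in range(4)]
print(forced, bjm_should_flush(pressure))
```

In this run the BJM is never invoked, so the count passes the threshold without any flush, and only the fourth skipped page triggers the forced invocation.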
5 EXTENSION TO SEVERAL BEFORE JOURNALS
If there is more than one before journal, the BJM maintains
an array of safe time stamps, one for each journal. When
returning the time stamp of the BI, it also returns the index of
the journal, which is stored in the CI header with the time stamp
by the FM; Page Control then uses this index to access the
appropriate time stamp in the array.
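The lookup through the journal index can be sketched as follows, assuming the CI header is reduced to a (journal index, BI time stamp) pair and the array of safe time stamps is a plain list. The name may_write_out and the sample values are hypothetical.

```python
def may_write_out(ci_header, safe_stamps):
    """Compare the CI's stamp with the safe stamp of the journal that logged its BI."""
    journal_index, bi_stamp = ci_header
    return bi_stamp <= safe_stamps[journal_index]


# One safe time stamp per before journal, maintained by the BJM.
safe_stamps = [40, 25]

print(may_write_out((0, 30), safe_stamps))  # BI logged in journal 0
print(may_write_out((1, 30), safe_stamps))  # BI logged in journal 1
```

The same CI time stamp gives different answers depending on the journal, which is why the index has to be stored in the CI header alongside the stamp.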