Multics Technical Bulletin MTB-603
DM: Crash Recovery
To: Distribution
From: Lee A. Newcomb
Date: 02/15/83
Subject: Data Management: Crash Recovery
1 ABSTRACT
This document describes the crash recovery mechanism for the
current implementation of the Data Management System (DMS) on
Multics. The following items are discussed:
o Finding the state of the last invocation of DM
o Recovery of DM files to a consistent state
o Deletion of DM tables from previous invocations
Comments should be sent to the author:
via Multics forum:
>udd>Multics>Spratt>meetings>DMS_Development
via Multics Mail:
Newcomb.Multics on System M or LNewcomb.Multics on MIT
Multics.
via US Mail:
Lee A. Newcomb
Honeywell Information Systems, Inc.
575 Tech Square
Cambridge, Massachusetts 02139
via telephone:
(HVN) 261-9332, or
(617) 492-9332
_________________________________________________________________
Multics project internal working documentation. Not to be
reproduced or distributed outside the Multics project without the
consent of the author or the author's management.
CONTENTS
1 Abstract
2 Introduction
3 DMS Initialization and Recovery
4 Find recovery tables
5 Unfinished Transaction Rollback
   5.1 Open before journals for recovery
   5.2 Create a Temporary Transaction Table
   5.3 Finish the Transactions Found
      5.3.1 Set Crash Recovery Indicator
      5.3.2 Create Transaction Manager's Tables
      5.3.3 Create Before Journal Manager's Tables
      5.3.4 Do the Rollback
6 Cleanup various things
7 Delete Useless Directories and Files
8 DMS Shutdown
9 Note
2 INTRODUCTION
After a Multics system crash, the Data Management System
(DMS) must be recovered to make all protected (or synchronized)
DM files consistent (simply referred to as DM files for the rest
of this MTB). In the current DMS implementation, this involves
rolling back any unfinished transactions recorded in the before
journals (BJ's) open at crash time. As DMS recovery is closely
tied to DMS initialization, the reader should have a good
understanding of MTB 592, "Data Management: System Structure and
Initialization". The reader should also be familiar with the
MTB's concerning the DM before journal manager, especially
"Phasing page control and before journal". These are MTB's 513,
559, 560, 563, 564, 567, and 568. MTB 508, "Data Management:
Architectural Overview", is also useful background.
One of the major factors in this design of crash recovery is
the use of normal DMS software whenever possible. Crash recovery
does not have the critical time constraints on it that a running
DMS does; however, DMS should be available to users as quickly as
possible. A few minutes between Multics and DMS availability is
not felt to be crucial, but it certainly should not take half an
hour. It is felt that time can be better spent making the normal
DMS software run better and reducing the number of specialized
programs needed (and the associated maintenance cost).
3 DMS INITIALIZATION AND RECOVERY
Crash recovery is an integral part of DMS initialization,
done by a DMS Daemon, after a Multics bootload. Recovery is done
about half-way through DMS initialization, after a temporary DMS
has been created. See MTB 592, "Data Management: System
Structure and Initialization", for more detail about
initialization proper; this MTB will attempt to stay within the
recovery process except when initialization must be referenced.
The following is a basic list of the steps done in DMS
recovery (by the program dm_recovery_.pl1). The items will be
discussed in more detail in later sections.
o Find the bootload directory for the DMS to be recovered.
This step may fail if the DMS is running, or the hierarchy
leading to and/or including the bootload directory has
been lost.
o See if the previous DMS bootload needs to be recovered by
examining the state indicator in the old tables; stop
recovery if normal shutdown was indicated.
o Find the previous bootload's file_manager_ UID to pathname
table. This is used to open DM files that have been
modified and must be made consistent.
o Open all before journals open at crash time.
o Loop through the opened journals finding all active
transactions and rolling them back.
o Delete or rename the previous bootload's tables and
hierarchy and generally cleanup the DMS system hierarchy.
This completes the recovery procedure.
In the course of the above operations, errors may occur.
These errors are logged in a DMS system log kept by the
initializer of Data Management. A primitive handling of these
errors is done by flags set by an administrator of DMS. These
flags are: initializing, always_enable, and rename_old_dms_dir.
If initializing is on, a previous bootload of DMS must not exist,
and the other two flags are ignored; basically, recovery is
useless as nothing exists to recover. Otherwise, the last two
flags are used. If recovery takes any error and always_enable is
on, the errors will be reported as normal, but DMS initialization
will continue with the step after recovery. Regardless of whether
an error occurs, if rename_old_dms_dir is on, the previous
directory containing the DMS tables will be renamed for later
investigation. This is only recommended for debugging.
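The flag handling above can be summarized in a short sketch. The
Python below is illustrative only and is not DMS code: the names
RecoveryFlags, Outcome, and recovery_outcome, and the parameters
old_dir_exists and recovery_failed, are hypothetical. It assumes
that a missing or unexpected previous bootload directory is fatal
regardless of always_enable, which is only one reading of this
MTB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryFlags:
    # Flag names come from the text; the container is hypothetical.
    initializing: bool = False
    always_enable: bool = False
    rename_old_dms_dir: bool = False


@dataclass(frozen=True)
class Outcome:
    continue_initialization: bool
    rename_old_dir: bool


def recovery_outcome(flags, old_dir_exists, recovery_failed=False):
    """Decide how DMS initialization proceeds after the recovery step."""
    if flags.initializing:
        # Nothing should exist to recover; the other two flags are ignored.
        return Outcome(continue_initialization=not old_dir_exists,
                       rename_old_dir=False)
    if not old_dir_exists:
        # Assumption: treated as fatal even when always_enable is on.
        return Outcome(continue_initialization=False, rename_old_dir=False)
    # Errors are always logged; always_enable lets initialization go on.
    proceed = (not recovery_failed) or flags.always_enable
    # The rename happens whether or not an error occurred.
    return Outcome(continue_initialization=proceed,
                   rename_old_dir=flags.rename_old_dms_dir)
```

For example, with always_enable on and a failed recovery, the
sketch reports that initialization continues and the old
directory is not renamed.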
4 FIND RECOVERY TABLES
One premise of crash recovery is that a DMS per-bootload
directory exists containing two critical tables: the DMS file
and before journal managers' UID-pathname tables. These tables
are flushed to disk each time ANY modification is made to them,
and so are guaranteed accurate. If a per-bootload directory does
not exist and the initializing indicator is off, or if
initializing is on and a per-bootload directory DOES exist,
recovery takes an error that is fatal to the current attempt to
boot DMS.
Once the directory is found, an attempt is made to check the
state of the DMS invocation found. If a normal shutdown is
indicated, nothing more needs to be done and recovery is
finished. Next, the file manager's UID-pathname table is
located; the finding of the before journal manager's table is
left to later.
Three programs are used to check for the above items:
dm_util_$find_old_boot_dir (to find
dm_dir.<Multics_bootload_time>), dm_util_$dm_status (to see if a
normal shutdown occurred), and
file_manager_$find_old_uid_pn_table.
5 UNFINISHED TRANSACTION ROLLBACK
At this point, it is important to realize two DMS
per-bootload directories are in use by recovery:
dms_dir.<Multics_bootload_time> and
dms_dir.<Multics_bootload_time>.temp. The first is the directory
containing the data required for recovery. The latter is the
active version of DMS where the DMS Daemon is doing
initialization, and therefore recovery.
It is possible that no active transactions were left in the
last DMS invocation, but the old transaction tables are not
guaranteed consistent, only the file and before journal managers'
UID-pathname tables. The only way to be sure no transactions
were left unfinished is to open all the before journals listed in
the before journal UID-pathname table and read them backwards
looking for active transactions. The before images and marks in
a before journal are trusted (according to the protocol that DM
files control intervals will not be written to disk until the
matching before image(s) are on disk).
5.1 Open before journals for recovery
The procedure before_journal_manager_$open_all_after_crash
does this step. It finds the old before journal UID-pathname
table and loops through it opening all journals listed as active.
If no journals are found in the list, nothing needs to be
recovered. Any journal opened is recorded in the new before
journal UID-pathname table in the
dm_dir.<Multics_bootload_time>.temp directory.
In the process of opening the journals, they are positioned
to the last control interval written to. This control interval
is recorded in CI0 of the journal. The definition of the last
control interval in a journal is that CIn is last if time_stamp
(CIn) > time_stamp (CIn + 1). (Remember that a journal is
circular: if CIn is the physically last CI in the journal,
CIn + 1 is CI1 of the journal.)
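The time stamp rule above can be expressed as a short scan. The
Python below is a minimal sketch, not the journal manager's code:
the function find_last_ci and the list representation are
hypothetical. It assumes time stamps increase strictly in write
order, that a CI never written carries time stamp 0, and that
slot 0 of the list stands for CI0 (the header) and is skipped.

```python
def find_last_ci(time_stamps):
    """Return the number of the last CI written, or None if undecidable.

    time_stamps[i] is the time stamp of CI i; time_stamps[0] stands for
    CI0 (the journal header) and is ignored.
    """
    highest = len(time_stamps) - 1  # number of the physically last CI
    if highest < 1:
        raise ValueError("journal has no data control intervals")
    for n in range(1, highest + 1):
        # The journal is circular: the successor of the physically
        # last CI is CI1, not CI0.
        successor = n + 1 if n < highest else 1
        if time_stamps[n] > time_stamps[successor]:
            return n
    # No CI is newer than its successor (for example, nothing written).
    return None
```

With time stamps 50, 60, 30, 40 in CI1 through CI4, the journal
has wrapped and the sketch reports CI2 as the last one written.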
Note that the control interval found as being last in the
journal is not necessarily the last one written on the
operational system we are recovering. Especially in a no-ESD
crash, a CI could have been written in memory without its
contents reaching disk. The result is that a transaction could
have been started or completed with no record left for recovery.
However, since the writing of BJ CI's and DM file CI's is phased
so the BJ CI's will always make it to disk first (except for
abort and commit marks, in which case the situation is
reversed), recovery does not care. Minimal work will be lost in
this situation. See MTB's 563 and 564 for more detail on this.
5.2 Create a Temporary Transaction Table
Build a table of all transactions recorded in the BJ's which
have not completed. Two lists are kept: one of completed
transactions (which are not rolled back), and one of
transactions still in progress as far as the data recorded in
the BJ's indicates. A transaction with extra work to do after it
is committed is still considered in progress; it will not be
rolled back, but the post-commit actions will be done.
The header of each BJ record holds the number of transactions
active in the BJ at the time the record was written.
This is used so the entire BJ does not have to be walked
backwards to guarantee all active transactions are caught. By
convention (and common sense), commit and abort records do not
count themselves as active transactions when written to a BJ.
Note the previous step does not have to be completed before
this one is called. The steps are put in a loop where only one
BJ is examined and worked over at a time. If at this point in
recovery, no transactions were active, we simply go to the step
to close all BJ's opened.
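The backward walk of a single BJ can be sketched as follows. This
is an illustration under stated assumptions, not the actual
recovery code: BJRecord, its fields, the record kinds, and
scan_journal_backwards are hypothetical. The sketch assumes every
record other than a commit or abort counts its own transaction as
active, so a count of zero can only appear on a commit or abort
record and marks a point before which nothing was left
unfinished. It uses only that simple stopping rule, and it does
not model post-commit actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BJRecord:
    txn_id: int
    kind: str          # "image", "commit", or "abort" (hypothetical kinds)
    active_count: int  # active transactions when the record was written


def scan_journal_backwards(records):
    """Walk one journal newest-first.

    records is ordered oldest first. Returns (completed, in_progress,
    examined): two sets of transaction ids and the number of records read.
    """
    completed, in_progress = set(), set()
    examined = 0
    for rec in reversed(records):
        examined += 1
        if rec.kind in ("commit", "abort"):
            completed.add(rec.txn_id)
        elif rec.txn_id not in completed:
            # No commit or abort was seen later in the journal.
            in_progress.add(rec.txn_id)
        if rec.active_count == 0:
            # Nothing was active here, so older records cannot belong
            # to an unfinished transaction.
            break
    return completed, in_progress, examined
```

In a journal where one transaction commits with a count of zero
and later transactions follow it, the walk stops at that commit
record and never reads the older records.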
5.3 Finish the Transactions Found
If the temporary transaction table is not empty, invoke the
procedure transaction_manager_$recover_after_crash with a pointer
to the temporary transaction table. It does the following steps:
5.3.1 SET CRASH RECOVERY INDICATOR
The DMS state indicator, dm_system_data_$current_dm_state,
is set to show recovery is in progress. This is used by some of
the DMS Daemon's transaction adjustment programs to know that
some special calls need to be made. (This is actually done
earlier, but only has relevance to recovery now.)
5.3.2 CREATE TRANSACTION MANAGER'S TABLES
The temporary transaction table is looped through, building a
valid transaction definition table for the transaction manager.
5.3.3 CREATE BEFORE JOURNAL MANAGER'S TABLES
Call before_journal_manager_$rebuild_after_crash with the
pointer to the temporary transaction table.
5.3.4 DO THE ROLLBACK
Now loop calling tm_adjust_txn. This is the normal method
of adjusting a transaction for a dead process in an active DMS.
This is done so the before journals will be consistent when the
adjustment is finished. In a sense, the transactions read from
the BJ's have been adopted by the now partially active DMS (i.e.
users still cannot access it).
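Why before images are applied newest-first during a rollback can
be shown with a small sketch. This does not describe how
tm_adjust_txn is implemented: the function roll_back and the
dictionary standing in for a DM file's control intervals are
hypothetical, and the sketch assumes each before image holds the
full prior contents of one control interval.

```python
def roll_back(file_cis, before_images):
    """Undo one transaction's changes to a file.

    file_cis maps a control interval number to its current contents.
    before_images lists (ci_number, old_contents) pairs for a single
    transaction, oldest first, as they would appear in a journal.
    """
    # Apply newest-first so the oldest image of each CI is written last
    # and the CI ends up as it was before the transaction started.
    for ci_number, old_contents in reversed(before_images):
        file_cis[ci_number] = old_contents
    return file_cis
```

If a transaction modified the same control interval twice,
applying the images in forward order would leave the intermediate
contents in place; the reverse order restores the original.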
6 CLEANUP VARIOUS THINGS
All of crash recovery is done except for some housecleaning.
First, call file_manager_$end_of_crash_recovery to null out the
internal pointer kept for the call to
file_manager_$open_by_uid_after_crash. This is to help prevent
accidental modification to a file through this pointer after
recovery is complete. Next, close all BJ's opened in the process
of doing the above examinations and rollbacks. (Note the DM
files have already been closed.) This is done to clear the
per-bootload BJ UID to pathname table for a fresh start.
7 DELETE USELESS DIRECTORIES AND FILES
At this point, recovery is actually done. However, the old
dm_dir.<Multics_bootload_time> is simply using quota for
(usually) no good reason. If the rename_old_dms_dir flag is on,
the old directory will be renamed to
dm_dir.<Multics_bootload_time>.hold and will be available for
examination by a suitably privileged user later. Otherwise, the
directory will be deleted and its quota recovered. This step is
not necessary before the users are allowed into DMS. Although
this step is somewhat part of recovery processing, it will
actually be done as part of DMS initialization.
8 DMS SHUTDOWN
DMS crash recovery requires that some conventions be observed
by DMS shutdown. The major requirement is that the
dms_dir.BOOTLOAD not be deleted. This gives crash recovery extra
assurance that the system shut down normally (if it did), rather
than leaving it unsure whether the directory was lost in a
crash. If the state indicator in the old dm_system_data_ is set
to normal shutdown, no crash recovery need be done.
9 NOTE
One of the major assumptions of DMS recovery is that the
directory hierarchy containing the critical system-wide DMS data
can be found. Directories are not DM files, however. If the DMS
per-system hierarchy is flushed up to the root directory when an
invocation of DMS is made available to users, some possible
cases of lossage can be avoided. The use of DIRW seems extreme
for just this one instance. If this is not feasible, a problem
is still unlikely to show up in most crashes.