Multics Technical Bulletin MTB-568
DM: Rollback
To: Distribution
From: Andre Bensoussan
Date: 06/23/83
Subject: Data Management: Rollback
ABSTRACT
This MTB describes how the recovery system rolls back
unfinished transactions during normal operation, and how it rolls
back all unfinished transactions after a system crash. During
normal operation, a transaction may be rolled back by the process
that started the transaction, if it is still alive; otherwise, it
is rolled back by the Data Management Daemon process. After a
system crash, the Multics system is first initialized; then
various Daemons are logged in, in particular the Data
Management Daemon process. Its first task is to check if some
transactions were in progress at the time of the crash and, if
so, to roll them back.
Comments should be sent to the author:
via Multics Mail:
Bensoussan.Multics on System M.
via US Mail:
André Bensoussan
Honeywell Information Systems, inc.
575 Tech Square
Cambridge, Massachusetts 02139
via telephone:
(HVN) 261-9334, or
(617) 492-9334
_________________________________________________________________
Multics project internal working documentation. Not to be
reproduced or distributed outside the Multics project.
CONTENTS
Abstract
1 Introduction
2 Rolling back a transaction
   2.1 Summary of what the rollback procedure does
   2.2 Environment of the rollback procedure
   2.3 File identification
   2.4 How the rollback procedure does its job
3 Rolling back after crash
   3.1 Invoking the rollback_after_crash
   3.2 Finding all Journals and Files
   3.3 Finding the end of each before journal
   3.4 Finding the end of each after journal
   3.5 Phasing before and after journals
   3.6 Finding all unfinished transactions
   3.7 Rolling back all unfinished transactions
   3.8 Cleaning up
   3.9 Accepting users again
1 INTRODUCTION
The rollback description contained in this memo is the
logical continuation of the Before Journal Manager Design
document (MTB-560). It is published as a separate MTB only
because of its size, and can be viewed as Part II of the
Before Journal Manager Design. Part I (MTB-560) describes
what information is stored in the journal, and how it is
stored, so that it can be used later if needed. Part II (this
MTB) describes how rollback uses the information stored in
the journal.
The first portion of this memo describes how the "rollback"
primitive of the before journal manager does its job of rolling
back a single transaction, during normal system operation. This
rollback may be performed by the process that was executing the
transaction, if it is still alive, or by the Data Management
Daemon process.
The second portion describes how crash recovery determines
the state of the system at the time of a crash and rolls back
all transactions that were in progress at that time. This job
is always done by the Data Management Daemon process.
2 ROLLING BACK A TRANSACTION
Rolling back a transaction consists of several operations
executed by the before journal, the after journal, the file and
the lock managers, orchestrated by the transaction manager. The
transaction manager may perform a rollback because the
transaction has to be aborted, or because it has to be restarted
from the beginning or from a given checkpoint. To roll back a
transaction, the transaction manager takes the following steps:
(1) Call before_journal_manager_$rollback, to undo the
modifications made by the transaction, up to the beginning
or up to a specified checkpoint.
(2) Call file_manager_$flush_modified_ci, to flush all control
intervals modified by the rollback procedure while undoing
the original modifications.
(3) Call after_journal_manager_$flush_transaction, to flush all
after images produced by the transaction being rolled back,
including the after images produced by the rollback
procedure.
(4) Call before_journal_manager_$write_rolled_back_mark, to
write a mark in the before journal used by the transaction,
indicating that the transaction has been rolled back and how
far it has been rolled back.
(4a) Call before_journal_manager_$write_aborted_mark, to write a
mark in the before journal used by the transaction,
indicating that the transaction has been aborted. This step
is taken instead of step 4 when the transaction manager is
rolling back the transaction in order to abort it.
(5) Call lock_manager_$unlock_all, to unlock all locks set by
the transaction, or the portion of it, that has been rolled
back.
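The ordering of steps (1) to (5) can be sketched in Python. The stub class and the Python method names are hypothetical stand-ins for the Multics entry points listed above, not a real API; the sketch only records which entry point is called, so that the order is visible: undo, flush the rewritten control intervals, flush the after images, write the mark, then unlock.

```python
# Hypothetical Python stand-ins for the Multics managers; method names mirror
# the entry points in steps (1) to (5) but are illustrative only.

class EntryPointStub:
    """Records every entry point called on it and returns a zero status code."""

    def __init__(self, manager, log):
        self.manager = manager
        self.log = log

    def __getattr__(self, entry):
        def entry_point(*args):
            self.log.append(f"{self.manager}${entry}")
            return 0  # 0 plays the role of a zero (success) error code
        return entry_point


def roll_back_transaction(txn_id, txn_ix, checkpoint_no, aborting,
                          bjm, fm, ajm, lm):
    """Orchestrate a rollback in the order the transaction manager uses."""
    code = bjm.rollback(txn_id, txn_ix, checkpoint_no)     # (1) undo modifications
    if code != 0:
        return code           # e.g. the transaction was already committed
    fm.flush_modified_ci(txn_id)                            # (2) CIs rewritten by rollback
    ajm.flush_transaction(txn_id)                           # (3) all its after images
    if aborting:
        bjm.write_aborted_mark(txn_id, txn_ix)              # (4a) replaces step (4)
    else:
        bjm.write_rolled_back_mark(txn_id, txn_ix, checkpoint_no)  # (4)
    lm.unlock_all(txn_id)                                   # (5) release its locks
    return 0


log = []
code = roll_back_transaction(
    txn_id=42, txn_ix=3, checkpoint_no=0, aborting=True,
    bjm=EntryPointStub("before_journal_manager_", log),
    fm=EntryPointStub("file_manager_", log),
    ajm=EntryPointStub("after_journal_manager_", log),
    lm=EntryPointStub("lock_manager_", log),
)
```

Because the mark is written only after both flushes, a rolled_back or aborted mark found in the journal implies that the undo work it describes is already on disk.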
What we are interested in, here, is the
before_journal_manager_$rollback procedure, which does most of
the work, and which will be referred to as the "rollback
procedure" in the remainder of this memo.
2.1 Summary of what the rollback procedure does
The rollback procedure reads all before journal records
produced by the transaction, in reverse chronological order, from
the last record to the begin mark record (or the checkpoint mark
record specified by the caller). Each time it reads a record, it
performs the appropriate action to undo what the transaction had
done. In order to undo the modifications made to a control
interval of a protected file, the rollback procedure has to write
to this control interval again. It does so by calling the special
entry point file_manager_$unput, which restores the control
interval to its original value and causes this modification,
made by rollback, to be journalized in the After Journal
associated with the file. The after images produced during
rollback logically cancel out the original after images produced
while the transaction was in progress. No before images are
produced during rollback.
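Why the after images taken during rollback "cancel out" the original ones can be seen in a small Python model. This is a sketch under simplifying assumptions: a protected file is a dict from CI number to whole-CI contents, each journal is a list of (ci, contents) images, and roll forward starts from a backup copy and posts every after image in order, as section 3.5 says the after journal manager does.

```python
# Hypothetical model: a protected file maps CI number to its contents;
# each journal is a list of (ci, contents) images in write order.

def put(pfile, ci, new, before_journal, after_journal):
    """Normal modification: before image first, then the write and its after image."""
    before_journal.append((ci, pfile[ci]))
    pfile[ci] = new
    after_journal.append((ci, new))


def unput(pfile, ci, old, after_journal):
    """Rollback write: restore the old contents; after image only, no before image."""
    pfile[ci] = old
    after_journal.append((ci, old))


def roll_forward(backup, after_journal):
    """Post every after image in order, whether or not its transaction committed."""
    pfile = dict(backup)
    for ci, contents in after_journal:
        pfile[ci] = contents
    return pfile


backup = {1: "A", 2: "P"}
pfile = dict(backup)
bj, aj = [], []
put(pfile, 1, "B", bj, aj)          # the transaction modifies CI 1 twice, CI 2 once
put(pfile, 2, "Q", bj, aj)
put(pfile, 1, "C", bj, aj)
for ci, old in reversed(bj):        # rollback reads before images newest first
    unput(pfile, ci, old, aj)

recovered = roll_forward(backup, aj)
```

After the rollback, both the live file and the rolled-forward copy hold the original values, even though roll forward never asks whether the transaction committed.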
2.2 Environment of the rollback procedure
The rollback procedure may be executed by the process that
was executing the transaction, or by the Data Management Daemon
process, a daemon process associated with the data management
system. While this function is performed, other transactions may
be in progress concurrently, and several transactions may be
rolled back concurrently by different processes.
In order to work properly, the rollback procedure expects
all tables used by the file, transaction, before journal, after
journal and lock managers to be in a consistent state.
2.3 File identification
Each before image record is produced by a transaction before
it modifies a file, and contains the identification of the file
in two forms: the file opening id and the file unique id (uid).
When the rollback is performed by the process that was executing
the transaction, the file opening id is used by the rollback
procedure to refer to the file when calling the file manager.
However, when the rollback is performed by the daemon
process, the file opening id cannot be used, since it is
meaningful only in the original process. Instead, the file uid
is used to search a uid-to-pathname conversion table, in which
all protected files are registered for as long as the rollback
mechanism may need them. This table is maintained by the open
primitive of the file manager; since rollback cannot be done
without it, it must be as safe as the before journal itself.
Ideally, it should be implemented as an index in a protected
file, whose modifications are journalized in "well known" before
and after journals; in the first release, it will be implemented
as a segment in virtual memory, carefully modified and flushed
after each modification.
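The choice between the file opening id and the file uid can be sketched as follows. The class, the `open_file` callable, the pathnames and the uids are all hypothetical; a dict stands in for the uid-to-pathname segment. The same three-way decision applies to the bj_oid/bj_uid pair in step (2) of section 2.4.

```python
import itertools


class ProcessOpenings:
    """Hypothetical per-process view of open protected files."""

    def __init__(self, uid_to_path, open_file):
        self.oid_to_uid = {}            # file opening id -> file uid, this process only
        self.uid_to_path = uid_to_path  # the uid-to-pathname table of protected files
        self.open_file = open_file      # hypothetical: pathname -> (file_oid, file_uid)

    def resolve(self, file_oid, file_uid):
        """Return a file opening id valid in this process for file_uid."""
        # 1. The oid from the before image is usable only if this process
        #    has it bound to the same uid (true in the original process).
        if self.oid_to_uid.get(file_oid) == file_uid:
            return file_oid
        # 2. The file may already be open here under another oid.
        for oid, uid in self.oid_to_uid.items():
            if uid == file_uid:
                return oid
        # 3. Otherwise convert the uid to a pathname and open the file.
        path = self.uid_to_path[file_uid]
        new_oid, opened_uid = self.open_file(path)
        if opened_uid != file_uid:
            raise RuntimeError(f"{path} no longer holds file {file_uid}")
        self.oid_to_uid[new_oid] = opened_uid
        return new_oid


# Illustration with made-up pathnames and uids.
path_to_uid = {">udd>Demo>db>accounts": "uid-7"}
uid_to_path = {uid: path for path, uid in path_to_uid.items()}
oids = itertools.count(1)


def open_file(path):
    return next(oids), path_to_uid[path]


original = ProcessOpenings(uid_to_path, open_file)
original.oid_to_uid[3] = "uid-7"     # opened as oid 3 in the user's process
daemon = ProcessOpenings(uid_to_path, open_file)

in_original = original.resolve(3, "uid-7")   # the recorded oid is valid here
first = daemon.resolve(3, "uid-7")           # the daemon must open the file itself
second = daemon.resolve(3, "uid-7")          # the daemon's own opening is reused
```

Checking the uid on the freshly opened file guards against the pathname having been reused by a different file since the table entry was made.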
2.4 How the rollback procedure does its job
The calling sequence of the rollback procedure is:
call before_journal_manager_$rollback
(txn_id, txn_ix, checkpoint_no, code)
where txn_id is the transaction id of the transaction to be
rolled back, txn_ix is the index in the transaction table of the
entry assigned to the transaction, checkpoint_no is the
checkpoint number at which the rollback is supposed to stop, and
code is a standard system error code. The major steps of this
rollback procedure can be described as follows:
(1) Locate the bj_txte info structure for the transaction to
roll back. This structure is an entry in the bj_txt table,
and contains before journal information about this
transaction.
(2) Get the bj_oid and the bj_uid from the bj_txte info. The
bj_oid must be validated against the bj_uid to determine
whether or not it can be used by the process doing the
rollback to reference the before journal. When the rollback
is done by the Data Management Daemon process, the bj_oid
will be found invalid, because it belongs to the original
process.
In any event, when the bj_oid is not bound to bj_uid in the
process doing the rollback, this process must acquire a
valid one. It does so by using the bj_uid to find the
pathname of the before journal, in the system table which
contains the list of all before journals opened in the
system. With this pathname, it opens the journal and enters
the bj_oid in the bj_txte info.
(3) Get the record id of the last record stored in the before
journal by the transaction, from the bj_txte info.
(4) Flush the before journal up to this last record to guarantee
that all records necessary for rolling back are in the file
in which the journal is written, and none of them are still
in the main memory buffer used by the before journal
manager.
(5) Read the last record produced by the transaction by calling:
call bj_storage_get (bj_oid, record_id,....)
If the last record produced by the transaction is a
committed or aborted mark, return a status code to the
caller, indicating that the transaction has been committed
or aborted, and that it cannot be rolled back. This case
may occur if the process executing the transaction lost
control while the transaction was being committed, after the
commit mark was logically written in the journal but before
the transaction manager could be informed that the commit
mark was physically on disk.
(6) Analyse the record just read from the journal and take the
appropriate action, according to its type:
(a) If it is a "before_image" record, use its contents to
undo the modification it is supposed to undo; then read
the previous record produced by the transaction in this
journal and go back to step (6): "Analyse the record
just read...".
In order to undo the modification associated with this
before image record, the rollback procedure has to call
the file manager to write in some control interval.
The identification of the file is found in the before
image record in the form of the file_oid and the
file_uid. The file_oid must be validated to make sure
it is bound to the file_uid. If the rollback is done
by the Data Management Daemon process, the file_oid
will, in general, be invalid and the file_oid for the
file in the daemon process must be used when calling
the file manager to write in the control interval.
In the event that this file is not open in the process
that does the rollback, it has to be opened: the
file_uid is found in the before image; it is used to
search the table containing the list of all protected
files open in the system (or that were open at the time
of the crash, as explained in the next section), in
order to determine the pathname of the file; then the
pathname is used to open the file, and the new file_oid
is used instead of the file_oid stored in the before
image.
The rollback procedure can now call the file manager to
write the appropriate portions of the control interval,
with the understanding that it is a rollback action:
no before image is taken, but an after image must be
taken, as for any other modification, in order to cancel
out the after image produced when the modification was
done by the transaction itself. A
special entry point file_manager_$unput is provided by
the file manager, for rolling back modifications.
To take an after image, the file manager must call the
after journal manager with the aj_oid of the after
journal. It can find the pathname and aj_uid of the
after journal in the file attributes stored in control
interval zero of the file. If the after journal is not
open in the process doing the rollback, it must be opened,
and the aj_oid obtained is then used in subsequent
references to this after journal.
(b) If it is a "rollback_handler" record, the name of the
procedure to be called is extracted from the record, an
entry variable is initialized to the value of this
entry point and the entry point is called, with the bit
representation of the input data it expects to do its
job; this bit string is also extracted from the before
journal record. When the handler returns, the previous
record produced by the transaction in the before
journal is read and control is transferred back to step
(6): "Analyse the record just read...".
(c) If it is a "committed" or an "aborted" mark, this is a
system error, unless this record is the last record
produced by the transaction, as explained above, in
step 5.
(d) If it is a "rolled_back" mark, it indicates that the
transaction has been rolled back up to a checkpoint, or
up to the beginning. This mark contains a pointer to
the record up to which the transaction has already been
rolled back.
So, when encountering a rolled_back mark, the rollback
procedure skips all the previous records that were
already used in a previous rollback, and goes directly
to the checkpoint record where the previous rollback
stopped. Thus, it reads the record pointed to by the
rolled_back record and goes back to step (6): "Analyse
the record just read...".
(e) If it is a "checkpoint" mark and its checkpoint number
is greater than the checkpoint number at which the
rollback procedure is supposed to stop, then read the
previous record produced by the transaction and go back
to step (6): "Analyse the record just read...".
(f) If it is a "begin" mark or a "checkpoint" mark with a
checkpoint number equal to the checkpoint number at
which the rollback procedure is supposed to stop, no
more records need to be read, and control goes to the
next step. (The begin mark is equivalent to the mark
for checkpoint 0.)
(7) Remember, in the bj_txte info structure for this
transaction, the record id of the last record read, which is
either a begin mark or a checkpoint mark. This record id
will be stored later in the rolled_back record, indicating
that the rollback has been physically completed. Now return
to the caller, i.e., the transaction manager.
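Steps (5) to (7) can be condensed into one loop. The following Python sketch assumes a hypothetical record layout: the journal is a dict from record id to record, and each record's "prev" field holds the id of the previous record written by the same transaction. The `undo` and `call_handler` callables stand in for file_manager_$unput and for calling a rollback handler through an entry variable.

```python
class TransactionFinished(Exception):
    """Step (5): the last record is a committed or aborted mark."""


class JournalInconsistent(Exception):
    """A record was found where it should not appear."""


def rollback(journal, last_record_id, checkpoint_no, undo, call_handler):
    """Undo a transaction back to checkpoint_no (0 means the begin mark).

    Returns the id of the begin or checkpoint record where it stopped, which
    step (7) remembers for the rolled_back mark.
    """
    record_id = last_record_id
    record = journal[record_id]                          # step (5)
    if record["type"] in ("committed", "aborted"):
        raise TransactionFinished(record["type"])
    while True:                                          # step (6)
        kind = record["type"]
        if kind == "before_image":                       # (a) undo one modification
            undo(record["file_uid"], record["ci"], record["old"])
            record_id = record["prev"]
        elif kind == "rollback_handler":                 # (b) call the recorded handler
            call_handler(record["handler"], record["data"])
            record_id = record["prev"]
        elif kind == "rolled_back":                      # (d) skip what was already undone
            record_id = record["rolled_back_to"]
        elif kind == "checkpoint" and record["number"] > checkpoint_no:
            record_id = record["prev"]                   # (e) keep going
        elif kind == "begin" or (kind == "checkpoint"
                                 and record["number"] == checkpoint_no):
            return record_id                             # (f) stop here
        else:                                            # (c), or a checkpoint below target
            raise JournalInconsistent(f"record {record_id}: {kind}")
        record = journal[record_id]


journal = {
    1: {"type": "begin", "prev": None},
    2: {"type": "before_image", "prev": 1, "file_uid": "uid-7", "ci": 5, "old": "x"},
    3: {"type": "checkpoint", "prev": 2, "number": 1},
    4: {"type": "before_image", "prev": 3, "file_uid": "uid-7", "ci": 5, "old": "y"},
    5: {"type": "rollback_handler", "prev": 4, "handler": "demo_handler",
        "data": b"\x01"},
    6: {"type": "before_image", "prev": 5, "file_uid": "uid-7", "ci": 6, "old": "z"},
}
undone, handled = [], []
undo = lambda uid, ci, old: undone.append((ci, old))
call_handler = lambda name, data: handled.append(name)

stop1 = rollback(journal, 6, 1, undo, call_handler)   # back to checkpoint 1

# The transaction resumes after a rolled_back mark, then is rolled back fully.
journal[7] = {"type": "rolled_back", "prev": 6, "rolled_back_to": stop1}
journal[8] = {"type": "before_image", "prev": 7, "file_uid": "uid-7", "ci": 7, "old": "w"}
stop2 = rollback(journal, 8, 0, undo, call_handler)

journal[9] = {"type": "committed", "prev": 8}
try:
    rollback(journal, 9, 0, undo, call_handler)
    status = "rolled back"
except TransactionFinished as finished:
    status = str(finished)
```

In the second call the rolled_back mark sends the loop straight to checkpoint 1, so the images already undone by the first call (records 4 to 6) are not applied again.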
As explained earlier, the transaction manager must now flush all
control intervals that have been modified during the rollback,
flush all after journal records produced during the rollback, and
wait for all I/O's to complete. Finally, it appends a
rolled_back mark at the end of the before journal, flushes the
mark and waits for it to be physically on disk.
3 ROLLING BACK AFTER CRASH
As described in MTB-564, the system will guarantee that a
modification made to a CI of a protected file is never written to
disk before its before image is physically on disk. As a result,
it will be possible to roll back after any system crash, whether
or not ESD (emergency shutdown) was successful, provided no data
was damaged by a media failure. A complete description of the recovery after a
system crash can be found in MTB-603: "Data Management - Crash
Recovery".
3.1 Invoking the rollback_after_crash
After the Multics system has been initialized, the Multics
initializer process logs in the Data Management Daemon process.
This Daemon is responsible for initializing the Data Management
System, but before doing so, it finds out if some transactions
were left unfinished in the previous Multics system invocation,
in which case it rolls them back.
If the system crashed with ESD successfully executed, all
information contained in the various tables used by the
transaction manager, before journal manager, after journal
manager, file manager, lock manager has been written to disk and
could be used by the Data Management Daemon. If ESD failed,
these tables cannot be trusted and the Daemon process must be
able to recover without them. The description that follows
assumes that these tables are lost. Some of the steps described
here might be skipped or simplified when these tables are
available, if one decides to take advantage of that knowledge.
In the current implementation, no table is assumed to be valid,
regardless of whether or not ESD was successful, except for the
uid-pathname tables maintained by the file and journal managers.
3.2 Finding all Journals and Files
The first thing the Daemon process has to do is to find out
what journals were in use at the time of the crash, and to
prepare them again for its own use. The "open" primitive of the
before journal manager maintains a table containing the pathnames
and uids of all before journals opened in the system, i.e.,
opened in at least one process. This table is flushed after
every modification and is available after a system crash, even if
ESD fails. The Daemon knows the pathname of the segment
containing the table; it initiates it, and opens, for itself, all
before journals that are listed in the table, by calling the
before journal manager special entry point
"$open_all_after_crash".
A similar table, maintained by the "open" primitive of the
after journal manager, contains the pathnames and uids of all
after journals that were opened in the system. The Daemon uses
it to open, for itself, all after journals that were opened at
the time of the crash.
A third table, maintained by the "open" primitive of the
file manager, contains the pathnames and uids of all protected
files that were opened in the system at the time of the crash.
The Daemon process initiates this table but does not open all the
files listed in it. The table will be used during the actual
rollback, to convert file uid's found in before images into
pathnames.
These three tables must always be consistent, and available
after a crash even when ESD fails. They are necessary to roll
back after a system crash, and must be as safe as the journals
themselves.
3.3 Finding the end of each before journal
For each before journal, the Daemon must find the last
record physically written in the journal, and such that all
records produced before it are also physically on disk.
Assuming that the before journal manager tables are not
available, one has to find the end of the before journal using
the fact that the journal is written sequentially, and that each
control interval contains the time at which it was written in the
journal. The header of the before journal, stored in CI zero,
contains the first CI number and the last CI number of the
journal. A search on the time stored in each CI is used to
determine the most recently written CI of the journal. Then,
within this CI, the last logical record is located. The storage
manager module of the before journal manager provides the
appropriate services for the Daemon process to find the end of
each before journal.
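One way to perform the search on CI times is a binary search, sketched below in Python. It rests on a hypothetical assumption not stated in this memo: that the journal is written circularly, starting at the first CI, with strictly increasing write times, and that a CI never written holds time 0. Under that assumption the times rise, drop at most once at the wrap point, then rise again; a plain linear scan for the largest time would also work.

```python
def most_recent_ci(times, first_ci):
    """Return the number of the most recently written CI, or None.

    times[i] is the write time stored in CI number first_ci + i, and 0 marks
    a CI never written. Assumes circular writing from first_ci with strictly
    increasing write times.
    """
    if not times or times[0] == 0:
        return None                       # nothing was ever written
    # Find the last index whose time is >= times[0]: every CI from 0 to that
    # index was written in the current pass; every CI after it is older
    # (previous pass) or was never written (time 0).
    lo, hi = 0, len(times) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if times[mid] >= times[0]:
            lo = mid
        else:
            hi = mid - 1
    return first_ci + lo


wrapped = most_recent_ci([10, 11, 12, 5, 6, 7, 8, 9], first_ci=1)   # 3
not_wrapped = most_recent_ci([5, 6, 7, 0, 0, 0], first_ci=1)         # 3
full = most_recent_ci([1, 2, 3, 4], first_ci=1)                      # 4
```

The search reads about log2(n) control intervals instead of n, which matters for a large journal read right after a crash; locating the last logical record inside the chosen CI is a separate step.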
3.4 Finding the end of each after journal
For each after journal, the Daemon must find the last record
physically written in the journal and such that all after journal
records produced before it are also physically in the journal.
If the after journal is on disk, a method similar to that
described for the before journal can be used. If the after
journal is on tape, the end of the tape has to be found, and the
tape positioned to the end. The after journal manager will
provide a utility procedure to do just that, and it will be
called by the Daemon process to find the end of each after
journal.
3.5 Phasing before and after journals
The strategy that has been chosen for the after journal
manager when rolling forward is to post every single after image
found in the after journal, without trying to determine if it was
produced by a committed or an aborted transaction. This strategy
requires "taking after images during rollback" as explained in
the description of the rollback procedure.
However, this is not quite sufficient. Since the before and
after journals are not phased during normal operation, an after
image may be physically written in the after journal before the
corresponding before image is physically written in the before
journal. After a crash, it is therefore possible to
have after images in the after journal which do not have their
before image counterpart in the before journal. Taking after
images during rollback_after_crash would not cancel out these
after images.
GCOS solves this problem by phasing the before and after
journals during normal operation to guarantee that this situation
cannot occur. It is difficult to use the same method in Multics.
Instead, we let after images and before images be physically
journalized without trying to phase them. After a system crash,
the ends of all journals are examined and analysed, and all after
images that have no before image are eliminated from the after
journals. A detailed description of how this is done can be
found in MTB-569: "DM: Phasing before and after journals". The
after journal manager provides a procedure to do this job and the
Daemon calls this procedure to "cleanup" the end of all after
journals.
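MTB-569 gives the actual method. The Python sketch below only illustrates the effect described here, under a hypothetical assumption: each after image carries a reference to the before image it corresponds to (for an after image written by file_manager_$unput, the before image being undone). Under that assumption, cleaning up the end of an after journal reduces to a filter.

```python
def cleanup_after_journal(after_images, before_ids_on_disk):
    """Drop after images whose before-image counterpart never reached disk.

    after_images: (record_id, ci, contents, before_ref) in journal order, where
    before_ref is a hypothetical reference to the matching before image.
    before_ids_on_disk: ids of before images up to the end of the before journal.
    """
    return [image for image in after_images if image[3] in before_ids_on_disk]


before_end = 3                      # last before journal record found on disk
before_ids_on_disk = set(range(1, before_end + 1))
after_images = [
    (101, 5, "B", 1),
    (102, 6, "Q", 2),
    (103, 5, "C", 4),               # its before image (record 4) was lost
]
kept = cleanup_after_journal(after_images, before_ids_on_disk)
```

Image 103 must go: since a CI is never written before its before image is on disk (MTB-564), the file itself never received that modification, and no rollback can produce an after image to cancel it.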
3.6 Finding all unfinished transactions
Each before journal contains before images of finished and
unfinished transactions. The only information one has so far is
the record id of the last record for each journal. By reading
the before journal in reverse chronological order, from the most
recent to the least recent record, it is possible to determine
which transactions have been committed or aborted, and which ones
were still in progress at the time of the crash; while reading
the before journal in reverse order, one can build the list of
all unfinished transactions, with the record id of the last
record produced by each of them.
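The full backward scan can be sketched in Python, assuming for illustration that each before journal record is reduced to a (record_id, txn_id, kind) tuple. The first record met for a transaction while reading backwards is the last one it produced; if that record is a committed or aborted mark, the transaction is finished.

```python
def find_unfinished(records):
    """Return {txn_id: last_record_id} for transactions still in progress.

    records: (record_id, txn_id, kind) in chronological order, ending with the
    last record physically on disk.
    """
    finished = set()
    unfinished = {}
    for record_id, txn_id, kind in reversed(records):
        if txn_id in finished or txn_id in unfinished:
            continue                  # already classified from a later record
        if kind in ("committed", "aborted"):
            finished.add(txn_id)
        else:
            unfinished[txn_id] = record_id   # first seen backwards = last written
    return unfinished


records = [
    (1, "T1", "begin"), (2, "T2", "begin"), (3, "T1", "before_image"),
    (4, "T1", "committed"), (5, "T2", "before_image"), (6, "T3", "begin"),
]
unfinished = find_unfinished(records)
```

A rolled_back mark does not finish a transaction: after a rollback to a checkpoint the transaction is still in progress, so the scan treats that mark like any other record.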
Reading the entire before journal to find out which
transactions were in progress can make rollback after a crash
take a long time. A number of
alternatives are available to find all transactions in progress
without having to read the entire journal. They all consist of
writing historical information in the journal, showing that, at a
particular point in time, only N transactions were in progress.
When reaching that point while reading the journal backwards, the
rollback_after_crash procedure can start a count down until it
finds the corresponding N begin marks. The more frequently this
historical information is stored, the sooner the count down can
be started, making the search shorter. One could:
(1) Store periodically in the header of each before journal the
number of transactions in progress in this journal and the
time of this observation, or
(2) Maintain for each before journal a count of transactions in
progress by incrementing this count at each write_begin_mark
operation and decrementing it at each write_committed_mark
and write_aborted_mark operation. Store this count in each
begin, committed and aborted record, or
(3) Store this count in every before journal record.
The current implementation uses method number (3).
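With method (3), the backward scan can stop early. The Python sketch below assumes, hypothetically, that the count stored in a record is the number of transactions in progress just after that record was written; the count in the last record is then the number N of transactions unfinished at the crash, and the scan counts down one for each begin mark of an unfinished transaction.

```python
def find_unfinished_with_count(records):
    """Backward scan that stops once every unfinished begin mark is found.

    records: (record_id, txn_id, kind, in_progress) in chronological order,
    where in_progress is the count stored in every record (method 3).
    Returns ({txn_id: last_record_id}, number_of_records_read).
    """
    if not records:
        return {}, 0
    remaining = records[-1][3]        # N: transactions in progress at the crash
    finished, unfinished = set(), {}
    read = 0
    for record_id, txn_id, kind, _count in reversed(records):
        if remaining == 0:
            break                     # count down complete: stop reading
        read += 1
        if txn_id in finished:
            continue
        if kind in ("committed", "aborted"):
            finished.add(txn_id)
            continue
        unfinished.setdefault(txn_id, record_id)   # first seen = last written
        if kind == "begin":
            remaining -= 1            # one unfinished transaction fully accounted for
    return unfinished, read


records = [
    (1, "T0", "begin", 1), (2, "T0", "committed", 0),
    (3, "T1", "begin", 1), (4, "T2", "begin", 2),
    (5, "T1", "before_image", 2), (6, "T1", "committed", 1),
    (7, "T2", "before_image", 1), (8, "T3", "begin", 2),
]
unfinished, read = find_unfinished_with_count(records)
```

Here the scan stops after five of the eight records: once the begin marks of T2 and T3 are found, nothing older can belong to an unfinished transaction.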
3.7 Rolling back all unfinished transactions
Now we have the list of all transactions in progress and the
record id of the last record produced by each of them. In
addition, we know that the after journals have been cleaned up of
any after image that had no before image counterpart. Rolling
back these transactions can start safely.
The rollback procedure described in the previous section can
be used to rollback these unfinished transactions one after the
other, if it is provided with the environment it expects; that
is, all tables used by the transaction manager, before journal
manager, after journal manager, file manager, and lock manager
must be initialized to give the rollback procedure the impression
it is called during normal operation. This technique will be
used instead of writing another rollback procedure.
It is also possible to use the transaction manager to
roll back or abort each transaction; this would cause
"rolled_back" or "aborted" marks to be written in the before
journal, after all appropriate flushing operations have been
done. Since the checkpoint facility is not provided in the
current system implementation, all unfinished transactions are
aborted. Rolling back all transactions can be described as
follows:
(1) Initialize all tables showing that N transactions are in
progress.
(2) For each transaction in progress, call the transaction
manager to abort the transaction, as if it were during normal
system operation.
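The two steps above can be sketched in Python. The `abort` callable is a hypothetical stand-in for the transaction manager's abort entry, which performs the rollback, the flushes, the aborted mark and the unlocking described in section 2; the table layout is likewise illustrative.

```python
def abort_all_unfinished(unfinished, abort):
    """Abort every unfinished transaction as if during normal operation.

    unfinished: {txn_id: last_record_id} found by the backward scan.
    abort: hypothetical stand-in for the transaction manager's abort entry.
    """
    # Step (1): rebuild a table showing N transactions in progress, so the
    # rollback procedure finds the environment it expects.
    txn_table = {
        txn_id: {"state": "in_progress", "last_record_id": last_id}
        for txn_id, last_id in unfinished.items()
    }
    # Step (2): abort each one through the normal path.
    for txn_id, entry in txn_table.items():
        abort(txn_id, entry["last_record_id"])
        entry["state"] = "aborted"
    return txn_table


calls = []
table = abort_all_unfinished({"T2": 7, "T3": 8},
                             lambda txn_id, last_id: calls.append((txn_id, last_id)))
```

Reusing the normal abort path, rather than a crash-specific rollback, is the design choice stated above: the only crash-specific work is rebuilding the tables.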
3.8 Cleaning up
After all transactions have been aborted, all before
journals, after journals and protected files that have been
opened by the Daemon to do its rollback task are closed by
calling the "close" primitives of the before journal manager,
after journal manager and file manager.
3.9 Accepting users again
The Daemon process now enables the Data Management System
for all users, by renaming to the appropriate name the directory
in which the various tables reside. Then it goes to sleep,
waiting for a request to execute (See MTB-603 and MTB-604).