MULTICS TECHNICAL BULLETIN                                MTB-745

To:       MTB Distribution
From:     Paul Farley
Date:     June 5, 1986
Subject:  Add Save/Restore to the BCE command set.

This MTB describes the Bootload Multics (BCE) version of the physical volume save/restore. This is part of the continuing enhancement of the BCE command set to pick up those BOS functions that are still required now that BOS is being phased out. This MTB deals only with saving to and restoring from tape. Disk-to-disk copying is done with the BCE copy_disk command, written by Keith Loepere, and is not covered by this MTB.

This is the first revision of MTB-745. It reflects changes made thus far in the design. This version also contains a documentation appendix with several sub-sections: some contain the subsystem info segments and the others describe the documentation changes required to the manuals.

Comments on this MTB should be directed to:

     via System-M forum:
          >udd>Multics>Farley>mtgs>BCE_Save_Restore.forum (bsr)

     via Multics mail:
          Farley.Multics on System-M

     or by phone to:
          Paul Farley
          HVN: 249-6776
          DDD: 602-249-6776

_________________________________________________________________

Multics Project internal working documentation. Not to be reproduced or distributed outside the Multics Project.

MTB-745                                         BCE Save/Restore

CONTENTS

1: Introduction
2: The Save Operation
   2.1: Save Syntax
   2.2: Pre-Save Processing
   2.3: The Save Loop
   2.4: Save Restart
3: The Restore Operation
   3.1: Restore Syntax
   3.2: Pre-Restore Processing
   3.3: The Restore Loop
   3.4: Restore Restart
4: Control File Requests
5: IOI at BCE
6: I/O Management
   6.1: Tape Error Recovery
      6.1.1: Data Alerts
      6.1.2: Unrecoverable Errors
7: Tape Format
   7.1: Record Header
   7.2: Tape Label
   7.3: Volume Info
   7.4: Volume Preamble
   7.5: Notes
   7.6: Example Tape Layout
Appendix A: Documentation
   A.1: Save Info
   A.2: Restore Info
   A.3: AM81 Changes
      A.3.1: Section-1
      A.3.2: Section-9
      A.3.3: Section-10
      A.3.4: Section-12
      A.3.5: Appendix-H

1: INTRODUCTION

This MTB describes the BCE program that is taking over the role of saving and restoring a physical volume. This function was previously done by the BOS functions SAVE and RESTOR. BCE is taking over all of the required functions of BOS, as described in MTBs 631 & 651. BOS will not be supported for MR12.

Even with the two current on-line backup mechanisms it is necessary to have a backup capability at BCE for quicker recovery when a major problem arises. This method of backup is extremely useful when small test systems are being used.

The BCE save/restore functions are performed with the volumes in a static, dismounted state. This allows a snapshot of the volumes to be quickly saved by use of the volume map, and the snapshot restored simply by writing the records back to their original locations.
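The snapshot idea above can be illustrated with a minimal sketch. It assumes a volume map modelled as a list of booleans (one per disk record) and a tape modelled as an in-memory list of (record number, data) pairs; both are simplifications made up for illustration and are not the real vol_map or tape record formats, which are defined later in this MTB.

```python
def save_volume(volume_map, read_record, tape):
    """Append (record_number, data) to the tape for every record marked in use.

    volume_map  -- list of booleans, True when the record is in use (assumed model)
    read_record -- callable returning the data of a given disk record
    tape        -- list standing in for the save tape (assumed model)
    """
    for record_number, in_use in enumerate(volume_map):
        if in_use:
            tape.append((record_number, read_record(record_number)))


def restore_volume(tape, write_record):
    """Write every saved record back to the record number it came from."""
    for record_number, data in tape:
        write_record(record_number, data)


# A six-record "disk" on which records 2 and 4 are free.
disk = {0: b"label", 1: b"vtoc", 2: b"", 3: b"page-a", 4: b"", 5: b"page-b"}
volume_map = [True, True, False, True, False, True]

tape = []
save_volume(volume_map, disk.__getitem__, tape)

restored = {}
restore_volume(tape, restored.__setitem__)
```

Because each saved record carries its own record number, the restore needs no knowledge of the volume map; it only replays the pairs in order.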
The BCE version includes several enhancements: processing multiple sets (up to 4) of save or restore information, tape error recovery, various restart options and new location techniques for restoring volumes or partitions.

BCE programs operate under several constraints that need to be mentioned so that the reader of this MTB will understand why some of the small sizes and restrictions exist.

o    Only ONE process/processor/program is in execution. So any problems that occur in one save or restore set WILL affect the execution of the others.

o    BCE is limited to executing in the first 512 pages of memory. This means that temp segments are few in number (currently 12) and small in size (currently 9 pages). Also the stack is limited in size (currently 24 pages). Each save/restore set uses two temp segments, one for the tape device's IOI workspace and the other to hold several internal structures like the tape label and current volume label.

o    Many segments that are callable while Multics is running are not available at BCE. These include all the hardcore segments contained in collections 2.0 and 3.0 and all segments in the normal file system hierarchy.

This MTB covers the following topics:

o    The Save Operation. This describes the major aspects of doing a save.

o    The Restore Operation. This describes the major aspects of doing a restore.

o    Control File Requests. This describes the available requests to define the input and output when doing a save or restore.

o    IOI at BCE. This describes the changes necessary to allow IOI to run while at BCE, which is required to perform the tape I/O.

o    I/O Management. This describes the internal design that is being used to take the data from disk to tape and vice versa.

o    Tape Format. This defines the layout of the various tape records and how they fit together.

2: THE SAVE OPERATION

The BOS SAVE function required the tape devices that were to be used be supplied on the command line. The only tape devices that could be used had to be part of the bootload tape subsystem. It would then get the volumes to be processed by querying the operator for each. This proved to be very time consuming for sites that have very large disk subsystems. They turned to the BOS RUNCOM mechanism to automate the process, where the command and requests were placed in files.

The BCE version skips the operator query mechanism and gets all of its input from control files. The control files are saved in the "file" partition of the RPV. The control file format is quite flexible in the ways for specifying the physical volumes and partitions to be saved and tape devices to be used. All information for a set can be specified in a single control file, or tape device information can be in one control file with physical volume and partition information in other files. All control files defining a set can be given individually in the command line; or they can be referenced by a single grouping control file which is named in the command line; or chained together with the first control file named in the command line; or some combination of the above.

To speed up the save process and cut down on system down time, it was felt that the ability to manage multiple sets of save requests was needed. A "set" is defined as one or more tape devices that will be used to record the data from one or more physical volumes. This is all contained in a collection of control files. A maximum of four sets may be specified. The restriction of four sets stems from two reasons. The first is that the space required for more than four sets of internal structure storage would either increase the stack frame size beyond the PL1 limit or cause usage of more than the available temp segments.
The second is that an operator would probably have a hard time managing all the tape activity.

The collection of tape volumes that are used to perform the save is defined to be the "tape_set". The program requires that each tape_set be named. The name is defined by use of the tape_set control file request and can be from 1 to 32 characters in length. The tape set name is contained in parentheses in all the output messages to allow the operator to differentiate sets (all the examples in this MTB use a tape_set name of "blue"). It is also recorded in the tape label of each tape, which is used during a restore for validation. The tapes are numbered from 1 to N with a final information tape labeled "Info", which is the first tape to be read during a restore.

2.1: Save Syntax

Syntax:

     save {-set} CF_1 {... CF_N} {-set CF_1 {... CF_N}}
          {-restart_set CF_1 {... CF_N}}

Arguments

-set CF_1 {... CF_N}
     This defines the control file(s) specifying the tape devices, physical volumes and/or partitions in each save SET. The first "-set" argument is not required. Control files after each "-set" argument become part of that SET. See the "Control File Requests" section for details of each request. A maximum of 4 SETs and up to a total of 32 control files may be specified per save. A control file cannot be specified multiple times for a given set, but can be specified in more than one set. This can be used to save a set of volumes to several sets of tapes at one time.

-restart_set CF_1 {... CF_N}, -restart CF_1 {... CF_N}, -rt CF_1 {... CF_N}
     This argument is to be used in place of the "-set" argument when saving a SET is to be restarted from the beginning of a given tape. This allows for some interrupted save sets to be restarted and others to start from the beginning.

2.2: Pre-Save Processing

For each set, the program sets up an internal list of tape devices and volumes/partitions to process by scanning the control file(s).
If any errors are detected in the control files an error message is produced describing the error (giving the line number and file name) and the program is exited. After the lists have been set up it surveys each of the tape devices requested to verify that they are accessible and removes any that are not. The first usable tape device in the set is then attached. Each of the disk labels is read and a display/check of the information is done. If a problem is detected the volume is removed from the "to-be-processed" list. This process is duplicated for each save SET. Below are examples of the information that is displayed. Messages that indicate a possible problem will have three asterisks (***) displayed on the console starting in column 76, not shown here.

     save(blue): The following tape devices will be used:
          tapa_01 tapa_02 tapa_05 tapa_10 tapb_01 tapb_04 tapb_06

     save: Drive not ready.
          Could not read label of pub03 on dskc_12.

When the above error occurs an operator query of the form "save(blue): Do you want to retry or remove the pv?" will be displayed. It will be up to the operator to either have the problem corrected and input "retry", or input "remove" to continue.

     save(blue): Volume on dska_11 is not a Multics Storage System Volume.
          Removing from PV list.

     save(blue): Multics Storage System Volume pub_04 on dskc_10
          Last updated: 02/07/86 1209.2 mst Fri
          Partition save: 5400 for 256 records
          Partition dump: 45345 for 3500 records

     save(blue): Multics Storage System Volume rpv on dskb_12
          Last updated: 02/07/86 1209.2 mst Fri
          Partition conf: 3908 for 4 records
          Partition dump: 34091 for 3500 records
          Partition log: 37591 for 256 records

     save(blue): Partition foo is not defined on rpv.
          Removing from partition list.

     save(blue): The "file" partition of rpv is not being saved.

     save(blue): Multics Storage System Volume pub01 on dskb_15
          Last updated: 02/07/86 1209.2 mst Fri

     save(blue): Volume was expected to be root3.
          Removing from PV list.

     save(blue): Multics Storage System Volume list01 on dskb_10
          Last updated: 02/07/86 1209.2 mst Fri

     save(blue): Volume list01 requires salvaging.
          Setting -all to save all paging records on the volume.

Volume salvaging is required when the time the vol_map was last updated differs from the time the volume was unmounted and the time of the last volume salvage is earlier than the last unmount time.

Prior to beginning the actual save process the operator is given the query, "save: Would you like to continue?". This gives the operator a chance to examine all of the previous messages for correctness before the program begins.

2.3: The Save Loop

Prior to the execution of the save the operator may pre-mount the first tape of each save set. If the tapes are not mounted the following will occur.

     save(blue): Please mount tape# 1 on tapa_10.

The program will go into a loop waiting for a special interrupt from the device. If after two minutes a tape is not mounted, the following query will occur.

     save(blue): Would you like to skip to the next tape device?

The operator will be required to input one of the following responses.

yes, y
     This device is skipped and the next device is selected. The tape mount is then checked in the same manner. The skipped device remains as part of the available tape devices.

no, n
     The device is not skipped. The loop for checking the mount is re-entered.

remove
     This device is removed from the list and the next device is selected. The tape mount is then checked in the same manner.

help, ?
     This causes the program to display the above possible responses, with a small description of each.

The tapes are internally labeled with a tape sequence number that is displayed along with the volume or partition information. Each record header written on the tapes also contains a unique ID of the entire save set.
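
The mount-wait loop and its operator responses, described earlier in this section, can be sketched as follows. This is only an illustration under stated assumptions: `tape_mounted` stands in for "a special interrupt arrived within the two-minute wait", `ask_operator` stands in for the console query, and wrapping around to the first device after the last one is skipped is a guess that the MTB does not confirm.

```python
def select_mounted_device(devices, tape_mounted, ask_operator):
    """Return the first device found with a tape mounted, or None.

    devices      -- list of device names; 'remove' deletes entries in place
    tape_mounted -- callable(device) -> bool (hypothetical stand-in for the
                    two-minute wait on a special interrupt)
    ask_operator -- callable(device) -> response string (hypothetical)
    """
    index = 0
    while devices:
        device = devices[index % len(devices)]  # wrap-around is an assumption
        if tape_mounted(device):
            return device
        answer = ask_operator(device)
        if answer in ("yes", "y"):
            index += 1                  # skip; device stays in the list
        elif answer == "remove":
            devices.remove(device)      # next device now sits at this index
        elif answer in ("help", "?"):
            print("yes, y / no, n / remove / help, ?")
        # "no", "n" (and anything else): re-enter the wait on the same device
    return None
```

For example, with devices tapa_10, tapa_01 and tapa_02, answering "yes" for tapa_10 and "remove" for tapa_01 leaves tapa_10 available for later tapes while the tape on tapa_02 is used.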
At the beginning of each tape, disk volume or partition a message is displayed that defines the current volume, where on the volume the save is and what tape it is currently writing. Examples are:

     save(blue): Volume root2, record 34080, on tape# 3 (tapa_02)

     save(blue): Partition dump on root2, record 34091, on tape# 3 (tapa_02)

Prior to the dismount of a tape reel a message of the following form is displayed.

     save(blue): Unloading tape# 1 from tapa_01, 8632 records (27 errors)

The records to be saved from each physical volume are defined by several means. First, if the volume is in an inconsistent state (requiring salvaging) or "-all" is specified in the control file, all vtoc and paging records are saved. Otherwise only the paging records that are used, as defined in the vol_map, and the vtoc area up to the last vtoc record in use are saved. If any partition areas are selected, by using the partition request, then the areas are merged in with the paging records and are written in record number sequence. If only volume partitions are to be saved, then the records to be saved are defined by the extent of each partition in the label.

Each record on tape begins with a header giving the disk record number from which it came. This record number is then used during a restore to write the record back to its proper location. The only time a tape record is written back to another location is when restoring only a partition, in which case records are written back to the location of the partition defined in the disk volume label.

The save process continues until all the requests have been satisfied or the operator requests that it be aborted.

Each save tape contains, as part of the tape label, progressive information about the volumes and partitions that are being saved.
Items include the tape number that a volume starts and ends on, what partitions were saved with the volume and on what tapes they begin and end.

When the save for a set is complete a final tape is written that contains only a "complete" tape label. This tape contains a label of "Info" and is always the first tape to be read when doing a restore. If no tape is mounted, when it is time to write the "Info" tape, the following message is displayed.

     save(blue): Please mount the "Info" tape on tapa_02.

If a tape is mounted an operator query is done to find out if the current tape on the device should be used as the "Info" tape or be dismounted to allow the operator to mount the correct tape. The query will require a yes or no response and have the following format.

     save(blue): OK to write "Info" tape on tapa_02?

A save can be interrupted by use of the console "request" key. When depressed while a save is in progress, the following prompt will appear.

     save: Abort request:

The operator will be required to input one of the following responses.

no, n
     This causes the program to ignore the request and resume the save.

abort
     This causes the program to abort the entire save and return to BCE command level.

restart TAPE_SET
     This allows the operator to restart the specified TAPE_SET, using its current tape device. The operator is then required to mount the "restart" tape on the device and follow the procedure as described in the "Save Restart" section below. Once the SET has been restarted, the remaining SETs will continue operation.

stop TAPE_SET
     This causes the program to abort the specified TAPE_SET, by marking it complete, and resume the save of the other sets.

help, ?
     This causes the program to display the above possible responses, with a small description of each.

2.4: Save Restart

Due to various problems that may arise while performing a save, it may be necessary to restart a set.
Restarting consists of skipping all volumes and/or partitions that have been successfully saved, restarting the save of a volume somewhere in the middle and then continuing normally with the remaining volumes. A restart must always start at the beginning of a tape. This means that the last tape label that was successfully written holds all the information of where to restart.

The program allows for various ways of restarting.

o    A previous save may have been totally aborted, or one set aborted, and is to be restarted by using the "-restart_set" argument in the command line.

o    The operator could be using the "restart TAPE_SET" response from the abort request routine above, for example because it was noticed that the last tape written had a total of 3000 write errors.

o    The operator could be using the "restart_set" or "remove_device_from_set" responses that can be given in the "Tape Error Recovery" process defined later in the MTB.

The program will read the tape label from the save tape that the operator wishes to restart from. If the tape is not already mounted the following is displayed and the normal mount procedure executed.

     save(blue): Please mount the "restart" tape on tapa_01.

After the tape label has been read the tape creation time is checked. If the time is older than one week the tape is rejected. This involves unloading the current tape and asking that another be mounted. The tape label information is used to locate all the volumes that can be skipped and what record number to start at when rewriting the tape. The following messages are displayed.

     save(blue): Skipping volume rpv on dska_16.
     save(blue): Skipping volume root2 on dska_10.
     save(blue): Skipping volume list02 on dskb_06.
     save(blue): Starting from record 3423 of volume pub01 on dskb_10.

The program then queries the operator with the following:

     save(blue): Do you want to replace or rewrite tape# 3 on tapa_01?
This query gives the operator the chance to select a different tape reel, in case the previous save was aborted because this tape contained too many errors. Below are the possible responses.

replace, rep
     This will cause the current tape to be unloaded and a new tape requested in its place.

rewrite, rew
     The tape will be rewound and used when the save begins again.

3: THE RESTORE OPERATION

A restore operation is normally performed when a volume or set of volumes has become defective and now requires restoration, when a partition needs to be reloaded, or when a saved test system needs to be reloaded. When restoring an entire volume it is not necessary for it to be init_vol'ed. Whatever data is currently on the volume will be overwritten with the data from the save tapes.

Restoring from one device type to another is prohibited, because the records are restored to the exact location from which they came. The only time that different device types are allowed is for restoring only partition information. When the new partition is larger than the saved partition, the restored data is padded with zeroes; when it is smaller, the restored data is truncated to fit the smaller partition size.

3.1: Restore Syntax

Syntax:

     restore {-set} CF_1 {... CF_N} {-set CF_1 {... CF_N}}
          {-restart_set CF_1 {... CF_N}}

Arguments

-set CF_1 {... CF_N}
     This defines the control file(s) specifying the tape devices, volumes and/or partitions in each restore SET. The first "-set" argument is not required. Control files after each "-set" argument become part of that SET. See the "Control File Requests" section for details of each request. A maximum of 4 SETs and up to a total of 32 control files may be specified per restore.

-restart_set CF_1 {... CF_N}, -restart CF_1 {... CF_N}, -rt CF_1 {... CF_N}
     This argument is to be used in place of the "-set" argument when a SET is to be restarted from the beginning of a given tape.
     This allows for some restore sets to be restarted and others to start normally.

3.2: Pre-Restore Processing

For each set, the program sets up an internal list of tape devices and volumes/partitions to process by scanning the control file(s). If any errors are detected in the control files an error message is produced describing the error (giving the line number and file name) and the program is exited. After the lists have been set up it surveys each of the tape devices requested to verify that they are accessible and removes any that are not. The first usable tape device in the set is then attached. The tape devices to be used are then displayed.

     restore(blue): The following tape devices will be used:
          tapa_01 tapa_02 tapb_05

At this time the program needs to read in the contents of the "Info" save tape. This tape contains the list of volumes and partitions that were saved and the starting and ending tape number for each. This tape is the last tape written as part of a save. This tape allows program control over what tapes are mounted, which saves a lot of time in searching tapes. The program now attempts to read the tape on the first device in the list, but if a tape is not mounted the following will appear.

     restore(blue): Please mount the "Info" tape on tapa_01.

If the tape read does not contain a label of "Info" then the program queries the operator to find out if the "Info" tape is available. If the operator answers "no" then the program will use the label information from the current tape in place of the "Info" data, which is the same format but not as complete. If the operator answers "yes" then the current tape is unloaded and the mount/label read process is restarted. If the "Info" tape is not available, then the save tape closest to the end of the save should be read in its place. This will give the program the greatest amount of information.

The volumes to be restored are sorted so that they are in the same order as they were saved.
Each of the disk labels is read and a display/check of the information is done. If a problem is detected the volume is removed from the "to-be-processed" list. This process is duplicated for each restore SET. Below are examples of the information that is displayed. Messages that indicate a possible problem will have three asterisks (***) displayed on the console starting in column 76, not shown here.

     restore: Drive not ready.
          Could not read label of pub03 on dskc_12.

When the above error occurs an operator query of the form "save(blue): Do you want to retry or remove the pv?" will be displayed. It will be up to the operator to either have the problem corrected and input "retry" or input "remove" to continue.

     restore(blue): Volume on dska_11 is not a Multics Storage System Volume.

     restore(blue): Multics Storage System Volume pub_04 on dskc_10
          Last updated: 02/07/86 1209.2 mst Fri
          Partition dump: 45345 for 3500 records

     restore(blue): Multics Storage System Volume pub01 on dskb_15
          Last updated: 02/07/86 1209.2 mst Fri

     restore(blue): Volume pub01 will become root3, as requested.

     restore(blue): Volume list_14 not found in tape label.
          Removing from PV list.

     restore(blue): Only partitions were saved for xpub_1.
          Removing from PV list.

     restore(blue): Multics Storage System Volume root3 on dskd_12
          Last updated: 02/07/86 1209.2 mst Fri

     restore(blue): Device type mis-match. root3 is on a d338,
          but was saved from a d501. Removing from PV list.

The above process is repeated for each restore set.

Prior to beginning the actual restore process the operator is given the query, "restore: Would you like to continue?". This gives the operator a chance to examine all of the previous messages for correctness before the program begins.

3.3: The Restore Loop

The program now knows the first tape to be read from the label information or at least a best guess if the first tape read was not the "Info" tape.
It attempts to read this tape on the next tape device in the list. If the tape read is not the correct tape or no tape is mounted the following message is displayed.

     restore(blue): Please mount tape# 3 on tapa_02.

The program will go into a loop waiting for a special interrupt from the device. If after two minutes a tape is not mounted, the following query will occur.

     restore(blue): Would you like to skip to the next tape device?

The operator will be required to input one of the following responses.

yes, y
     This device is skipped and the next device is selected. The tape mount is then checked in the same manner. The skipped device remains as part of the available tape devices.

no, n
     The device is not skipped. The loop for checking the mount is re-entered.

remove
     This device is removed from the list and the next device is selected. The tape mount is then checked in the same manner.

After a successful read of the current tape label, the program will check to see if another tape in the set is needed. If the tape will be needed, the following pre-mount message will be displayed.

     restore(blue): Please pre-mount tape# 7 on tapb_05.

The following message will occur each time a tape label is read.

     restore(blue): Tape# 3 on tapa_02, created 02/01/86 0014.2 mst Sat

The program uses forward-space-file tape commands to locate the starting point of a volume or partition on the save tape. A partition search is only done when restoring only partition(s) of a volume. When the item is found the following message is displayed.

     restore(blue): Volume rpv, record 0, on tape# 1 (tapa_01)

or

     restore(blue): Partition conf on rpv, record 3908, on tape# 1 (tapa_01)

Once the volume or partition has been located, the records that follow can be written to the volume. When restoring a volume the physical volume record number is located in each tape record's header.
When restoring only partition information the volume's label defines the location of the partition. The relative partition record number in the record header is added to create the new location for the data.

If "-all" was specified in the partition request for a volume, then once the volume has been restored, all partitions that were not restored from tape will be zero filled. The "bce" partition on the rpv and any "hc" and "alt" partitions are exempt from this zeroing phase.

When a volume has been restored the following message will be displayed.

     restore(blue): Restore of volume pub01 on dskb_15 is complete.

The restore process continues until all the requests have been satisfied or the operator requests that it be aborted.

A restore set can be interrupted by use of the console "request" key. When depressed while a restore is in progress, the following prompt will appear.

     restore: Abort request:

The operator will be required to input one of the following responses.

no, n
     This causes the program to ignore the request and resume the restore.

abort
     This causes the program to abort the entire restore and return to BCE command level.

restart TAPE_SET
     This allows the operator to restart the specified TAPE_SET, using its current tape device. The operator is then required to mount the "restart" tape on the device and follow the procedure as described in the "Restore Restart" section below. Once the SET has been restarted, the remaining SETs will continue operation.

stop TAPE_SET
     This causes the program to abort the specified TAPE_SET, by marking it complete, and resume the restore of the other sets.

help, ?
     This causes the program to display the above possible responses, with a small description of each.

3.4: Restore Restart

Due to various problems that may arise while performing a restore, it may be necessary to restart a set. The program allows for various ways of restarting.

o    A previous restore may have been totally aborted, or one set aborted, and is to be restarted by using the "-restart_set" argument in a new command line.

o    The operator could be using the "restart TAPE_SET" response from the abort request routine above, for example because it was noticed that the wrong disk pack was mounted.

o    The operator could be using the "restart_set" or "remove_device_from_set" responses that can be given in the "Tape Error Recovery" process defined later in the MTB.

Restarting consists of skipping all volumes and/or partitions that have been successfully restored, restarting the restore of a volume somewhere in the middle and then continuing normally with the remaining volumes. If restarting from the command line, then the "Info" tape must still be read before the "restart" tape.

The program will read the tape label from the save tape that the operator wishes to restart from. If the tape is not mounted the following is displayed and the normal mount procedure executed.

     restore(blue): Please mount the "restart" tape on tapa_01.

From the tape label the program can determine which volumes were completed on previous tapes and skip them. It then restarts the restore of the first volume on the tape that has been requested to be restored. The following messages are displayed.

     restore(blue): Skipping volume rpv on dska_16.
     restore(blue): Skipping volume root2 on dska_10.
     restore(blue): Skipping volume list02 on dskb_06.
     restore(blue): Starting from record 3423 of volume pub01 on dskb_10.

4: CONTROL FILE REQUESTS

The save/restore function gets all of its information from control files. They contain the following requests. Only one request may be given per line. Any lines in the control files that begin with /, & or " are treated as comments. All white space prior to a request in a line is trimmed.
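
The line-level rules just stated (one request per line, comment lines, leading white space trimmed) can be sketched as a small scanner. This is an illustration only, not the BCE implementation: skipping blank lines and testing for a comment after trimming are assumptions the MTB does not spell out, and the sample file contents are made up using request names from this section.

```python
COMMENT_PREFIXES = ("/", "&", '"')


def scan_control_file(text):
    """Return (line_number, request_name, arguments) for each request line."""
    requests = []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.lstrip()                 # leading white space is trimmed
        if not line:                        # assumption: blank lines are skipped
            continue
        if line.startswith(COMMENT_PREFIXES):
            continue                        # comment line
        name, *arguments = line.split()     # one request per line
        requests.append((line_number, name, arguments))
    return requests


sample = '''" tape and volume definitions for the blue set
tape_set blue
   td tapa_01 d=6250
/ root volume
pv rpv dskb_12
'''

requests = scan_control_file(sample)
```

Keeping the original line number with each request matches the error reporting described earlier, where control file errors are reported by line number and file name.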
The control files can be edited using the BCE qedx request, or edited while the system is running and updated in the file partition by either using bootload_fs or regenerating the MST.

When a request can have either a long or short name, both names are given here, separated by a comma. However only one can appear per request line. Items in brackets ("[]") are required arguments. Items in braces ("{}") are optional.

Requests:

tape_set, ts [tape_set_name]
     where "tape_set_name" is the name of the collection of tapes that are to be used for the save or restore. The name can be up to 32 characters. There must be one of these requests per set. Names might be defined by the color of the tape reel (e.g. the "blue" set or the "red" set). This name becomes part of the tape label of each tape and is checked during a restore. This name will also appear in parentheses after the program name in all output messages.

tape_device, td [tape_device] {density}
     where "tape_device" is the standard device identifier (e.g. tapa_05) and "density" is in the form "d=NNNN" or "den=NNNN" or "-density NNNN" or "-den NNNN" or "-d NNNN". The default density will be 6250 bpi. The order the devices are entered defines the sequence for using them. Up to 16 devices can be defined per save/restore set.

physical_volume, pv [pv_name] [disk_device] {-all}
     where "pv_name" is the name of the physical volume to be saved or restored. The "disk_device" would be the standard name "dska_02" or "dske_02c" for sub-volumes. The "-all" argument specifies that all the vtoc and paging records should be saved. The "-all" arg has no meaning while doing a restore. If "-all" is not specified the records to be saved are: all records from 0 through the last used record of the VTOC and all used records in the paging region. No partition records are saved unless requested via the "partition" request. Up to 63 volumes can be saved or restored per set.
partition, part [pv_name] [disk_device] [part_name] {... part_name}
   where "pv_name" and "disk_device" are as described in the "pv" request. "part_name" is the name of the partition to be saved or restored. A part_name of "-all" during a save will allow saving of all the defined partitions. During a restore "-all" will allow all saved partitions to be restored and all others to be zero filled, except for the following special partitions. The RPV partition "bce" or any "hc" or "alt" partitions will not be allowed to be saved or restored. If the RPV partitions "conf", "file" or "log" are not specified when saving the RPV, a message will be displayed stating that they are not being saved, just in case the operator really wishes to have them saved. The partitions of a PV will be saved along with the standard disk records, in record number sequence, via an internal bit_map. Up to 7 partitions may be defined per volume. Up to 64 partitions may be defined per save/restore set.

control_file, cf [control_file]
   where "control_file" defines another control file to be examined. This enables control files to be linked together. For instance ONE control file could define all the tape devices for the save or restore. The other control files could be broken down into logical volumes that only reference the tape device control file and then define the physical volumes.

5: IOI AT BCE

Currently there is only a primitive tape I/O mechanism at BCE that is used to read in the MST. To allow the save/restore to do tape IO to all configured tape devices it is necessary to use the power and flexibility of IOI. This also opens up the door for doing IO to other peripherals. In order to get IOI executable at BCE several changes had to be made. First, all of the IOI modules had to be moved into collection 1.0 of the MST.
Then the IOI initialization has to be done as part of collection 1.0 code in real_initializer, instead of in collection 2.0. This is because programs in collections 2.0 & 3.0 do not get set up until the system is brought up (out of BCE). The external flag sys_info$service_system is now used by the IOI modules to control which external programs are called; calls to lock$wait and lock$unlock, for instance, are not available in collection 1.0. This flag is not set until collection 3.0 has been loaded.

IOI normally uses a segment called "io_page_table_seg" for holding the I/O page table words. While at BCE this is replaced with a "bce_io_page_table" segment that takes on characteristics needed while at BCE. The module ioi_page_table has been changed to manage either segment depending on the value of sys_info$initialization_state (which equals 1 while at BCE, until collection 2.0 runs).

On the interrupt side of an IO operation, IOI normally calls pxss$io_wakeup to signal the user process of the IO termination. While running at BCE this mechanism has not been fully enabled. The new module bce_ioi_post is used to signal the I/O completions. This module works very similarly to the way bootload_disk_post does for disk IO while at BCE. Prior to doing a tape IO, a buffer is set up in the segment "bce_ioi_post_seg" that contains the IOI event channel. When bce_ioi_post is called after the IO is complete, it locates the post buffer by using the event channel given to it by IOI, copies the IOI message into the buffer and changes the buffer state to IO_COMPLETE. It is up to the calling program to poll this buffer for the state change.

This same posting mechanism is used for special interrupt processing. If a special interrupt is expected, then a post buffer is set up with a state of WAITING_SPECIAL and when one arrives the state is changed to SPECIAL_ARRIVED. If a special interrupt occurs on a device that is not waiting for a special, it is ignored.
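The posting mechanism just described can be pictured with a small model. This is a Python sketch under stated assumptions, not the PL/I module: the class names `PostSeg` and `PostBuffer` and the state name `WAITING_IO` are hypothetical (the text names only IO_COMPLETE, WAITING_SPECIAL and SPECIAL_ARRIVED), and ignoring a completion with no matching buffer is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PostState(Enum):
    WAITING_IO = auto()       # hypothetical name for "connect issued, not yet posted"
    IO_COMPLETE = auto()
    WAITING_SPECIAL = auto()
    SPECIAL_ARRIVED = auto()


@dataclass
class PostBuffer:
    event_channel: int
    state: PostState
    message: object = None


class PostSeg:
    """Toy stand-in for bce_ioi_post_seg: post buffers keyed by event channel."""

    def __init__(self):
        self.buffers = {}

    def setup(self, event_channel, state):
        """Set up a post buffer before issuing the IO (or awaiting a special)."""
        buf = PostBuffer(event_channel, state)
        self.buffers[event_channel] = buf
        return buf

    def post(self, event_channel, message, special=False):
        """Model of bce_ioi_post: find the buffer by event channel and update it."""
        buf = self.buffers.get(event_channel)
        if special:
            # A special on a device that is not waiting for one is ignored.
            if buf is not None and buf.state is PostState.WAITING_SPECIAL:
                buf.state = PostState.SPECIAL_ARRIVED
            return
        if buf is None:
            return  # assumption: nothing to post to
        buf.message = message              # copy the IOI message into the buffer
        buf.state = PostState.IO_COMPLETE  # the caller polls for this change
```

The caller's side is then a simple poll on `buf.state`, which mirrors the statement that it is up to the calling program to watch the buffer for the state change.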
6: I/O MANAGEMENT

The save/restore module manages both the disk and tape IO, and sets up a queuing method so that the data does not have to be moved around in memory, but is simply transferred from disk->memory->tape and vice versa. The area that is used is the IOI workspace created when a tape device is attached. The tape IO is done by using 2 dcws where the first points to the record header, in the IO buffer, and the second to the "page-aligned" data. The disk I/O only needs to reference the page-aligned data.

Each IOI workspace uses seven pages. The first page holds non-data transfer IO, tape status return space and the IO buffers. Each IO buffer contains overhead information, dcw list, tape record header and an index to the data area assigned to the buffer. The following six pages are the "page-aligned" data area that the six IO buffers in the first page reference. The IO buffers are all threaded together and their state defines what is to be done. The buffer states are as follows:

     FREE            buffer available
     DISK_SUSPEND    buffer being set up for disk I/O
     DISK_QUEUED     buffer ready to be read or written
     DISK_BUSY       I/O in progress
     DISK_READY      disk I/O complete
     TAPE_SUSPEND    buffer being set up for tape I/O
     TAPE_QUEUED     buffer ready to be read or written
     TAPE_BUSY       I/O in progress
     TAPE_READY      tape I/O complete

Possible buffer state sequences:

     (SAVE)     FREE -> DISK_SUSPEND -> DISK_QUEUED -> DISK_BUSY ->
                DISK_READY -> TAPE_QUEUED -> TAPE_BUSY -> FREE.

     (RESTORE)  FREE -> TAPE_SUSPEND -> TAPE_QUEUED -> TAPE_BUSY ->
                TAPE_READY -> DISK_QUEUED -> DISK_BUSY -> FREE.

The disk IO is done using read_disk for checking the label, which also does a test of the device (reset-status) before the label read, and bootload_disk_io$queue_(read write) for doing the actual save or restore because of its low overhead. It is necessary to poll for IO completion by calling bootload_disk_io$test_done. During a save all six IO buffers are queued for disk reads.
From this point each buffer follows the sequence above in a first-in, first-out (FIFO) fashion. The tape IO is done using ioi_connect. The ioi_masked module has been changed to call bce_ioi_post for IO notification.

This version of the program only does single buffer I/Os, where the IDCW has the "continue & marker" bits OFF and the TDCW (to the next buffer) is not used. This allows for a simpler design, but can be expanded in the future to do the I/O like tape_ioi.

During a restore all six IO buffers are queued for tape reads. From this point each buffer follows the sequence above in a first-in, first-out (FIFO) fashion.

The save/restore program also performs the status checking and IO retry for the tape IO; see "Tape Error Recovery" below for details. For the disk IO it is done by the normal dctl/disk_control modules, which bootload_disk_io calls.

6.1: Tape Error Recovery

During a save or restore there are times when errors occur which require special handling. The errors are handled by the program with the use of a new CDS segment called tape_error_data.cds. This data segment contains an array of all the possible major and sub-statuses along with an English interpretation, max retries and flags that define what to do in case the error occurs.

6.1.1: DATA ALERTS

The program uses a channel instruction when doing reads that allows the tape controller to perform automatic retry. Read data errors are retried by the program by chaining a backspace-record IDCW before the original read IDCW and reissuing the connect up to eight times. For each retry the channel instruction is incremented. This allows the controller to go through several different margining patterns. If unable to read the data, the error becomes unrecoverable. The recovery procedure will be selected by the operator. One choice would be to perform the retry attempts again. Another would be to skip this record and try to read the next. The full list of possibilities is given below.
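The read retry policy can be summarized as a short loop. The following Python sketch is a model only: `read_with_retry` and the `issue_connect` callback are hypothetical names, the DCW list is represented as a list of strings, and treating the channel instruction as an integer that goes up by one per retry is an assumption about what "incremented" means.

```python
MAX_RETRIES = 8  # the text allows the connect to be reissued up to eight times


def read_with_retry(issue_connect, channel_instruction=0):
    """Model of the read data-alert retry loop.

    issue_connect(dcw_list, channel_instruction) is a hypothetical callback
    that returns True when the read succeeds.
    Returns True on success, False when the error becomes unrecoverable.
    """
    if issue_connect(["read"], channel_instruction):
        return True
    for attempt in range(1, MAX_RETRIES + 1):
        # Chain a backspace-record IDCW before the original read IDCW, and
        # step the channel instruction so the controller tries another
        # margining pattern.
        if issue_connect(["backspace_record", "read"],
                         channel_instruction + attempt):
            return True
    return False  # unrecoverable: the operator selects the recovery action
```

A write retry would have the same shape, with backspace and erase chained before the original write.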
Retries of write errors are done by chaining two IDCWs, backspace and erase, before the original write IDCW and then reissuing the connect. If unable to write the data after eight retries the error becomes unrecoverable.

6.1.2: UNRECOVERABLE ERRORS

These are errors that are either non-retryable or where the retry process failed. When an unrecoverable error occurs a message will be displayed that shows the error interpreted in English, with detailed status in hex if required. The operator will be queried as to the course of action that the program should take. Listed below is an example error output and the possible responses and their meanings.

     save(blue): Device Attention, Handler check on tapa_12.
     detailed status: 20 8C 2B 6D 0A 01 16 00 00 16 48 87 24 18 06 00 00 0C
                      00 00 08 08 80 00 00 00
     save: Action:

abort
   This causes the program to abort the entire save/restore and return to BCE command level.

retry, r
   For errors that are retryable this will force the retry process to be redone. It is invalid for non-retryable errors.

skip, s
   This is only valid for data alert errors detected while doing a restore. The unreadable record is skipped and the program continues by attempting to read the next record.

stop_set, stop
   This will cause this SET to be aborted, but all other SETs will continue.

restart_set, restart, rt
   This allows the operator to restart this SET, using the current tape device. The operator is then required to mount the "restart" tape on the device and follow the same procedure as described in the "Restore Restart" section. Once the SET has been restarted, the remaining SETs will continue operation.

remove_device_from_set, remove
   Works like the "restart_set" request above, but removes the current tape device from the SET and sequences to the next device before going through the restart process. This is not a valid response if this is the only tape device left in the SET.

help, ?
   This causes the program to display the above possible responses, with a small description of each.

7: TAPE FORMAT

The tape structure is a non-standard format, with some resemblance to the old BOS format. The method of saving a volume at BCE is totally different from any of the on-line methods like hierarchy backup (which walks the hierarchy) and volume backup (which walks the VTOC). Because of this, and the fact that these tapes will only be used to restore a volume at BCE, it was not necessary to conform to the Multics standard tape format. Not conforming allows for a simpler and more direct implementation.

Each tape record consists of an 8 word header and 1024 words of data. Save structures that are larger than 1024 words have to be written using several 1024 word records.

     dcl 1 tape_record        aligned based,    /* Save Tape Record */
           2 header           like rec_header,
           2 data (1024)      bit (36);

7.1: Record Header

Each record on the tape has the following 8 word header. Records of a given type are grouped together on tape. Groups are separated by an EOF mark. The first type of record on a tape must be the TAPE_LABEL. The second type must be the PV_PREAMBLE. The last record on a tape must be the TAPE_EOR, followed by two EOFs.

     dcl 1 rec_header         aligned based,
           2 c1               bit (36),             /* "542553413076"b3 */
           2 type             fixed bin (17) unal,  /* record type */
           2 flags            unal,
             3 end_of_set     bit (1),              /* valid in TAPE_EOR */
             3 end_of_part    bit (1),              /* last PV_PART record */
             3 end_of_pv      bit (1),              /* last PV record */
             3 pad            bit (15),
           2 rec_on_tape      fixed bin (35),       /* physical tape rec# */
           2 pvid             bit (36),             /* origin of data */
           2 rec_on_pv        fixed bin (35),       /* volume rec# */
           2 rec_in_type      fixed bin,            /* rec# of cur rec type */
           2 part_name        char (4),             /* name of partition */
                                                    /* when type = PV_PART */
           2 tape_set_uid     bit (36);             /* unique Tape SET ID */

Structure elements:

c1
   This word is used as a check to ensure that this record contains valid data.
   The pattern is the reverse of the one used for normal Multics Standard tapes.

type
   This field defines the type of data contained in this record. The values are defined below.

end_of_set
   This bit will be set ON in the End of Reel (TAPE_EOR) record header of the last tape in a save set. Normally the "Info" tape will have enough information to define which tape is the last. If the "Info" tape was not available this bit will define the end.

end_of_part
   This bit will be set ON in the last PV_PART record for a given partition.

end_of_pv
   This bit will be set ON in the last PV_RECORD or PV_PART record for a physical volume.

rec_on_tape
   This contains the current tape record number.

pvid
   This holds the current physical volume unique ID. It is only valid for the PV_(VTOC RECORD PART) records. Otherwise it is set to zero.

rec_on_pv
   This contains the physical volume record number where the data originated. This is used during a restore to place the data back in its original location.

rec_in_type
   This contains the relative record number within a given group of tape records of the same type. This is used during a partition-only restore as part of the partition relocation process.

part_name
   Will hold the partition name when the record type = PV_PART.

tape_set_uid
   Holds a unique ID that is created when the first tape of a save is written and copied to all the remaining tapes in the set. It is used during a restore as part of the tape validation.

rec_header.type values:

1  TAPE_LABEL
   Tape Label Record. Each tape begins with two of these records. The data areas hold the tape label structure defined below.

2  TAPE_EOR
   Tape End of Reel Record. One is always located at the end of each save tape, followed by two end of files (EOFs). The data area is zero filled.

3  PV_PREAMBLE
   Physical Volume Preamble Record. There will be one of these written at the start of each volume and one written after the tape label records when starting a new tape.
   The data area contains the physical volume label as defined in fs_vol_label.incl.pl1.

4  PV_VTOC
   Physical Volume VTOC Record. The VTOC is defined as being all records from 0 to the end of the VTOC region on the disk. The data area contains the data pages read from the vtoc region of the volume.

5  PV_RECORD
   Physical Volume Record. These are all disk records that are not part of the VTOC or a partition. The data area contains the data pages read from the paging region of the volume.

6  PV_PART
   Physical Volume Partition Record. rec_header.part_name defines which partition this data came from. The data area contains the data pages read from the partition.

7.2: Tape Label

The tape label is made up of 2 tape records (2048 words). The 2 records, when put together, take on the following format. A temp segment is used to hold the contents of the tape label.

     dcl 1 tape_label         aligned based (tape_label_ptr),
           2 version          char (8),        /* structure version */
           2 title            char (32),       /* Save/Restore title */
           2 tape_set         char (32),       /* Save/Restore set */
           2 tape_number      char (4),        /* tape number in set */
                                               /* or "Info" */
           2 pad1             bit (36),        /* pad to even word */
           2 save_time        fixed bin (71),  /* creation date/time */
           2 vol_array_size   fixed bin,       /* # of volumes saved */
           2 vol_array_idx    fixed bin,       /* current volume being processed */
                                               /* = 0 on "Info" tape */
           2 tapes_in_set     fixed bin,       /* valid on "Info" tape */
           2 pad2 (7)         fixed bin,       /* pad to 32 words */
           2 vol_array (63)   like vol_info;   /* array of volume info */

Structure elements:

version
   This contains the current version of the tape_label structure. The value currently is "B_S/R001".

title
   Contains the BCE Save/Restore title which is "Multics BCE Save/Restore Tape".

tape_set
   Contains the tape set name that was specified by the tape_set request in the control file (e.g. "blue").

tape_number
   This contains the current tape number. The numbers start at 1 and are stored via an editing picture of "9999".
   The last tape written as part of a save contains a tape number of "Info", to identify it as the first tape during a restore.

save_time
   Contains the clock value when the save was done. The value is displayed to the operator during a restore process.

vol_array_size
   Defines the number of vol_array entries that are valid for this save set.

vol_array_idx
   Defines the vol_array entry for the physical volume information at the beginning of this tape. All previous vol_array entries will have been completed.

tapes_in_set
   Defines the total number of tapes that were required to perform the save. This is only valid on the "Info" tape; for all others it will be zero.

vol_array
   Area for holding the information pertaining to each volume that is part of the save. Each tape in the set will have a progressively more complete vol_array. The "Info" tape will then contain the "complete" vol_array picture. See the definition of the vol_info structure below for details.

7.3: Volume Info

This is the part of the tape label that contains information about each volume that has been saved. Each vol_info entry requires 32 words.

     dcl 1 vol_info            aligned based (vol_ptr),
           2 pvname            char (32),       /* physical volume name */
           2 pvid              bit (36),        /* physical volume ID */
           2 data_saved        fixed bin,       /* amount of data saved */
           2 restart_rec       fixed bin (18),  /* record saved */
           2 dev_type          fixed bin,       /* device type */
           2 nregions          fixed bin,
           2 current_region    fixed bin,
           2 pad (2)           bit (36),
           2 region (8),
             3 part_name       char (4),        /* "" for vtoc/paging area */
             3 begins_on_tape  fixed bin (18) uns unal,
             3 ends_on_tape    fixed bin (18) uns unal;

Structure elements:

pvname
   Contains the name of the physical volume that the rest of the area defines.

pvid
   Contains the physical volume's unique ID. This is used during a restore to validate the pvid in the record header.

data_saved
   Contains a number that indicates how much of the volume was included in the save. See the defined values below.
restart_rec
   For the first volume written on a tape, this indicates the first disk record that was written. This is used during a restart to define where to start again.

dev_type
   Contains the device type that the volume was on when saved. When doing a restore, the device being restored must be the same type unless only partitions are being restored. See fs_dev_types.incl.pl1.

nregions
   Defines the number of regions that are valid in the "region" area. A region can either define the vtoc/paging region of the volume or one of its partitions.

current_region
   Points to the region being processed, when this volume is the first on a tape. Otherwise it will be the value of nregions.

region.part_name
   Defines the name of the partition that is being described. This will be blank when describing the vtoc/paging region of the volume.

region.begins_on_tape
   Defines the tape number where this region begins. It is used during a restore to define which tape(s) should be mounted.

region.ends_on_tape
   Defines the tape number where this region ends. It is used during a restore to define which tape(s) should be mounted.

vol_info.data_saved values:

0  PV_ONLY
   This indicates that only the VTOC area and records in the paging region have been saved (NO partitions).

1  PART_ONLY
   This indicates that only volume partition areas were saved.

2  BOTH_SAVED
   This indicates that the VTOC area, records in the paging region and at least one partition have been saved.

7.4: Volume Preamble

At the start of every volume and at the beginning of each tape (except the "Info" tape) is a preamble tape record that contains the volume label (see fs_vol_label.incl.pl1). The preamble is preceded by an EOF mark to make it easier for the restore to find the start of a volume. It requires 1 tape record to save the preamble. An area in the tape_label temp segment is used to hold the contents of the volume preamble.
     dcl 1 vol_preamble aligned like label based (vol_preamble_ptr);

7.5: Notes

The records from the disk are marked as one of three different kinds of tape records: PV_VTOC (records before and including the VTOC), PV_RECORD (normal paging record) or PV_PART (partition record). An EOF mark is placed between the different types of records so that, when doing a restore of only a partition, the partition will be easier to find using forward-space-file commands.

The EOR record contains a data field of all zeros and is followed by two EOF marks.

7.6: Example Tape Layout:

        REEL-1                        REEL-2

   Tape Label part:1             Tape Label part:1
   Tape Label part:2             Tape Label part:2
   eof                           eof
   Volume Preamble (V1)          Volume Preamble (V1)
   eof                           eof
   VTOC record:0                 RECORD record:O+1
   ...                           ...
   VTOC record:N                 RECORD record:P
   eof                           eof
   PART record:N+1               PART record:P+1
   ...                           ...
   PART record:M                 PART record:R
   eof                           eof
   RECORD record:M+1             Volume Preamble (V2)
   ...                           eof
   RECORD record:O (Tape EOT)    VTOC record:0
   eof                           ...
   End-Of-Reel record            ...
   eof                           etc...
   eof

APPENDIX A: DOCUMENTATION

In order to properly document these new BCE commands, two new info files, one for save and one for restore, are needed and several manuals need to be updated. The following two sub-sections contain the info segments that should be installed in the >doc>ss>bce directory. The two info segments also need to be added to section-9 (BCE Commands) of GB64 (Multics Administration, Maintenance and Operations Commands). The third sub-section describes changes needed in AM81 (Multics System Maintenance Procedures Manual).

A.1: Save Info

04/30/86  save

Syntax as a command:
   save {-set} CF_1 {... CF_N} {-set CF_1 {... CF_N}}
      {-restart_set CF_1 {... CF_N}}

Function: used to save the contents of physical volumes on tape. It can be used only at BCE (boot) command level.

Arguments:
CF_1 {... CF_N}
   defines the name of a control file or set of control files that will make up a save set.
   See "List of control file requests" below. At least one and up to 32 control file names may be defined per save. A control file cannot be specified multiple times for a given set, but can be specified in more than one set. This can be used to save a set of volumes to several sets of tapes at one time.

Control arguments:
-set
   used to prefix a set of control file names. The first set of control files does not require this prefix, but it is acceptable. Up to four control file sets may be defined. This may be used in combination with the -restart_set control argument.
-restart_set, -restart, -rt
   used to prefix a set of control file names that are to be restarted. This may be used in combination with the -set control argument.

List of control file requests:
tape_set [tape_set_name], ts [tape_set_name]
   where "tape_set_name" is the name of the collection of tapes that are to be used for the save. The name can be up to 32 characters. There must be one of these requests per set. Names might be defined by the color of the tape reel (e.g. the "blue" set or the "red" set). This name becomes part of the tape label of each tape and is checked during a restore. This name will also appear in parentheses after the program name in all output messages.
tape_device [tape_device] {density}, td [tape_device] {density}
   where "tape_device" is the standard device identifier (e.g. tapa_05) and "density" is in the form "d=NNNN", "den=NNNN", "-density NNNN", "-den NNNN" or "-d NNNN". The default density will be 6250 bpi. The order in which the devices are entered defines the sequence for using them. Up to 16 devices can be defined per save set.
physical_volume [pv_name] [disk_device] {-all},
pv [pv_name] [disk_device] {-all}
   where "pv_name" is the name of the physical volume to be saved. The "disk_device" would be the standard name "dska_02" or "dske_02c" for sub-volumes.
   The "-all" argument specifies that all the vtoc and paging records should be saved, instead of just saving the paging records that are in use. This also occurs if the volume requires salvaging. The "-all" argument has no meaning while doing a restore. Up to 63 volumes can be saved per set.
partition [pv_name] [disk_device] [part_name] {... part_name},
part [pv_name] [disk_device] [part_name] {... part_name}
   where "pv_name" and "disk_device" are as described in the "pv" request. "part_name" is the name of the partition to be saved or "-all" to save all the defined partitions. The RPV partition "bce" or any "hc" or "alt" partitions will not be allowed to be saved. If the RPV partitions "conf", "file" or "log" are not specified when saving the RPV, a message will be displayed stating that they are not being saved, just in case the operator really wishes to have them saved. Up to 7 partitions may be defined per volume. Up to 64 partitions may be defined per save set.
control_file [control_file], cf [control_file]
   where "control_file" defines another control file to be examined. This enables control files to be linked together. For instance ONE control file could define all the tape devices for the save. The other control files could be broken down into logical volumes that only reference the tape device control file and then define the physical volumes. Up to 32 control file names may be defined per save.

Notes on control file requests: Only one request may be given per line. Any lines in a control file that begin with /, & or " are treated as comments. All white space prior to a request in a line is trimmed.

Partitions on a physical volume can be saved without having to save the vtoc and paging regions by only defining a partition request.

The control files can be edited using the BCE qedx request, or edited while the system is running and updated in the file partition by either using bootload_fs or regenerating the MST.
Notes on save: When a save set is complete it is necessary to write one last tape, called the "Info" tape, that will contain information used during a restore to quickly locate the tapes that items are on.

Notes on operator interrupts: A save can be interrupted by use of the console "request" key. When depressed while a save is in progress, the message "save: Abort request:" will appear. The operator will be required to input one of the following responses.

no, n
   This causes the program to ignore the request and resume the save.
abort
   This causes the program to abort the entire save and return to BCE command level.
restart TAPE_SET
   This allows the operator to restart the specified TAPE_SET, using its current tape device. The operator is then required to mount the "restart" tape on the device, which is either the last good tape written or the current tape (as long as the tape label has been written). Once the SET has been restarted, the remaining SETs will continue operation.
stop TAPE_SET
   This causes the program to abort the specified TAPE_SET, by marking it complete, and resume the save of the other sets.
help, ?
   This causes the program to display the above possible responses, with a small description of each.

Notes on tape error recovery: During a save there are times when errors occur which require special handling. Retries of write errors are done by doing a backspace and erase followed by the original write. If unable to write the data after eight retries the error becomes unrecoverable.

When an unrecoverable error occurs a message will be displayed that shows the error interpreted in English, with detailed status in hex if required. The operator will be queried as to the course of action that the program should take. Listed below is an example error output and the possible responses and their meanings.

     save(blue): Device Attention, Handler check on tapa_12.
     detailed status: 20 8C 2B 6D 0A 01 16 00 00 16 48 87 24 18 06 00 00 0C
                      00 00 08 08 80 00 00 00
     save: Action:

abort
   This causes the program to abort the entire save and return to BCE command level.
retry, r
   For errors that are retryable this will force the retry process to be redone. It is invalid for non-retryable errors.
stop_set, stop
   This will cause this SET to be aborted, but all other SETs will continue.
restart_set, restart, rt
   This allows the operator to restart this SET, using the current tape device. The operator is then required to mount the "restart" tape on the device. Once the SET has been restarted, the remaining SETs will continue operation.
remove_device_from_set, remove
   Works like the "restart_set" request above, but removes the current tape device from the SET and sequences to the next device before going through the restart process. This is not a valid response if this is the only tape device left in the SET.
help, ?
   This causes the program to display the above possible responses, with a small description of each.

A.2: Restore Info

04/30/86  restore

Syntax as a command:
   restore {-set} CF_1 {... CF_N} {-set CF_1 {... CF_N}}
      {-restart_set CF_1 {... CF_N}}

Function: used to restore the contents of physical volumes from tape. It can be used only at BCE (boot) command level.

Arguments:
CF_1 {... CF_N}
   defines the name of a control file or set of control files that will make up a restore set. See "List of control file requests" below. At least one and up to 32 control file names may be defined per restore.

Control arguments:
-set
   used to prefix a set of control file names. The first set of control files does not require this prefix, but it is acceptable. Up to four control file sets may be defined. This may be used in combination with the -restart_set control argument.
-restart_set, -restart, -rt
   used to prefix a set of control file names that are to be restarted. This may be used in combination with the -set control argument.
List of control file requests:
tape_set [tape_set_name], ts [tape_set_name]
   where "tape_set_name" is the name of the collection of tapes that are to be used for the restore. The name can be up to 32 characters. There must be one of these requests per set. Names might be defined by the color of the tape reel (e.g. the "blue" set or the "red" set). This name is part of the tape label and is checked at each tape mount. This name will also appear in parentheses after the program name in all output messages.
tape_device [tape_device] {density}, td [tape_device] {density}
   where "tape_device" is the standard device identifier (e.g. tapa_05) and "density" is in the form "d=NNNN", "den=NNNN", "-density NNNN", "-den NNNN" or "-d NNNN". The density is only needed during a save. During a restore the save tape will define the density. The order in which the devices are entered defines the sequence for using them. Up to 16 devices can be defined per restore set.
physical_volume [pv_name] [disk_device], pv [pv_name] [disk_device]
   where "pv_name" is the name of the physical volume to be restored. The "disk_device" would be the standard name "dska_02" or "dske_02c" for sub-volumes. Up to 63 volumes can be restored per set.
partition [pv_name] [disk_device] [part_name] {... part_name},
part [pv_name] [disk_device] [part_name] {... part_name}
   where "pv_name" and "disk_device" are as described in the "pv" request. "part_name" is the name of the partition to be restored or "-all" to restore all the partitions that were saved. If "-all" is specified then all partitions defined on the volume that are not restored will be zero filled, except for any "alt" or "hc" partitions and the "bce" partition on the rpv. Up to 64 partitions may be defined per restore set.
control_file [control_file], cf [control_file]
   where "control_file" defines another control file to be examined. This enables control files to be linked together.
   For instance ONE control file could define all the tape devices for the restore. The other control files could be broken down into logical volumes that only reference the tape device control file and then define the physical volumes. Up to 32 control file names may be defined per restore.

Notes on control file requests: Only one request may be given per line. Any lines in a control file that begin with /, & or " are treated as comments. All white space prior to a request in a line is trimmed before processing.

Partitions on a physical volume can be restored without having to restore the vtoc and paging regions by only defining a partition request. This can also be used to copy a partition from one volume to another, even of different types.

The control files can be edited using the BCE qedx request, or edited while the system is running and updated in the file partition by either using bootload_fs or regenerating the MST.

Notes on restore: The first tape read during a restore is always the "Info" tape, which was the last tape written when the set was saved. This gives the restore the information necessary to properly locate items without wasting time spinning tape.

Notes on operator interrupts: A restore can be interrupted by use of the console "request" key. When depressed while a restore is in progress, the message "restore: Abort request:" will appear. The operator will be required to input one of the following responses.

no, n
   This causes the program to ignore the request and resume the restore.
abort
   This causes the program to abort the entire restore and return to BCE command level.
restart TAPE_SET
   This allows the operator to restart the specified TAPE_SET, using its current tape device. The operator is then required to mount the "restart" tape on the device, which is the tape that the operator wishes to restart from. Once the SET has been restarted, the remaining SETs will continue operation.
     stop TAPE_SET
          This causes the program to abort the specified TAPE_SET,
          by marking it complete, and resume the restore of the
          other sets.

     help, ?
          This causes the program to display the above possible
          responses, with a small description of each.

Notes on tape error recovery:

     During a restore there are times when errors occur which
     require special handling.  Read data errors are retried by the
     program up to eight times.  If the data still cannot be read,
     the error becomes unrecoverable.  The recovery procedure is
     then selected by the operator.  One choice would be to perform
     the retry attempts again.  Another would be to skip this record
     and try to read the next.  The full list of possibilities is
     given below.

     When an unrecoverable error occurs a message will be displayed
     that shows the error interpreted in English, with detailed
     status in hex if required.  The operator will be queried as to
     the course of action that the program should take.  Listed
     below is an example error output and the possible responses and
     their meanings.

     restore(blue): Device Attention, Handler check on tapa_12.
     detailed status: 20 8C 2B 6D 0A 01 16 00 00 16 48 87 24 18 06 00 00 0C 00 00 08 08 80 00 00 00
     restore: Action:

     abort
          This causes the program to abort the entire restore and
          return to BCE command level.

     retry, r
          For errors that are retryable this will force the retry
          process to be redone.  It is invalid for non-retryable
          errors.

     skip, s
          This is only valid for unrecoverable data alert errors
          detected while doing a restore.  The unreadable record is
          skipped and the program continues by attempting to read
          the next record.

     stop_set, stop
          This will cause this SET to be aborted, but all other SETs
          will continue.

     restart_set, restart, rt
          This allows the operator to restart this SET, using the
          current tape device.  The operator is then required to
          mount the "restart" tape on the device.  Once the SET has
          been restarted, the remaining SETs will continue
          operation.
     remove_device_from_set, remove
          Works like the "restart_set" request above, but removes
          the current tape device from the SET and sequences to the
          next device before going through the restart process.
          This is not a valid response if this is the only tape
          device left in the SET.

     help, ?
          This causes the program to display the above possible
          responses, with a small description of each.


A.3: AM81 Changes

This sub-section contains the changes required to document BCE
Save/Restore in place of the current use of BOS SAVE/RESTOR.  A
future MTB/MCR or an update of MTB737 (Dipper Documentation) will
describe the changes required to replace the other BOS functions
with BCE functions (e.g. BCE TEST_DISK instead of BOS TEST and BCE
COPY_DISK instead of BOS SAVE COPY).  However, some instances of
"BOS TEST" have been changed to "BCE TEST_DISK", because it did not
seem right to leave the old command in.


A.3.1: SECTION-1

***** On page 1-3 the definition of "BCE" needs to include the
ability to save and restore disk volumes.


A.3.2: SECTION-9

***** On page 9-11 the reference to "BOS SAVE" needs to be "BCE
SAVE".


A.3.3: SECTION-10

***** On page 10-27 the references to "BOS RESTOR" & "RESTOR" need
to be changed to "BCE RESTORE".

***** References to "BOS SAVE" & "BOS RESTOR" on pages 10-38, 39 &
40 will now be "BCE SAVE" & "BCE RESTORE".

***** Also on page 10-40 under the heading "BACKUP TAPE LOGS" the
second paragraph needs to be changed to read something like this:

     For a BCE SAVE, the tape volume name consists of two parts: the
     tape set name (e.g. blue, root, June) and the reel number
     (1-9999).  A BCE SAVE tape set is a collection of reels
     numbered from 1 to N and a locator tape called the "Info" tape.
     The "Info" tape contains the names of all the volumes and
     partitions that were saved, and the corresponding tape reels
     that contain this information.
     This tape is always the first tape read during a BCE RESTORE to
     allow for program control over tape mounts.  The log should
     identify the tapes used for each set and the physical volumes
     saved in the set.  For a hierarchy reload, the log should
     identify the tapes included in each incremental, catchup and
     complete dump set.

***** References to "BOS SAVE/RESTOR", "BOS SAVE" & "BOS RESTOR" on
pages 10-42 & 43 will now be "BCE SAVE/RESTORE", "BCE SAVE" & "BCE
RESTORE".

***** On page 10-44, the section titled "Recovery of the RPV with
Volume Reloading" needs to be changed to read:

Recovery of the RPV with Volume Reloading

If a disk volume failure occurs for the RPV, the following procedure
can be used to recover the contents of the RPV from volume backup
tapes.  See Section 9 for general information and more details on
volume backup and volume reloading.  All of the commands used in
this procedure are described in the Multics Administration,
Maintenance and Operations Commands manual, Order No. GB64.

1.  If the system has not already crashed, attempt to recover from
    the failure by following the procedures described above under
    "Recovering From Disk Failures."  If that corrects the problem,
    then skip the remaining steps.  Otherwise, use the last
    procedure under "Recovering From Disk Failures" to shut down or
    crash the system.

2.  Consult with your Customer Service Representative to correct any
    hardware failure that is occurring.  Have him repair or replace
    any damaged hardware.

To test the original RPV volume, or to recover its data onto a spare
disk volume, you will need to boot BCE, and Multics on a temporary
RPV.  This temporary RPV may be obtained in any of the following
ways:

    o  If your site has prepared a one- or two-volume "test system"
       for hardware and software checkout purposes, you can boot
       this test system for use in testing and reloading the
       original RPV.
    o  If you have BCE SAVE tapes for the original RPV, and a spare
       disk volume, you can RESTORE these save tapes onto the spare
       disk volume for use as the temporary RPV.  The actual data on
       the temporary RPV is not important since it will not become
       part of the production hierarchy; an older set of SAVE tapes
       can be used, as long as the saved RPV is for the Multics
       release you are currently running.  You will have to boot BCE
       on the temporary RPV, and specify "cold" to the "Enter rpv
       data:" prompt to allow the temporary RPV to be properly
       initialized.  After restoring the RPV, remember to update the
       root and part configuration cards to describe only the
       temporary RPV.

    o  If you have neither a "test system" nor SAVE tapes for an
       RPV, you can perform a cold boot of Multics on a spare disk
       volume to create the temporary RPV.  To perform the cold
       boot, follow the procedures in the Installation Instructions
       for the release you are running.  Spare disk volumes should
       be properly formatted and tested as described above under
       "Preformatted Disk Volumes."

3.  Boot BCE on the temporary RPV, as described in the Operators'
    Guide to Multics, Order No. GB61.

4.  If your Customer Service Representative believes there has been
    no physical damage to the original RPV disk volume, attempt to
    read it using the BCE TEST_DISK command, as described above
    under "Extent of Disk Volume Failure."

5.  If only transient errors are encountered when reading the
    original RPV, follow the procedures above under "Recovering from
    Transient Disk Volume Failure," and skip the rest of these
    steps.

6.  If the original RPV is only partially damaged and you decide
    that loss of the unreadable records is acceptable, follow the
    procedures above under "Recovering from Partial Disk Volume
    Failure," and skip the rest of these steps.

The steps below attempt to reload RPV information from volume backup
tapes onto a spare disk volume.
These steps assume that the original RPV volume is totally
unreadable, or that the amount of lost data caused by unreadable
records is unacceptably high.  If your Customer Service
Representative believes that the original RPV is physically damaged
(i.e., scratched or warped), then replace the RPV with a spare
volume which has already been formatted and tested, as described
above under "Preformatted Disk Volumes."  Otherwise, you can reload
data onto the original RPV.

7.  Boot Multics on the temporary RPV, coming up to Multics ring 1
    command level, as described in the Operators' Guide to Multics,
    Order No. GB61.

8.  Mount the disk volume to be reloaded on any available drive.  If
    necessary, convert the drive to a storage system drive, using
    the set_drive_usage command.  For example:

         sdu dska_04 ss

9.  Issue an init_vol command with the -copy control argument.
    Issue directions to init_vol to define the number of VTOC
    entries and the partition names and sizes as they were on the
    destroyed disk volume.  Your site should have hardcopy printouts
    of this disk label information available at all times, as
    described above under "Disk Volume Layout Information."

    Note that you may request more VTOC entries on the volume being
    reloaded than were on the destroyed RPV, but you cannot decrease
    this number.  You may increase or decrease the sizes of
    partitions on the new RPV, or add or delete partitions.
    However, if you do change the partition layout, then you will
    not be able to copy the contents of partitions (such as the LOG
    and DUMP partitions) from the damaged RPV onto the reloaded RPV.
    Remember to include an alternate track partition for a removable
    disk volume, if the disk volume being reloaded has been
    formatted with alternate track assignments.

10. Convert the disk drive on which the new RPV is mounted to an I/O
    drive, using the set_drive_usage command.  For example:

         sdu dska_04 io

11.
    Recover the volume log for the RPV using the recover_volume_log
    command with the -wd control argument.  For example:

         recover_volume_log rpv -wd

    Mount the last volume backup tape for the volume backup group
    which includes the RPV.  The volume name of the last tape should
    be recorded in the tape log, as described above under "Backup
    Tape Logs."  If volume backup operations were ongoing at the
    time of disk failure, you should mount the tape which was being
    written at the time of failure.

12. Reload the new RPV using the volume reloader, by issuing the
    reload_volume command with the -pvname, -operator, and -wd
    control arguments.  For example:

         reload_volume -pvname rpv -operator Jones -wd

    Mount tapes as requested by the reload_volume command.  When all
    tapes have been reloaded, continue with the next step.

13. Shut down Multics on the temporary RPV.

14. If the RPV was reloaded onto a spare volume and the original RPV
    is partially readable, you may want to try to copy the contents
    of the CONF, FILE, DUMP and LOG partitions onto the new RPV, as
    described below under "Recovery of Partitions after RLV Volume
    Recovery".

15. If the newly reloaded RPV is not mounted on the proper disk
    drive for normal operation, move the new RPV to the proper disk
    drive.

16. Boot BCE on the newly reloaded RPV, according to normal site
    procedures.  If reloading was performed on a spare disk volume
    rather than on the original RPV, then the contents of the CONF,
    BCE and FILE partitions have been lost.  In BCE, you will have
    to reload the config deck from a config file read off the BCE
    tape, using the BCE "config <deckname>" command.  Make
    adjustments to the configuration file as necessary, to reflect
    the current hardware configuration and disk volume locations.

17. Boot Multics according to normal site procedures.

18.
    Perform the procedures for salvaging, quota adjustment, and
    connection failure detection described below under "Disk Volume
    Post-Recovery Procedures."

This completes recovery of the RPV.

***** On page 10-46, the section titled "Recovery of a NonRPV Root
Volume with Volume Reloading" needs to be changed to read:

Recovery of a NonRPV Root Volume with Volume Reloading

If a disk volume failure occurs on a volume which is part of the
Root Logical Volume (RLV) but is not the RPV, the following
procedure can be used to recover the contents of that volume from
volume backup tapes.  See Section 9 for general information and more
details on volume backup and volume reloading.  All of the commands
used in this procedure are described in the Multics Administration,
Maintenance and Operations Commands manual, Order No. GB64.

1.  If the system has not already crashed, attempt to recover from
    the failure by following the procedures described above under
    "Recovering From Disk Failures."  If that corrects the problem,
    then skip the remaining steps.  Otherwise, use the last
    procedure under "Recovering From Disk Failures" to shut down or
    crash the system.

2.  Consult with your Customer Service Representative to correct any
    hardware failure that is occurring.  Have him repair or replace
    any damaged hardware.

To test the original root volume, or to recover its data onto a
spare disk volume, you will need to boot BCE, and Multics on the
RPV.

3.  Boot BCE on the RPV, as described in the Operators' Guide to
    Multics, Order No. GB61.

4.  If your Customer Service Representative believes there has been
    no physical damage to the original root disk volume, attempt to
    read it using the BCE TEST_DISK command, as described above
    under "Extent of Disk Volume Failure."

5.  If only transient errors are encountered when reading the
    original root volume, follow the procedures described above
    under "Recovering from Transient Disk Volume Failure," and skip
    the rest of these steps.
6.  If the original root volume is only partially damaged and you
    decide that loss of the unreadable records is acceptable, follow
    the procedures above under "Recovering from Partial Disk Volume
    Failure," and skip the rest of these steps.

The steps below attempt to reload root volume information from
volume backup tapes onto a spare disk volume.  These steps assume
that the original root volume is totally unreadable, or that the
amount of lost data caused by unreadable records is unacceptably
high.  If your Customer Service Representative believes that the
original root volume is physically damaged (i.e., scratched or
warped), then replace it with a spare volume which has already been
formatted and tested, as described above under "Preformatted Disk
Volumes."  Otherwise, you can reload data onto the original root
volume.

7.  Remove all disk volumes from the root config card, except for
    the RPV.  If any part config cards identify the damaged disk
    volume, remove those part cards from the config deck.

8.  Boot Multics on the RPV, coming up to Multics ring 1 command
    level, as described in the Operators' Guide to Multics, Order
    No. GB61.

9.  Mount the disk volume to be reloaded on any available drive.  If
    necessary, convert the drive to a storage system drive, using
    the set_drive_usage command.  For example:

         sdu dska_05 ss

10. Issue an init_vol command with the -special control argument.
    Issue directions to init_vol to define the number of VTOC
    entries and the partition names and sizes as they were on the
    destroyed disk volume.  Your site should have hardcopy printouts
    of this disk label information available at all times, as
    described above under "Disk Volume Layout Information."

    Note that you may request more VTOC entries on the volume being
    reloaded than were on the damaged root volume, but you cannot
    decrease this number.  You may increase or decrease the sizes of
    partitions on the new root volume, or add or delete partitions.
    Remember to include an alternate track partition for a removable
    disk volume, if the disk volume being reloaded has been
    formatted with alternate track assignments.

11. Convert the disk drive on which the new root volume is mounted
    to an I/O drive, using the set_drive_usage command.  For
    example:

         sdu dska_05 io

12. Recover the volume log for the root volume using the
    recover_volume_log command with the -wd control argument.  For
    example:

         recover_volume_log root2 -wd

    Mount the last volume backup tape for the volume backup group
    which includes the RLV.  The volume name of the last tape should
    be recorded in the tape log, as described above under "Backup
    Tape Logs."  If volume backup operations were ongoing at the
    time of disk failure, you should mount the tape which was being
    written at the time of failure.

13. Reload the new root volume using the volume reloader, by issuing
    the reload_volume command with the -pvname, -operator, and -wd
    control arguments.  For example:

         reload_volume -pvname root2 -operator Jones -wd

    Mount tapes as requested by the reload_volume command.  When all
    tapes have been reloaded, continue with the next step.

14. Shut down the Multics running on the RPV.

15. Restore the root and part config cards to their normal values,
    either by retyping the changed cards or by issuing the BCE
    "config <deckname>" command to load a new copy of the config
    deck from a BCE file.

16. If the root volume was reloaded onto a spare volume and the
    original volume is partially readable, you may want to try to
    copy the contents of the DUMP and LOG partitions onto the new
    RPV, if these partitions were on the damaged root volume.
    Follow the procedure described below under "Recovery of
    Partitions after RLV Volume Recovery".

17. If the newly reloaded root volume is not mounted on the proper
    disk drive for normal operation, move the volume to the proper
    disk drive.

18. Boot BCE on the RPV, according to normal site procedures.
    Make adjustments to the configuration file as necessary, to
    reflect the current hardware configuration and disk volume
    locations.

19. Boot Multics according to normal site procedures.

20. Perform the procedures for salvaging, quota adjustment, and
    connection failure detection described below under "Disk Volume
    Post-Recovery Procedures."

This completes recovery of the root volume.

***** On page 10-49, in the section titled "Recovery of a NonRoot
Volume with Volume Reloading", the text up to and including item 7
needs to be changed to read:

Recovery of a NonRoot Volume with Volume Reloading

If a disk volume failure occurs on a volume which is not part of the
Root Logical Volume (RLV), the following procedure can be used to
recover the contents of that volume from volume backup tapes.  See
Section 9 for general information and more details on volume backup
and volume reloading.  All of the commands used in this procedure
are described in the Multics Administration, Maintenance and
Operations Commands manual, Order No. GB64.

1.  If the system has not already crashed, attempt to recover from
    the failure by following the procedures described above under
    "Recovering From Disk Failures."  If that corrects the problem,
    then skip the remaining steps.  Otherwise, use the last
    procedure under "Recovering From Disk Failures" to shut down or
    crash the system.

2.  Consult with your Customer Service Representative to correct any
    hardware failure that is occurring.  Have him repair or replace
    any damaged hardware.

To test the original volume, or to recover its data onto a spare
disk volume, you will need to boot BCE, and Multics on the RLV.

3.  Boot BCE, as described in the Operators' Guide to Multics, Order
    No. GB61.

4.  If your Customer Service Representative believes there has been
    no physical damage to the original disk volume, attempt to read
    it using the BCE TEST_DISK command, as described above under
    "Extent of Disk Volume Failure."

5.
    If only transient errors are encountered when reading the
    original volume, follow the procedures above under "Recovering
    from Transient Disk Volume Failure," and skip the rest of these
    steps.

6.  If the original volume is only partially damaged and you decide
    that loss of the unreadable records is acceptable, follow the
    procedures described above under "Recovering from Partial Disk
    Volume Failure," and skip the rest of these steps.

The steps below attempt to reload information from volume backup
tapes onto a spare disk volume.  These steps assume that the
original volume is totally unreadable, or that the amount of lost
data caused by unreadable records is unacceptably high.  If your
Customer Service Representative believes that the original volume is
physically damaged (i.e., scratched or warped), then replace it with
a spare volume which has already been formatted and tested, as
described above under "Preformatted Disk Volumes."  Otherwise, you
can reload data onto the original disk volume.

7.  Boot Multics on the RLV, coming up to Multics ring 1 command
    level, as described in the Operators' Guide to Multics, Order
    No. GB61.


A.3.4: SECTION-12

***** References to "BOS SAVE" on page 12-4 will now be "BCE SAVE".

***** On page 12-7, the section titled "How to Restart SAVE and
RESTOR" needs to be deleted and the following section added in its
place:

BCE Save and Restore

1.  What Makes Up A Physical Volume Set.

    The save and restore commands allow for saving or restoring up
    to four sets of physical volumes at one time.  A volume set is
    defined as all the physical volumes and partitions described in
    a control file or several control files that are to be saved in
    one tape set.  The syntax of the commands allows for multiple
    control files to be defined for a set.  These sets are defined
    by the parameters following the "-set" or "-restart" control
    arguments.
    See the Multics Administration, Maintenance and Operations
    Commands manual, Order No. GB64, for a description of the syntax
    of the commands.

2.  What Makes Up A Tape Set.

    A tape set is defined as the collection of tapes required to
    save a set of physical volumes.  The set consists of the tape
    reels, numbered 1 to N+1, and the "Info" reel.  Each tape label
    contains the name of the set as defined by the "tape_set"
    control file request.  The "Info" tape is the last tape written
    during a save, and the first tape read during a restore.  This
    info tape contains information that relates the numbered tape
    reels and the physical volumes saved.  This information aids in
    tape mount requests and allows for partial restores.

3.  How To Create A Control File.

    The first step required when setting up for a save or restore is
    to create the necessary control file(s) that define the tape set
    name, tape devices, physical volumes and partitions in the
    volume set.  The control file requests are described in the
    description of the save/restore commands in the Multics
    Administration, Maintenance and Operations Commands manual,
    Order No. GB64.

    A sample control file is shown being created, at BCE, below.

         qx
         a
         " Save/Restore Tape devices.
         tape_device tapa_02 -density 6250
         tape_device tapa_05 -density 6250
         tape_device tapb_03 -density 6250
         f
         w save_tapes
         b1
         a
         " Save/Restore control file for the ROOT logical volume.
         tape_set ROOT
         physical_volume rpv dska_01
         partition rpv dska_01 conf file log dump
         physical_volume root2 dska_02
         physical_volume root3 dska_03
         physical_volume root4 dska_04
         f
         w root_lv
         q

    The tape devices were defined in a separate control file so that
    they can be used with several physical volume control files,
    during separate saves or restores.

4.  How To Execute A Save And What Messages Are Displayed.
    Once the control files have been properly set up, the operator
    can begin the save process by typing the following:

         save -set save_tapes root_lv

    The tape devices are polled and verified to be accessible and
    capable of the requested density.  If problems are detected a
    message is displayed and the device is removed from the list of
    available devices.  This list is displayed in the order that the
    drives will be used.

         save(ROOT): The following tape devices will be used:
              tapa_02 tapa_05 tapb_03

    A check is made to ensure that the physical volume requests
    match the corresponding disk packs.  For each physical volume a
    message is displayed.  Errors are noted by (***) in columns
    76-78, not shown here.

         save(ROOT): Multics Storage System Volume rpv on dska_01
              Last updated: 05/08/86 1209.2 mst Fri
              Partition conf:  3908 for 4 records
              Partition file: 33836 for 255 records
              Partition dump: 34091 for 3500 records
              Partition log:  37591 for 256 records
         save(ROOT): Multics Storage System Volume root2 on dska_02
              Last updated: 05/08/86 1209.2 mst Fri
         save(ROOT): Multics Storage System Volume root3 on dska_03
              Last updated: 05/08/86 1209.2 mst Fri
         save(ROOT): Multics Storage System Volume root4 on dska_04
              Last updated: 05/08/86 1209.2 mst Fri

    If multiple physical volume sets were requested, then the above
    sequence would be repeated for each.  After all the sets are
    examined the following operator query will be displayed.

         save: Would you like to continue?

    At this point the output messages can be examined.  If all is
    correct and acceptable, a "yes" response causes the save to
    begin.  If any problems need to be corrected, a "no" response
    will abort the save and return to BCE command level.  After
    corrections are made the save request can then be re-entered.
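
    The way the control files named after "-set" combine into one
    set description can be sketched as follows.  This is only an
    illustrative Python sketch of the rules stated for the control
    file requests (one request per line, comment characters, leading
    white space trimmed, the request abbreviations, the density
    forms and the stated limits); it is not the BCE implementation.
    The names load_set, parse_density and FILES are hypothetical,
    the FILES dictionary stands in for the BCE file partition, and
    the handling of blank lines and of a control file that is
    referenced twice is an assumption.

```python
import re

# Abbreviations listed with the control file requests.
ALIASES = {"ts": "tape_set", "td": "tape_device", "pv": "physical_volume",
           "part": "partition", "cf": "control_file"}
COMMENT_PREFIXES = ("/", "&", '"')
LIMITS = {"devices": 16, "volumes": 63, "partitions": 64}


def parse_density(words):
    """Accept d=NNNN, den=NNNN, -density NNNN, -den NNNN or -d NNNN."""
    if not words:
        return None  # density is optional (only needed during a save)
    if len(words) == 1:
        match = re.fullmatch(r"(?:d|den)=(\d+)", words[0])
        if match:
            return int(match.group(1))
    if len(words) == 2 and words[0] in ("-density", "-den", "-d") \
            and words[1].isdigit():
        return int(words[1])
    raise ValueError("bad density: " + " ".join(words))


def load_set(names, files):
    """Merge the named control files into one set description.

    `files` maps a control file name to its text (hypothetical stand-in
    for the BCE file partition).
    """
    result = {"tape_set": None, "devices": [], "volumes": [], "partitions": []}
    pending, seen = list(names), []
    while pending:
        name = pending.pop(0)
        if name in seen:
            continue  # assumption: a file referenced twice is read once
        seen.append(name)
        if len(seen) > 32:
            raise ValueError("more than 32 control files")
        for raw in files[name].splitlines():
            line = raw.lstrip()  # leading white space is trimmed
            if not line or line.startswith(COMMENT_PREFIXES):
                continue  # comment line (skipping blank lines is assumed)
            request, *args = line.split()
            request = ALIASES.get(request, request)
            if request == "tape_set":
                if (result["tape_set"] is not None or len(args) != 1
                        or len(args[0]) > 32):
                    raise ValueError("bad tape_set request in " + name)
                result["tape_set"] = args[0]
            elif request == "tape_device":
                if not args:
                    raise ValueError("tape_device needs a device name")
                result["devices"].append((args[0], parse_density(args[1:])))
            elif request == "physical_volume":
                pv_name, disk_device = args
                result["volumes"].append((pv_name, disk_device))
            elif request == "partition":
                pv_name, disk_device, *parts = args
                if not parts:
                    raise ValueError("partition needs a part_name")
                for part in parts:
                    result["partitions"].append((pv_name, disk_device, part))
            elif request == "control_file":
                (linked,) = args
                pending.append(linked)  # linked control file, read later
            else:
                raise ValueError("unknown request: " + request)
    if result["tape_set"] is None:
        raise ValueError("no tape_set request given")
    for key, limit in LIMITS.items():
        if len(result[key]) > limit:
            raise ValueError("too many " + key)
    return result


# The two control files from the qedx example.
FILES = {
    "save_tapes": '''" Save/Restore Tape devices.
tape_device tapa_02 -density 6250
tape_device tapa_05 -density 6250
tape_device tapb_03 -density 6250
''',
    "root_lv": '''" Save/Restore control file for the ROOT logical volume.
tape_set ROOT
physical_volume rpv dska_01
partition rpv dska_01 conf file log dump
physical_volume root2 dska_02
physical_volume root3 dska_03
physical_volume root4 dska_04
''',
}

root = load_set(["save_tapes", "root_lv"], FILES)
print(root["tape_set"], [device for device, _ in root["devices"]])
```

    Keeping the devices in the order read mirrors the rule that the
    order in which the devices are entered defines the sequence for
    using them.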
    CAUTION: Any tape that is mounted, with a write ring present,
    will be considered as a pre-mounted save tape and will be
    written on when the tape device is selected.

    If a tape is not mounted the following message will be
    displayed.

         save(ROOT): Please mount tape# 1 on tapa_01.

    If after two minutes no tape has been mounted the following
    operator query is displayed.

         save(ROOT): Would you like to skip to the next tape device?

    One of the following responses must be entered.

    yes, y
         This device is skipped and the next device is selected.
         The tape mount is then checked in the same manner.  The
         skipped device remains in the list of available tape
         devices.

    no, n
         The device is not skipped.  The mount, for this device, is
         checked again in the same manner.

    remove
         This device is removed from the list and the next device is
         selected.  The tape mount is then checked in the same
         manner.

    help, ?
         This displays the possible responses.

    Once a tape is mounted the save process can continue.  Displayed
    below are the messages that will be displayed as the save
    progresses.  This example assumes that tapes have been
    pre-mounted on devices tapa_05 and tapb_03.

         save(ROOT): Volume rpv, record 0, on tape# 1 (tapa_02)
         save(ROOT): Partition conf on rpv, record 3908, on tape# 1 (tapa_02)
         save(ROOT): Partition file on rpv, record 33836, on tape# 1 (tapa_02)
         save(ROOT): Partition dump on rpv, record 34091, on tape# 1 (tapa_02)
         save(ROOT): Partition log on rpv, record 37591, on tape# 1 (tapa_02)
         save(ROOT): Volume root2, record 0, on tape# 1 (tapa_02)
         save(ROOT): Volume root3, record 0, on tape# 1 (tapa_02)
         save(ROOT): Unloading tape# 1 from tapa_02, 23537 records (12 errors)
         save(ROOT): Volume root3, record 4356, on tape# 2 (tapa_05)
         save(ROOT): Volume root4, record 0, on tape# 2 (tapa_05)
         save(ROOT): Unloading tape# 2 from tapa_05, 5477 records
         save(ROOT): OK to write "Info" tape on tapb_03?

    The above query allows for pre-assigned "Info" tapes.  If
    answered "yes" the current tape is used; if answered "no" the
    tape will be dismounted and the following will occur.

         save(ROOT): Unloading tapb_03
         save(ROOT): Please mount the "Info" tape on tapb_03.

    After the correct "Info" tape has been mounted and written the
    following is displayed indicating the completion of the save
    request.

         save(ROOT): Unloading "Info" tape from tapb_03, 3 records
         save(ROOT): save complete...

5.  How To Abort A Save.

    A save can be interrupted by use of the console "request" key.
    When the key is pressed while a save is in progress, the
    following prompt will appear.

         save: Abort request:

    The operator will be required to input one of the following
    responses.

    no, n
         This causes the request to be ignored and the save to
         continue.

    abort
         This aborts all save sets and returns to BCE command level.

    restart TAPE_SET
         This allows the operator to restart the specified TAPE_SET,
         using its current tape device.  The operator is then
         required to mount the "restart" tape on the device and
         follow the procedure as described below under "How To
         Restart A Save".
         Once the SET has been restarted, the remaining SETs will
         continue operation.

    stop TAPE_SET
         This aborts the specified TAPE_SET, and resumes the process
         for the other sets.

    help, ?
         This displays the possible responses, with a small
         description of each.

6.  How To Restart A Save.

    Due to various problems that may arise while performing a save,
    it may be necessary to restart a set.

    The restart operation can be invoked in one of three ways:

    o  "-restart_set" argument in the command line.

    o  "restart TAPE_SET" response to the "Abort request" above.
       (See "How To Abort A Save" in this section.)

    o  "restart_set" or "remove_device_from_set" response in error
       recovery.  (See "How To Recover From Unrecoverable Tape
       Errors" later in this section.)

    Restarting consists of skipping all volumes and/or partitions
    that have been successfully saved, restarting the save of a
    volume somewhere in the middle and then continuing normally with
    the remaining volumes.

    A restart must always start at the beginning of a tape.  This
    means that the last tape label that was successfully written
    holds all the information about where to restart.

    The tape label is read from the save tape that the operator
    wishes to restart from.  If the tape is not already mounted the
    following is displayed and the normal mount procedure executed.

         save(ROOT): Please mount the "restart" tape on tapa_02.
         save(ROOT): Tape# 2 on tapa_02, created 05/08/86 1535.3 mst Thu

    After the tape label has been read the tape creation time is
    checked.  If the tape is more than one week old it is rejected.
    This involves unloading the current tape and asking that another
    be mounted.

    The tape label information is used to locate all the volumes
    that can be skipped and the record number to start at when
    rewriting the tape.  The following messages are displayed.

         save(ROOT): Skipping volume rpv on dska_01.
         save(ROOT): Skipping volume root2 on dska_02.
         save(ROOT): Starting from record 4356 of volume root3 on dska_03.

    The operator is then queried with the following:

         save(ROOT): Do you want to replace or rewrite tape# 2 on tapa_02?

    This query gives the operator the chance to select a different
    tape reel, in case the previous save was aborted because this
    tape contained too many errors.  Below are the possible
    responses.

    replace, rep
         This will cause the current tape to be unloaded and a new
         tape requested in its place.

    rewrite, rew
         The tape will be rewound and used when the save begins
         again.

    From this point on the save resumes normal operation.

7.  How To Execute A Restore And What Messages Are Displayed.

    Once the control files have been properly set up, the operator
    can begin the restore process by typing the following:

         restore -set save_tapes root_lv

    The tape devices are polled and verified to be accessible.  If
    problems are detected a message is displayed and the device is
    removed from the list of available devices.  This list is
    displayed in the order that the drives will be used.

         restore(ROOT): The following tape devices will be used:
              tapa_02 tapa_05 tapb_03

    At this time the program needs to read in the contents of the
    "Info" save tape.  This tape contains the list of volumes and
    partitions that were saved and the starting and ending tape
    number for each.  This tape is the last tape written as part of
    a save.  This tape allows program control over what tapes are
    mounted, which saves a lot of time in searching tapes.

    The program now attempts to read the tape on the first device in
    the list, but if a tape is not mounted the following will
    appear.

         restore(ROOT): Please mount the "Info" tape on tapa_02.

    If the tape read does not contain a label of "Info" then the
    program queries the operator to find out if the "Info" tape is
    available.
If the operator answers "no" then the program will use the label information from the current tape in place of the "Info" data, which is in the same format but not as complete. If the operator answers "yes" then the current tape is unloaded and the mount/label read process is restarted.

If the "Info" tape is not available, then the save tape closest to the end of the save should be read in its place. This will give the program the greatest amount of information.

The volumes to be restored are sorted so that they are in the same order as they were saved. Each of the disk labels is read and a display/check of the information is done. If a problem is detected, the volume is removed from the "to-be-processed" list. This process is repeated for each restore SET. Below is an example of the information that is displayed. Messages that indicate a possible problem will have (***) in columns 76-78, not shown here.

   restore(ROOT): Multics Storage System Volume rpv on dska_01
                  Last updated: 05/08/86 1209.2 mst Thu
   restore(ROOT): Multics Storage System Volume root2 on dska_02
                  Last updated: 05/08/86 1209.2 mst Thu
   restore(ROOT): Multics Storage System Volume root3 on dska_03
                  Last updated: 05/08/86 1209.2 mst Thu
   restore(ROOT): Multics Storage System Volume root4 on dska_04
                  Last updated: 05/08/86 1209.2 mst Thu

If multiple physical volume sets were requested, then the above sequence would be repeated for each. After all the sets are examined, the following operator query will be displayed.

   restore: Would you like to continue?

At this point the output messages can be examined. If all is correct and acceptable, a "yes" response causes the restore to begin. If any problems need to be corrected, a "no" response will abort the restore and return to BCE command level. After corrections are made, the restore request can then be re-entered.
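
The sorting step described above (putting the requested volumes into the order in which they were saved) can be illustrated with a short sketch. This is not the BCE implementation; it is a minimal Python illustration in which the function name, the list shapes, and the treatment of volumes that do not appear on the tapes are all assumptions made for the example. The "scratch1" volume is likewise hypothetical.

```python
# Hypothetical sketch: order the requested volumes the way they appear
# on the save tapes. Names and data shapes are illustrative assumptions.

def sort_into_save_order(requested, saved_order):
    """Return (volumes in save order, volumes not found on the tapes).

    requested   -- volume names taken from the restore control file
    saved_order -- volume names in the order listed on the "Info" tape
    """
    position = {name: index for index, name in enumerate(saved_order)}
    known = [name for name in requested if name in position]
    unknown = [name for name in requested if name not in position]
    # Sort by the position each volume had when it was saved.
    return sorted(known, key=position.__getitem__), unknown


saved_order = ["rpv", "root2", "root3", "root4"]    # from the "Info" tape
requested = ["root3", "rpv", "root4", "scratch1"]   # from the control file

to_restore, not_on_tape = sort_into_save_order(requested, saved_order)
print("restore order:", to_restore)
print("not on tape:  ", not_on_tape)
```

Restoring in save order means the tapes can be read front to back without repositioning, which is presumably why the program sorts before it starts.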

The program now knows the first tape to be read from the label information, or at least has a best guess if the first tape read was not the "Info" tape. It attempts to read this tape on the next tape device in the list. If the tape read is not the correct tape or no tape is mounted, the following message is displayed.

   restore(ROOT): Please mount tape# 1 on tapa_02.

If after two minutes no tape has been mounted, the following operator query is displayed.

   restore(ROOT): Would you like to skip to the next tape
   c  device?

One of the following responses must be entered.

yes, y
   This device is skipped and the next device is selected. The tape mount is then checked in the same manner. The skipped device remains in the list of available tape devices.

no, n
   The device is not skipped. The mount, for this device, is checked again in the same manner.

remove
   This device is removed from the list and the next device is selected. The tape mount is then checked in the same manner.

help, ?
   This displays the possible responses.

After a successful read of the current tape label, the program will check to see if another tape in the set is needed. If the tape will be needed, a pre-mount message will be displayed. Shown below is an example sequence of events during a restore process.

   restore(ROOT): Tape# 1 on tapa_02, created 05/08/86 1525.0
   c  mst Thu
   restore(ROOT): Please pre-mount tape# 2 on tapa_05.
   restore(ROOT): Volume rpv, record 0, on tape# 1 (tapa_02)
   restore(ROOT): Partition conf on rpv, record 3908, on
   c  tape# 1 (tapa_02)
   restore(ROOT): Partition file on rpv, record 33836, on
   c  tape# 1 (tapa_02)
   restore(ROOT): Partition dump on rpv, record 34091, on
   c  tape# 1 (tapa_02)
   restore(ROOT): Partition log on rpv, record 37591, on
   c  tape# 1 (tapa_02)
   restore(ROOT): Volume root2, record 0, on tape# 1 (tapa_02)
   restore(ROOT): Volume root3, record 0, on tape# 1 (tapa_02)
   restore(ROOT): Unloading tape# 1 from tapa_02, 23537 records
   restore(ROOT): Tape# 2 on tapa_05, created 05/08/86 1535.3
   c  mst Thu
   restore(ROOT): Volume root3, record 4356, on tape# 2
   c  (tapa_05)
   restore(ROOT): Volume root4, record 0, on tape# 2 (tapa_05)
   restore(ROOT): Unloading tape# 2 from tapa_05, 5477 records
   restore(ROOT): restore complete...

8. How To Abort A Restore.

A restore set can be interrupted by use of the console "request" key. When the key is pressed while a restore is in progress, the following prompt will appear.

   restore: Abort request:

The operator will be required to input one of the following responses.

no, n
   This causes the request to be ignored and the restore to continue.

abort
   This aborts all restore sets and returns to BCE command level.

restart TAPE_SET
   This allows the operator to restart the specified TAPE_SET, using its current tape device. The operator is then required to mount the "restart" tape on the device and follow the procedure as described below under "How To Restart A Restore". Once the SET has been restarted, the remaining SETs will continue operation.

stop TAPE_SET
   This aborts the specified TAPE_SET and resumes the process for the other sets.

help, ?
   This displays the possible responses, with a small description of each.

9. How To Restart A Restore.

Due to various problems that may arise while performing a restore, it may be necessary to restart a set.

The restart operation can be invoked in one of three ways:

o  "-restart_set" argument in the command line.

o  "restart TAPE_SET" response to the "Abort request" above. (See "How to Abort A Restore" in this section.)

o  "restart_set" or "remove_device_from_set" response in error recovery. (See "How To Recover From Unrecoverable Tape Errors" later in this section.)

Restarting consists of skipping all volumes and/or partitions that have been successfully restored, restarting the restore of a volume somewhere in the middle, and then continuing normally with the remaining volumes.

If restarting from the command line, then the "Info" tape must still be read before the "restart" tape.

The tape label is read from the save tape that the operator wishes to restart from. If the tape is not already mounted, the following is displayed and the normal mount procedure is executed.

   restore(ROOT): Please mount the "restart" tape on tapa_02.
   restore(ROOT): Tape# 2 on tapa_02, created 05/08/86 1535.3
   c  mst Thu

From the tape label the program can determine which volumes were completed on previous tapes and skip them. It then restarts the restore at the first volume on the tape that was requested for restore. The following messages are displayed.

   restore(ROOT): Skipping volume rpv on dska_01.
   restore(ROOT): Skipping volume root2 on dska_02.
   restore(ROOT): Starting from record 4356 of volume root3 on
   c  dska_03.

From this point on the program returns to normal operation.

10. How To Recover From Unrecoverable Tape Errors.

During a save or restore there are times when errors occur which require special handling. These are errors that are either non-retryable or where the retry process failed.

When an unrecoverable error occurs, a message will be displayed that shows the error interpreted in English, with detailed status in hex if required. The operator will be queried as to the course of action that should be taken. Listed below is an example error output and the possible responses and their meanings.

   save(ROOT): Device Attention, Handler check on tapb_03.
      detailed status: 20 8C 2B 6D 0A 01 16 00 00 16 48 87 24
                       18 06 00 00 0C 00 00 08 08 80 00 00 00
   save: Action:

abort
   This causes the program to abort the entire save/restore and return to BCE command level.

retry, r
   For errors that are retryable this will force the retry process again. It is invalid for non-retryable errors.

skip, s
   This is only valid for data alert errors detected while doing a restore. The unreadable record is skipped and the restore continues by attempting to read the next record.

stop_set, stop
   This will cause this SET to be aborted, but all other SETs will continue.

restart_set, restart, rt
   This allows the operator to restart this SET, using the current tape device. The operator is then required to mount the "restart" tape on the device and follow the restart procedures. Once the SET has been restarted, the remaining SETs will continue operation.

remove_device_from_set, remove
   Works like the "restart_set" request above, but removes the current tape device from the SET and sequences to the next device before going through the restart process. This is not a valid response if this is the only tape device left in the SET.

help, ?
   This displays the above possible responses.

A.3.5: APPENDIX-H

***** The appendix needs to be changed as follows:

Alternate Procedures for Disk Volume Recovery

Section 10 discusses different kinds of disk failures and how to recover from them.
In its "Disk Volume Recovery Procedures" subsection, it recommends the use of volume reloading. This appendix describes a variation of volume reloading: a BCE RESTORE operation followed by a volume reload operation. This procedure is almost never needed, and for that reason, its description has been placed in this appendix, rather than in Section 10.

This appendix also discusses an alternate procedure for complete disk volume recovery: a BCE RESTORE operation followed by a hierarchy reload operation. While BCE RESTORE/hierarchy reloading is not generally recommended for reloading complete volumes, your site may decide to use this procedure if problems are encountered (e.g., many unreadable tapes) during the volume reloading procedures described in Section 10, or if your site does not use the Volume Backup facility.

Disk Volume Recovery via BCE RESTORE/Volume Reloading

Recovery via BCE RESTORE followed by volume reloading involves replacing the damaged disk volume with a spare volume, restoring the most recent BCE SAVE tapes for the damaged volume using the BCE RESTORE command, and then reloading the consolidated and incremental volume dumper tapes created after the BCE SAVE operation was performed. The -save control argument of the reload_volume command indicates that the date-contents-modified field of each entry being reloaded should be compared with the date-unmounted field of the volume label. Since a volume must be unmounted before a BCE SAVE operation can be performed, the date-unmounted value placed in the volume label by the BCE RESTORE operation is a good indicator of the date on which the BCE SAVE operation was performed. If the entry from the volume backup tape is newer than the date-unmounted field from the disk label, then the tape entry is reloaded.
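
The date comparison implied by the -save control argument can be sketched as follows. This is only an illustration of the rule stated above, written in Python; the function name, the use of datetime values, and the sample entries are assumptions for the example and do not reflect how reload_volume is actually implemented.

```python
from datetime import datetime

# Hypothetical sketch of the -save rule: an entry from the volume backup
# tape is reloaded only if it is newer than the date-unmounted field of
# the volume label written by the BCE RESTORE.

def should_reload(date_contents_modified, date_unmounted):
    """Return True when the tape entry postdates the BCE SAVE."""
    return date_contents_modified > date_unmounted


date_unmounted = datetime(1986, 5, 8, 12, 9)   # from the restored volume label

tape_entries = {                                # illustrative entries only
    "old_segment": datetime(1986, 5, 7, 9, 0),
    "new_segment": datetime(1986, 5, 9, 14, 30),
}

for name, dtcm in tape_entries.items():
    action = "reload" if should_reload(dtcm, date_unmounted) else "skip"
    print(f"{name}: {action}")
```

Entries modified before the volume was unmounted are already present in the restored image, so skipping them avoids overwriting restored data with equal or older copies.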

Recovery of the RPV with BCE RESTORE/Volume Reloading

If a disk volume failure occurs for the RPV, the following procedure can be used to recover the contents of the RPV from a combination of BCE SAVE tapes and volume backup tapes. See Section 9 for general information and more details on volume backup and volume reloading. All of the commands used in this procedure are described in the Multics Administration, Maintenance and Operations Commands manual, Order No. GB64.

1.  If the system has not already crashed, attempt to recover from the failure by following the procedures described in Section 10 under "Recovering From Disk Failures." If that corrects the problem, then skip the remaining steps. Otherwise, use the last procedure under "Recovering From Disk Failures" to shut down or crash the system.

2.  Consult with your Customer Service Representative to correct any hardware failure that is occurring. Have him repair or replace any damaged hardware.

To test the original RPV volume, or to recover its data onto a spare disk volume, you will need to boot BCE and Multics on a temporary RPV. This temporary RPV may be obtained in any of the following ways:

o  If your site has prepared a one- or two-volume "test system" for hardware and software checkout purposes, you can boot this test system for use in testing and reloading the original RPV.

o  You can restore the BCE SAVE tapes for the original RPV onto a spare disk volume for use as the temporary RPV. The actual data on the temporary RPV is not important since it will not become part of the production hierarchy; an older set of SAVE tapes can be used, as long as the saved RPV is for the Multics release you are currently running. You will have to boot BCE on the temporary RPV, and specify "cold" to the "Enter rpv data:" prompt to allow the temporary RPV to be properly initialized.
After restoring the RPV, remember to update the root and part configuration cards to describe only the temporary RPV.

Spare disk volumes should be properly formatted and tested as described in Section 10 under "Preformatted Disk Volumes."

3.  Boot BCE on the temporary RPV, as described in the Operators' Guide to Multics, Order No. GB61.

4.  If your Customer Service Representative believes there has been no physical damage to the original RPV disk volume, attempt to read it using the BCE TEST_DISK command, as described in Section 10 under "Extent of Disk Volume Failure."

5.  If only transient errors are encountered when reading the original RPV, follow the procedures described in Section 10 under "Recovering from Transient Disk Volume Failure," and skip the rest of these steps.

6.  If the original RPV is only partially damaged and you decide that loss of the unreadable records is acceptable, follow the procedures described in Section 10 under "Recovering from Partial Disk Volume Failure," and skip the rest of these steps.

The steps below attempt to reload RPV information from BCE SAVE and volume backup tapes onto a spare disk volume. These steps assume that the original RPV volume is totally unreadable, or that the amount of lost data caused by unreadable records is unacceptably high.

If your Customer Service Representative believes that the original RPV is physically damaged (i.e., scratched or warped), then replace the RPV with a spare volume which has already been formatted and tested, as described in Section 10 under "Preformatted Disk Volumes." Otherwise, you can reload data onto the original RPV.

7.  Mount the disk volume to be reloaded on any available drive.

8.  Create a RESTORE control file that will identify the new RPV, then use the BCE RESTORE command to load information from the BCE SAVE tapes onto the new RPV.
For example:

        qx
        a
        td tapa_01
        td tapa_02
        ts ROOT
        pv rpv dska_01
        part rpv dska_01 -all
        \f
        w rpv_restore
        q
        restore rpv_restore

9.  Once the BCE SAVE tapes have been restored, boot Multics on the temporary RPV, coming up to Multics ring 1 command level, as described in the Operators' Guide to Multics, Order No. GB61.

10. Convert the disk drive on which the new RPV is mounted to an I/O drive, using the set_drive_usage command. For example:

        sdu dska_04 io

11. Recover the volume log for the RPV using the recover_volume_log command with the -wd control argument. For example:

        recover_volume_log rpv -wd

    Mount the last volume backup tape for the volume backup group which includes the RPV. The volume name of the last tape should be recorded in the tape log, as described in Section 10 under "Backup Tape Logs." If volume backup operations were ongoing at the time of disk failure, you should mount the tape which was being written at the time of failure.

12. Reload the new RPV using the volume reloader, by issuing the reload_volume command with the -pvname, -operator, -save, and -wd control arguments. For example:

        reload_volume -pvname rpv -operator Jones -wd -save

    Mount tapes as requested by the reload_volume command. When all tapes have been reloaded, continue with the next step.

13. Shut down Multics on the temporary RPV.

14. If the RPV was reloaded onto a spare volume and the original RPV is partially readable, you may want to try to copy the contents of the CONF, FILE, DUMP and LOG partitions onto the new RPV, as described in Section 10 under "Recovery of Partitions after RLV Volume Recovery."

15. If the newly reloaded RPV is not mounted on the proper disk drive for normal operation, move the new RPV to the proper disk drive.

16. Boot BCE on the newly reloaded RPV, according to normal site procedures.
If reloading was performed on a spare disk volume rather than on the original RPV, then the contents of the CONF, BCE, and FILE partitions have been lost. In BCE, you will have to reload the config deck from a config file read off the BCE tape, using the BCE "config <deckname>" command. Make adjustments to the configuration file as necessary, to reflect the current hardware configuration and disk volume locations.

17. Boot Multics according to normal site procedures.

18. Perform the procedures for salvaging, quota adjustment, and connection failure detection described in Section 10 under "Disk Volume Post-Recovery Procedures."

This completes recovery of the RPV.

Recovery of a NonRPV Root Volume with BCE RESTORE/Volume Reloading

If a disk volume failure occurs on a volume which is part of the Root Logical Volume (RLV), but is not the RPV, the following procedure can be used to recover the contents of that volume from BCE SAVE tapes and volume backup tapes. See Section 9 for general information and more details on volume backup and volume reloading. All of the commands used in this procedure are described in the Multics Administration, Maintenance and Operations Commands manual, Order No. GB64.

1.  If the system has not already crashed, attempt to recover from the failure by following the procedures described in Section 10 under "Recovering From Disk Failures." If that corrects the problem, then skip the remaining steps. Otherwise, use the last procedure under "Recovering From Disk Failures" to shut down or crash the system.

2.  Consult with your Customer Service Representative to correct any hardware failure that is occurring. Have him repair or replace any damaged hardware.

To test the original root volume, or to recover its data onto a spare disk volume, you will need to boot BCE and Multics on the RPV.

3.  Boot BCE on the RPV, as described in the Operators' Guide to Multics, Order No. GB61.

4.
If your Customer Service Representative believes there has been no physical damage to the original root disk volume, attempt to read it using the BCE TEST_DISK command, as described in Section 10 under "Extent of Disk Volume Failure."

5.  If only transient errors are encountered when reading the original root volume, follow the procedures described in Section 10 under "Recovering from Transient Disk Volume Failure," and skip the rest of these steps.

6.  If the original root volume is only partially damaged and you decide that loss of the unreadable records is acceptable, follow the procedures described in Section 10 under "Recovering from Partial Disk Volume Failure," and skip the rest of these steps.

The steps below attempt to reload root volume information from volume backup tapes onto a spare disk volume. These steps assume that the original root volume is totally unreadable, or that the amount of lost data caused by unreadable records is unacceptably high.

If your Customer Service Representative believes that the original root volume is physically damaged (i.e., scratched or warped), then replace it with a spare volume which has already been formatted and tested, as described in Section 10 under "Preformatted Disk Volumes." Otherwise, you can reload data onto the original root volume.

7.  Mount the disk volume to be reloaded on any available drive.

8.  Create a RESTORE control file that will identify the physical volume, then use the BCE RESTORE command to load information from the BCE SAVE tapes onto the volume. For example:

        qx
        a
        td tapa_01
        td tapa_02
        ts ROOT
        pv root2 dska_02
        \f
        w root2_restore
        q
        restore root2_restore

9.  Remove all disk volumes from the root config card, except for the RPV. If any part config cards identify the damaged disk volume, remove those part cards from the config deck.

10.
Boot Multics on the RPV, coming up to Multics ring 1 command level, as described in the Operators' Guide to Multics, Order No. GB61.

11. Convert the disk drive on which the new root volume is mounted to an I/O drive, using the set_drive_usage command. For example:

        sdu dska_05 io

12. Recover the volume log for the root volume using the recover_volume_log command with the -wd control argument. For example:

        recover_volume_log root2 -wd

    Mount the last volume backup tape for the volume backup group which includes the RLV. The volume name of the last tape should be recorded in the tape log, as described in Section 10 under "Backup Tape Logs." If volume backup operations were ongoing at the time of disk failure, you should mount the tape which was being written at the time of failure.

13. Reload the new root volume using the volume reloader by issuing the reload_volume command with the -pvname, -operator, -wd and -save control arguments. For example:

        reload_volume -pvname root2 -operator Jones -wd -save

    Mount tapes as requested by the reload_volume command. When all tapes have been reloaded, continue with the next step.

14. Shut down Multics on the RPV.

15. Restore the root and part config cards to their normal values, either by retyping the changed cards or by issuing the BCE "config <deckname>" command to load a new copy of the config deck from a BCE file.

16. If the root volume was reloaded onto a spare volume and the original volume is partially readable, you may want to try to copy the contents of the DUMP partition onto the new root volume, if this partition was on the damaged root volume. Follow the procedure described in Section 10 under "Recovery of Partitions after RLV Volume Recovery." This can only be done if the location of partitions was not changed on the new root.

17. If the newly reloaded root volume is not mounted on the proper disk drive for normal operation, move the volume to the proper disk drive.

18.
Boot BCE on the RPV, according to normal site procedures. Make adjustments to the configuration file as necessary, to reflect the current hardware configuration and disk volume locations.

19. Boot Multics according to normal site procedures.

20. Perform the procedures for salvaging, quota adjustment, and connection failure detection described in Section 10 under "Disk Volume Post-Recovery Procedures."

This completes recovery of the root volume.

Recovery of a NonRoot Volume with BCE RESTORE/Volume Reloading

If a disk volume failure occurs on a volume which is not part of the Root Logical Volume (RLV), the following procedure can be used to recover the contents of that volume from BCE SAVE and volume backup tapes. See Section 9 for general information and more details on volume backup and volume reloading. All of the commands used in this procedure are described in the Multics Administration, Maintenance and Operations Commands manual, Order No. GB64.

1.  If the system has not already crashed, attempt to recover from the failure by following the procedures described in Section 10 under "Recovering From Disk Failures." If that corrects the problem, then skip the remaining steps. Otherwise, use the last procedure under "Recovering From Disk Failures" to shut down or crash the system.

2.  Consult with your Customer Service Representative to correct any hardware failure that is occurring. Have him repair or replace any damaged hardware.

To test the original volume, or to recover its data onto a spare disk volume, you will need to boot BCE and Multics on the RLV.

3.  Boot BCE, as described in the Operators' Guide to Multics, Order No. GB61.

4.  If your Customer Service Representative believes there has been no physical damage to the original disk volume, attempt to read it using the BCE TEST_DISK command, as described in Section 10 under "Extent of Disk Volume Failure."

5.
If only transient errors are encountered when reading the original volume, follow the procedures described in Section 10 under "Recovering from Transient Disk Volume Failure," and skip the rest of these steps.

6.  If the original volume is only partially damaged and you decide that loss of the unreadable records is acceptable, follow the procedures described in Section 10 under "Recovering from Partial Disk Volume Failure," and skip the rest of these steps.

The steps below attempt to reload information from volume backup tapes onto a spare disk volume. These steps assume that the original volume is totally unreadable, or that the amount of lost data caused by unreadable records is unacceptably high.

If your Customer Service Representative believes that the original volume is physically damaged (i.e., scratched or warped), then replace it with a spare volume which has already been formatted and tested, as described in Section 10 under "Preformatted Disk Volumes." Otherwise, you can reload data onto the original disk volume.

7.  Mount the disk volume to be reloaded on any available drive.

8.  Create a RESTORE control file that will identify the physical volume, then use the BCE RESTORE command to load information from the BCE SAVE tapes onto the volume. For example:

        qx
        a
        td tapa_01
        td tapa_02
        ts Xpublic
        pv xpub02 dska_06
        \f
        w xpub_restore
        q
        restore xpub_restore

9.  Boot Multics on the RLV, coming up to Multics ring 1 command level, as described in the Operators' Guide to Multics, Order No. GB61.

10. To complete the boot, delete the logical volume which contains the damaged physical volume, using the del_lv command. For example:

        del_lv Xpublic

11. Issue the standard command to move to ring 4:

        standard

12.
If the system can run reasonably without the deleted logical volume, warn users (via a message_of_the_day, or with a login warning set by the word command) that the logical volume has been deleted for repair operations. For example:

        word login Xpublic volume is offline for repairs.

    If the system cannot run reasonably without the deleted logical volume, put the system into a special session, using the multics and go commands. This will prevent users from logging in:

        multics
        go

13. Convert the disk drive on which the new volume is mounted to an I/O drive, using the set_drive_usage command. For example:

        sdu dska_06 io

14. Log in the volume reloader and issue a reload_volume command with the -operator, -pvname, and -save control arguments. For example:

        login Volume_Reloader.Daemon vrld
        r vrld reload_volume -pvname xpub02 -operator Jones -save

    Mount tapes as the reloader asks for them; it will indicate when all necessary tapes have been reloaded.

    If the reloader indicates that the volume log is unavailable, recover the volume log for the volume using the recover_volume_log command. For example:

        r vrld recover_volume_log xpub02

    Mount the last volume backup tape for the volume backup group which includes the failing volume. The volume name of the last tape should be recorded in the tape log, as described in Section 10 under "Backup Tape Logs." If volume backup operations were ongoing at the time of disk failure, you should mount the tape which was being written at the time of failure. After the volume log has been recovered, reissue the reload_volume command, as shown above.

15. After volume reloading is complete, issue a set_drive_usage command to convert the drive back into storage system usage. For example:

        sdu dska_06 ss

16. Issue the add_vol command to inform the system of the new location for the reloaded disk volume. For example:

        add_vol xpub02 dska_06

17.
Issue the add_lv command to add the logical volume containing the reloaded disk volume. For example:

        add_lv Xpublic

18. If the system is in special session, return it to normal session:

        word login
        maxu auto
        abs start
        abs maxu auto

19. Perform the procedures for salvaging, quota adjustment, and connection failure detection described in Section 10 under "Disk Volume Post-Recovery Procedures."

This completes recovery of the volume.

Disk Volume Recovery via BCE RESTORE/Hierarchy Reloading

The BCE RESTORE/hierarchy reloading strategy can be used to reload a volume which is not part of the Root Logical Volume (single volume reload), to reload the entire Root Logical Volume (RLV reload), or to reload the entire hierarchy (complete reload).

BCE RESTORE/hierarchy reloading cannot be used to recover only a single root volume (either the RPV or an RLV volume). A complete or RLV reload must be performed to recover single RLV volumes.

The BCE RESTORE/hierarchy reload strategy involves replacing physically damaged volumes with spare disk volumes, initializing these volumes, and then reloading complete, consolidated and incremental dump tapes onto them in chronological order (the order in which they were written).

Hierarchy Reload of RLV versus Reload of All Volumes

The loss of a part of the Root Logical Volume (RLV) is always very serious. The recovery operation when reloading hierarchy dump tapes is more complex than when reloading volume dump tapes. When reloading hierarchy dump tapes, the entire RLV must be reloaded rather than just the damaged root volume.

The need to reload the entire RLV stems from the way the hierarchy reloader works. If a directory being reloaded does not already exist, the hierarchy reloader uses the next available VTOCE to hold the directory, rather than placing the directory in the same VTOCE from which it was dumped.
Because directories are being reloaded into different locations, superior directories can lose track of the new location, causing connection failures. The only method of avoiding such connection failures is to reload the entire RLV.

Another factor adding to the complexity of single volume and RLV hierarchy reloads is the requirement of the hierarchy reloader that it operate on a consistent copy of the hierarchy. After a BCE RESTORE of one or several volumes is complete, directory salvaging and physical volume connection failure detection operations must be performed to restore the consistency of the hierarchy before the hierarchy reload is performed. Directory salvage operations are needed to delete branches for entries which were deleted after the BCE SAVE tapes were made. Reverse connection failure detection is needed to recover VTOCEs for segments which were deleted after the BCE SAVE tapes were made (either by adopting these segments or by garbage collecting their VTOCEs). The considerable amount of time required to perform these operations must be weighed against the simpler, but sometimes longer procedure of doing a complete reload of the entire system.

Recovery of All Volumes with BCE RESTORE/Hierarchy Reloading

If a disk volume failure occurs on several different disk volumes (either on volumes of the RLV or on nonroot volumes), the following procedure can be used to recover the contents of all volumes on the system from BCE SAVE and hierarchy backup tapes. This procedure is often referred to as a "complete RESTORE/reload" of the hierarchy.

Note that it is possible to recover just the volumes of the RLV, or just a single nonroot volume. Procedures for such recovery operations are described later in this appendix under "Recovery of the Root Logical Volume with BCE RESTORE and Hierarchy Reloading" and "Recovery of a Nonroot Volume with BCE RESTORE Hierarchy Reloading".
However, these recovery operations are more complex than a complete RESTORE/reload operation, and they may be more time-consuming as well. You should consider the steps involved in each type of BCE RESTORE/hierarchy reloading procedure carefully, and choose the best procedure for your particular circumstances.

See Section 9 for general information and more details on hierarchy backup and hierarchy reloading. All of the commands used in the procedure below are described in the Multics Administration, Maintenance and Operations Commands manual, Order No. GB64.

1.  If the system has not already crashed, attempt to recover from the failure by following the procedures described in Section 10 under "Recovering From Disk Failures." If that corrects the problem, then skip the remaining steps. Otherwise, use the last procedure under "Recovering From Disk Failures" to shut down or crash the system.

2.  Consult with your Customer Service Representative to correct any hardware failure that is occurring. Have him repair or replace any damaged hardware.

To test the damaged disk volumes, or to recover their data onto spare disk volumes, you will need to boot BCE and Multics on an RPV. The RPV to be used for testing can be obtained in any of the following ways:

o  If the RPV of the production Multics system is not one of the damaged disk volumes, you can boot BCE on the original RPV for testing and reloading the other disk volumes.

o  If your site has prepared a one- or two-volume "test system" for hardware and software checkout purposes, you can boot this test system for use in testing and reloading the original RPV.

o  You can restore the BCE SAVE tapes for your RPV onto a spare disk volume for use as the temporary RPV.
       The actual data on the temporary RPV is not important since it will not become part of the production hierarchy; an older set of SAVE tapes can be used, as long as the saved RPV is for the Multics release you are currently running. You will have to boot BCE on the temporary RPV, and specify "cold" to the "Enter rpv data:" prompt to allow the temporary RPV to be properly initialized. After restoring the RPV, remember to update the root and part configuration cards to describe only the temporary RPV.

3.  Boot BCE on the chosen RPV, as described in the Operators' Guide to Multics, Order No. GB61.

4.  If your Customer Service Representative believes there has been no physical damage to the original disk volumes, attempt to read them using the BCE TEST_DISK command, as described in Section 10 under "Extent of Disk Volume Failure."

5.  If only transient errors are encountered when reading the original volumes, follow the procedures described in Section 10 under "Recovering from Transient Disk Volume Failure," and skip the rest of these steps.

6.  If the original volumes are only partially damaged and you decide that loss of the unreadable records is acceptable, follow the procedures described in Section 10 under "Recovering from Partial Disk Volume Failure," and skip the rest of these steps.

The steps below attempt to reload information from BCE SAVE and hierarchy backup tapes onto spare disk volumes. These steps assume that the original volumes are totally unreadable, or that the amount of lost data caused by unreadable records is unacceptably high. If your Customer Service Representative believes that one or more of the original volumes are physically damaged (i.e., scratched or warped), then they must be replaced with spare volumes which have already been formatted and tested, as described in Section 10 under "Preformatted Disk Volumes." Otherwise, you can reload data onto the original disk volumes.

7.
    Mount the disk volumes to be reloaded on any available drive. You can use the original disk drives if the Customer Service Representative says they are in good working condition.

8.  If the original RPV was physically damaged, then you must reboot BCE on the spare volume which will become the new RPV. The spare disk volume should be properly formatted and tested as described in Section 10 under "Preformatted Disk Volumes." You will have to boot BCE on the temporary RPV, using an input of "cold" to the "Enter RPV Data:" query to specify that the RPV is to be initialized.

    Similarly, if you are running on a temporary RPV or on a test system and the original RPV is not physically damaged, then you must reboot BCE on the original RPV.

9.  If the RPV was reloaded onto a spare volume and the original RPV is partially readable, you may want to try to copy the contents of the CONF, FILE, DUMP and LOG partitions onto the new RPV, as described in Section 10 under "Recovery of Partitions after RLV Volume Recovery."

10. Now you need to either create RESTORE control files that will define the volumes to restore, or use the control files that were created for use when the BCE SAVE was done. You can restore either one or multiple volume sets. For example:

        restore -set tape_devs_1 root_lv -set tape_devs_2
        c public_lv

11. Once the BCE SAVE tapes have been restored, boot Multics on the newly reloaded RPV, coming up to ring 1 command level, as described in the Operators' Guide to Multics, Order No. GB61.

12. Attach all logical volumes by typing:

        add_lv -all

13. Use the reload command to read, in forward chronological order, all hierarchy consolidated and incremental dump tapes made since the BCE SAVE tapes were created:

        reload -nomap

    When all tapes have been reloaded, continue with the next step.

14. Boot Multics according to normal site procedures.

15.
    Perform the procedures for salvaging, quota adjustment, and connection failure detection described in Section 10 under "Disk Volume Post-Recovery Procedures." This completes recovery of the volumes.

Recovery of the Root Logical Volume with BCE RESTORE/Hierarchy Reloading

If a disk volume failure occurs on one or more disk volumes of the RLV, the following procedure can be used to recover the contents of all volumes of the RLV from BCE SAVE and hierarchy backup tapes. This procedure is often referred to as an "RLV RESTORE/reload". It is sometimes better than a complete RESTORE/reload because it can preserve copies of nonroot segments that are later than those appearing on the backup tapes.

See Section 9 for general information and more details on hierarchy backup and hierarchy reloading. All of the commands used in the procedure below are described in the Multics Administration, Maintenance and Operations Commands manual, Order No. GB64.

1.  If the system has not already crashed, attempt to recover from the failure by following the procedures described in Section 10 under "Recovering From Disk Failures." If that corrects the problem, then skip the remaining steps. Otherwise, use the last procedure under "Recovering From Disk Failures" to shut down or crash the system.

2.  Consult with your Customer Service Representative to correct any hardware failure that is occurring. Have him repair or replace any damaged hardware.

To test the damaged disk volumes, or to recover their data onto spare disk volumes, you will need to boot BCE and Multics on an RPV. The RPV to be used for testing can be obtained in any of the following ways:

    o  If the RPV of the production Multics system is not one of the damaged disk volumes, you can boot BCE on the original RPV for testing and reloading the other disk volumes.
    o  If your site has prepared a one- or two-volume "test system" for hardware and software checkout purposes, you can boot this test system for use in testing and reloading the original RPV.
    o  You can restore the BCE SAVE tapes for your RPV onto a spare disk volume for use as the temporary RPV. The actual data on the temporary RPV is not important since it will not become part of the production hierarchy; an older set of SAVE tapes can be used, as long as the saved RPV is for the Multics release you are currently running. You will have to boot BCE on the temporary RPV, and specify "cold" to the "Enter rpv data:" prompt to allow the temporary RPV to be properly initialized. After restoring the RPV, remember to update the root and part configuration cards to describe only the temporary RPV.

3.  Boot BCE on the chosen RPV, as described in the Operators' Guide to Multics, Order No. GB61.

4.  If your Customer Service Representative believes there has been no physical damage to the original disk volumes, attempt to read them using the BCE TEST_DISK command, as described in Section 10 under "Extent of Disk Volume Failure."

5.  If only transient errors are encountered when reading the original volumes, follow the procedures described in Section 10 under "Recovering from Transient Disk Volume Failure," and skip the rest of these steps.

6.  If the original volumes are only partially damaged and you decide that loss of the unreadable records is acceptable, follow the procedures described in Section 10 under "Recovering from Partial Disk Volume Failure," and skip the rest of these steps.

The steps below attempt to reload information from BCE SAVE and hierarchy backup tapes onto spare disk volumes. These steps assume that the original volumes are totally unreadable, or that the amount of lost data caused by unreadable records is unacceptably high.
If your Customer Service Representative believes that one or more of the original volumes are physically damaged (i.e., scratched or warped), then they must be replaced with spare volumes which have already been formatted and tested, as described in Section 10 under "Preformatted Disk Volumes." Otherwise, you can reload data onto the original disk volumes.

7.  Mount the disk volumes to be reloaded on any available drive. You can use the original disk drives if the Customer Service Representative says they are in good working condition.

8.  If the original RPV was physically damaged, then you must reboot BCE on the spare volume which will become the new RPV. The spare disk volume should be properly formatted and tested as described in Section 10 under "Preformatted Disk Volumes." You will have to boot BCE on the temporary RPV, using an input of "cold" to the "Enter RPV Data:" query to specify that the RPV is to be initialized.

    Similarly, if you are running on a temporary RPV or on a test system and the original RPV is not physically damaged, then you must reboot BCE on the original RPV.

9.  If the RPV was reloaded onto a spare volume and the original RPV is partially readable, you may want to try to copy the contents of the CONF, FILE, DUMP and LOG partitions onto the new RPV, as described in Section 10 under "Recovery of Partitions after RLV Volume Recovery."

10. Now you need to either create RESTORE control files that will define the volumes to restore, or use the control files that were created for use when the BCE SAVE was done. For example:

        restore -set tape_devs_1 root_lv

11. Once the BCE SAVE tapes have been restored, boot BCE and Multics on the newly reloaded RPV, coming up to ring 1 command level, as described in the Operators' Guide to Multics, Order No. GB61.

12. Attach all logical volumes by typing:

        add_lv -all

13.
    Salvage the Multics hierarchy by typing:

        salvage_dirs -check_vtoce -delete_connection_failure

    to delete directory branches for entries that were present when the BCE SAVE was performed, but have since been deleted.

14. At this point, you must decide whether or not to try performing segment adoption (to create new directory branches to preserve the VTOCEs and segment contents for segments created since the BCE SAVE tapes were written). If you are going to attempt segment adoption, you must do it now, before copies of segments created since the BCE SAVE get reloaded from the backup tapes.

    You must also decide whether there is enough space on nonroot volumes to receive copies of segments created since the BCE SAVE tapes were written. If any nonroot logical volumes do not have sufficient space to hold new copies of all segments created since the SAVE, you will have to make space on these logical volumes. This can be done by "garbage collection": looking for reverse connection failures (VTOCEs that have no directory branch), and deleting these VTOCEs.

    If you decide to perform either of these functions, continue with step 15. Otherwise, continue with step 19.

15. Issue the standard command to move to ring 4:

        standard

16. Enter admin mode, using the admin command.

17. Use the sweep_pv command as described in Section 12 under "Segment Adoption" and "How to Perform VTOC Garbage Collection on a Pack."

18. After performing either of these functions, you must leave admin mode, shut down Multics (to BCE level), reboot Multics to ring 1 command level, and add all logical volumes:

        ame
        shut
        boot
        add_lv -all

19. Use the reload command to read, in forward chronological order, all hierarchy consolidated and incremental dump tapes made since the BCE SAVE tapes were created:

        reload -nomap

    If reload error files get created, stop the reload process (at the end of a tape).
    Cross out to ring 4 and enter admin mode:

        standard
        admin

    Print the error files. If the errors are occurring because one or more logical volumes are full, you must perform VTOC garbage collection via sweep_pv, as described in Section 12. Then you must leave admin mode, shut down Multics (to BCE level), reboot Multics to ring 1 command level, and add all logical volumes:

        ame
        shut
        boot
        add_lv -all

    Finally, you must start the reload process again with the first tape for which an error file was created.

20. When all tapes have been reloaded, shut down Multics:

        shut

21. Boot Multics according to normal site procedures.

22. Perform the procedures for salvaging, quota adjustment, and connection failure detection described in Section 10 under "Disk Volume Post-Recovery Procedures." This completes recovery of the volumes.

Recovery of a NonRoot Volume with BCE RESTORE/Hierarchy Reloading

If a disk volume failure occurs on one or more nonroot disk volumes, the following procedure can be used to recover the contents of the damaged volumes from BCE SAVE and hierarchy backup tapes. This procedure is often referred to as a "single volume RESTORE/reload".

See Section 9 for general information and more details on hierarchy backup and hierarchy reloading. All of the commands used in the procedure below are described in the Multics Administration, Maintenance and Operations Commands manual, Order No. GB64.

1.  If the system has not already crashed, attempt to recover from the failure by following the procedures described in Section 10 under "Recovering From Disk Failures." If that corrects the problem, then skip the remaining steps. Otherwise, use the last procedure under "Recovering From Disk Failures" to shut down or crash the system.

2.  Consult with your Customer Service Representative to correct any hardware failure that is occurring. Have him repair or replace any damaged hardware.
To test the damaged disk volumes, or to recover their data onto spare disk volumes, you will need to boot BCE and Multics on an RPV. The RPV to be used for testing can be the RPV of the production Multics system.

3.  Boot BCE, as described in the Operators' Guide to Multics, Order No. GB61.

4.  If your Customer Service Representative believes there has been no physical damage to the original disk volumes, attempt to read them using the BCE TEST_DISK command, as described in Section 10 under "Extent of Disk Volume Failure."

5.  If only transient errors are encountered when reading the original volumes, follow the procedures described in Section 10 under "Recovering from Transient Disk Volume Failure," and skip the rest of these steps.

6.  If the original volumes are only partially damaged and you decide that loss of the unreadable records is acceptable, follow the procedures described in Section 10 under "Recovering from Partial Disk Volume Failure," and skip the rest of these steps.

The steps below attempt to reload information from BCE SAVE and hierarchy backup tapes onto spare disk volumes. These steps assume that the original volumes are totally unreadable, or that the amount of lost data caused by unreadable records is unacceptably high. If your Customer Service Representative believes that one or more of the original volumes are physically damaged (i.e., scratched or warped), then they must be replaced with spare volumes which have already been formatted and tested, as described in Section 10 under "Preformatted Disk Volumes." Otherwise, you can reload data onto the original disk volumes.

7.  Mount the disk volumes to be reloaded on any available drive. You can use the original disk drives if the Customer Service Representative says they are in good working condition.

8.
    Now you need to either create RESTORE control files that will define the volumes to restore, or use the control files that were created for use when the BCE SAVE was done. You can restore either one or multiple volume sets. For example:

        restore -set tape_devs_1 public_lv -set tape_devs_2
        c xpublic_lv

9.  Once the BCE SAVE tapes have been restored, boot Multics on the newly reloaded RPV, coming up to ring 1 command level, as described in the Operators' Guide to Multics, Order No. GB61.

10. Attach all logical volumes by typing:

        add_lv -all

11. Issue the standard command to move to ring 4:

        standard

12. Enter admin mode, using the admin command.

13. Perform "garbage collection" on the volumes being reloaded, looking for reverse connection failures (VTOCEs that have no directory branch), and deleting these VTOCEs. Such segments have been moved or deleted since the BCE SAVE tapes were written. Use the sweep_pv command as described in Section 12 under "How to Perform VTOC Garbage Collection on a Pack."

14. Leave admin mode, shut down Multics (to BCE level), reboot Multics to ring 1 command level, and add all logical volumes:

        ame
        shut
        boot
        add_lv -all

15. Use the reload command to read, in forward chronological order, all hierarchy consolidated and incremental dump tapes made since the BCE SAVE tapes were created:

        reload -nomap -error_on

    Do not use the -pvname control argument. The reload command will only reload segments from the tape whose date-contents-modified is later than that of the existing segment on disk, or for which there is no existing disk segment.

    If reload error files get created, stop the reload process (at the end of a tape). Cross out to ring 4 and enter admin mode:

        standard
        admin

    Print the error files. If the errors are occurring because one or more logical volumes are full, you must perform VTOC garbage collection via sweep_pv, as described in Section 12.
    Then you must leave admin mode, shut down Multics (to BCE level), reboot Multics to ring 1 command level, and add all logical volumes:

        ame
        shut
        boot
        add_lv -all

    Finally, you must start the reload process again with the first tape for which an error file was created.

16. When all tapes have been reloaded, shut down Multics:

        shut

17. Boot Multics according to normal site procedures.

18. Perform the procedures for salvaging, quota adjustment, and connection failure detection described in Section 10 under "Disk Volume Post-Recovery Procedures." This completes recovery of the volumes.
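
The consistency checks and the reload rule that these procedures rely on can be illustrated with a small model. The Python sketch below is only an illustration and is not Multics code: the SegmentCopy record, the dictionary of branches, the set of VTOCE identifiers, and all three function names are hypothetical, chosen only to show the logic. It assumes a branch is modelled as a pathname mapped to an identifier, and that date-contents-modified can be compared as an integer.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class SegmentCopy:
    """Hypothetical record for one copy of a segment (not a Multics structure)."""
    uid: str   # identifier tying a directory branch to its VTOCE
    dtcm: int  # date-contents-modified, modelled as an increasing integer


def connection_failures(branches: Dict[str, str], vtoces: Set[str]) -> Set[str]:
    """Return pathnames of branches whose VTOCE is not present on the volumes."""
    return {path for path, uid in branches.items() if uid not in vtoces}


def reverse_connection_failures(branches: Dict[str, str], vtoces: Set[str]) -> Set[str]:
    """Return VTOCE identifiers that no directory branch refers to."""
    return vtoces - set(branches.values())


def should_reload(tape: SegmentCopy, disk: Optional[SegmentCopy]) -> bool:
    """Reload when there is no disk copy, or when the tape copy is newer."""
    return disk is None or tape.dtcm > disk.dtcm


if __name__ == "__main__":
    # Two branches, but only one of them has a VTOCE; one VTOCE is orphaned.
    branches = {">udd>a": "u1", ">udd>b": "u2"}
    vtoces = {"u1", "u3"}
    print(sorted(connection_failures(branches, vtoces)))
    print(sorted(reverse_connection_failures(branches, vtoces)))
    print(should_reload(SegmentCopy("u1", 200), SegmentCopy("u1", 100)))
```

In this model, the branches reported by connection_failures correspond to what directory salvaging deletes, and the identifiers reported by reverse_connection_failures correspond to the VTOCEs that are either adopted or garbage collected before the hierarchy reload begins.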