09/21/87  volume_backup
Known errors in the current release of volume_backup.
#         Associated TRs
Description

94  phx20536
Problem Description:

The dumper bit map segment is not copied to disk after each dump cycle.

The only time the dumper bit map segment is copied to the disk label is
when the PV is taken offline.  A problem arises when the system crashes
in such a way that emergency shutdown (ESD) cannot be performed, since
ESD is responsible for taking each PV offline.  Because the packs are
not updated, the dumper bit map may not accurately reflect what has
been dumped from the pack.  As a result, after such a crash the PV can
only be restored from the last time it was shut down.

Proposed Solution:

To resolve this problem, the volume dumper process should be enhanced
to copy the memory image of each PV's bit map to the disk label after
the PV is dumped.  To implement this change, a call will be added to
the ring-0 code of the dumper (hc_dmpr_primitives) to invoke a new
entry in the ring-0 subroutine fsout_vol.  fsout_vol is the subroutine
called at PV shutdown time to copy the dbm_seg to the disk label.  The
new entry point will allow the dbm_seg to be copied to disk without
freeing its memory image.
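The effect of the proposed change can be modeled in a few lines of Python.  This is a toy sketch under stated assumptions, not the hardcore code: the class and method names (PhysicalVolume, record_dump_cycle, copy_dbm_to_label, take_offline, crash_without_esd) are hypothetical, the bit map is a plain list of 0/1 flags, and a crash without ESD is modeled as losing the memory image so that only the label copy survives.

```python
class PhysicalVolume:
    """Hypothetical model of one PV's dumper bit map (dbm_seg)."""

    def __init__(self, nbits):
        self.dbm_seg = [0] * nbits    # memory image of the dumper bit map
        self.label_dbm = [0] * nbits  # copy held in the disk label

    def record_dump_cycle(self, dumped_bits, copy_after_dump):
        """Set the bits for a completed dump cycle in the memory image."""
        for b in dumped_bits:
            self.dbm_seg[b] = 1
        if copy_after_dump:
            # Proposed behavior: copy to the label after each PV is dumped.
            self.copy_dbm_to_label()

    def copy_dbm_to_label(self):
        """Like the proposed new entry: copy, but keep the memory image."""
        self.label_dbm = list(self.dbm_seg)

    def take_offline(self):
        """Current shutdown path (run by ESD): copy, then free the image."""
        self.copy_dbm_to_label()
        self.dbm_seg = None

    def crash_without_esd(self):
        """No ESD: the memory image is lost; return what the label holds."""
        self.dbm_seg = None
        return list(self.label_dbm)


# Current behavior: the label still reflects the state before the dump.
today = PhysicalVolume(4)
today.record_dump_cycle([0, 2], copy_after_dump=False)
today_label = today.crash_without_esd()
print(today_label)     # [0, 0, 0, 0]

# Proposed behavior: the label reflects the last completed dump.
proposed = PhysicalVolume(4)
proposed.record_dump_cycle([0, 2], copy_after_dump=True)
proposed_label = proposed.crash_without_esd()
print(proposed_label)  # [1, 0, 1, 0]
```

The design point the sketch isolates is that the new copy does not free the memory image, so dumping can continue to update it between shutdowns.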

93  phx20716
retrieve_from_volume [verified]
          Says it cannot recover a branch, but goes ahead and recovers it
          anyway.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

92  phx20715
retrieve_from_volume [verified]
          Makes multiple attempts to retrieve the same segment from the
          same tape.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

91  phx20691
volume_retrieve [investigating]
          The volume_retrieve command does not properly retrieve segments
          when missing parent directories must also be retrieved.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

90  phx20000
retrievals [investigating]
          Retrieve dies if certain I/O switches are not closed.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

89  phx17264
Volume Reloader [verified]
          Unable to use the "-input_volume_desc" control argument for
          ring 1 volume reloads.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

88  phx20537
Volume Backup [investigating]
          The volume retriever takes a VTOC error under certain
          circumstances.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

87  phx20513
Volume Backup [investigating]
          The RPV pvid field in the volog headers is not being set.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

86  phx20497
Volume Retriever [investigating]
          If a user requests the retrieval of a file >A>B>C>x via the err
          command such that x does not currently exist online and C is a
          segment, the Volume Retriever goes to level 2 on the bad_dir_
          condition.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

85  phx20459
Volume Reloader [verified]
          reload_volume_ argument discrepancy.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

84  phx20315
Volume Retrieval [investigating]
          Directory retrieval with -subtree failed when the pathname
          supplied by the user used an add_name of the directory.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

83  phx20219
IO Daemons, Absentee Facility, Volume Retriever [verified]
          It appears that the user can pass invalid information to the
          daemons and the absentee facility, causing either sub_err_
          conditions to be raised or an output request's "Requested" time
          to be set to an arbit

As time and resources permit, this problem will be evaluated and better
defined at a future date.

80  phx19037
Volume Dumper [investigating]
          Lock wait time is too short.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

79  phx18852
Volume Reloader [investigating]
          The consolidated dump map is not rebuilt on reload.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

78  phx18763
Volume Dumper [investigating]
          Inconsistencies in volume databases.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

76  phx18652
Volume Retriever [error]
          (volume_retriever) Use of the -subtree argument in volume
          retrievals generates proxy requests.  A sufficiently large
          hierarchy overflows the volume_retriever queues.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

75  phx18545
Volume Dumper [verified]
          The entry dmpr_report_$error_output does not generate
          sufficiently unique names for its error files.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

74  phx18518
volume dumper and manager_volume_pool_ [error]
          The volume software clobbers your default volume segment.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

73  phx18462
Volume Retriever [investigating]
          If the volume retriever cannot recover an object after
          recovering its branch, it fails to clean up after itself.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

72  phx18419
Volume Dumper [error]
          A typographical error in telling the volume dumper which tape to
          use causes the dump to be aborted.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

71  phx18417
Volume Dumper [verified]
          hc_dmpr_primitives should provide all available information when
          it encounters an error, so that corrective action can be taken.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

70  phx18357
Volume Retriever [investigating]
          In certain cases, the volume retriever attempts to search tapes
          that cannot possibly satisfy a request.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

69  phx18049
Volume Backup Tools [error]
          The -test option to purge_volume_log fails to unlock pvologs
          under certain conditions.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

68  phx17936
Volume Dumper [investigating]
          Tape mount problems leave dead pvolog segments around.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

67  phx17735
Volume Retriever [investigating]
          Incorrect type in message for cross retrievals.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

66  phx17577
Volume Retriever [investigating]
          Will not volume retrieve links.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

63  phx17165
Volume Retriever [investigating]
          "Failed to append branch for ^a because object already there
           with other name".

As time and resources permit, this problem will be evaluated and better
defined at a future date.

62  phx17164
Volume Retriever [investigating]
          "Failed to delete user queue entry;  Message not found."

As time and resources permit, this problem will be evaluated and better
defined at a future date.

61  phx17011
Volume Dumper [verified]
          The retry strategy in dump_volume_ is ineffectual.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

60  phx16964
Volume Retriever [verified]
          There is no facility to volume retrieve a level-1 directory.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

55  phx16727
Volume Retriever [verified]
          Partial completion of a volume retrieval can result in
          disconnect failures.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

51  phx15590
Volume Retriever [limitation]
          Renaming a project causes retrieval problems.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

48  phx15579
Volume Retriever [error] Linked TRs:  phx13481
          Retrieving a mailbox should not require a ring-1 process.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

47  phx15458
Volume Retriever [investigating]
          Bit counts should be adjusted to character boundaries, NOT word
          boundaries.

As time and resources permit, this problem will be evaluated and better
defined at a future date.
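On Multics a character is 9 bits and a word is 36 bits (four characters), so a bit count rounded up to a word boundary can overstate a segment's text by up to three characters.  The Python sketch below contrasts the two roundings; it assumes the retriever arrives at a bit count by rounding up from a recovered length, and the function names are hypothetical.

```python
BITS_PER_CHAR = 9   # Multics characters are 9 bits
BITS_PER_WORD = 36  # four 9-bit characters per 36-bit word


def round_up(bit_count, boundary):
    """Smallest multiple of boundary that is >= bit_count."""
    return -(-bit_count // boundary) * boundary


def char_aligned(bit_count):
    """Bit count rounded to a character boundary (the desired behavior)."""
    return round_up(bit_count, BITS_PER_CHAR)


def word_aligned(bit_count):
    """Bit count rounded to a word boundary (the behavior reported)."""
    return round_up(bit_count, BITS_PER_WORD)


# A segment holding 5 characters has a bit count of 45.
print(char_aligned(45))  # 45: still 5 characters
print(word_aligned(45))  # 72: 8 characters, 3 of them spurious padding
```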

45  phx14444
Volume Retriever [verified]
          Some error messages have about 20 trailing blanks.

As time and resources permit, this problem will be evaluated and better
defined at a future date.

38  phx17732
A recovered volume log is not truncated, nor is its bit count set
properly.

37  phx16302
If the error message at line 151 of merge_volume_log.pl1 (null entry in
both logs) is emitted, it is repeated forever, because neither i nor j
is advanced within the loop.

Also, if the two logs being merged contain duplicate entries, both
entries are included in the merged log.  Strictly speaking, this is not
within the mandate of merge_volume_log (which is supposed to merge a
newly created volume log with an older copy), but there are instances
where it arises.
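A merge loop that avoids both problems can be sketched in Python rather than PL/I.  Nothing here is taken from merge_volume_log.pl1: it assumes each log is a list sorted in ascending order, that a null entry is represented by None, and the function name merge_logs and its report callback are hypothetical.  Every path through the loop advances i, j, or both, so the loop always terminates, and an entry equal to the last one emitted is dropped.

```python
_END = object()  # sentinel: this log is exhausted (distinct from a null entry)


def merge_logs(old, new, report=print):
    """Merge two ascending logs; None marks a null entry."""
    merged, i, j = [], 0, 0
    while i < len(old) or j < len(new):
        a = old[i] if i < len(old) else _END
        b = new[j] if j < len(new) else _END
        if a is None and b is None:
            report(f"null entry in both logs (i={i}, j={j})")
            i, j = i + 1, j + 1  # advance both, or the loop never ends
            continue
        if a is None:            # skip a null entry in the old log
            i += 1
            continue
        if b is None:            # skip a null entry in the new log
            j += 1
            continue
        if b is _END or (a is not _END and a <= b):
            entry, i = a, i + 1
        else:
            entry, j = b, j + 1
        if not merged or merged[-1] != entry:  # drop duplicates
            merged.append(entry)
    return merged


messages = []
result = merge_logs([(1, "v1"), None, (3, "v3")],
                    [None, (3, "v3"), (4, "v4")],
                    report=messages.append)
both_null = merge_logs([None, (2, "v2")], [None, (2, "v2")],
                       report=messages.append)
print(result)     # [(1, 'v1'), (3, 'v3'), (4, 'v4')]
print(both_null)  # [(2, 'v2')]
print(messages)   # ['null entry in both logs (i=0, j=0)']
```

Because both inputs are sorted, equal entries from the two logs become adjacent in the output, which is why comparing against only the last emitted entry suffices.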

36  phx14574
The module dump_volume_ attempts to handle and recover from two error
conditions during dumping: segfault errors, by retrying the output up
to ten times, and page_fault_error, by skipping the current object.
The technique employed for both can cause serious problems if the tape
thus created is subsequently used for a reload, and the problems may
extend to many segments not directly involved in the original dump
errors.

When writing an object to tape, dump_volume_ calls dmpr_output_ to
write both the backup header and the data object itself.  The length of
the data object is stored in the header in
backup_volume_record.rec2_len.  If a segfault or page_fault_error
occurs during output of the object, the resulting tape_mult_ image of
the object consists of a header stating that rec2_len characters were
written to tape for the object, followed by an arbitrary number of
characters (fewer than rec2_len) that actually reached the tape.

When the reloader reads such a tape, the following occurs.  It reads
the header record, which indicates that the object consists of the
next rec2_len characters on tape.  It issues a read for rec2_len
characters and consumes both the partial dump of the object PLUS the
headers and contents of possibly many other segments to satisfy the
rec2_len specification.  When it attempts to read the next header on
the tape, it detects that it is out of synchronization and attempts
recovery.  By then, however, the damage is done.  The original segment
that sustained the dump error contains bad pages consisting of whatever
was on the tape, to a length of rec2_len characters (including backup
headers and pages of other users' segments).  Segments that were
supposedly dumped without error after the original segment are not
reloaded at all, because they have been absorbed into the original
segment up to its quota of rec2_len characters.  Finally, the segment
encountered after rec2_len characters is also lost, because the
reloader skips it during resynchronization.
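The failure can be reproduced in miniature.  The Python sketch below does not use the real tape_mult_ or backup_volume_record formats: it assumes a hypothetical 8-byte header (a 4-byte marker followed by a 4-byte big-endian length standing in for rec2_len) and a reader that, on finding no header where one is expected, scans forward to the next marker.  Object A's header claims 20 bytes, but a simulated fault stops the write after 3.

```python
import struct

# Hypothetical record header: 4-byte marker + 4-byte big-endian rec2_len.
HEADER = struct.Struct(">4sI")
MARKER = b"HDR!"


def write_object(tape, data, fault_after=None):
    """Append header + data; a fault truncates the data, not the header."""
    tape.extend(HEADER.pack(MARKER, len(data)))   # claims the full length
    tape.extend(data if fault_after is None else data[:fault_after])


def read_objects(tape):
    """Trust each header's length; resynchronize by scanning for MARKER."""
    objects, pos = [], 0
    while pos + HEADER.size <= len(tape):
        marker, rec2_len = HEADER.unpack_from(tape, pos)
        if marker != MARKER:
            # Out of synchronization: skip ahead to the next header.
            nxt = tape.find(MARKER, pos + 1)
            if nxt < 0:
                break
            pos = nxt
            continue
        pos += HEADER.size
        objects.append(bytes(tape[pos:pos + rec2_len]))  # may swallow others
        pos += rec2_len
    return objects


tape = bytearray()
write_object(tape, b"A" * 20, fault_after=3)  # dump error on object A
write_object(tape, b"BBBB")
write_object(tape, b"CCCC")
write_object(tape, b"DDDD")

objs = read_objects(tape)
print(len(objs))  # 2: objects B and C never come back as objects
print(objs[0])    # A's 3 bytes, then B's header and data, part of C's header
print(objs[1])    # b'DDDD'
```

B is absorbed into A, and C is lost because the reader skips it while resynchronizing, matching the three kinds of damage described above.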

30
The Volume Dumper must force-write each database after modification.
This could degrade dumping performance somewhat, but ESD-less crashes
would be less painful: today one may find a database with null pages
just when one needs to use it in a hurry.
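With a memory-mapped database, force-writing after each modification amounts to flushing its modified pages to disk immediately instead of leaving them for shutdown.  The Python sketch below uses the standard mmap module as a stand-in for a Multics segment; the function name and the scratch file are hypothetical.  mmap.flush() asks the operating system to write modified pages back (msync on POSIX), and that extra I/O on every update is the source of the degradation mentioned above.

```python
import mmap
import os
import tempfile


def update_and_force_write(path, offset, data):
    """Modify a mapped database file in place, then force it to disk."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[offset:offset + len(data)] = data
            mm.flush()  # write dirty pages now, not at some later shutdown


# Demonstration on a scratch 4096-byte "database".
fd, path = tempfile.mkstemp()
os.write(fd, bytes(4096))
os.close(fd)
update_and_force_write(path, 100, b"cycle 42")
with open(path, "rb") as f:
    f.seek(100)
    result = f.read(8)
os.remove(path)
print(result)  # b'cycle 42'
```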

28  phx15580 phx17205
This is a generalized placeholder for Volume Retriever performance
problems with subtree retrievals.  I am not convinced that much can be
done, but just in case...

26  phx16930
Error in the implementation of -stop_vtocx.

24  phx16692
Companion entry to 626.  This holds the bug in the Volume Retriever
that misuses the authorization passed to hardcore to append a segment.

19  phx13032
In retrieve_from_volume, the -list control argument causes all other
control arguments to be ignored (silently).  It also says nothing when
there are no pending retrievals.

11  phx13031
If the uid of an object cannot be determined (i.e., the parent
directory does not exist), the message "Online records indicate that X
cannot be found ..." is printed.  This error message is truly
misleading.

10  phx13032
The error message from retv -list -any_other_arguments is too terse.

8  phx11848
A window exists between closing one dump volume and opening the next
(volume log entry) in which a restart will get an incorrect cycle_uid.
This makes reload groups inconsistent.