1
2 09/21/87 volume_backup
3 Known errors in the current release of volume_backup.
4 # Associated TR's
5 Description
6
7 94 phx20536
8 Problem Description:
9
10 The dumper bit map segment is not copied to disk after each dump cycle.
11
12 The only time the dumper bit map segment is copied to the disk label is
13 when the PV is taken off line. A problem arises when a system
14 experiences a crash where emergency shutdown procedures cannot be
15 performed. The ESD is responsible for taking each PV off line.
16 Because the packs are not updated, the dumper bit map may not represent
17 an accurate picture of what has been dumped from the pack. As a result,
18 after a crash, the PV can only be restored from the last time that it
19 was shut down.
20
21
22 Proposed Solution:
23
24 To resolve this problem, the volume dumper process should be enhanced
25 to copy the memory image of each PV's bit map to the disk label after
26 the PV is dumped. To implement this change, the dumper's ring-0 code
27 (hc_dmpr_primitives) will call a new entry in the fsout_vol ring-0
28 subroutine. The fsout_vol subroutine is the entry called at PV shutdown
29 time to copy the dbm_seg to the disk label. A new
30 entry point will be added to the fsout_vol subroutine that will allow
31 the dbm_seg to be copied to disk without freeing the memory image of
32 the dbm_seg.
33
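The proposed change can be pictured with a small Python model. This is only
a sketch under stated assumptions: the class DumperBitMap, its methods, and
dump_cycle are hypothetical names invented for illustration and do not
correspond to hc_dmpr_primitives or fsout_vol. It shows the one property the
proposal asks for: the bit map is copied to the label after each dump while
the memory image stays allocated.

```python
class DumperBitMap:
    """Toy model of one PV's dumper bit map (hypothetical, for illustration)."""

    def __init__(self, size):
        self.memory = bytearray(size)  # in-memory image, stands in for dbm_seg
        self.label = bytes(size)       # copy held in the disk label

    def record(self, index):
        # Update the in-memory image only; the label copy is now stale.
        self.memory[index] = 1

    def flush(self, free_memory):
        # Copy the memory image to the label copy.
        self.label = bytes(self.memory)
        if free_memory:
            # Shutdown-style behaviour: the memory image is released.
            self.memory = None


def dump_cycle(bitmap, indices):
    """Update the bit map for one dump, then flush it without freeing it."""
    for i in indices:
        bitmap.record(i)
    # Proposed behaviour: the label is brought up to date after every dump,
    # so a crash without ESD no longer leaves a stale bit map on the pack.
    bitmap.flush(free_memory=False)
```

After dump_cycle returns, label matches memory and the memory image is still
available for the next cycle.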
34
35 93 phx20716
36 retrieve_from_volume verified
37 Said it couldn't recover branch but went ahead and did it
38 anyway.
39
40 As time and resources permit this problem will be evaluated and better
41 defined at a future date.
42
43 92 phx20715
44 retrieve_from_volume verified
45 Makes multiple attempts to retrieve the same segment from the
46 same tape.
47
48 As time and resources permit this problem will be evaluated and better
49 defined at a future date.
50
51
52 91 phx20691
53 volume_retrieve investigating
54 The volume_retrieve command does not properly retrieve segments
55 if missing parent directories need to be retrieved.
56
57 As time and resources permit this problem will be evaluated and better
58 defined at a future date.
59
60 90 phx20000
61 retrievals investigating
62 Retrieve dies if certain i/o switches are not closed.
63
64 As time and resources permit this problem will be evaluated and better
65 defined at a future date.
66
67
68 89 phx17264
69 Volume Reloader verified
70 Unable to use the "-input_volume_desc" control argument for ring
71 1 volume reloads.
72
73 As time and resources permit this problem will be evaluated and better
74 defined at a future date.
75
76 88 phx20537
77 Volume Backup investigating
78 Volume retriever takes vtoc error under certain circumstances.
79
80 As time and resources permit this problem will be evaluated
81 and better defined at a future date.
82
83
84 87 phx20513
85 Volume Backup investigating
86 The RPV pvid field in the volog headers is not being set.
87
88 As time and resources permit this problem will be evaluated
89 and better defined at a future date.
90
91 86 phx20497
92 Volume Retriever investigating
93 If a user requests the retrieval of a file >A>B>C>x via the err
94 command such that x doesn't currently exist online and C is a
95 segment, the Volume Retriever will go to level 2 on the bad_dir_
96 condition.
97
98 As time and resources permit this problem will be evaluated
99 and better defined at a future date.
100
101
102 85 phx20459
103 Volume Reloader verified
104 reload_volume_ argument discrepancy.
105
106 As time and resources permit this problem will be evaluated
107 and better defined at a future date.
108
109 84 phx20315
110 Volume Retrieval investigating
111 directory retrieval failed using -subtree when the pathname was
112 one using an add_name for the directory. The add_name was what
113 was supplied by the user.
114
115 As time and resources permit this problem will be evaluated
116 and better defined at a future date.
117
118
119 83 phx20219
120 IO Daemons, Absentee Facility, Volume Retriever verified
121 It appears as though the user can pass invalid information onto
122 the daemons and absentee facility, causing either sub_err_
123 conditions to be raised or an output request's "Requested" time
124 to be set to an arbit
125
126 As time and resources permit this problem will be evaluated
127 and better defined at a future date.
128
129 80 phx19037
130 Volume Dumper investigating
131 lock wait time too short.
132
133 As time and resources permit this problem will be evaluated
134 and better defined at a future date.
135
136
137 79 phx18852
138 Volume Reloader investigating
139 consolidated dump map not rebuilt on reload.
140
141 As time and resources permit this problem will be evaluated
142 and better defined at a future date.
143
144 78 phx18763
145 Volume Dumper investigating
146 inconsistencies in volume data bases.
147
148 As time and resources permit this problem will be evaluated
149 and better defined at a future date.
150
151
152 76 phx18652
153 Volume Retriever error
154 volume_retriever Use of the -subtree argument in volume
155 retrievals generates proxy requests. A sufficiently large
156 hierarchy blows out the volume_retriever queues.
157
158 As time and resources permit this problem will be evaluated
159 and better defined at a future date.
160
161 75 phx18545
162 Volume Dumper verified
163 The entry dmpr_report_$error_output doesn't create very good
164 unique names for its error files.
165
166 As time and resources permit this problem will be evaluated
167 and better defined at a future date.
168
169
170 74 phx18518
171 volume dumper and manager_volume_pool_ error
172 The Volume Software nails your default volume segment.
173
174 As time and resources permit this problem will be evaluated
175 and better defined at a future date.
176
177 73 phx18462
178 Volume Retriever investigating
179 If the volume retriever cannot recover an object after recovering
180 the branch, it fails to clean up after itself.
181
182 As time and resources permit this problem will be evaluated
183 and better defined at a future date.
184
185
186 72 phx18419
187 Volume Dumper error
188 A typographical error in telling the volume dumper what tape to
189 use results in the dump being aborted.
190
191 As time and resources permit this problem will be evaluated
192 and better defined at a future date.
193
194 71 phx18417
195 Volume Dumper verified
196 hc_dmpr_primitives should provide all available info when
197 encountering an error so corrective action can be performed.
198
199 As time and resources permit this problem will be evaluated
200 and better defined at a future date.
201
202
203 70 phx18357
204 Volume Retriever investigating
205 In certain cases the volume retriever will attempt to search
206 tapes that it cannot possibly satisfy a request from.
207
208 As time and resources permit this problem will be evaluated
209 and better defined at a future date.
210
211 69 phx18049
212 Volume Backup Tools error
213 The -test option to purge_volume_log fails to unlock pvologs
214 under certain conditions.
215
216 As time and resources permit this problem will be evaluated
217 and better defined at a future date.
218
219
220 68 phx17936
221 Volume Dumper investigating
222 tape mount problems leave dead pvolog segments around.
223
224 As time and resources permit this problem will be evaluated
225 and better defined at a future date.
226
227 67 phx17735
228 Volume Retriever investigating
229 incorrect type in message for cross retrievals.
230
231 As time and resources permit this problem will be evaluated
232 and better defined at a future date.
233
234
235 66 phx17577
236 Volume Retriever investigating
237 Will not volume retrieve links.
238
239 As time and resources permit this problem will be evaluated
240 and better defined at a future date.
241
242 63 phx17165
243 Volume Retriever investigating
244 "Failed to append branch for ^a because object already there
245 with other name".
246
247 As time and resources permit this problem will be evaluated
248 and better defined at a future date.
249
250
251 62 phx17164
252 Volume Retriever investigating
253 "Failed to delete user queue entry; Message not found.".
254
255 As time and resources permit this problem will be evaluated
256 and better defined at a future date.
257
258 61 phx17011
259 Volume Dumper verified
260 retry strategy in dump_volume_ ineffectual.
261
262 As time and resources permit this problem will be evaluated
263 and better defined at a future date.
264
265
266 60 phx16964
267 Volume Retriever verified
268 There is no facility to Volume Retrieve a level-1 directory.
269
270 As time and resources permit this problem will be evaluated
271 and better defined at a future date.
272
273 55 phx16727
274 Volume Retriever verified
275 Partial completion of volume retrieval can result in disconnect
276 failures.
277
278 As time and resources permit this problem will be evaluated
279 and better defined at a future date.
280
281
282 51 phx15590
283 Volume Retriever limitation
284 Renaming a project generates retrieval problems.
285
286 As time and resources permit this problem will be evaluated
287 and better defined at a future date.
288
289 48 phx15579
290 Volume Retriever error Linked TRs: phx13481
291 Retrieving a mailbox should not require a ring_1 process.
292
293 As time and resources permit this problem will be evaluated
294 and better defined at a future date.
295
296
297 47 phx15458
298 Volume Retriever investigating
299 Should adjust bitcounts to character, NOT word boundaries.
300
301 As time and resources permit this problem will be evaluated
302 and better defined at a future date.
303
304 45 phx14444
305 Volume Retriever verified
306 Some error messages have about 20 trailing blanks.
307
308 As time and resources permit this problem will be evaluated
309 and better defined at a future date.
310
311
312 38 phx17732
313 A recovered volume log is not truncated or its bit count set properly.
314
315 37 phx16302
316 If the error message at line 151 of merge_volume_log.pl1 ("null entry in
317 both logs") is emitted, it will be repeated forever, since neither i nor
318 j is advanced within the loop.
319
320 Also if the two logs being merged contain duplicate entries, then both
321 entries are included in the merged log. Now this is not strictly
322 within the mandate of merge_volume_log which is supposed to merge a
323 newly created volume log with an older copy but there are instances
324 where this arises.
325
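The two merge problems above can be illustrated with a minimal Python
sketch. It is not a transcription of merge_volume_log.pl1: the function
merge_logs is hypothetical, entries are modelled as plain comparable values,
a null entry is modelled as None, and both logs are assumed to be sorted.
The points it illustrates are that every branch of the loop advances at
least one index, so a null entry in both logs cannot repeat forever, and
that an entry present in both logs is emitted once.

```python
def merge_logs(old, new):
    """Merge two sorted logs; None models a null entry (assumption)."""
    merged = []
    i = j = 0
    while i < len(old) and j < len(new):
        a, b = old[i], new[j]
        if a is None and b is None:
            # Null entry in both logs: report it once (omitted here) and
            # advance BOTH indices so the loop cannot spin on this pair.
            i += 1
            j += 1
        elif a is None:
            i += 1                # skip a null entry in the old log
        elif b is None:
            j += 1                # skip a null entry in the new log
        elif a == b:
            merged.append(a)      # duplicate entry: keep a single copy
            i += 1
            j += 1
        elif a < b:
            merged.append(a)
            i += 1
        else:
            merged.append(b)
            j += 1
    # Copy whatever remains in either log, still dropping null entries.
    merged.extend(e for e in old[i:] if e is not None)
    merged.extend(e for e in new[j:] if e is not None)
    return merged
```

For example, merge_logs([1, None, 3, 5], [1, None, 4, 5]) terminates and
yields each of 1, 3, 4 and 5 exactly once.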
326
327 36 phx14574
328 The module dump_volume_ attempts to handle and recover from a couple of
329 error conditions during dumping -- segfault errors by retrying the
330 output up to ten times, and page_fault_error by skipping the current
331 object. The technique employed for both of these can result in serious
332 problems if the tape thus created is subsequently used for a reload.
333 The problems may potentially extend to many segments not directly
334 involved in the original dump errors.
335
336 When writing an object to tape, dump_volume_ calls dmpr_output_ to
337 write both the backup header and the actual data object. The length of
338 the data object is stored in the header in
339 backup_volume_record.rec2_len. If a segfault or page_fault_error
340 occurs during the output of the object, then the resulting tape_mult_
341 image for the object consists of a header which says that rec2_len
342 characters have been written to tape for the object and an arbitrary
343 number of characters less than rec2_len which have actually made it
344 to the tape.
345
346
347 When the reloader goes to read such a tape, the following occurs: it
348 reads the header record which indicates that the object consists of the
349 following rec2_len characters on tape. It issues a read for rec2_len
350 characters and gobbles up both the partial dump of the object PLUS the
351 header and contents of possibly many other segments to satisfy the
352 rec2_len specification. When it attempts to read the next header on
353 the tape, it realizes that it is out of synchronization and attempts
354 recovery. By now, however, the original segment which sustained the
355 dump error contains bad pages consisting of whatever was on the tape to
356 a length of rec2_len characters including backup headers and pages of
357 other users' segments; segments which were supposedly dumped without
358 error after the original segment are not reloaded at all because they
359 have been absorbed into the original segment up to its quota of
360 rec2_len characters; and the segment encountered after rec2_len
361 characters is also lost because of the reloader skipping during
362 resynchronization.
363
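The failure described above can be reproduced with a toy length-prefixed
tape format in Python. Everything in this sketch is an assumption made for
illustration: the 4-byte MAGIC marker, the 4-byte length field (standing in
for rec2_len), and the functions write_record and read_records are
hypothetical and do not reflect the real tape_mult_ or backup_volume_record
layout. The reader trusts the length in the header, so a short write of one
object makes it swallow the next header, and the resynchronization then
skips what is left of the following object.

```python
import struct

MAGIC = b"HDR!"  # hypothetical header marker used for resynchronization


def write_record(tape, payload, bytes_written=None):
    """Append a header claiming len(payload) bytes, then the data.

    bytes_written < len(payload) models a dump error part-way through
    the object: the header is already on tape and still states the full
    length.
    """
    tape.extend(MAGIC + struct.pack(">I", len(payload)))
    if bytes_written is None:
        tape.extend(payload)
    else:
        tape.extend(payload[:bytes_written])


def read_records(tape):
    """Read records by trusting the length stored in each header."""
    records = []
    pos = 0
    while pos + 8 <= len(tape):
        if tape[pos:pos + 4] != MAGIC:
            # Out of synchronization: skip forward to the next header.
            nxt = tape.find(MAGIC, pos)
            if nxt < 0:
                break
            pos = nxt
            continue
        (length,) = struct.unpack(">I", tape[pos + 4:pos + 8])
        records.append(bytes(tape[pos + 8:pos + 8 + length]))
        pos += 8 + length
    return records


tape = bytearray()
write_record(tape, b"A" * 10, bytes_written=3)  # error after 3 bytes
write_record(tape, b"BBBB")
write_record(tape, b"CCCC")
write_record(tape, b"DDDD")
records = read_records(tape)
```

Here the first record comes back as three bytes of real data followed by
part of the next header, and the object BBBB, which was dumped without
error, is never returned at all.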
364
365 30
366 The Volume Dumper must force write each database after modification.
367 This could cause some degradation in dumping, but esd-less crashes
368 would be less painful when one finds a database with null pages just
369 when one wishes to use it in a hurry.
370
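As a loose analogy only, the force-write discipline looks like this on a
POSIX-style file system in Python. The function force_write is a
hypothetical name, and the Multics mechanism for forcing pages to disk is
different; the sketch just shows the pattern of pushing every modification
to stable storage before continuing, and why it costs some throughput.

```python
import os


def force_write(path, offset, data):
    """Modify a database file in place and force the change to disk."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
        f.flush()              # push Python's buffer to the operating system
        os.fsync(f.fileno())   # ask the OS to write the data to the device
```

Each call waits for the device, which is the degradation the entry above
accepts in exchange for databases that survive an esd-less crash.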
371 28 phx15580 phx17205
372 This is a generalized placeholder for Volume Retriever performance
373 problems in regard to subtree retrievals. I am not convinced that
374 there is much that can be done, but just in case...
375
376 26 phx16930
377 error in the implementation of -stop_vtocx
378
379
380 24 phx16692
381 Companion entry to 626. This holds the bug in the Volume retriever
382 that mis-uses the authorization passed to hardcore to append a segment.
383
384 19 phx13032
385 In retrieve_from_volume, the -list control argument causes all other
386 control arguments to be ignored silently. It also says nothing when
387 there are no pending retrievals.
388
389 11 phx13031
390 If the uid of an object cannot be determined, i.e. the parent dir does
391 not exist, the message "Online records indicate that X cannot be found
392 ..." is printed. This is a truly misleading error message.
393
394
395 10 phx13032
396 The error message from retv -list -any_other_arguments is too terse.
397
398 8 phx11848
399 A window exists between closing one dump volume and opening the next
400 volume log entry in which a restart will get an incorrect cycle_uid.
401 This will make reload groups inconsistent.