1 
   2 09/21/87  hardcore
   3 Known errors in the current release of hardcore.
   4 #         Associated TR's
   5 Description
   6 
   7 922  phx20933
   8 Some hardcore module needs to know if the disk is operative.  This is
   9 done by calling disk_control$test_disk or dctl$test_disk.  The module
  10 loops until the IO is complete.  The problems come when the hardware is
  11 broken in such a way that the IO never completes.  Therefore the
  12 pvte.testing is never reset.  I guess that this is another place that
  13 the disk dim should give up, because it knows that the IO did not
  14 complete and that it is a test type IO.  One of the problems with
  15 making disk_control smarter is that more pages need to be wired in
  16 ring 0.
  17 
  18 921  phx20930
  19 During a BCE restore, tape record sequence errors are occurring at the
  20 end of the tape.  Sometimes the sequence error shows an actual disk
  21 record missing and others appear to only show the tape record numbers
  22 in error with the disk record numbers still in sequence (no gap).
  23 
  24 920  phx20152
  25 vacate_pv is setting pvte.pc_vacating and pvte.vacating.  The use of
  26 pvte.vacating is to keep new segments from being created on this pv.
  27 pc_vacating will inhibit any new pages being created on this pv.  The
  28 contract of vacate_pv is only to keep new segments from being created.
  29 Therefore pvte.pc_vacating should not be set in vacate_pv.pl1.
  30 
  31 919  phx20908
  32 Another call in disk_queue to code in >udd>m>lib.  The fix will be to
  33 remove the -interpret support from the disk_queue command.  This should
  34 not present any great problems because it could not be working at sites
  35 other than system M.
  36 
  37 918  phx20868
  38 The TR claims that a 17th level can exist in the hierarchy and that
  39 problems exist if the pack is demounted when this segment is active.
  40 See deactivate_for_demount.pl1 line 261.
  41 
  42 917  phx20922
  43 disk_control will, on certain types of disk errors such as MPC data
  44 alerts, continuously retry the failing IO.  The main complaint is for
  45 bootload_io type at BCE.  This includes such things as copy_disk and
  46 save and restore commands.  The reason is that disk_control
  47 determines that this is a "bad_path" status; its job is then to delete
  48 this channel and try another, until all channels save one have been
  49 deleted.  It then adds them all back and keeps doing this over and
  50 over again.
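
The retry cycle described above can be sketched as follows, with a bound
on the number of full passes added.  This is an illustrative Python
model, not disk_control's logic: run_io, try_io and max_rounds are
hypothetical names, and the bound is an assumption about what a fix
might look like.

```python
def run_io(channels, try_io, max_rounds=2):
    """Try the I/O on each channel, deleting bad paths as we go.

    After max_rounds full passes over the channel list, give up and
    return None instead of adding the channels back forever.
    """
    for _ in range(max_rounds):
        active = list(channels)      # "add them all back"
        while active:
            chan = active[0]
            if try_io(chan):
                return chan          # the I/O succeeded on this channel
            if len(active) == 1:
                break                # the last channel is never deleted
            active.pop(0)            # "delete" the bad path
    return None
```

With max_rounds removed (an unbounded outer loop), a persistent error
such as an MPC data alert reproduces the endless retry reported here.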
  51 
  52 897  phx13424 phx17773 phx17819
  53 Problems with directory quota management/enforcement.
  54 
  55 895
  56 No automatic hierarchy salvage is occurring when "boot rpvs" or "boot
  57 rlvs" is done.
  58 
  59 894  phx20661
  60 Linkage error at bce early loading firmware in mpcs.
  61 
  62 891
  63 delete_ calls hcs_$get_segment_ptr_path to determine if a segment is
  64 known in the calling ring (it wants to call term_ only when known
  65 segments are being deleted).  The hcs_ gate target is
  66 initiate_$get_segment_ptr_path, which currently calls
  67 dc_find$obj_initiate to find the object's directory entry.  This can
  68 cause a superfluous GRANT audit message, since $get_segment_ptr_path
  69 only returns a pointer to the segment if it is already known (in any
  70 ring) to the process.  And it can cause a superfluous DENY audit
  71 message, since no operation is performed unless the segment is known.
  72 
  73 The fix involves creating a new entrypoint, dc_find$obj_initiate_priv,
  74 which bypasses access checks and auditing, and changing
  75 initiate_$get_segment_ptr_path to call this new entrypoint.
  76 
  77 The intent of the fix would be to never audit the operation of
  78 hcs_$get_segment_ptr_path.  This is true even if the caller asks about
  79 a segment known only in a ring other than the caller's ring.  Since the
  80 original audit message included the ring brackets of the segment, it
  81 documents the caller's access to the segment from all rings within
  82 those ring brackets.
  83 
  84 890  phx19527
  85 ioa_$ioa_stream prints garbage or blows up when no control string is
  86 given.
  87 
  88 887  phx19986
  89 The disk_control$test_drive entry does not wait for an interrupt for
  90 its I/O, but polls the status word.  For FIPS devices or those on a
  91 DAU, this will not work since the status words are not valid until the
  92 interrupt is sent.
  93 
  94 885
  95 The program install_ttt_ does no auditing.
  96 
  97 884
  98 The hcs_$truncate_file entry logs a DENIED message even though other
  99 entries log GRANTED, as the reason the call fails (this operation is
 100 not allowed for a directory) has nothing to do with access control.
 101 
 102 882
 103 It appears that hcs_$make_entry does not null its output argument when
 104 it returns an error code, although the documentation states that it
 105 does.  Since it doesn't modify the output argument at all in this case,
 106 this is not a security problem.
 107 
 108 881
 109 Several problems with hcs_$fs_move_file and hcs_$fs_move_seg.
 110 
 111 They return an error code if the caller has rw access to both the
 112 source and destination segments, but null access to the directory in
 113 which they are contained.  The audit messages show various GRANTED and
 114 DENIED fs_obj_prop_read's.  The reason is that the inner ring module
 115 attempts to get the status on the destination to find out its current
 116 length.  Unfortunately it uses an entry in status_ which returns more
 117 information (which requires S on the parent).
 118 
 119 Since the entries are considered obsolete, it's not worth fixing this
 120 silly restriction.
 121 
 122 Another, more serious problem with hcs_$fs_move_file is that if the
 123 user does not have RW access to the destination, error_table_$no_move
 124 is returned, but no DENIED is logged.  It audits GRANTED read of fs_obj
 125 prop, and GRANTED initiation of FS_obj.  This was in a case where the
 126 user's authorization was greater than the access class of the existing
 127 destination segment, so the process had R effective access to the
 128 segment and S effective access to the containing dir.  This bug should
 129 be fixed, but it requires a new entry into dc_find.
 130 
 131 880
 132 Many filesystem operations consist of a name lookup followed by an
 133 access check.  The way dc_find implements these, an operation which
 134 requires more than S access to a directory can fail (with
 135 error_table_$namedup or error_table_$seg_not_found) and generate no
 136 audit message, even though the caller has insufficient access to
 137 perform the operation.  This occurs when the eventual failure of the
 138 operation can be determined from the name lookup.
 139 
 140 879
 141 The hcs_$tty_get_name returns a channel name for a channel belonging to
 142 a process other than the caller.
 143 
 144 877
 145 None of the entries in the dm_hcs_ gate do any auditing.
 146 
 147 876
 148 Several file system attribute setting operations generate audit
 149 messages which say GRANTED even though the operation is later denied.
 150 This happens when M access is required to the parent and the process
 151 must be in the write bracket of the entry.  Worse, no DENIED audit
 152 message is ever generated.  The entries in question are:  set_$(copysw
 153 volume_dump_switches safety_sw_ptr safety_sw_path synchronized_sw
 154 max_length_ptr max_length_path entry_bound_ptr entry_bound_path)
 155 
 156 With the fixing of the bug described in entry 23, the entries
 157 set$(damaged_sw_path damaged_sw_ptr dnzp_sw_path dnzp_ptr) must be
 158 added to the list.
 159 
 160 875
 161 Upgraded directories created under dir privilege are left in a
 162 process's address space after dir privilege is turned off.  The
 163 suspected cause is that the pathname associative memory is not being
 164 flushed when dir privilege is turned off.
 165 
 166 This poses no security problems since only a person with privileges
 167 could have gotten into this position.
 168 
 169 874
 170 log_read_$position_time will not find any messages later than the
 171 latest message in the log at the time that the log was opened.
 172 log_read_$position_sequence has the equivalent problem.
 173 
 174 872
 175 There is an ambiguity in the definition of "security auditing" that is
 176 particularly apparent in the case of append.  The ambiguity is this:
 177 some system operations make both security-related and
 178 non-security-related checks.  Either check can fail.  If the security
 179 check passes, but the non-security check fails, it is unclear what the
 180 "correct" security audit message is:  Grant, or Deny?
 181 
 182 The ideal implementation would probably be to indicate the exact
 183 situation in the audit message:  that access would have been granted,
 184 but was not.
 185 
 186 The current implementation of append (and others) is to audit the
 187 access grant, but later abort the operation if the non-security check
 188 fails.  This is particularly confusing in the case where the requested
 189 multi-class max authorization is above the process authorization or in
 190 the case that the requested authorization is below the containing
 191 directory access class.  This is considered to be a non-security
 192 related failure (no attempt was made to access information or destroy
 193 it) but the error code, ai_restricted, appears security-related.
 194 Nonetheless, the audit is a GRANT.
 195 
 196 This behavior should be documented in MDD004 and in the MDD on
 197 Ring 0 Auditing and Logging.
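
The "ideal implementation" mentioned in this entry amounts to a
three-valued audit result.  A minimal Python sketch follows; the enum
and function names are hypothetical and do not correspond to any
existing audit interface.

```python
from enum import Enum

class AuditResult(Enum):
    GRANT = "GRANT"
    DENY = "DENY"
    GRANT_NOT_PERFORMED = "GRANT (operation not performed)"

def audit_result(security_check_passed, other_checks_passed):
    """Classify an operation for the audit message."""
    if not security_check_passed:
        return AuditResult.DENY
    if not other_checks_passed:
        # Access would have been granted, but the operation failed
        # for a reason unrelated to access control.
        return AuditResult.GRANT_NOT_PERFORMED
    return AuditResult.GRANT
```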
 198 
 199 863  phx19695
 200 If Data Management has not yet been used during a bootload, and a fault
 201 while in ring-0 causes verify_lock to be invoked, a ring-0 loop will
 202 result because verify_lock attempts to reference dm_journal_seg without
 203 first checking the switch sst$dm_enabled to determine if data
 204 management has been enabled.
 205 
 206 861  phx19582
 207 The entry dc_find$dir_move_quota performs a superfluous and incorrect
 208 AIM check.  It is superfluous because the KST access modes will ensure
 209 there is no writedown path and it is incorrect because the call to
 210 aim_check_ attempts to compare the access class in the directory header
 211 with the access class in the entry for the directory -- both of these
 212 should always be equal.  The check may be safely removed.
 213 
 214 858  phx19491
 215 The alarm_clock_meters command is missing its addname, "acm".  The
 216 documentation claims the addname exists.
 217 
 218 856  phx19472
 219 ioi_page_table$ptx_to_ptp may return an invalid pointer if the supplied
 220 ptx is invalid.  The verify_ptx internal subroutine causes a non-local
 221 return (via procedure quit) if the ptx is invalid; this will result in
 222 a return to the caller of ioi_page_table$ptx_to_ptp with an invalid
 223 return pointer.
 224 
 225 852  phx19433
 226 The check_vtoce dir salvager and the volume retriever can both produce
 227 segments whose security out of service switch is set on.  reset_soos,
 228 however, refuses to work on non-directory segments.
 229 
 230 851  phx19285
 231 sys_trouble.alm lacks message documentation for "Fault while in masked
 232 environment"
 233 
 234 850  phx16984
 235 Nothing in MDC will replace missing add-names in >lv.  This can cause
 236 various inconsistencies.
 237 
 238 849  phx17979
 239 Disk MPC's get confused when individual drives generate many, many,
 240 errors, and begin to report errors for other drives.  This is reported
 241 here to cover the TR and to record it for future reference.
 242 
 243 841  phx19270
 244 Because page control will not decrement a quota through zero, this can
 245 invalidate the assumptions made by fix_quota_used with respect to the
 246 constancy of the quota error during operation.
 247 
 248 839  phx19254
 249 initiate_ does not distinguish calls from phcs_$initiate's gate target
 250 (ring0_init_$initiate) from calls to hcs_$initiate.  For the former,
 251 attempts to initiate a directory should return error_table_$moderr if
 252 user does not have proper ACL or AIM access to the directory.  For the
 253 latter, it should return the "traditional" error_table_$dirseg, since
 254 directories can never be initiated (via hcs_$initiate) from an outer
 255 ring.
 256 
 257 Fixing this may require a change to dc_find$obj_initiate and
 258 $obj_initiate_raw since these entrypoints currently map
 259 error_table_$moderr into error_table_$dirseg.  And the fix may require
 260 separating the entrypoint in initiate_ used by ring 0 modules (eg,
 261 ring0_init_) from that used by hcs_$initiate.
 262 
 263 836  phx19180
 264 vtoc buffer allocation and usage can too easily crash the system from
 265 lack of buffers.  A more graceful way to warn about pending doom
 266 appears in the TR, along with a suggestion for avoiding the problem at
 267 ast flush time.
 268 
 269 835  phx15923
 270 hc_ipc$send_wakeup should protest if a non-null info pointer is
 271 supplied for a fast channel.
 272 
 273 833  phx19071
 274 quota uses error_table_$invalid_qmax for any error.  It should be more
 275 informative.
 276 
 277 832  phx19073
 278 You can set maxe as high as max_maxe.  Unfortunately, this is too high
 279 (does not count max stopped stack_0's) and therefore crashes the system
 280 when the system runs out of stack_0's.
 281 
 282 831  phx19074
 283 The two calls to range in hc_tune for setting mine are out of order.
 284 As such, attempts to set mine above maxe produce the wrong error
 285 message.
 286 
 287 829  phx18779
 288 add_bit_offset_ (and the corresponding addbitno pl1 builtin) do not
 289 properly handle negative bit offsets.  Similarly, add_char_offset_ (and
 290 the corresponding addcharno pl1 builtin function) do not properly
 291 handle negative character offsets.
 292 
 293 The failure lies in the abd and a9bd instructions, which assume that
 294 only positive offsets will be used.  These instructions assume that
 295 negative offsets will be handled by negating the offset and using the
 296 sbd or s9bd instruction to subtract from the bit or character
 297 displacement.  The proper solution is to detect negative offsets,
 298 negate the offsets and use the sbd or s9bd instruction.
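
The proposed solution can be modeled as below.  This is a Python sketch
only: a pointer is reduced to a (word, bit) pair with 36 bits per word,
and _add_bits and _sub_bits are hypothetical stand-ins for the abd and
sbd instructions that, like them, accept only non-negative counts.

```python
BITS_PER_WORD = 36

def _add_bits(word, bit, count):
    """Stand-in for abd: valid only for count >= 0."""
    assert count >= 0
    return divmod(word * BITS_PER_WORD + bit + count, BITS_PER_WORD)

def _sub_bits(word, bit, count):
    """Stand-in for sbd: valid only for count >= 0."""
    assert count >= 0
    return divmod(word * BITS_PER_WORD + bit - count, BITS_PER_WORD)

def add_bit_offset(word, bit, offset):
    """Detect a negative offset, negate it, and subtract instead."""
    if offset < 0:
        return _sub_bits(word, bit, -offset)
    return _add_bits(word, bit, offset)
```

The same dispatch applies to character offsets with a9bd and s9bd,
using a 9-bit unit instead of a 1-bit unit.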
 299 
 300 828  phx15340
 301 terminate_proc should not truncate the ring 0 stack; it should leave it
 302 around for analysis.  terminate_proc needs clean up in general.
 303 
 304 827  phx18873
 305 Inner rings should not be allowed to set search rules or working dirs.
 306 
 307 825  phx15219
 308 Attempts to type start after a call to sub_err_ with the can't restart
 309 option causes an illegal return.
 310 
 311 822  phx18837
 312 make_msf_ copies the IACL from a dir onto the components of an MSF it
 313 creates.  If the IACL does not give the specified user w access to
 314 these components, then copy/move will fail to be able to copy/move the
 315 MSF into the directory.
 316 
 317 815  phx18756
 318 Having any AIM privilege on makes RCP think that you are a system_high
 319 process.
 320 
 321 810  phx18607
 322 If an SCU's size (as correctly described by its config card) is less
 323 than the port switches on the CPUs (i.e., it is 3M whereas the CPU says
 324 4M, as it must), running ISOLTS (memory tests) in this case can crash
 325 the system with a store fault.
 326 
 327 809  phx18517
 328 The system has been known to crash in ioi_masked while processing a
 329 channel time-out.
 330 
 331 806  phx18566
 332 Typos in fim, et al, misinterpret the hregs bits associated with parity
 333 faults.
 334 
 335 805  phx18565
 336 The history registers for a parity fault that crashes the system do not
 337 appear in the pds.  See the TR for details.
 338 
 339 798  phx18352
 340 sct_manager_$get is supposed to return a null pointer for non-set sct
 341 values.  However, it checks for the sct entry being null after
 342 converting the null value (a zero packed pointer) into an unpacked
 343 pointer.  This unpacked pointer is not all zero so sct_manager_'s zero
 344 check fails.  The fix is to check for zero before the pointer
 345 assignment.
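
The ordering problem can be illustrated in Python.  The unpack function
and its tag value are purely illustrative assumptions, chosen only so
that a zero packed value does not convert to a zero unpacked value; they
are not the actual pointer formats.

```python
ILLUSTRATIVE_TAG = 0o43   # assumed non-zero tag bits added by conversion

def unpack(packed):
    """Hypothetical packed-to-unpacked conversion (adds tag bits)."""
    return (packed << 6) | ILLUSTRATIVE_TAG

def get_buggy(table, index):
    pointer = unpack(table[index])
    # Too late: the converted value is never all zero.
    return None if pointer == 0 else pointer

def get_fixed(table, index):
    packed = table[index]
    if packed == 0:           # check for zero before the assignment
        return None
    return unpack(packed)
```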
 346 
 347 783  phx09958
 348 The default potential attributes for a resource in the RTDT can be
 349 mistreated when the RTDT is installed.  The symptoms are that the
 350 attributes are shifted in the attributes word, causing all attempts to
 351 access the resource to fail.
 352 
 353 775  phx17026
 354 The limit and process_limit fields in the rtdt are ignored.  (Actually,
 355 only the values for the fields in the default_rtdt on the MST are
 356 used.)
 357 
 358 765  phx18243
 359 The ring zero derail fault mechanism needs improvement.  In particular,
 360 it should save as much information as other faults (fault_time
 361 especially) so that azm displays this fault in proper order with the
 362 others.
 363 
 364 760  phx18185
 365 Calling hcs_$grow_lot makes your lot of max size.  Calling it again
 366 causes an FPE even if you have more room left in the lot.
 367 
 368 754  phx17875
 369 It has been experienced, on single physical volume logical volumes,
 370 that, when the volume becomes full (and a user encounters the logical
 371 volume full error), that deleting segments from the volume does not
 372 seem to reset the logical volume full condition for some number of
 373 minutes afterwards.  This is not well understood.
 374 
 375 751  phx17482
 376 msf_manager_ does not understand multiclass msfs.  For such an msf,
 377 msf_manager_ will add new components at the aim level of the dir that
 378 is the msf, rather than at the aim class of the components of the msf.
 379 
 380 749  phx17981
 381 ips signals are not correctly masked in mrd_util_.  As a result, it is
 382 possible to hit QUIT or have other conditions which can cause
 383 operations to fail, killing off the daemon in question.  A fix is
 384 known.
 385 
 386 744  phx17943 phx18054
 387 status_ won't allow the allocated return structures to be in a
 388 different segment than the segment supplied as the return area (that
 389 is, it doesn't allocate into extensible areas).
 390 
 391 742  phx17838
 392 The volume salvager should report page and vtoce bit map
 393 inconsistencies.
 394 
 395 735  phx17815
 396 set_mdir_quota correctly sets the quota in the vtoce, but incorrectly
 397 sets the value in the aste, when inferior dirs have terminal quotas.
 398 
 399 733  phx17690
 400 If an error is indicated when an i/o completion of a volmap page is
 401 posted, volmap_page does not strip the state away from the page number,
 402 producing a bogus error message.
 403 
 404 732  phx15640
 405 Hardcore sets damage switches for directories and there is no way for
 406 users to turn them off.  The Salvager should be changed to salvage
 407 directories that have the damage switch set and turn it off once
 408 salvaging is complete.
 409 
 410 731  phx17688
 411 Hardcore should validate pds$stacks (validation_level) before using it.
 412 
 413 730  phx17662
 414 A second call to delmain to delete a frame previously deleted will
 415 cause the calling process to hang on a bogus page wait event.
 416 
 417 723  phx17551
 418 More errors in hdx (not copying args, not terminating segments
 419 correctly).
 420 
 421 722  phx17553
 422 More errors in mdx (not copying parameters, not terminating
 423 disk_table_).
 424 
 425 721  phx17615
 426 init_disk_pack_ (actually, calling countervalidate_label_) produces an
 427 error message not documented within init_disk_pack_.
 428 
 429 720  phx17614
 430 init_disk_pack_ references an unreferenced variable when looking for
 431 the undocumented copy option.
 432 
 433 718  phx17552
 434 mdc_status_ does not properly copy all of its args.  For that matter,
 435 it doesn't even compile.
 436 
 437 717  phx17597
 438 io_syserr_msg is declared to be three words long, but is overlaid with
 439 a structure which is five words long.
 440 
 441 712  phx17186
 442 You will die if another process deletes your working dir.
 443 
 444 711  phx16992
 445 A page error uses mc.errcode to encode the relevant information.
 446 Unfortunately, system_startup_ cannot decipher this and crashes the
 447 system (which would probably have happened anyway).
 448 
 449 708  phx17416
 450 hcs_$status_mins does not work on the root.
 451 
 452 707  phx17413
 453 act_proc uses the wrong value when determining maximum possible access
 454 class.
 455 
 456 705  phx17394
 457 A timeout from resetting a channel from a timeout will cause a fault
 458 while in masked environment, crashing the system.
 459 
 460 704  phx17374
 461 hcs_$quota_read returns "Some directory in path..." instead of "Entry
 462 not found" when the target does not exist but its parent does.
 463 
 464 701  phx17302
 465 hcs_$fs_get_brackets will not return the ring brackets of an inner ring
 466 object.
 467 
 468 700  phx17259
 469 attach_lv references the nonexistent error_table_$notacted.
 470 
 471 699  phx17257
 472 scavenge_vol refers to the nonexistent error_table_$no_arg.
 473 
 474 698  phx17219
 475 disk_rebuild examines too many bits in a vtoce file map when examining
 476 it to see if it is free, when performing volmap compression.  This
 477 sometimes causes the compression to fail.
 478 
 479 696  phx17141
 480 The aste/vtoce.dtm fields are examined to set the dbm_map bits used by
 481 the volume dumper when dumping objects.  For directories, these fields
 482 lead to an incorrect interpretation as to whether a directory has been
 483 modified, leading to extraneous directory dumping.
 484 
 485 694  phx17132
 486 The volume retriever does not collect enough AIM related information.
 487 To process a retrieval request, it needs to store, in ring 1, the user
 488 auth, and max auth.  Now it only stores the auth, which is
 489 automatically stored by message_segment_.
 490 
 491 The volume retriever needs its own gate to ring 1 which will store the
 492 ring, auth, and max auth securely in the message.
 493 
 494 693  phx17132
 495 append$retv_append cannot possibly append a multi-class object, since
 496 it only has two of the three quantities:
 497 
 498     user auth
 499     user max auth
 500     desired object max acc
 501 
 502 The structure passed to it needs to be changed.
 503 
 504 692  phx17141
 505 The volume dumper examines the wrong field when determining if it
 506 should dump a directory, thus dumping unneeded directories.
 507 
 508 691  phx16992
 509 A page_fault_error occurring at the Initializer's ring-1 command level
 510 causes a crash, but the attempt to produce the crash message itself
 511 produces a crash because the ring-1 condition handler cannot interpret
 512 the mc.errcode value.
 513 
 514 690  phx15255
 515 The SCU can return the same value for the clock twice.  Some software
 516 uniquification is needed.
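
One common form of software uniquification is to bump any reading that
does not exceed the previous one.  The Python sketch below assumes a
single caller (a real version would need a lock or an atomic update);
UniqueClock and read_clock are hypothetical names.

```python
class UniqueClock:
    """Wrap a clock so that successive readings are strictly increasing."""

    def __init__(self, read_clock):
        self._read_clock = read_clock
        self._last = None

    def next(self):
        now = self._read_clock()
        if self._last is not None and now <= self._last:
            # The hardware repeated (or went backwards): bump by one tick.
            now = self._last + 1
        self._last = now
        return now
```

The cost of this approach is that returned values may run slightly
ahead of the hardware clock during a burst of identical readings.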
 517 
 518 689  phx14716
 519 When the directory salvager determines that the sons LVID in a
 520 directory header is different from the value in the branch for the
 521 directory, it mindlessly copies the value from the branch into the
 522 directory header.  This has the effect that if the value is wrong in
 523 the branch, it will be wrong everywhere afterwards.
 524 
 525 At least, the salvager should check the value to see whether it's zero
 526 (and obviously invalid) before propagating it.
 527 
 528 This is a genuine problem, and not already on the hardcore error list.
 529 The particular problem that provoked this report has been fixed
 530 elsewhere, and is no longer relevant, but the general problem remains.
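
The minimal check suggested above could look like the following Python
sketch.  The function name and the returned status strings are
hypothetical; only the zero test comes from this entry.

```python
def reconcile_sons_lvid(header_lvid, branch_lvid):
    """Decide which sons LVID to keep when header and branch disagree.

    Returns (lvid_to_use, status).
    """
    if branch_lvid == header_lvid:
        return header_lvid, "consistent"
    if branch_lvid == 0:
        # Zero is obviously invalid: do not propagate it.
        return header_lvid, "branch value invalid; header kept"
    return branch_lvid, "header updated from branch"
```

This only rejects the obviously invalid case; a non-zero but wrong value
in the branch would still be propagated.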
 531 
 532 685  phx17055
 533 Various modules, in particular sys_trouble, are missing some error
 534 message documentation.
 535 
 536 684  phx15585
 537 A situation (not understood) exists in which the records used exceeds
 538 the current length, preventing further access to the segment.
 539 
 540 682  phx15752
 541 core flushing (for pleasure from the as) should not flush pdir segs.
 542 Also, the scheduling of the core flush is not at precise times.
 543 
 544 681  phx15833
 545 reclassify_seg should avoid the work if what it is reclassifying is
 546 already at the level it needs to be.
 547 
 548 679  phx15852
 549 Both illegal_procedure.pl1 and the documentation suggest that illegal
 550 op_code, illegal addr/modifier and other illegal procedure faults
 551 should be audited.  This third group, however, is not.
 552 
 553 678  phx15172
 554 syserr_real should check its error code parameter for non-zero-ness
 555 when producing the message text.
 556 
 557 676  phx14420
 558 The ascii_to_ebcdic_ and ebcdic_to_ascii_ tables and routines should
 559 handle the 256 character ebcdic set and map it onto some extended ascii
 560 set.
 561 
 562 664  phx17116
 563 The vtoce_checksum implementation is hamstrung by two problems:
 564 
 565   1) the "checksum_valid" flag is quite likely to be turned off by
 566 damage, causing the checksum to be recalculated for invalid data.
 567 
 568   2) part 3 has no checksum, and disk damage quite frequently fries it.
 569 
 570 663  phx17010
 571 Hot buffers can fill up vtoc_buffer_seg, crashing the system.  The
 572 retry fix for 662 reduces the problem, but not all of it, since an
 573 authentically broken disk can fill up the segment.
 574 
 575 657  phx17050
 576 No gullibility checking, checksumming, or other protection against
 577 damage exists for
 578 
 579     Record 6 -- the vtoc map
 580     Record 0 -- the label (except for "Multics Storage System Volume")
 581 
 582 Damage to these areas can cause widespread disaster, due to confusion
 583 as to the location of the paging region!
 584 
 585 We need:
 586 
 587    1) Sentinels on all records of the label
 588    2) Checksums on all records of the label
 589    3) A (or multiple) safe-store records that store only permanent
 590 information for recovery from damage to one of the records (like the
 591 vtoc map) that contain both permanent and dynamic information.
 592 
 593 656  phx17052
 594 Detaching a device with I/O in progress can cause a fault while in
 595 wired environment due to an uninitialized pointer in the reset_device
 596 entry of the program ioi_masked.
 597 
 598 654  phx16046
 599 No re-verification of the label of an offline disk is made when it
 600 comes back online.  As a result, mistakes with patch plugs are
 601 extremely dangerous.  disk_control should not declare a disk back to
 602 life unless the label checks out in some simple fashion.
 603 
 604 652  phx16979
 605 The ring 0 portion of the three-ring circus (volume management) is not
 606 protected by a cleanup handler, and can leave pvtes in an inconsistent
 607 state.
 608 
 609 647  phx16929
 610 See the TR for a complete exposition of this.  When all 4K aste's have
 611 a page in memory, get_aste behaves very badly (very slowly).
 612 
 613 643  phx16592
 614 master directory acs checking should use raw access.  Otherwise, it is
 615 impossible to get e access to work right in both ring 4 and ring 1.
 616 
 617 642  phx16743
 618 disk_pack.incl.pl1 has the wrong include file listed as the home of the
 619 dumper bit map.
 620 
 621 634  phx16905
 622 boundsfault.pl1 does not recognize the case where the bound is less
 623 than the msl but still within the page table size.  This breaks setting
 624 the max length within the page table but larger for active segments,
 625 since the 10.2 performance optimization for set_max_length took out the
 626 setfaults in this case.
 627 
 628 628  phx14990
 629 Volume backup to an IO disk does not work with the current
 630 implementation of rdisk_ stream IO.  The current version has no
 631 buffering ability and no sense of logical End of Space (a la EOT on
 632 tape) and physical End of Space on the pack, which is needed to allow
 633 flushing of IO when this (EOT) is detected.
 634 
 635 626  phx16692
 636 append$retv_append has a bug wherein it misuses the "max_authorization"
 637 field of the structure.  It should just consider that the max to put in
 638 the multi-class segment max.
 639 
 640 There is a companion bug in the retriever (volume) that fills in the
 641 structure wrong to begin with.  The field has to be filled in with the
 642 authorization out of the message segment for the retrieval request.
 643 
 644 625  phx16548
 645 When you try to terminate a segment with more than about 250 ref names,
 646 the call aborts with the message "The RNT is in an inconsistent state."
 647 
 648 623  phx02779
 649 Because of a problem with accepting a zero buffer size, it has been
 650 found that a returned hardware status that contains channel or central
 651 fault status is being overlooked and assumed to be good.
 652 
 653 614  phx16489
 654 ring0_get_ misdeclares code parameters as fixed bin.
 655 
 656 613  phx16351
 657 set_bc should not let you set a negative bit count.  (set or change).
 658 
 659 611  phx16506
 660 append only checks mountedp when segments are appended, not dirs or
 661 links. While this may be convenient, it is inconsistent. The marginal
 662 utility of creating dirs and links on unmounted LV's is outweighed by:
 663 
 664   1) the inconsistency: some operations work, some don't.
 665   2) for private LV's: the desire to have NOTHING happen to the LV when
 666      unmounted. Even if your access to attach a private logical volume
 667      has been taken away, you can still append links and dirs.
 668   3) If we ever move dirs onto the LV that they describe, this will
 669      clearly have to have the restriction.
 670   4) LV aim restrictions cannot be enforced if the LV is not mounted.
 671 
 672 605  phx16501
 673 check_mdcs does not salvage quota inconsistencies between master
 674 directories and their registration in the mdcs.  Only register_mdir
 675 does this.  This requires the administrator to run register_mdir over
 676 each mdir on a logical volume to be sure that everything is consistent.
 677 
 678 Also, check_mdcs does not validate that a master directory actually has
 679 the correct sons logical volume.
 680 
 681 604  phx16500
 682 Master directory control allows up to fixed bin (35) worth of quota for
 683 an entire logical volume, but many fields are only declared fixed bin.
 684 This creates periodic disasters in the control segments.
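
A small Python sketch of the range mismatch, assuming that an
unqualified "fixed bin" declaration has the default precision of 17
bits plus sign.  The helper name fits is hypothetical.

```python
def fits(value, precision):
    """True if value is representable as fixed bin (precision)."""
    return -(2 ** precision) <= value <= 2 ** precision - 1

# A quota that is legal for the volume as fixed bin (35) ...
volume_quota = 200_000
assert fits(volume_quota, 35)
# ... does not fit in a field declared with the assumed default of 17.
assert not fits(volume_quota, 17)
```

Under that assumption, any quota above 131071 pages stored into such a
field would be out of range.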
 685 
 686 603  phx16499
 687 Master directory control was not updated when quota was increased to 18
 688 bits.  This can cause a wide variety of misbehaviors.
 689 
 690 593  phx16015
 691 The file system should log or meter invalid quota changes (attempts to
 692 decrement used below 0).
 693 
 694 592  phx16093
 695 quota_received is not supported very nicely.  The TR complains that it
 696 is not reported by any existing hcs_ entry.  There are other problems,
 697 such as salvagers failing to correct it and no way to forcibly set it.
 698 
 699 587  phx15298 phx16005
 700 peruse_crossref bugs:  does not detect LV not mounted; does not
 701 initialize brief_sw; does not print satisfactory message when module is
 702 not referenced.
 703 
 704 583  phx15258 phx15275
 705 Invalid iacl terms cause append to fail.  asd_ allows acl terms that
 706 are invalid, like R..*, to be added to an initial acl.  append fails
 707 trying to copy then the assumption that the
 708 entire RVL will be mounted, else you will be doing 1pack recovery (a
 709 risky assumption).
 710 
 711 This is a limitation rather than a suggestion since we really ought to
 712 have such a mechanism.
 713 
 714 581  phx15044
 715 fim should not save history registers that have just been freshly
 716 cleared by fim_util.
 717 
 718 572  phx14942
 719 act_proc$create fails to return the empty APT entry in almost all error
 720 cases.
 721 
 722 569  phx14225
 723 Incorrect warning message from scas_init.
 724 
 725 568  phx14877
 726 It is impossible to run hc_pf_meters without phcs_ access; metering
 727 gate access should be sufficient.
 728 
 729 566  phx14824
 730 sweep_pv (segment_mover actually) cannot move rpv-only segments.  This
 731 makes it difficult or impossible to compress the RPV VTOC.
 732 
 733 565  phx14875
 734 When the operator does an x deny (using RCPRM at site) the process
 735 still thinks it has the drive.
 736 
 737 561  phx14705
 738 The accept_fs_disk check for partitions overlapping gets confused by
 739 HIGH hardcore partitions.
 740 
 741 557  phx14657
 742 ebcdic_to_ascii_ and ascii_to_ebcdic_ should be in the same bound
 743 segment, and not bound in with anything that uses them.  This will
 744 allow people to replace them when reading tapes with nonstandard (or
 745 non-Multics) EBCDIC encoding.
 746 
 747 529  phx10098
 748 save_dir_info fails if any of the entries in the dir are connection
 749 failures.
 750 
 751 527  phx08068
 752 Strange things are done with the IC for certain faults in the FIM.
 753 Perhaps they should be improved.  In particular, the IC reported in the
 754 machine conditions for dfmp taking underflows is unexpected.
 755 
 756 523  phx05319
 757 ioa_ ^( and ^) execute at least once, instead of zero times, when fed
 758 zero things to iterate over.
 759 
 760 520  phx14440
 761 page_error displays an erroneous disk address in the error message for
 762 an I/O error on the volume map.  The fix is to ANA -1,dl before saving
 763 the Areg, which contains the disk address in the lower half.
 764 
 765 518  phx14405
 766 print_configuration_deck does not display negative numbers correctly.
 767 It prints them as very large positive numbers.  This is not currently a
 768 problem, since the BOS command parser does not understand negative
 769 numbers completely (and marks them as octal in the config deck).  It
 770 will be a problem when BOS is fixed or superseded.
 771 
 772 516  phx14381
 773 copy_out will fail if requested to copy a segment whose length is
 774 larger than 255K.  In this case, it should attempt to set the max
 775 length to 256K via phcs_ (or hcs_$something, when this operation
 776 becomes non-privileged).
 777 
 778 514  phx14387
 779 rebuild_disk for the RPV may not copy the root directory correctly.
 780 Specifically, modified pages in memory will not be copied - instead,
 781 the earlier instances on disk will be copied.  This may cause a
 782 crash during the subsequent initialization until the root is salvaged
 783 (due to bad_dir_).  The problem is that disk_rebuild (the ring-0 module
 784 which does the rebuild) does not call pc$cleanup for entry-held
 785 segments (indeed, it should not do so in general).  The root directory
 786 is entry-held, and so it goes.
 787 
 788 513  phx14276
 789 If a trouble fault occurs at a point where it is not caught by
 790 fim_util$check_fault, the history registers from the trouble fault will
 791 be over-written by those from the subsequent sys_trouble connect.  This
 792 destroys potentially useful diagnostic data.
 793 
 794 501  phx14181
 795 There is a window in ring-0 ITT message processing.  If a fault occurs
 796 in that window, ITT entries are lost for the bootload.  Further, they
 797 are lost in a way which disables the logic in pxss which prevents ITT
 798 overflow.  The likely result is a crash in pxss when the system runs
 799 out of ITT entries.
 800 
 801 498  phx05686
 802 The time-record product maintained for a directory with a terminal
 803 quota account is only an approximation to an ideal space-time integral
 804 of disk usage.  This approximation is reasonably accurate for accounts
 805 which have stable usage, but it has several anomalies for more volatile
 806 accounts.  The problem is that the cumulative time-record product is
 807 updated only when the directory VTOCE is updated (it is incremented by
 808 the product of the instantaneous quota used and the delta-time since
 809 the last update).  If, for example, a large amount of space was used
 810 and returned in the interval between updates, there is no accounting
 811 for that space.  A visible anomaly results from a further approximation
 812 when get_quota is invoked.  At this time, the time-record product is
 813 reported as the value it would have if the VTOCE were being updated at
 814 that time (although it is not).  For the reasons cited, this can cause
 815 time-record product to decrease with time.  The only reasonable
 816 solution is to maintain time-record product continuously.  This would
 817 not be expensive computationally, but it would require significantly
 818 more wired storage per active segment.
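
The anomaly in entry 498 can be illustrated with a small model.  This
Python sketch is only an illustration, not the hardcore implementation:
the class SampledTRP and its methods are hypothetical names.  It
updates the product only at update time, as described above, and shows
both the uncharged burst and the reported value decreasing with time.

```python
class SampledTRP:
    """Hypothetical model: time-record product updated only at VTOCE update."""

    def __init__(self):
        self.trp = 0          # cumulative time-record product
        self.last_update = 0  # time of the last VTOCE update
        self.used = 0         # instantaneous quota used, in records

    def set_used(self, used):
        # A change in usage does not touch the stored product.
        self.used = used

    def update_vtoce(self, now):
        # Increment by instantaneous quota used times delta-time.
        self.trp += self.used * (now - self.last_update)
        self.last_update = now

    def get_quota(self, now):
        # Reported as if the VTOCE were being updated now (it is not).
        return self.trp + self.used * (now - self.last_update)


acct = SampledTRP()
acct.set_used(100)                  # 100 records in use from time 0
first_report = acct.get_quota(10)   # includes 100 records * 10 time units
acct.set_used(0)                    # space returned at time 11
acct.update_vtoce(20)               # adds 0 * 20: the burst is never charged
second_report = acct.get_quota(30)  # later report, smaller than the first
ideal = 100 * 11                    # continuous integral of the same usage
```

In this model second_report is smaller than first_report even though
time has advanced, and neither matches the ideal integral.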
 819 
 820 497  phx14069
 821 Most store faults should be recorded into the Syserr Log, as they are
 822 usually indicative of faulty hardware [sic.].  hardware_fault should
 823 filter out store faults in BAR mode, however, as they are caused by
 824 program error.
 825 
 826 490  phx13931
 827 Values for select_switch parameters to hcs_$star_XXX entries in
 828 star_structures.incl.pl1 are declared as fixed bin (2) (e.g.,
 829 star_LINKS_ONLY).  They should be fixed bin (3).
 830 
 831 487  phx13896
 832 It should be possible to change the size of the AST pools while the
 833 system is running (well, it should be possible to increase them,
 834 anyway).  If the SST is expanded to multiple segments, this could be
 835 done with moderately more work.
 836 
 837 486  phx13897 phx14320
 838 A volume which is inoperative cannot be demounted.  There should be a
 839 way to do this, such as abandoning everything associated with the
 840 volume which is in memory (VTOC buffers, ASTEs, pages, etc.)  and
 841 marking it as demounted.  Also, disk I/O error processing should be
 842 smarter about detecting inoperative devices, particularly devices which
 843 appear operative but cannot do I/O without errors.
 844 
 845 Note that this is the one case where it is safe to abandon VTOCE
 846 buffers, since nobody will do an await_vtoce afterwards and lose (if
 847 demounting does things in the proper order).  If there are I/O errors
 848 and the volume remains mounted, it is never safe to abandon VTOCE
 849 buffers.
 850 
 851 468  phx13716
 852 The various tables used in disk volume management (ring-0, ring-1, and
 853 ring-4) can become inconsistent.  Several instances of this problem
 854 have been corrected.  One which has not shows itself after an "alv"
 855 followed by an "av -all".  The ring-4 copy of the disk table is not
 856 updated after the second command, preventing pdir_volume_manager_ from
 857 knowing that the logical volume is mounted (and hence eligible for
 858 pdirs).
 859 
 860 460  phx13544
 861 Master directory control can become confused if a master directory has a
 862 subordinate directory with quota.  A set_mdir_quota {plus or minus} X
 863 will cause the page control quota of the master directory to be the same
 864 as the master directory quota.
 865 
 866 448  phx12864
 867 KST overflow has strange effects, not readily traceable to this problem.
 868 KST overflow should probably be signalled, rather than indicated by an
 869 error code.
 870 
 871 436  phx05497
 872 When signaller.alm pushes a stack frame, it first extends the previous
 873 frame by 48 words to allow for interrupted push operations.  If a
 874 non-local goto is used to transfer control back into that extended
 875 stack frame, it never gets shrunk.  Repeated occurrences of this will
 876 eventually use up the stack.
 877 
 878 The fix should be to change signaller.alm to put the new frame 48 words
 879 up the stack without doing an extension of the existing frame.  This
 880 requires hand-coding the push, but that's not too hard.  The alternative
 881 is to try to use a cleanup handler to shrink it, which would be awfully
 882 hard since the cleanup handler would be associated with the frame above,
 883 which would still be on the stack.  It's hard to shrink your caller's
 884 stack frame.
 885 
 886 429  phx12689
 887 When cpt is invoked with the -lg control argument, it does not print
 888 full pathnames in the summary report.  It does, however, print full
 889 pathnames in the detailed trace file if -trace is also specified.
 890 
 891 410  phx12355
 892 Attempted logins to ring-6 or ring-7 fail, since makestack requires
 893 non-null effective access (at the validation level of the initial ring)
 894 to signal_, unwinder_, operator_pointers_, and pl1_operators_.  These
 895 have ring brackets of 0,5,5.  The general solution is not clear.  Rings
 896 6 and 7 are supposed to be available for totally encapsulated
 897 subsystems, with only facilities provided explicitly by the subsystem
 898 available.  The difficulty is to balance this against the need to
 899 provide a rudimentary environment to initialize the subsystem.
 900 
 901 409  phx12251
 902 A more compact method of logging I/O errors is needed.  Currently, each
 903 I/O error is logged into the syserr log.  This can flood the log with
 904 largely meaningless I/O error messages (for example, when reading a tape
 905 of marginal quality).  An approach is to write summary records,
 906 periodically (based on time or on error thresholds), and optionally
 907 record detailed messages.
 908 
 909 407  phx12250
 910 Deletion of a segment with wired pages causes the segment not to be
 911 deleted, left active, with PTWs for the wired pages having nulled
 912 addresses and wired bits on.  Under some circumstances, this can cause a
 913 system crash.  This situation can be caused by a user wiring pages
 914 (through hphcs_).  This can also happen if a process terminates with an
 915 active ioi buffer.
 916 
 917 399  phx12134
 918 append$retv should validate the entry supplied more carefully.  An
 919 instance of the problem is that the cross-retrieval of an object with
 920 multiple names will contain a non-null forward name thread in the
 921 primary name field.
 922 
 923 393  phx12070 phx10495
 924 Segments should be created with access of r to *.SysDaemon, rather than
 925 rw.
 926 
 927 383
 928 There should be a system-maintained database which keeps track of recent
 929 crash history, and types of shutdowns.  Possibly it could be as simple
 930 as logging, at bootload, the time and type of the last shutdown.  The
 931 syserr log is probably robust enough, and can easily be scanned to find
 932 the information.
 933 
 934 382  phx04847
 935 fix_quota_used should also adjust TRP totals in accordance with the
 936 adjustment being applied to quota used and the length of time since the
 937 last ESD failure crash.  This should be automatically driven from the
 938 last crash info, and be manually overridable if necessary.
 939 
 940 378  phx12013
 941 setfaults should have a recovery strategy for page_fault_errors on a
 942 target dseg; probably it should kill the other process, rather than
 943 crashing the system with a crawlout with AST lock set.
 944 
 945 376  phx12003
 946 trace_mc should use a hardcore segment for the buffer, to avoid problems
 947 with recursive faults caused by flushing trailers or dseg ptw misses.
 948 
 949 364  phx01612
 950 The iocb structure in iocb.incl.pl1 contains an implicit word of padding
 951 between iocb.name and iocb.actual_iocb_ptr, which should be explicitly
 952 declared as pad.
 953 
 954 362  phx11904
 955 verify_lock should check all ring-0 locks which could be held on
 956 call-side.  It should not allow a process to crawlout with any ring-0
 957 lock held.  For some locks detected by verify_lock, the system should be
 958 crashed immediately; for others (vtoc buffer lock), some recovery is
 959 possible.
 960 
 961 360  phx11870
 962 On a multi-process salvage, one of the processes may take an unexpected
 963 error (page_fault_error, for example).  This will cause the process to
 964 go to a new command level and wait for terminal input.  Eventually, all
 965 other processes will hang (blocked) waiting for this process to respond
 966 to the dispatch wakeup.  The solution is probably for do_subtree to
 967 establish an any_other handler and do something appropriate on
 968 unexpected signals.
 969 
 970 357  phx11839
 971 The supervisor should take more pains to ensure that a setfaults
 972 operation is performed on segments dynamically marked as damaged, either
 973 when the damage is detected, or soon thereafter.
 974 
 975 356  phx10004
 976 The primitive for setting the damaged switch should perform a setfaults
 977 operation, since it operates in a better environment than page control
 978 does when doing so, and it is desirable to provide damage notification
 979 as quickly as possible to other processes.
 980 
 981 352  phx11831
 982 If a directory hash table overflows while the directory is being rebuilt
 983 by salv_dir_checker_, some names on the entry which caused the overflow
 984 may not be hashed in correctly.  This is because the special-case code
 985 to keep hash from faulting on the partially rebuilt directory does not
 986 ensure that all the names already processed are rehashed.
 987 
 988 306  phx11600
 989 The entry structure (dir_entry.incl.pl1) is misdeclared; the structure
 990 takes only 37 words, despite the comment claiming that it takes 38.
 991 This seems to be benign, but should be rectified.
 992 
 993 305  phx11593
 994 Although there are hcs_ entries to set it, the DNZP switch is not
 995 reported by any status_ entrypoints.
 996 
 997 303  phx11555 phx06112 phx04846
 998 The quota salvager should correct inconsistencies in quota allocated and
 999 quota received fields, as well as quota used.  There is presently no way
1000 to repair these fields other than BOS PATCH.
1001 
1002 300  phx11553
1003 Damage to >lv and >disk_table_ should be detected and acted upon
1004 automatically at bootload, rather than requiring use of BOOT NOLV and
1005 NODT.
1006 
1007 272  phx11009
1008 traffic_control_queue should never be reporting a negative value for
1009 tssc.  It does so because the snap of the APTEs consumes non-negligible
1010 time (due to paging) with no locks held.  A fix is to read the current
1011 time immediately after copying out the APTEs.
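
The ordering problem in entry 272 can be sketched as follows.  This is
a minimal Python model under stated assumptions, not the command's
code: tssc_snapshot and FakeSystem are hypothetical, and the copy is
assumed to take 5 time units during which one process changes state.

```python
def tssc_snapshot(clock, copy_aptes, read_time_first):
    """Return time-since-state-change for each copied APTE (hypothetical)."""
    if read_time_first:
        now = clock()          # time read before the slow, unlocked copy
        changes = copy_aptes()
    else:
        changes = copy_aptes()
        now = clock()          # the fix: read the time after copying out
    return [now - t for t in changes]


class FakeSystem:
    """Simulated clock plus an APTE that changes state while being copied."""

    def __init__(self):
        self.time = 100

    def clock(self):
        return self.time

    def copy_aptes(self):
        self.time += 5         # paging delay during the copy
        return [self.time]     # one process changed state at the new time


old = FakeSystem()
before_fix = tssc_snapshot(old.clock, old.copy_aptes, read_time_first=True)
new = FakeSystem()
after_fix = tssc_snapshot(new.clock, new.copy_aptes, read_time_first=False)
```

Reading the time first yields a negative value for the process that
changed state during the copy; reading it afterwards does not.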
1012 
1013 260  phx10996
1014 A volume administrator can adjust the quota on a master directory of
1015 which he is not the owner, if he has sma access.  This use charges the
1016 quota account of the Initializer, which is clearly bogus.
1017 
1018 239  phx10114
1019 Although the salvager can set the security-out-of-service bit for
1020 segment branches as well as directories, the privileged gate entry to
1021 reset the switch works only on directories.  It should work on segments
1022 as well.
1023 
1024 229  phx09675
1025 There should be a mechanism for establishing hardcore crash handlers
1026 which would be executed by sys_trouble before crashing the system, so
1027 that (for instance) the IMPDIM could shut itself down, by establishing a
1028 handler to send a going-down connect to the IMP.
1029 
1030 223  phx09383
1031 Attempting to add a memory which is already online causes an OOB fault
1032 in reconfigure (line 193) because it fumbles one of the error codes.
1033 
1034 222  phx09341
1035 The error message for incorrect access should be specific about the type
1036 of access which the process lacks:  ACL, ring bracket, or AIM.
1037 Presently, some primitives distinguish between ring bracket and ACL
1038 violations, and others do not.  AIM violations would have to be detected
1039 specially; there is no error code for this today.  See also entries 78
1040 and 157.
1041 
1042 219  phx09240 phx11009
1043 system_performance_graph cannot properly represent more than 100
1044 logged-in users.  It should use a different scale, or wrap around.
1045 
1046 217  phx09162
1047 When walking the AST to demount a volume, demount_pv gives up upon
1048 encountering very minor anomalies, causing ESD to fail completely when
1049 it should have almost succeeded.  It needs a better way of walking the
1050 AST, to eliminate the "demount_pv:  AST out of sync" message.  The AST
1051 pools should be described by pointers and counts kept in the SST, rather
1052 than just by count.
1053 
1054 215  phx09082 phx12302
1055 Checking of CPUs which are being added should be both more complete and
1056 more flexible.  Proper settings for both cache and associative memories
1057 should be checked.  It should also be possible for a site to over-ride
1058 these checks (by arguments to add_cpu).
1059 
1060 214  phx09047
1061 There should be a DRL instruction at the beginning of page_fault, so
1062 that history registers would be saved if a wild transfer occurred.
1063 
1064 213  phx08965
1065 There should be more state recorded in the PVT when a volume cannot be
1066 accessed, such as the real fsdisk error code, rather than just
1067 pvte.device_inoperative.  This lack causes add_vol to be unable to
1068 distinguish between "drive in protect" and "drive offline".
1069 
1070 212  phx08963
1071 The check_trailers procedure can only be enabled by recompilation.  It
1072 should be possible to simply patch something.
1073 
1074 211  phx10123
1075 Messages from hardcore (disk_control, get_aste, hc_dmpr_primitives,
1076 etc.)  should include the physical volume name where appropriate.  This
1077 must be preceded by putting the name into ring zero.  (see entry 210)
1078 
1079 210  phx11769 phx08952
1080 The ring one volume management tables should be direct copies of the
1081 ring zero PVT and LVT, which should be changed to include all the
1082 information (names and special flags) now only in the disk_table.  This
1083 is the only real way to fix the problems due to inconsistencies between
1084 these databases.
1085 
1086 203  phx11765
1087 hcs_$fs_get_mode always returns the 4 bit set in directory modes.  It
1088 should leave this bit off, like hcs_$get_user_effmode.
1089 
1090 199  phx11761
1091 The ioa controls ^e and ^f have difficulty formatting integers.  For
1092 instance, ^.2f gives completely inappropriate results when given
1093 1234567, though it does fine with 1234567.12
1094 
1095 193  phx08451 phx11705
1096 There should be special entries to status_ for the primary name, the
1097 link path, and the list of names.  The existing status_ interfaces are
1098 seriously defective here (see entry 192).  See phx11705 for interface
1099 details.
1100 
1101 189  phx08286
1102 There should be a way to turn on the audit flag in the branch.  A
1103 primitive mechanism, but better than nothing.  Now that the audit flag
1104 does nothing, this will become a limitation until a proper per branch
1105 audit mechanism is created.
1106 
1107 188  phx08284
1108 The privileged quota-setting primitives should log a message when used,
1109 to aid in keeping track of the operations.
1110 
1111 187  phx08076
1112 When a process running ISOLTS is terminated abnormally, the CPU and
1113 memory it was using for the test are not released.  This, despite the
1114 code in deact_proc which appears to do just that.
1115 
1116 186  phx08263 phx03859 phx06694
1117 There should be a way to interrupt the Initializer process, "no matter
1118 what".  Perhaps a tiny debugging environment entered on receipt of an
1119 execute fault.
1120 
1121 184  phx10589
1122 The MPC error counters should be read out and stored in the syserr log
1123 when a pack is mounted or dismounted; this would make it much easier to
1124 keep track of per-drive error histories.
1125 
1126 183  phx07983 phx11700
1127 The system should perform probabilistic verification of disk writes,
1128 checking some small fraction of them for success.  The fraction would be
1129 increased if errors occurred, decreased as the drive was seen to
1130 operate, and be manually tunable, as well.
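
One way the policy in entry 183 might look is sketched below in
Python.  Everything here is an assumption made for illustration: the
class WriteVerifier, the halving and tenfold adjustment factors, and
the floor and ceiling values do not come from the TRs.

```python
import random


class WriteVerifier:
    """Hypothetical sampler choosing which disk writes to read back."""

    def __init__(self, fraction=0.01, floor=0.001, ceiling=1.0, rng=None):
        self.fraction = fraction  # manually tunable starting point
        self.floor = floor
        self.ceiling = ceiling
        self.rng = rng or random.Random()

    def should_verify(self):
        # Verify a write with probability equal to the current fraction.
        return self.rng.random() < self.fraction

    def record(self, ok):
        if ok:
            # Drive seen to operate: check less often, down to the floor.
            self.fraction = max(self.floor, self.fraction / 2)
        else:
            # Error seen: check more often, up to every write.
            self.fraction = min(self.ceiling, self.fraction * 10)


v = WriteVerifier(fraction=0.01)
v.record(ok=False)
after_error = v.fraction      # larger than the starting fraction
v.record(ok=True)
after_success = v.fraction    # smaller again after a clean check
```

The fraction stays within the floor and ceiling, so a persistently
failing drive ends up with every write verified.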
1131 
1132 181  phx08237
1133 There should be a way to change the time zone (CLOK card and sys_info
1134 correction constant) while the system is running.
1135 
1136 179  phx07814
1137 verify_lock will recurse, faulting, if it tries to unlock a directory
1138 which is no longer accessible due to seg_fault_error or page_fault_error
1139 problems.  It should have condition handlers for this.
1140 
1141 176  phx07711
1142 The traffic_control_queue command should display the states of all the
1143 interesting APTE flags; pre_empt_pending, in particular.
1144 
1145 170  phx06979
1146 The system should further analyze the MOS EDAC error messages to the
1147 extent that it determines which pages in the SCU are affected by the
1148 error, so that the pages can be removed, either manually or
1149 automatically.  This will also save syserr log space.
1150 
1151 167  phx06374
1152 When a hardware fault occurs as a result of an Illegal Action from an
1153 SCU, software should unlock the SCU history registers on that SCU, to
1154 allow data from a fault which crashes the system (later) to be retained.
1155 Unfortunately, it is not possible for software to read these registers.
1156 
1157 166  phx06326
1158 The hp_delete command tries to set some AIM flags in the directory it is
1159 trying to delete.  This will not work if the directory is
1160 connection-failed.  Since initiate was changed to activate directories
1161 immediately, this problem is masked, but hp_delete shouldn't do this
1162 anyway.
1163 
1164 164  phx04854 phx05954
1165 The UID generator and pxss should check the difference between the last
1166 clock reading and the current one periodically, and crash the system if
1167 it is too large.  This situation arises when a clock makes a sudden
1168 jump, and could otherwise seriously damage the file system.
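
The check proposed in entry 164 amounts to comparing successive clock
readings against a limit.  A minimal Python sketch follows; the names
ClockMonitor and ClockJumpError and the threshold of 60 are
hypothetical, and raising an exception merely stands in for crashing
the system.

```python
class ClockJumpError(Exception):
    """Stands in for crashing the system in this hypothetical sketch."""


class ClockMonitor:
    def __init__(self, first_reading, max_delta):
        self.last = first_reading
        self.max_delta = max_delta  # assumed threshold; not given in the entry

    def check(self, reading):
        delta = reading - self.last
        if delta > self.max_delta:
            # The clock made a sudden jump: refuse to continue.
            raise ClockJumpError(f"clock moved by {delta}")
        self.last = reading
        return delta


monitor = ClockMonitor(first_reading=1_000, max_delta=60)
small_step = monitor.check(1_010)   # ordinary advance is accepted
try:
    monitor.check(1_000_000)        # sudden jump is rejected
    jumped = False
except ClockJumpError:
    jumped = True
```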
1169 
1170 163  phx04854 phx05954
1171 Dates in VTOCEs and directories should be corrected by the volume and
1172 hierarchy salvagers.  Dates in the future should be set to the current
1173 time, and dates from before NSS should be set to some early date.  This
1174 situation can arise either from damage, or because the clock was
1175 incorrectly set.  UIDs should also be checked for validity, and reset to
1176 new UIDs (from getuid) if they fall outside the range of acceptable
1177 times.
1178 
1179 161  phx07238
1180 The system should make some attempt to determine whether all the
1181 configured IOMs can access a memory module being added.  This is
1182 probably difficult to do, since it would have to be done by experiment,
1183 which might prove disastrous if the IOM configuration panel were not
1184 set properly.
1185 
1186 157  phx06101
1187 When attempting to append an entry, if the append cannot be performed
1188 because of containing directory ring brackets, the error message should
1189 be Validation level not in ring bracket, rather than Incorrect access to
1190 directory containing entry.
1191 
1192 155  phx06075
1193 When a name on a branch is changed, it should be changed in place, so it
1194 remains in the same place in the list of names, rather than behave as if
1195 it had been deleted and added back.
1196 
1197 145  phx03708
1198 The attach_lv command should accept -a as well as -all.
1199 
1200 142  phx03109
1201 The FIM should distinguish (via different error codes for termination)
1202 between an out-of-bounds on the ring zero stack and one on an outer ring
1203 stack, to aid in identifying situations which cause this particular ring
1204 zero error condition.
1205 
1206 139  phx07240
1207 When there is bad parity in memory, the resulting error messages are
1208 very verbose.  Especially at ESD time, they should simply be flushed.
1209 This requires more specific info about the messages in question to solve
1210 well.
1211 
1212 137  phx08082
1213 The reclassify_sys_seg primitive doesn't work when system_high equals
1214 system_low, because it requires that the segment end up with an access
1215 class greater than that of the containing directory.  This is a
1216 limitation derived from the implementation of multi-class segments,
1217 which are required by various modules of directory control to really be
1218 multi-class.
1219 
1220 135  phx07543
1221 When a directory is deleted from another process, strange things happen
1222 when it is referenced.  Most often, lock takes a fault trying to look at
1223 the UID.  Perhaps it should have a handler for that condition.
1224 
1225 130  phx05245
1226 It is possible for a user's virtual CPU time to become very inaccurate as
1227 the result of a large number of faults, because of the adjustment which
1228 must be applied to compensate for fault processing time.  There is no
1229 real way to fix this.
1230 
1231 121
1232 A crawlout may leave a directory initiated which really should be
1233 terminated, cluttering the KST.
1234 
1235 119
1236 Reference names for inner ring segments can be made available to outer
1237 ring programs; a violation of security.  Not well understood.
1238 
1239 118
1240 copy_on_write makes the copy unencachable until the next setfaults
1241 restores access.  Not well understood.
1242 
1243 114
1244 The messages in the syserr log describing page control errors are
1245 truncated when printed.  This appears to be a problem in the printing
1246 routines, rather than in page_error or the log itself.
1247 
1248 108  phx04071 phx04955
1249 The cleanup handler in an absentee job is never executed if the absentee
1250 terminates by a call to cu_$cl.  This mechanism should be considerably
1251 more robust.
1252 
1253 102  phx03345 phx09268
1254 The fim does not properly handle EIS decimal overflows and underflows,
1255 in that it does not respect the values to be reset, and also does not
1256 reset the IC properly.
1257 
1258 95  phx03943
1259 The machine conditions resulting from inability to add a processor
1260 should be saved somewhere for later analysis.  Presently they are just
1261 discarded by init_processor.
1262 
1263 80  phx03232
1264 The write_limit is reset at each memory reconfiguration, resulting in
1265 the PARM WLIM value apparently being ignored if reconfigurations occur.
1266 Should fix it by having reconfiguration not reset it.
1267 
1268 77  phx11596
1269 The error code from hcs_$fs_move_xxx is not specific enough, partly due
1270 to the lack of a corresponding source/target switch.
1271 
1272 73
1273 Pathnames can be much longer than 168 characters (max is 16*32+1, 513).
1274 This causes problems for all the interfaces which use the standard char
1275 (168) declarations.  Fortunately, find_ can handle it, but many user
1276 ring programs behave inconsistently.  The solution is not easy.
1277 
1278 69  phx03152
1279 The initializer can "find" directories by its linker search rules, due
1280 to the special-casing in access_mode$effective.  This leads to
1281 surprising, though harmless, behaviour.
1282 
1283 68  phx11588
1284 The structure for hcs_$create_branch_ has not kept up to date with file
1285 system changes, and no longer contains all the values which might want
1286 to be set when a branch is created.  It should be upgraded whenever the
1287 file system is changed.
1288 
1289 65
1290 The SST, limited in size to but one segment, cannot be made large
1291 enough to optimally support the largest configurations available today,
1292 and this situation can only get worse.  The fix is to split it up into
1293 several tables, possibly using more than one segment for the AST
1294 itself.  This is very hard.  83-01-18:  well, try this.  Get a pointer
1295 register back by changing all references to sst|foo to sst$foo (use
1296 pr4, that is).  Now, make a wired table of packed pointers to astes.
1297 Interpret the aste threads as indexes into this table.  This costs only
1298 1 word per aste, as opposed to changing all 6 threads to packed
1299 pointers (3 words).  It should just be grunt work to implement.
1300 
1301 60
1302 There is no general mechanism for determining how many pages should be
1303 wired by pmut$wire_and_mask, since error cases (calls to syserr, mainly)
1304 may use up a large amount of stack space not normally required.  This
1305 has been partially fixed by changing syserr to run on the PRDS when
1306 called masked.
1307 
1308 53  phx01533 phx01978
1309 ESD will fail if an MPC is broken.  Multics should be more robust about
1310 dealing with bad hardware, and delete the devices more rapidly.
1311 
1312 32
1313 Many system meters overflow when the system stays up for a long time.
1314 This causes faults in the idle process, and in various places in ring
1315 zero.  This is a catch-all error list entry, to be reserved for the
1316 general solution if we ever invent one.  Other specific entries address
1317 specific instances of the problem.
1318 
1319 22  phx02203
1320 The quota moving primitives sometimes fail to adjust things properly
1321 when working on active directories.  More details are not known at this
1322 time.
1323 
1324 19
1325 If the HC partition on the RPV is not large enough, it may not be
1326 possible to boot with a partial RLV.
1327 
1328 11
1329 A bad error message is provided if process initialization fails; for
1330 instance, if the user has incorrect access to the process overseer.
1331 This is possibly an answering service problem, actually.
1332 
1333 10
1334 The linker and the fim look at instructions in the object segment
1335 itself, rather than in the SCU data.  This is just one more reason why
1336 execute-only code does not work.
1337 
1338 9
1339 The system loops or otherwise misbehaves when the permanent syserr log
1340 is damaged.  (>sc1>perm_syserr_log) This is partly a vfile_ problem in
1341 dealing with trashed keyed vfiles.  Should fix syserr_log_man_ to be
1342 better about dealing with problems in >sc1>perm_syserr_log.  If it has
1343 difficulty, it should rename the old one and create a new one, rather
1344 than simply giving up and not copying the partition.