1 :Info: sort_seg: ss:  1986-01-30  sort_seg, ss
  2 
  3 Syntax as a command:  ss path {-control_args}
  4 
  5 
  6 Function:  orders "sort units" within a segment by comparing contents
  7 of each unit.  A sort unit is composed of one or more contiguous "sort
  8 strings".
  9 
 10 A sort string is either: a character string having a fixed length; or
 11 a character string ending with specific delimiter character(s).
 12 
 13 Several adjacent sort strings may be grouped together to form the sort
 14 units which are reordered.
 15 
 16 Sort units may be compared as a contiguous ASCII character string; or
 17 one or more sort fields may be defined within each sort unit to limit
 18 parts of the sort unit being compared.  Each sort field may be
 19 compared as an integer value, a float decimal value, or as a string of
 20 ASCII characters.
 21 
 22 
 23 Arguments:
 24 path
 25    specifies the pathname of an input segment.  The star convention is
 26    NOT allowed.
 27 
 28 
 29 Control arguments (output file):
 30    Control arguments in this group are mutually exclusive.  If more
 31    than one is given, the last specified overrides the others.  If
 32    neither is given, sort_seg asks if the input segment should be
 33    replaced with its sorted contents.
 34 -output_file path, -of path
 35    places the sorted units in a segment whose pathname is path.  The
 36    equal convention is allowed.
 37 -replace, -rp
 38    replaces the original contents of the input segment with the sorted
 39    units.
 40 
 41 
 42 Control arguments (sort strings):
 43    Control arguments in this group are mutually exclusive.  If more
 44    than one is given, the last specified overrides the others.  If
 45    none are used, the default is to treat each line as the sort string
 46    (use a single newline character as the delimiter).
 47 -delimiter L, -dm L
 48    makes each L characters of the input segment a delimited string
 49    where L is a positive integer.  This essentially divides the input
 50    into character strings of length L.
 51 -delimiter {-string} STR, -dm {-str} STR
 52    uses STR concatenated with a newline character as the string
 53    delimiter.  STR can be any sequence of ASCII characters.  It can
 54    be preceded by -string (-str) to distinguish it from an integer or
 55    a regular expression.
 56 
 57 
 58 -delimiter /REGEXP/, -dm /REGEXP/
 59    uses the regular expression REGEXP as the string delimiter.
 60    Strings to be sorted are delimited by the characters which match the
 61    regular expression.  See the description of regular expressions
 62    under the qedx command.
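
As a rough illustration of how the sort-string control arguments carve
up the input, the Python sketch below splits text either into
fixed-length strings or at a literal delimiter.  The function names are
hypothetical and are not part of sort_seg.  Returning a short final
piece from split_fixed is an assumption, since the text above does not
say what sort_seg does in that case.  Regular-expression delimiters are
left out because qedx expressions differ from Python's.

```python
def split_fixed(text, length):
    """Mimic -delimiter L: cut text into strings of `length` characters."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    return [text[i:i + length] for i in range(0, len(text), length)]


def split_delimited(text, delimiter="\n"):
    """Split after each occurrence of `delimiter`, keeping it attached.

    Returns (sort_strings, trailing), where `trailing` holds any
    characters that follow the final delimiter.
    """
    pieces = text.split(delimiter)
    trailing = pieces.pop()  # "" when the text ends with the delimiter
    return [piece + delimiter for piece in pieces], trailing


# -dm XY uses "XY" concatenated with a newline as the delimiter.
segment = "ABCDEFGHXY\nABCDEFXY\nABCDEFGHIJXY\nABCXY\n"
strings, trailing = split_delimited(segment, "XY\n")
```

With no -delimiter argument the default corresponds to
split_delimited(segment), which treats each line as one sort string.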
 63 
 64 
 65 Control arguments (sort units):
 66 -block N, -bk N
 67    makes the sort unit a block of N sort strings, where N must be a
 68    positive integer.  The default for N is 1.
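
The effect of -block can be pictured with a short Python sketch that
joins every N consecutive sort strings into one sort unit.  The helper
name group_into_blocks is hypothetical, and the strings are shown
without their delimiters.  Letting a final short group form its own
unit is an assumption.

```python
def group_into_blocks(sort_strings, n=1):
    """Join each run of n consecutive sort strings into one sort unit."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return ["".join(sort_strings[i:i + n])
            for i in range(0, len(sort_strings), n)]


strings = ["ABCDEFGH", "ABCDEF", "ABCDEFGHIJ", "ABC"]
units = group_into_blocks(strings, 2)
```

For these four strings and N = 2 the units are ABCDEFGHABCDEF and
ABCDEFGHIJABC.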
 69 
 70 
 71 Control arguments (default field type):
 72    Control arguments in this group are mutually exclusive.  If more
 73    than one is given, the last specified overrides the others.  If
 74    none are given, -character is the default.
 75 -character, -ch
 76    makes the sort based on the character representation of the sort
 77    field.
 78 -integer, -int
 79    makes the sort by converting the sort fields to fixed binary(71,0)
 80    integers when comparing one sort unit to another.  (See the Notes
 81    section below.)
 82 -numeric, -num
 83    makes the sort by converting the sort fields to float decimal(59)
 84    numbers when comparing one sort unit to another.  (See the Notes
 85    section below.)
 86 
 87 
 88 Control arguments (default sort direction):
 89    Control arguments in this group are mutually exclusive.  If more
 90    than one is given, the last specified overrides the others.
 91 -ascending, -asc
 92    makes the sort in ascending order, according to the ASCII collating
 93    sequence.  This is the default mode of operation.
 94 -descending, -dsc
 95    makes the sort in descending order, according to the ASCII collating
 96    sequence.
 97 
 98 
 99 Control arguments (default character comparison):
100    Control arguments in this group are mutually exclusive.  If more
101    than one is given, the last specified overrides the others.
102 -case_sensitive, -cs
103    makes the sort by comparing sort fields without translating letters
104    to lowercase.  This is the default.
105 -non_case_sensitive, -ncs
106    makes the sort by translating letters in the sort fields to
107    lowercase when comparing one sort unit to another.  The actual
108    sorted results remain unchanged.
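
The difference between the two comparison modes can be sketched in
Python, where plain string comparison follows character codes (for
ASCII text, the ASCII collating sequence).  Using str.lower as the
comparison key stands in for -non_case_sensitive; this is an
illustration only, not the sort_seg implementation.

```python
units = ["b", "A", "a", "B"]

# -case_sensitive: all uppercase letters collate before lowercase ones.
case_sensitive = sorted(units)

# -non_case_sensitive: compare lowercase copies; the units themselves
# are returned unchanged, and equal keys keep their input order.
non_case_sensitive = sorted(units, key=str.lower)
```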
109 
110 
111 Control arguments (duplicate sort units):
112 -duplicates, -dup
113    retains duplicate sort units in the sorted results.  This is the
114    default.
115 -only_duplicates, -odup
116    only sort units which occur more than once in the segment appear in
117    the sorted results.  One unit from each set of duplicate sort units
118    is placed in the output segment, in sorted order.
119 
120 
121 -only_duplicate_keys, -odupk
122    only sort units which have duplicate sort fields appear in the
123    sorted results.  All such units having duplicate sort fields are
124    placed in the output segment, since the non-sort field portions of
125    the units may differ.
126 -only_unique, -ouq
127    only sort units which are unique appear in the sorted results.
128    Whenever a set of duplicate units is found, the whole set is
129    removed from the output segment.
130 
131 
132 -only_unique_keys, -ouqk
133    only sort units which have unique sort fields appear in the sorted
134    results.  All units having duplicate sort fields are removed
135    entirely from the output segment.
136 -unique, -uq
137    deletes duplicate sort units from the sorted results.  For each set
138    of duplicate sort units, only the first appears in the sorted
139    results, along with nonduplicate sort units.
140 -unique_keys, -uqk
141    deletes sort units having duplicate sort fields from the sorted
142    results.  For each set of sort units having duplicate fields, only
143    the first appears in the sorted results, along with nonduplicate
144    sort units.
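
The seven duplicate-handling modes can be compared side by side with a
small Python sketch.  It takes (key, unit) pairs that are already in
sorted order, where key is the sort field value and unit is the whole
sort unit.  The function name, the mode strings, and the pair layout
are all hypothetical; exact equality of keys and units is assumed as
the test for "duplicate".

```python
from collections import Counter


def filter_units(pairs, mode="duplicates"):
    """Return the units kept by one duplicate-handling mode."""
    unit_counts = Counter(unit for _, unit in pairs)
    key_counts = Counter(key for key, _ in pairs)
    seen_units, seen_keys = set(), set()
    kept = []
    for key, unit in pairs:
        first_unit = unit not in seen_units
        first_key = key not in seen_keys
        seen_units.add(unit)
        seen_keys.add(key)
        keep = {
            "duplicates": True,
            "only_duplicates": unit_counts[unit] > 1 and first_unit,
            "only_duplicate_keys": key_counts[key] > 1,
            "only_unique": unit_counts[unit] == 1,
            "only_unique_keys": key_counts[key] == 1,
            "unique": first_unit,
            "unique_keys": first_key,
        }[mode]
        if keep:
            kept.append(unit)
    return kept


# Four units that share the key "ABC" (as with -field 1 3).
pairs = [("ABC", "ABCDEFGH"), ("ABC", "ABCDEF"),
         ("ABC", "ABCDEFGHIJ"), ("ABC", "ABC")]
first_only = filter_units(pairs, "unique_keys")
```

Here first_only holds just ABCDEFGH, because every unit has the same
key and only the first one is retained.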
145 
146 
147 Control arguments (sort fields):
148    Choose either -all, or one or more -field specifications.
149 -all, -a
150    makes the primary (and only) sort field the entire sort unit; i.e.,
151    the entire sort unit is considered when sorting.  This is the
152    default mode of operation.
153 -field FIELD_START FIELD_LENGTH {-sort_controls},
154 -fl FIELD_START FIELD_LENGTH {-sort_controls}
155    defines a part within each sort unit that is compared with other
156    sort units during the sort operation.  -field may be
157    given several times.  The first -field control defines the primary
158    sort field; the second defines a secondary sort field (used to
159    distinguish sort units having equal primary sort field values), etc.
160    Sections below give details for FIELD_START, FIELD_LENGTH, and
161    the optional sort controls.
162 
163 
164 List of field_start formats:
165    The sort field start location may be specified in one of the
166    following formats:
167 S
168    a positive integer, giving the character position of the start of
169    the field in the sort unit (e.g., 1 if the field begins at the first
170    character).  If the sort unit contains fewer than S characters, then
171    the unit is sorted as if space characters appeared in the sort
172    field.
173 -from S, -fm S
174    where S is a positive integer giving the character position of the
175    start of the field in the sort unit.
176 
177 
178 -from STR, -fm STR
179    where STR is a character string which identifies the beginning of
180    the sort field.  The field begins with the first character of the
181    sort unit which follows STR.  If STR does not appear in the sort
182    unit, then the unit is sorted as if the sort field contains space
183    characters.
184 -from /REGEXP/, -fm /REGEXP/
185    where REGEXP is a regular expression which identifies the beginning
186    of the sort field.  The field begins with the first character of the
187    sort unit which follows the part of the sort unit matching REGEXP.
188    See the writeup of the qedx command for the definition of regular
189    expressions.  If no match for REGEXP is found in the sort unit, then
190    the unit is sorted as if the sort field contains space characters.
191 
192 
193 -from -string STR, -fm -str STR
194    treats STR as a character string which identifies the beginning of
195    the sort field, even though STR may look like an integer or a
196    regular expression.  For example,
197 
198       -from -string 25
199 
200    identifies a sort field which begins with the character following
201    "25" in the sort unit.
202 
203 
204 List of field_length formats:
205    The sort field length may be specified in one of the following ways.
206 L
207    a positive integer, giving the length of the sort field in
208    characters.  If the sort unit is too short to hold a sort field of L
209    characters (that is, if the number of characters from the first
210    character of the sort field to the end of the sort unit is less than
211    L), then the unit is sorted as if the field were extended on the
212    right with space characters to a length of L characters.
213    Alternatively, L can be -1 to indicate that the remainder of the sort
214    unit is to be used as the sort field.
215 -for L
216    where L is a positive integer giving the length of the sort field in
217    characters, or -1 to use the remainder of the sort unit as the sort
218    field.
219 
220 
221 -to E
222    where E is a positive integer giving the character position of the
223    end of the sort field in the sort unit (e.g., 5 if the field stops
224    after the fifth character of the sort unit).  If the sort unit
225    contains fewer than E characters, then the unit is sorted as if
226    space characters were added on the right to extend the unit to E
227    characters.
228 -to STR
229    where STR is a character string which identifies the end of the sort
230    field.  The field ends with the first character of the sort unit
231    preceding STR.  If STR does not appear in the sort unit after the
232    starting position of the sort field, then the unit is sorted as if
233    space characters appeared in the sort field.
234 
235 
236 -to /REGEXP/
237    where REGEXP is a regular expression which identifies the end of the
238    sort field.  The field ends with the first character of the sort
239    unit which precedes the part of the sort unit matching REGEXP.  See
240    the writeup of the qedx command for the definition of regular
241    expressions.  If no match for REGEXP is found in the sort unit after
242    the starting position of the sort field, then the unit is sorted as
243    if space characters appeared in the sort field.
244 -to -string STR
245    treats STR as a character string which identifies the end of the
246    sort field, even though STR may look like an integer or a regular
247    expression.
248 
249 
250 Notes on field_length format:
251 When -to is used to specify the end of the field, then sort_seg
252 examines all sort units to determine the length of the longest
253 instance of this sort field in any sort unit.  It then sorts units as
254 if the sort field in each unit were extended on the right with space
255 characters to the length of the longest sort field instance.
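
The numeric start and length formats can be sketched in Python as a
slice followed by space padding.  The helper names below are
hypothetical; only the S, L, and -to E forms are covered, and the
string and regular-expression forms are left out.

```python
def extract_field(unit, start, length):
    """Return the sort field at 1-based position `start`.

    A length of -1 takes the rest of the unit.  Otherwise the field is
    padded on the right with spaces to exactly `length` characters.
    """
    begin = start - 1
    if length == -1:
        return unit[begin:]
    return unit[begin:begin + length].ljust(length)


def extract_field_to(unit, start, end):
    """Same field, given as start and end positions (both 1-based)."""
    return extract_field(unit, start, end - start + 1)


full = extract_field("ABCDEFGHXY", 6, 4)   # field lies inside the unit
short = extract_field("ABCDEFXY", 6, 4)    # unit ends early, so padded
empty = extract_field("ABCXY", 6, 4)       # start is past the end
```

For -field 6 4 these give FGHX, FXY followed by one space, and four
spaces, matching the corresponding row in the Examples section.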
256 
257 
258 List of sort_controls:
259    At most one sort control may be chosen from each of the following
260    sets of arguments.  If a sort control is omitted, the default is
261    set by the corresponding control argument (-ascending or
262    -descending, -case_sensitive or -non_case_sensitive, -character or
263    -integer or -numeric).
264 ascending, asc
265    sort units with this field in ascending order.  This sort control is
266    incompatible with descending.
267 descending, dsc
268    sort units with this field in descending order.  This sort control
269    is incompatible with ascending.
270 
271 
272 non_case_sensitive, ncs
273    sort units by translating this field to lowercase.  This sort
274    control is incompatible with case_sensitive.
275 case_sensitive, cs
276    sort units by treating uppercase letters in this field as being
277    different from lowercase letters.  This sort control is incompatible
278    with non_case_sensitive.
279 
280 
281 character, ch
282    sort units with this field by the character representation.  This
283    sort control is incompatible with integer or numeric.
284 integer, int
285    sort units with this field by converting the character
286    representation to its integer value (fixed binary(71,0)).  This sort
287    control is incompatible with character or numeric.
288 numeric, num
289    sort units with this field by converting the character
290    representation to its numeric value (float decimal(59)).  This sort
291    control is incompatible with character or integer.
292 
293 
294 Notes:
295 The input segment is sorted using the following procedure.  Using the
296 control arguments, the segment is broken down into separate sort
297 units, which are strings or blocks of strings.  A string can be
298 composed of one or more lines.  These sort units are then ordered.
299 The re-ordered units either replace the original segment or are stored
300 in a new segment.
301 
302 
303 If the sort_seg command is invoked without any control_args, the
304 -replace, -ascending, -all, -character, and -delimiter control
305 arguments are assumed, and the default delimiter of a newline character
306 is used.  That is, the sort_seg command, when invoked with path as the
307 only argument, sorts the lines of that segment as character strings in
308 ascending ASCII collating sequence, replacing the original segment with
309 the sorted result.  As a safety measure, the following question is
310 asked when neither -replace nor -output_file is specified:
311 
312    Do you really want to sort the contents of PATH?
313 
314 This helps avoid accidental sorting of segments.
315 
316 
317 The start position of a sort field is calculated relative to the
318 beginning of a sort unit.  If the blocking factor is N = 1, the start
319 position is calculated relative to the beginning of a string.  If the
320 blocking factor is N > 1, the start position is calculated relative to
321 the beginning of the first string of a block.  When calculating field
322 specifications within a sort unit of N > 1 strings (blocking factor
323 N > 1), string delimiters internal to the sort unit should not be
324 considered.  (See the Examples section below.)
325 
326 
327 Sort fields/units of unequal length are compared by assuming the
328 shorter field/unit to be padded on the right with space characters,
329 immediately following the rightmost character.  If a field/unit
330 contains non-printing graphic characters (such as BS, HT, NL, VT, FF,
331 CR, etc.)  which precede the space character in ASCII collating
332 sequence, they will be sorted accordingly, with sometimes unexpected
333 results.  The string delimiter is never considered when padding. (See
334 the Examples section below.)
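
The padding rule and its surprising effect on control characters can
be seen in a short Python sketch.  Padding every unit to the length of
the longest one is used here as a stand-in for pairwise padding; the
function name sort_padded is hypothetical.

```python
def sort_padded(units):
    """Sort units as if shorter ones were padded on the right with spaces."""
    width = max(map(len, units), default=0)
    return sorted(units, key=lambda unit: unit.ljust(width))


plain = sort_padded(["ABCDEFGH", "ABCDEF", "ABCDEFGHIJ", "ABC"])

# HT (octal 011) collates before the space character (octal 040), so a
# unit ending in a tab sorts ahead of the same unit without it.
with_tab = sort_padded(["ABC", "ABC\t"])
```

In with_tab the unit ending in HT comes first, whereas a comparison
without padding would place the shorter unit ABC first.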
335 
336 
337 The numeric sort mode converts the sort field character string to a
338 float decimal(59) value for sorting purposes.  Similarly, the integer
339 sort mode converts the sort field character string to a fixed bin(71,0)
340 value.  The character string representation must be acceptable to the
341 PL/I or Fortran language conversion rules.  The actual sort field
342 remains unchanged in the sorted results.
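
The effect of the conversion modes on ordering can be sketched in
Python.  Here int and a 59-digit decimal context stand in for the
fixed bin(71,0) and float decimal(59) conversions; Python's parsing
rules are not the PL/I rules, and the fixed bin(71,0) range is not
enforced, so this is an approximation only.  The key function names
are hypothetical.

```python
from decimal import Context

DEC59 = Context(prec=59)  # 59 significant decimal digits


def integer_key(field):
    """Compare a field by its integer value."""
    return int(field.strip())


def numeric_key(field):
    """Compare a field by its decimal floating-point value."""
    return DEC59.create_decimal(field.strip())


fields = ["10", "9", "100"]
as_characters = sorted(fields)                 # character comparison
as_integers = sorted(fields, key=integer_key)  # value comparison
as_numbers = sorted(["1.5e3", "20", "3.25"], key=numeric_key)
```

Only the comparison uses the converted values; the strings returned
are the original fields, just as the sort field itself is left
unchanged in the sorted results.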
343 
344 
345 If characters are detected in the input segment following the final
346 delimited sort unit, they are ignored for the purposes of sorting, but
347 appear in the sorted output immediately following the final delimited
348 sort unit.  An error message specifies the location of the first
349 nondelimited character.
350 
351 
352 A maximum of 261,119 units can be sorted.  The sort is stable, i.e.,
353 duplicate units appear in the same order in the sorted segment as in
354 the original segment.
355 
356 
357 The input segment is sorted using temporary segments in the process
358 directory.  If the -output_file control argument is specified, and path
359 is the pathname of an already existing segment, its contents are
360 destroyed upon beginning the sort.  If the sorted results are to
361 replace the original contents of the input file, that replacement does
362 not occur until the last possible moment.
363 
364 
365 The -unique control argument deletes duplicate sort units from the
366 sorted results.  The determination of whether or not a sort unit is to
367 be deleted is independent of sort field specifications; i.e., given a
368 number of nonidentical sort units that contain identical sort fields,
369 all the units do appear in the sorted results.
370 
371 
372 Examples:
373 Suppose a segment contains the following lines (where nl represents the
374 ASCII newline character):
375    ABCDEFGHXYnl
376    ABCDEFXYnl
377    ABCDEFGHIJXYnl
378    ABCXYnl
379 
380 The display below shows how the sort_seg command sorts the contents of
381 this segment, according to the arguments specified in the first column
382 (nl stands for the ASCII newline character and # stands for the ASCII
383 space character).
384 
385 
386      these    |   define these   |   sorted on    |    giving
387    arguments  |    sort units    |  these fields  | these results
388 --------------|------------------|----------------|--------------
389 -dm XY        |ABCDEFGH          |ABCDEFGH##      |ABCXYnl
390               |ABCDEF            |ABCDEF####      |ABCDEFXYnl
391               |ABCDEFGHIJ        |ABCDEFGHIJ      |ABCDEFGHXYnl
392               |ABC               |ABC#######      |ABCDEFGHIJXYnl
393 --------------|------------------|----------------|--------------
394 -bk 2         |ABCDEFGHABCDEF    |ABCDEFGHABCDEF  |ABCDEFGHXYnl
395 -dm XY        |ABCDEFGHIJABC     |ABCDEFGHIJABC#  |ABCDEFXYnl
396               |                  |                |ABCDEFGHIJXYnl
397               |                  |                |ABCXYnl
398 --------------|------------------|----------------|--------------
399 -fl 6 4       |ABCDEFGHXY        |FGHX            |ABCXYnl
400               |ABCDEFXY          |FXY#            |ABCDEFGHIJXYnl
401               |ABCDEFGHIJXY      |FGHI            |ABCDEFGHXYnl
402               |ABCXY             |####            |ABCDEFXYnl
403 --------------|------------------|----------------|--------------
404               |                  |first   second  |
405 -fl 1 4 7 2   |ABCDEFGHXY        |ABCD    GH      |ABCDEFGHXYnl
406               |ABCDEFXY          |ABCD    XY      |ABCDEFGHIJXYnl
407               |ABCDEFGHIJXY      |ABCD    GH      |ABCDEFXYnl
408               |ABCXY             |ABCX    ##      |ABCXYnl
409 --------------|------------------|----------------|--------------
410 -dm Y         |ABCDEFGHXABCDEFX  |FGHX    DE      |ABCDEFGHIJXYnl
411 -bk 2         |ABCDEFGHIJXABCX   |FGHI    DE      |ABCXYnl
412 -fl 6 4 4 2   |                  |                |
413               |                  |                |ABCDEFGHXYnl
414               |                  |                |ABCDEFXYnl
415 --------------|------------------|----------------|--------------
416               |                  |first   second  |
417 -fl 6 4 dsc   |ABCDEFGHXY        |FGHX    CDE     |ABCDEFXYnl
418  3 3 asc      |ABCDEFXY          |FXY#    CDE     |ABCDEFGHXYnl
419               |ABCDEFGHIJXY      |FGHI    CDE     |ABCDEFGHIJXYnl
420               |ABCXY             |####    CXY     |ABCXYnl
421 --------------|------------------|----------------|--------------
422 -fl 1 3       |ABCDEFGH          |ABC             |ABCDEFGHXYnl
423 -unique_keys  |ABCDEF            |ABC             |
424 -dm XY        |ABCDEFGHIJ        |ABC             |
425               |ABC               |ABC             |
426 --------------|------------------|----------------|--------------
427               |                  |first   second  |
428 -fl 1 3 5 2   |ABCDEFGH          |ABC     EF      |ABCDEFGHXYnl
429 -odupk        |ABCDEF            |ABC     EF      |ABCDEFXYnl
430 -dm XY        |ABCDEFGHIJ        |ABC     EF      |ABCDEFGHIJXYnl
431               |ABC               |ABC     ##      |
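
Two of the multiple-field rows above can be reproduced with a Python
sketch that applies one stable sort per field, starting with the least
significant field.  This is one common way to obtain a multi-key
ordering with mixed directions and is not claimed to be how sort_seg
works internally.  The function names and the (start, length,
descending) tuple layout are hypothetical, and the units are written
without their trailing newline.

```python
def extract_field(unit, start, length):
    """Field at 1-based `start`, space-padded to `length` characters."""
    begin = start - 1
    return unit[begin:begin + length].ljust(length)


def sort_by_fields(units, fields):
    """Sort by (start, length, descending) fields, primary field first."""
    result = list(units)
    # Python's sort is stable, even with reverse=True, so sorting on
    # the last field first leaves the primary field as the main order.
    for start, length, descending in reversed(fields):
        result.sort(key=lambda unit: extract_field(unit, start, length),
                    reverse=descending)
    return result


lines = ["ABCDEFGHXY", "ABCDEFXY", "ABCDEFGHIJXY", "ABCXY"]

# -fl 1 4 7 2
two_fields = sort_by_fields(lines, [(1, 4, False), (7, 2, False)])

# -fl 6 4 dsc 3 3 asc
mixed = sort_by_fields(lines, [(6, 4, True), (3, 3, False)])
```

Both results agree with the "giving these results" column of the
corresponding rows.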
432 
433 
434 :hcom:
435 
436 
437 
438 /****^  HISTORY COMMENTS:
439   1) change(2020-12-16,GDixon), approve(2021-02-22,MCR10088),
440      audit(2021-05-27,Swenson), install(2021-05-27,MR12.6g-0056):
441       A) Change "Syntax:" to "Syntax as a command:" in command info seg
442          last changed after 1984.
443       B) Correct problems found by verify_info.
444       C) Reorganize control arguments into separate sections by
445          category.  Explain defaults and limits at the start of each
446          control argument section.
447                                                    END HISTORY COMMENTS */
448 
449 
450