1 :Info: sort_seg: ss: 1986-01-30 sort_seg, ss
2
3 Syntax as a command: ss path {-control_args}
4
5
6 Function: orders "sort units" within a segment by comparing contents
7 of each unit. A sort unit is composed of one or more contiguous "sort
8 strings".
9
10 A sort string is either: a character string having a fixed length; or
11 a character string ending with specific delimiter characters.
12
13 Several adjacent sort strings may be grouped together to form the sort
14 units which are reordered.
15
16 Sort units may be compared as a contiguous ASCII character string; or
17 one or more sort fields may be defined within each sort unit to limit
18 parts of the sort unit being compared. Each sort field may be
19 compared as an integer value, a float decimal value, or as a string of
20 ASCII characters.
21
22
23 Arguments:
24 path
25 specifies the pathname of an input segment. The star convention is
26 NOT allowed.
27
28
29 Control arguments (output file):
30 Control arguments in this group are mutually exclusive. If more
31 than one is given, the last specified overrides the others. If
32 neither is given, sort_seg asks if the input segment should be
33 replaced with its sorted contents.
34 -output_file path, -of path
35 places the sorted units in a segment whose pathname is path. The
36 equal convention is allowed.
37 -replace, -rp
38 replaces the original contents of the input segment with the sorted
39 units.
40
41
42 Control arguments (sort strings):
43 Control arguments in this group are mutually exclusive. If more
44 than one is given, the last specified overrides the others. If
45 none are used, the default is to treat each line as a sort string
46 (i.e., to use a single newline character as the delimiter).
47 -delimiter L, -dm L
48 makes each L characters of the input segment a delimited string
49 where L is a positive integer. This essentially divides the input
50 into character strings of length L.
51 -delimiter -string STR, -dm -str STR
52 uses STR concatenated with a newline character as the string
53 delimiter. The character string STR can be any sequence of ASCII
54 characters. It can be preceded by -string (-str) to distinguish it
55 from an integer or a regular expression.
56
57
58 -delimiter /REGEXP/, -dm /REGEXP/
59 uses the regular expression REGEXP as the string delimiter.
60 Strings to be sorted are delimited by the characters which match the
61 regular expression. See the description of regular expressions
62 under the qedx command.
63
64
65 Control arguments (sort units):
66 -block N, -bk N
67 makes the sort unit a block of N sort strings, where N must be a
68 positive integer. The default for N is 1.
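
The way delimiters and blocking combine to form sort units can be sketched in Python. This is only an illustration under stated assumptions, not the sort_seg implementation: the function names are hypothetical, Python's re syntax stands in for qedx regular expressions, and a regular expression delimiter is assumed never to match the empty string. For the -delimiter STR form, the caller passes STR followed by a newline.

```python
import re

def split_sort_strings(text, delimiter="\n"):
    """Return ([(body, delim), ...], leftover) for one input segment.

    delimiter is a positive int (fixed length), a literal str, or a
    compiled pattern. leftover holds characters after the last full string.
    """
    if isinstance(delimiter, int):
        pieces = [(text[i:i + delimiter], "")
                  for i in range(0, len(text) - delimiter + 1, delimiter)]
        return pieces, text[len(pieces) * delimiter:]
    if isinstance(delimiter, str):
        delimiter = re.compile(re.escape(delimiter))
    pieces, start = [], 0
    for match in delimiter.finditer(text):
        pieces.append((text[start:match.start()], match.group()))
        start = match.end()
    return pieces, text[start:]

def group_blocks(pieces, n=1):
    """Group every n consecutive sort strings into one sort unit."""
    return [pieces[i:i + n] for i in range(0, len(pieces), n)]

def unit_key(unit):
    """Text compared during the sort: string bodies without delimiters."""
    return "".join(body for body, _ in unit)

def unit_text(unit):
    """Text written to the output: bodies with their delimiters restored."""
    return "".join(body + delim for body, delim in unit)
```

Keeping each delimiter next to its body lets the comparison ignore delimiters while the output still reproduces every sort unit exactly.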
69
70
71 Control arguments (default field type):
72 Control arguments in this group are mutually exclusive. If more
73 than one is given, the last specified overrides the others. If
74 none are given, -character is the default.
75 -character, -ch
76 makes the sort based on the character representation of the sort
77 field.
78 -integer, -int
79 makes the sort by converting the sort fields to fixed binary(71,0)
80 integers when comparing one sort unit to another. See the Notes
81 section below.
82 -numeric, -num
83 makes the sort by converting the sort fields to float decimal(59)
84 numbers when comparing one sort unit to another. See the Notes
85 section below.
86
87
88 Control arguments (default sort direction):
89 Control arguments in this group are mutually exclusive. If more
90 than one is given, the last specified overrides the others.
91 -ascending, -asc
92 makes the sort in ascending order, according to the ASCII collating
93 sequence. This is the default mode of operation.
94 -descending, -dsc
95 makes the sort in descending order, according to the ASCII collating
96 sequence.
97
98
99 Control arguments (default character comparison):
100 Control arguments in this group are mutually exclusive. If more
101 than one is given, the last specified overrides the others.
102 -case_sensitive, -cs
103 makes the sort by comparing sort fields without translating letters
104 to lowercase. This is the default.
105 -non_case_sensitive, -ncs
106 makes the sort by translating letters in the sort fields to
107 lowercase when comparing one sort unit to another. The actual
108 sorted results remain unchanged.
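
As a rough Python analogy (not sort_seg itself), a non-case-sensitive sort lowercases only the comparison key, so the units written to the output keep their original capitalization. The sample units are invented for illustration.

```python
units = ["banana\n", "Cherry\n", "apple\n"]

# Case-sensitive: uppercase letters precede lowercase in ASCII.
case_sensitive = sorted(units)

# Non-case-sensitive: compare lowercased copies, emit the originals.
non_case_sensitive = sorted(units, key=str.lower)
```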
109
110
111 Control arguments (duplicate sort units):
112 -duplicates, -dup
113 retains duplicate sort units in the sorted results. This is the
114 default.
115 -only_duplicates, -odup
116 only sort units which occur more than once in the segment appear in
117 the sorted results. One unit from each set of duplicate sort units
118 is placed in the output segment, in sorted order.
119
120
121 -only_duplicate_keys, -odupk
122 only sort units which have duplicate sort fields appear in the
123 sorted results. All such units having duplicate sort fields are
124 placed in the output segment, since the non-sort field portions of
125 the units may differ.
126 -only_unique, -ouq
127 only sort units which are unique appear in the sorted results.
128 Whenever a set of duplicate units is found, they are removed
129 entirely from the output segment.
130
131
132 -only_unique_keys, -ouqk
133 only sort units which have unique sort fields appear in the sorted
134 results. All units having duplicate sort fields are removed
135 entirely from the output segment.
136 -unique, -uq
137 deletes duplicate sort units from the sorted results. For each set
138 of duplicate sort units, only the first appears in the sorted
139 results, along with nonduplicate sort units.
140 -unique_keys, -uqk
141 deletes sort units having duplicate sort fields from the sorted
142 results. For each set of sort units having duplicate fields, only
143 the first appears in the sorted results, along with nonduplicate
144 sort units.
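
The differences among these control arguments can be sketched as one Python filter applied to units that are already sorted. The function and mode names are hypothetical, and duplicates are assumed to be detected by exact equality of the whole unit (or of the value returned by key for the *_keys variants). The -unique_keys and -only_unique_keys arguments correspond to the "unique" and "only_unique" modes with a key function that extracts the sort fields.

```python
from collections import Counter

def filter_units(units, mode, key=lambda unit: unit):
    """Filter sorted units; key selects what counts as a duplicate."""
    counts = Counter(key(u) for u in units)
    seen = set()
    kept = []
    for u in units:
        k = key(u)
        first = k not in seen
        seen.add(k)
        if mode == "duplicates":             # keep everything
            keep = True
        elif mode == "unique":               # first of each set
            keep = first
        elif mode == "only_duplicates":      # one unit per duplicated set
            keep = first and counts[k] > 1
        elif mode == "only_duplicate_keys":  # every unit of a duplicated set
            keep = counts[k] > 1
        elif mode == "only_unique":          # units with no duplicate at all
            keep = counts[k] == 1
        else:
            raise ValueError(f"unknown mode: {mode}")
        if keep:
            kept.append(u)
    return kept
```

Note the asymmetry the text describes: "only_duplicates" keeps one representative, while "only_duplicate_keys" keeps every unit because the parts outside the sort fields may differ.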
145
146
147 Control arguments (sort fields):
148 Choose either -all, or one or more -field specifications.
149 -all, -a
150 makes the primary and only sort field the entire sort unit; i.e.,
151 the entire sort unit is considered when sorting. This is the
152 default mode of operation.
153 -field FIELD_START FIELD_LENGTH -sort_controls,
154 -fl FIELD_START FIELD_LENGTH -sort_controls
155 defines a part within each sort unit that is compared with other
156 sort units during the sort operation. -field may be
157 given several times. The first -field control defines the primary
158 sort field; the second defines a secondary sort field used to
159 distinguish sort units having equal primary sort field values, etc.
160 Sections below give details for FIELD_START, FIELD_LENGTH, and
161 optional sort control values.
162
163
164 List of field_start formats:
165 The sort field start location may be specified in one of the
166 following formats:
167 S
168 a positive integer, giving the character position of the start of
169 the field in the sort unit (e.g., 1 if the field begins at the first
170 character). If the sort unit contains fewer than S characters, then
171 the unit is sorted as if space characters appeared in the sort
172 field.
173 -from S, -fm S
174 where S is a positive integer giving the character position of the
175 start of the field in the sort unit.
176
177
178 -from STR, -fm STR
179 where STR is a character string which identifies the beginning of
180 the sort field. The field begins with the first character of the
181 sort unit which follows STR. If STR does not appear in the sort
182 unit, then the unit is sorted as if the sort field contains space
183 characters.
184 -from /REGEXP/, -fm /REGEXP/
185 where REGEXP is a regular expression which identifies the beginning
186 of the sort field. The field begins with the first character of the
187 sort unit which follows the part of the sort unit matching REGEXP.
188 See the writeup of the qedx command for the definition of regular
189 expressions. If no match for REGEXP is found in the sort unit, then
190 the unit is sorted as if the sort field contains space characters.
191
192
193 -from -string STR, -fm -str STR
194 treats STR as a character string which identifies the beginning of
195 the sort field, even though STR may look like an integer or a
196 regular expression. For example,
197
198 -from -string 25
199
200 identifies a sort field which begins with the character following
201 "25" in the sort unit.
202
203
204 List of field_length formats:
205 The sort field length may be specified in one of the following ways.
206 L
207 a positive integer, giving the length of the sort field in
208 characters. If the sort unit is too short to hold a sort field of L
209 characters (that is, if the number of characters from the first
210 character of the sort field to the end of the sort unit is less than
211 L), then the unit is sorted as if the field were extended on the
212 right with space characters to a length of L characters.
213 Alternatively, L can be -1 to indicate that the remainder of the sort
214 unit is to be used as the sort field.
215 -for L
216 where L is a positive integer giving the length of the sort field in
217 characters, or -1 to use the remainder of the sort unit as the sort
218 field.
219
220
221 -to E
222 where E is a positive integer giving the character position of the
223 end of the sort field in the sort unit (e.g., 5 if the field stops
224 after the fifth character of the sort unit). If the sort unit
225 contains fewer than E characters, then the unit is sorted as if
226 space characters were added on the right to extend the unit to E
227 characters.
228 -to STR
229 where STR is a character string which identifies the end of the sort
230 field. The field ends with the first character of the sort unit
231 preceding STR. If STR does not appear in the sort unit after the
232 starting position of the sort field, then the unit is sorted as if
233 space characters appeared in the sort field.
234
235
236 -to /REGEXP/
237 where REGEXP is a regular expression which identifies the end of the
238 sort field. The field ends with the first character of the sort
239 unit which precedes the part of the sort unit matching REGEXP. See
240 the writeup of the qedx command for the definition of regular
241 expressions. If no match for REGEXP is found in the sort unit after
242 the starting position of the sort field, then the unit is sorted as
243 if space characters appeared in the sort field.
244 -to -string STR
245 treats STR as a character string which identifies the end of the
246 sort field, even though STR may look like an integer or a regular
247 expression.
248
249
250 Notes on field_length format:
251 When -to is used to specify the end of the field, then sort_seg
252 examines all sort units to determine the length of the longest
253 instance of this sort field in any sort unit. It then sorts units as
254 if the sort field in each unit were extended on the right with space
255 characters to the length of the longest sort field instance.
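
The padding rules for field lengths can be sketched in Python as follows. Both function names are hypothetical and the code only mirrors the rules stated above: a fixed length L pads short fields with spaces, -1 takes the rest of the unit, and a -to STR field is padded to the longest instance found in any unit.

```python
def field_by_length(unit, start, length):
    """Field of `length` characters from 1-based `start`; -1 means the rest."""
    tail = unit[start - 1:]
    if length == -1:
        return tail
    return tail[:length].ljust(length)

def fields_to_string(units, start, stop):
    """Fields ending just before `stop`, padded to the longest instance."""
    raw = []
    for unit in units:
        tail = unit[start - 1:]
        end = tail.find(stop)
        # A unit without `stop` sorts as if its field held only spaces.
        raw.append(tail[:end] if end >= 0 else "")
    width = max((len(f) for f in raw), default=0)
    return [f.ljust(width) for f in raw]
```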
256
257
258 List of sort_controls:
259 The sort controls may be one from each of the following sets of
260 arguments. If no sort control is given, then the default is
261 specified by the corresponding control argument (-ascending or
262 -descending; -case_sensitive or -non_case_sensitive; -character,
263 -integer, or -numeric).
264 ascending, asc
265 sort units with this field in ascending order. This sort control is
266 incompatible with descending.
267 descending, dsc
268 sort units with this field in descending order. This sort control
269 is incompatible with ascending.
270
271
272 non_case_sensitive, ncs
273 sort units by translating this field to lowercase. This sort
274 control is incompatible with case_sensitive.
275 case_sensitive, cs
276 sort units by treating uppercase letters in this field as being
277 different from lowercase letters. This sort control is incompatible
278 with non_case_sensitive.
279
280
281 character, ch
282 sort units with this field by the character representation. This
283 sort control is incompatible with integer or numeric.
284 integer, int
285 sort units with this field by converting the character
286 representation to its integer value (fixed binary(71,0)). This sort
287 control is incompatible with character or numeric.
288 numeric, num
289 sort units with this field by converting the character
290 representation to its numeric value (float decimal(59)). This sort
291 control is incompatible with character or integer.
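
One way to picture several -field specifications with per-field directions is a series of stable sorts applied from the least significant field to the primary field. The Python sketch below is an analogy only: sort_units is a hypothetical name, only character fields with numeric start and length are handled, and units that tie on a descending field are assumed to keep their input order, which is how Python behaves.

```python
def sort_units(units, fields):
    """fields: list of (start, length, descending), primary field first."""
    ordered = list(units)
    # Sort by the last field first; each later stable sort preserves the
    # order established by the less significant fields among equal keys.
    for start, length, descending in reversed(fields):
        ordered.sort(
            key=lambda u: u[start - 1:start - 1 + length].ljust(length),
            reverse=descending,
        )
    return ordered

units = ["ABCDEFGHXY", "ABCDEFXY", "ABCDEFGHIJXY", "ABCXY"]
# Analogous to: -fl 6 4 dsc 3 3 asc
result = sort_units(units, [(6, 4, True), (3, 3, False)])
```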
292
293
294 Notes:
295 The input segment is sorted using the following procedure. Using the
296 control arguments, the segment is broken down into separate sort
297 units, which are strings or blocks of strings. A string can be
298 composed of one or more lines. These sort units are then ordered.
299 The re-ordered units either replace the original segment or are stored
300 in a new segment.
301
302
303 If the sort_seg command is invoked without any control_args, the
304 -replace, -ascending, -all, -character, and -delimiter control
305 arguments are assumed, and the default delimiter of a newline character
306 is used. That is, the sort_seg command, when invoked with path as the
307 only argument, sorts the lines of that segment as character strings in
308 ascending ASCII collating sequence, replacing the original segment with
309 the sorted result. As a safety measure, the following question is
310 asked when -replace is not specified:
311
312 Do you really want to sort the contents of PATH?
313
314 This helps avoid accidental sorting of segments.
315
316
317 The start position of a sort field is calculated relative to the
318 beginning of a sort unit. If the blocking factor is N = 1, the start
319 position is calculated relative to the beginning of a string. If the
320 blocking factor is N > 1, the start position is calculated relative to
321 the beginning of the first string of a block. When calculating field
322 specifications within a sort unit of N > 1 strings (blocking factor
323 N > 1), string delimiters internal to the sort unit should not be
324 considered. See the Examples section below.
325
326
327 Sort fields/units of unequal length are compared by assuming the
328 shorter field/unit to be padded on the right with space characters,
329 immediately following the rightmost character. If a field/unit
330 contains non-printing graphic characters (such as BS, HT, NL, VT, FF,
331 CR, etc.) which precede the space character in the ASCII collating
332 sequence, they will be sorted accordingly, with sometimes unexpected
333 results. The string delimiter is never considered when padding. See
334 the Examples section below.
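
The effect of space padding on units that contain control characters can be seen in a short Python sketch. The helper name sort_padded and the sample units are invented for illustration. Because a horizontal tab precedes the space character in ASCII, a unit containing a tab sorts ahead of a shorter unit that is padded with spaces.

```python
def sort_padded(units):
    """Sort as if shorter units were padded on the right with spaces."""
    width = max((len(u) for u in units), default=0)
    return sorted(units, key=lambda u: u.ljust(width))

units = ["AB", "AB\tC", "AB C"]
padded_order = sort_padded(units)  # "AB" is compared as "AB  "
plain_order = sorted(units)        # ordinary comparison, no padding
```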
335
336
337 The numeric sort mode converts the sort field character string to a
338 float decimal(59) value for sorting purposes. Similarly, the integer
339 sort mode converts the sort field character string to a fixed bin(71,0)
340 value. The character string representation must be acceptable to the
341 PL/I or Fortran language conversion rules. The actual sort field
342 remains unchanged in the sorted results.
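
The difference between the character, integer, and numeric modes can be illustrated in Python. This is an analogy, not the PL/I conversion sort_seg performs: Python's int and decimal.Decimal have their own parsing rules and precision, and the sample units and the field helper are invented for illustration.

```python
from decimal import Decimal

units = ["10  apples", "9   pears", "100 plums"]

def field(unit):
    """Characters 1 through 3 of the unit, as with -field 1 3."""
    return unit[0:3]

by_character = sorted(units, key=field)
by_integer = sorted(units, key=lambda u: int(field(u).strip()))
by_numeric = sorted(units, key=lambda u: Decimal(field(u).strip()))
```

In every mode the units themselves are emitted unchanged; only the comparison value differs.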
343
344
345 If characters are detected in the input segment following the final
346 delimited sort unit, they are ignored for the purposes of sorting, but
347 appear in the sorted output immediately following the final delimited
348 sort unit. An error message specifies the location of the first
349 nondelimited character.
350
351
352 A maximum of 261,119 units can be sorted. The sort is stable, i.e.,
353 duplicate units appear in the same order in the sorted segment as in
354 the original segment.
355
356
357 The input segment is sorted using temporary segments in the process
358 directory. If the -output_file control argument is specified, and path
359 is the pathname of an already existing segment, its contents are
360 destroyed upon beginning the sort. If the sorted results are to
361 replace the original contents of the input file, that replacement does
362 not occur until the last possible moment.
363
364
365 The -unique control argument deletes duplicate sort units from the
366 sorted results. The determination of whether or not a sort unit is to
367 be deleted is independent of sort field specifications; i.e., given a
368 number of nonidentical sort units that contain identical sort fields,
369 all the units do appear in the sorted results.
370
371
372 Examples:
373 Suppose a segment contains the following lines (where nl represents the
374 ASCII newline character):
375 ABCDEFGHXYnl
376 ABCDEFXYnl
377 ABCDEFGHIJXYnl
378 ABCXYnl
379
380 The display below shows how the sort_seg command sorts the contents of
381 this segment, according to the arguments specified in the first column
382 (nl stands for the ASCII newline character and # stands for the ASCII
383 space character).
384
385
386 these         | define these     | sorted on      | giving
387 arguments     | sort units       | these fields   | these results
388 --------------|------------------|----------------|--------------
389 -dm XY        |ABCDEFGH          |ABCDEFGH##      |ABCXYnl
390               |ABCDEF            |ABCDEF####      |ABCDEFXYnl
391               |ABCDEFGHIJ        |ABCDEFGHIJ      |ABCDEFGHXYnl
392               |ABC               |ABC#######      |ABCDEFGHIJXYnl
393 --------------|------------------|----------------|--------------
394 -bk 2         |ABCDEFGHABCDEF    |ABCDEFGHABCDEF  |ABCDEFGHXYnl
395 -dm XY        |ABCDEFGHIJABC     |ABCDEFGHIJABC#  |ABCDEFXYnl
396               |                  |                |ABCDEFGHIJXYnl
397               |                  |                |ABCXYnl
398 --------------|------------------|----------------|--------------
399 -fl 6 4       |ABCDEFGHXY        |FGHX            |ABCXYnl
400               |ABCDEFXY          |FXY#            |ABCDEFGHIJXYnl
401               |ABCDEFGHIJXY      |FGHI            |ABCDEFGHXYnl
402               |ABCXY             |####            |ABCDEFXYnl
403 --------------|------------------|----------------|--------------
404               |                  |first   second  |
405 -fl 1 4 7 2   |ABCDEFGHXY        |ABCD    GH      |ABCDEFGHXYnl
406               |ABCDEFXY          |ABCD    XY      |ABCDEFGHIJXYnl
407               |ABCDEFGHIJXY      |ABCD    GH      |ABCDEFXYnl
408               |ABCXY             |ABCX    ##      |ABCXYnl
409 --------------|------------------|----------------|--------------
410 -dm Y         |ABCDEFGHXABCDEFX  |FGHX    DE      |ABCDEFGHIJXYnl
411 -bk 2         |ABCDEFGHIJXABCX   |FGHI    DE      |ABCXYnl
412 -fl 6 4 4 2   |                  |                |
413               |                  |                |ABCDEFGHXYnl
414               |                  |                |ABCDEFXYnl
415 --------------|------------------|----------------|--------------
416               |                  |first   second  |
417 -fl 6 4 dsc   |ABCDEFGHXY        |FGHX    CDE     |ABCDEFXYnl
418     3 3 asc   |ABCDEFXY          |FXY#    CDE     |ABCDEFGHXYnl
419               |ABCDEFGHIJXY      |FGHI    CDE     |ABCDEFGHIJXYnl
420               |ABCXY             |####    CXY     |ABCXYnl
421 --------------|------------------|----------------|--------------
422 -fl 1 3       |ABCDEFGH          |ABC             |ABCDEFGHXYnl
423 -unique_keys  |ABCDEF            |ABC             |
424 -dm XY        |ABCDEFGHIJ        |ABC             |
425               |ABC               |ABC             |
426 --------------|------------------|----------------|--------------
427               |                  |first   second  |
428 -fl 1 3 5 2   |ABCDEFGH          |ABC     EF      |ABCDEFGHXYnl
429 -odupk        |ABCDEF            |ABC     EF      |ABCDEFXYnl
430 -dm XY        |ABCDEFGHIJ        |ABC     EF      |ABCDEFGHIJXYnl
431               |ABC               |ABC     ##      |
432
433
434 :hcom:
435
436
437
438 /****^ HISTORY COMMENTS:
439 1) change(2020-12-16,GDixon), approve(2021-02-22,MCR10088),
440 audit(2021-05-27,Swenson), install(2021-05-27,MR12.6g-0056):
441 A) Change "Syntax:" to "Syntax as a command:" in command info seg
442 last changed after 1984.
443 B) Correct problems found by verify_info.
444 C) Reorganize control arguments into separate sections by
445 category. Explain defaults and limits at the start of each
446 control argument section.
447 END HISTORY COMMENTS */
448
449
450