1 02/06/84 lex_string_
2
3 The lex_string_ subroutine provides a facility for parsing an ASCII
4 character string into tokens (character strings delimited by break
5 characters) and statements (groups of tokens). It supports the parsing
6 of comments and quoted strings. It parses an entire character string
7 during one invocation, creating a chain of descriptors for the tokens
8 and statements in a temporary segment. The cost per token of
9 lex_string_ is significantly lower than that of parse_file_ because the
10 overhead of calling parse_file_ to obtain each token is eliminated.
11 Therefore, the lex_string_ subroutine is recommended for translators
12 that deal with moderate to large amounts of input.
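
The relationship between tokens, break characters, and statements described above can be pictured with a small model. The Python sketch below is not the lex_string_ implementation: the function name tokenize is hypothetical, it assumes a one-character statement delimiter that is also listed among the break characters, and it leaves out quoted strings and comments.

```python
def tokenize(text, break_chars, ignored_break_chars, statement_delim):
    """Split text into statements, each a list of tokens (illustrative model only)."""
    statements, tokens, current = [], [], ""
    for ch in text:
        if ch in break_chars:
            if current:                  # any break character ends the pending token
                tokens.append(current)
                current = ""
            if ch == statement_delim:    # the delimiter is a token and closes the statement
                tokens.append(ch)
                statements.append(tokens)
                tokens = []
            elif ch not in ignored_break_chars:
                tokens.append(ch)        # a break character that is itself a token
        else:
            current += ch
    if current:
        tokens.append(current)
    if tokens:
        statements.append(tokens)        # trailing tokens with no closing delimiter
    return statements


# Blank is an ignored break character; "=", "+" and ";" are break characters
# that also become tokens.
stmts = tokenize("a = b+c; d = e;", " =+;", " ", ";")
```

In this model the blank separates tokens without producing one, which corresponds to the role of ignored_break_chars in the entry points documented below.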
13
14
15 The descriptors generated when the lex_string_ subroutine parses a
16 character string can be used as input to translators generated by the
17 reduction_compiler command, as well as in other applications. In
18 addition, the information in the statement and token descriptors can be
19 used in error messages printed by the lex_error_ subroutine.
20
21 Refer to the Subroutines manual for details on the operation of the
22 lex_string_ subroutine.
23
24
25 Entry points in lex_string_:
26 List is generated by the help command
27
28
29 :Entry: init_lex_delims: 02/06/84 lex_string_$init_lex_delims
30
31
32 Function: constructs two character strings from the set of break
33 characters and comment, quoting, and statement delimiters: one string
34 contains the first character of every delimiter or break character
35 defined by the language to be parsed; the second string contains a
36 character of control information for each character in the first
37 string. These two character strings form the break tables that the
38 lex_string_ subroutine uses to parse an input string. It is intended
39 that these two delimiter and control character strings be internal
40 static variables of the program that calls lex_string_, and that they
41 be initialized only once per process. They can then be used in
42 successive calls to lex_string_$lex.
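
The idea of two parallel strings, built once and then reused, can be sketched as follows. This is a Python model under stated assumptions, not the real break tables: the one-letter control codes (Q, C, S, B, I) are invented for illustration, since the actual control information is not described in this segment, and the names build_break_tables and get_tables are hypothetical.

```python
def build_break_tables(quote_open, comment_open, statement_delim,
                       break_chars, ignored_break_chars):
    """Return (delims, control): two strings of equal length.

    control[i] describes delims[i]. The codes are made up for this sketch:
    Q = starts a quote, C = starts a comment, S = starts a statement
    delimiter, B = break character, I = ignored break character.
    """
    entries = [(quote_open[:1], "Q"), (comment_open[:1], "C"),
               (statement_delim[:1], "S")]
    entries += [(ch, "I" if ch in ignored_break_chars else "B")
                for ch in break_chars]
    delims, control = "", ""
    for ch, kind in entries:
        if ch and ch not in delims:      # skip null delimiters and repeated characters
            delims += ch
            control += kind
    return delims, control


_tables = None   # plays the role of the caller's internal static variables


def get_tables():
    """Build the tables on first use only, then hand back the same pair."""
    global _tables
    if _tables is None:
        _tables = build_break_tables('"', "/*", ";", " +;/", " ")
    return _tables


delims, control = get_tables()
kind_of_plus = control[delims.index("+")]   # look up control info by position
```

Keeping the pair in a module-level variable mirrors the recommendation to initialize the two strings once per process and pass them to every later parse.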
43
44
45 Syntax:
46 declare lex_string_$init_lex_delims entry (char(*), char(*), char(*),
47 char(*), char(*), bit(*), char(*) varying aligned,
48 char(*) varying aligned, char(*) varying aligned,
49 char(*) varying aligned);
50 call lex_string_$init_lex_delims (quote_open, quote_close,
51 comment_open, comment_close, statement_delim, Sinit, break_chars,
52 ignored_break_chars, lex_delims, lex_control_chars);
53
54
55 Arguments:
56 quote_open
57 is the character string delimiter that begins a quoted string.
58 Input. It can contain up to four characters. If it is a null
59 character string, then quoted strings are not supported during the
60 parsing of an input string.
61 quote_close
62 is the character string delimiter that ends a quoted string.
63 Input. It can be the same character string as quote_open, and can
64 contain up to four characters.
65 comment_open
66 is the character string delimiter that begins a comment. Input.
67 It can contain up to four characters. If it is a null character
68 string, then comments are not supported during the parsing of a
69 character string.
70
71
72 comment_close
73 is the character string delimiter that ends a comment. Input. It
74 can be the same character string as comment_open, and can contain up
75 to four characters.
76 statement_delim
77 is the character string delimiter that ends a statement. Input.
78 It can contain up to four characters. If it is a null character
79 string, then statements are not delimited during the parsing of a
80 character string.
81
82
83 Sinit
84 is a bit string that controls the creation of token descriptors for
85 quoting delimiters and statement delimiters. Input. The bit string
86 consists of two bits, in the order listed below.
87 Ssuppress_quoting_delims
88 is "1"b if token descriptors for the quote opening and closing
89 delimiters of a quoted string are to be suppressed. A token
90 descriptor is still created for the quoted string itself, and the
91 quoted_string switch in this descriptor is turned on. If
92 Ssuppress_quoting_delims is "0"b, then token descriptors are
93 returned for the quote opening and closing delimiters, as well as
94 for the quoted string.
95
96
97 Ssuppress_stmt_delims
98 is "1"b if the token descriptor for a statement delimiter is to
99 be suppressed. The end_of_stmt switch in the descriptor of the
100 token that precedes the statement delimiter is turned on,
101 instead. If Ssuppress_stmt_delims is "0"b, then a token
102 descriptor is returned for a statement delimiter, and the
103 end_of_stmt switch in this descriptor is turned on.
104
105
106 break_chars
107 is a character string containing all of the characters that can be
108 used to delimit tokens. Input. The string can include characters
109 used also in the quoting, comment, or statement delimiters, and
110 should include any ASCII control characters that are to be treated
111 as delimiters.
112 ignored_break_chars
113 is a character string containing all of the break_chars that can be
114 used to delimit tokens but that are not tokens themselves. Input.
115 No token descriptors are created for these characters.
116
117
118 lex_delims
119 is an output character string containing all of the delimiters that
120 the lex_string_ subroutine uses to parse an input string. Output.
121 This string is constructed by the init_lex_delims entry from the
122 preceding arguments. It must be long enough to contain all of the
123 break_chars, plus the first character of the quote_open delimiter,
124 the comment_open delimiter, and the statement_delim delimiter, plus
125 30 additional characters. This length must not exceed 128
126 characters, the number of characters in the ASCII character set.
127 lex_control_chars
128 is an output character string containing one character of control
129 information for each character in lex_delims. Output. This
130 string is also constructed by init_lex_delims from the preceding
131 arguments. It must be as long as lex_delims.
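
The length rule for lex_delims and lex_control_chars is simple arithmetic, worked through in the Python sketch below. The function name min_lex_delims_length is hypothetical, and the sketch always counts all three delimiter characters, which is a conservative assumption when some delimiters are null.

```python
ASCII_SET_SIZE = 128   # upper limit on the length, per the description above


def min_lex_delims_length(break_chars):
    """Smallest length that satisfies the rule stated for lex_delims."""
    # All break characters, plus the first character of quote_open,
    # comment_open and statement_delim, plus 30 additional characters.
    needed = len(break_chars) + 3 + 30
    if needed > ASCII_SET_SIZE:
        raise ValueError("too many break characters for a 128-character string")
    return needed


# Five break characters: blank, tab, newline, ";" and "=".
needed = min_lex_delims_length(" \t\n;=")
```

Under this reading, declaring both strings with the maximum length of 128 characters always satisfies the rule.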
132
133
134 :Entry: lex: 02/06/84 lex_string_$lex
135
136
137 Function: parses an input string according to the delimiters, break
138 characters, and control information given as its arguments. The input
139 string consists of two parts: the first part is a set of characters,
140 which are to be ignored by the parser except for the counting of
141 lines; the second part is the characters to be parsed. It is
142 necessary to count lines in the part that is otherwise ignored so that
143 accurate line numbers can be stored in the token and statement
144 descriptors for the parsed section of the string.
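
The reason for counting lines in the ignored part can be shown with a short model. The Python sketch below is illustrative only: the function name split_input is hypothetical, and it assumes that lines are numbered from 1 and that the ignored part comes immediately before the parsed part in one string.

```python
def split_input(source, l_ignored, l_input):
    """Return the parsed part and the line number on which it begins."""
    ignored = source[:l_ignored]
    parsed = source[l_ignored:l_ignored + l_input]
    # Each newline in the ignored part pushes the parsed part one line down.
    first_line = 1 + ignored.count("\n")
    return parsed, first_line


source = "header line 1\nheader line 2\nx = 1;\n"
l_ignored = source.index("x")            # both header lines are skipped
parsed, first_line = split_input(source, l_ignored, len(source) - l_ignored)
```

Without the count, the first token of the parsed part would be reported on line 1 rather than on the line it occupies in the full text.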
145
146
147 Syntax:
148 declare lex_string_$lex entry (ptr, fixed bin(21), fixed bin(21), ptr,
149 bit(*), char(*), char(*), char(*), char(*), char(*),
150 char(*) varying aligned, char(*) varying aligned,
151 char(*) varying aligned, char(*) varying aligned, ptr, ptr,
152 fixed bin(35));
153 call lex_string_$lex (Pinput, Linput, Lignored_input, Psegment,
154 Slex, quote_open, quote_close, comment_open, comment_close,
155 statement_delim, break_chars, ignored_break_chars, lex_delims,
156 lex_control_chars, Pfirst_stmt_desc, Pfirst_token_desc, code);
157
158
159 Arguments:
160 Pinput
161 is a pointer to the string to be parsed. Input.
162 Linput
163 is the length in characters of the second part of the input
164 string, the part that is actually to be parsed. Input.
165 Lignored_input
166 is the length in characters of the first part of the input string,
167 the part that is ignored except for line counting. Input. This
168 length can be 0 if none of the input characters are to be ignored.
169 Psegment
170 is a pointer to a temporary segment created by the translator_temp_
171 subroutine. Input.
172
173
174 Slex
175 is a bit string that controls the creation of statement and comment
176 descriptors, the handling of doubled quotes within a quoted string,
177 and the interpretation of a comment_close delimiter that equals the
178 statement_delim. Input. The bit string consists of four bits:
179 Sstatement_desc
180 is "1"b if statement descriptors are to be created along with the
181 token descriptors. If Sstatement_desc is "0"b, or if the
182 statement delimiter is a null character string, then no statement
183 descriptors are created.
184 Scomment_desc
185 is "1"b if comment descriptors are to be created for any comments
186 that appear in the input string. If Scomment_desc is "0"b, if
187 comment_open is a null character string, or if statement descriptors
188 are not being created, then no comment descriptors are created.
189
190
191 Sretain_doubled_quotes
192 is "1"b if doubled quote_close delimiters that appear within a
193 quoted string are to be retained. If Sretain_doubled_quotes is
194 "0"b, then a copy of each quoted string containing doubled
195 quote_close delimiters is created in the temporary segment with
196 all doubled quote_close delimiters changed to single quote_close
197 delimiters.
198 Sequate_comment_close_stmt_delim
199 is "1"b if the comment_close and statement_delim character
200 strings are the same, and if the closing of a comment is to be
201 treated as the ending of the statement containing the comment.
202 This switch is useful when parsing line-oriented languages that have
203 only one statement per line and one comment per statement.
204
205
206 quote_open
207 is the character string delimiter that begins a quoted string.
208 Input. It can contain up to four characters. If it is a null
209 character string, then quoted strings are not supported during the
210 parsing of an input string.
211 quote_close
212 is the character string delimiter that ends a quoted string.
213 Input. It can be the same character string as quote_open, and can
214 contain up to four characters.
215 comment_open
216 is the character string delimiter that begins a comment. Input.
217 It can contain up to four characters. If it is a null character
218 string, then comments are not supported during the parsing of a
219 character string.
220
221
222 comment_close
223 is the character string delimiter that ends a comment. Input. It
224 can be the same character string as comment_open, and can contain up
225 to four characters.
226 statement_delim
227 is the character string delimiter that ends a statement. Input.
228 It can contain up to four characters. If it is a null character
229 string, then statements are not delimited during the parsing of a
230 character string.
231 break_chars
232 is a character string containing all of the characters that can be
233 used to delimit tokens. Input. The string can include characters
234 used also in the quoting, comment, or statement delimiters, and
235 should include any ASCII control characters that are to be treated
236 as delimiters.
237
238
239 ignored_break_chars
240 is a character string containing all of the break_chars that can be
241 used to delimit tokens but that are not tokens themselves. Input.
242 No token descriptors are created for these characters.
243 lex_delims
244 is the character string initialized by lex_string_$init_lex_delims.
245 Input.
246 lex_control_chars
247 is the character string initialized by lex_string_$init_lex_delims.
248 Input.
249 Pfirst_stmt_desc
250 is a pointer to the first in the chain of statement descriptors.
251 Output. This is a null pointer on return if no statement
252 descriptors have been created.
253
254
255 Pfirst_token_desc
256 is a pointer to the first in the chain of token descriptors.
257 Output. This is a null pointer on return if no tokens were found
258 in the input string.
259
260
261 code
262 is one of the following status codes. Output.
263 0
264 the parsing was completed successfully.
265 error_table_$zero_length_seg
266 no tokens were found in the input string.
267 error_table_$no_stmt_delim
268 the input string did not end with a statement delimiter, when
269 statement delimiters were used in the parsing.
270 error_table_$unbalanced_quotes
271 the input string ended with a quoted string that was not
272 terminated by a quote_close delimiter.
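
The effect of Sretain_doubled_quotes, and the condition reported as error_table_$unbalanced_quotes, can be modeled together. The Python sketch below is an illustration under stated assumptions, not the lex_string_ algorithm: the names scan_quoted and UnbalancedQuotes are hypothetical, and scanning is assumed to start at the first character after the quote_open delimiter.

```python
class UnbalancedQuotes(Exception):
    """Stands in for error_table_$unbalanced_quotes in this model."""


def scan_quoted(text, start, quote_close, retain_doubled):
    """Return (value, index just past the closing delimiter)."""
    i, pieces = start, []
    doubled = quote_close + quote_close
    while i < len(text):
        if text.startswith(doubled, i):
            # A doubled delimiter stands for one literal delimiter in the string.
            pieces.append(doubled if retain_doubled else quote_close)
            i += len(doubled)
        elif text.startswith(quote_close, i):
            return "".join(pieces), i + len(quote_close)
        else:
            pieces.append(text[i])
            i += 1
    raise UnbalancedQuotes("input ended inside a quoted string")


body = 'say ""hi"" now" rest'       # text following an opening quote
collapsed, next_index = scan_quoted(body, 0, '"', False)
retained, _ = scan_quoted(body, 0, '"', True)
```

With retain_doubled false, the value differs from the input text, which is why the description above says a modified copy of the quoted string is created in the temporary segment.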