Diffstat (limited to 'man1p/lex.1p')
-rw-r--r-- man1p/lex.1p 954
1 files changed, 954 insertions, 0 deletions
diff --git a/man1p/lex.1p b/man1p/lex.1p
new file mode 100644
index 000000000..6d84cb9fa
--- /dev/null
+++ b/man1p/lex.1p
@@ -0,0 +1,954 @@
+.\" Copyright (c) 2001-2003 The Open Group, All Rights Reserved
+.TH "LEX" P 2003 "IEEE/The Open Group" "POSIX Programmer's Manual"
+.\" lex
+.SH NAME
+lex \- generate programs for lexical tasks (\fBDEVELOPMENT\fP)
+.SH SYNOPSIS
+.LP
+\fBlex\fP \fB[\fP\fB-t\fP\fB][\fP\fB-n|-v\fP\fB][\fP\fIfile\fP \fB...\fP\fB]\fP
+.SH DESCRIPTION
+.LP
+The \fIlex\fP utility shall generate C programs to be used in lexical
+processing of character input, and that can be used as an
+interface to \fIyacc\fP. The C programs shall be generated from \fIlex\fP
+source code and
+conform to the ISO\ C standard. Usually, the \fIlex\fP utility shall
+write the program it generates to the file
+\fBlex.yy.c\fP; the state of this file is unspecified if \fIlex\fP
+exits with a non-zero exit status. See the EXTENDED
+DESCRIPTION section for a complete description of the \fIlex\fP input
+language.
+.SH OPTIONS
+.LP
+The \fIlex\fP utility shall conform to the Base Definitions volume
+of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility Syntax Guidelines.
+.LP
+The following options shall be supported:
+.TP 7
+\fB-n\fP
+Suppress the summary of statistics usually written with the \fB-v\fP
+option. If no table sizes are specified in the \fIlex\fP
+source code and the \fB-v\fP option is not specified, then \fB-n\fP
+is implied.
+.TP 7
+\fB-t\fP
+Write the resulting program to standard output instead of \fBlex.yy.c\fP.
+.TP 7
+\fB-v\fP
+Write a summary of \fIlex\fP statistics to the standard output. (See
+the discussion of \fIlex\fP table sizes in Definitions in lex.) If
+the \fB-t\fP option is specified and \fB-n\fP is not specified, this
+report shall
+be written to standard error. If table sizes are specified in the
+\fIlex\fP source code, and if the \fB-n\fP option is not
+specified, the \fB-v\fP option may be enabled.
+.sp
+.SH OPERANDS
+.LP
+The following operand shall be supported:
+.TP 7
+\fIfile\fP
+A pathname of an input file. If more than one such \fIfile\fP is specified,
+all files shall be concatenated to produce a
+single \fIlex\fP program. If no \fIfile\fP operands are specified,
+or if a \fIfile\fP operand is \fB'-'\fP , the standard
+input shall be used.
+.sp
+.SH STDIN
+.LP
+The standard input shall be used if no \fIfile\fP operands are specified,
+or if a \fIfile\fP operand is \fB'-'\fP . See
+INPUT FILES.
+.SH INPUT FILES
+.LP
+The input files shall be text files containing \fIlex\fP source code,
+as described in the EXTENDED DESCRIPTION section.
+.SH ENVIRONMENT VARIABLES
+.LP
+The following environment variables shall affect the execution of
+\fIlex\fP:
+.TP 7
+\fILANG\fP
+Provide a default value for the internationalization variables that
+are unset or null. (See the Base Definitions volume of
+IEEE\ Std\ 1003.1-2001, Section 8.2, Internationalization Variables
+for
+the precedence of internationalization variables used to determine
+the values of locale categories.)
+.TP 7
+\fILC_ALL\fP
+If set to a non-empty string value, override the values of all the
+other internationalization variables.
+.TP 7
+\fILC_COLLATE\fP
+.sp
+Determine the locale for the behavior of ranges, equivalence classes,
+and multi-character collating elements within regular
+expressions. If this variable is not set to the POSIX locale, the
+results are unspecified.
+.TP 7
+\fILC_CTYPE\fP
+Determine the locale for the interpretation of sequences of bytes
+of text data as characters (for example, single-byte as
+opposed to multi-byte characters in arguments and input files), and
+the behavior of character classes within regular expressions.
+If this variable is not set to the POSIX locale, the results are unspecified.
+.TP 7
+\fILC_MESSAGES\fP
+Determine the locale that should be used to affect the format and
+contents of diagnostic messages written to standard
+error.
+.TP 7
+\fINLSPATH\fP
+Determine the location of message catalogs for the processing of \fILC_MESSAGES
+\&.\fP
+.sp
+.SH ASYNCHRONOUS EVENTS
+.LP
+Default.
+.SH STDOUT
+.LP
+If the \fB-t\fP option is specified, the text file of C source code
+output of \fIlex\fP shall be written to standard
+output.
+.LP
+If the \fB-t\fP option is not specified:
+.IP " *" 3
+Implementation-defined informational, error, and warning messages
+concerning the contents of \fIlex\fP source code input shall
+be written to either the standard output or standard error.
+.LP
+.IP " *" 3
+If the \fB-v\fP option is specified and the \fB-n\fP option is not
+specified, \fIlex\fP statistics shall also be written to
+either the standard output or standard error, in an implementation-defined
+format. These statistics may also be generated if table
+sizes are specified with a \fB'%'\fP operator in the \fIDefinitions\fP
+section, as long as the \fB-n\fP option is not
+specified.
+.LP
+.SH STDERR
+.LP
+If the \fB-t\fP option is specified, implementation-defined informational,
+error, and warning messages concerning the contents
+of \fIlex\fP source code input shall be written to the standard error.
+.LP
+If the \fB-t\fP option is not specified:
+.IP " 1." 4
+Implementation-defined informational, error, and warning messages
+concerning the contents of \fIlex\fP source code input shall
+be written to either the standard output or standard error.
+.LP
+.IP " 2." 4
+If the \fB-v\fP option is specified and the \fB-n\fP option is not
+specified, \fIlex\fP statistics shall also be written to
+either the standard output or standard error, in an implementation-defined
+format. These statistics may also be generated if table
+sizes are specified with a \fB'%'\fP operator in the \fIDefinitions\fP
+section, as long as the \fB-n\fP option is not
+specified.
+.LP
+.SH OUTPUT FILES
+.LP
+A text file containing C source code shall be written to \fBlex.yy.c\fP,
+or to the standard output if the \fB-t\fP option is
+present.
+.SH EXTENDED DESCRIPTION
+.LP
+Each input file shall contain \fIlex\fP source code, which is a table
+of regular expressions with corresponding actions in the
+form of C program fragments.
+.LP
+When \fBlex.yy.c\fP is compiled and linked with the \fIlex\fP library
+(using the \fB-l\ l\fP operand with \fIc99\fP), the resulting program
+shall read character input from the standard input and shall
+partition it into strings that match the given expressions.
+.LP
+When an expression is matched, these actions shall occur:
+.IP " *" 3
+The input string that was matched shall be left in \fIyytext\fP as
+a null-terminated string; \fIyytext\fP shall either be an
+external character array or a pointer to a character string. As explained
+in Definitions in lex,
+the type can be explicitly selected using the \fB%array\fP or \fB%pointer\fP
+declarations, but the default is
+implementation-defined.
+.LP
+.IP " *" 3
+The external \fBint\fP \fIyyleng\fP shall be set to the length of
+the matching string.
+.LP
+.IP " *" 3
+The expression's corresponding program fragment, or action, shall
+be executed.
+.LP
+.LP
+During pattern matching, \fIlex\fP shall search the set of patterns
+for the single longest possible match. Among rules that
+match the same number of characters, the rule given first shall be
+chosen.
+.LP
+The general format of \fIlex\fP source shall be:
+.sp
+.RS
+.nf
+
+\fIDefinitions\fP
+\fB%%\fP
+\fIRules\fP
+\fB%%\fP
+\fIUser Subroutines\fP
+.fi
+.RE
+.LP
+The first \fB"%%"\fP is required to mark the beginning of the rules
+(regular expressions and actions); the second
+\fB"%%"\fP is required only if user subroutines follow.
+.LP
+Any line in the \fIDefinitions\fP section beginning with a <blank>
+shall be assumed to be a C program fragment and shall
+be copied to the external definition area of the \fBlex.yy.c\fP file.
+Similarly, anything in the \fIDefinitions\fP section
+included between delimiter lines containing only \fB"%{"\fP and \fB"%}"\fP
+shall also be copied unchanged to the external
+definition area of the \fBlex.yy.c\fP file.
+.LP
+Any such input (beginning with a <blank> or within \fB"%{"\fP and
+\fB"%}"\fP delimiter lines) appearing at the
+beginning of the \fIRules\fP section before any rules are specified
+shall be written to \fBlex.yy.c\fP after the declarations of
+variables for the \fIyylex\fP() function and before the first line
+of code in \fIyylex\fP(). Thus, user variables local to
+\fIyylex\fP() can be declared here, as well as application code to
+execute upon entry to \fIyylex\fP().
+.LP
+The action taken by \fIlex\fP when encountering any input beginning
+with a <blank> or within \fB"%{"\fP and
+\fB"%}"\fP delimiter lines appearing in the \fIRules\fP section but
+coming after one or more rules is undefined. The presence
+of such input may result in an erroneous definition of the \fIyylex\fP()
+function.
+.SS Definitions in lex
+.LP
+\fIDefinitions\fP appear before the first \fB"%%"\fP delimiter. Any
+line in this section not contained between \fB"%{"\fP
+and \fB"%}"\fP lines and not beginning with a <blank> shall be assumed
+to define a \fIlex\fP substitution string. The
+format of these lines shall be:
+.sp
+.RS
+.nf
+
+\fIname substitute\fP
+.fi
+.RE
+.LP
+If a \fIname\fP does not meet the requirements for identifiers in
+the ISO\ C standard, the result is undefined. The string
+\fIsubstitute\fP shall replace the string {\fIname\fP} when it is
+used in a rule. The \fIname\fP string shall be recognized in
+this context only when the braces are provided and when it does not
+appear within a bracket expression or within double-quotes.
+.LP
+In the \fIDefinitions\fP section, any line beginning with a \fB'%'\fP
+(percent sign) character and followed by an
+alphanumeric word beginning with either \fB's'\fP or \fB'S'\fP shall
+define a set of start conditions. Any line beginning
+with a \fB'%'\fP followed by a word beginning with either \fB'x'\fP
+or \fB'X'\fP shall define a set of exclusive start
+conditions. When the generated scanner is in a \fB%s\fP state, patterns
+with no state specified shall be also active; in a
+\fB%x\fP state, such patterns shall not be active. The rest of the
+line, after the first word, shall be considered to be one or
+more <blank>-separated names of start conditions. Start condition
+names shall be constructed in the same way as definition
+names. Start conditions can be used to restrict the matching of regular
+expressions to one or more states as described in Regular Expressions
+in lex.
+.LP
+Implementations shall accept either of the following two mutually-exclusive
+declarations in the \fIDefinitions\fP section:
+.TP 7
+\fB%array\fP
+Declare the type of \fIyytext\fP to be a null-terminated character
+array.
+.TP 7
+\fB%pointer\fP
+Declare the type of \fIyytext\fP to be a pointer to a null-terminated
+character string.
+.sp
+.LP
+The default type of \fIyytext\fP is implementation-defined. If an
+application refers to \fIyytext\fP outside of the scanner
+source file (that is, via an \fBextern\fP), the application shall
+include the appropriate \fB%array\fP or \fB%pointer\fP
+declaration in the scanner source file.
+.LP
+Implementations shall accept declarations in the \fIDefinitions\fP
+section for setting certain internal table sizes. The
+declarations are shown in the following table.
+.sp
+.ce 1
+\fBTable: Table Size Declarations in \fIlex\fP\fP
+.TS C
+center; l2 l2 l.
+\fBDeclaration\fP	\fBDescription\fP	\fBMinimum Value\fP
+%\fBp\fP \fIn\fP	Number of positions	2500
+%\fBn\fP \fIn\fP	Number of states	500
+%\fBa\fP \fIn\fP	Number of transitions	2000
+%\fBe\fP \fIn\fP	Number of parse tree nodes	1000
+%\fBk\fP \fIn\fP	Number of packed character classes	1000
+%\fBo\fP \fIn\fP	Size of the output array	3000
+.TE
+.LP
+In the table, \fIn\fP represents a positive decimal integer, preceded
+by one or more <blank>s. The exact meaning of these
+table size numbers is implementation-defined. The implementation shall
+document how these numbers affect the \fIlex\fP utility and
+how they are related to any output that may be generated by the implementation
+should limitations be encountered during the
+execution of \fIlex\fP. It shall be possible to determine from this
+output which of the table size values needs to be modified to
+permit \fIlex\fP to successfully generate tables for the input language.
+The values in the column Minimum Value represent the
+lowest values conforming implementations shall provide.
+.SS Rules in lex
+.LP
+The rules in \fIlex\fP source files are a table in which the left
+column contains regular expressions and the right column
+contains actions (C program fragments) to be executed when the expressions
+are recognized.
+.sp
+.RS
+.nf
+
+\fIERE action
+ERE action\fP\fB...
+\fP
+.fi
+.RE
+.LP
+The extended regular expression (ERE) portion of a row shall be separated
+from \fIaction\fP by one or more <blank>s. A
+regular expression containing <blank>s shall be recognized under one
+of the following conditions:
+.IP " *" 3
+The entire expression appears within double-quotes.
+.LP
+.IP " *" 3
+The <blank>s appear within double-quotes or square brackets.
+.LP
+.IP " *" 3
+Each <blank> is preceded by a backslash character.
+.LP
+.SS User Subroutines in lex
+.LP
+Anything in the user subroutines section shall be copied to \fBlex.yy.c\fP
+following \fIyylex\fP().
+.SS Regular Expressions in lex
+.LP
+The \fIlex\fP utility shall support the set of extended regular expressions
+(see the Base Definitions volume of
+IEEE\ Std\ 1003.1-2001, Section 9.4, Extended Regular Expressions),
+with the following additions and exceptions to the syntax:
+.TP 7
+\fB"..."\fP
+Any string enclosed in double-quotes shall represent the characters
+within the double-quotes as themselves, except that
+backslash escapes (which appear in the following table) shall be recognized.
+Any backslash-escape sequence shall be terminated by
+the closing quote. For example, \fB"\\01""1"\fP represents
+a single string: the octal value 1 followed by the character
+\fB'1'\fP .
+.TP 7
+<\fIstate\fP>\fIr\fP,\ <\fIstate1,state2,\fP...>\fIr\fP
+.sp
+The regular expression \fIr\fP shall be matched only when the program
+is in one of the start conditions indicated by \fIstate\fP,
+\fIstate1\fP, and so on; see Actions in lex. (As an exception to
+the typographical conventions of
+the rest of this volume of IEEE\ Std\ 1003.1-2001, in this case <\fIstate\fP>
+does not represent a metavariable, but
+the literal angle-bracket characters surrounding a symbol.) The start
+condition shall be recognized as such only at the beginning
+of a regular expression.
+.TP 7
+\fIr\fP/\fIx\fP
+The regular expression \fIr\fP shall be matched only if it is followed
+by an occurrence of regular expression \fIx\fP
+(\fIx\fP is the instance of trailing context, further defined below).
+The token returned in \fIyytext\fP shall only match
+\fIr\fP. If the trailing portion of \fIr\fP matches the beginning
+of \fIx\fP, the result is unspecified. The \fIr\fP expression
+cannot include further trailing context or the \fB'$'\fP (match-end-of-line)
+operator; \fIx\fP cannot include the \fB'^'\fP
+(match-beginning-of-line) operator, nor trailing context, nor the
+\fB'$'\fP operator. That is, only one occurrence of trailing
+context is allowed in a \fIlex\fP regular expression, and the \fB'^'\fP
+operator only can be used at the beginning of such an
+expression.
+.TP 7
+{\fIname\fP}
+When \fIname\fP is one of the substitution symbols from the \fIDefinitions\fP
+section, the string, including the enclosing
+braces, shall be replaced by the \fIsubstitute\fP value. The \fIsubstitute\fP
+value shall be treated in the extended regular
+expression as if it were enclosed in parentheses. No substitution
+shall occur if {\fIname\fP} occurs within a bracket expression
+or within double-quotes.
+.sp
+.LP
+Within an ERE, a backslash character shall be considered to begin
+an escape sequence as specified in the table in the Base
+Definitions volume of IEEE\ Std\ 1003.1-2001, Chapter 5, File Format
+Notation (
+\fB'\\\\'\fP , \fB'\\a'\fP , \fB'\\b'\fP , \fB'\\f'\fP , \fB'\\n'\fP
+, \fB'\\r'\fP , \fB'\\t'\fP , \fB'\\v'\fP ). In
+addition, the escape sequences in the following table shall be recognized.
+.LP
+A literal <newline> cannot occur within an ERE; the escape sequence
+\fB'\\n'\fP can be used to represent a
+<newline>. A <newline> shall not be matched by a period operator.
+.br
+.sp
+.ce 1
+\fBTable: Escape Sequences in \fIlex\fP\fP
+.TS C
+center; l1 lw(30)1 lw(30).
+\fBEscape\fP	T{
+.na
+\fB\ \fP
+.ad
+T}	T{
+.na
+\fB\ \fP
+.ad
+T}
+\fBSequence\fP	T{
+.na
+\fBDescription\fP
+.ad
+T}	T{
+.na
+\fBMeaning\fP
+.ad
+T}
+\\\fIdigits\fP	T{
+.na
+A backslash character followed by the longest sequence of one, two, or three octal-digit characters (01234567). If all of the digits are 0 (that is, representation of the NUL character), the behavior is undefined.
+.ad
+T}	T{
+.na
+The character whose encoding is represented by the one, two, or three-digit octal integer. If the size of a byte on the system is greater than nine bits, the valid escape sequence used to represent a byte is implementation-defined. Multi-byte characters require multiple, concatenated escape sequences of this type, including the leading \fB'\\'\fP for each byte.
+.ad
+T}
+\\x\fIdigits\fP	T{
+.na
+A backslash character followed by the longest sequence of hexadecimal-digit characters (01234567abcdefABCDEF). If all of the digits are 0 (that is, representation of the NUL character), the behavior is undefined.
+.ad
+T}	T{
+.na
+The character whose encoding is represented by the hexadecimal integer.
+.ad
+T}
+\\c	T{
+.na
+A backslash character followed by any character not described in this table or in the table in the Base Definitions volume of IEEE\ Std\ 1003.1-2001, Chapter 5, File Format Notation ( \fB'\\\\'\fP , \fB'\\a'\fP , \fB'\\b'\fP , \fB'\\f'\fP , \fB'\\n'\fP , \fB'\\r'\fP , \fB'\\t'\fP , \fB'\\v'\fP ).
+.ad
+T}	T{
+.na
+The character \fB'c'\fP , unchanged.
+.ad
+T}
+.TE
+.TP 7
+\fBNote:\fP
+If a \fB'\\x'\fP sequence needs to be immediately followed by a hexadecimal
+digit character, a sequence such as
+\fB"\\x1""1"\fP can be used, which represents a character containing
+the value 1, followed by the character
+\fB'1'\fP .
+.sp
+.LP
+The order of precedence given to extended regular expressions for
+\fIlex\fP differs from that specified in the Base Definitions
+volume of IEEE\ Std\ 1003.1-2001, Section 9.4, Extended Regular
+Expressions. The order of precedence for \fIlex\fP shall be as shown
+in the following table, from high to low.
+.TP 7
+\fBNote:\fP
+The escaped characters entry is not meant to imply that these are
+operators, but they are included in the table to show their
+relationships to the true operators. The start condition, trailing
+context, and anchoring notations have been omitted from the
+table because of the placement restrictions described in this section;
+they can only appear at the beginning or ending of an
+ERE.
+.sp
+.sp
+.sp
+.ce 1
+\fBTable: ERE Precedence in \fIlex\fP\fP
+.TS C
+center; l2 l.
+\fBExtended Regular Expression\fP	\fBPrecedence\fP
+collation-related bracket symbols	[= =] [: :] [. .]
+escaped characters	\\<\fIspecial character\fP>
+bracket expression	[ ]
+quoting	"..."
+grouping	( )
+definition	{\fIname\fP}
+single-character RE duplication	* + ?
+concatenation	\&
+interval expression	{m,n}
+alternation	|
+.TE
+.LP
+The ERE anchoring operators \fB'^'\fP and \fB'$'\fP do not appear
+in the table. With \fIlex\fP regular expressions, these
+operators are restricted in their use: the \fB'^'\fP operator can
+only be used at the beginning of an entire regular expression,
+and the \fB'$'\fP operator only at the end. The operators apply to
+the entire regular expression. Thus, for example, the pattern
+\fB"(^abc)|(def$)"\fP is undefined; it can instead be written as two
+separate rules, one with the regular expression
+\fB"^abc"\fP and one with \fB"def$"\fP , which share a common action
+via the special \fB'|'\fP action (see below). If the
+pattern were written \fB"^abc|def$"\fP , it would match either \fB"abc"\fP
+or \fB"def"\fP on a line by itself.
+.LP
+Unlike the general ERE rules, embedded anchoring is not allowed by
+most historical \fIlex\fP implementations. An example of
+embedded anchoring would be for patterns such as \fB"(^|\ )foo(\ |$)"\fP
+to match \fB"foo"\fP when it exists as a
+complete word. This functionality can be obtained using existing \fIlex\fP
+features:
+.sp
+.RS
+.nf
+
+\fB^foo/[ \\n] |
+" foo"/[ \\n] /* Found foo as a separate word. */
+\fP
+.fi
+.RE
+.LP
+Note also that \fB'$'\fP is a form of trailing context (it is equivalent
+to \fB"/\\n"\fP ) and as such cannot be used with
+regular expressions containing another instance of the operator (see
+the preceding discussion of trailing context).
+.LP
+The additional regular expressions trailing-context operator \fB'/'\fP
+can be used as an ordinary character if presented
+within double-quotes, \fB"/"\fP ; preceded by a backslash, \fB"\\/"\fP
+; or within a bracket expression, \fB"[/]"\fP . The
+start-condition \fB'<'\fP and \fB'>'\fP operators shall be special
+only in a start condition at the beginning of a
+regular expression; elsewhere in the regular expression they shall
+be treated as ordinary characters.
+.SS Actions in lex
+.LP
+The action to be taken when an ERE is matched can be a C program fragment
+or the special actions described below; the program
+fragment can contain one or more C statements, and can also include
+special actions. The empty C statement \fB';'\fP shall be a
+valid action; any string in the \fBlex.yy.c\fP input that matches
+the pattern portion of such a rule is effectively ignored or
+skipped. However, the absence of an action shall not be valid, and
+the action \fIlex\fP takes in such a condition is
+undefined.
+.LP
+The specification for an action, including C statements and special
+actions, can extend across several lines if enclosed in
+braces:
+.sp
+.RS
+.nf
+
+\fIERE\fP \fB<\fP\fIone or more blanks\fP\fB> {\fP \fIprogram statement
+ program statement\fP \fB}
+\fP
+.fi
+.RE
+.LP
+The default action when a string in the input to a \fBlex.yy.c\fP
+program is not matched by any expression shall be to copy the
+string to the output. Because the default behavior of a program generated
+by \fIlex\fP is to read the input and copy it to the
+output, a minimal \fIlex\fP source program that has just \fB"%%"\fP
+shall generate a C program that simply copies the input to
+the output unchanged.
+.LP
+Four special actions shall be available:
+.sp
+.RS
+.nf
+
+\fB| ECHO; REJECT; BEGIN
+\fP
+.fi
+.RE
+.TP 7
+\fB|\fP
+The action \fB'|'\fP means that the action for the next rule is the
+action for this rule. Unlike the other three actions,
+\fB'|'\fP cannot be enclosed in braces or be semicolon-terminated;
+the application shall ensure that it is specified alone, with
+no other actions.
+.TP 7
+\fBECHO;\fP
+Write the contents of the string \fIyytext\fP on the output.
+.TP 7
+\fBREJECT;\fP
+Usually only a single expression is matched by a given string in the
+input. \fBREJECT\fP means "continue to the next
+expression that matches the current input", and shall cause whatever
+rule was the second choice after the current rule to be
+executed for the same input. Thus, multiple rules can be matched and
+executed for one input string or overlapping input strings.
+For example, given the regular expressions \fB"xyz"\fP and \fB"xy"\fP
+and the input \fB"xyz"\fP , usually only the regular
+expression \fB"xyz"\fP would match. The next attempted match would
+start after \fBz\fP. If the last action in the
+\fB"xyz"\fP rule is \fBREJECT\fP, both this rule and the \fB"xy"\fP
+rule would be executed. The \fBREJECT\fP action may be
+implemented in such a fashion that flow of control does not continue
+after it, as if it were equivalent to a \fBgoto\fP to another
+part of \fIyylex\fP(). The use of \fBREJECT\fP may result in somewhat
+larger and slower scanners.
+.TP 7
+\fBBEGIN\fP
+The action:
+.sp
+.RS
+.nf
+
+\fBBEGIN\fP \fInewstate\fP\fB;
+\fP
+.fi
+.RE
+.LP
+switches the state (start condition) to \fInewstate\fP. If the string
+\fInewstate\fP has not been declared previously as a
+start condition in the \fIDefinitions\fP section, the results are
+unspecified. The initial state is indicated by the digit
+\fB'0'\fP or the token \fBINITIAL\fP.
+.sp
+.LP
+The functions or macros described below are accessible to user code
+included in the \fIlex\fP input. It is unspecified whether
+they appear in the C code output of \fIlex\fP, or are accessible only
+through the \fB-l\ l\fP operand to \fIc99\fP (the \fIlex\fP library).
+.TP 7
+\fBint\ \fP \fIyylex\fP(\fBvoid\fP)
+.sp
+Performs lexical analysis on the input; this is the primary function
+generated by the \fIlex\fP utility. The function shall return
+zero when the end of input is reached; otherwise, it shall return
+non-zero values (tokens) determined by the actions that are
+selected.
+.TP 7
+\fBint\ \fP \fIyymore\fP(\fBvoid\fP)
+.sp
+When called, indicates that when the next input string is recognized,
+it is to be appended to the current value of \fIyytext\fP
+rather than replacing it; the value in \fIyyleng\fP shall be adjusted
+accordingly.
+.TP 7
+\fBint\ \fP \fIyyless\fP(\fBint\ \fP \fIn\fP)
+.sp
+Retains \fIn\fP initial characters in \fIyytext\fP, NUL-terminated,
+and treats the remaining characters as if they had not been
+read; the value in \fIyyleng\fP shall be adjusted accordingly.
+.TP 7
+\fBint\ \fP \fIinput\fP(\fBvoid\fP)
+.sp
+Returns the next character from the input, or zero on end-of-file.
+It shall obtain input from the stream pointer \fIyyin\fP,
+although possibly via an intermediate buffer. Thus, once scanning
+has begun, the effect of altering the value of \fIyyin\fP is
+undefined. The character read shall be removed from the input stream
+of the scanner without any processing by the scanner.
+.TP 7
+\fBint\ \fP \fIunput\fP(\fBint\ \fP \fIc\fP)
+.sp
+Returns the character \fB'c'\fP to the input; \fIyytext\fP and \fIyyleng\fP
+are undefined until the next expression is
+matched. The result of using \fIunput\fP() for more characters than
+have been input is unspecified.
+.sp
+.LP
+The following functions shall appear only in the \fIlex\fP library
+accessible through the \fB-l\ l\fP operand; they can
+therefore be redefined by a conforming application:
+.TP 7
+\fBint\ \fP \fIyywrap\fP(\fBvoid\fP)
+.sp
+Called by \fIyylex\fP() at end-of-file; the default \fIyywrap\fP()
+shall always return 1. If the application requires
+\fIyylex\fP() to continue processing with another source of input,
+then the application can include a function \fIyywrap\fP(),
+which associates another file with the external variable \fBFILE *\fP
+\fIyyin\fP and shall return a value of zero.
+.TP 7
+\fBint\ \fP \fImain\fP(\fBint\ \fP \fIargc\fP, \fBchar *\fP\fIargv\fP[])
+.sp
+Calls \fIyylex\fP() to perform lexical analysis, then exits. The user
+code can contain \fImain\fP() to perform
+application-specific operations, calling \fIyylex\fP() as applicable.
+.sp
+.LP
+Except for \fIinput\fP(), \fIunput\fP(), and \fImain\fP(), all external
+and static names generated by \fIlex\fP shall begin
+with the prefix \fByy\fP or \fBYY\fP.
+.SH EXIT STATUS
+.LP
+The following exit values shall be returned:
+.TP 7
+\ 0
+Successful completion.
+.TP 7
+>0
+An error occurred.
+.sp
+.SH CONSEQUENCES OF ERRORS
+.LP
+Default.
+.LP
+\fIThe following sections are informative.\fP
+.SH APPLICATION USAGE
+.LP
+Conforming applications are warned that in the \fIRules\fP section,
+an ERE without an action is not acceptable, but need not be
+detected as erroneous by \fIlex\fP. This may result in compilation
+or runtime errors.
+.LP
+The purpose of \fIinput\fP() is to take characters off the input stream
+and discard them as far as the lexical analysis is
+concerned. A common use is to discard the body of a comment once the
+beginning of a comment is recognized.
+.LP
+The \fIlex\fP utility is not fully internationalized in its treatment
+of regular expressions in the \fIlex\fP source code or
+generated lexical analyzer. It would seem desirable to have the lexical
+analyzer interpret the regular expressions given in the
+\fIlex\fP source according to the environment specified when the lexical
+analyzer is executed, but this is not possible with the
+current \fIlex\fP technology. Furthermore, the very nature of the
+lexical analyzers produced by \fIlex\fP must be closely tied to
+the lexical requirements of the input language being described, which
+is frequently locale-specific anyway. (For example, writing
+an analyzer that is used for French text is not automatically useful
+for processing other languages.)
+.SH EXAMPLES
+.LP
+The following is an example of a \fIlex\fP program that implements
+a rudimentary scanner for a Pascal-like syntax:
+.sp
+.RS
+.nf
+
+\fB%{
+/* Need this for the calls to atoi() and atof() below. */
+#include <stdlib.h>
+/* Need this for printf(), fopen(), and stdin below. */
+#include <stdio.h>
+%}
+.sp
+
+DIGIT [0-9]
+ID [a-z][a-z0-9]*
+.sp
+
+%%
+.sp
+
+{DIGIT}+ {
+ printf("An integer: %s (%d)\\n", yytext,
+ atoi(yytext));
+ }
+.sp
+
+{DIGIT}+"."{DIGIT}* {
+ printf("A float: %s (%g)\\n", yytext,
+ atof(yytext));
+ }
+.sp
+
+if|then|begin|end|procedure|function {
+ printf("A keyword: %s\\n", yytext);
+ }
+.sp
+
+{ID} printf("An identifier: %s\\n", yytext);
+.sp
+
+"+"|"-"|"*"|"/" printf("An operator: %s\\n", yytext);
+.sp
+
+"{"[^}\\n]*"}" ; /* Eat up one-line comments. */
+.sp
+
+[ \\t\\n]+ ; /* Eat up white space. */
+.sp
+
+\&. printf("Unrecognized character: %s\\n", yytext);
+.sp
+
+%%
+.sp
+
+int main(int argc, char *argv[])
+{
+ ++argv, --argc; /* Skip over program name. */
+ if (argc > 0)
+ yyin = fopen(argv[0], "r");
+ else
+ yyin = stdin;
+.sp
+
+ yylex();
+}
+\fP
+.fi
+.RE
+.SH RATIONALE
+.LP
+Even though the \fB-c\fP option and references to the C language are
+retained in this description, \fIlex\fP may be
+generalized to other languages, as was done at one time for EFL, the
+Extended FORTRAN Language. Since the \fIlex\fP input
+specification is essentially language-independent, versions of this
+utility could be written to produce Ada, Modula-2, or Pascal
+code, and there are known historical implementations that do so.
+.LP
+The current description of \fIlex\fP bypasses the issue of dealing
+with internationalized EREs in the \fIlex\fP source code or
+generated lexical analyzer. If it follows the model used by \fIawk\fP
+(the source code is
+assumed to be presented in the POSIX locale, but input and output
+are in the locale specified by the environment variables), then
+the tables in the lexical analyzer produced by \fIlex\fP would interpret
+EREs specified in the \fIlex\fP source in terms of the
+environment variables specified when \fIlex\fP was executed. The desired
+effect would be to have the lexical analyzer interpret
+the EREs given in the \fIlex\fP source according to the environment
+specified when the lexical analyzer is executed, but this is
+not possible with the current \fIlex\fP technology.
+.LP
+The description of octal and hexadecimal-digit escape sequences agrees
+with the ISO\ C standard use of escape sequences. See
+the RATIONALE for \fIed\fP for a discussion of bytes larger than 9
+bits being represented by octal values.
+Hexadecimal values can represent larger bytes and multi-byte characters
+directly, using as many digits as required.
+.LP
+There is no detailed output format specification. The observed behavior
+of \fIlex\fP under four different historical
+implementations was that none of these implementations consistently
+reported the line numbers for error and warning messages.
+Furthermore, there was a desire that \fIlex\fP be allowed to output
+additional diagnostic messages. Leaving message formats
+unspecified avoids these formatting questions and problems with internationalization.
+.LP
+Although the \fB%x\fP specifier for \fIexclusive\fP start conditions
+is not historical practice, it is believed to be a
+minor change to historical implementations and greatly enhances the
+usability of \fIlex\fP programs since it permits an
+application to obtain the expected functionality with fewer statements.
+.LP
+The \fB%array\fP and \fB%pointer\fP declarations were added as a compromise
+between historical systems. The System V-based
+\fIlex\fP copies the matched text to a \fIyytext\fP array. The \fIflex\fP
+program, supported in BSD and GNU systems, uses a
+pointer. In the latter case, significant performance improvements
+are available for some scanners. Most historical programs should
+require no change in porting from one system to another because the
+string being referenced is null-terminated in both cases. (The
+\fIflex\fP implementation null-terminates the token in place: it
+remembers the character that originally followed the token and
+restores that character before continuing on to the next
+scan.) Multi-file programs with external references to
+\fIyytext\fP outside the scanner source file should continue to operate
+on their historical systems, but would require one of the
+new declarations to be considered strictly portable.
+.LP
+The description of EREs avoids unnecessary duplication of ERE details
+because their meanings within a \fIlex\fP ERE are the
+same as those for EREs in this volume of IEEE\ Std\ 1003.1-2001.
+.LP
+The reason for the undefined condition associated with text beginning
+with a <blank> or within \fB"%{"\fP and
+\fB"%}"\fP delimiter lines appearing in the \fIRules\fP section is
+historical practice. Both the BSD and System V \fIlex\fP
+copy the indented (or enclosed) input in the \fIRules\fP section (except
+at the beginning) to unreachable areas of the
+\fIyylex\fP() function (the code is written directly after a \fIbreak\fP
+statement). In some cases, the System V \fIlex\fP generates an error
+message or a syntax error, depending on the form of indented
+input.
+.LP
+The intention in breaking the list of functions into those that may
+appear in \fBlex.yy.c\fP \fIversus\fP those that only
+appear in \fBlibl.a\fP is that only those functions in \fBlibl.a\fP
+can be reliably redefined by a conforming application.
+.LP
+The descriptions of standard output and standard error are somewhat
+complicated because historical \fIlex\fP implementations
+chose to issue diagnostic messages to standard output (unless \fB-t\fP
+was given). IEEE\ Std\ 1003.1-2001 allows this
+behavior, but leaves an opening for the more expected behavior of
+using standard error for diagnostics. Also, the System V behavior
+of writing the statistics when any table sizes are given is allowed,
+while BSD-derived systems can avoid it. The programmer can
+always precisely obtain the desired results by using either the \fB-t\fP
+or the \fB-n\fP option.
+.LP
+The OPERANDS section does not mention the use of \fB-\fP as a synonym
+for standard input; not all historical implementations
+support such usage for any of the \fIfile\fP operands.
+.LP
+A description of the \fItranslation table\fP was deleted from early
+proposals because of its relatively low usage in historical
+applications.
+.LP
+The change to the definition of the \fIinput\fP() function that allows
+buffering of input presents the opportunity for major
+performance gains in some applications.
+.LP
+The following examples clarify the differences between \fIlex\fP regular
+expressions and regular expressions appearing
+elsewhere in this volume of IEEE\ Std\ 1003.1-2001. For regular expressions
+of the form \fB"r/x"\fP, the string
+matching \fIr\fP is always returned; confusion may arise when the
+beginning of \fIx\fP matches the trailing portion of \fIr\fP.
+For example, given the regular expression \fB"a*b/cc"\fP and the input
+\fB"aaabcc"\fP, \fIyytext\fP would contain the
+string \fB"aaab"\fP on this match. But given the regular expression
+\fB"x*/xy"\fP and the input \fB"xxxy"\fP, the token
+\fBxxx\fP, not \fBxx\fP, is returned by some implementations because
+\fBxxx\fP matches \fB"x*"\fP.
+.LP
+In the rule \fB"ab*/bc"\fP, the \fB"b*"\fP at the end of \fIr\fP
+extends \fIr\fP's match into the beginning of the
+trailing context, so the result is unspecified. If this rule were
+\fB"ab/bc"\fP, however, the rule matches the text
+\fB"ab"\fP when it is followed by the text \fB"bc"\fP. In this latter
+case, the matching of \fIr\fP cannot extend into the
+beginning of \fIx\fP, so the result is specified.
+.SH FUTURE DIRECTIONS
+.LP
+None.
+.SH SEE ALSO
+.LP
+\fIc99\fP, \fIed\fP, \fIyacc\fP
+.SH COPYRIGHT
+Portions of this text are reprinted and reproduced in electronic form
+from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
+-- Portable Operating System Interface (POSIX), The Open Group Base
+Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
+Electrical and Electronics Engineers, Inc and The Open Group. In the
+event of any discrepancy between this version and the original IEEE and
+The Open Group Standard, the original IEEE and The Open Group Standard
+is the referee document. The original Standard can be obtained online at
+http://www.opengroup.org/unix/online.html .