diff options
Diffstat (limited to 'man-pages-posix-2003/man1p/yacc.1p')
-rw-r--r-- | man-pages-posix-2003/man1p/yacc.1p | 1290 |
1 files changed, 1290 insertions, 0 deletions
diff --git a/man-pages-posix-2003/man1p/yacc.1p b/man-pages-posix-2003/man1p/yacc.1p new file mode 100644 index 0000000..bfda7ac --- /dev/null +++ b/man-pages-posix-2003/man1p/yacc.1p @@ -0,0 +1,1290 @@ +.\" Copyright (c) 2001-2003 The Open Group, All Rights Reserved +.TH "YACC" 1P 2003 "IEEE/The Open Group" "POSIX Programmer's Manual" +.\" yacc +.SH PROLOG +This manual page is part of the POSIX Programmer's Manual. +The Linux implementation of this interface may differ (consult +the corresponding Linux manual page for details of Linux behavior), +or the interface may not be implemented on Linux. +.SH NAME +yacc \- yet another compiler compiler (\fBDEVELOPMENT\fP) +.SH SYNOPSIS +.LP +\fByacc\fP \fB[\fP\fB-dltv\fP\fB][\fP\fB-b\fP \fIfile_prefix\fP\fB][\fP\fB-p\fP +\fIsym_prefix\fP\fB]\fP \fIgrammar\fP\fB\fP +.SH DESCRIPTION +.LP +The \fIyacc\fP utility shall read a description of a context-free +grammar in \fIgrammar\fP and write C source code, conforming +to the ISO\ C standard, to a code file, and optionally header information +into a header file, in the current directory. The C +code shall define a function and related routines and macros for an +automaton that executes a parsing algorithm meeting the +requirements in Algorithms . +.LP +The form and meaning of the grammar are described in the EXTENDED +DESCRIPTION section. +.LP +The C source code and header file shall be produced in a form suitable +as input for the C compiler (see \fIc99\fP ). +.SH OPTIONS +.LP +The \fIyacc\fP utility shall conform to the Base Definitions volume +of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility Syntax Guidelines. +.LP +The following options shall be supported: +.TP 7 +\fB-b\ \fP \fIfile_prefix\fP +Use \fIfile_prefix\fP instead of \fBy\fP as the prefix for all output +filenames. The code file \fBy.tab.c\fP, the header +file \fBy.tab.h\fP (created when \fB-d\fP is specified), and the description +file \fBy.output\fP (created when \fB-v\fP is +specified), shall be changed to \fIfile_prefix\fP \fB.tab.c\fP, \fIfile_prefix\fP +\fB\&.tab.h\fP, and \fIfile_prefix\fP +\fB\&.output\fP, respectively. +.TP 7 +\fB-d\fP +Write the header file; by default only the code file is written. The +\fB#define\fP statements associate the token codes +assigned by \fIyacc\fP with the user-declared token names. This allows +source files other than \fBy.tab.c\fP to access the token +codes. +.TP 7 +\fB-l\fP +Produce a code file that does not contain any \fB#line\fP constructs. +If this option is not present, it is unspecified whether +the code file or header file contains \fB#line\fP directives. This +should only be used after the grammar and the associated +actions are fully debugged. +.TP 7 +\fB-p\ \fP \fIsym_prefix\fP +.sp +Use \fIsym_prefix\fP instead of \fByy\fP as the prefix for all external +names produced by \fIyacc\fP. The names affected shall +include the functions \fIyyparse\fP(), \fIyylex\fP(), and \fIyyerror\fP(), +and the variables \fIyylval\fP, \fIyychar\fP, and +\fIyydebug\fP. (In the remainder of this section, the six symbols +cited are referenced using their default names only as a +notational convenience.) Local names may also be affected by the \fB-p\fP +option; however, the \fB-p\fP option shall not affect +\fB#define\fP symbols generated by \fIyacc\fP. +.TP 7 +\fB-t\fP +Modify conditional compilation directives to permit compilation of +debugging code in the code file. Runtime debugging +statements shall always be contained in the code file, but by default +conditional compilation directives prevent their +compilation. +.TP 7 +\fB-v\fP +Write a file containing a description of the parser and a report of +conflicts generated by ambiguities in the grammar. +.sp +.SH OPERANDS +.LP +The following operand is required: +.TP 7 +\fIgrammar\fP +A pathname of a file containing instructions, hereafter called \fIgrammar\fP, +for which a parser is to be created. The format +for the grammar is described in the EXTENDED DESCRIPTION section. +.sp +.SH STDIN +.LP +Not used. +.SH INPUT FILES +.LP +The file \fIgrammar\fP shall be a text file formatted as specified +in the EXTENDED DESCRIPTION section. +.SH ENVIRONMENT VARIABLES +.LP +The following environment variables shall affect the execution of +\fIyacc\fP: +.TP 7 +\fILANG\fP +Provide a default value for the internationalization variables that +are unset or null. (See the Base Definitions volume of +IEEE\ Std\ 1003.1-2001, Section 8.2, Internationalization Variables +for +the precedence of internationalization variables used to determine +the values of locale categories.) +.TP 7 +\fILC_ALL\fP +If set to a non-empty string value, override the values of all the +other internationalization variables. +.TP 7 +\fILC_CTYPE\fP +Determine the locale for the interpretation of sequences of bytes +of text data as characters (for example, single-byte as +opposed to multi-byte characters in arguments and input files). +.TP 7 +\fILC_MESSAGES\fP +Determine the locale that should be used to affect the format and +contents of diagnostic messages written to standard +error. +.TP 7 +\fINLSPATH\fP +Determine the location of message catalogs for the processing of \fILC_MESSAGES +\&.\fP +.sp +.LP +The \fILANG\fP and \fILC_*\fP variables affect the execution of the +\fIyacc\fP utility as stated. The \fImain\fP() function +defined in Yacc Library shall call: +.sp +.RS +.nf + +\fBsetlocale(LC_ALL, "") +\fP +.fi +.RE +.LP +and thus the program generated by \fIyacc\fP shall also be affected +by the contents of these variables at runtime. +.SH ASYNCHRONOUS EVENTS +.LP +Default. +.SH STDOUT +.LP +Not used. +.SH STDERR +.LP +If shift/reduce or reduce/reduce conflicts are detected in \fIgrammar\fP, +\fIyacc\fP shall write a report of those conflicts +to the standard error in an unspecified format. +.LP +Standard error shall also be used for diagnostic messages. +.SH OUTPUT FILES +.LP +The code file, the header file, and the description file shall be +text files. All are described in the following sections. +.SS Code File +.LP +This file shall contain the C source code for the \fIyyparse\fP() +function. It shall contain code for the various semantic +actions with macro substitution performed on them as described in +the EXTENDED DESCRIPTION section. It also shall contain a copy of +the \fB#define\fP statements in the header file. If a \fB%union\fP +declaration is used, the declaration for YYSTYPE shall also be +included in this file. +.SS Header File +.LP +The header file shall contain \fB#define\fP statements that associate +the token numbers with the token names. This allows +source files other than the code file to access the token codes. If +a \fB%union\fP declaration is used, the declaration for +YYSTYPE and an \fIextern YYSTYPE yylval\fP declaration shall also +be included in this file. +.SS Description File +.LP +The description file shall be a text file containing a description +of the state machine corresponding to the parser, using an +unspecified format. Limits for internal tables (see Limits ) shall +also be reported, in an +implementation-defined manner. (Some implementations may use dynamic +allocation techniques and have no specific limit values to +report.) +.SH EXTENDED DESCRIPTION +.LP +The \fIyacc\fP command accepts a language that is used to define a +grammar for a target language to be parsed by the tables and +code generated by \fIyacc\fP. The language accepted by \fIyacc\fP +as a grammar for the target language is described below using +the \fIyacc\fP input language itself. +.LP +The input \fIgrammar\fP includes rules describing the input structure +of the target language and code to be invoked when these +rules are recognized to provide the associated semantic action. The +code to be executed shall appear as bodies of text that are +intended to be C-language code. The C-language inclusions are presumed +to form a correct function when processed by \fIyacc\fP +into its output files. The code included in this way shall be executed +during the recognition of the target language. +.LP +Given a grammar, the \fIyacc\fP utility generates the files described +in the OUTPUT FILES section. The code file can be +compiled and linked using \fIc99\fP. If the declaration and programs +sections of the grammar +file did not include definitions of \fImain\fP(), \fIyylex\fP(), and +\fIyyerror\fP(), the compiled output requires linking with +externally supplied versions of those functions. Default versions +of \fImain\fP() and \fIyyerror\fP() are supplied in the +\fIyacc\fP library and can be linked in by using the \fB-l\ y\fP operand +to \fIc99\fP. +The \fIyacc\fP library interfaces need not support interfaces with +other than the default \fByy\fP symbol prefix. The application +provides the lexical analyzer function, \fIyylex\fP(); the \fIlex\fP +utility is specifically +designed to generate such a routine. +.SS Input Language +.LP +The application shall ensure that every specification file consists +of three sections in order: \fIdeclarations\fP, \fIgrammar +rules\fP, and \fIprograms\fP, separated by double percent signs ( +\fB"%%"\fP ). The declarations and programs sections can be +empty. If the latter is empty, the preceding \fB"%%"\fP mark separating +it from the rules section can be omitted. +.LP +The input is free form text following the structure of the grammar +defined below. +.SS Lexical Structure of the Grammar +.LP +The <blank>s, <newline>s, and <form-feed>s shall be ignored, except +that the application shall ensure that +they do not appear in names or multi-character reserved symbols. Comments +shall be enclosed in \fB"/*\ ...\ */"\fP, and +can appear wherever a name is valid. +.LP +Names are of arbitrary length, made up of letters, periods ( \fB'.'\fP +), underscores ( \fB'_'\fP ), and non-initial +digits. Uppercase and lowercase letters are distinct. Conforming applications +shall not use names beginning in \fByy\fP or +\fBYY\fP since the \fIyacc\fP parser uses such names. Many of the +names appear in the final output of \fIyacc\fP, and thus they +should be chosen to conform with any additional rules created by the +C compiler to be used. In particular they appear in +\fB#define\fP statements. +.LP +A literal shall consist of a single character enclosed in single-quotes +( \fB'"\fP ). All of the escape sequences supported +for character constants by the ISO\ C standard shall be supported +by \fIyacc\fP. +.LP +The relationship with the lexical analyzer is discussed in detail +below. +.LP +The application shall ensure that the NUL character is not used in +grammar rules or literals. +.SS Declarations Section +.LP +The declarations section is used to define the symbols used to define +the target language and their relationship with each +other. In particular, much of the additional information required +to resolve ambiguities in the context-free grammar for the target +language is provided here. +.LP +Usually \fIyacc\fP assigns the relationship between the symbolic names +it generates and their underlying numeric value. The +declarations section makes it possible to control the assignment of +these values. +.LP +It is also possible to keep semantic information associated with the +tokens currently on the parse stack in a user-defined +C-language \fBunion\fP, if the members of the union are associated +with the various names in the grammar. The declarations section +provides for this as well. +.LP +The first group of declarators below all take a list of names as arguments. +That list can optionally be preceded by the name of +a C union member (called a \fItag\fP below) appearing within \fB'<'\fP +and \fB'>'\fP . (As an exception to the +typographical conventions of the rest of this volume of IEEE\ Std\ 1003.1-2001, +in this case <\fItag\fP> does not +represent a metavariable, but the literal angle bracket characters +surrounding a symbol.) The use of \fItag\fP specifies that the +tokens named on this line shall be of the same C type as the union +member referenced by \fItag\fP. This is discussed in more +detail below. +.LP +For lists used to define tokens, the first appearance of a given token +can be followed by a positive integer (as a string of +decimal digits). If this is done, the underlying value assigned to +it for lexical purposes shall be taken to be that number. +.LP +The following declares \fIname\fP to be a token: +.sp +.RS +.nf + +\fB%token\fP \fB[\fP\fB<\fP\fItag\fP\fB>\fP\fB]\fP \fIname\fP \fB[\fP\fInumber\fP\fB][\fP\fIname\fP \fB[\fP\fInumber\fP\fB]]\fP\fB... +\fP +.fi +.RE +.LP +If \fItag\fP is present, the C type for all tokens on this line shall +be declared to be the type referenced by \fItag\fP. If a +positive integer, \fInumber\fP, follows a \fIname\fP, that value shall +be assigned to the token. +.LP +The following declares \fIname\fP to be a token, and assigns precedence +to it: +.sp +.RS +.nf + +\fB%left\fP \fB[\fP\fB<\fP\fItag\fP\fB>\fP\fB]\fP \fIname\fP \fB[\fP\fInumber\fP\fB][\fP\fIname\fP \fB[\fP\fInumber\fP\fB]]\fP\fB... +%right\fP \fB[\fP\fB<\fP\fItag\fP\fB>\fP\fB]\fP \fIname\fP \fB[\fP\fInumber\fP\fB][\fP\fIname\fP \fB[\fP\fInumber\fP\fB]]\fP\fB... +\fP +.fi +.RE +.LP +One or more lines, each beginning with one of these symbols, can appear +in this section. All tokens on the same line have the +same precedence level and associativity; the lines are in order of +increasing precedence or binding strength. \fB%left\fP denotes +that the operators on that line are left associative, and \fB%right\fP +similarly denotes right associative operators. If +\fItag\fP is present, it shall declare a C type for \fIname\fPs as +described for \fB%token\fP. +.LP +The following declares \fIname\fP to be a token, and indicates that +this cannot be used associatively: +.sp +.RS +.nf + +\fB%nonassoc\fP \fB[\fP\fB<\fP\fItag\fP\fB>\fP\fB]\fP \fIname\fP \fB[\fP\fInumber\fP\fB][\fP\fIname\fP \fB[\fP\fInumber\fP\fB]]\fP\fB... +\fP +.fi +.RE +.LP +If the parser encounters associative use of this token it reports +an error. If \fItag\fP is present, it shall declare a C type +for \fIname\fPs as described for \fB%token\fP. +.LP +The following declares that union member \fIname\fPs are non-terminals, +and thus it is required to have a \fItag\fP field at +its beginning: +.sp +.RS +.nf + +\fB%type <\fP\fItag\fP\fB>\fP \fIname\fP\fB... +\fP +.fi +.RE +.LP +Because it deals with non-terminals only, assigning a token number +or using a literal is also prohibited. If this construct is +present, \fIyacc\fP shall perform type checking; if this construct +is not present, the parse stack shall hold only the \fBint\fP +type. +.LP +Every name used in \fIgrammar\fP not defined by a \fB%token\fP, \fB%left\fP, +\fB%right\fP, or \fB%nonassoc\fP declaration +is assumed to represent a non-terminal symbol. The \fIyacc\fP utility +shall report an error for any non-terminal symbol that does +not appear on the left side of at least one grammar rule. +.LP +Once the type, precedence, or token number of a name is specified, +it shall not be changed. If the first declaration of a token +does not assign a token number, \fIyacc\fP shall assign a token number. +Once this assignment is made, the token number shall not +be changed by explicit assignment. +.LP +The following declarators do not follow the previous pattern. +.LP +The following declares the non-terminal \fIname\fP to be the \fIstart +symbol\fP, which represents the largest, most general +structure described by the grammar rules: +.sp +.RS +.nf + +\fB%start\fP \fIname\fP +.fi +.RE +.LP +By default, it is the left-hand side of the first grammar rule; this +default can be overridden with this declaration. +.LP +The following declares the \fIyacc\fP value stack to be a union of +the various types of values desired: +.sp +.RS +.nf + +\fB%union {\fP \fIbody of union\fP \fB(\fP\fIin C\fP\fB) } +\fP +.fi +.RE +.LP +By default, the values returned by actions (see below) and the lexical +analyzer shall be of type \fBint\fP. The \fIyacc\fP +utility keeps track of types, and it shall insert corresponding union +member names in order to perform strict type checking of the +resulting parser. +.LP +Alternatively, given that at least one <\fItag\fP> construct is used, +the union can be declared in a header file (which +shall be included in the declarations section by using a \fB#include\fP +construct within \fB%{\fP and \fB%}\fP), and a +\fBtypedef\fP used to define the symbol YYSTYPE to represent this +union. The effect of \fB%union\fP is to provide the declaration +of YYSTYPE directly from the \fIyacc\fP input. +.LP +C-language declarations and definitions can appear in the declarations +section, enclosed by the following marks: +.sp +.RS +.nf + +\fB%{ ... %} +\fP +.fi +.RE +.LP +These statements shall be copied into the code file, and have global +scope within it so that they can be used in the rules and +program sections. +.LP +The application shall ensure that the declarations section is terminated +by the token \fB%%\fP. +.SS Grammar Rules in yacc +.LP +The rules section defines the context-free grammar to be accepted +by the function \fIyacc\fP generates, and associates with +those rules C-language actions and additional precedence information. +The grammar is described below, and a formal definition +follows. +.LP +The rules section is comprised of one or more grammar rules. A grammar +rule has the form: +.sp +.RS +.nf + +\fBA : BODY ; +\fP +.fi +.RE +.LP +The symbol \fBA\fP represents a non-terminal name, and \fBBODY\fP +represents a sequence of zero or more \fIname\fPs, +\fIliteral\fPs, and \fIsemantic action\fPs that can then be followed +by optional \fIprecedence rule\fPs. Only the names and +literals participate in the formation of the grammar; the semantic +actions and precedence rules are used in other ways. The colon +and the semicolon are \fIyacc\fP punctuation. If there are several +successive grammar rules with the same left-hand side, the +vertical bar \fB'|'\fP can be used to avoid rewriting the left-hand +side; in this case the semicolon appears only after the last +rule. The BODY part can be empty (or empty of names and literals) +to indicate that the non-terminal symbol matches the empty +string. +.LP +The \fIyacc\fP utility assigns a unique number to each rule. Rules +using the vertical bar notation are distinct rules. The +number assigned to the rule appears in the description file. +.LP +The elements comprising a BODY are: +.TP 7 +\fIname\fP,\ \fIliteral\fP +These form the rules of the grammar: \fIname\fP is either a \fItoken\fP +or a \fInon-terminal\fP; \fIliteral\fP stands for +itself (less the lexically required quotation marks). +.TP 7 +\fIsemantic action\fP +.sp +With each grammar rule, the user can associate actions to be performed +each time the rule is recognized in the input process. (Note +that the word "action" can also refer to the actions of the parser-shift, +reduce, and so on.) +.LP +These actions can return values and can obtain the values returned +by previous actions. These values are kept in objects of type +YYSTYPE (see \fB%union\fP). The result value of the action shall be +kept on the parse stack with the left-hand side of the rule, +to be accessed by other reductions as part of their right-hand side. +By using the <\fItag\fP> information provided in the +declarations section, the code generated by \fIyacc\fP can be strictly +type checked and contain arbitrary information. In +addition, the lexical analyzer can provide the same kinds of values +for tokens, if desired. +.LP +An action is an arbitrary C statement and as such can do input or +output, call subprograms, and alter external variables. An +action is one or more C statements enclosed in curly braces \fB'{'\fP +and \fB'}'\fP . +.LP +Certain pseudo-variables can be used in the action. These are macros +for access to data structures known internally to +\fIyacc\fP. +.TP 7 +$$ +.RS +The value of the action can be set by assigning it to $$. If type +checking is enabled and the type of the value to be assigned +cannot be determined, a diagnostic message may be generated. +.RE +.TP 7 +$\fInumber\fP +.RS +This refers to the value returned by the component specified by the +token \fInumber\fP in the right side of a rule, reading +from left to right; \fInumber\fP can be zero or negative. If \fInumber\fP +is zero or negative, it refers to the data associated +with the name on the parser's stack preceding the leftmost symbol +of the current rule. (That is, \fB"$0"\fP refers to the name +immediately preceding the leftmost name in the current rule to be +found on the parser's stack and \fB"$-1"\fP refers to the +symbol to \fIits\fP left.) If \fInumber\fP refers to an element past +the current point in the rule, or beyond the bottom of the +stack, the result is undefined. If type checking is enabled and the +type of the value to be assigned cannot be determined, a +diagnostic message may be generated. +.RE +.TP 7 +$<\fItag\fP>\fInumber\fP +.RS +.sp +These correspond exactly to the corresponding symbols without the +\fItag\fP inclusion, but allow for strict type checking (and +preclude unwanted type conversions). The effect is that the macro +is expanded to use \fItag\fP to select an element from the +YYSTYPE union (using \fIdataname.tag\fP). This is particularly useful +if \fInumber\fP is not positive. +.RE +.TP 7 +$<\fItag\fP>$ +.RS +This imposes on the reference the type of the union member referenced +by \fItag\fP. This construction is applicable when a +reference to a left context value occurs in the grammar, and provides +\fIyacc\fP with a means for selecting a type. +.RE +.sp +.LP +Actions can occur anywhere in a rule (not just at the end); an action +can access values returned by actions to its left, and in +turn the value it returns can be accessed by actions to its right. +An action appearing in the middle of a rule shall be equivalent +to replacing the action with a new non-terminal symbol and adding +an empty rule with that non-terminal symbol on the left-hand +side. The semantic action associated with the new rule shall be equivalent +to the original action. The use of actions within rules +might introduce conflicts that would not otherwise exist. +.LP +By default, the value of a rule shall be the value of the first element +in it. If the first element does not have a type +(particularly in the case of a literal) and type checking is turned +on by \fB%type\fP, an error message shall result. +.TP 7 +\fIprecedence\fP +The keyword \fB%prec\fP can be used to change the precedence level +associated with a particular grammar rule. Examples of this +are in cases where a unary and binary operator have the same symbolic +representation, but need to be given different precedences, +or where the handling of an ambiguous if-else construction is necessary. +The reserved symbol \fB%prec\fP can appear immediately +after the body of the grammar rule and can be followed by a token +name or a literal. It shall cause the precedence of the grammar +rule to become that of the following token name or literal. The action +for the rule as a whole can follow \fB%prec\fP. +.sp +.LP +If a program section follows, the application shall ensure that the +grammar rules are terminated by \fB%%\fP. +.SS Programs Section +.LP +The \fIprograms\fP section can include the definition of the lexical +analyzer \fIyylex\fP(), and any other functions; for +example, those used in the actions specified in the grammar rules. +It is unspecified whether the programs section precedes or +follows the semantic actions in the output file; therefore, if the +application contains any macro definitions and declarations +intended to apply to the code in the semantic actions, it shall place +them within \fB"%{\ ...\ %}"\fP in the +declarations section. +.SS Input Grammar +.LP +The following input to \fIyacc\fP yields a parser for the input to +\fIyacc\fP. This formal syntax takes precedence over the +preceding text syntax description. +.LP +The lexical structure is defined less precisely; Lexical Structure +of the Grammar defines most +terms. The correspondence between the previous terms and the tokens +below is as follows. +.TP 7 +\fBIDENTIFIER\fP +This corresponds to the concept of \fIname\fP, given previously. It +also includes literals as defined previously. +.TP 7 +\fBC_IDENTIFIER\fP +This is a name, and additionally it is known to be followed by a colon. +A literal cannot yield this token. +.TP 7 +\fBNUMBER\fP +A string of digits (a non-negative decimal integer). +.TP 7 +\fBTYPE\fP,\ \fBLEFT\fP,\ \fBMARK\fP,\ \fBLCURL\fP,\ \fBRCURL\fP +.sp +These correspond directly to \fB%type\fP, \fB%left\fP, \fB%%\fP, \fB%{\fP, +and \fB%}\fP. +.TP 7 +\fB{\ ...\ }\fP +This indicates C-language source code, with the possible inclusion +of \fB'$'\fP macros as discussed previously. +.sp +.sp +.RS +.nf + +\fB/* Grammar for the input to yacc. */ +/* Basic entries. */ +/* The following are recognized by the lexical analyzer. */ +.sp + +%token IDENTIFIER /* Includes identifiers and literals */ +%token C_IDENTIFIER /* identifier (but not literal) + followed by a :. */ +%token NUMBER /* [0-9][0-9]* */ +.sp + +/* Reserved words : %type=>TYPE %left=>LEFT, and so on */ +.sp + +%token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION +.sp + +%token MARK /* The %% mark. */ +%token LCURL /* The %{ mark. */ +%token RCURL /* The %} mark. */ +.sp + +/* 8-bit character literals stand for themselves; */ +/* tokens have to be defined for multi-byte characters. */ +.sp + +%start spec +.sp + +%% +.sp + +spec : defs MARK rules tail + ; +tail : MARK + { + /* In this action, set up the rest of the file. */ + } + | /* Empty; the second MARK is optional. */ + ; +defs : /* Empty. */ + | defs def + ; +def : START IDENTIFIER + | UNION + { + /* Copy union definition to output. */ + } + | LCURL + { + /* Copy C code to output file. */ + } + RCURL + | rword tag nlist + ; +rword : TOKEN + | LEFT + | RIGHT + | NONASSOC + | TYPE + ; +tag : /* Empty: union tag ID optional. */ + | '<' IDENTIFIER '>' + ; +nlist : nmno + | nlist nmno + ; +nmno : IDENTIFIER /* Note: literal invalid with % type. */ + | IDENTIFIER NUMBER /* Note: invalid with % type. */ + ; +.sp + +/* Rule section */ +.sp + +rules : C_IDENTIFIER rbody prec + | rules rule + ; +rule : C_IDENTIFIER rbody prec + | '|' rbody prec + ; +rbody : /* empty */ + | rbody IDENTIFIER + | rbody act + ; +act : '{' + { + /* Copy action, translate $$, and so on. */ + } + '}' + ; +prec : /* Empty */ + | PREC IDENTIFIER + | PREC IDENTIFIER act + | prec ';' + ; +\fP +.fi +.RE +.SS Conflicts +.LP +The parser produced for an input grammar may contain states in which +conflicts occur. The conflicts occur because the grammar is +not LALR(1). An ambiguous grammar always contains at least one LALR(1) +conflict. The \fIyacc\fP utility shall resolve all +conflicts, using either default rules or user-specified precedence +rules. +.LP +Conflicts are either shift/reduce conflicts or reduce/reduce conflicts. +A shift/reduce conflict is where, for a given state and +lookahead symbol, both a shift action and a reduce action are possible. +A reduce/reduce conflict is where, for a given state and +lookahead symbol, reductions by two different rules are possible. +.LP +The rules below describe how to specify what actions to take when +a conflict occurs. Not all shift/reduce conflicts can be +successfully resolved this way because the conflict may be due to +something other than ambiguity, so incautious use of these +facilities can cause the language accepted by the parser to be much +different from that which was intended. The description file +shall contain sufficient information to understand the cause of the +conflict. Where ambiguity is the reason either the default or +explicit rules should be adequate to produce a working parser. +.LP +The declared precedences and associativities (see Declarations Section +) are used to resolve +parsing conflicts as follows: +.IP " 1." 4 +A precedence and associativity is associated with each grammar rule; +it is the precedence and associativity of the last token or +literal in the body of the rule. If the \fB%prec\fP keyword is used, +it overrides this default. Some grammar rules might not have +both precedence and associativity. +.LP +.IP " 2." 4 +If there is a shift/reduce conflict, and both the grammar rule and +the input symbol have precedence and associativity associated +with them, then the conflict is resolved in favor of the action (shift +or reduce) associated with the higher precedence. If the +precedences are the same, then the associativity is used; left associative +implies reduce, right associative implies shift, and +non-associative implies an error in the string being parsed. +.LP +.IP " 3." 4 +When there is a shift/reduce conflict that cannot be resolved by rule +2, the shift is done. Conflicts resolved this way are +counted in the diagnostic output described in Error Handling . +.LP +.IP " 4." 4 +When there is a reduce/reduce conflict, a reduction is done by the +grammar rule that occurs earlier in the input sequence. +Conflicts resolved this way are counted in the diagnostic output described +in Error Handling . +.LP +.LP +Conflicts resolved by precedence or associativity shall not be counted +in the shift/reduce and reduce/reduce conflicts reported +by \fIyacc\fP on either standard error or in the description file. +.SS Error Handling +.LP +The token \fBerror\fP shall be reserved for error handling. The name +\fBerror\fP can be used in grammar rules. It indicates +places where the parser can recover from a syntax error. The default +value of \fBerror\fP shall be 256. Its value can be changed +using a \fB%token\fP declaration. The lexical analyzer should not +return the value of \fBerror\fP. +.LP +The parser shall detect a syntax error when it is in a state where +the action associated with the lookahead symbol is +\fBerror\fP. A semantic action can cause the parser to initiate error +handling by executing the macro YYERROR. When YYERROR is +executed, the semantic action passes control back to the parser. YYERROR +cannot be used outside of semantic actions. +.LP +When the parser detects a syntax error, it normally calls \fIyyerror\fP() +with the character string +\fB"syntax\ error"\fP as its argument. The call shall not be made +if the parser is still recovering from a previous error +when the error is detected. The parser is considered to be recovering +from a previous error until the parser has shifted over at +least three normal input symbols since the last error was detected +or a semantic action has executed the macro \fIyyerrok\fP. The +parser shall not call \fIyyerror\fP() when YYERROR is executed. +.LP +The macro function YYRECOVERING shall return 1 if a syntax error has +been detected and the parser has not yet fully recovered +from it. Otherwise, zero shall be returned. +.LP +When a syntax error is detected by the parser, the parser shall check +if a previous syntax error has been detected. If a +previous error was detected, and if no normal input symbols have been +shifted since the preceding error was detected, the parser +checks if the lookahead symbol is an endmarker (see Interface to the +Lexical Analyzer ). If it is, +the parser shall return with a non-zero value. Otherwise, the lookahead +symbol shall be discarded and normal parsing shall +resume. +.LP +When YYERROR is executed or when the parser detects a syntax error +and no previous error has been detected, or at least one +normal input symbol has been shifted since the previous error was +detected, the parser shall pop back one state at a time until the +parse stack is empty or the current state allows a shift over \fBerror\fP. +If the parser empties the parse stack, it shall return +with a non-zero value. Otherwise, it shall shift over \fBerror\fP +and then resume normal parsing. If the parser reads a lookahead +symbol before the error was detected, that symbol shall still be the +lookahead symbol when parsing is resumed. +.LP +The macro \fIyyerrok\fP in a semantic action shall cause the parser +to act as if it has fully recovered from any previous +errors. The macro \fIyyclearin\fP shall cause the parser to discard +the current lookahead token. If the current lookahead token +has not yet been read, \fIyyclearin\fP shall have no effect. +.LP +The macro YYACCEPT shall cause the parser to return with the value +zero. The macro YYABORT shall cause the parser to return with +a non-zero value. +.SS Interface to the Lexical Analyzer +.LP +The \fIyylex\fP() function is an integer-valued function that returns +a \fItoken number\fP representing the kind of token +read. If there is a value associated with the token returned by \fIyylex\fP() +(see the discussion of \fItag\fP above), it shall +be assigned to the external variable \fIyylval\fP. +.LP +If the parser and \fIyylex\fP() do not agree on these token numbers, +reliable communication between them cannot occur. For +(single-byte character) literals, the token is simply the numeric +value of the character in the current character set. The numbers +for other tokens can either be chosen by \fIyacc\fP, or chosen by +the user. In either case, the \fB#define\fP construct of C is +used to allow \fIyylex\fP() to return these numbers symbolically. +The \fB#define\fP statements are put into the code file, and +the header file if that file is requested. The set of characters permitted +by \fIyacc\fP in an identifier is larger than that +permitted by C. Token names found to contain such characters shall +not be included in the \fB#define\fP declarations. +.LP +If the token numbers are chosen by \fIyacc\fP, the tokens other than +literals shall be assigned numbers greater than 256, +although no order is implied. A token can be explicitly assigned a +number by following its first appearance in the declarations +section with a number. Names and literals not defined this way retain +their default definition. All token numbers assigned by +\fIyacc\fP shall be unique and distinct from the token numbers used +for literals and user-assigned tokens. If duplicate token +numbers cause conflicts in parser generation, \fIyacc\fP shall report +an error; otherwise, it is unspecified whether the token +assignment is accepted or an error is reported. +.LP +The end of the input is marked by a special token called the \fIendmarker\fP, +which has a token number that is zero or +negative. (These values are invalid for any other token.) All lexical +analyzers shall return zero or negative as a token number +upon reaching the end of their input. If the tokens up to, but excluding, +the endmarker form a structure that matches the start +symbol, the parser shall accept the input. If the endmarker is seen +in any other context, it shall be considered an error. +.SS Completing the Program +.LP +In addition to \fIyyparse\fP() and \fIyylex\fP(), the functions \fIyyerror\fP() +and \fImain\fP() are required to make a +complete program. The application can supply \fImain\fP() and \fIyyerror\fP(), +or those routines can be obtained from the +\fIyacc\fP library. +.SS Yacc Library +.LP +The following functions shall appear only in the \fIyacc\fP library +accessible through the \fB-l\ y\fP operand to \fIc99\fP; they can +therefore be redefined by a conforming application: +.TP 7 +\fBint\ \fP \fImain\fP(\fBvoid\fP) +.sp +This function shall call \fIyyparse\fP() and exit with an unspecified +value. Other actions within this function are +unspecified. +.TP 7 +\fBint\ \fP \fIyyerror\fP(\fBconst\ char\fP\ *\fIs\fP) +.sp +This function shall write the NUL-terminated argument to standard +error, followed by a <newline>. +.sp +.LP +The order of the \fB-l\ y\fP and \fB-l\ l\fP operands given to \fIc99\fP +is +significant; the application shall either provide its own \fImain\fP() +function or ensure that \fB-l\ y\fP precedes +\fB-l\ l\fP. +.SS Debugging the Parser +.LP +The parser generated by \fIyacc\fP shall have diagnostic facilities +in it that can be optionally enabled at either compile time +or at runtime (if enabled at compile time). The compilation of the +runtime debugging code is under the control of YYDEBUG, a +preprocessor symbol. If YYDEBUG has a non-zero value, the debugging +code shall be included. If its value is zero, the code shall +not be included. +.LP +In parsers where the debugging code has been included, the external +\fBint\fP \fIyydebug\fP can be used to turn debugging on +(with a non-zero value) and off (zero value) at runtime. The initial +value of \fIyydebug\fP shall be zero. +.LP +When \fB-t\fP is specified, the code file shall be built such that, +if YYDEBUG is not already defined at compilation time +(using the \fIc99\fP \fB-D\fP YYDEBUG option, for example), YYDEBUG +shall be set explicitly +to 1. When \fB-t\fP is not specified, the code file shall be built +such that, if YYDEBUG is not already defined, it shall be set +explicitly to zero. +.LP +The format of the debugging output is unspecified but includes at +least enough information to determine the shift and reduce +actions, and the input symbols. It also provides information about +error recovery. +.SS Algorithms +.LP +The parser constructed by \fIyacc\fP implements an LALR(1) parsing +algorithm as documented in the literature. It is unspecified +whether the parser is table-driven or direct-coded. +.LP +A parser generated by \fIyacc\fP shall never request an input symbol +from \fIyylex\fP() while in a state where the only +actions other than the error action are reductions by a single rule. +.LP +The literature of parsing theory defines these concepts. +.SS Limits +.LP +The \fIyacc\fP utility may have several internal tables. The minimum +maximums for these tables are shown in the following +table. The exact meaning of these values is implementation-defined. +The implementation shall define the relationship between these +values and between them and any error messages that the implementation +may generate should it run out of space for any internal +structure. An implementation may combine groups of these resources +into a single pool as long as the total available to the user +does not fall below the sum of the sizes specified by this section. +.br +.sp +.ce 1 +\fBTable: Internal Limits in \fIyacc\fP\fP +.TS C +center; l l lw(40). +\fB\ \fP \fBMinimum\fP T{ +.na +\fB\ \fP +.ad +T} +\fBLimit\fP \fBMaximum\fP T{ +.na +\fBDescription\fP +.ad +T} +{NTERMS} 126 T{ +.na +Number of tokens. +.ad +T} +{NNONTERM} 200 T{ +.na +Number of non-terminals. +.ad +T} +{NPROD} 300 T{ +.na +Number of rules. +.ad +T} +{NSTATES} 600 T{ +.na +Number of states. +.ad +T} +{MEMSIZE} 5200 T{ +.na +Length of rules. The total length, in names (tokens and non-terminals), of all the rules of the grammar. The left-hand side is counted for each rule, even if it is not explicitly repeated, as specified in Grammar Rules in yacc . +.ad +T} +{ACTSIZE} 4000 T{ +.na +Number of actions. "Actions" here (and in the description file) refer to parser actions (shift, reduce, and so on) not to semantic actions defined in Grammar Rules in yacc . +.ad +T} +.TE +.SH EXIT STATUS +.LP +The following exit values shall be returned: +.TP 7 +\ 0 +Successful completion. +.TP 7 +>0 +An error occurred. +.sp +.SH CONSEQUENCES OF ERRORS +.LP +If any errors are encountered, the run is aborted and \fIyacc\fP exits +with a non-zero status. Partial code files and header +files may be produced. The summary information in the description +file shall always be produced if the \fB-v\fP flag is +present. +.LP +\fIThe following sections are informative.\fP +.SH APPLICATION USAGE +.LP +Historical implementations experience name conflicts on the names +\fByacc.tmp\fP, \fByacc.acts\fP, \fByacc.debug\fP, +\fBy.tab.c\fP, \fBy.tab.h\fP, and \fBy.output\fP if more than one +copy of \fIyacc\fP is running in a single directory at one +time. The \fB-b\fP option was added to overcome this problem. The +related problem of allowing multiple \fIyacc\fP parsers to be +placed in the same file was addressed by adding a \fB-p\fP option +to override the previously hard-coded \fByy\fP variable +prefix. +.LP +The description of the \fB-p\fP option specifies the minimal set of +function and variable names that cause conflict when +multiple parsers are linked together. YYSTYPE does not need to be +changed. Instead, the programmer can use \fB-b\fP to give the +header files for different parsers different names, and then the file +with the \fIyylex\fP() for a given parser can include the +header for that parser. Names such as \fIyyclearerr\fP do not need +to be changed because they are used only in the actions; they +do not have linkage. It is possible that an implementation has other +names, either internal ones for implementing things such as +\fIyyclearerr\fP, or providing non-standard features that it wants +to change with \fB-p\fP. +.LP +Unary operators that are the same token as a binary operator in general +need their precedence adjusted. This is handled by the +\fB%prec\fP advisory symbol associated with the particular grammar +rule defining that unary operator. (See Grammar Rules in yacc .) Applications +are not required to use this operator for unary operators, but the +grammars that do not require it are rare. +.SH EXAMPLES +.LP +Access to the \fIyacc\fP library is obtained with library search operands +to \fIc99\fP. To +use the \fIyacc\fP library \fImain\fP(): +.sp +.RS +.nf + +\fBc99 y.tab.c -l y +\fP +.fi +.RE +.LP +Both the \fIlex\fP library and the \fIyacc\fP library contain \fImain\fP(). +To access the +\fIyacc\fP \fImain\fP(): +.sp +.RS +.nf + +\fBc99 y.tab.c lex.yy.c -l y -l l +\fP +.fi +.RE +.LP +This ensures that the \fIyacc\fP library is searched first, so that +its \fImain\fP() is used. +.LP +The historical \fIyacc\fP libraries have contained two simple functions +that are normally coded by the application programmer. +These functions are similar to the following code: +.sp +.RS +.nf + +\fB#include <locale.h> +int main(void) +{ + extern int yyparse(); +.sp + + setlocale(LC_ALL, ""); +.sp + + /* If the following parser is one created by lex, the + application must be careful to ensure that LC_CTYPE + and LC_COLLATE are set to the POSIX locale. */ + (void) yyparse(); + return (0); +} +.sp + +#include <stdio.h> +.sp + +int yyerror(const char *msg) +{ + (void) fprintf(stderr, "%s\\n", msg); + return (0); +} +\fP +.fi +.RE +.SH RATIONALE +.LP +The references in may be helpful in constructing the parser generator. +The referenced DeRemer and Pennello article (along with +the works it references) describes a technique to generate parsers +that conform to this volume of IEEE\ Std\ 1003.1-2001. +Work in this area continues to be done, so implementors should consult +current literature before doing any new implementations. The +original Knuth article is the theoretical basis for this kind of parser, +but the tables it generates are impractically large for +reasonable grammars and should not be used. The "equivalent to" wording +is intentional to assure that the best tables that are +LALR(1) can be generated. +.LP +There has been confusion between the class of grammars, the algorithms +needed to generate parsers, and the algorithms needed to +parse the languages. They are all reasonably orthogonal. In particular, +a parser generator that accepts the full range of LR(1) +grammars need not generate a table any more complex than one that +accepts SLR(1) (a relatively weak class of LR grammars) for a +grammar that happens to be SLR(1). Such an implementation need not +recognize the case, either; table compression can yield the +SLR(1) table (or one even smaller than that) without recognizing that +the grammar is SLR(1). The speed of an LR(1) parser for any +class is dependent more upon the table representation and compression +(or the code generation if a direct parser is generated) than +upon the class of grammar that the table generator handles. +.LP +The speed of the parser generator is somewhat dependent upon the class +of grammar it handles. However, the original Knuth +article algorithms for constructing LR parsers were judged by its +author to be impractically slow at that time. Although full LR is +more complex than LALR(1), as computer speeds and algorithms improve, +the difference (in terms of acceptable wall-clock execution +time) is becoming less significant. +.LP +Potential authors are cautioned that the referenced DeRemer and Pennello +article previously cited identifies a bug (an +over-simplification of the computation of LALR(1) lookahead sets) +in some of the LALR(1) algorithm statements that preceded it to +publication. They should take the time to seek out that paper, as +well as current relevant work, particularly Aho's. +.LP +The \fB-b\fP option was added to provide a portable method for permitting +\fIyacc\fP to work on multiple separate parsers in +the same directory. If a directory contains more than one \fIyacc\fP +grammar, and both grammars are constructed at the same time +(by, for example, a parallel \fImake\fP program), conflict results. +While the solution is not +historical practice, it corrects a known deficiency in historical +implementations. Corresponding changes were made to all sections +that referenced the filenames \fBy.tab.c\fP (now "the code file"), +\fBy.tab.h\fP (now "the header file"), and \fBy.output\fP +(now "the description file"). +.LP +The grammar for \fIyacc\fP input is based on System V documentation. +The textual description shows there that the \fB';'\fP +is required at the end of the rule. The grammar and the implementation +do not require this. (The use of \fBC_IDENTIFIER\fP causes +a reduce to occur in the right place.) +.LP +Also, in that implementation, the constructs such as \fB%token\fP +can be terminated by a semicolon, but this is not permitted +by the grammar. The keywords such as \fB%token\fP can also appear +in uppercase, which is again not discussed. In most places where +\fB'%'\fP is used, \fB'\\'\fP can be substituted, and there are alternate +spellings for some of the symbols (for example, +\fB%LEFT\fP can be \fB"%<"\fP or even \fB"\\<"\fP ). +.LP +Historically, <\fItag\fP> can contain any characters except \fB'>'\fP, +including white space, in the +implementation. However, since the \fItag\fP must reference an ISO\ C +standard union member, in practice conforming +implementations need to support only the set of characters for ISO\ C +standard identifiers in this context. +.LP +Some historical implementations are known to accept actions that are +terminated by a period. Historical implementations often +allow \fB'$'\fP in names. A conforming implementation does not need +to support either of these behaviors. +.LP +Deciding when to use \fB%prec\fP illustrates the difficulty in specifying +the behavior of \fIyacc\fP. There may be situations +in which the \fIgrammar\fP is not, strictly speaking, in error, and +yet \fIyacc\fP cannot interpret it unambiguously. The +resolution of ambiguities in the grammar can in many instances be +resolved by providing additional information, such as using +\fB%type\fP or \fB%union\fP declarations. It is often easier and it +usually yields a smaller parser to take this alternative when +it is appropriate. +.LP +The size and execution time of a program produced without the runtime +debugging code is usually smaller and slightly faster in +historical implementations. +.LP +Statistics messages from several historical implementations include +the following types of information: +.sp +.RS +.nf + +\fIn\fP\fB/512 terminals,\fP \fIn\fP\fB/300 non-terminals +\fP\fIn\fP\fB/600 grammar rules,\fP \fIn\fP\fB/1500 states +\fP\fIn\fP \fBshift/reduce,\fP \fIn\fP \fBreduce/reduce conflicts reported +\fP\fIn\fP\fB/350 working sets used +Memory: states, etc.\fP \fIn\fP\fB/15000, parser\fP \fIn\fP\fB/15000 +\fP\fIn\fP\fB/600 distinct lookahead sets +\fP\fIn\fP \fBextra closures +\fP\fIn\fP \fBshift entries,\fP \fIn\fP \fBexceptions +\fP\fIn\fP \fBgoto entries +\fP\fIn\fP \fBentries saved by goto default +Optimizer space used: input\fP \fIn\fP\fB/15000, output\fP \fIn\fP\fB/15000 +\fP\fIn\fP \fBtable entries,\fP \fIn\fP \fBzero +Maximum spread:\fP \fIn\fP\fB, Maximum offset:\fP \fIn\fP +.fi +.RE +.LP +The report of internal tables in the description file is left implementation-defined +because all aspects of these limits are +also implementation-defined. Some implementations may use dynamic +allocation techniques and have no specific limit values to +report. +.LP +The format of the \fBy.output\fP file is not given because specification +of the format was not seen to enhance applications +portability. The listing is primarily intended to help human users +understand and debug the parser; use of \fBy.output\fP by a +conforming application script would be unusual. Furthermore, implementations +have not produced consistent output and no popular +format was apparent. The format selected by the implementation should +be human-readable, in addition to the requirement that it be +a text file. +.LP +Standard error reports are not specifically described because they +are seldom of use to conforming applications and there was no +reason to restrict implementations. +.LP +Some implementations recognize \fB"={"\fP as equivalent to \fB'{'\fP +because it appears in historical documentation. This +construction was recognized and documented as obsolete as long ago +as 1978, in the referenced \fIYacc: Yet Another +Compiler-Compiler\fP. This volume of IEEE\ Std\ 1003.1-2001 chose +to leave it as obsolete and omit it. +.LP +Multi-byte characters should be recognized by the lexical analyzer +and returned as tokens. They should not be returned as +multi-byte character literals. The token \fBerror\fP that is used +for error recovery is normally assigned the value 256 in the +historical implementation. Thus, the token value 256, which is used +in many multi-byte character sets, is not available for use as +the value of a user-defined token. +.SH FUTURE DIRECTIONS +.LP +None. +.SH SEE ALSO +.LP +\fIc99\fP, \fIlex\fP +.SH COPYRIGHT +Portions of this text are reprinted and reproduced in electronic form +from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology +-- Portable Operating System Interface (POSIX), The Open Group Base +Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of +Electrical and Electronics Engineers, Inc and The Open Group. In the +event of any discrepancy between this version and the original IEEE and +The Open Group Standard, the original IEEE and The Open Group Standard +is the referee document. The original Standard can be obtained online at +http://www.opengroup.org/unix/online.html . |