Diffstat (limited to 'man-pages-posix-2017/man1p/awk.1p')
-rw-r--r-- man-pages-posix-2017/man1p/awk.1p 4036
1 files changed, 4036 insertions, 0 deletions
diff --git a/man-pages-posix-2017/man1p/awk.1p b/man-pages-posix-2017/man1p/awk.1p
new file mode 100644
index 0000000..14f68be
--- /dev/null
+++ b/man-pages-posix-2017/man1p/awk.1p
@@ -0,0 +1,4036 @@
+'\" et
+.TH AWK "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
+.\"
+.SH PROLOG
+This manual page is part of the POSIX Programmer's Manual.
+The Linux implementation of this interface may differ (consult
+the corresponding Linux manual page for details of Linux behavior),
+or the interface may not be implemented on Linux.
+.\"
+.SH NAME
+awk
+\(em pattern scanning and processing language
+.SH SYNOPSIS
+.LP
+.nf
+awk \fB[\fR-F \fIsepstring\fB] [\fR-v \fIassignment\fB]\fR... \fIprogram\fB [\fIargument\fR...\fB]\fR
+.P
+awk \fB[\fR-F \fIsepstring\fB] \fR-f \fIprogfile \fB[\fR-f \fIprogfile\fB]\fR... \fB[\fR-v \fIassignment\fB]\fR...
+ \fB[\fIargument\fR...\fB]\fR
+.fi
+.SH DESCRIPTION
+The
+.IR awk
+utility shall execute programs written in the
+.IR awk
+programming language, which is specialized for textual data
+manipulation. An
+.IR awk
+program is a sequence of patterns and corresponding actions. When
+input is read that matches a pattern, the action associated with that
+pattern is carried out.
+.P
+Input shall be interpreted as a sequence of records. By default, a
+record is a line, less its terminating
+<newline>,
+but this can be changed by using the
+.BR RS
+built-in variable. Each record of input shall be matched in turn
+against each pattern in the program. For each pattern matched, the
+associated action shall be executed.
+.P
+The
+.IR awk
+utility shall interpret each input record as a sequence of fields
+where, by default, a field is a string of non-\c
+<blank>
+non-\c
+<newline>
+characters. This default
+<blank>
+and
+<newline>
+field delimiter can be changed by using the
+.BR FS
+built-in variable or the
+.BR \-F
+.IR sepstring
+option. The
+.IR awk
+utility shall denote the first field in a record $1, the second $2, and
+so on. The symbol $0 shall refer to the entire record; setting any
+other field causes the re-evaluation of $0. Assigning to $0 shall reset
+the values of all other fields and the
+.BR NF
+built-in variable.
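+.P
+As an informal illustration (not part of the normative text, and with an
+input line assumed for the example), the following command should write
+.BR beta
+and then 3, the second field and the field count of the single input
+record:
+.sp
+.RS 4
+.nf
+printf \(aqalpha beta gamma\en\(aq | awk \(aq{ print $2; print NF }\(aq
+.fi
+.P
+.RE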
+.SH OPTIONS
+The
+.IR awk
+utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 12.2" ", " "Utility Syntax Guidelines".
+.P
+The following options shall be supported:
+.IP "\fB\-F\ \fIsepstring\fR" 10
+Define the input field separator. This option shall be equivalent to:
+.RS 10
+.sp
+.RS 4
+.nf
+
+-v FS=\fIsepstring\fR
+.fi
+.P
+.RE
+.P
+except that if
+.BR \-F
+.IR sepstring
+and
+.BR \-v
+.IR \fRFS=\fPsepstring\fR
+are both used, it is unspecified whether the
+.BR FS
+assignment resulting from
+.BR \-F
+.IR sepstring
+is processed in command line order or is processed after the last
+.BR \-v
+.IR \fRFS=\fPsepstring\fR .
+See the description of the
+.BR FS
+built-in variable, and how it is used, in the EXTENDED DESCRIPTION
+section.
+.RE
+.IP "\fB\-f\ \fIprogfile\fR" 10
+Specify the pathname of the file
+.IR progfile
+containing an
+.IR awk
+program. A pathname of
+.BR '\-'
+shall denote the standard input. If multiple instances of this option
+are specified, the concatenation of the files specified as
+.IR progfile
+in the order specified shall be the
+.IR awk
+program. The
+.IR awk
+program can alternatively be specified in the command line as a single
+argument.
+.IP "\fB\-v\ \fIassignment\fR" 10
+.br
+The application shall ensure that the
+.IR assignment
+argument is in the same form as an
+.IR assignment
+operand. The specified variable assignment shall occur prior to
+executing the
+.IR awk
+program, including the actions associated with
+.BR BEGIN
+patterns (if any). Multiple occurrences of this option can be
+specified.
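+.P
+For illustration only (the variable name
+.IR n
+is an assumption of this sketch), an assignment made with
+.BR \-v
+is already visible inside a
+.BR BEGIN
+action, so the following command should write 4:
+.sp
+.RS 4
+.nf
+awk -v n=3 \(aqBEGIN { print n + 1 }\(aq
+.fi
+.P
+.RE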
+.SH OPERANDS
+The following operands shall be supported:
+.IP "\fIprogram\fR" 10
+If no
+.BR \-f
+option is specified, the first operand to
+.IR awk
+shall be the text of the
+.IR awk
+program. The application shall supply the
+.IR program
+operand as a single argument to
+.IR awk .
+If the text does not end in a
+<newline>,
+.IR awk
+shall interpret the text as if it did.
+.IP "\fIargument\fR" 10
+Either of the following two types of
+.IR argument
+can be intermixed:
+.RS 10
+.IP "\fIfile\fR" 10
+A pathname of a file that contains the input to be read, which is
+matched against the set of patterns in the program. If no
+.IR file
+operands are specified, or if a
+.IR file
+operand is
+.BR '\-' ,
+the standard input shall be used.
+.IP "\fIassignment\fR" 10
+An operand that begins with an
+<underscore>
+or alphabetic character from the portable character set (see the table
+in the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 6.1" ", " "Portable Character Set"),
+followed by a sequence of underscores, digits, and alphabetics from the
+portable character set, followed by the
+.BR '='
+character, shall specify a variable assignment rather than a pathname.
+The characters before the
+.BR '='
+represent the name of an
+.IR awk
+variable; if that name is an
+.IR awk
+reserved word (see
+.IR "Grammar")
+the behavior is undefined. The characters following the
+<equals-sign>
+shall be interpreted as if they appeared in the
+.IR awk
+program preceded and followed by a double-quote (\c
+.BR '\&"' )
+character, as a
+.BR STRING
+token (see
+.IR "Grammar"),
+except that if the last character is an unescaped
+<backslash>,
+it shall be interpreted as a literal
+<backslash>
+rather than as the first character of the sequence
+.BR \(dq\e"\(dq .
+The variable shall be assigned the value of that
+.BR STRING
+token. If appropriate, the variable shall be considered a
+.IR "numeric string"
+(see
+.IR "Expressions in awk"),
+in which case it shall also be assigned its numeric value. Each such
+variable assignment shall occur just prior to the processing of the
+following
+.IR file ,
+if any. Thus, an assignment before the first
+.IR file
+argument shall be executed after the
+.BR BEGIN
+actions (if any), while an assignment after the last
+.IR file
+argument shall occur before the
+.BR END
+actions (if any). If there are no
+.IR file
+arguments, assignments shall be executed before processing the standard
+input.
+.RE
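+.P
+As an informal sketch of this ordering (the variable name
+.IR v
+and the input are assumptions of the example), the command below should
+write
+.BR "BEGIN: []"
+followed by
+.BR "main: [set]" ,
+because the operand assignment takes effect only after the
+.BR BEGIN
+actions have run:
+.sp
+.RS 4
+.nf
+echo record | awk \(aqBEGIN { print "BEGIN: [" v "]" } { print "main: [" v "]" }\(aq v=set
+.fi
+.P
+.RE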
+.SH STDIN
+The standard input shall be used only if no
+.IR file
+operands are specified, or if a
+.IR file
+operand is
+.BR '\-' ,
+or if a
+.IR progfile
+option-argument is
+.BR '\-' ;
+see the INPUT FILES section. If the
+.IR awk
+program contains no actions and no patterns, but is otherwise a valid
+.IR awk
+program, standard input and any
+.IR file
+operands shall not be read and
+.IR awk
+shall exit with a return status of zero.
+.SH "INPUT FILES"
+Input files to the
+.IR awk
+program from any of the following sources shall be text files:
+.IP " *" 4
+Any
+.IR file
+operands or their equivalents, achieved by modifying the
+.IR awk
+variables
+.BR ARGV
+and
+.BR ARGC
+.IP " *" 4
+Standard input in the absence of any
+.IR file
+operands
+.IP " *" 4
+Arguments to the
+.BR getline
+function
+.P
+Whether the variable
+.BR RS
+is set to a value other than a
+<newline>
+or not, for these files, implementations shall support records
+terminated with the specified separator up to
+{LINE_MAX}
+bytes and may support longer records.
+.P
+If
+.BR \-f
+.IR progfile
+is specified, the application shall ensure that the files named by each
+of the
+.IR progfile
+option-arguments are text files and their concatenation, in the same
+order as they appear in the arguments, is an
+.IR awk
+program.
+.SH "ENVIRONMENT VARIABLES"
+The following environment variables shall affect the execution of
+.IR awk :
+.IP "\fILANG\fP" 10
+Provide a default value for the internationalization variables that are
+unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 8.2" ", " "Internationalization Variables"
+for the precedence of internationalization variables used to determine
+the values of locale categories.)
+.IP "\fILC_ALL\fP" 10
+If set to a non-empty string value, override the values of all the
+other internationalization variables.
+.IP "\fILC_COLLATE\fP" 10
+.br
+Determine the locale for the behavior of ranges, equivalence classes,
+and multi-character collating elements within regular expressions and
+in comparisons of string values.
+.IP "\fILC_CTYPE\fP" 10
+Determine the locale for the interpretation of sequences of bytes of
+text data as characters (for example, single-byte as opposed to
+multi-byte characters in arguments and input files), the behavior of
+character classes within regular expressions, the identification of
+characters as letters, and the mapping of uppercase and lowercase
+characters for the
+.BR toupper
+and
+.BR tolower
+functions.
+.IP "\fILC_MESSAGES\fP" 10
+.br
+Determine the locale that should be used to affect the format and
+contents of diagnostic messages written to standard error.
+.IP "\fILC_NUMERIC\fP" 10
+.br
+Determine the radix character used when interpreting numeric input,
+performing conversions between numeric and string values, and
+formatting numeric output. Regardless of locale, the
+<period>
+character (the decimal-point character of the POSIX locale) is the
+decimal-point character recognized in processing
+.IR awk
+programs (including assignments in command line arguments).
+.IP "\fINLSPATH\fP" 10
+Determine the location of message catalogs for the processing of
+.IR LC_MESSAGES .
+.IP "\fIPATH\fP" 10
+Determine the search path when looking for commands executed by
+\fIsystem\fR(\fIexpr\fR), or input and output pipes; see the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Chapter 8" ", " "Environment Variables".
+.P
+In addition, all environment variables shall be visible via the
+.IR awk
+variable
+.BR ENVIRON .
+.SH "ASYNCHRONOUS EVENTS"
+Default.
+.SH STDOUT
+The nature of the output files depends on the
+.IR awk
+program.
+.SH STDERR
+The standard error shall be used only for diagnostic messages.
+.SH "OUTPUT FILES"
+The nature of the output files depends on the
+.IR awk
+program.
+.br
+.SH "EXTENDED DESCRIPTION"
+.SS "Overall Program Structure"
+.P
+An
+.IR awk
+program is composed of pairs of the form:
+.sp
+.RS 4
+.nf
+
+\fIpattern\fR { \fIaction\fR }
+.fi
+.P
+.RE
+.P
+Either the pattern or the action (including the enclosing brace
+characters) can be omitted.
+.P
+A missing pattern shall match any record of input, and a missing action
+shall be equivalent to:
+.sp
+.RS 4
+.nf
+
+{ print }
+.fi
+.P
+.RE
+.P
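+For illustration only (the file name
+.IR logfile
+and the pattern are assumptions of this sketch), the following two
+commands should therefore produce the same output:
+.sp
+.RS 4
+.nf
+awk \(aq/error/\(aq logfile
+awk \(aq/error/ { print }\(aq logfile
+.fi
+.P
+.RE
+.P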
+Execution of the
+.IR awk
+program shall start by first executing the actions associated with all
+.BR BEGIN
+patterns in the order they occur in the program. Then each
+.IR file
+operand (or standard input if no files were specified) shall be
+processed in turn by reading data from the file until a record
+separator is seen (\c
+<newline>
+by default). Before the first reference to a field in the record is
+evaluated, the record shall be split into fields, according to the
+rules in
+.IR "Regular Expressions",
+using the value of
+.BR FS
+that was current at the time the record was read. Each pattern in the
+program then shall be evaluated in the order of occurrence, and the
+action associated with each pattern that matches the current record
+executed. The action for a matching pattern shall be executed before
+evaluating subsequent patterns. Finally, the actions associated with
+all
+.BR END
+patterns shall be executed in the order they occur in the program.
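+.P
+As an informal sketch of this ordering (the input is assumed for the
+example), the following command should write
+.BR begin ,
+.BR "record a" ,
+.BR "record b" ,
+and
+.BR "end 2" ,
+in that order, even though the
+.BR END
+pattern appears first in the program text:
+.sp
+.RS 4
+.nf
+printf \(aqa\enb\en\(aq | awk \(aqEND { print "end", NR } BEGIN { print "begin" } { print "record", $0 }\(aq
+.fi
+.P
+.RE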
+.SS "Expressions in awk"
+.P
+Expressions describe computations used in
+.IR patterns
+and
+.IR actions .
+In the following table, valid expression operations are given in groups
+from highest precedence first to lowest precedence last, with
+equal-precedence operators grouped between horizontal lines. In
+expression evaluation, where the grammar is formally ambiguous, higher
+precedence operators shall be evaluated before lower precedence
+operators. In this table
+.IR expr ,
+.IR expr1 ,
+.IR expr2 ,
+and
+.IR expr3
+represent any expression, while lvalue represents any entity that can
+be assigned to (that is, on the left side of an assignment operator).
+The precise syntax of expressions is given in
+.IR "Grammar".
+.sp
+.ce 1
+\fBTable 4-1: Expressions in Decreasing Precedence in \fIawk\fP\fR
+.TS
+box tab(@) center;
+cB | cB | cB | cB
+l1f5 | l1 | l1 | l.
+Syntax@Name@Type of Result@Associativity
+_
+( \fIexpr\fP )@Grouping@Type of \fIexpr\fP@N/A
+_
+$\fIexpr\fP@Field reference@String@N/A
+_
+lvalue ++@Post-increment@Numeric@N/A
+lvalue \-\|\-@Post-decrement@Numeric@N/A
+_
+++ lvalue@Pre-increment@Numeric@N/A
+\-\|\- lvalue@Pre-decrement@Numeric@N/A
+_
+\fIexpr\fP ^ \fIexpr\fP@Exponentiation@Numeric@Right
+_
+! \fIexpr\fP@Logical not@Numeric@N/A
++ \fIexpr\fP@Unary plus@Numeric@N/A
+\- \fIexpr\fP@Unary minus@Numeric@N/A
+_
+\fIexpr\fP * \fIexpr\fP@Multiplication@Numeric@Left
+\fIexpr\fP / \fIexpr\fP@Division@Numeric@Left
+\fIexpr\fP % \fIexpr\fP@Modulus@Numeric@Left
+_
+\fIexpr\fP + \fIexpr\fP@Addition@Numeric@Left
+\fIexpr\fP \- \fIexpr\fP@Subtraction@Numeric@Left
+_
+\fIexpr\fP \fIexpr\fP@String concatenation@String@Left
+_
+\fIexpr\fP < \fIexpr\fP@Less than@Numeric@None
+\fIexpr\fP <= \fIexpr\fP@Less than or equal to@Numeric@None
+\fIexpr\fP != \fIexpr\fP@Not equal to@Numeric@None
+\fIexpr\fP == \fIexpr\fP@Equal to@Numeric@None
+\fIexpr\fP > \fIexpr\fP@Greater than@Numeric@None
+\fIexpr\fP >= \fIexpr\fP@Greater than or equal to@Numeric@None
+_
+\fIexpr\fP ~ \fIexpr\fP@ERE match@Numeric@None
+\fIexpr\fP !~ \fIexpr\fP@ERE non-match@Numeric@None
+_
+\fIexpr\fP in array@Array membership@Numeric@Left
+( \fIindex\fP ) in \fIarray\fP@Multi-dimension array@Numeric@Left
+@membership
+_
+\fIexpr\fP && \fIexpr\fP@Logical AND@Numeric@Left
+_
+\fIexpr\fP || \fIexpr\fP@Logical OR@Numeric@Left
+_
+\fIexpr1\fP ? \fIexpr2\fP : \fIexpr3\fP@Conditional expression@Type of selected@Right
+@@\fIexpr2\fP or \fIexpr3\fP
+_
+lvalue ^= \fIexpr\fP@Exponentiation assignment@Numeric@Right
+lvalue %= \fIexpr\fP@Modulus assignment@Numeric@Right
+lvalue *= \fIexpr\fP@Multiplication assignment@Numeric@Right
+lvalue /= \fIexpr\fP@Division assignment@Numeric@Right
+lvalue += \fIexpr\fP@Addition assignment@Numeric@Right
+lvalue \-= \fIexpr\fP@Subtraction assignment@Numeric@Right
+lvalue = \fIexpr\fP@Assignment@Type of \fIexpr\fP@Right
+.TE
+.P
+Each expression shall have either a string value, a numeric value, or
+both. Except as stated for specific contexts, the value of an expression
+shall be implicitly converted to the type needed for the context in which
+it is used. A string value shall be converted to a numeric value either by
+the equivalent of the following calls to functions defined by the ISO\ C standard:
+.sp
+.RS 4
+.nf
+
+setlocale(LC_NUMERIC, "");
+\fInumeric_value\fR = atof(\fIstring_value\fR);
+.fi
+.P
+.RE
+.P
+or by converting the initial portion of the string to type
+.BR double
+representation as follows:
+.sp
+.RS
+The input string is decomposed into two parts: an initial, possibly empty,
+sequence of white-space characters (as specified by
+\fIisspace\fR())
+and a subject sequence interpreted as a floating-point constant.
+.P
+The expected form of the subject sequence is an optional
+.BR '+'
+or
+.BR '\-'
+sign, then a non-empty sequence of digits optionally containing a
+<period>,
+then an optional exponent part. An exponent part consists of
+.BR 'e'
+or
+.BR 'E' ,
+followed by an optional sign, followed by one or more decimal digits.
+.P
+The sequence starting with the first digit or the
+<period>
+(whichever occurs first) is interpreted as a floating constant of the
+C language, and if neither an exponent part nor a
+<period>
+appears, a
+<period>
+is assumed to follow the last digit in the string. If the subject
+sequence begins with a
+<hyphen-minus>,
+the value resulting from the conversion is negated.
+.RE
+.P
+A numeric value that is exactly equal to the value of an integer (see
+.IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard")
+shall be converted to a string by the equivalent of a call to the
+.BR sprintf
+function (see
+.IR "String Functions")
+with the string
+.BR \(dq%d\(dq
+as the
+.IR fmt
+argument and the numeric value being converted as the first and only
+.IR expr
+argument. Any other numeric value shall be converted to a string by the
+equivalent of a call to the
+.BR sprintf
+function with the value of the variable
+.BR CONVFMT
+as the
+.IR fmt
+argument and the numeric value being converted as the first and only
+.IR expr
+argument. The result of the conversion is unspecified if the value of
+.BR CONVFMT
+is not a floating-point format specification. This volume of POSIX.1\(hy2017 specifies no
+explicit conversions between numbers and strings. An application can
+force an expression to be treated as a number by adding zero to it, or
+can force it to be treated as a string by concatenating the null string
+(\c
+.BR \(dq\^\(dq )
+to it.
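+.P
+For illustration only (the variable names are assumptions of this
+sketch), the following command should write 3, then 3.0, then 3.14.
+Adding zero yields the integer value 3, concatenating the null string
+keeps the original string, and the non-integer value is converted to a
+string using
+.BR CONVFMT :
+.sp
+.RS 4
+.nf
+awk \(aqBEGIN { x = "3.0"; print x + 0; print x ""; CONVFMT = "%.2f"; y = 3.14159 ""; print y }\(aq
+.fi
+.P
+.RE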
+.P
+A string value shall be considered a
+.IR "numeric string"
+if it comes from one of the following:
+.IP " 1." 4
+Field variables
+.IP " 2." 4
+Input from the
+\fIgetline\fR()
+function
+.IP " 3." 4
+.BR FILENAME
+.IP " 4." 4
+.BR ARGV
+array elements
+.IP " 5." 4
+.BR ENVIRON
+array elements
+.IP " 6." 4
+Array elements created by the
+\fIsplit\fR()
+function
+.IP " 7." 4
+A command line variable assignment
+.IP " 8." 4
+Variable assignment from another numeric string variable
+.P
+and an implementation-dependent condition corresponding to either
+case (a) or (b) below is met.
+.IP " a." 4
+After the equivalent of the following calls to functions defined by
+the ISO\ C standard,
+.IR string_value_end
+would differ from
+.IR string_value ,
+and any characters before the terminating null character in
+.IR string_value_end
+would be
+<blank>
+characters:
+.RS 4
+.sp
+.RS 4
+.nf
+
+char *string_value_end;
+setlocale(LC_NUMERIC, "");
+numeric_value = strtod (string_value, &string_value_end);
+.fi
+.P
+.RE
+.RE
+.IP " b." 4
+After all the following conversions have been applied, the resulting
+string would lexically be recognized as a
+.BR NUMBER
+token as described by the lexical conventions in
+.IR "Grammar":
+.RS 4
+.IP -- 4
+All leading and trailing
+<blank>
+characters are discarded.
+.IP -- 4
+If the first non-\c
+<blank>
+is
+.BR '\(pl'
+or
+.BR '\-' ,
+it is discarded.
+.IP -- 4
+Each occurrence of the decimal point character from the current locale
+is changed to a
+<period>.
+.RE
+In case (a) the numeric value of the
+.IR "numeric string"
+shall be the value that would be returned by the
+\fIstrtod\fR()
+call. In case (b) if the first non-\c
+<blank>
+is
+.BR '\-' ,
+the numeric value of the
+.IR "numeric string"
+shall be the negation of the numeric value of the recognized
+.BR NUMBER
+token; otherwise, the numeric value of the
+.IR "numeric string"
+shall be the numeric value of the recognized
+.BR NUMBER
+token. Whether or not a string is a
+.IR "numeric string"
+shall be relevant only in contexts where that term is used in this
+section.
+.P
+When an expression is used in a Boolean context, if it has a numeric
+value, a value of zero shall be treated as false and any other value
+shall be treated as true. Otherwise, a string value of the null string
+shall be treated as false and any other value shall be treated as true.
+A Boolean context shall be one of the following:
+.IP " *" 4
+The first subexpression of a conditional expression
+.IP " *" 4
+An expression operated on by logical NOT, logical AND, or logical OR
+.IP " *" 4
+The second expression of a
+.BR for
+statement
+.IP " *" 4
+The expression of an
+.BR if
+statement
+.IP " *" 4
+The expression of the
+.BR while
+clause in either a
+.BR while
+or
+.BR do .\|.\|.\c
+.BR while
+statement
+.IP " *" 4
+An expression used as a pattern (as in Overall Program Structure)
+.P
+All arithmetic shall follow the semantics of floating-point arithmetic as
+specified by the ISO\ C standard (see
+.IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard").
+.P
+The value of the expression:
+.sp
+.RS 4
+.nf
+
+\fIexpr1\fR \(ha \fIexpr2\fR
+.fi
+.P
+.RE
+.P
+shall be equivalent to the value returned by the ISO\ C standard function call:
+.sp
+.RS 4
+.nf
+
+\fRpow(\fIexpr1\fR, \fIexpr2\fR)
+.fi
+.P
+.RE
+.P
+The expression:
+.sp
+.RS 4
+.nf
+
+lvalue \(ha= \fIexpr\fR
+.fi
+.P
+.RE
+.P
+shall be equivalent to the ISO\ C standard expression:
+.sp
+.RS 4
+.nf
+
+lvalue = pow(lvalue, \fIexpr\fR)
+.fi
+.P
+.RE
+.P
+except that lvalue shall be evaluated only once. The value of the
+expression:
+.sp
+.RS 4
+.nf
+
+\fIexpr1\fR % \fIexpr2\fR
+.fi
+.P
+.RE
+.P
+shall be equivalent to the value returned by the ISO\ C standard function call:
+.sp
+.RS 4
+.nf
+
+fmod(\fIexpr1\fR, \fIexpr2\fR)
+.fi
+.P
+.RE
+.P
+The expression:
+.sp
+.RS 4
+.nf
+
+lvalue %= \fIexpr\fR
+.fi
+.P
+.RE
+.P
+shall be equivalent to the ISO\ C standard expression:
+.sp
+.RS 4
+.nf
+
+lvalue = fmod(lvalue, \fIexpr\fR)
+.fi
+.P
+.RE
+.P
+except that lvalue shall be evaluated only once.
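+.P
+As an informal illustration (the operand values are chosen only for
+this sketch), the following command should write 512 and then \-1.
+Exponentiation is right-associative, so the first expression is
+evaluated as pow(2, pow(3, 2)), and fmod(\-7, 3) takes the sign of its
+first argument:
+.sp
+.RS 4
+.nf
+awk \(aqBEGIN { print 2 \(ha 3 \(ha 2; print -7 % 3 }\(aq
+.fi
+.P
+.RE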
+.P
+Variables and fields shall be set by the assignment statement:
+.sp
+.RS 4
+.nf
+
+lvalue = \fIexpression\fR
+.fi
+.P
+.RE
+.P
+and the type of
+.IR expression
+shall determine the resulting variable type. The assignment includes
+the arithmetic assignments (\c
+.BR \(dq+=\(dq ,
+.BR \(dq-=\(dq ,
+.BR \(dq*=\(dq ,
+.BR \(dq/=\(dq ,
+.BR \(dq%=\(dq ,
+.BR \(dq\(ha=\(dq ,
+.BR \(dq++\(dq ,
+.BR \(dq--\(dq )
+all of which shall produce a numeric result. The left-hand side of an
+assignment and the target of increment and decrement operators can be
+one of a variable, an array with index, or a field selector.
+.P
+The
+.IR awk
+language supplies arrays that are used for storing numbers or strings.
+Arrays need not be declared. They shall initially be empty, and their
+sizes shall change dynamically. The subscripts, or element identifiers,
+are strings, providing a type of associative array capability. An array
+name followed by a subscript within square brackets can be used as an
+lvalue and thus as an expression, as described in the grammar; see
+.IR "Grammar".
+Unsubscripted array names can be used in only the following contexts:
+.IP " *" 4
+A parameter in a function definition or function call
+.IP " *" 4
+The
+.BR NAME
+token following any use of the keyword
+.BR in
+as specified in the grammar (see
+.IR "Grammar");
+if the name used in this context is not an array name, the behavior is
+undefined
+.P
+A valid array
+.IR index
+shall consist of one or more
+<comma>-separated
+expressions, similar to the way in which multi-dimensional arrays are
+indexed in some programming languages. Because
+.IR awk
+arrays are really one-dimensional, such a
+<comma>-separated
+list shall be converted to a single string by concatenating the string
+values of the separate expressions, each separated from the other by
+the value of the
+.BR SUBSEP
+variable. Thus, the following two index operations shall be
+equivalent:
+.sp
+.RS 4
+.nf
+
+\fIvar\fB[\fIexpr1\fR, \fIexpr2\fR, ... \fIexprn\fB]\fR
+.P
+\fIvar\fB[\fIexpr1\fR SUBSEP \fIexpr2\fR SUBSEP ... \fRSUBSEP \fIexprn\fB]\fR
+.fi
+.P
+.RE
+.P
+The application shall ensure that a multi-dimensioned
+.IR index
+used with the
+.BR in
+operator is parenthesized. The
+.BR in
+operator, which tests for the existence of a particular array element,
+shall not cause that element to exist. Any other reference to a
+nonexistent array element shall automatically create it.
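+.P
+As an informal sketch (the array name
+.IR a
+and its subscripts are assumptions of the example), the following
+program should write
+.BR found ,
+.BR "same key" ,
+and then 1. The two membership tests name the same element, and the
+failed test for
+.BR \(dqz\(dq
+does not create an element, so the array still holds one entry:
+.sp
+.RS 4
+.nf
+awk \(aqBEGIN {
+    a["x", "y"] = 1
+    if (("x", "y") in a) print "found"
+    if (("x" SUBSEP "y") in a) print "same key"
+    if ("z" in a) print "unexpected"
+    n = 0; for (k in a) n++
+    print n
+}\(aq
+.fi
+.P
+.RE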
+.P
+Comparisons (with the
+.BR '<' ,
+.BR \(dq<=\(dq ,
+.BR \(dq!=\(dq ,
+.BR \(dq==\(dq ,
+.BR '>' ,
+and
+.BR \(dq>=\(dq
+operators) shall be made numerically if both operands are numeric, if
+one is numeric and the other has a string value that is a numeric
+string, or if one is numeric and the other has the uninitialized value.
+Otherwise, operands shall be converted to strings as required and a
+string comparison shall be made as follows:
+.IP " *" 4
+For the
+.BR \(dq!=\(dq
+and
+.BR \(dq==\(dq
+operators, the strings should be compared to check if they are
+identical but may be compared using the locale-specific collation
+sequence to check if they collate equally.
+.IP " *" 4
+For the other operators, the strings shall be compared using the
+locale-specific collation sequence.
+.P
+The value of the comparison expression shall be 1 if the relation is
+true, or 0 if the relation is false.
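+.P
+For illustration only (the input is assumed for the example, and the
+POSIX locale is assumed for the string collation), the following
+command should write 0 and then 1. The two fields are numeric strings
+and are compared numerically, while the two string constants are
+compared as strings:
+.sp
+.RS 4
+.nf
+echo \(aq10 9\(aq | awk \(aq{ print ($1 < $2); print ("10" < "9") }\(aq
+.fi
+.P
+.RE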
+.SS "Variables and Special Variables"
+.P
+Variables can be used in an
+.IR awk
+program by referencing them. With the exception of function parameters
+(see
+.IR "User-Defined Functions"),
+they are not explicitly declared. Function parameter names shall be
+local to the function; all other variable names shall be global. The
+same name shall not be used as both a function parameter name and as
+the name of a function or a special
+.IR awk
+variable. The same name shall not be used both as a variable name with
+global scope and as the name of a function. The same name shall not be
+used within the same scope both as a scalar variable and as an array.
+Uninitialized variables, including scalar variables, array elements,
+and field variables, shall have an uninitialized value. An
+uninitialized value shall have both a numeric value of zero and a
+string value of the empty string. Evaluation of variables with an
+uninitialized value, to either string or numeric, shall be determined
+by the context in which they are used.
+.P
+Field variables shall be designated by a
+.BR '$'
+followed by a number or numerical expression. The effect of the field
+number
+.IR expression
+evaluating to anything other than a non-negative integer is
+unspecified; uninitialized variables or string values need not be
+converted to numeric values in this context. New field variables can be
+created by assigning a value to them. References to nonexistent fields
+(that is, fields after $\fBNF\fP), shall evaluate to the uninitialized
+value. Such references shall not create new fields. However, assigning
+to a nonexistent field (for example, $(\fBNF\fP+2)=5) shall increase
+the value of
+.BR NF ;
+create any intervening fields with the uninitialized value; and cause
+the value of $0 to be recomputed, with the fields being separated by
+the value of
+.BR OFS .
+Each field variable shall have a string value or an uninitialized value
+when created. Field variables shall have the uninitialized value when
+created from $0 using
+.BR FS
+and the variable does not contain any characters. If appropriate, the
+field variable shall be considered a numeric string (see
+.IR "Expressions in awk").
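+.P
+As an informal sketch (the input and the choice of
+.BR OFS
+are assumptions of the example), assigning to a field beyond
+.BR NF
+should make the following command write 5 and then
+.BR a\-b\-c\-\-e ,
+with the intervening fourth field empty:
+.sp
+.RS 4
+.nf
+echo \(aqa b c\(aq | awk -v OFS=- \(aq{ $5 = "e"; print NF; print }\(aq
+.fi
+.P
+.RE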
+.P
+Implementations shall support the following other special variables
+that are set by
+.IR awk :
+.IP "\fBARGC\fR" 10
+The number of elements in the
+.BR ARGV
+array.
+.IP "\fBARGV\fR" 10
+An array of command line arguments, excluding options and the
+.IR program
+argument, numbered from zero to
+.BR ARGC \-1.
+.RS 10
+.P
+The arguments in
+.BR ARGV
+can be modified or added to;
+.BR ARGC
+can be altered. As each input file ends,
+.IR awk
+shall treat the next non-null element of
+.BR ARGV ,
+up to the current value of
+.BR ARGC \-1,
+inclusive, as the name of the next input file. Thus, setting an element
+of
+.BR ARGV
+to null means that it shall not be treated as an input file. The name
+.BR '\-'
+indicates the standard input. If an argument matches the format of an
+.IR assignment
+operand, this argument shall be treated as an
+.IR assignment
+rather than a
+.IR file
+argument.
+.RE
+.IP "\fBCONVFMT\fR" 10
+The
+.BR printf
+format for converting numbers to strings (except for output statements,
+where
+.BR OFMT
+is used);
+.BR \(dq%.6g\(dq
+by default.
+.IP "\fBENVIRON\fR" 10
+An array representing the value of the environment, as described in the
+.IR exec
+functions defined in the System Interfaces volume of POSIX.1\(hy2017. The indices of the array shall be
+strings consisting of the names of the environment variables, and the
+value of each array element shall be a string consisting of the value
+of that variable. If appropriate, the environment variable shall be
+considered a
+.IR "numeric string"
+(see
+.IR "Expressions in awk");
+the array element shall also have its numeric value.
+.RS 10
+.P
+In all cases where the behavior of
+.IR awk
+is affected by environment variables (including the environment of any
+commands that
+.IR awk
+executes via the
+.BR system
+function or via pipeline redirections with the
+.BR print
+statement, the
+.BR printf
+statement, or the
+.BR getline
+function), the environment used shall be the environment at the time
+.IR awk
+began executing; it is implementation-defined whether any
+modification of
+.BR ENVIRON
+affects this environment.
+.RE
+.IP "\fBFILENAME\fR" 10
+A pathname of the current input file. Inside a
+.BR BEGIN
+action the value is undefined. Inside an
+.BR END
+action the value shall be the name of the last input file processed.
+.IP "\fBFNR\fR" 10
+The ordinal number of the current record in the current file. Inside a
+.BR BEGIN
+action the value shall be zero. Inside an
+.BR END
+action the value shall be the number of the last record processed in
+the last file processed.
+.IP "\fBFS\fR" 10
+Input field separator regular expression; a
+<space>
+by default.
+.IP "\fBNF\fR" 10
+The number of fields in the current record. Inside a
+.BR BEGIN
+action, the use of
+.BR NF
+is undefined unless a
+.BR getline
+function without a
+.IR var
+argument is executed previously. Inside an
+.BR END
+action,
+.BR NF
+shall retain the value it had for the last record read, unless a
+subsequent, redirected,
+.BR getline
+function without a
+.IR var
+argument is performed prior to entering the
+.BR END
+action.
+.IP "\fBNR\fR" 10
+The ordinal number of the current record from the start of input.
+Inside a
+.BR BEGIN
+action the value shall be zero. Inside an
+.BR END
+action the value shall be the number of the last record processed.
+.IP "\fBOFMT\fR" 10
+The
+.BR printf
+format for converting numbers to strings in output statements (see
+.IR "Output Statements");
+.BR \(dq%.6g\(dq
+by default. The result of the conversion is unspecified if the value of
+.BR OFMT
+is not a floating-point format specification.
+.IP "\fBOFS\fR" 10
+The
+.BR print
+statement output field separator;
+<space>
+by default.
+.IP "\fBORS\fR" 10
+The
+.BR print
+statement output record separator; a
+<newline>
+by default.
+.IP "\fBRLENGTH\fR" 10
+The length of the string matched by the
+.BR match
+function.
+.IP "\fBRS\fR" 10
+The first character of the string value of
+.BR RS
+shall be the input record separator; a
+<newline>
+by default. If
+.BR RS
+contains more than one character, the results are unspecified. If
+.BR RS
+is null, then records are separated by sequences consisting of a
+<newline>
+plus one or more blank lines; leading or trailing blank lines shall not
+result in empty records at the beginning or end of the input, and a
+<newline>
+shall always be a field separator, no matter what the value of
+.BR FS
+is.
+.IP "\fBRSTART\fR" 10
+The starting position of the string matched by the
+.BR match
+function, numbering from 1. This shall always be equivalent to the
+return value of the
+.BR match
+function.
+.IP "\fBSUBSEP\fR" 10
+The subscript separator string for multi-dimensional arrays; the
+default value is implementation-defined.
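+.P
+As an informal illustration of a null
+.BR RS
+(the input is assumed for the example), the following command should
+write
+.BR "1: 3"
+and then
+.BR "2: 1" .
+The blank line separates two records, and the
+<newline>
+inside the first record acts as a field separator:
+.sp
+.RS 4
+.nf
+printf \(aqa b\enc\en\end\en\(aq | awk \(aqBEGIN { RS = "" } { print NR ": " NF }\(aq
+.fi
+.P
+.RE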
+.SS "Regular Expressions"
+.P
+The
+.IR awk
+utility shall make use of the extended regular expression notation
+(see the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 9.4" ", " "Extended Regular Expressions")
+except that it shall allow the use of C-language conventions
+for escaping special characters within the EREs, as specified in the
+table in the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Chapter 5" ", " "File Format Notation"
+(\c
+.BR '\e\e' ,
+.BR '\ea' ,
+.BR '\eb' ,
+.BR '\ef' ,
+.BR '\en' ,
+.BR '\er' ,
+.BR '\et' ,
+.BR '\ev' )
+and the following table; these escape sequences shall be recognized
+both inside and outside bracket expressions. Note that records need not
+be separated by
+<newline>
+characters and string constants can contain
+<newline>
+characters, so even the
+.BR \(dq\en\(dq
+sequence is valid in
+.IR awk
+EREs. Using a
+<slash>
+character within an ERE requires the escaping shown in the following
+table.
+.br
+.sp
+.ce 1
+\fBTable 4-2: Escape Sequences in \fIawk\fP\fR
+.ad l
+.TS
+center tab(@) box;
+cB | cB | cB
+cB | cB | cB
+lf5 | lw(34) | lw(34).
+Escape
+Sequence@Description@Meaning
+_
+\e"@T{
+<backslash> <quotation-mark>
+T}@T{
+<quotation-mark> character
+T}
+_
+\e/@T{
+<backslash> <slash>
+T}@T{
+<slash> character
+T}
+_
+\eddd@T{
+A
+<backslash>
+character followed by the longest sequence of one, two, or
+three octal-digit characters (01234567). If all of the digits are 0
+(that is, representation of the NUL character), the behavior is
+undefined.
+T}@T{
+The character whose encoding is represented by the one, two, or
+three-digit octal integer. Multi-byte characters require
+multiple, concatenated escape sequences of this type, including the
+leading
+<backslash>
+for each byte.
+T}
+_
+\ec@T{
+A
+<backslash>
+character followed by any character not described in this
+table or in the table in the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Chapter 5" ", " "File Format Notation"
+(\c
+.BR '\e\e' ,
+.BR '\ea' ,
+.BR '\eb' ,
+.BR '\ef' ,
+.BR '\en' ,
+.BR '\er' ,
+.BR '\et' ,
+.BR '\ev' ).
+T}@Undefined
+.TE
+.ad b
+.P
+A regular expression can be matched against a specific field or string
+by using one of the two regular expression matching operators,
+.BR '\(ti'
+and
+.BR \(dq!\(ti\(dq .
+These operators shall interpret their right-hand operand as a regular
+expression and their left-hand operand as a string. If the regular
+expression matches the string, the
+.BR '\(ti'
+expression shall evaluate to a value of 1, and the
+.BR \(dq!\(ti\(dq
+expression shall evaluate to a value of 0. (The regular expression
+matching operation is as defined by the term matched in the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 9.1" ", " "Regular Expression Definitions",
+where a match occurs on any part of the string unless the regular
+expression is limited with the
+<circumflex>
+or
+<dollar-sign>
+special characters.) If the regular expression does not match the
+string, the
+.BR '\(ti'
+expression shall evaluate to a value of 0, and the
+.BR \(dq!\(ti\(dq
+expression shall evaluate to a value of 1. If the right-hand operand is
+any expression other than the lexical token
+.BR ERE ,
+the string value of the expression shall be interpreted as an extended
+regular expression, including the escape conventions described above.
+Note that these same escape conventions shall also be applied in
+determining the value of a string literal (the lexical token
+.BR STRING ),
+and thus shall be applied a second time when a string literal is used
+in this context.
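+.P
+The double application of the escape conventions to string literals
+can be illustrated with a short sketch. The two sample input lines and
+the counter names are assumptions chosen for the example.

```shell
# Count how many of two sample lines each form of the pattern matches.
out=$(printf 'a.b\naxb\n' | awk '
    $0 ~ /a\.b/   { ere++ }     # ERE token: \. is a literal period
    $0 ~ "a\\.b"  { str++ }     # string literal: \\ becomes \, leaving a\.b
    $0 ~ "a.b"    { any++ }     # unescaped period matches any character
    END { print ere, str, any }')
printf '%s\n' "$out"
```

+.P
+The first two patterns match only the line containing a period, while
+the third matches both lines.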
+.P
+When an
+.BR ERE
+token appears as an expression in any context other than as the
+right-hand operand of the
+.BR '\(ti'
+or
+.BR \(dq!\(ti\(dq
+operator or as one of the built-in function arguments described below,
+the value of the resulting expression shall be the equivalent of:
+.sp
+.RS 4
+.nf
+
+$0 \(ti /\fIere\fR/
+.fi
+.P
+.RE
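+.P
+As a small sketch of this equivalence, a bare
+.BR ERE
+token used inside an arithmetic expression contributes the 0 or 1
+result of matching it against $0. The sample input and the variable
+.IR n
+are assumptions.

```shell
# n accumulates the 0-or-1 value of the bare ERE, that is, of ($0 ~ /^#/).
out=$(printf '# note\ncode\n# more\n' | awk '{ n += (/^#/) } END { print n }')
printf '%s\n' "$out"
```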
+.P
+The
+.IR ere
+argument to the
+.BR gsub ,
+.BR match ,
+.BR sub
+functions, and the
+.IR fs
+argument to the
+.BR split
+function (see
+.IR "String Functions")
+shall be interpreted as extended regular expressions. These can be
+either
+.BR ERE
+tokens or arbitrary expressions, and shall be interpreted in the same
+manner as the right-hand side of the
+.BR '\(ti'
+or
+.BR \(dq!\(ti\(dq
+operator.
+.P
+An extended regular expression can be used to separate fields by assigning
+a string containing the expression to the built-in variable
+.BR FS ,
+either directly or as a consequence of using the
+.BR \-F
+.IR sepstring
+option.
+The default value of the
+.BR FS
+variable shall be a single
+<space>.
+The following describes
+.BR FS
+behavior:
+.IP " 1." 4
+If
+.BR FS
+is a null string, the behavior is unspecified.
+.IP " 2." 4
+If
+.BR FS
+is a single character:
+.RS 4
+.IP " a." 4
+If
+.BR FS
+is
+<space>,
+skip leading and trailing
+<blank>
+and
+<newline>
+characters; fields shall be delimited by sets of one or more
+<blank>
+or
+<newline>
+characters.
+.IP " b." 4
+Otherwise, if
+.BR FS
+is any other character
+.IR c ,
+fields shall be delimited by each single occurrence of
+.IR c .
+.RE
+.IP " 3." 4
+Otherwise, the string value of
+.BR FS
+shall be considered to be an extended regular expression. Each
+occurrence of a sequence matching the extended regular expression shall
+delimit fields.
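+.P
+The three defined cases above can be sketched with one command each.
+The sample inputs and shell variable names are assumptions made for
+illustration only.

```shell
# Default FS (a single space): runs of blanks delimit, edges are skipped.
n_space=$(printf '  a   b \n' | awk '{ print NF }')
# Any other single character: each occurrence delimits, so a::b has an empty field.
n_colon=$(printf 'a::b\n' | awk -F: '{ print NF }')
# Longer value: treated as an ERE, here one or more digits.
n_ere=$(printf 'a1b22c\n' | awk -F'[0-9]+' '{ print NF }')
printf '%s %s %s\n' "$n_space" "$n_colon" "$n_ere"
```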
+.P
+Except for the
+.BR '\(ti'
+and
+.BR \(dq!\(ti\(dq
+operators, and in the
+.BR gsub ,
+.BR match ,
+.BR split ,
+and
+.BR sub
+built-in functions, ERE matching shall be based on input records; that
+is, record separator characters (the first character of the value of
+the variable
+.BR RS ,
+<newline>
+by default) cannot be embedded in the expression, and no expression
+shall match the record separator character. If the record separator is
+not
+<newline>,
+<newline>
+characters embedded in the expression can be matched. For the
+.BR '\(ti'
+and
+.BR \(dq!\(ti\(dq
+operators, and in those four built-in functions, ERE matching shall be
+based on text strings; that is, any character (including
+<newline>
+and the record separator) can be embedded in the pattern, and an
+appropriate pattern shall match any character. However, in all
+.IR awk
+ERE matching, the use of one or more NUL characters in the pattern,
+input record, or text string produces undefined results.
+.SS "Patterns"
+.P
+A
+.IR pattern
+is any valid
+.IR expression ,
+a range specified by two expressions separated by a comma, or one of the
+two special patterns
+.BR BEGIN
+or
+.BR END .
+.SS "Special Patterns"
+.P
+The
+.IR awk
+utility shall recognize two special patterns,
+.BR BEGIN
+and
+.BR END .
+Each
+.BR BEGIN
+pattern shall be matched once and its associated action executed before
+the first record of input is read\(emexcept possibly by use of the
+.BR getline
+function (see
+.IR "Input/Output and General Functions")
+in a prior
+.BR BEGIN
+action\(emand before command line assignment is done. Each
+.BR END
+pattern shall be matched once and its associated action executed after
+the last record of input has been read. These two patterns shall have
+associated actions.
+.P
+.BR BEGIN
+and
+.BR END
+shall not combine with other patterns. Multiple
+.BR BEGIN
+and
+.BR END
+patterns shall be allowed. The actions associated with the
+.BR BEGIN
+patterns shall be executed in the order specified in the program, as
+are the
+.BR END
+actions. An
+.BR END
+pattern can precede a
+.BR BEGIN
+pattern in a program.
+.P
+If an
+.IR awk
+program consists of only actions with the pattern
+.BR BEGIN ,
+and the
+.BR BEGIN
+action contains no
+.BR getline
+function,
+.IR awk
+shall exit without reading its input when the last statement in the
+last
+.BR BEGIN
+action is executed. If an
+.IR awk
+program consists of only actions with the pattern
+.BR END
+or only actions with the patterns
+.BR BEGIN
+and
+.BR END ,
+the input shall be read before the statements in the
+.BR END
+actions are executed.
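+.P
+The ordering rules for the special patterns can be sketched as
+follows. The sample input and the shell variables
+.IR order
+and
+.IR only
+are assumptions.

```shell
# END is written first, yet every BEGIN action runs before any END action.
order=$(printf 'r1\nr2\n' | awk '
    END   { print "end saw " NR " records" }
    BEGIN { print "begin 1" }
    BEGIN { print "begin 2" }')
# A program with only BEGIN actions (and no getline) exits without reading input.
only=$(awk 'BEGIN { print "no input read" }')
printf '%s\n%s\n' "$order" "$only"
```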
+.SS "Expression Patterns"
+.P
+An expression pattern shall be evaluated as if it were an expression in
+a Boolean context. If the result is true, the pattern shall be
+considered to match, and the associated action (if any) shall be
+executed. If the result is false, the action shall not be executed.
+.SS "Pattern Ranges"
+.P
+A pattern range consists of two expressions separated by a comma; in
+this case, the action shall be performed for all records between a
+match of the first expression and the following match of the second
+expression, inclusive. At this point, the pattern range can be repeated
+starting at input records subsequent to the end of the matched range.
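+.P
+A pattern range and its repetition can be sketched as below. The
+marker words start and end, the sample input, and the variable names
+are assumptions chosen for the example.

```shell
# Collect every record from a "start" line through the next "end" line, inclusive.
out=$(printf 'x\nstart\na\nend\ny\nstart\nb\nend\nz\n' | awk '
    /^start$/,/^end$/ { s = s sep $0; sep = " " }
    END { print s }')
printf '%s\n' "$out"
```

+.P
+The records x, y, and z fall outside both ranges and are not
+collected; the range is entered a second time at the second start
+line.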
+.SS "Actions"
+.P
+An action is a sequence of statements as shown in the grammar in
+.IR "Grammar".
+Any single statement can be replaced by a statement list enclosed in
+curly braces. The application shall ensure that statements in a
+statement list are separated by
+<newline>
+or
+<semicolon>
+characters. Statements in a statement list shall be executed sequentially
+in the order that they appear.
+.P
+The
+.IR expression
+acting as the conditional in an
+.BR if
+statement shall be evaluated and if it is non-zero or non-null, the
+following statement shall be executed; otherwise, if
+.BR else
+is present, the statement following the
+.BR else
+shall be executed.
+.P
+The
+.BR if ,
+.BR while ,
+.BR do .\|.\|.\c
+.BR while ,
+.BR for ,
+.BR break ,
+and
+.BR continue
+statements are based on the ISO\ C standard (see
+.IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard"),
+except that the Boolean expressions shall be treated as described in
+.IR "Expressions in awk",
+and except in the case of:
+.sp
+.RS 4
+.nf
+
+for (\fIvariable\fR in \fIarray\fR)
+.fi
+.P
+.RE
+.P
+which shall iterate, assigning each
+.IR index
+of
+.IR array
+to
+.IR variable
+in an unspecified order. The results of adding new elements to
+.IR array
+within such a
+.BR for
+loop are undefined. If a
+.BR break
+or
+.BR continue
+statement occurs outside of a loop, the behavior is undefined.
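+.P
+Because the iteration order of the array form of
+.BR for
+is unspecified, a script that needs stable output has to impose an
+order itself. The following sketch, with an assumed sample input and
+array name, pipes the result through
+.IR sort
+for that reason.

```shell
# Count word occurrences; sort afterwards since "for (w in count)" order is unspecified.
out=$(printf 'b a b\nc b a\n' | awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }
    END { for (w in count) print w, count[w] }' | sort)
printf '%s\n' "$out"
```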
+.P
+The
+.BR delete
+statement shall remove an individual array element. Thus, the following
+code deletes an entire array:
+.sp
+.RS 4
+.nf
+
+for (index in array)
+ delete array[index]
+.fi
+.P
+.RE
+.P
+The
+.BR next
+statement shall cause all further processing of the current input
+record to be abandoned. The behavior is undefined if a
+.BR next
+statement appears or is invoked in a
+.BR BEGIN
+or
+.BR END
+action.
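+.P
+A common use of
+.BR next
+is to discard records early so that later pattern-action pairs never
+see them. The sketch below assumes a sample input in which lines
+beginning with # are comments.

```shell
out=$(printf '# skip\nkeep 1\n\nkeep 2\n' | awk '
    /^#/ || NF == 0 { next }    # abandon comment and empty records
    { print NR ": " $0 }')
printf '%s\n' "$out"
```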
+.P
+The
+.BR exit
+statement shall invoke all
+.BR END
+actions in the order in which they occur in the program source and then
+terminate the program without reading further input. An
+.BR exit
+statement inside an
+.BR END
+action shall terminate the program without further execution of
+.BR END
+actions. If an expression is specified in an
+.BR exit
+statement, its numeric value shall be the exit status of
+.IR awk ,
+unless subsequent errors are encountered or a subsequent
+.BR exit
+statement with an expression is executed.
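+.P
+The interaction between
+.BR exit
+and
+.BR END
+actions can be sketched as follows. The sample input, the status
+value 3, and the shell variable names are assumptions.

```shell
status=0
out=$(printf 'a\nstop\nb\n' | awk '
    $1 == "stop" { exit 3 }     # stop reading input; END actions still run
    END { print "records read: " NR }') || status=$?
printf '%s (exit status %s)\n' "$out" "$status"
```

+.P
+The
+.BR END
+action runs after the
+.BR exit
+and, since it supplies no expression of its own, the status set
+earlier is kept.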
+.SS "Output Statements"
+.P
+Both
+.BR print
+and
+.BR printf
+statements shall write to standard output by default. The output shall
+be written to the location specified by
+.IR output_redirection
+if one is supplied, as follows:
+.sp
+.RS 4
+.nf
+
+> \fIexpression\fR
+>> \fIexpression\fR
+| \fIexpression\fR
+.fi
+.P
+.RE
+.P
+In all cases, the
+.IR expression
+shall be evaluated to produce a string that is used as a pathname
+into which to write (for
+.BR '>'
+or
+.BR \(dq>>\(dq )
+or as a command to be executed (for
+.BR '|' ).
+Using the first two forms, if the file of that name is not currently
+open, it shall be opened, creating it if necessary and, when the first
+form is used, truncating the file. The output then shall be appended to the
+file. As long as the file remains open, subsequent calls in which
+.IR expression
+evaluates to the same string value shall simply append output to the
+file. The file remains open until the
+.BR close
+function (see
+.IR "Input/Output and General Functions")
+is called with an expression that evaluates to the same string value.
+.P
+The third form shall write output onto a stream piped to the input of a
+command. The stream shall be created if no stream is currently open
+with the value of
+.IR expression
+as its command name. The stream created shall be equivalent to one
+created by a call to the
+\fIpopen\fR()
+function defined in the System Interfaces volume of POSIX.1\(hy2017 with the value of
+.IR expression
+as the
+.IR command
+argument and a value of
+.IR w
+as the
+.IR mode
+argument. As long as the stream remains open, subsequent calls in which
+.IR expression
+evaluates to the same string value shall write output to the existing
+stream. The stream shall remain open until the
+.BR close
+function (see
+.IR "Input/Output and General Functions")
+is called with an expression that evaluates to the same string value.
+At that time, the stream shall be closed as if by a call to the
+\fIpclose\fR()
+function defined in the System Interfaces volume of POSIX.1\(hy2017.
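+.P
+The file and pipe forms of redirection can be sketched together. This
+example is illustrative only: the scratch directory created with
+.IR mktemp ,
+the file name seen.log, and the variable names are assumptions, and
+.IR mktemp
+itself is a widely available utility rather than one specified here.

```shell
tmp=$(mktemp -d)                      # scratch directory (an assumption)
out=$(printf 'pear\napple\nfig\n' | awk -v logfile="$tmp/seen.log" '
    { print | "sort" }                # one stream, reused for every record
    { print NR >> logfile }           # ">>" opens once, then keeps appending
    END {
        close("sort")                 # waits for sort to write its output
        close(logfile)
        print "done"
    }')
lines=$(wc -l < "$tmp/seen.log" | tr -d ' ')
rm -rf "$tmp"
printf '%s\n%s lines logged\n' "$out" "$lines"
```

+.P
+Closing the pipe before printing the final line is what places the
+sorted records ahead of it in the combined output.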
+.P
+As described in detail by the grammar in
+.IR "Grammar",
+these output statements shall take a
+<comma>-separated
+list of
+.IR expression s
+referred to in the grammar by the non-terminal symbols
+.BR expr_list ,
+.BR print_expr_list ,
+or
+.BR print_expr_list_opt .
+This list is referred to here as the
+.IR "expression list" ,
+and each member is referred to as an
+.IR "expression argument" .
+.P
+The
+.BR print
+statement shall write the value of each expression argument onto the
+indicated output stream separated by the current output field separator
+(see variable
+.BR OFS
+above), and terminated by the output record separator (see variable
+.BR ORS
+above). All expression arguments shall be taken as strings, being
+converted if necessary; this conversion shall be as described in
+.IR "Expressions in awk",
+with the exception that the
+.BR printf
+format in
+.BR OFMT
+shall be used instead of the value in
+.BR CONVFMT .
+An empty expression list shall stand for the whole input record ($0).
+.P
+The
+.BR printf
+statement shall produce output based on a notation similar to the
+File Format Notation used to describe file formats in this volume of POSIX.1\(hy2017 (see the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Chapter 5" ", " "File Format Notation").
+Output shall be produced as specified with the first
+.IR expression
+argument as the string
+.IR format
+and subsequent
+.IR expression
+arguments as the strings
+.IR arg1
+to
+.IR argn ,
+inclusive, with the following exceptions:
+.IP " 1." 4
+The
+.IR format
+shall be an actual character string rather than a graphical
+representation. Therefore, it cannot contain empty character
+positions. The
+<space>
+in the
+.IR format
+string, in any context other than a
+.IR flag
+of a conversion specification, shall be treated as an ordinary
+character that is copied to the output.
+.IP " 2." 4
+If the character set contains a
+.BR '\(*D'
+character and that character appears in the
+.IR format
+string, it shall be treated as an ordinary character that is copied to
+the output.
+.IP " 3." 4
+The
+.IR "escape sequences"
+beginning with a
+<backslash>
+character shall be treated as sequences of ordinary characters that are
+copied to the output. Note that these same sequences shall be interpreted
+lexically by
+.IR awk
+when they appear in literal strings, but they shall not be treated
+specially by the
+.BR printf
+statement.
+.IP " 4." 4
+A
+.IR "field width"
+or
+.IR precision
+can be specified as the
+.BR '*'
+character instead of a digit string. In this case the next argument
+from the expression list shall be fetched and its numeric value taken
+as the field width or precision.
+.IP " 5." 4
+The implementation shall not precede or follow output from the
+.BR d
+or
+.BR u
+conversion specifier characters with
+<blank>
+characters not specified by the
+.IR format
+string.
+.IP " 6." 4
+The implementation shall not precede output from the
+.BR o
+conversion specifier character with leading zeros not specified by the
+.IR format
+string.
+.IP " 7." 4
+For the
+.BR c
+conversion specifier character: if the argument has a numeric value, the
+character whose encoding is that value shall be output. If the value is
+zero or is not the encoding of any character in the character set, the
+behavior is undefined. If the argument does not have a numeric value,
+the first character of the string value shall be output; if the string
+does not contain any characters, the behavior is undefined.
+.IP " 8." 4
+For each conversion specification that consumes an argument, the next
+expression argument shall be evaluated. With the exception of the
+.BR c
+conversion specifier character, the value shall be converted (according
+to the rules specified in
+.IR "Expressions in awk")
+to the appropriate type for the conversion specification.
+.IP " 9." 4
+If there are insufficient expression arguments to satisfy all the
+conversion specifications in the
+.IR format
+string, the behavior is undefined.
+.IP 10. 4
+If any character sequence in the
+.IR format
+string begins with a
+.BR '%'
+character, but does not form a valid conversion specification, the
+behavior is unspecified.
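+.P
+A few of the conversions discussed above can be sketched as follows.
+The format strings and argument values are assumptions, and the
+numeric
+.BR c
+conversion shown presumes a codeset in which 65 encodes the letter A.

```shell
out=$(awk 'BEGIN {
    printf "[%5d] [%-5s] [%.2f]\n", 42, "ab", 2.5
    printf "%c%c\n", 65, "hello"    # numeric value, then first character of a string
}')
printf '%s\n' "$out"
```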
+.P
+Both
+.BR print
+and
+.BR printf
+can output at least
+{LINE_MAX}
+bytes.
+.SS "Functions"
+.P
+The
+.IR awk
+language has a variety of built-in functions: arithmetic, string,
+input/output, and general.
+.SS "Arithmetic Functions"
+.P
+The arithmetic functions, except for
+.BR int ,
+shall be based on the ISO\ C standard (see
+.IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard").
+The behavior is undefined in cases where the ISO\ C standard specifies that an
+error be returned or that the behavior is undefined. Although the
+grammar (see
+.IR "Grammar")
+permits built-in functions to appear with no arguments or parentheses,
+unless the argument or parentheses are indicated as optional in the
+following list (by displaying them within the
+.BR \(dq[]\(dq
+brackets), such use is undefined.
+.IP "\fBatan2\fR(\fIy\fR,\fIx\fR)" 10
+Return arctangent of \fIy\fP/\fIx\fR in radians in the range
+[\-\(*p,\(*p].
+.IP "\fBcos\fR(\fIx\fR)" 10
+Return cosine of \fIx\fP, where \fIx\fP is in radians.
+.IP "\fBsin\fR(\fIx\fR)" 10
+Return sine of \fIx\fP, where \fIx\fP is in radians.
+.IP "\fBexp\fR(\fIx\fR)" 10
+Return the exponential function of \fIx\fP.
+.IP "\fBlog\fR(\fIx\fR)" 10
+Return the natural logarithm of \fIx\fP.
+.IP "\fBsqrt\fR(\fIx\fR)" 10
+Return the square root of \fIx\fP.
+.IP "\fBint\fR(\fIx\fR)" 10
+Return the argument truncated to an integer. Truncation shall
+be toward 0 when \fIx\fP>0.
+.IP "\fBrand\fP(\|)" 10
+Return a random number \fIn\fP, such that 0\(<=\fIn\fP<1.
+.IP "\fBsrand\fR(\fB[\fIexpr\fB]\fR)" 10
+Set the seed value for
+.IR rand
+to
+.IR expr
+or use the time of day if
+.IR expr
+is omitted. The previous seed value shall be returned.
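+.P
+Some of the arithmetic functions can be sketched as below. The seed
+value 42 and the variable names are assumptions; the sketch relies
+only on the same seed producing the same sequence within one
+invocation.

```shell
out=$(awk 'BEGIN {
    pi = atan2(0, -1)            # arctangent of 0/-1 is pi
    srand(42); a = rand()
    srand(42); b = rand()        # same seed, same sequence
    printf "%.5f %d %d\n", pi, int(3.9), (a == b)
}')
printf '%s\n' "$out"
```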
+.SS "String Functions"
+.P
+The string functions in the following list shall be supported.
+Although the grammar (see
+.IR "Grammar")
+permits built-in functions to appear with no arguments or parentheses,
+unless the argument or parentheses are indicated as optional in the
+following list (by displaying them within the
+.BR \(dq[]\(dq
+brackets), such use is undefined.
+.IP "\fBgsub\fR(\fIere\fR,\ \fIrepl\fB[\fR,\ \fIin\fB]\fR)" 10
+.br
+Behave like
+.BR sub
+(see below), except that it shall replace all occurrences of the
+regular expression (like the
+.IR ed
+utility global substitute) in $0 or in the
+.IR in
+argument, when specified.
+.IP "\fBindex\fR(\fIs\fR,\ \fIt\fR)" 10
+Return the position, in characters, numbering from 1, in string
+.IR s
+where string
+.IR t
+first occurs, or zero if it does not occur at all.
+.IP "\fBlength[\fR(\fB[\fIs\fB]\fR)\fB]\fR" 10
+Return the length, in characters, of its argument taken as a string, or
+of the whole record, $0, if there is no argument.
+.IP "\fBmatch\fR(\fIs\fR,\ \fIere\fR)" 10
+Return the position, in characters, numbering from 1, in string
+.IR s
+where the extended regular expression
+.IR ere
+occurs, or zero if it does not occur at all. RSTART shall be set to the
+starting position (which is the same as the returned value), zero if no
+match is found; RLENGTH shall be set to the length of the matched
+string, \-1 if no match is found.
+.IP "\fBsplit\fR(\fIs\fR,\ \fIa\fB[\fR,\ \fIfs\ \fB]\fR)" 10
+.br
+Split the string
+.IR s
+into array elements
+.IR a [1],
+.IR a [2],
+\&.\|.\|.,
+.IR a [ n ],
+and return
+.IR n .
+All elements of the array shall be deleted before the split is
+performed. The separation shall be done with the ERE
+.IR fs
+or with the field separator
+.BR FS
+if
+.IR fs
+is not given. Each array element shall have a string value when created
+and, if appropriate, the array element shall be considered a numeric
+string (see
+.IR "Expressions in awk").
+The effect of a null string as the value of
+.IR fs
+is unspecified.
+.IP "\fBsprintf\fR(\fIfmt\fR,\ \fIexpr\fR,\ \fIexpr\fR,\ .\|.\|.)" 10
+.br
+Format the expressions according to the
+.BR printf
+format given by
+.IR fmt
+and return the resulting string.
+.IP "\fBsub\fR(\fIere\fR,\ \fIrepl\fB[\fR,\ \fIin\ \fB]\fR)" 10
+.br
+Substitute the string
+.IR repl
+in place of the first instance of the extended regular expression
+.IR ere
+in string
+.IR in
+and return the number of substitutions. An
+<ampersand>
+(\c
+.BR '&' )
+appearing in the string
+.IR repl
+shall be replaced by the string from
+.IR in
+that matches the ERE. An
+<ampersand>
+preceded with a
+<backslash>
+shall be interpreted as the literal
+<ampersand>
+character. An occurrence of two consecutive
+<backslash>
+characters shall be interpreted as just a single literal
+<backslash>
+character. Any other occurrence of a
+<backslash>
+(for example, preceding any other character) shall be treated as a
+literal
+<backslash>
+character. Note that if
+.IR repl
+is a string literal (the lexical token
+.BR STRING ;
+see
+.IR "Grammar"),
+the handling of the
+<ampersand>
+character occurs after any lexical processing, including any lexical
+<backslash>-escape
+sequence processing. If
+.IR in
+is specified and it is not an lvalue (see
+.IR "Expressions in awk"),
+the behavior is undefined. If
+.IR in
+is omitted,
+.IR awk
+shall use the current record ($0) in its place.
+.IP "\fBsubstr\fR(\fIs\fR,\ \fIm\fB[\fR,\ \fIn\ \fB]\fR)" 10
+.br
+Return the substring of
+.IR s ,
+at most
+.IR n
+characters long, that begins at position
+.IR m ,
+numbering from 1. If
+.IR n
+is omitted, or if
+.IR n
+specifies more characters than are left in the string, the length of
+the substring shall be limited by the length of the string
+.IR s .
+.IP "\fBtolower\fR(\fIs\fR)" 10
+Return a string based on the string
+.IR s .
+Each character in
+.IR s
+that is an uppercase letter specified to have a
+.BR tolower
+mapping by the
+.IR LC_CTYPE
+category of the current locale shall be replaced in the returned string
+by the lowercase letter specified by the mapping. Other characters in
+.IR s
+shall be unchanged in the returned string.
+.IP "\fBtoupper\fR(\fIs\fR)" 10
+Return a string based on the string
+.IR s .
+Each character in
+.IR s
+that is a lowercase letter specified to have a
+.BR toupper
+mapping by the
+.IR LC_CTYPE
+category of the current locale shall be replaced in the returned string
+by the uppercase letter specified by the mapping. Other characters in
+.IR s
+shall be unchanged in the returned string.
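+.P
+The treatment of <ampersand> in the replacement string of
+.BR sub
+and
+.BR gsub
+can be sketched as follows; the sample strings and variable names are
+assumptions.

```shell
out=$(awk 'BEGIN {
    s = "foo bar foo"
    n = gsub(/foo/, "[&]", s)        # & stands for the matched text
    t = "a&b"
    sub(/&/, "\\&\\&", t)            # after lexical processing repl is \&\&
    print n, s, t
}')
printf '%s\n' "$out"
```

+.P
+In the second call the string literal is first reduced lexically to
+two <backslash> <ampersand> pairs, each of which then yields one
+literal <ampersand>.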
+.P
+All of the preceding functions that take
+.IR ere
+as a parameter expect a pattern or a string-valued expression that is a
+regular expression as defined in
+.IR "Regular Expressions".
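+.P
+The position-based string functions can be sketched together. The
+sample string and variable names are assumptions; the second
+.BR match
+call passes its regular expression as a string-valued expression.

```shell
out=$(awk 'BEGIN {
    s = "key=value"
    pos = match(s, /=.*/)                 # sets RSTART and RLENGTH
    hit = pos ":" RSTART ":" RLENGTH
    match(s, "[0-9]+")                    # string-valued ERE, no match here
    miss = RSTART ":" RLENGTH
    n = split("a:b:c", part, ":")
    print hit, miss, n, part[3], substr(s, 1, 3), index(s, "val"), toupper("awk")
}')
printf '%s\n' "$out"
```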
+.SS "Input/Output and General Functions"
+.P
+The input/output and general functions are:
+.IP "\fBclose\fR(\fIexpression\fR)" 10
+.br
+Close the file or pipe opened by a
+.BR print
+or
+.BR printf
+statement or a call to
+.BR getline
+with the same string-valued
+.IR expression .
+The limit on the number of open
+.IR expression
+arguments is implementation-defined. If the close was successful, the
+function shall return zero; otherwise, it shall return non-zero.
+.IP "\fIexpression\ |\ \fBgetline\ [\fIvar\fB]\fR" 10
+.br
+Read a record of input from a stream piped from the output of a
+command. The stream shall be created if no stream is currently open
+with the value of
+.IR expression
+as its command name. The stream created shall be equivalent to one
+created by a call to the
+\fIpopen\fR()
+function with the value of
+.IR expression
+as the
+.IR command
+argument and a value of
+.IR r
+as the
+.IR mode
+argument. As long as the stream remains open, subsequent calls in which
+.IR expression
+evaluates to the same string value shall read subsequent records from
+the stream. The stream shall remain open until the
+.BR close
+function is called with an expression that evaluates to the same string
+value. At that time, the stream shall be closed as if by a call to the
+\fIpclose\fR()
+function. If
+.IR var
+is omitted, $0 and
+.BR NF
+shall be set; otherwise,
+.IR var
+shall be set and, if appropriate, it shall be considered a numeric
+string (see
+.IR "Expressions in awk").
+.RS 10
+.P
+The
+.BR getline
+operator can form ambiguous constructs when there are unparenthesized
+operators (including concatenate) to the left of the
+.BR '|'
+(to the beginning of the expression containing
+.BR getline ).
+In the context of the
+.BR '$'
+operator,
+.BR '|'
+shall behave as if it had a lower precedence than
+.BR '$' .
+The result of evaluating other operators is unspecified, and conforming
+applications shall parenthesize properly all such usages.
+.RE
+.IP "\fBgetline\fR" 10
+Set $0 to the next input record from the current input file. This form
+of
+.BR getline
+shall set the
+.BR NF ,
+.BR NR ,
+and
+.BR FNR
+variables.
+.IP "\fBgetline\ \fIvar\fR" 10
+Set variable
+.IR var
+to the next input record from the current input file and, if
+appropriate,
+.IR var
+shall be considered a numeric string (see
+.IR "Expressions in awk").
+This form of
+.BR getline
+shall set the
+.BR FNR
+and
+.BR NR
+variables.
+.IP "\fBgetline\ \fB[\fIvar\fB]\ \fR<\ \fIexpression\fR" 10
+.br
+Read the next record of input from a named file. The
+.IR expression
+shall be evaluated to produce a string that is used as a pathname.
+If the file of that name is not currently open, it shall be opened. As
+long as the stream remains open, subsequent calls in which
+.IR expression
+evaluates to the same string value shall read subsequent records from
+the file. The file shall remain open until the
+.BR close
+function is called with an expression that evaluates to the same string
+value. If
+.IR var
+is omitted, $0 and
+.BR NF
+shall be set; otherwise,
+.IR var
+shall be set and, if appropriate, it shall be considered a numeric
+string (see
+.IR "Expressions in awk").
+.RS 10
+.P
+The
+.BR getline
+operator can form ambiguous constructs when there are unparenthesized
+binary operators (including concatenate) to the right of the
+.BR '<'
+(up to the end of the expression containing the
+.BR getline ).
+The result of evaluating such a construct is unspecified, and conforming
+applications shall parenthesize properly all such usages.
+.RE
+.IP "\fBsystem\fR(\fIexpression\fR)" 10
+.br
+Execute the command given by
+.IR expression
+in a manner equivalent to the
+\fIsystem\fR()
+function defined in the System Interfaces volume of POSIX.1\(hy2017 and return the exit status of the
+command.
+.P
+All forms of
+.BR getline
+shall return 1 for successful input, zero for end-of-file, and \-1
+for an error.
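+.P
+Testing the return value explicitly keeps a loop from spinning on an
+error result. The sketch below is illustrative; the command string,
+the variable names, and the pathname /nonexistent/file (presumed not
+to exist) are assumptions.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)      # 1 per record, 0 at end-of-file
        n++
    close(cmd)                            # the next use runs the command again
    cmd | getline first
    close(cmd)
    err = (getline junk < "/nonexistent/file")   # -1: the file cannot be opened
    print n, first, err
}' < /dev/null)
printf '%s\n' "$out"
```

+.P
+Writing the loop condition as a comparison with zero, rather than as
+the bare call, is what stops the loop when \-1 is returned.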
+.P
+Where strings are used as the name of a file or pipeline, the
+application shall ensure that the strings are textually identical. The
+terminology ``same string value'' implies that ``equivalent strings'',
+even those that differ only by
+<space>
+characters, represent different files.
+.SS "User-Defined Functions"
+.P
+The
+.IR awk
+language also provides user-defined functions. Such functions can be
+defined as:
+.sp
+.RS 4
+.nf
+
+function \fIname\fR(\fB[\fIparameter\fR, ...\fB]\fR) { \fIstatements\fR }
+.fi
+.P
+.RE
+.P
+A function can be referred to anywhere in an
+.IR awk
+program; in particular, its use can precede its definition. The scope
+of a function is global.
+.P
+Function parameters, if present, can be either scalars or arrays; the
+behavior is undefined if an array name is passed as a parameter that
+the function uses as a scalar, or if a scalar expression is passed as a
+parameter that the function uses as an array. Function parameters shall
+be passed by value if scalar and by reference if an array name.
+.P
+The number of parameters in the function definition need not match the
+number of parameters in the function call. Excess formal parameters can
+be used as local variables. If fewer arguments are supplied in a
+function call than are in the function definition, the extra parameters
+that are used in the function body as scalars shall evaluate to the
+uninitialized value until they are otherwise initialized, and the extra
+parameters that are used in the function body as arrays shall be
+treated as uninitialized arrays where each element evaluates to the
+uninitialized value until otherwise initialized.
+.P
+When invoking a function, no white space can be placed between the
+function name and the opening parenthesis. Function calls can be nested
+and recursive calls can be made upon functions. Upon return from any
+nested or recursive function call, the values of all of the calling
+function's parameters shall be unchanged, except for array parameters
+passed by reference. The
+.BR return
+statement can be used to return a value. If a
+.BR return
+statement appears outside of a function definition, the behavior is
+undefined.
+.P
+In the function definition,
+<newline>
+characters shall be optional before the opening brace and after the
+closing brace. Function definitions can appear anywhere in the program
+where a
+.IR pattern-action
+pair is allowed.
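+.P
+Parameter passing, recursion, and the use of excess formal parameters
+as local variables can be sketched as follows. All function and
+variable names are hypothetical and chosen only for the example.

```shell
out=$(awk '
    function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }   # recursion
    function bump(v) { v++; return v }        # scalar: caller copy untouched
    function fill(arr, k,    i) {             # i is an excess formal used as a local
        for (i = 1; i <= k; i++) arr[i] = i * i
    }
    BEGIN {
        x = 5; y = bump(x)
        fill(sq, 3)                           # array: passed by reference
        print fact(5), x, y, sq[3], (i == "")
    }')
printf '%s\n' "$out"
```

+.P
+The final value printed is 1 because the global
+.IR i
+is still uninitialized; the
+.IR i
+used inside the function was local to it.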
+.SS "Grammar"
+.P
+The grammar in this section and the lexical conventions in the
+following section shall together describe the syntax for
+.IR awk
+programs. The general conventions for this style of grammar are
+described in
+.IR "Section 1.3" ", " "Grammar Conventions".
+A valid program can be represented as the non-terminal symbol
+.IR program
+in the grammar. This formal syntax shall take precedence over the
+preceding text syntax description.
+.sp
+.RS 4
+.nf
+
+%token NAME NUMBER STRING ERE
+%token FUNC_NAME /* Name followed by \(aq(\(aq without white space. */
+.P
+/* Keywords */
+%token Begin End
+/* \(aqBEGIN\(aq \(aqEND\(aq */
+.P
+%token Break Continue Delete Do Else
+/* \(aqbreak\(aq \(aqcontinue\(aq \(aqdelete\(aq \(aqdo\(aq \(aqelse\(aq */
+.P
+%token Exit For Function If In
+/* \(aqexit\(aq \(aqfor\(aq \(aqfunction\(aq \(aqif\(aq \(aqin\(aq */
+.P
+%token Next Print Printf Return While
+/* \(aqnext\(aq \(aqprint\(aq \(aqprintf\(aq \(aqreturn\(aq \(aqwhile\(aq */
+.P
+/* Reserved function names */
+%token BUILTIN_FUNC_NAME
+ /* One token for the following:
+ * atan2 cos sin exp log sqrt int rand srand
+ * gsub index length match split sprintf sub
+ * substr tolower toupper close system
+ */
+%token GETLINE
+ /* Syntactically different from other built-ins. */
+.P
+/* Two-character tokens. */
+%token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
+/* \(aq+=\(aq \(aq-=\(aq \(aq*=\(aq \(aq/=\(aq \(aq%=\(aq \(aq\(ha=\(aq */
+.P
+%token OR AND NO_MATCH EQ LE GE NE INCR DECR APPEND
+/* \(aq||\(aq \(aq&&\(aq \(aq!\^\(ti\(aq \(aq==\(aq \(aq<=\(aq \(aq>=\(aq \(aq!=\(aq \(aq++\(aq \(aq--\(aq \(aq>>\(aq */
+.P
+/* One-character tokens. */
+%token \(aq{\(aq \(aq}\(aq \(aq(\(aq \(aq)\(aq \(aq[\(aq \(aq]\(aq \(aq,\(aq \(aq;\(aq NEWLINE
+%token \(aq+\(aq \(aq-\(aq \(aq*\(aq \(aq%\(aq \(aq\(ha\(aq \(aq!\(aq \(aq>\(aq \(aq<\(aq \(aq|\(aq \(aq?\(aq \(aq:\(aq \(aq\(ti\(aq \(aq$\(aq \(aq=\(aq
+.P
+%start program
+%%
+.P
+program : item_list
+ | item_list item
+ ;
+.P
+item_list : /* empty */
+ | item_list item terminator
+ ;
+.P
+item : action
+ | pattern action
+ | normal_pattern
+ | Function NAME \(aq(\(aq param_list_opt \(aq)\(aq
+ newline_opt action
+ | Function FUNC_NAME \(aq(\(aq param_list_opt \(aq)\(aq
+ newline_opt action
+ ;
+.P
+param_list_opt : /* empty */
+ | param_list
+ ;
+.P
+param_list : NAME
+ | param_list \(aq,\(aq NAME
+ ;
+.P
+pattern : normal_pattern
+ | special_pattern
+ ;
+.P
+normal_pattern : expr
+ | expr \(aq,\(aq newline_opt expr
+ ;
+.P
+special_pattern : Begin
+ | End
+ ;
+.P
+action : \(aq{\(aq newline_opt \(aq}\(aq
+ | \(aq{\(aq newline_opt terminated_statement_list \(aq}\(aq
+ | \(aq{\(aq newline_opt unterminated_statement_list \(aq}\(aq
+ ;
+.P
+terminator : terminator NEWLINE
+ | \(aq;\(aq
+ | NEWLINE
+ ;
+.P
+terminated_statement_list : terminated_statement
+ | terminated_statement_list terminated_statement
+ ;
+.P
+unterminated_statement_list : unterminated_statement
+ | terminated_statement_list unterminated_statement
+ ;
+.P
+terminated_statement : action newline_opt
+ | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
+ | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
+ Else newline_opt terminated_statement
+ | While \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
+ | For \(aq(\(aq simple_statement_opt \(aq;\(aq
+ expr_opt \(aq;\(aq simple_statement_opt \(aq)\(aq newline_opt
+ terminated_statement
+ | For \(aq(\(aq NAME In NAME \(aq)\(aq newline_opt
+ terminated_statement
+ | \(aq;\(aq newline_opt
+ | terminatable_statement NEWLINE newline_opt
+ | terminatable_statement \(aq;\(aq newline_opt
+ ;
+.P
+unterminated_statement : terminatable_statement
+ | If \(aq(\(aq expr \(aq)\(aq newline_opt unterminated_statement
+ | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
+ Else newline_opt unterminated_statement
+ | While \(aq(\(aq expr \(aq)\(aq newline_opt unterminated_statement
+ | For \(aq(\(aq simple_statement_opt \(aq;\(aq
+ expr_opt \(aq;\(aq simple_statement_opt \(aq)\(aq newline_opt
+ unterminated_statement
+ | For \(aq(\(aq NAME In NAME \(aq)\(aq newline_opt
+ unterminated_statement
+ ;
+.P
+terminatable_statement : simple_statement
+ | Break
+ | Continue
+ | Next
+ | Exit expr_opt
+ | Return expr_opt
+ | Do newline_opt terminated_statement While \(aq(\(aq expr \(aq)\(aq
+ ;
+.P
+simple_statement_opt : /* empty */
+ | simple_statement
+ ;
+.P
+simple_statement : Delete NAME \(aq[\(aq expr_list \(aq]\(aq
+ | expr
+ | print_statement
+ ;
+.P
+print_statement : simple_print_statement
+ | simple_print_statement output_redirection
+ ;
+.P
+simple_print_statement : Print print_expr_list_opt
+ | Print \(aq(\(aq multiple_expr_list \(aq)\(aq
+ | Printf print_expr_list
+ | Printf \(aq(\(aq multiple_expr_list \(aq)\(aq
+ ;
+.P
+output_redirection : \(aq>\(aq expr
+ | APPEND expr
+ | \(aq|\(aq expr
+ ;
+.P
+expr_list_opt : /* empty */
+ | expr_list
+ ;
+.P
+expr_list : expr
+ | multiple_expr_list
+ ;
+.P
+multiple_expr_list : expr \(aq,\(aq newline_opt expr
+ | multiple_expr_list \(aq,\(aq newline_opt expr
+ ;
+.P
+expr_opt : /* empty */
+ | expr
+ ;
+.P
+expr : unary_expr
+ | non_unary_expr
+ ;
+.P
+unary_expr : \(aq+\(aq expr
+ | \(aq-\(aq expr
+ | unary_expr \(aq\(ha\(aq expr
+ | unary_expr \(aq*\(aq expr
+ | unary_expr \(aq/\(aq expr
+ | unary_expr \(aq%\(aq expr
+ | unary_expr \(aq+\(aq expr
+ | unary_expr \(aq-\(aq expr
+ | unary_expr non_unary_expr
+ | unary_expr \(aq<\(aq expr
+ | unary_expr LE expr
+ | unary_expr NE expr
+ | unary_expr EQ expr
+ | unary_expr \(aq>\(aq expr
+ | unary_expr GE expr
+ | unary_expr \(aq\(ti\(aq expr
+ | unary_expr NO_MATCH expr
+ | unary_expr In NAME
+ | unary_expr AND newline_opt expr
+ | unary_expr OR newline_opt expr
+ | unary_expr \(aq?\(aq expr \(aq:\(aq expr
+ | unary_input_function
+ ;
+.P
+non_unary_expr : \(aq(\(aq expr \(aq)\(aq
+ | \(aq!\(aq expr
+ | non_unary_expr \(aq\(ha\(aq expr
+ | non_unary_expr \(aq*\(aq expr
+ | non_unary_expr \(aq/\(aq expr
+ | non_unary_expr \(aq%\(aq expr
+ | non_unary_expr \(aq+\(aq expr
+ | non_unary_expr \(aq-\(aq expr
+ | non_unary_expr non_unary_expr
+ | non_unary_expr \(aq<\(aq expr
+ | non_unary_expr LE expr
+ | non_unary_expr NE expr
+ | non_unary_expr EQ expr
+ | non_unary_expr \(aq>\(aq expr
+ | non_unary_expr GE expr
+ | non_unary_expr \(aq\(ti\(aq expr
+ | non_unary_expr NO_MATCH expr
+ | non_unary_expr In NAME
+ | \(aq(\(aq multiple_expr_list \(aq)\(aq In NAME
+ | non_unary_expr AND newline_opt expr
+ | non_unary_expr OR newline_opt expr
+ | non_unary_expr \(aq?\(aq expr \(aq:\(aq expr
+ | NUMBER
+ | STRING
+ | lvalue
+ | ERE
+ | lvalue INCR
+ | lvalue DECR
+ | INCR lvalue
+ | DECR lvalue
+ | lvalue POW_ASSIGN expr
+ | lvalue MOD_ASSIGN expr
+ | lvalue MUL_ASSIGN expr
+ | lvalue DIV_ASSIGN expr
+ | lvalue ADD_ASSIGN expr
+ | lvalue SUB_ASSIGN expr
+ | lvalue \(aq=\(aq expr
+ | FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
+ /* no white space allowed before \(aq(\(aq */
+ | BUILTIN_FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
+ | BUILTIN_FUNC_NAME
+ | non_unary_input_function
+ ;
+.P
+print_expr_list_opt : /* empty */
+ | print_expr_list
+ ;
+.P
+print_expr_list : print_expr
+ | print_expr_list \(aq,\(aq newline_opt print_expr
+ ;
+.P
+print_expr : unary_print_expr
+ | non_unary_print_expr
+ ;
+.P
+unary_print_expr : \(aq+\(aq print_expr
+ | \(aq-\(aq print_expr
+ | unary_print_expr \(aq\(ha\(aq print_expr
+ | unary_print_expr \(aq*\(aq print_expr
+ | unary_print_expr \(aq/\(aq print_expr
+ | unary_print_expr \(aq%\(aq print_expr
+ | unary_print_expr \(aq+\(aq print_expr
+ | unary_print_expr \(aq-\(aq print_expr
+ | unary_print_expr non_unary_print_expr
+ | unary_print_expr \(aq\(ti\(aq print_expr
+ | unary_print_expr NO_MATCH print_expr
+ | unary_print_expr In NAME
+ | unary_print_expr AND newline_opt print_expr
+ | unary_print_expr OR newline_opt print_expr
+ | unary_print_expr \(aq?\(aq print_expr \(aq:\(aq print_expr
+ ;
+.P
+non_unary_print_expr : \(aq(\(aq expr \(aq)\(aq
+ | \(aq!\(aq print_expr
+ | non_unary_print_expr \(aq\(ha\(aq print_expr
+ | non_unary_print_expr \(aq*\(aq print_expr
+ | non_unary_print_expr \(aq/\(aq print_expr
+ | non_unary_print_expr \(aq%\(aq print_expr
+ | non_unary_print_expr \(aq+\(aq print_expr
+ | non_unary_print_expr \(aq-\(aq print_expr
+ | non_unary_print_expr non_unary_print_expr
+ | non_unary_print_expr \(aq\(ti\(aq print_expr
+ | non_unary_print_expr NO_MATCH print_expr
+ | non_unary_print_expr In NAME
+ | \(aq(\(aq multiple_expr_list \(aq)\(aq In NAME
+ | non_unary_print_expr AND newline_opt print_expr
+ | non_unary_print_expr OR newline_opt print_expr
+ | non_unary_print_expr \(aq?\(aq print_expr \(aq:\(aq print_expr
+ | NUMBER
+ | STRING
+ | lvalue
+ | ERE
+ | lvalue INCR
+ | lvalue DECR
+ | INCR lvalue
+ | DECR lvalue
+ | lvalue POW_ASSIGN print_expr
+ | lvalue MOD_ASSIGN print_expr
+ | lvalue MUL_ASSIGN print_expr
+ | lvalue DIV_ASSIGN print_expr
+ | lvalue ADD_ASSIGN print_expr
+ | lvalue SUB_ASSIGN print_expr
+ | lvalue \(aq=\(aq print_expr
+ | FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
+ /* no white space allowed before \(aq(\(aq */
+ | BUILTIN_FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
+ | BUILTIN_FUNC_NAME
+ ;
+.P
+lvalue : NAME
+ | NAME \(aq[\(aq expr_list \(aq]\(aq
+ | \(aq$\(aq expr
+ ;
+.P
+non_unary_input_function : simple_get
+ | simple_get \(aq<\(aq expr
+ | non_unary_expr \(aq|\(aq simple_get
+ ;
+.P
+unary_input_function : unary_expr \(aq|\(aq simple_get
+ ;
+.P
+simple_get : GETLINE
+ | GETLINE lvalue
+ ;
+.P
+newline_opt : /* empty */
+ | newline_opt NEWLINE
+ ;
+.fi
+.P
+.RE
+.P
+This grammar has several ambiguities that shall be resolved as
+follows:
+.IP " *" 4
+Operator precedence and associativity shall be as described in
+.IR "Table 4-1, Expressions in Decreasing Precedence in \fIawk\fP".
+.IP " *" 4
+In case of ambiguity, an
+.BR else
+shall be associated with the most immediately preceding
+.BR if
+that would satisfy the grammar.
+.IP " *" 4
+In some contexts, a
+<slash>
+(\c
+.BR '/' )
+that is used to surround an ERE could also be the division operator.
+This shall be resolved in such a way that wherever the division
+operator could appear, a
+<slash>
+is assumed to be the division operator. (There is no unary division
+operator.)
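+.P
+As a non-normative illustration of the second rule, the
+.BR else
+in the following sketch is associated with the inner
+.BR if ,
+so the second message is written only when field 1 is positive and
+field 2 is not. The messages are chosen only for the example:
+.sp
+.RS 4
+.nf
+
+{
+    if ($1 > 0)
+        if ($2 > 0)
+            print "both positive"
+        else
+            print "only the first is positive"
+}
+.fi
+.P
+.RE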
+.P
+Each expression in an
+.IR awk
+program shall conform to the precedence and associativity rules, even
+when this is not needed to resolve an ambiguity. For example, because
+.BR '$'
+has higher precedence than
+.BR '++' ,
+the string
+.BR \(dq$x++--\(dq
+is not a valid
+.IR awk
+expression, even though it is unambiguously parsed by the grammar as
+.BR \(dq$(x++)--\(dq .
+.P
+One convention that might not be obvious from the formal grammar is
+where
+<newline>
+characters are acceptable. There are several obvious placements such as
+terminating a statement, and a
+<backslash>
+can be used to escape
+<newline>
+characters between any lexical tokens. In addition,
+<newline>
+characters without
+<backslash>
+characters can follow a comma, an open brace, a logical AND operator (\c
+.BR \(dq&&\(dq ),
+a logical OR operator (\c
+.BR \(dq||\(dq ),
+the
+.BR do
+keyword, the
+.BR else
+keyword, and the closing parenthesis of an
+.BR if ,
+.BR for ,
+or
+.BR while
+statement. For example:
+.sp
+.RS 4
+.nf
+
+{ print $1,
+ $2 }
+.fi
+.P
+.RE
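+.P
+A further non-normative sketch, using a hypothetical variable
+.BR total ,
+combines an escaped
+<newline>
+between two tokens with unescaped
+<newline>
+characters after the
+.BR do
+keyword and after a logical OR operator. Under the rules above it is
+a valid program that writes 7:
+.sp
+.RS 4
+.nf
+
+BEGIN {
+    total = 1 + \e
+            2
+    do
+        total++
+    while (total < 5 ||
+           total < 7)
+    print total
+}
+.fi
+.P
+.RE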
+.SS "Lexical Conventions"
+.P
+The lexical conventions for
+.IR awk
+programs, with respect to the preceding grammar, shall be as follows:
+.IP " 1." 4
+Except as noted,
+.IR awk
+shall recognize the longest possible token or delimiter beginning at a
+given point.
+.IP " 2." 4
+A comment shall consist of any characters beginning with the
+<number-sign>
+character and terminated by, but excluding the next occurrence of, a
+<newline>.
+Comments shall have no effect, except to delimit lexical tokens.
+.IP " 3." 4
+The
+<newline>
+shall be recognized as the token
+.BR NEWLINE .
+.IP " 4." 4
+A
+<backslash>
+character immediately followed by a
+<newline>
+shall have no effect.
+.IP " 5." 4
+The token
+.BR STRING
+shall represent a string constant. A string constant shall begin with
+the character
+.BR '\&"' .
+Within a string constant, a
+<backslash>
+character shall be considered to begin an escape sequence as specified
+in the table in the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Chapter 5" ", " "File Format Notation"
+(\c
+.BR '\e\e' ,
+.BR '\ea' ,
+.BR '\eb' ,
+.BR '\ef' ,
+.BR '\en' ,
+.BR '\er' ,
+.BR '\et' ,
+.BR '\ev' ).
+In addition, the escape sequences in
+.IR "Table 4-2, Escape Sequences in \fIawk\fP"
+shall be recognized. A
+<newline>
+shall not occur within a string constant. A string constant shall be
+terminated by the first unescaped occurrence of the character
+.BR '\&"'
+after the one that begins the string constant. The value of the string
+shall be the sequence of all unescaped characters and values of escape
+sequences between, but not including, the two delimiting
+.BR '\&"'
+characters.
+.IP " 6." 4
+The token
+.BR ERE
+represents an extended regular expression constant. An ERE constant
+shall begin with the
+<slash>
+character. Within an ERE constant, a
+<backslash>
+character shall be considered to begin an escape sequence as
+specified in the table in the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Chapter 5" ", " "File Format Notation".
+In addition, the escape sequences in
+.IR "Table 4-2, Escape Sequences in \fIawk\fP"
+shall be recognized. The application shall ensure that a
+<newline>
+does not occur within an ERE constant. An ERE constant shall be
+terminated by the first unescaped occurrence of the
+<slash>
+character after the one that begins the ERE constant. The extended regular
+expression represented by the ERE constant shall be the sequence of all
+unescaped characters and values of escape sequences between, but not
+including, the two delimiting
+<slash>
+characters.
+.IP " 7." 4
+A
+<blank>
+shall have no effect, except to delimit lexical tokens or within
+.BR STRING
+or
+.BR ERE
+tokens.
+.IP " 8." 4
+The token
+.BR NUMBER
+shall represent a numeric constant. Its form and numeric value shall
+either be equivalent to the
+.BR decimal-floating-constant
+token as specified by the ISO\ C standard, or it shall be a sequence of decimal
+digits and shall be evaluated as an integer constant in decimal. In
+addition, implementations may accept numeric constants with the form
+and numeric value equivalent to the
+.BR hexadecimal-constant
+and
+.BR hexadecimal-floating-constant
+tokens as specified by the ISO\ C standard.
+.RS 4
+.P
+If the value is too large or too small to be representable (see
+.IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard"),
+the behavior is undefined.
+.RE
+.IP " 9." 4
+A sequence of underscores, digits, and alphabetics from the portable
+character set (see the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 6.1" ", " "Portable Character Set"),
+beginning with an
+<underscore>
+or alphabetic character, shall be considered a word.
+.IP 10. 4
+The following words are keywords that shall be recognized as individual
+tokens; the name of the token is the same as the keyword:
+.TS
+tab(@);
+lw(0.6i)eB leB leB leB leB leB.
+T{
+.nf
+BEGIN
+break
+continue
+T}@T{
+.nf
+delete
+do
+else
+T}@T{
+.nf
+END
+exit
+for
+T}@T{
+.nf
+function
+getline
+if
+T}@T{
+.nf
+in
+next
+print
+T}@T{
+.nf
+printf
+return
+while
+T}
+.TE
+.IP 11. 4
+The following words are names of built-in functions and shall be
+recognized as the token
+.BR BUILTIN_FUNC_NAME :
+.TS
+tab(@);
+lw(0.6i)eB leB leB leB leB leB.
+T{
+.nf
+atan2
+close
+cos
+exp
+T}@T{
+.nf
+gsub
+index
+int
+length
+T}@T{
+.nf
+log
+match
+rand
+sin
+T}@T{
+.nf
+split
+sprintf
+sqrt
+srand
+T}@T{
+.nf
+sub
+substr
+system
+tolower
+T}@T{
+.nf
+toupper
+.fi
+T}
+.TE
+.RS 4
+.P
+The above-listed keywords and names of built-in functions are
+considered reserved words.
+.RE
+.IP 12. 4
+The token
+.BR NAME
+shall consist of a word that is not a keyword or a name of a built-in
+function and is not followed immediately (without any delimiters) by
+the
+.BR '('
+character.
+.IP 13. 4
+The token
+.BR FUNC_NAME
+shall consist of a word that is not a keyword or a name of a built-in
+function, followed immediately (without any delimiters) by the
+.BR '('
+character. The
+.BR '('
+character shall not be included as part of the token.
+.IP 14. 4
+The following two-character sequences shall be recognized as the named
+tokens:
+.TS
+box center tab(@);
+cB | cB | cB | cB
+lB | cf5 | lB | cf5.
+Token Name@Sequence@Token Name@Sequence
+_
+ADD_ASSIGN@+=@NO_MATCH@!~
+SUB_ASSIGN@\-=@EQ@==
+MUL_ASSIGN@*=@LE@<=
+DIV_ASSIGN@/=@GE@>=
+MOD_ASSIGN@%=@NE@!=
+POW_ASSIGN@^=@INCR@++
+OR@||@DECR@\-\|\-
+AND@&&@APPEND@>>
+.TE
+.IP 15. 4
+The following single characters shall be recognized as tokens whose
+names are the character:
+.RS 4
+.sp
+.RS 4
+.nf
+
+<newline> { } ( ) [ ] , ; + - * % \(ha ! > < | ? : \(ti $ =
+.fi
+.P
+.RE
+.RE
+.P
+There is a lexical ambiguity between the token
+.BR ERE
+and the tokens
+.BR '/'
+and
+.BR DIV_ASSIGN .
+When an input sequence begins with a
+<slash>
+character in any syntactic context where the token
+.BR '/'
+or
+.BR DIV_ASSIGN
+could appear as the next token in a valid program, the longer of those
+two tokens that can be recognized shall be recognized. In any other
+syntactic context where the token
+.BR ERE
+could appear as the next token in a valid program, the token
+.BR ERE
+shall be recognized.
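+.P
+As a non-normative sketch of this rule, the first
+<slash>
+below follows an lvalue, where
+.BR DIV_ASSIGN
+could appear, so the two characters are recognized as that token. The
+second
+<slash>
+begins a pattern, where no division operator could appear, so it
+begins an
+.BR ERE
+token that matches lines containing an equals sign:
+.sp
+.RS 4
+.nf
+
+BEGIN { n = 10; n /= 2; print n }
+/=/   { print "contains an equals sign:", $0 }
+.fi
+.P
+.RE
+.P
+Implementations that do not follow this rule are believed to have
+existed, so an application that wants to be defensive can write the
+pattern as
+.BR /[=]/
+instead.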
+.SH "EXIT STATUS"
+The following exit values shall be returned:
+.IP "\00" 6
+All input files were processed successfully.
+.IP >0 6
+An error occurred.
+.P
+The exit status can be altered within the program by using an
+.BR exit
+statement with an expression.
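+.P
+For instance, as a minimal sketch assuming a POSIX shell, the
+following command line is expected to write 3, the value given to
+.BR exit :
+.sp
+.RS 4
+.nf
+
+awk \(aqBEGIN { exit 3 }\(aq; echo $?
+.fi
+.P
+.RE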
+.SH "CONSEQUENCES OF ERRORS"
+If any
+.IR file
+operand is specified and the named file cannot be accessed,
+.IR awk
+shall write a diagnostic message to standard error and terminate
+without any further action.
+.P
+If the program specified by either the
+.IR program
+operand or a
+.IR progfile
+operand is not a valid
+.IR awk
+program (as specified in the EXTENDED DESCRIPTION section), the
+behavior is undefined.
+.LP
+.IR "The following sections are informative."
+.SH "APPLICATION USAGE"
+The
+.BR index ,
+.BR length ,
+.BR match ,
+and
+.BR substr
+functions should not be confused with similar functions in the ISO\ C standard;
+the
+.IR awk
+versions deal with characters, while the ISO\ C standard deals with bytes.
+.P
+Because the concatenation operation is represented by adjacent
+expressions rather than an explicit operator, it is often necessary to
+use parentheses to enforce the proper evaluation precedence.
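+.P
+A minimal sketch of the issue: because binary minus binds more
+tightly than concatenation, the first statement below subtracts 24
+from the string constant before concatenating, while the parentheses
+in the second statement force the intended grouping:
+.sp
+.RS 4
+.nf
+
+BEGIN {
+    print -12 " " -24      # grouped as (-12) (" " - 24)
+    print -12 " " (-24)    # grouped as (-12) (" ") (-24)
+}
+.fi
+.P
+.RE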
+.P
+When using
+.IR awk
+to process pathnames, it is recommended that LC_ALL, or at least
+LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment,
+since pathnames can contain byte sequences that do not form valid
+characters in some locales, in which case the utility's behavior would
+be undefined. In the POSIX locale each byte is a valid single-byte
+character, and therefore this problem is avoided.
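+.P
+As a sketch of this recommendation, assuming a POSIX shell, the
+following command line sets the locale for
+.IR awk
+only and writes the last component of each pathname. The
+.IR find
+invocation is only an illustrative source of pathnames, and
+pathnames containing
+<newline>
+characters are not handled:
+.sp
+.RS 4
+.nf
+
+find . -type f | LC_ALL=C awk -F/ \(aq{ print $NF }\(aq
+.fi
+.P
+.RE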
+.P
+On implementations where the
+.BR \(dq==\(dq
+operator checks if strings collate equally, applications needing to
+check whether strings are identical can use:
+.sp
+.RS 4
+.nf
+
+length(a) == length(b) && index(a,b) == 1
+.fi
+.P
+.RE
+.P
+On implementations where the
+.BR \(dq==\(dq
+operator checks if strings are identical, applications needing to
+check whether strings collate equally can use:
+.sp
+.RS 4
+.nf
+
+a <= b && a >= b
+.fi
+.P
+.RE
+.SH EXAMPLES
+The
+.IR awk
+program specified in the command line is most easily specified within
+single-quotes (for example, \(aq\fIprogram\fP\(aq) for applications using
+.IR sh ,
+because
+.IR awk
+programs commonly contain characters that are special to the shell,
+including double-quotes. In the cases where an
+.IR awk
+program contains single-quote characters, it is usually easiest to
+specify most of the program as strings within single-quotes
+concatenated by the shell with quoted single-quote characters. For
+example:
+.sp
+.RS 4
+.nf
+
+awk \(aq/\(aq\e\(aq\(aq/ { print "quote:", $0 }\(aq
+.fi
+.P
+.RE
+.P
+prints all lines from the standard input containing a single-quote
+character, prefixed with
+.IR quote :.
+.P
+The following are examples of simple
+.IR awk
+programs:
+.IP " 1." 4
+Write to the standard output all input lines for which field 3 is
+greater than 5:
+.RS 4
+.sp
+.RS 4
+.nf
+
+$3 > 5
+.fi
+.P
+.RE
+.RE
+.IP " 2." 4
+Write every tenth line:
+.RS 4
+.sp
+.RS 4
+.nf
+
+(NR % 10) == 0
+.fi
+.P
+.RE
+.RE
+.IP " 3." 4
+Write any line with a substring matching the regular expression:
+.RS 4
+.sp
+.RS 4
+.nf
+
+/(G|D)(2[0-9][[:alpha:]]*)/
+.fi
+.P
+.RE
+.RE
+.IP " 4." 4
+Print any line with a substring containing a
+.BR 'G'
+or
+.BR 'D' ,
+followed by a sequence of digits and characters. This example uses
+character classes
+.BR digit
+and
+.BR alpha
+to match language-independent digit and alphabetic characters
+respectively:
+.RS 4
+.sp
+.RS 4
+.nf
+
+/(G|D)([[:digit:][:alpha:]]*)/
+.fi
+.P
+.RE
+.RE
+.IP " 5." 4
+Write any line in which the second field matches the regular expression
+and the fourth field does not:
+.RS 4
+.sp
+.RS 4
+.nf
+
+$2 \(ti /xyz/ && $4 !\(ti /xyz/
+.fi
+.P
+.RE
+.RE
+.IP " 6." 4
+Write any line in which the second field contains a
+<backslash>:
+.RS 4
+.sp
+.RS 4
+.nf
+
+$2 \(ti /\e\e/
+.fi
+.P
+.RE
+.RE
+.IP " 7." 4
+Write any line in which the second field contains a
+<backslash>.
+Note that
+<backslash>-escapes
+are interpreted twice; once in lexical processing of the string and once
+in processing the regular expression:
+.RS 4
+.sp
+.RS 4
+.nf
+
+$2 \(ti "\e\e\e\e"
+.fi
+.P
+.RE
+.RE
+.IP " 8." 4
+Write the second-to-last and the last field in each line. Separate
+the fields by a
+<colon>:
+.RS 4
+.sp
+.RS 4
+.nf
+
+{OFS=":";print $(NF-1), $NF}
+.fi
+.P
+.RE
+.RE
+.IP " 9." 4
+Write the line number and number of fields in each line. The three
+strings representing the line number, the
+<colon>,
+and the number of fields are concatenated and that string is written to
+standard output:
+.RS 4
+.sp
+.RS 4
+.nf
+
+{print NR ":" NF}
+.fi
+.P
+.RE
+.RE
+.IP 10. 4
+Write lines longer than 72 characters:
+.RS 4
+.sp
+.RS 4
+.nf
+
+length($0) > 72
+.fi
+.P
+.RE
+.RE
+.IP 11. 4
+Write the first two fields in opposite order separated by
+.BR OFS :
+.RS 4
+.sp
+.RS 4
+.nf
+
+{ print $2, $1 }
+.fi
+.P
+.RE
+.RE
+.IP 12. 4
+Same, with input fields separated by a
+<comma>
+or
+<space>
+and
+<tab>
+characters, or both:
+.RS 4
+.sp
+.RS 4
+.nf
+
+BEGIN { FS = ",[ \et]*|[ \et]+" }
+ { print $2, $1 }
+.fi
+.P
+.RE
+.RE
+.IP 13. 4
+Add up the first column, then print the sum and the average:
+.RS 4
+.sp
+.RS 4
+.nf
+
+ {s += $1 }
+END {print "sum is ", s, " average is", s/NR}
+.fi
+.P
+.RE
+.RE
+.IP 14. 4
+Write fields in reverse order, one per line (many lines out for each
+line in):
+.RS 4
+.sp
+.RS 4
+.nf
+
+{ for (i = NF; i > 0; --i) print $i }
+.fi
+.P
+.RE
+.RE
+.IP 15. 4
+Write all lines between occurrences of the strings
+.BR start
+and
+.BR stop :
+.RS 4
+.sp
+.RS 4
+.nf
+
+/start/, /stop/
+.fi
+.P
+.RE
+.RE
+.IP 16. 4
+Write all lines whose first field is different from the previous one:
+.RS 4
+.sp
+.RS 4
+.nf
+
+$1 != prev { print; prev = $1 }
+.fi
+.P
+.RE
+.RE
+.IP 17. 4
+Simulate
+.IR echo :
+.RS 4
+.sp
+.RS 4
+.nf
+
+BEGIN {
+ for (i = 1; i < ARGC; ++i)
+ printf("%s%s", ARGV[i], i==ARGC-1?"\en":" ")
+}
+.fi
+.P
+.RE
+.RE
+.IP 18. 4
+Write the path prefixes contained in the
+.IR PATH
+environment variable, one per line:
+.RS 4
+.sp
+.RS 4
+.nf
+
+BEGIN {
+ n = split (ENVIRON["PATH"], path, ":")
+ for (i = 1; i <= n; ++i)
+ print path[i]
+}
+.fi
+.P
+.RE
+.RE
+.IP 19. 4
+If there is a file named
+.BR input
+containing page headers of the form:
+.RS 4
+.sp
+.RS 4
+.nf
+
+Page #
+.fi
+.P
+.RE
+.P
+and a file named
+.BR program
+that contains:
+.sp
+.RS 4
+.nf
+
+/Page/ { $2 = n++; }
+ { print }
+.fi
+.P
+.RE
+then the command line:
+.sp
+.RS 4
+.nf
+
+awk -f program n=5 input
+.fi
+.P
+.RE
+.P
+prints the file
+.BR input ,
+filling in page numbers starting at 5.
+.RE
+.SH RATIONALE
+This description is based on the new
+.IR awk ,
+``nawk'' (see the referenced \fIThe AWK Programming Language\fP), which introduced a number of new features to
+the historical
+.IR awk :
+.IP " 1." 4
+New keywords:
+.BR delete ,
+.BR do ,
+.BR function ,
+.BR return
+.IP " 2." 4
+New built-in functions:
+.BR atan2 ,
+.BR close ,
+.BR cos ,
+.BR gsub ,
+.BR match ,
+.BR rand ,
+.BR sin ,
+.BR srand ,
+.BR sub ,
+.BR system
+.IP " 3." 4
+New predefined variables:
+.BR FNR ,
+.BR ARGC ,
+.BR ARGV ,
+.BR RSTART ,
+.BR RLENGTH ,
+.BR SUBSEP
+.IP " 4." 4
+New expression operators:
+.BR ? ,
+.BR : ,
+.BR , ,
+.BR ^
+.IP " 5." 4
+The
+.BR FS
+variable and the third argument to
+.BR split
+are now treated as extended regular expressions.
+.IP " 6." 4
+The operator precedence was changed to more closely match the C language.
+Two examples of code that operate differently are:
+.RS 4
+.sp
+.RS 4
+.nf
+
+while ( n /= 10 > 1) ...
+if (!"wk" \(ti /bwk/) ...
+.fi
+.P
+.RE
+.RE
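+.P
+As a hedged illustration of item 6, the sketch below spells out, with
+explicit parentheses, how the precedence rules of this standard group
+the two expressions. The variable
+.BR n
+and its initial value are chosen only for the example:
+.sp
+.RS 4
+.nf
+
+BEGIN {
+    n = 100
+    n /= (10 > 1)                  # grouping of: n /= 10 > 1
+    print n                        # writes 100, since 10 > 1 yields 1
+    print ((!"wk") \(ti /bwk/)         # grouping of: !"wk" \(ti /bwk/; writes 0
+    print (!("wk" \(ti /bwk/))         # the other grouping; writes 1
+}
+.fi
+.P
+.RE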
+.P
+Several features have been added based on newer implementations of
+.IR awk :
+.IP " *" 4
+Multiple instances of
+.BR \-f
+.IR progfile
+are permitted.
+.IP " *" 4
+The new option
+.BR \-v
+.IR assignment .
+.IP " *" 4
+The new predefined variable
+.BR ENVIRON .
+.IP " *" 4
+New built-in functions
+.BR toupper
+and
+.BR tolower .
+.IP " *" 4
+More formatting capabilities are added to
+.BR printf
+to match the ISO\ C standard.
+.P
+Earlier versions of this standard required implementations to
+support multiple adjacent
+<semicolon>s,
+lines with one or more
+<semicolon>
+before a rule (a
+.IR pattern-action
+pair), and lines with only
+<semicolon>(s).
+These are not required by this standard and are considered poor
+programming practice, but can be accepted by an implementation of
+.IR awk
+as an extension.
+.P
+The overall
+.IR awk
+syntax has always been based on the C language, with a few features
+from the shell command language and other sources. Because of this, it
+is not completely compatible with any other language, which has caused
+confusion for some users. It is not the intent of the standard
+developers to address such issues. A few relatively minor changes
+toward making the language more compatible with the ISO\ C standard were
+made; most of these changes are based on similar changes in recent
+implementations, as described above. There remain several C-language
+conventions that are not in
+.IR awk .
+One of the notable ones is the
+<comma>
+operator, which is commonly used to specify multiple expressions in the
+C language
+.BR for
+statement. Also, there are various places where
+.IR awk
+is more restrictive than the C language regarding the type of
+expression that can be used in a given context. These limitations are
+due to the different features that the
+.IR awk
+language does provide.
+.P
+Regular expressions in
+.IR awk
+have been extended somewhat from historical implementations to make
+them a pure superset of extended regular expressions, as defined by
+POSIX.1\(hy2008 (see the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 9.4" ", " "Extended Regular Expressions").
+The main extensions are internationalization
+features and interval expressions. Historical implementations of
+.IR awk
+have long supported
+<backslash>-escape
+sequences as an extension to extended regular expressions, and
+this extension has been retained despite inconsistency with other
+utilities. The number of escape sequences recognized in both extended
+regular expressions and strings has varied (generally increasing with
+time) among implementations. The set specified by POSIX.1\(hy2008 includes most
+sequences known to be supported by popular implementations and by the
+ISO\ C standard. One sequence that is not supported is hexadecimal value escapes
+beginning with
+.BR '\ex' .
+This would allow values expressed in more than 9 bits to be used within
+.IR awk
+as in the ISO\ C standard. However, because this syntax has a non-deterministic
+length, it does not permit the subsequent character to be a hexadecimal
+digit. This limitation can be dealt with in the C language by the use
+of lexical string concatenation. In the
+.IR awk
+language, concatenation could also be a solution for strings, but not
+for extended regular expressions (either lexical ERE tokens or strings
+used dynamically as regular expressions). Because of this limitation,
+the feature has not been added to POSIX.1\(hy2008.
+.P
+When a string variable is used in a context where an extended regular
+expression normally appears (where the lexical token ERE is used in the
+grammar) the string does not contain the literal
+<slash>
+characters.
+.P
+Some versions of
+.IR awk
+allow the form:
+.sp
+.RS 4
+.nf
+
+func name(args, ... ) { statements }
+.fi
+.P
+.RE
+.P
+This has been deprecated by the authors of the language, who asked that
+it not be specified.
+.P
+Historical implementations of
+.IR awk
+produce an error if a
+.BR next
+statement is executed in a
+.BR BEGIN
+action, and cause
+.IR awk
+to terminate if a
+.BR next
+statement is executed in an
+.BR END
+action. This behavior has not been documented, and it was not believed
+that it was necessary to standardize it.
+.P
+The specification of conversions between string and numeric values is
+much more detailed than in the documentation of historical
+implementations or in the referenced \fIThe AWK Programming Language\fP. Although most of the behavior is
+designed to be intuitive, the details are necessary to ensure
+compatible behavior from different implementations. This is especially
+important in relational expressions since the types of the operands
+determine whether a string or numeric comparison is performed. From the
+perspective of an application developer, it is usually sufficient to
+expect intuitive behavior and to force conversions (by adding zero or
+concatenating a null string) when the type of an expression does not
+obviously match what is needed. The intent has been to specify
+historical practice in almost all cases. The one exception is that, in
+historical implementations, variables and constants maintain both
+string and numeric values after their original value is converted by
+any use. This means that referencing a variable or constant can have
+unexpected side-effects. For example, with historical implementations
+the following program:
+.sp
+.RS 4
+.nf
+
+{
+ a = "+2"
+ b = 2
+ if (NR % 2)
+ c = a + b
+ if (a == b)
+ print "numeric comparison"
+ else
+ print "string comparison"
+}
+.fi
+.P
+.RE
+.P
+would perform a numeric comparison (and output numeric comparison) for
+each odd-numbered line, but perform a string comparison (and output
+string comparison) for each even-numbered line. POSIX.1\(hy2008 ensures that
+comparisons will be numeric if necessary. With historical
+implementations, the following program:
+.sp
+.RS 4
+.nf
+
+BEGIN {
+ OFMT = "%e"
+ print 3.14
+ OFMT = "%f"
+ print 3.14
+}
+.fi
+.P
+.RE
+.P
+would output
+.BR \(dq3.140000e+00\(dq
+twice, because in the second
+.BR print
+statement the constant
+.BR \(dq3.14\(dq
+would have a string value from the previous conversion. POSIX.1\(hy2008 requires
+that the output of the second
+.BR print
+statement be
+.BR \(dq3.140000\(dq .
+The behavior of historical implementations was seen as too unintuitive
+and unpredictable.
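+.P
+A minimal sketch of the forced conversions mentioned above (adding
+zero or concatenating a null string), using the same values as the
+first program. The explicit conversions make the kind of comparison
+independent of how the operands were used earlier:
+.sp
+.RS 4
+.nf
+
+BEGIN {
+    a = "+2"
+    b = 2
+    # Adding zero forces a numeric comparison; writes equal.
+    print ((a + 0 == b + 0) ? "equal" : "different")
+    # Concatenating a null string forces a string comparison;
+    # writes different.
+    print ((a "" == b "") ? "equal" : "different")
+}
+.fi
+.P
+.RE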
+.P
+It was pointed out that with the rules contained in early drafts, the
+following script would print nothing:
+.sp
+.RS 4
+.nf
+
+BEGIN {
+ y[1.5] = 1
+ OFMT = "%e"
+ print y[1.5]
+}
+.fi
+.P
+.RE
+.P
+Therefore, a new variable,
+.BR CONVFMT ,
+was introduced. The
+.BR OFMT
+variable is now restricted to affecting output conversions of numbers
+to strings and
+.BR CONVFMT
+is used for internal conversions, such as comparisons or array
+indexing. The default value is the same as that for
+.BR OFMT ,
+so unless a program changes
+.BR CONVFMT
+(which no historical program would do), it will receive the historical
+behavior associated with internal string conversions.
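+.P
+As a sketch of the resulting behavior on a conforming implementation,
+the earlier script, extended with one more
+.BR print
+statement, shows the two variables acting separately:
+.sp
+.RS 4
+.nf
+
+BEGIN {
+    y[1.5] = 1
+    OFMT = "%e"
+    print y[1.5]    # subscript converted with CONVFMT; writes 1
+    print 1.5       # output converted with OFMT; writes 1.500000e+00
+}
+.fi
+.P
+.RE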
+.P
+The POSIX
+.IR awk
+lexical and syntactic conventions are specified more formally than in
+other sources. Again the intent has been to specify historical
+practice. One convention that may not be obvious from the formal
+grammar, as in other verbal descriptions, is where
+<newline>
+characters are acceptable. There are several obvious placements such as
+terminating a statement, and a
+<backslash>
+can be used to escape
+<newline>
+characters between any lexical tokens. In addition,
+<newline>
+characters without
+<backslash>
+characters can follow a comma, an open brace, a logical AND operator (\c
+.BR \(dq&&\(dq ),
+a logical OR operator (\c
+.BR \(dq||\(dq ),
+the
+.BR do
+keyword, the
+.BR else
+keyword, and the closing parenthesis of an
+.BR if ,
+.BR for ,
+or
+.BR while
+statement. For example:
+.sp
+.RS 4
+.nf
+
+{ print $1,
+ $2 }
+.fi
+.P
+.RE
+.P
+The requirement that
+.IR awk
+add a trailing
+<newline>
+to the program argument text is to simplify the grammar, making it
+match a text file in form. There is no way for an application or test
+suite to determine whether a literal
+<newline>
+is added or whether
+.IR awk
+simply acts as if it did.
+.P
+POSIX.1\(hy2008 requires several changes from historical implementations in order
+to support internationalization. Probably the most subtle of these is
+the use of the decimal-point character, defined by the
+.IR LC_NUMERIC
+category of the locale, in representations of floating-point numbers.
+This locale-specific character is used in recognizing numeric input, in
+converting between strings and numeric values, and in formatting
+output. However, regardless of locale, the
+<period>
+character (the decimal-point character of the POSIX locale) is the
+decimal-point character recognized in processing
+.IR awk
+programs (including assignments in command line arguments). This is
+essentially the same convention as the one used in the ISO\ C standard. The
+difference is that the C language includes the
+\fIsetlocale\fR()
+function, which permits an application to modify its locale. Because of
+this capability, a C application begins executing with its locale set
+to the C locale, and only executes in the environment-specified locale
+after an explicit call to
+\fIsetlocale\fR().
+However, adding such an elaborate new feature to the
+.IR awk
+language was seen as inappropriate for POSIX.1\(hy2008. It is possible to execute
+an
+.IR awk
+program explicitly in any desired locale by setting the environment in
+the shell.
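+.P
+For example, assuming a locale named
+.BR de_DE.UTF-8
+is installed (the name is only illustrative and varies between systems),
+a program can be run under that locale from the shell. The constant
+.BR 3.14
+in the program text is still written with a
+<period>,
+whereas the formatted output may use the decimal-point character of the
+locale:
+.sp
+.RS 4
+.nf
+
+LC_ALL=de_DE.UTF-8 awk 'BEGIN { printf "%.2f\en", 3.14 }'
+.fi
+.P
+.RE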
+.P
+The undefined behavior resulting from NULs in extended regular
+expressions allows future extensions for the GNU
+.IR gawk
+program to process binary data.
+.P
+The behavior in the case of invalid
+.IR awk
+programs (including lexical, syntactic, and semantic errors) is
+undefined because it was considered overly limiting on implementations
+to specify. In most cases such errors can be expected to produce a
+diagnostic and a non-zero exit status. However, some implementations
+may choose to extend the language in ways that make use of certain
+invalid constructs. Other invalid constructs might be deemed worthy of
+a warning, but otherwise cause some reasonable behavior. Still other
+constructs may be very difficult to detect in some implementations.
+Also, different implementations might detect a given error during an
+initial parsing of the program (before reading any input files) while
+others might detect it when executing the program after reading some
+input. Implementors should be aware that diagnosing errors as early as
+possible and producing useful diagnostics can ease debugging of
+applications, and thus make an implementation more usable.
+.P
+The unspecified behavior from using multi-character
+.BR RS
+values is to allow possible future extensions based on extended regular
+expressions used for record separators. Historical implementations take
+the first character of the string and ignore the others.
+.P
+Unspecified behavior when
+.IR split (\c
+.IR string ,\c
+.IR array ,\c
+<null>)
+is used is to allow a proposed future extension that would split up a
+string into an array of individual characters.
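+.P
+Until such an extension is standardized, a portable program can split a
+string into characters without relying on a null separator. The following
+is a minimal sketch; the variable names
+.BR s ,
+.BR chars ,
+and
+.BR n
+are illustrative only:
+.sp
+.RS 4
+.nf
+
+BEGIN {
+    s = "abc"
+    n = length(s)
+    for (i = 1; i <= n; i++)
+        chars[i] = substr(s, i, 1)    # one character per array element
+    print n, chars[1], chars[n]
+}
+.fi
+.P
+.RE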
+.P
+In the context of the
+.BR getline
+function, equally good arguments for different precedences of the
+.BR |
+and
+.BR <
+operators can be made. Historical practice has been that:
+.sp
+.RS 4
+.nf
+
+getline < "a" "b"
+.fi
+.P
+.RE
+.P
+is parsed as:
+.sp
+.RS 4
+.nf
+
+( getline < "a" ) "b"
+.fi
+.P
+.RE
+.P
+although many would argue that the intent was that the file
+.BR ab
+should be read. However:
+.sp
+.RS 4
+.nf
+
+getline < "x" + 1
+.fi
+.P
+.RE
+.P
+parses as:
+.sp
+.RS 4
+.nf
+
+getline < ( "x" + 1 )
+.fi
+.P
+.RE
+.P
+Similar problems occur with the
+.BR |
+version of
+.BR getline ,
+particularly in combination with
+.BR $ .
+For example:
+.sp
+.RS 4
+.nf
+
+$"echo hi" | getline
+.fi
+.P
+.RE
+.P
+(This situation is particularly problematic when used in a
+.BR print
+statement, where the
+.BR |getline
+part might be a redirection of the
+.BR print .)
+.P
+Since in most cases such constructs are not used, or at least should not
+be used (because they have a natural ambiguity for which there is no
+conventional parsing), the meaning of these constructs has been made
+explicitly unspecified. (The effect is that a conforming application that
+runs into the problem must parenthesize to resolve the ambiguity.)
+There appeared to be few if any actual uses of such constructs.
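+.P
+As an illustrative sketch (the file name
+.BR ab
+and the command
+.BR "echo hi"
+are placeholders), an application can avoid these constructs by building the
+operand first or by parenthesizing it:
+.sp
+.RS 4
+.nf
+
+BEGIN {
+    file = "a" "b"                      # concatenate before redirecting
+    while ((getline line < file) > 0)   # parenthesize before comparing
+        print line
+    close(file)
+
+    ("echo " "hi") | getline reply      # parenthesized command string
+    close("echo hi")
+    print reply
+}
+.fi
+.P
+.RE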
+.P
+Grammars can be written that would cause an error under these
+circumstances. Where backwards-compatibility is not a large
+consideration, implementors may wish to use such grammars.
+.P
+Some historical implementations have allowed some built-in functions to
+be called without an argument list, the result being a default argument
+list chosen in some ``reasonable'' way. Use of
+.BR length
+as a synonym for
+.BR "length($0)"
+is the only one of these forms that is thought to be widely known or
+widely used; this particular form is documented in various places (for
+example, most historical
+.IR awk
+reference pages, although not in the referenced \fIThe AWK Programming Language\fP) as legitimate practice.
+With this exception, default argument lists have always been
+undocumented and vaguely defined, and it is not at all clear how (or
+if) they should be generalized to user-defined functions. They add no
+useful functionality and preclude possible future extensions that might
+need to name functions without calling them. Not standardizing them
+seems the simplest course. The standard developers considered that
+.BR length
+merited special treatment, however, since it has been documented in the
+past and sees possibly substantial use in historical programs.
+Accordingly, this usage has been made legitimate. Issue\ 5
+removed the obsolescent marking for XSI-conforming implementations,
+and many otherwise conforming applications depend on this feature.
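+.P
+For instance, either of the following rules, used alone, prints each input
+record longer than 72 characters (the limit of 72 is arbitrary and chosen
+only for illustration); the two forms are equivalent:
+.sp
+.RS 4
+.nf
+
+length > 72
+length($0) > 72
+.fi
+.P
+.RE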
+.P
+In
+.BR sub
+and
+.BR gsub ,
+if
+.IR repl
+is a string literal (the lexical token
+.BR STRING ),
+then two consecutive
+<backslash>
+characters should be used in the string to ensure a single
+<backslash>
+will precede the
+<ampersand>
+when the resultant string is passed to the function. (For example,
+to specify one literal
+<ampersand>
+in the replacement string, use
+.BR gsub (\c
+.BR ERE ,
+.BR \(dq\e\e&\(dq ).)
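+.P
+A short sketch of the difference between an escaped and an unescaped
+<ampersand>
+(the variable
+.BR s
+and its contents are illustrative only):
+.sp
+.RS 4
+.nf
+
+BEGIN {
+    s = "a+b"
+    gsub(/b/, "[&]", s)         # unescaped &: the matched text is substituted
+    print s                     # a+[b]
+    gsub(/\e+/, " \e\e& ", s)     # the function receives " \e& ": a literal &
+    print s                     # a & [b]
+}
+.fi
+.P
+.RE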
+.P
+Historically, the only special character in the
+.IR repl
+argument of
+.BR sub
+and
+.BR gsub
+string functions was the
+<ampersand>
+(\c
+.BR '&' )
+character and preceding it with the
+<backslash>
+character was used to turn off its special meaning.
+.P
+The description in the ISO\ POSIX\(hy2:\|1993 standard introduced behavior such that the
+<backslash>
+character was another special character and it was unspecified whether
+there were any other special characters. This description introduced
+several portability problems, some of which are described below, and so
+it has been replaced with the more historical description. Some of the
+problems include:
+.IP " *" 4
+Historically, to create a replacement string consisting of one literal
+<ampersand>,
+a script could use
+.BR gsub (\c
+.BR ERE ,
+.BR \(dq\e\e&\(dq ),
+but with the ISO\ POSIX\(hy2:\|1993 standard wording, it was necessary to use
+.BR gsub (\c
+.BR ERE ,
+.BR \(dq\e\e\e\e&\(dq ).
+The
+<backslash>
+characters are doubled here because all string literals are subject to
+lexical analysis, which would reduce each pair of
+<backslash>
+characters to a single
+<backslash>
+before being passed to
+.BR gsub .
+.IP " *" 4
+Since it was unspecified what the special characters were, for portable
+scripts to guarantee that characters are printed literally, each
+character had to be preceded with a
+<backslash>.
+(For example, a portable script had to use
+.BR gsub (\c
+.BR ERE ,
+.BR \(dq\e\eh\e\ei\(dq )
+to produce a replacement string of
+.BR \(dqhi\(dq .)
+.P
+The description for comparisons in the ISO\ POSIX\(hy2:\|1993 standard did not properly describe
+historical practice because of the way numeric strings are compared as
+numbers. The current rules cause the following code:
+.sp
+.RS 4
+.nf
+
+if (0 == "000")
+ print "strange, but true"
+else
+ print "not true"
+.fi
+.P
+.RE
+.P
+to do a numeric comparison, causing the
+.BR if
+to succeed. It should be intuitively obvious that this is incorrect
+behavior, and indeed, no historical implementation of
+.IR awk
+actually behaves this way.
+.P
+To fix this problem, the definition of
+.IR "numeric string"
+was enhanced to include only those values obtained from specific
+circumstances (mostly external sources) where it is not possible to
+determine unambiguously whether the value is intended to be a string or
+a numeric.
+.P
+Variables that are assigned a numeric string shall also be treated
+as a numeric string. (That is, the notion of a numeric string can
+be propagated across assignments.) In comparisons, all variables having
+the uninitialized value are to be treated as a numeric operand
+evaluating to the numeric value zero.
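+.P
+The following sketch, run on an input line containing only
+.BR 000 ,
+illustrates these rules (the variable names are illustrative only):
+.sp
+.RS 4
+.nf
+
+{
+    x = $1                         # numeric string status follows the assignment
+    from_field = (x == 0)          # compared as numbers
+    from_const = ("000" == 0)      # string constant: compared as strings
+    from_unset = (nosuch == 0)     # uninitialized value compares as zero
+    print from_field, from_const, from_unset
+}
+.fi
+.P
+.RE
+.P
+On an implementation that follows these rules, the output is expected to be
+.BR "1 0 1" .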
+.P
+Uninitialized variables include all types of variables including
+scalars, array elements, and fields. The definition of an uninitialized
+value in
+.IR "Variables and Special Variables"
+is necessary to describe the value placed on uninitialized variables
+and on fields that are valid (for example,
+.BR <
+.BR $NF )
+but have no characters in them and to describe how these variables are
+to be used in comparisons. A valid field, such as
+.BR $1 ,
+that has no characters in it can be obtained from an input line of
+.BR \(dq\et\et\(dq
+when
+.BR FS= \c
+.BR '\et' .
+Historically, the comparison (\c
+.BR $1<10 )
+was done numerically after evaluating
+.BR $1
+to the value zero.
+.P
+The phrase ``.\|.\|. also shall have the numeric value of the numeric
+string'' was removed from several sections of the ISO\ POSIX\(hy2:\|1993 standard because it
+specifies an unnecessary implementation detail. It is not necessary for
+POSIX.1\(hy2008 to specify that these objects be assigned two different values.
+It is only necessary to specify that these objects may evaluate to two
+different values depending on context.
+.P
+Historical implementations of
+.IR awk
+did not parse hexadecimal integer or floating constants like
+.BR \(dq0xa\(dq
+and
+.BR \(dq0xap0\(dq .
+Due to an oversight, the 2001 through 2004 editions of this standard
+required support for hexadecimal floating constants. This was due to
+the reference to
+\fIatof\fR().
+This version of the standard allows but does not require implementations
+to use
+\fIatof\fR()
+and includes a description of how floating-point numbers are recognized
+as an alternative to match historic behavior. The intent of this change
+is to allow implementations to recognize floating-point constants
+according to either the ISO/IEC\ 9899:\|1990 standard or ISO/IEC\ 9899:\|1999 standard, and to allow (but not require)
+implementations to recognize hexadecimal integer constants.
+.P
+Historical implementations of
+.IR awk
+did not support floating-point infinities and NaNs in
+.IR "numeric strings" ;
+e.g.,
+.BR \(dq-INF\(dq
+and
+.BR \(dqNaN\(dq .
+However, implementations that use the
+\fIatof\fR()
+or
+\fIstrtod\fR()
+functions to do the conversion picked up support for these values if they
+used an ISO/IEC\ 9899:\|1999 standard version of the function instead of an ISO/IEC\ 9899:\|1990 standard version. Due to
+an oversight, the 2001 through 2004 editions of this standard did not
+allow support for infinities and NaNs, but in this revision support is
+allowed (but not required). This is a silent change to the behavior of
+.IR awk
+programs; for example, in the POSIX locale the expression:
+.sp
+.RS 4
+.nf
+
+("-INF" + 0 < 0)
+.fi
+.P
+.RE
+.P
+formerly had the value 0 because
+.BR \(dq-INF\(dq
+converted to 0, but now it may have the value 0 or 1.
+.SH "FUTURE DIRECTIONS"
+A future version of this standard may require the
+.BR \(dq!=\(dq
+and
+.BR \(dq==\(dq
+operators to perform string comparisons by checking if the strings are
+identical (and not by checking if they collate equally).
+.SH "SEE ALSO"
+.IR "Section 1.3" ", " "Grammar Conventions",
+.IR "\fIgrep\fR\^",
+.IR "\fIlex\fR\^",
+.IR "\fIsed\fR\^"
+.P
+The Base Definitions volume of POSIX.1\(hy2017,
+.IR "Chapter 5" ", " "File Format Notation",
+.IR "Section 6.1" ", " "Portable Character Set",
+.IR "Chapter 8" ", " "Environment Variables",
+.IR "Chapter 9" ", " "Regular Expressions",
+.IR "Section 12.2" ", " "Utility Syntax Guidelines"
+.P
+The System Interfaces volume of POSIX.1\(hy2017,
+.IR "\fIatof\fR\^(\|)",
+.IR "\fIexec\fR\^",
+.IR "\fIisspace\fR\^(\|)",
+.IR "\fIpopen\fR\^(\|)",
+.IR "\fIsetlocale\fR\^(\|)",
+.IR "\fIstrtod\fR\^(\|)"
+.\"
+.SH COPYRIGHT
+Portions of this text are reprinted and reproduced in electronic form
+from IEEE Std 1003.1-2017, Standard for Information Technology
+-- Portable Operating System Interface (POSIX), The Open Group Base
+Specifications Issue 7, 2018 Edition,
+Copyright (C) 2018 by the Institute of
+Electrical and Electronics Engineers, Inc and The Open Group.
+In the event of any discrepancy between this version and the original IEEE and
+The Open Group Standard, the original IEEE and The Open Group Standard
+is the referee document. The original Standard can be obtained online at
+http://www.opengroup.org/unix/online.html .
+.PP
+Any typographical or formatting errors that appear
+in this page are most likely
+to have been introduced during the conversion of the source files to
+man page format. To report such errors, see
+https://www.kernel.org/doc/man-pages/reporting_bugs.html .