summaryrefslogtreecommitdiffstats
path: root/man1p/sort.1p
diff options
context:
space:
mode:
Diffstat (limited to 'man1p/sort.1p')
-rw-r--r--man1p/sort.1p512
1 files changed, 512 insertions, 0 deletions
diff --git a/man1p/sort.1p b/man1p/sort.1p
new file mode 100644
index 000000000..3714cceaf
--- /dev/null
+++ b/man1p/sort.1p
@@ -0,0 +1,512 @@
+.\" Copyright (c) 2001-2003 The Open Group, All Rights Reserved
+.TH "SORT" P 2003 "IEEE/The Open Group" "POSIX Programmer's Manual"
+.\" sort
+.SH NAME
+sort \- sort, merge, or sequence check text files
+.SH SYNOPSIS
+.LP
+\fBsort\fP \fB[\fP\fB-m\fP\fB][\fP\fB-o\fP \fIoutput\fP\fB][\fP\fB-bdfinru\fP\fB][\fP\fB-t\fP
+\fIchar\fP\fB][\fP\fB-k\fP \fIkeydef\fP\fB]\fP\fB...\fP \fB[\fP\fIfile\fP\fB...\fP\fB]\fP\fB
+.br
+.sp
+sort -c\fP \fB[\fP\fB-bdfinru\fP\fB][\fP\fB-t\fP \fIchar\fP\fB][\fP\fB-k\fP
+\fIkeydef\fP\fB][\fP\fIfile\fP\fB]\fP\fB
+.br
+\fP
+.SH DESCRIPTION
+.LP
+The \fIsort\fP utility shall perform one of the following functions:
+.IP " 1." 4
+Sort lines of all the named files together and write the result to
+the specified output.
+.LP
+.IP " 2." 4
+Merge lines of all the named (presorted) files together and write
+the result to the specified output.
+.LP
+.IP " 3." 4
+Check that a single input file is correctly presorted.
+.LP
+.LP
+Comparisons shall be based on one or more sort keys extracted from
+each line of input (or, if no sort keys are specified, the
+entire line up to, but not including, the terminating <newline>),
+and shall be performed using the collating sequence of the
+current locale.
+.SH OPTIONS
+.LP
+The \fIsort\fP utility shall conform to the Base Definitions volume
+of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility Syntax Guidelines,
+and the \fB-k\fP \fIkeydef\fP option should
+follow the \fB-b\fP, \fB-d\fP, \fB-f\fP, \fB-i\fP, \fB-n\fP, and \fB-r\fP
+options.
+.LP
+The following options shall be supported:
+.TP 7
+\fB-c\fP
+Check that the single input file is ordered as specified by the arguments
+and the collating sequence of the current locale. No
+output shall be produced; only the exit code shall be affected.
+.TP 7
+\fB-m\fP
+Merge only; the input file shall be assumed to be already sorted.
+.TP 7
+\fB-o\ \fP \fIoutput\fP
+Specify the name of an output file to be used instead of the standard
+output. This file can be the same as one of the input
+\fIfile\fPs.
+.TP 7
+\fB-u\fP
+Unique: suppress all but one in each set of lines having equal keys.
+If used with the \fB-c\fP option, check that there are no
+lines with duplicate keys, in addition to checking that the input
+file is sorted.
+.sp
+.LP
+The following options shall override the default ordering rules. When
+ordering options appear independent of any key field
+specifications, the requested field ordering rules shall be applied
+globally to all sort keys. When attached to a specific key (see
+\fB-k\fP), the specified ordering options shall override all global
+ordering options for that key.
+.TP 7
+\fB-d\fP
+Specify that only <blank>s and alphanumeric characters, according
+to the current setting of \fILC_CTYPE ,\fP shall be
+significant in comparisons. The behavior is undefined for a sort key
+to which \fB-i\fP or \fB-n\fP also applies.
+.TP 7
+\fB-f\fP
+Consider all lowercase characters that have uppercase equivalents,
+according to the current setting of \fILC_CTYPE ,\fP to be
+the uppercase equivalent for the purposes of comparison.
+.TP 7
+\fB-i\fP
+Ignore all characters that are non-printable, according to the current
+setting of \fILC_CTYPE .\fP
+.TP 7
+\fB-n\fP
+Restrict the sort key to an initial numeric string, consisting of
+optional <blank>s, optional minus sign, and zero or
+more digits with an optional radix character and thousands separators
+(as defined in the current locale), which shall be sorted by
+arithmetic value. An empty digit string shall be treated as zero.
+Leading zeros and signs on zeros shall not affect ordering.
+.TP 7
+\fB-r\fP
+Reverse the sense of comparisons.
+.sp
+.LP
+The treatment of field separators can be altered using the options:
+.TP 7
+\fB-b\fP
+Ignore leading <blank>s when determining the starting and ending positions
+of a restricted sort key. If the \fB-b\fP
+option is specified before the first \fB-k\fP option, it shall be
+applied to all \fB-k\fP options. Otherwise, the \fB-b\fP
+option can be attached independently to each \fB-k\fP \fIfield_start\fP
+or \fIfield_end\fP option-argument (see below).
+.TP 7
+\fB-t\ \fP \fIchar\fP
+Use \fIchar\fP as the field separator character; \fIchar\fP shall
+not be considered to be part of a field (although it can be
+included in a sort key). Each occurrence of \fIchar\fP shall be significant
+(for example, <\fIchar\fP><\fIchar\fP>
+delimits an empty field). If \fB-t\fP is not specified, <blank>s shall
+be used as default field separators; each maximal
+non-empty sequence of <blank>s that follows a non- <blank> shall be
+a field separator.
+.sp
+.LP
+Sort keys can be specified using the options:
+.TP 7
+\fB-k\ \fP \fIkeydef\fP
+The \fIkeydef\fP argument is a restricted sort key field definition.
+The format of this definition is:
+.sp
+.RS
+.nf
+
+\fIfield_start\fP\fB[\fP\fItype\fP\fB][\fP\fB,\fP\fIfield_end\fP\fB[\fP\fItype\fP\fB]]\fP
+.fi
+.RE
+.LP
+where \fIfield_start\fP and \fIfield_end\fP define a key field restricted
+to a portion of the line (see the EXTENDED
+DESCRIPTION section), and \fItype\fP is a modifier from the list of
+characters \fB'b'\fP , \fB'd'\fP , \fB'f'\fP ,
+\fB'i'\fP , \fB'n'\fP , \fB'r'\fP . The \fB'b'\fP modifier shall behave
+like the \fB-b\fP option, but shall apply only
+to the \fIfield_start\fP or \fIfield_end\fP to which it is attached.
+The other modifiers shall behave like the corresponding
+options, but shall apply only to the key field to which they are attached;
+they shall have this effect if specified with
+\fIfield_start\fP, \fIfield_end\fP, or both. If any modifier is attached
+to a \fIfield_start\fP or to a \fIfield_end\fP, no
+option shall apply to either. Implementations shall support at least
+nine occurrences of the \fB-k\fP option, which shall be
+significant in command line order. If no \fB-k\fP option is specified,
+a default sort key of the entire line shall be used.
+.LP
+When there are multiple key fields, later keys shall be compared only
+after all earlier keys compare equal. Except when the
+\fB-u\fP option is specified, lines that otherwise compare equal shall
+be ordered as if none of the options \fB-d\fP, \fB-f\fP,
+\fB-i\fP, \fB-n\fP, or \fB-k\fP were present (but with \fB-r\fP still
+in effect, if it was specified) and with all bytes in the
+lines significant to the comparison. The order in which lines that
+still compare equal are written is unspecified.
+.sp
+.SH OPERANDS
+.LP
+The following operand shall be supported:
+.TP 7
+\fIfile\fP
+A pathname of a file to be sorted, merged, or checked. If no \fIfile\fP
+operands are specified, or if a \fIfile\fP operand is
+\fB'-'\fP , the standard input shall be used.
+.sp
+.SH STDIN
+.LP
+The standard input shall be used only if no \fIfile\fP operands are
+specified, or if a \fIfile\fP operand is \fB'-'\fP .
+See the INPUT FILES section.
+.SH INPUT FILES
+.LP
+The input files shall be text files, except that the \fIsort\fP utility
+shall add a <newline> to the end of a file ending
+with an incomplete last line.
+.SH ENVIRONMENT VARIABLES
+.LP
+The following environment variables shall affect the execution of
+\fIsort\fP:
+.TP 7
+\fILANG\fP
+Provide a default value for the internationalization variables that
+are unset or null. (See the Base Definitions volume of
+IEEE\ Std\ 1003.1-2001, Section 8.2, Internationalization Variables
+for
+the precedence of internationalization variables used to determine
+the values of locale categories.)
+.TP 7
+\fILC_ALL\fP
+If set to a non-empty string value, override the values of all the
+other internationalization variables.
+.TP 7
+\fILC_COLLATE\fP
+.sp
+Determine the locale for ordering rules.
+.TP 7
+\fILC_CTYPE\fP
+Determine the locale for the interpretation of sequences of bytes
+of text data as characters (for example, single-byte as
+opposed to multi-byte characters in arguments and input files) and
+the behavior of character classification for the \fB-b\fP,
+\fB-d\fP, \fB-f\fP, \fB-i\fP, and \fB-n\fP options.
+.TP 7
+\fILC_MESSAGES\fP
+Determine the locale that should be used to affect the format and
+contents of diagnostic messages written to standard
+error.
+.TP 7
+\fILC_NUMERIC\fP
+.sp
+Determine the locale for the definition of the radix character and
+thousands separator for the \fB-n\fP option.
+.TP 7
+\fINLSPATH\fP
+Determine the location of message catalogs for the processing of \fILC_MESSAGES
+\&.\fP
+.sp
+.SH ASYNCHRONOUS EVENTS
+.LP
+Default.
+.SH STDOUT
+.LP
+Unless the \fB-o\fP or \fB-c\fP options are in effect, the standard
+output shall contain the sorted input.
+.SH STDERR
+.LP
+The standard error shall be used for diagnostic messages. A warning
+message about correcting an incomplete last line of an input
+file may be generated, but need not affect the final exit status.
+.SH OUTPUT FILES
+.LP
+If the \fB-o\fP option is in effect, the sorted input shall be written
+to the file \fIoutput\fP.
+.SH EXTENDED DESCRIPTION
+.LP
+The notation:
+.sp
+.RS
+.nf
+
+\fB-k\fP \fIfield_start\fP\fB[\fP\fItype\fP\fB][\fP\fB,\fP\fIfield_end\fP\fB[\fP\fItype\fP\fB]]\fP
+.fi
+.RE
+.LP
+shall define a key field that begins at \fIfield_start\fP and ends
+at \fIfield_end\fP inclusive, unless \fIfield_start\fP
+falls beyond the end of the line or after \fIfield_end\fP, in which
+case the key field is empty. A missing \fIfield_end\fP shall
+mean the last character of the line.
+.LP
+A field comprises a maximal sequence of non-separating characters
+and, in the absence of option \fB-t\fP, any preceding field
+separator.
+.LP
+The \fIfield_start\fP portion of the \fIkeydef\fP option-argument
+shall have the form:
+.sp
+.RS
+.nf
+
+\fIfield_number\fP\fB[\fP\fB.\fP\fIfirst_character\fP\fB]\fP
+.fi
+.RE
+.LP
+Fields and characters within fields shall be numbered starting with
+1. The \fIfield_number\fP and \fIfirst_character\fP
+pieces, interpreted as positive decimal integers, shall specify the
+first character to be used as part of a sort key. If
+\fI\&.first_character\fP is omitted, it shall refer to the first character
+of the field.
+.LP
+The \fIfield_end\fP portion of the \fIkeydef\fP option-argument shall
+have the form:
+.sp
+.RS
+.nf
+
+\fIfield_number\fP\fB[\fP\fB.\fP\fIlast_character\fP\fB]\fP
+.fi
+.RE
+.LP
+The \fIfield_number\fP shall be as described above for \fIfield_start.\fP
+The \fIlast_character\fP piece, interpreted as a
+non-negative decimal integer, shall specify the last character to
+be used as part of the sort key. If \fIlast_character\fP
+evaluates to zero or \fI.last_character\fP is omitted, it shall refer
+to the last character of the field specified by
+\fIfield_number\fP.
+.LP
+If the \fB-b\fP option or \fBb\fP type modifier is in effect, characters
+within a field shall be counted from the first non-
+<blank> in the field. (This shall apply separately to \fIfirst_character\fP
+and \fIlast_character\fP.)
+.SH EXIT STATUS
+.LP
+The following exit values shall be returned:
+.TP 7
+\ 0
+All input files were output successfully, or \fB-c\fP was specified
+and the input file was correctly sorted.
+.TP 7
+\ 1
+Under the \fB-c\fP option, the file was not ordered as specified,
+or if the \fB-c\fP and \fB-u\fP options were both
+specified, two input lines were found with equal keys.
+.TP 7
+>1
+An error occurred.
+.sp
+.SH CONSEQUENCES OF ERRORS
+.LP
+Default.
+.LP
+\fIThe following sections are informative.\fP
+.SH APPLICATION USAGE
+.LP
+The default value for \fB-t\fP, <blank>, has different properties
+from, for example, \fB-t\fP "<space>". If a line
+contains:
+.sp
+.RS
+.nf
+
+\fB<space><space>foo
+\fP
+.fi
+.RE
+.LP
+the following treatment would occur with default separation as opposed
+to specifically selecting a <space>:
+.TS C
+center; l l l.
+\fBField\fP \fBDefault\fP \fB-t "<space>"\fP
+1 <space><space>foo \fIempty\fP
+2 \fIempty\fP \fIempty\fP
+3 \fIempty\fP foo
+.TE
+.LP
+The leading field separator itself is included in a field when \fB-t\fP
+is not used. For example, this command returns an exit
+status of zero, meaning the input was already sorted:
+.sp
+.RS
+.nf
+
+\fBsort -c -k 2 <<eof
+y<tab>b
+x<space>a
+eof
+\fP
+.fi
+.RE
+.LP
+(assuming that a <tab> precedes the <space> in the current collating
+sequence). The field separator is not included
+in a field when it is explicitly set via \fB-t\fP. This is historical
+practice and allows usage such as:
+.sp
+.RS
+.nf
+
+\fBsort -t "|" -k 2n <<eof
+Atlanta|425022|Georgia
+Birmingham|284413|Alabama
+Columbia|100385|South Carolina
+eof
+\fP
+.fi
+.RE
+.LP
+where the second field can be correctly sorted numerically without
+regard to the non-numeric field separator.
+.LP
+The wording in the OPTIONS section clarifies that the \fB-b\fP, \fB-d\fP,
+\fB-f\fP, \fB-i\fP, \fB-n\fP, and \fB-r\fP
+options have to come before the first sort key specified if they are
+intended to apply to all specified keys. The way it is
+described in this volume of IEEE\ Std\ 1003.1-2001 matches historical
+practice, not historical documentation. The results
+are unspecified if these options are specified after a \fB-k\fP option.
+.LP
+The \fB-f\fP option might not work as expected in locales where there
+is not a one-to-one mapping between an uppercase and a
+lowercase letter.
+.SH EXAMPLES
+.IP " 1." 4
+The following command sorts the contents of \fBinfile\fP with the
+second field as the sort key:
+.sp
+.RS
+.nf
+
+\fBsort -k 2,2 infile
+\fP
+.fi
+.RE
+.LP
+.IP " 2." 4
+The following command sorts, in reverse order, the contents of \fBinfile1\fP
+and \fBinfile2\fP, placing the output in
+\fBoutfile\fP and using the second character of the second field as
+the sort key (assuming that the first character of the second
+field is the field separator):
+.sp
+.RS
+.nf
+
+\fBsort -r -o outfile -k 2.2,2.2 infile1 infile2
+\fP
+.fi
+.RE
+.LP
+.IP " 3." 4
+The following command sorts the contents of \fBinfile1\fP and \fBinfile2\fP
+using the second non- <blank> of the second
+field as the sort key:
+.sp
+.RS
+.nf
+
+\fBsort -k 2.2b,2.2b infile1 infile2
+\fP
+.fi
+.RE
+.LP
+.IP " 4." 4
+The following command prints the System\ V password file (user database)
+sorted by the numeric user ID (the third
+colon-separated field):
+.sp
+.RS
+.nf
+
+\fBsort -t : -k 3,3n /etc/passwd
+\fP
+.fi
+.RE
+.LP
+.IP " 5." 4
+The following command prints the lines of the already sorted file
+\fBinfile\fP, suppressing all but one occurrence of lines
+having the same third field:
+.sp
+.RS
+.nf
+
+\fBsort -um -k 3.1,3.0 infile
+\fP
+.fi
+.RE
+.LP
+.SH RATIONALE
+.LP
+Examples in some historical documentation state that options \fB-um\fP
+with one input file keep the first in each set of lines
+with equal keys. This behavior was deemed to be an implementation
+artifact and was not standardized.
+.LP
+The \fB-z\fP option was omitted; it is not standard practice on most
+systems and is inconsistent with using \fIsort\fP to sort
+several files individually and then merge them together. The text
+concerning \fB-z\fP in historical documentation appeared to
+require implementations to determine the proper buffer length during
+the sort phase of operation, but not during the merge.
+.LP
+The \fB-y\fP option was omitted because of non-portability. The \fB-M\fP
+option, present in System V, was omitted because of
+non-portability in international usage.
+.LP
+An undocumented \fB-T\fP option exists in some implementations. It
+is used to specify a directory for intermediate files.
+Implementations are encouraged to support the use of the \fITMPDIR\fP
+environment variable instead of adding an option to support
+this functionality.
+.LP
+The \fB-k\fP option was added to satisfy two objections. First, the
+zero-based counting used by \fIsort\fP is not consistent
+with other utility conventions. Second, it did not meet syntax guideline
+requirements.
+.LP
+Historical documentation indicates that "setting \fB-n\fP implies
+\fB-b\fP". The description of \fB-n\fP already states
+that optional leading <blank>s are tolerated in doing the comparison.
+If \fB-b\fP is enabled, rather than implied, by
+\fB-n\fP, this has unusual side effects. When a character offset is
+used in a column of numbers (for example, to sort modulo 100),
+that offset is measured relative to the most significant digit, not
+to the column. Based upon a recommendation from the author of
+the original \fIsort\fP utility, the \fB-b\fP implication has been
+omitted from this volume of IEEE\ Std\ 1003.1-2001,
+and an application wishing to achieve the previously mentioned side
+effects has to code the \fB-b\fP flag explicitly.
+.SH FUTURE DIRECTIONS
+.LP
+None.
+.SH SEE ALSO
+.LP
+\fIcomm\fP , \fIjoin\fP , \fIuniq\fP , the System
+Interfaces volume of IEEE\ Std\ 1003.1-2001, \fItoupper\fP()
+.SH COPYRIGHT
+Portions of this text are reprinted and reproduced in electronic form
+from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
+-- Portable Operating System Interface (POSIX), The Open Group Base
+Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
+Electrical and Electronics Engineers, Inc and The Open Group. In the
+event of any discrepancy between this version and the original IEEE and
+The Open Group Standard, the original IEEE and The Open Group Standard
+is the referee document. The original Standard can be obtained online at
+http://www.opengroup.org/unix/online.html .