diff options
Diffstat (limited to 'man-pages-posix-2003/man1p/sort.1p')
-rw-r--r-- | man-pages-posix-2003/man1p/sort.1p | 517 |
1 files changed, 517 insertions, 0 deletions
diff --git a/man-pages-posix-2003/man1p/sort.1p b/man-pages-posix-2003/man1p/sort.1p new file mode 100644 index 0000000..5e66fb9 --- /dev/null +++ b/man-pages-posix-2003/man1p/sort.1p @@ -0,0 +1,517 @@ +.\" Copyright (c) 2001-2003 The Open Group, All Rights Reserved +.TH "SORT" 1P 2003 "IEEE/The Open Group" "POSIX Programmer's Manual" +.\" sort +.SH PROLOG +This manual page is part of the POSIX Programmer's Manual. +The Linux implementation of this interface may differ (consult +the corresponding Linux manual page for details of Linux behavior), +or the interface may not be implemented on Linux. +.SH NAME +sort \- sort, merge, or sequence check text files +.SH SYNOPSIS +.LP +\fBsort\fP \fB[\fP\fB-m\fP\fB][\fP\fB-o\fP \fIoutput\fP\fB][\fP\fB-bdfinru\fP\fB][\fP\fB-t\fP +\fIchar\fP\fB][\fP\fB-k\fP \fIkeydef\fP\fB]\fP\fB...\fP \fB[\fP\fIfile\fP\fB...\fP\fB]\fP\fB +.br +.sp +sort -c\fP \fB[\fP\fB-bdfinru\fP\fB][\fP\fB-t\fP \fIchar\fP\fB][\fP\fB-k\fP +\fIkeydef\fP\fB][\fP\fIfile\fP\fB]\fP\fB +.br +\fP +.SH DESCRIPTION +.LP +The \fIsort\fP utility shall perform one of the following functions: +.IP " 1." 4 +Sort lines of all the named files together and write the result to +the specified output. +.LP +.IP " 2." 4 +Merge lines of all the named (presorted) files together and write +the result to the specified output. +.LP +.IP " 3." 4 +Check that a single input file is correctly presorted. +.LP +.LP +Comparisons shall be based on one or more sort keys extracted from +each line of input (or, if no sort keys are specified, the +entire line up to, but not including, the terminating <newline>), +and shall be performed using the collating sequence of the +current locale. +.SH OPTIONS +.LP +The \fIsort\fP utility shall conform to the Base Definitions volume +of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility Syntax Guidelines, +and the \fB-k\fP \fIkeydef\fP option should +follow the \fB-b\fP, \fB-d\fP, \fB-f\fP, \fB-i\fP, \fB-n\fP, and \fB-r\fP +options. +.LP +The following options shall be supported: +.TP 7 +\fB-c\fP +Check that the single input file is ordered as specified by the arguments +and the collating sequence of the current locale. No +output shall be produced; only the exit code shall be affected. +.TP 7 +\fB-m\fP +Merge only; the input file shall be assumed to be already sorted. +.TP 7 +\fB-o\ \fP \fIoutput\fP +Specify the name of an output file to be used instead of the standard +output. This file can be the same as one of the input +\fIfile\fPs. +.TP 7 +\fB-u\fP +Unique: suppress all but one in each set of lines having equal keys. +If used with the \fB-c\fP option, check that there are no +lines with duplicate keys, in addition to checking that the input +file is sorted. +.sp +.LP +The following options shall override the default ordering rules. When +ordering options appear independent of any key field +specifications, the requested field ordering rules shall be applied +globally to all sort keys. When attached to a specific key (see +\fB-k\fP), the specified ordering options shall override all global +ordering options for that key. +.TP 7 +\fB-d\fP +Specify that only <blank>s and alphanumeric characters, according +to the current setting of \fILC_CTYPE\fP, shall be +significant in comparisons. The behavior is undefined for a sort key +to which \fB-i\fP or \fB-n\fP also applies. +.TP 7 +\fB-f\fP +Consider all lowercase characters that have uppercase equivalents, +according to the current setting of \fILC_CTYPE\fP, to be +the uppercase equivalent for the purposes of comparison. +.TP 7 +\fB-i\fP +Ignore all characters that are non-printable, according to the current +setting of \fILC_CTYPE\fP. +.TP 7 +\fB-n\fP +Restrict the sort key to an initial numeric string, consisting of +optional <blank>s, optional minus sign, and zero or +more digits with an optional radix character and thousands separators +(as defined in the current locale), which shall be sorted by +arithmetic value. An empty digit string shall be treated as zero. +Leading zeros and signs on zeros shall not affect ordering. +.TP 7 +\fB-r\fP +Reverse the sense of comparisons. +.sp +.LP +The treatment of field separators can be altered using the options: +.TP 7 +\fB-b\fP +Ignore leading <blank>s when determining the starting and ending positions +of a restricted sort key. If the \fB-b\fP +option is specified before the first \fB-k\fP option, it shall be +applied to all \fB-k\fP options. Otherwise, the \fB-b\fP +option can be attached independently to each \fB-k\fP \fIfield_start\fP +or \fIfield_end\fP option-argument (see below). +.TP 7 +\fB-t\ \fP \fIchar\fP +Use \fIchar\fP as the field separator character; \fIchar\fP shall +not be considered to be part of a field (although it can be +included in a sort key). Each occurrence of \fIchar\fP shall be significant +(for example, <\fIchar\fP><\fIchar\fP> +delimits an empty field). If \fB-t\fP is not specified, <blank>s shall +be used as default field separators; each maximal +non-empty sequence of <blank>s that follows a non- <blank> shall be +a field separator. +.sp +.LP +Sort keys can be specified using the options: +.TP 7 +\fB-k\ \fP \fIkeydef\fP +The \fIkeydef\fP argument is a restricted sort key field definition. +The format of this definition is: +.sp +.RS +.nf + +\fIfield_start\fP\fB[\fP\fItype\fP\fB][\fP\fB,\fP\fIfield_end\fP\fB[\fP\fItype\fP\fB]]\fP +.fi +.RE +.LP +where \fIfield_start\fP and \fIfield_end\fP define a key field restricted +to a portion of the line (see the EXTENDED +DESCRIPTION section), and \fItype\fP is a modifier from the list of +characters \fB'b'\fP, \fB'd'\fP, \fB'f'\fP, +\fB'i'\fP, \fB'n'\fP, \fB'r'\fP . The \fB'b'\fP modifier shall behave +like the \fB-b\fP option, but shall apply only +to the \fIfield_start\fP or \fIfield_end\fP to which it is attached. +The other modifiers shall behave like the corresponding +options, but shall apply only to the key field to which they are attached; +they shall have this effect if specified with +\fIfield_start\fP, \fIfield_end\fP, or both. If any modifier is attached +to a \fIfield_start\fP or to a \fIfield_end\fP, no +option shall apply to either. Implementations shall support at least +nine occurrences of the \fB-k\fP option, which shall be +significant in command line order. If no \fB-k\fP option is specified, +a default sort key of the entire line shall be used. +.LP +When there are multiple key fields, later keys shall be compared only +after all earlier keys compare equal. Except when the +\fB-u\fP option is specified, lines that otherwise compare equal shall +be ordered as if none of the options \fB-d\fP, \fB-f\fP, +\fB-i\fP, \fB-n\fP, or \fB-k\fP were present (but with \fB-r\fP still +in effect, if it was specified) and with all bytes in the +lines significant to the comparison. The order in which lines that +still compare equal are written is unspecified. +.sp +.SH OPERANDS +.LP +The following operand shall be supported: +.TP 7 +\fIfile\fP +A pathname of a file to be sorted, merged, or checked. If no \fIfile\fP +operands are specified, or if a \fIfile\fP operand is +\fB'-'\fP, the standard input shall be used. +.sp +.SH STDIN +.LP +The standard input shall be used only if no \fIfile\fP operands are +specified, or if a \fIfile\fP operand is \fB'-'\fP . +See the INPUT FILES section. +.SH INPUT FILES +.LP +The input files shall be text files, except that the \fIsort\fP utility +shall add a <newline> to the end of a file ending +with an incomplete last line. +.SH ENVIRONMENT VARIABLES +.LP +The following environment variables shall affect the execution of +\fIsort\fP: +.TP 7 +\fILANG\fP +Provide a default value for the internationalization variables that +are unset or null. (See the Base Definitions volume of +IEEE\ Std\ 1003.1-2001, Section 8.2, Internationalization Variables +for +the precedence of internationalization variables used to determine +the values of locale categories.) +.TP 7 +\fILC_ALL\fP +If set to a non-empty string value, override the values of all the +other internationalization variables. +.TP 7 +\fILC_COLLATE\fP +.sp +Determine the locale for ordering rules. +.TP 7 +\fILC_CTYPE\fP +Determine the locale for the interpretation of sequences of bytes +of text data as characters (for example, single-byte as +opposed to multi-byte characters in arguments and input files) and +the behavior of character classification for the \fB-b\fP, +\fB-d\fP, \fB-f\fP, \fB-i\fP, and \fB-n\fP options. +.TP 7 +\fILC_MESSAGES\fP +Determine the locale that should be used to affect the format and +contents of diagnostic messages written to standard +error. +.TP 7 +\fILC_NUMERIC\fP +.sp +Determine the locale for the definition of the radix character and +thousands separator for the \fB-n\fP option. +.TP 7 +\fINLSPATH\fP +Determine the location of message catalogs for the processing of \fILC_MESSAGES +\&.\fP +.sp +.SH ASYNCHRONOUS EVENTS +.LP +Default. +.SH STDOUT +.LP +Unless the \fB-o\fP or \fB-c\fP options are in effect, the standard +output shall contain the sorted input. +.SH STDERR +.LP +The standard error shall be used for diagnostic messages. A warning +message about correcting an incomplete last line of an input +file may be generated, but need not affect the final exit status. +.SH OUTPUT FILES +.LP +If the \fB-o\fP option is in effect, the sorted input shall be written +to the file \fIoutput\fP. +.SH EXTENDED DESCRIPTION +.LP +The notation: +.sp +.RS +.nf + +\fB-k\fP \fIfield_start\fP\fB[\fP\fItype\fP\fB][\fP\fB,\fP\fIfield_end\fP\fB[\fP\fItype\fP\fB]]\fP +.fi +.RE +.LP +shall define a key field that begins at \fIfield_start\fP and ends +at \fIfield_end\fP inclusive, unless \fIfield_start\fP +falls beyond the end of the line or after \fIfield_end\fP, in which +case the key field is empty. A missing \fIfield_end\fP shall +mean the last character of the line. +.LP +A field comprises a maximal sequence of non-separating characters +and, in the absence of option \fB-t\fP, any preceding field +separator. +.LP +The \fIfield_start\fP portion of the \fIkeydef\fP option-argument +shall have the form: +.sp +.RS +.nf + +\fIfield_number\fP\fB[\fP\fB.\fP\fIfirst_character\fP\fB]\fP +.fi +.RE +.LP +Fields and characters within fields shall be numbered starting with +1. The \fIfield_number\fP and \fIfirst_character\fP +pieces, interpreted as positive decimal integers, shall specify the +first character to be used as part of a sort key. If +\fI\&.first_character\fP is omitted, it shall refer to the first character +of the field. +.LP +The \fIfield_end\fP portion of the \fIkeydef\fP option-argument shall +have the form: +.sp +.RS +.nf + +\fIfield_number\fP\fB[\fP\fB.\fP\fIlast_character\fP\fB]\fP +.fi +.RE +.LP +The \fIfield_number\fP shall be as described above for \fIfield_start.\fP +The \fIlast_character\fP piece, interpreted as a +non-negative decimal integer, shall specify the last character to +be used as part of the sort key. If \fIlast_character\fP +evaluates to zero or \fI.last_character\fP is omitted, it shall refer +to the last character of the field specified by +\fIfield_number\fP. +.LP +If the \fB-b\fP option or \fBb\fP type modifier is in effect, characters +within a field shall be counted from the first non- +<blank> in the field. (This shall apply separately to \fIfirst_character\fP +and \fIlast_character\fP.) +.SH EXIT STATUS +.LP +The following exit values shall be returned: +.TP 7 +\ 0 +All input files were output successfully, or \fB-c\fP was specified +and the input file was correctly sorted. +.TP 7 +\ 1 +Under the \fB-c\fP option, the file was not ordered as specified, +or if the \fB-c\fP and \fB-u\fP options were both +specified, two input lines were found with equal keys. +.TP 7 +>1 +An error occurred. +.sp +.SH CONSEQUENCES OF ERRORS +.LP +Default. +.LP +\fIThe following sections are informative.\fP +.SH APPLICATION USAGE +.LP +The default value for \fB-t\fP, <blank>, has different properties +from, for example, \fB-t\fP "<space>". If a line +contains: +.sp +.RS +.nf + +\fB<space><space>foo +\fP +.fi +.RE +.LP +the following treatment would occur with default separation as opposed +to specifically selecting a <space>: +.TS C +center; l l l. +\fBField\fP \fBDefault\fP \fB-t "<space>"\fP +1 <space><space>foo \fIempty\fP +2 \fIempty\fP \fIempty\fP +3 \fIempty\fP foo +.TE +.LP +The leading field separator itself is included in a field when \fB-t\fP +is not used. For example, this command returns an exit +status of zero, meaning the input was already sorted: +.sp +.RS +.nf + +\fBsort -c -k 2 <<eof +y<tab>b +x<space>a +eof +\fP +.fi +.RE +.LP +(assuming that a <tab> precedes the <space> in the current collating +sequence). The field separator is not included +in a field when it is explicitly set via \fB-t\fP. This is historical +practice and allows usage such as: +.sp +.RS +.nf + +\fBsort -t "|" -k 2n <<eof +Atlanta|425022|Georgia +Birmingham|284413|Alabama +Columbia|100385|South Carolina +eof +\fP +.fi +.RE +.LP +where the second field can be correctly sorted numerically without +regard to the non-numeric field separator. +.LP +The wording in the OPTIONS section clarifies that the \fB-b\fP, \fB-d\fP, +\fB-f\fP, \fB-i\fP, \fB-n\fP, and \fB-r\fP +options have to come before the first sort key specified if they are +intended to apply to all specified keys. The way it is +described in this volume of IEEE\ Std\ 1003.1-2001 matches historical +practice, not historical documentation. The results +are unspecified if these options are specified after a \fB-k\fP option. +.LP +The \fB-f\fP option might not work as expected in locales where there +is not a one-to-one mapping between an uppercase and a +lowercase letter. +.SH EXAMPLES +.IP " 1." 4 +The following command sorts the contents of \fBinfile\fP with the +second field as the sort key: +.sp +.RS +.nf + +\fBsort -k 2,2 infile +\fP +.fi +.RE +.LP +.IP " 2." 4 +The following command sorts, in reverse order, the contents of \fBinfile1\fP +and \fBinfile2\fP, placing the output in +\fBoutfile\fP and using the second character of the second field as +the sort key (assuming that the first character of the second +field is the field separator): +.sp +.RS +.nf + +\fBsort -r -o outfile -k 2.2,2.2 infile1 infile2 +\fP +.fi +.RE +.LP +.IP " 3." 4 +The following command sorts the contents of \fBinfile1\fP and \fBinfile2\fP +using the second non- <blank> of the second +field as the sort key: +.sp +.RS +.nf + +\fBsort -k 2.2b,2.2b infile1 infile2 +\fP +.fi +.RE +.LP +.IP " 4." 4 +The following command prints the System\ V password file (user database) +sorted by the numeric user ID (the third +colon-separated field): +.sp +.RS +.nf + +\fBsort -t : -k 3,3n /etc/passwd +\fP +.fi +.RE +.LP +.IP " 5." 4 +The following command prints the lines of the already sorted file +\fBinfile\fP, suppressing all but one occurrence of lines +having the same third field: +.sp +.RS +.nf + +\fBsort -um -k 3.1,3.0 infile +\fP +.fi +.RE +.LP +.SH RATIONALE +.LP +Examples in some historical documentation state that options \fB-um\fP +with one input file keep the first in each set of lines +with equal keys. This behavior was deemed to be an implementation +artifact and was not standardized. +.LP +The \fB-z\fP option was omitted; it is not standard practice on most +systems and is inconsistent with using \fIsort\fP to sort +several files individually and then merge them together. The text +concerning \fB-z\fP in historical documentation appeared to +require implementations to determine the proper buffer length during +the sort phase of operation, but not during the merge. +.LP +The \fB-y\fP option was omitted because of non-portability. The \fB-M\fP +option, present in System V, was omitted because of +non-portability in international usage. +.LP +An undocumented \fB-T\fP option exists in some implementations. It +is used to specify a directory for intermediate files. +Implementations are encouraged to support the use of the \fITMPDIR\fP +environment variable instead of adding an option to support +this functionality. +.LP +The \fB-k\fP option was added to satisfy two objections. First, the +zero-based counting used by \fIsort\fP is not consistent +with other utility conventions. Second, it did not meet syntax guideline +requirements. +.LP +Historical documentation indicates that "setting \fB-n\fP implies +\fB-b\fP". The description of \fB-n\fP already states +that optional leading <blank>s are tolerated in doing the comparison. +If \fB-b\fP is enabled, rather than implied, by +\fB-n\fP, this has unusual side effects. When a character offset is +used in a column of numbers (for example, to sort modulo 100), +that offset is measured relative to the most significant digit, not +to the column. Based upon a recommendation from the author of +the original \fIsort\fP utility, the \fB-b\fP implication has been +omitted from this volume of IEEE\ Std\ 1003.1-2001, +and an application wishing to achieve the previously mentioned side +effects has to code the \fB-b\fP flag explicitly. +.SH FUTURE DIRECTIONS +.LP +None. +.SH SEE ALSO +.LP +\fIcomm\fP, \fIjoin\fP, \fIuniq\fP, the System +Interfaces volume of IEEE\ Std\ 1003.1-2001, \fItoupper\fP() +.SH COPYRIGHT +Portions of this text are reprinted and reproduced in electronic form +from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology +-- Portable Operating System Interface (POSIX), The Open Group Base +Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of +Electrical and Electronics Engineers, Inc and The Open Group. In the +event of any discrepancy between this version and the original IEEE and +The Open Group Standard, the original IEEE and The Open Group Standard +is the referee document. The original Standard can be obtained online at +http://www.opengroup.org/unix/online.html . |