summaryrefslogtreecommitdiffstats
path: root/man-pages-posix-2013/man1p/tr.1p
diff options
context:
space:
mode:
Diffstat (limited to 'man-pages-posix-2013/man1p/tr.1p')
-rw-r--r--man-pages-posix-2013/man1p/tr.1p699
1 files changed, 699 insertions, 0 deletions
diff --git a/man-pages-posix-2013/man1p/tr.1p b/man-pages-posix-2013/man1p/tr.1p
new file mode 100644
index 0000000..8706a0e
--- /dev/null
+++ b/man-pages-posix-2013/man1p/tr.1p
@@ -0,0 +1,699 @@
+'\" et
+.TH TR "1P" 2013 "IEEE/The Open Group" "POSIX Programmer's Manual"
+.SH PROLOG
+This manual page is part of the POSIX Programmer's Manual.
+The Linux implementation of this interface may differ (consult
+the corresponding Linux manual page for details of Linux behavior),
+or the interface may not be implemented on Linux.
+
+.SH NAME
+tr
+\(em translate characters
+.SH SYNOPSIS
+.LP
+.nf
+tr \fB[\fR\(mic|\(miC\fB] [\fR\(mis\fB] \fIstring1 string2\fR
+.P
+tr \(mis \fB[\fR\(mic|\(miC\fB] \fIstring1\fR
+.P
+tr \(mid \fB[\fR\(mic|\(miC\fB] \fIstring1\fR
+.P
+tr \(mids \fB[\fR\(mic|\(miC\fB] \fIstring1 string2\fR
+.fi
+.SH DESCRIPTION
+The
+.IR tr
+utility shall copy the standard input to the standard output with
+substitution or deletion of selected characters. The options specified
+and the
+.IR string1
+and
+.IR string2
+operands shall control translations that occur while copying characters
+and single-character collating elements.
+.SH OPTIONS
+The
+.IR tr
+utility shall conform to the Base Definitions volume of POSIX.1\(hy2008,
+.IR "Section 12.2" ", " "Utility Syntax Guidelines".
+.P
+The following options shall be supported:
+.IP "\fB\(mic\fP" 10
+Complement the set of values specified by
+.IR string1 .
+See the EXTENDED DESCRIPTION section.
+.IP "\fB\(miC\fP" 10
+Complement the set of characters specified by
+.IR string1 .
+See the EXTENDED DESCRIPTION section.
+.IP "\fB\(mid\fP" 10
+Delete all occurrences of input characters that are specified by
+.IR string1 .
+.IP "\fB\(mis\fP" 10
+Replace instances of repeated characters with a single character, as
+described in the EXTENDED DESCRIPTION section.
+.SH OPERANDS
+The following operands shall be supported:
+.IP "\fIstring1\fR,\ \fIstring2\fR" 10
+.br
+Translation control strings. Each string shall represent a set of
+characters to be converted into an array of characters used for the
+translation. For a detailed description of how the strings are
+interpreted, see the EXTENDED DESCRIPTION section.
+.SH STDIN
+The standard input can be any type of file.
+.SH "INPUT FILES"
+None.
+.SH "ENVIRONMENT VARIABLES"
+The following environment variables shall affect the execution of
+.IR tr :
+.IP "\fILANG\fP" 10
+Provide a default value for the internationalization variables that are
+unset or null. (See the Base Definitions volume of POSIX.1\(hy2008,
+.IR "Section 8.2" ", " "Internationalization Variables"
+for the precedence of internationalization variables used to determine
+the values of locale categories.)
+.IP "\fILC_ALL\fP" 10
+If set to a non-empty string value, override the values of all the
+other internationalization variables.
+.IP "\fILC_COLLATE\fP" 10
+.br
+Determine the locale for the behavior of range expressions and
+equivalence classes.
+.IP "\fILC_CTYPE\fP" 10
+Determine the locale for the interpretation of sequences of bytes of
+text data as characters (for example, single-byte as opposed to
+multi-byte characters in arguments) and the behavior of character
+classes.
+.IP "\fILC_MESSAGES\fP" 10
+.br
+Determine the locale that should be used to affect the format and
+contents of diagnostic messages written to standard error.
+.IP "\fINLSPATH\fP" 10
+Determine the location of message catalogs for the processing of
+.IR LC_MESSAGES .
+.SH "ASYNCHRONOUS EVENTS"
+Default.
+.SH STDOUT
+The
+.IR tr
+output shall be identical to the input, with the exception of the
+specified transformations.
+.SH STDERR
+The standard error shall be used only for diagnostic messages.
+.SH "OUTPUT FILES"
+None.
+.SH "EXTENDED DESCRIPTION"
+The operands
+.IR string1
+and
+.IR string2
+(if specified) define two arrays of characters. The constructs in the
+following list can be used to specify characters or single-character
+collating elements. If any of the constructs result in multi-character
+collating elements,
+.IR tr
+shall exclude, without a diagnostic, those multi-character elements
+from the resulting array.
+.IP "\fIcharacter\fR" 10
+Any character not described by one of the conventions below shall
+represent itself.
+.IP "\e\fIoctal\fR" 10
+Octal sequences can be used to represent characters with specific coded
+values. An octal sequence shall consist of a
+<backslash>
+followed by the longest sequence of one, two, or three-octal-digit
+characters (01234567). The sequence shall cause the value whose encoding
+is represented by the one, two, or three-digit octal integer to be placed
+into the array. Multi-byte characters require multiple, concatenated
+escape sequences of this type, including the leading
+<backslash>
+for each byte.
+.IP "\e\fIcharacter\fR" 10
+The
+<backslash>-escape
+sequences in the Base Definitions volume of POSIX.1\(hy2008,
+.IR "Table 5-1" ", " "Escape Sequences and Associated Actions"
+(\c
+.BR '\e\e' ,
+.BR '\ea' ,
+.BR '\eb' ,
+.BR '\ef' ,
+.BR '\en' ,
+.BR '\er' ,
+.BR '\et' ,
+.BR '\ev' )
+shall be supported. The results of using any other character, other
+than an octal digit, following the
+<backslash>
+are unspecified. Also, if there is no character following the
+<backslash>,
+the results are unspecified.
+.IP "\fIc\fR\(mi\fIc\fR" 10
+In the POSIX locale, this construct shall represent the range of
+collating elements between the range endpoints (as long as neither
+endpoint is an octal sequence of the form \e\fIoctal\fP), inclusive, as
+defined by the collation sequence. The characters or collating elements
+in the range shall be placed in the array in ascending collation
+sequence. If the second endpoint precedes the starting endpoint in the
+collation sequence, it is unspecified whether the range of collating
+elements is empty, or this construct is treated as invalid. In locales
+other than the POSIX locale, this construct has unspecified behavior.
+.RS 10
+.P
+If either or both of the range endpoints are octal sequences of the
+form \e\fIoctal\fP, this shall represent the range of specific coded
+values between the two range endpoints, inclusive.
+.RE
+.IP "[:\fIclass\fR:]" 10
+Represents all characters belonging to the defined character class, as
+defined by the current setting of the
+.IR LC_CTYPE
+locale category. The following character class names shall be accepted
+when specified in
+.IR string1 :
+.TS
+tab(@);
+lB lB lB lB lB lB.
+alnum@blank@digit@lower@punct@upper
+alpha@cntrl@graph@print@space@xdigit
+.TE
+.RS 10
+.P
+In addition, character class expressions of the form [:\c
+.IR name :]
+shall be recognized in those locales where the
+.IR name
+keyword has been given a
+.BR charclass
+definition in the
+.IR LC_CTYPE
+category.
+.P
+When both the
+.BR \(mid
+and
+.BR \(mis
+options are specified, any of the character class names shall be
+accepted in
+.IR string2 .
+Otherwise, only character class names
+.BR lower
+or
+.BR upper
+are valid in
+.IR string2
+and then only if the corresponding character class (\c
+.BR upper
+and
+.BR lower ,
+respectively) is specified in the same relative position in
+.IR string1 .
+Such a specification shall be interpreted as a request for case
+conversion. When [:\c
+.IR lower :]
+appears in
+.IR string1
+and [:\c
+.IR upper :]
+appears in
+.IR string2 ,
+the arrays shall contain the characters from the
+.BR toupper
+mapping in the
+.IR LC_CTYPE
+category of the current locale. When [:\c
+.IR upper :]
+appears in
+.IR string1
+and [:\c
+.IR lower :]
+appears in
+.IR string2 ,
+the arrays shall contain the characters from the
+.BR tolower
+mapping in the
+.IR LC_CTYPE
+category of the current locale. The first character from each mapping
+pair shall be in the array for
+.IR string1
+and the second character from each mapping pair shall be in the array
+for
+.IR string2
+in the same relative position.
+.P
+Except for case conversion, the characters specified by a character
+class expression shall be placed in the array in an unspecified order.
+.P
+If the name specified for
+.IR class
+does not define a valid character class in the current locale, the
+behavior is undefined.
+.RE
+.IP "[=\fIequiv\fR=]" 10
+Represents all characters or collating elements belonging to the same
+equivalence class as
+.IR equiv ,
+as defined by the current setting of the
+.IR LC_COLLATE
+locale category. An equivalence class expression shall be allowed only
+in
+.IR string1 ,
+or in
+.IR string2
+when it is being used by the combined
+.BR \(mid
+and
+.BR \(mis
+options. The characters belonging to the equivalence class shall be
+placed in the array in an unspecified order.
+.IP "[\fIx\fR*\fIn\fR]" 10
+Represents
+.IR n
+repeated occurrences of the character
+.IR x .
+Because this expression is used to map multiple characters to one, it
+is only valid when it occurs in
+.IR string2 .
+If
+.IR n
+is omitted or is zero, it shall be interpreted as large enough to
+extend the
+.IR string2 -based
+sequence to the length of the
+.IR string1 -based
+sequence. If
+.IR n
+has a leading zero, it shall be interpreted as an octal value.
+Otherwise, it shall be interpreted as a decimal value.
+.P
+When the
+.BR \(mid
+option is not specified:
+.IP " *" 4
+If
+.IR string2
+is present, each input character found in the array specified by
+.IR string1
+shall be replaced by the character in the same relative position in the
+array specified by
+.IR string2 .
+If the array specified by
+.IR string2
+is shorter that the one specified by
+.IR string1 ,
+or if a character occurs more than once in
+.IR string1 ,
+the results are unspecified.
+.IP " *" 4
+If the
+.BR \(miC
+option is specified, the complements of the characters specified by
+.IR string1
+(the set of all characters in the current character set, as defined by
+the current setting of
+.IR LC_CTYPE ,
+except for those actually specified in the
+.IR string1
+operand) shall be placed in the array in ascending collation sequence,
+as defined by the current setting of
+.IR LC_COLLATE .
+.IP " *" 4
+If the
+.BR \(mic
+option is specified, the complement of the values specified by
+.IR string1
+shall be placed in the array in ascending order by binary value.
+.IP " *" 4
+Because the order in which characters specified by character class
+expressions or equivalence class expressions is undefined, such
+expressions should only be used if the intent is to map several
+characters into one. An exception is case conversion, as described
+previously.
+.P
+When the
+.BR \(mid
+option is specified:
+.IP " *" 4
+Input characters found in the array specified by
+.IR string1
+shall be deleted.
+.IP " *" 4
+When the
+.BR \(miC
+option is specified with
+.BR \(mid ,
+all characters except those specified by
+.IR string1
+shall be deleted. The contents of
+.IR string2
+are ignored, unless the
+.BR \(mis
+option is also specified.
+.IP " *" 4
+When the
+.BR \(mic
+option is specified with
+.BR \(mid ,
+all values except those specified by
+.IR string1
+shall be deleted. The contents of
+.IR string2
+shall be ignored, unless the
+.BR \(mis
+option is also specified.
+.IP " *" 4
+The same string cannot be used for both the
+.BR \(mid
+and the
+.BR \(mis
+option; when both options are specified, both
+.IR string1
+(used for deletion) and
+.IR string2
+(used for squeezing) shall be required.
+.P
+When the
+.BR \(mis
+option is specified, after any deletions or translations have taken
+place, repeated sequences of the same character shall be replaced by
+one occurrence of the same character, if the character is found in the
+array specified by the last operand. If the last operand contains a
+character class, such as the following example:
+.sp
+.RS 4
+.nf
+\fB
+tr \(mis '[:space:]'
+.fi \fR
+.P
+.RE
+.P
+the last operand's array shall contain all of the characters in that
+character class. However, in a case conversion, as described
+previously, such as:
+.sp
+.RS 4
+.nf
+\fB
+tr \(mis '[:upper:]' '[:lower:]'
+.fi \fR
+.P
+.RE
+.P
+the last operand's array shall contain only those characters defined as
+the second characters in each of the
+.BR toupper
+or
+.BR tolower
+character pairs, as appropriate.
+.P
+An empty string used for
+.IR string1
+or
+.IR string2
+produces undefined results.
+.SH "EXIT STATUS"
+The following exit values shall be returned:
+.IP "\00" 6
+All input was processed successfully.
+.IP >0 6
+An error occurred.
+.SH "CONSEQUENCES OF ERRORS"
+Default.
+.LP
+.IR "The following sections are informative."
+.SH "APPLICATION USAGE"
+If necessary,
+.IR string1
+and
+.IR string2
+can be quoted to avoid pattern matching by the shell.
+.P
+If an ordinary digit (representing itself) is to follow an octal
+sequence, the octal sequence must use the full three digits to avoid
+ambiguity.
+.P
+When
+.IR string2
+is shorter than
+.IR string1 ,
+a difference results between historical System\ V and BSD systems. A
+BSD system pads
+.IR string2
+with the last character found in
+.IR string2 .
+Thus, it is possible to do the following:
+.sp
+.RS 4
+.nf
+\fB
+tr 0123456789 d
+.fi \fR
+.P
+.RE
+.P
+which would translate all digits to the letter
+.BR 'd' .
+Since this area is specifically unspecified in this volume of POSIX.1\(hy2008, both the BSD and
+System\ V behaviors are allowed, but a conforming application cannot rely
+on the BSD behavior. It would have to code the example in the
+following way:
+.sp
+.RS 4
+.nf
+\fB
+tr 0123456789 '[d*]'
+.fi \fR
+.P
+.RE
+.P
+It should be noted that, despite similarities in appearance, the string
+operands used by
+.IR tr
+are not regular expressions.
+.P
+Unlike some historical implementations, this definition of the
+.IR tr
+utility correctly processes NUL characters in its input stream. NUL
+characters can be stripped by using:
+.sp
+.RS 4
+.nf
+\fB
+tr \(mid '\e000'
+.fi \fR
+.P
+.RE
+.SH EXAMPLES
+.IP " 1." 4
+The following example creates a list of all words in
+.BR file1
+one per line in
+.BR file2 ,
+where a word is taken to be a maximal string of letters.
+.RS 4
+.sp
+.RS 4
+.nf
+\fB
+tr \(mics "[:alpha:]" "[\en*]" <file1 >file2
+.fi \fR
+.P
+.RE
+.RE
+.IP " 2." 4
+The next example translates all lowercase characters in
+.BR file1
+to uppercase and writes the results to standard output.
+.RS 4
+.sp
+.RS 4
+.nf
+\fB
+tr "[:lower:]" "[:upper:]" <file1
+.fi \fR
+.P
+.RE
+.RE
+.IP " 3." 4
+This example uses an equivalence class to identify accented variants of
+the base character
+.BR 'e'
+in
+.BR file1 ,
+which are stripped of diacritical marks and written to
+.BR file2 .
+.RS 4
+.sp
+.RS 4
+.nf
+\fB
+tr "[=e=]" "[e*]" <file1 >file2
+.fi \fR
+.P
+.RE
+.RE
+.SH RATIONALE
+In some early proposals, an explicit option
+.BR \(min
+was added to disable the historical behavior of stripping NUL
+characters from the input. It was considered that automatically
+stripping NUL characters from the input was not correct functionality.
+However, the removal of
+.BR \(min
+in a later proposal does not remove the requirement that
+.IR tr
+correctly process NUL characters in its input stream. NUL characters
+can be stripped by using
+.IR tr
+.BR \(mid
+\&'\e000'.
+.P
+Historical implementations of
+.IR tr
+differ widely in syntax and behavior. For example, the BSD version has
+not needed the bracket characters for the repetition sequence. The
+.IR tr
+utility syntax is based more closely on the System V and XPG3 model
+while attempting to accommodate historical BSD implementations. In the
+case of the short
+.IR string2
+padding, the decision was to unspecify the behavior and preserve System
+V and XPG3 scripts, which might find difficulty with the BSD method.
+The assumption was made that BSD users of
+.IR tr
+have to make accommodations to meet the syntax defined here. Since it
+is possible to use the repetition sequence to duplicate the desired
+behavior, whereas there is no simple way to achieve the System V
+method, this was the correct, if not desirable, approach.
+.P
+The use of octal values to specify control characters, while having
+historical precedents, is not portable. The introduction of escape
+sequences for control characters should provide the necessary
+portability. It is recognized that this may cause some historical
+scripts to break.
+.P
+An early proposal included support for multi-character collating elements.
+It was pointed out that, while
+.IR tr
+does employ some syntactical elements from REs, the aim of
+.IR tr
+is quite different; ranges, for example, do not have a similar meaning
+(``any of the chars in the range matches'', \fIversus\fP ``translate
+each character in the range to the output counterpart''). As a result,
+the previously included support for multi-character collating elements
+has been removed. What remains are ranges in current collation order
+(to support, for example, accented characters), character classes, and
+equivalence classes.
+.P
+In XPG3 the [:\c
+.IR class :]
+and [=\c
+.IR equiv =]
+conventions are shown with double brackets, as in RE syntax. However,
+.IR tr
+does not implement RE principles; it just borrows part of the syntax.
+Consequently, [:\c
+.IR class :]
+and [=\c
+.IR equiv =]
+should be regarded as syntactical elements on a par with [\c
+.IR x *\c
+.IR n ],
+which is not an RE bracket expression.
+.P
+The standard developers will consider changes to
+.IR tr
+that allow it to translate characters between different character
+encodings, or they will consider providing a new utility to accomplish
+this.
+.P
+On historical System V systems, a range expression requires enclosing
+square-brackets, such as:
+.sp
+.RS 4
+.nf
+\fB
+tr '[a-z]' '[A-Z]'
+.fi \fR
+.P
+.RE
+.P
+However, BSD-based systems did not require the brackets, and this
+convention is used here to avoid breaking large numbers of BSD scripts:
+.sp
+.RS 4
+.nf
+\fB
+tr a-z A-Z
+.fi \fR
+.P
+.RE
+.P
+The preceding System V script will continue to work because the
+brackets, treated as regular characters, are translated to themselves.
+However, any System V script that relied on
+.BR \(dqa\(hyz\(dq
+representing the three characters
+.BR 'a' ,
+.BR '\(mi' ,
+and
+.BR 'z'
+have to be rewritten as
+.BR \(dqaz\(mi\(dq .
+.P
+The ISO\ POSIX\(hy2:\|1993 standard had a
+.BR \(mic
+option that behaved similarly to the
+.BR \(miC
+option, but did not supply functionality equivalent to the
+.BR \(mic
+option specified in POSIX.1\(hy2008. This meant that historical practice of being
+able to specify
+.IR tr
+.BR \(micd \e000\(mi\e177
+(which would delete all bytes with the top bit set) would have no
+effect because, in the C locale, bytes with the values octal 200 to
+octal 377 are not characters.
+.P
+The earlier version also said that octal sequences referred to
+collating elements and could be placed adjacent to each other to
+specify multi-byte characters. However, it was noted that this caused
+ambiguities because
+.IR tr
+would not be able to tell whether adjacent octal sequences were
+intending to specify multi-byte characters or multiple single byte
+characters. POSIX.1\(hy2008 specifies that octal sequences always refer to single
+byte binary values when used to specify an endpoint of a range of
+collating elements.
+.P
+Earlier versions of this standard allowed for implementations with
+bytes other than eight bits, but this has been modified in this
+version.
+.SH "FUTURE DIRECTIONS"
+None.
+.SH "SEE ALSO"
+.IR "\fIsed\fR\^"
+.P
+The Base Definitions volume of POSIX.1\(hy2008,
+.IR "Table 5-1" ", " "Escape Sequences and Associated Actions",
+.IR "Chapter 8" ", " "Environment Variables",
+.IR "Section 12.2" ", " "Utility Syntax Guidelines"
+.SH COPYRIGHT
+Portions of this text are reprinted and reproduced in electronic form
+from IEEE Std 1003.1, 2013 Edition, Standard for Information Technology
+-- Portable Operating System Interface (POSIX), The Open Group Base
+Specifications Issue 7, Copyright (C) 2013 by the Institute of
+Electrical and Electronics Engineers, Inc and The Open Group.
+(This is POSIX.1-2008 with the 2013 Technical Corrigendum 1 applied.) In the
+event of any discrepancy between this version and the original IEEE and
+The Open Group Standard, the original IEEE and The Open Group Standard
+is the referee document. The original Standard can be obtained online at
+http://www.unix.org/online.html .
+
+Any typographical or formatting errors that appear
+in this page are most likely
+to have been introduced during the conversion of the source files to
+man page format. To report such errors, see
+https://www.kernel.org/doc/man-pages/reporting_bugs.html .