Diffstat (limited to 'man-pages-posix-2017/man1p/join.1p')
-rw-r--r-- | man-pages-posix-2017/man1p/join.1p | 532 |
1 files changed, 532 insertions, 0 deletions
diff --git a/man-pages-posix-2017/man1p/join.1p b/man-pages-posix-2017/man1p/join.1p new file mode 100644 index 0000000..fd2fb0f --- /dev/null +++ b/man-pages-posix-2017/man1p/join.1p @@ -0,0 +1,532 @@ +'\" et +.TH JOIN "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual" +.\" +.SH PROLOG +This manual page is part of the POSIX Programmer's Manual. +The Linux implementation of this interface may differ (consult +the corresponding Linux manual page for details of Linux behavior), +or the interface may not be implemented on Linux. +.\" +.SH NAME +join +\(em relational database operator +.SH SYNOPSIS +.LP +.nf +join \fB[\fR-a \fIfile_number\fR|-v \fIfile_number\fB] [\fR-e \fIstring\fB] [\fR-o \fIlist\fB] [\fR-t \fIchar\fB] + [\fR-1 \fIfield\fB] [\fR-2 \fIfield\fB]\fI file1 file2\fR +.fi +.SH DESCRIPTION +The +.IR join +utility shall perform an equality join on the files +.IR file1 +and +.IR file2 . +The joined files shall be written to the standard output. +.P +The join field is a field in each file on which the files are +compared. The +.IR join +utility shall write one line in the output for each pair of lines in +.IR file1 +and +.IR file2 +that have join fields that collate equally. The output line by default +shall consist of the join field, then the remaining fields from +.IR file1 , +then the remaining fields from +.IR file2 . +This format can be changed by using the +.BR \-o +option (see below). The +.BR \-a +option can be used to add unmatched lines to the output. The +.BR \-v +option can be used to output only unmatched lines. +.P +The files +.IR file1 +and +.IR file2 +shall be ordered in the collating sequence of +.IR sort +.BR \-b +on the fields on which they shall be joined, by default the first in +each line. All selected output shall be written in the same collating +sequence. +.P +The default input field separators shall be +<blank> +characters. 
In this case, multiple separators shall count as one field +separator, and leading separators shall be ignored. The default output +field separator shall be a +<space>. +.P +The field separator and collating sequence can be changed by using the +.BR \-t +option (see below). +.P +If the same key appears more than once in either file, all combinations +of the set of remaining fields in +.IR file1 +and the set of remaining fields in +.IR file2 +are output in the order of the lines encountered. +.P +If the input files are not in the appropriate collating sequence, the +results are unspecified. +.SH OPTIONS +The +.IR join +utility shall conform to the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 12.2" ", " "Utility Syntax Guidelines". +.P +The following options shall be supported: +.IP "\fB\-a\ \fIfile_number\fR" 10 +.br +Produce a line for each unpairable line in file +.IR file_number , +where +.IR file_number +is 1 or 2, in addition to the default output. If both +.BR \-a 1 +and +.BR \-a 2 +are specified, all unpairable lines shall be output. +.IP "\fB\-e\ \fIstring\fR" 10 +Replace empty output fields in the list selected by +.BR \-o +with the string +.IR string . +.IP "\fB\-o\ \fIlist\fR" 10 +Construct the output line to comprise the fields specified in +.IR list , +each element of which shall have one of the following two forms: +.RS 10 +.IP " 1." 4 +\fIfile_number.field\fR, where +.IR file_number +is a file number and +.IR field +is a decimal integer field number +.IP " 2." 4 +0 (zero), representing the join field +.P +The elements of +.IR list +shall be either +<comma>-separated +or +<blank>-separated, +as specified in Guideline 8 of the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 12.2" ", " "Utility Syntax Guidelines". +The fields specified by +.IR list +shall be written for all selected output lines. Fields selected by +.IR list +that do not appear in the input shall be treated as empty output +fields. (See the +.BR \-e +option.) 
Only specifically requested fields shall be written. The +application shall ensure that +.IR list +is a single command line argument. +.RE +.IP "\fB\-t\ \fIchar\fR" 10 +Use character +.IR char +as a separator, for both input and output. Every appearance of +.IR char +in a line shall be significant. When this option is specified, the +collating sequence shall be the same as +.IR sort +without the +.BR \-b +option. +.IP "\fB\-v\ \fIfile_number\fR" 10 +.br +Instead of the default output, produce a line only for each unpairable +line in +.IR file_number , +where +.IR file_number +is 1 or 2. If both +.BR \-v 1 +and +.BR \-v 2 +are specified, all unpairable lines shall be output. +.IP "\fB\-1\ \fIfield\fR" 10 +Join on the +.IR field th +field of file 1. Fields are decimal integers starting with 1. +.IP "\fB\-2\ \fIfield\fR" 10 +Join on the +.IR field th +field of file 2. Fields are decimal integers starting with 1. +.SH OPERANDS +The following operands shall be supported: +.IP "\fIfile1\fR,\ \fIfile2\fR" 10 +A pathname of a file to be joined. If either of the +.IR file1 +or +.IR file2 +operands is +.BR '\-' , +the standard input shall be used in its place. +.SH STDIN +The standard input shall be used only if the +.IR file1 +or +.IR file2 +operand is +.BR '\-' . +See the INPUT FILES section. +.SH "INPUT FILES" +The input files shall be text files. +.SH "ENVIRONMENT VARIABLES" +The following environment variables shall affect the execution of +.IR join : +.IP "\fILANG\fP" 10 +Provide a default value for the internationalization variables that are +unset or null. (See the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 8.2" ", " "Internationalization Variables" +for the precedence of internationalization variables used to determine +the values of locale categories.) +.IP "\fILC_ALL\fP" 10 +If set to a non-empty string value, override the values of all the +other internationalization variables. 
+.IP "\fILC_COLLATE\fP" 10 +.br +Determine the locale of the collating sequence +.IR join +expects to have been used when the input files were sorted. +.IP "\fILC_CTYPE\fP" 10 +Determine the locale for the interpretation of sequences of bytes of +text data as characters (for example, single-byte as opposed to +multi-byte characters in arguments and input files). +.IP "\fILC_MESSAGES\fP" 10 +.br +Determine the locale that should be used to affect the format and +contents of diagnostic messages written to standard error. +.IP "\fINLSPATH\fP" 10 +Determine the location of message catalogs for the processing of +.IR LC_MESSAGES . +.SH "ASYNCHRONOUS EVENTS" +Default. +.SH STDOUT +The +.IR join +utility output shall be a concatenation of selected character fields. +When the +.BR \-o +option is not specified, the output shall be: +.sp +.RS 4 +.nf + +"%s%s%s\en", <\fIjoin field\fR>, <\fIother file1 fields\fR>, + <\fIother file2 fields\fR> +.fi +.P +.RE +.P +If the join field is not the first field in a file, the +<\fIother\ file\ fields\fP> for that file shall be: +.sp +.RS 4 +.nf + +<\fIfields preceding join field\fR>, <\fIfields following join field\fR> +.fi +.P +.RE +.P +When the +.BR \-o +option is specified, the output format shall be: +.sp +.RS 4 +.nf + +"%s\en", <\fIconcatenation of fields\fR> +.fi +.P +.RE +.P +where the concatenation of fields is described by the +.BR \-o +option, above. +.P +For either format, each field (except the last) shall be written with +its trailing separator character. If the separator is the default (\c +<blank> +characters), a single +<space> +shall be written after each field (except the last). +.SH STDERR +The standard error shall be used only for diagnostic messages. +.SH "OUTPUT FILES" +None. +.SH "EXTENDED DESCRIPTION" +None. +.SH "EXIT STATUS" +The following exit values shall be returned: +.IP "\00" 6 +All input files were output successfully. +.IP >0 6 +An error occurred. +.SH "CONSEQUENCES OF ERRORS" +Default. 
+.LP +.IR "The following sections are informative." +.SH "APPLICATION USAGE" +Pathnames consisting of numeric digits or of the form +.IR string.string +should not be specified directly following the +.BR \-o +list. +.P +If the collating sequence of the current locale does not have a total +ordering of all characters (see the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 7.3.2" ", " "LC_COLLATE"), +.IR join +treats fields that collate equally but are not identical as being the +same. If this behavior is not desired, it can be avoided by forcing +the use of the POSIX locale (although this means re-sorting the input +files into the POSIX locale collating sequence.) +.P +When using +.IR join +to process pathnames, it is recommended that LC_ALL, or at least +LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment, +since pathnames can contain byte sequences that do not form valid +characters in some locales, in which case the utility's behavior would +be undefined. In the POSIX locale each byte is a valid single-byte +character, and therefore this problem is avoided. +.SH EXAMPLES +The +.BR \-o +0 field essentially selects the union of the join fields. 
For example, +given file +.BR phone : +.sp +.RS 4 +.nf + +!Name Phone Number +Don +1 123-456-7890 +Hal +1 234-567-8901 +Yasushi +2 345-678-9012 +.fi +.P +.RE +.P +and file +.BR fax : +.sp +.RS 4 +.nf + +!Name Fax Number +Don +1 123-456-7899 +Keith +1 456-789-0122 +Yasushi +2 345-678-9011 +.fi +.P +.RE +.P +(where the large expanses of white space are meant to each represent a +single +<tab>), +the command: +.sp +.RS 4 +.nf + +join -t "<tab>" -a 1 -a 2 -e \(aq(unknown)\(aq -o 0,1.2,2.2 phone fax +.fi +.P +.RE +.P +(where +.IR <tab> +is a literal +<tab> +character) would produce: +.sp +.RS 4 +.nf + +!Name Phone Number Fax Number +Don +1 123-456-7890 +1 123-456-7899 +Hal +1 234-567-8901 (unknown) +Keith (unknown) +1 456-789-0122 +Yasushi +2 345-678-9012 +2 345-678-9011 +.fi +.P +.RE +.P +Multiple instances of the same key will produce combinatorial results. +The following: +.sp +.RS 4 +.nf + +fa: + a x + a y + a z +fb: + a p +.fi +.P +.RE +.P +will produce: +.sp +.RS 4 +.nf + +a x p +a y p +a z p +.fi +.P +.RE +.P +And the following: +.sp +.RS 4 +.nf + +fa: + a b c + a d e +fb: + a w x + a y z + a o p +.fi +.P +.RE +.P +will produce: +.sp +.RS 4 +.nf + +a b c w x +a b c y z +a b c o p +a d e w x +a d e y z +a d e o p +.fi +.P +.RE +.SH RATIONALE +The +.BR \-e +option is only effective when used with +.BR \-o +because, unless specific fields are identified using +.BR \-o , +.IR join +is not aware of what fields might be empty. The exception to this is +the join field, but identifying an empty join field with the +.BR \-e +string is not historical practice and some scripts might break if this +were changed. +.P +The 0 field in the +.BR \-o +list was adopted from the Tenth Edition version of +.IR join +to satisfy international objections that the +.IR join +in the base documents for IEEE\ Std 1003.2\(hy1992 did not support the ``full join'' +or ``outer join'' described in relational database literature. 
+Although it has been possible to include a join field in the +output (by default, or by field number using +.BR \-o ), +the join field could not be included for an unpaired line selected by +.BR \-a . +The +.BR \-o +0 field essentially selects the union of the join fields. +.P +This sort of outer join was not possible with the +.IR join +commands in the base documents for IEEE\ Std 1003.2\(hy1992. The +.BR \-o +0 field was chosen because it is an upwards-compatible change for +applications. An alternative was considered: have the join field +represent the union of the fields in the files (where they are +identical for matched lines, and one or both are null for unmatched +lines). This was not adopted because it would break some historical +applications. +.P +The ability to specify +.IR file2 +as +.BR \- +is not historical practice; it was added for completeness. +.P +The +.BR \-v +option is not historical practice, but was considered necessary because +it permitted the writing of +.IR only +those lines that do not match on the join field, as opposed to the +.BR \-a +option, which prints both lines that do and do not match. This +additional facility is parallel with the +.BR \-v +option of +.IR grep . +.P +Some historical implementations have been encountered where a blank +line in one of the input files was considered to be the end of the +file; the description in this volume of POSIX.1\(hy2017 does not cite this as an allowable case. +.P +Earlier versions of this standard allowed +.BR \-j , +.BR \-j1 , +.BR \-j2 +options, and a form of the +.BR \-o +option that allowed the +.IR list +option-argument to be multiple arguments. These forms are no longer +specified by POSIX.1\(hy2008 but may be present in some implementations. +.SH "FUTURE DIRECTIONS" +None. 
+.SH "SEE ALSO" +.IR "\fIawk\fR\^", +.IR "\fIcomm\fR\^", +.IR "\fIsort\fR\^", +.IR "\fIuniq\fR\^" +.P +The Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 7.3.2" ", " "LC_COLLATE", +.IR "Chapter 8" ", " "Environment Variables", +.IR "Section 12.2" ", " "Utility Syntax Guidelines" +.\" +.SH COPYRIGHT +Portions of this text are reprinted and reproduced in electronic form +from IEEE Std 1003.1-2017, Standard for Information Technology +-- Portable Operating System Interface (POSIX), The Open Group Base +Specifications Issue 7, 2018 Edition, +Copyright (C) 2018 by the Institute of +Electrical and Electronics Engineers, Inc and The Open Group. +In the event of any discrepancy between this version and the original IEEE and +The Open Group Standard, the original IEEE and The Open Group Standard +is the referee document. The original Standard can be obtained online at +http://www.opengroup.org/unix/online.html . +.PP +Any typographical or formatting errors that appear +in this page are most likely +to have been introduced during the conversion of the source files to +man page format. To report such errors, see +https://www.kernel.org/doc/man-pages/reporting_bugs.html .
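The EXAMPLES section of the added page can be reproduced with a short script. The following is a minimal sketch, not part of the patch: the scratch directory name is hypothetical, `LC_ALL=C` is set on the assumption that the sample data should be compared in the POSIX collating sequence (the `!Name` header then sorts before the names), and the `<tab>` separator is generated with `printf` rather than typed literally. The second command is an extra illustration of `-v 1`, which the page describes but does not demonstrate.

```shell
#!/bin/sh
# Sketch under stated assumptions: paths and variable names are illustrative only.
tab=$(printf '\t')
workdir=${TMPDIR:-/tmp}/join-demo.$$
mkdir "$workdir" || exit 1

# Sample data from the EXAMPLES section, with fields separated by a single <tab>.
printf '!Name\tPhone Number\nDon\t+1 123-456-7890\nHal\t+1 234-567-8901\nYasushi\t+2 345-678-9012\n' > "$workdir/phone"
printf '!Name\tFax Number\nDon\t+1 123-456-7899\nKeith\t+1 456-789-0122\nYasushi\t+2 345-678-9011\n' > "$workdir/fax"

# Full outer join: -a 1 -a 2 add unpairable lines from both files,
# -o 0,1.2,2.2 selects the join field plus field 2 of each file,
# and -e fills the empty output fields.
outer=$(LC_ALL=C join -t "$tab" -a 1 -a 2 -e '(unknown)' -o 0,1.2,2.2 \
    "$workdir/phone" "$workdir/fax")
printf '%s\n' "$outer"

# Only the lines of file 1 that have no match in file 2.
only_phone=$(LC_ALL=C join -t "$tab" -v 1 "$workdir/phone" "$workdir/fax")
printf '%s\n' "$only_phone"

rm -r "$workdir"
```

The first command should print the five-line table shown in the EXAMPLES section; the second should print only the `Hal` line from `phone`, since `Hal` has no entry in `fax`.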