Diffstat (limited to 'man-pages-posix-2017/man1p/join.1p')
-rw-r--r-- man-pages-posix-2017/man1p/join.1p | 532
1 files changed, 532 insertions, 0 deletions
diff --git a/man-pages-posix-2017/man1p/join.1p b/man-pages-posix-2017/man1p/join.1p
new file mode 100644
index 0000000..fd2fb0f
--- /dev/null
+++ b/man-pages-posix-2017/man1p/join.1p
@@ -0,0 +1,532 @@
+'\" et
+.TH JOIN "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
+.\"
+.SH PROLOG
+This manual page is part of the POSIX Programmer's Manual.
+The Linux implementation of this interface may differ (consult
+the corresponding Linux manual page for details of Linux behavior),
+or the interface may not be implemented on Linux.
+.\"
+.SH NAME
+join
+\(em relational database operator
+.SH SYNOPSIS
+.LP
+.nf
+join \fB[\fR-a \fIfile_number\fR|-v \fIfile_number\fB] [\fR-e \fIstring\fB] [\fR-o \fIlist\fB] [\fR-t \fIchar\fB]
+ [\fR-1 \fIfield\fB] [\fR-2 \fIfield\fB]\fI file1 file2\fR
+.fi
+.SH DESCRIPTION
+The
+.IR join
+utility shall perform an equality join on the files
+.IR file1
+and
+.IR file2 .
+The joined files shall be written to the standard output.
+.P
+The join field is a field in each file on which the files are
+compared. The
+.IR join
+utility shall write one line in the output for each pair of lines in
+.IR file1
+and
+.IR file2
+that have join fields that collate equally. The output line by default
+shall consist of the join field, then the remaining fields from
+.IR file1 ,
+then the remaining fields from
+.IR file2 .
+This format can be changed by using the
+.BR \-o
+option (see below). The
+.BR \-a
+option can be used to add unmatched lines to the output. The
+.BR \-v
+option can be used to output only unmatched lines.
+.P
+The files
+.IR file1
+and
+.IR file2
+shall be ordered in the collating sequence of
+.IR sort
+.BR \-b
+on the fields on which they shall be joined, by default the first in
+each line. All selected output shall be written in the same collating
+sequence.
+.P
+The default input field separators shall be
+<blank>
+characters. In this case, multiple separators shall count as one field
+separator, and leading separators shall be ignored. The default output
+field separator shall be a
+<space>.
+.P
+The field separator and collating sequence can be changed by using the
+.BR \-t
+option (see below).
+.P
+If the same key appears more than once in either file, all combinations
+of the set of remaining fields in
+.IR file1
+and the set of remaining fields in
+.IR file2
+are output in the order of the lines encountered.
+.P
+If the input files are not in the appropriate collating sequence, the
+results are unspecified.
+.SH OPTIONS
+The
+.IR join
+utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 12.2" ", " "Utility Syntax Guidelines".
+.P
+The following options shall be supported:
+.IP "\fB\-a\ \fIfile_number\fR" 10
+.br
+Produce a line for each unpairable line in file
+.IR file_number ,
+where
+.IR file_number
+is 1 or 2, in addition to the default output. If both
+.BR \-a "\ 1"
+and
+.BR \-a "\ 2"
+are specified, all unpairable lines shall be output.
+.IP "\fB\-e\ \fIstring\fR" 10
+Replace empty output fields in the list selected by
+.BR \-o
+with the string
+.IR string .
+.IP "\fB\-o\ \fIlist\fR" 10
+Construct the output line to comprise the fields specified in
+.IR list ,
+each element of which shall have one of the following two forms:
+.RS 10
+.IP " 1." 4
+\fIfile_number.field\fR, where
+.IR file_number
+is a file number and
+.IR field
+is a decimal integer field number
+.IP " 2." 4
+0 (zero), representing the join field
+.P
+The elements of
+.IR list
+shall be either
+<comma>-separated
+or
+<blank>-separated,
+as specified in Guideline 8 of the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 12.2" ", " "Utility Syntax Guidelines".
+The fields specified by
+.IR list
+shall be written for all selected output lines. Fields selected by
+.IR list
+that do not appear in the input shall be treated as empty output
+fields. (See the
+.BR \-e
+option.) Only specifically requested fields shall be written. The
+application shall ensure that
+.IR list
+is a single command line argument.
+.RE
+.IP "\fB\-t\ \fIchar\fR" 10
+Use character
+.IR char
+as a separator, for both input and output. Every appearance of
+.IR char
+in a line shall be significant. When this option is specified, the
+collating sequence shall be the same as
+.IR sort
+without the
+.BR \-b
+option.
+.IP "\fB\-v\ \fIfile_number\fR" 10
+.br
+Instead of the default output, produce a line only for each unpairable
+line in
+.IR file_number ,
+where
+.IR file_number
+is 1 or 2. If both
+.BR \-v "\ 1"
+and
+.BR \-v "\ 2"
+are specified, all unpairable lines shall be output.
+.IP "\fB\-1\ \fIfield\fR" 10
+Join on the
+.IR field th
+field of file 1. Fields are decimal integers starting with 1.
+.IP "\fB\-2\ \fIfield\fR" 10
+Join on the
+.IR field th
+field of file 2. Fields are decimal integers starting with 1.
+.SH OPERANDS
+The following operands shall be supported:
+.IP "\fIfile1\fR,\ \fIfile2\fR" 10
+A pathname of a file to be joined. If either of the
+.IR file1
+or
+.IR file2
+operands is
+.BR '\-' ,
+the standard input shall be used in its place.
+.SH STDIN
+The standard input shall be used only if the
+.IR file1
+or
+.IR file2
+operand is
+.BR '\-' .
+See the INPUT FILES section.
+.SH "INPUT FILES"
+The input files shall be text files.
+.SH "ENVIRONMENT VARIABLES"
+The following environment variables shall affect the execution of
+.IR join :
+.IP "\fILANG\fP" 10
+Provide a default value for the internationalization variables that are
+unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 8.2" ", " "Internationalization Variables"
+for the precedence of internationalization variables used to determine
+the values of locale categories.)
+.IP "\fILC_ALL\fP" 10
+If set to a non-empty string value, override the values of all the
+other internationalization variables.
+.IP "\fILC_COLLATE\fP" 10
+.br
+Determine the locale of the collating sequence
+.IR join
+expects to have been used when the input files were sorted.
+.IP "\fILC_CTYPE\fP" 10
+Determine the locale for the interpretation of sequences of bytes of
+text data as characters (for example, single-byte as opposed to
+multi-byte characters in arguments and input files).
+.IP "\fILC_MESSAGES\fP" 10
+.br
+Determine the locale that should be used to affect the format and
+contents of diagnostic messages written to standard error.
+.IP "\fINLSPATH\fP" 10
+Determine the location of message catalogs for the processing of
+.IR LC_MESSAGES .
+.SH "ASYNCHRONOUS EVENTS"
+Default.
+.SH STDOUT
+The
+.IR join
+utility output shall be a concatenation of selected character fields.
+When the
+.BR \-o
+option is not specified, the output shall be:
+.sp
+.RS 4
+.nf
+
+"%s%s%s\en", <\fIjoin field\fR>, <\fIother file1 fields\fR>,
+ <\fIother file2 fields\fR>
+.fi
+.P
+.RE
+.P
+If the join field is not the first field in a file, the
+<\fIother\ file\ fields\fP> for that file shall be:
+.sp
+.RS 4
+.nf
+
+<\fIfields preceding join field\fR>, <\fIfields following join field\fR>
+.fi
+.P
+.RE
+.P
+When the
+.BR \-o
+option is specified, the output format shall be:
+.sp
+.RS 4
+.nf
+
+"%s\en", <\fIconcatenation of fields\fR>
+.fi
+.P
+.RE
+.P
+where the concatenation of fields is described by the
+.BR \-o
+option, above.
+.P
+For either format, each field (except the last) shall be written with
+its trailing separator character. If the separator is the default (\c
+<blank>
+characters), a single
+<space>
+shall be written after each field (except the last).
+.SH STDERR
+The standard error shall be used only for diagnostic messages.
+.SH "OUTPUT FILES"
+None.
+.SH "EXTENDED DESCRIPTION"
+None.
+.SH "EXIT STATUS"
+The following exit values shall be returned:
+.IP "\00" 6
+All input files were output successfully.
+.IP >0 6
+An error occurred.
+.SH "CONSEQUENCES OF ERRORS"
+Default.
+.LP
+.IR "The following sections are informative."
+.SH "APPLICATION USAGE"
+Pathnames consisting of numeric digits or of the form
+.IR string.string
+should not be specified directly following the
+.BR \-o
+list.
+.P
+If the collating sequence of the current locale does not have a total
+ordering of all characters (see the Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 7.3.2" ", " "LC_COLLATE"),
+.IR join
+treats fields that collate equally but are not identical as being the
+same. If this behavior is not desired, it can be avoided by forcing
+the use of the POSIX locale (although this means re-sorting the input
+files into the POSIX locale collating sequence).
+.P
+When using
+.IR join
+to process pathnames, it is recommended that LC_ALL, or at least
+LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment,
+since pathnames can contain byte sequences that do not form valid
+characters in some locales, in which case the utility's behavior would
+be undefined. In the POSIX locale each byte is a valid single-byte
+character, and therefore this problem is avoided.
+.SH EXAMPLES
+The
+.BR \-o
+0 field essentially selects the union of the join fields. For example,
+given file
+.BR phone :
+.sp
+.RS 4
+.nf
+
+!Name           Phone Number
+Don             +1 123-456-7890
+Hal             +1 234-567-8901
+Yasushi         +2 345-678-9012
+.fi
+.P
+.RE
+.P
+and file
+.BR fax :
+.sp
+.RS 4
+.nf
+
+!Name           Fax Number
+Don             +1 123-456-7899
+Keith           +1 456-789-0122
+Yasushi         +2 345-678-9011
+.fi
+.P
+.RE
+.P
+(where the large expanses of white space are meant to each represent a
+single
+<tab>),
+the command:
+.sp
+.RS 4
+.nf
+
+join -t "<tab>" -a 1 -a 2 -e \(aq(unknown)\(aq -o 0,1.2,2.2 phone fax
+.fi
+.P
+.RE
+.P
+(where
+.IR <tab>
+is a literal
+<tab>
+character) would produce:
+.sp
+.RS 4
+.nf
+
+!Name           Phone Number            Fax Number
+Don             +1 123-456-7890         +1 123-456-7899
+Hal             +1 234-567-8901         (unknown)
+Keith           (unknown)               +1 456-789-0122
+Yasushi         +2 345-678-9012         +2 345-678-9011
+.fi
+.P
+.RE
+.P
+Multiple instances of the same key will produce combinatorial results.
+The following:
+.sp
+.RS 4
+.nf
+
+fa:
+ a x
+ a y
+ a z
+fb:
+ a p
+.fi
+.P
+.RE
+.P
+will produce:
+.sp
+.RS 4
+.nf
+
+a x p
+a y p
+a z p
+.fi
+.P
+.RE
+.P
+And the following:
+.sp
+.RS 4
+.nf
+
+fa:
+ a b c
+ a d e
+fb:
+ a w x
+ a y z
+ a o p
+.fi
+.P
+.RE
+.P
+will produce:
+.sp
+.RS 4
+.nf
+
+a b c w x
+a b c y z
+a b c o p
+a d e w x
+a d e y z
+a d e o p
+.fi
+.P
+.RE
+.SH RATIONALE
+The
+.BR \-e
+option is only effective when used with
+.BR \-o
+because, unless specific fields are identified using
+.BR \-o ,
+.IR join
+is not aware of what fields might be empty. The exception to this is
+the join field, but identifying an empty join field with the
+.BR \-e
+string is not historical practice and some scripts might break if this
+were changed.
+.P
+The 0 field in the
+.BR \-o
+list was adopted from the Tenth Edition version of
+.IR join
+to satisfy international objections that the
+.IR join
+in the base documents for IEEE\ Std 1003.2\(hy1992 did not support the ``full join''
+or ``outer join'' described in relational database literature.
+Although it has been possible to include a join field in the
+output (by default, or by field number using
+.BR \-o ),
+the join field could not be included for an unpaired line selected by
+.BR \-a .
+The
+.BR \-o
+0 field essentially selects the union of the join fields.
+.P
+This sort of outer join was not possible with the
+.IR join
+commands in the base documents for IEEE\ Std 1003.2\(hy1992. The
+.BR \-o
+0 field was chosen because it is an upwards-compatible change for
+applications. An alternative was considered: have the join field
+represent the union of the fields in the files (where they are
+identical for matched lines, and one or both are null for unmatched
+lines). This was not adopted because it would break some historical
+applications.
+.P
+The ability to specify
+.IR file2
+as
+.BR \-
+is not historical practice; it was added for completeness.
+.P
+The
+.BR \-v
+option is not historical practice, but was considered necessary because
+it permitted the writing of
+.IR only
+those lines that do not match on the join field, as opposed to the
+.BR \-a
+option, which prints both lines that do and do not match. This
+additional facility is parallel with the
+.BR \-v
+option of
+.IR grep .
+.P
+Some historical implementations have been encountered where a blank
+line in one of the input files was considered to be the end of the
+file; the description in this volume of POSIX.1\(hy2017 does not cite this as an allowable case.
+.P
+Earlier versions of this standard allowed
+.BR \-j ,
+.BR \-j1 ,
+.BR \-j2
+options, and a form of the
+.BR \-o
+option that allowed the
+.IR list
+option-argument to be multiple arguments. These forms are no longer
+specified by POSIX.1\(hy2008 but may be present in some implementations.
+.SH "FUTURE DIRECTIONS"
+None.
+.SH "SEE ALSO"
+.IR "\fIawk\fR\^",
+.IR "\fIcomm\fR\^",
+.IR "\fIsort\fR\^",
+.IR "\fIuniq\fR\^"
+.P
+The Base Definitions volume of POSIX.1\(hy2017,
+.IR "Section 7.3.2" ", " "LC_COLLATE",
+.IR "Chapter 8" ", " "Environment Variables",
+.IR "Section 12.2" ", " "Utility Syntax Guidelines"
+.\"
+.SH COPYRIGHT
+Portions of this text are reprinted and reproduced in electronic form
+from IEEE Std 1003.1-2017, Standard for Information Technology
+-- Portable Operating System Interface (POSIX), The Open Group Base
+Specifications Issue 7, 2018 Edition,
+Copyright (C) 2018 by the Institute of
+Electrical and Electronics Engineers, Inc and The Open Group.
+In the event of any discrepancy between this version and the original IEEE and
+The Open Group Standard, the original IEEE and The Open Group Standard
+is the referee document. The original Standard can be obtained online at
+http://www.opengroup.org/unix/online.html .
+.PP
+Any typographical or formatting errors that appear
+in this page are most likely
+to have been introduced during the conversion of the source files to
+man page format. To report such errors, see
+https://www.kernel.org/doc/man-pages/reporting_bugs.html .
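
Note on the EXAMPLES section of the page added above: the phone/fax outer join can be tried with a short script. This is a minimal sketch, not part of the patch. It assumes a POSIX sh, a join that behaves as this page specifies, and a mktemp that accepts -d (mktemp is not specified by POSIX.1-2017). The names tmp, tab, outer, and only_phone are illustrative. Running under LC_ALL=C keeps the hand-written input in the collating sequence join expects.

```shell
#!/bin/sh
# Sketch only: reproduces the phone/fax example from the EXAMPLES section.
# Assumption: mktemp -d is available (it is not part of POSIX.1-2017).
LC_ALL=C
export LC_ALL

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A literal <tab>, used as the -t separator for both input and output.
tab=$(printf '\t')

# Both inputs are already sorted on field 1 in the C locale.
# printf reuses the format for each pair of arguments, one line per pair.
printf '%s\t%s\n' \
    '!Name' 'Phone Number' \
    'Don' '+1 123-456-7890' \
    'Hal' '+1 234-567-8901' \
    'Yasushi' '+2 345-678-9012' > "$tmp/phone"
printf '%s\t%s\n' \
    '!Name' 'Fax Number' \
    'Don' '+1 123-456-7899' \
    'Keith' '+1 456-789-0122' \
    'Yasushi' '+2 345-678-9011' > "$tmp/fax"

# Full outer join: -a 1 -a 2 keeps unpairable lines from both files,
# -o 0 prints the join field, and -e fills the empty output fields.
outer=$(join -t "$tab" -a 1 -a 2 -e '(unknown)' -o 0,1.2,2.2 \
    "$tmp/phone" "$tmp/fax")
printf '%s\n' "$outer"

# -v 1 prints only the lines of file 1 that have no match in file 2.
only_phone=$(join -t "$tab" -v 1 "$tmp/phone" "$tmp/fax")
printf '%s\n' "$only_phone"
```

The first command should print the five-line table shown in EXAMPLES. The second should print only the Hal line, since Hal is the one name in phone with no entry in fax.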