'\" et
.TH COMM "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
.\"
.SH PROLOG
This manual page is part of the POSIX Programmer's Manual.
The Linux implementation of this interface may differ (consult
the corresponding Linux manual page for details of Linux behavior),
or the interface may not be implemented on Linux.
.\"
.SH NAME
comm
\(em select or reject lines common to two files
.SH SYNOPSIS
.LP
.nf
comm \fB[\fR-123\fB] \fIfile1 file2\fR
.fi
.SH DESCRIPTION
The
.IR comm
utility shall read
.IR file1
and
.IR file2 ,
which should be ordered in the current collating sequence, and produce
three text columns as output: lines only in
.IR file1 ,
lines only in
.IR file2 ,
and lines in both files.
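.P
As a minimal sketch of the three-column output (the scratch directory and
the file names
.BR a.txt
and
.BR b.txt
are hypothetical, and the POSIX locale is forced so that the expected
ordering is unambiguous):
.sp
.RS 4
.nf

```shell
# Work in a scratch directory; every name below is hypothetical.
tmp=${TMPDIR:-/tmp}/comm_columns.$$
mkdir "$tmp" && cd "$tmp" || exit 1

# Two small inputs, already sorted in the POSIX locale.
cat > a.txt <<EOF
apple
banana
cherry
EOF
cat > b.txt <<EOF
banana
cherry
date
EOF

# Column 1: only in a.txt; column 2: only in b.txt; column 3: in both.
out=$(LC_ALL=POSIX comm a.txt b.txt)
echo "$out"
```
.fi
.P
.RE
.P
With these inputs, apple lands in the first column, date in the second
(preceded by one <tab>), and banana and cherry in the third (each
preceded by two <tab>s).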
.P
If the lines in either file are not ordered according to the collating
sequence of the current locale, the results are unspecified.
.P
If the collating sequence of the current locale does not have a total
ordering of all characters (see the Base Definitions volume of POSIX.1\(hy2017,
.IR "Section 7.3.2" ", " "LC_COLLATE")
and any lines from the input files collate equally but are not identical,
.IR comm
should treat them as different lines but may treat them as being the
same. If it treats them as different,
.IR comm
should expect them to be ordered according to a further byte-by-byte
comparison using the collating sequence for the POSIX locale and if
they are not ordered in this way, the output of
.IR comm
can identify such lines as being both unique to
.IR file1
and unique to
.IR file2
instead of being in both files.
.SH OPTIONS
The
.IR comm
utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
.IR "Section 12.2" ", " "Utility Syntax Guidelines".
.P
The following options shall be supported:
.IP "\fB\-1\fP" 10
Suppress the output column of lines unique to
.IR file1 .
.IP "\fB\-2\fP" 10
Suppress the output column of lines unique to
.IR file2 .
.IP "\fB\-3\fP" 10
Suppress the output column of lines duplicated in
.IR file1
and
.IR file2 .
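.P
The options can be combined to select a single column. The following
sketch assumes two hypothetical files,
.BR old.txt
and
.BR new.txt ,
that are already sorted in the POSIX locale; the variable names are
illustrative only:
.sp
.RS 4
.nf

```shell
# Scratch directory and file names are hypothetical.
tmp=${TMPDIR:-/tmp}/comm_options.$$
mkdir "$tmp" && cd "$tmp" || exit 1

cat > old.txt <<EOF
apple
banana
cherry
EOF
cat > new.txt <<EOF
banana
cherry
date
EOF

# -12 leaves only column 3: lines present in both files.
in_both=$(LC_ALL=POSIX comm -12 old.txt new.txt)
# -23 leaves only column 1: lines unique to old.txt.
only_old=$(LC_ALL=POSIX comm -23 old.txt new.txt)
# -13 leaves only column 2: lines unique to new.txt.
only_new=$(LC_ALL=POSIX comm -13 old.txt new.txt)

echo "both: $in_both"
echo "only old: $only_old"
echo "only new: $only_new"
```
.fi
.P
.RE
.P
Because the other two columns are suppressed in each call, the
surviving column is written without any leading <tab>.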
.SH OPERANDS
The following operands shall be supported:
.IP "\fIfile1\fR" 10
A pathname of the first file to be compared. If
.IR file1
is
.BR '\-' ,
the standard input shall be used.
.IP "\fIfile2\fR" 10
A pathname of the second file to be compared. If
.IR file2
is
.BR '\-' ,
the standard input shall be used.
.P
If both
.IR file1
and
.IR file2
refer to standard input or to the same FIFO special, block special, or
character special file, the results are undefined.
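.P
One common use of the
.BR '\-'
operand is to sort an input on the fly instead of writing a temporary
file. The sketch below assumes hypothetical files
.BR unsorted.txt
and
.BR sorted.txt ;
only one operand refers to standard input, as required above:
.sp
.RS 4
.nf

```shell
# Scratch directory and file names are hypothetical.
tmp=${TMPDIR:-/tmp}/comm_stdin.$$
mkdir "$tmp" && cd "$tmp" || exit 1

cat > unsorted.txt <<EOF
pear
apple
fig
EOF
cat > sorted.txt <<EOF
apple
grape
pear
EOF

# sort writes to the pipe; comm reads it as file1 through the "-" operand.
# Both utilities run in the same (POSIX) locale so the ordering agrees.
common=$(LC_ALL=POSIX sort unsorted.txt | LC_ALL=POSIX comm -12 - sorted.txt)
echo "$common"
```
.fi
.P
.RE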
.SH STDIN
The standard input shall be used only if one of the
.IR file1
or
.IR file2
operands refers to standard input. See the INPUT FILES section.
.SH "INPUT FILES"
The input files shall be text files.
.SH "ENVIRONMENT VARIABLES"
The following environment variables shall affect the execution of
.IR comm :
.IP "\fILANG\fP" 10
Provide a default value for the internationalization variables that are
unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
.IR "Section 8.2" ", " "Internationalization Variables"
for the precedence of internationalization variables used to determine
the values of locale categories.)
.IP "\fILC_ALL\fP" 10
If set to a non-empty string value, override the values of all the
other internationalization variables.
.IP "\fILC_COLLATE\fP" 10
.br
Determine the locale for the collating sequence
.IR comm
expects to have been used when the input files were sorted.
.IP "\fILC_CTYPE\fP" 10
Determine the locale for the interpretation of sequences of bytes of
text data as characters (for example, single-byte as opposed to
multi-byte characters in arguments and input files).
.IP "\fILC_MESSAGES\fP" 10
.br
Determine the locale that should be used to affect the format and
contents of diagnostic messages written to standard error.
.IP "\fINLSPATH\fP" 10
Determine the location of message catalogs for the processing of
.IR LC_MESSAGES .
.SH "ASYNCHRONOUS EVENTS"
Default.
.SH STDOUT
The
.IR comm
utility shall produce output depending on the options selected. If the
.BR \-1 ,
.BR \-2 ,
and
.BR \-3
options are all selected,
.IR comm
shall write nothing to standard output.
.P
If the
.BR \-1
option is not selected, lines contained only in
.IR file1
shall be written using the format:
.sp
.RS 4
.nf

"%s\en", <\fIline in file1\fR>
.fi
.P
.RE
.P
If the
.BR \-2
option is not selected, lines contained only in
.IR file2
shall be written using the format:
.sp
.RS 4
.nf

"%s%s\en", <\fIlead\fR>, <\fIline in file2\fR>
.fi
.P
.RE
.P
where the string <\fIlead\fP> is as follows:
.IP <tab> 10
The
.BR \-1
option is not selected.
.IP "null\ string" 10
The
.BR \-1
option is selected.
.P
If the
.BR \-3
option is not selected, lines contained in both files shall be written
using the format:
.sp
.RS 4
.nf

"%s%s\en", <\fIlead\fR>, <\fIline in both\fR>
.fi
.P
.RE
.P
where the string <\fIlead\fP> is as follows:
.IP <tab><tab> 10
Neither the
.BR \-1
nor the
.BR \-2
option is selected.
.IP <tab> 10
Exactly one of the
.BR \-1
and
.BR \-2
options is selected.
.IP "null\ string" 10
Both the
.BR \-1
and
.BR \-2
options are selected.
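.P
The effect of the <\fIlead\fP> rules can be observed by making the
<tab> characters visible. In the sketch below (file names are
hypothetical),
.BR \-1
is selected, so lines unique to the second file get a null lead and
lines in both files get a single <tab>; the
.IR od
utility is used only to display the bytes:
.sp
.RS 4
.nf

```shell
# Scratch directory and file names are hypothetical.
tmp=${TMPDIR:-/tmp}/comm_lead.$$
mkdir "$tmp" && cd "$tmp" || exit 1

cat > a.txt <<EOF
apple
banana
cherry
EOF
cat > b.txt <<EOF
banana
cherry
date
EOF

# With -1 selected, column 1 (apple) is suppressed entirely.
out=$(LC_ALL=POSIX comm -1 a.txt b.txt)

# od -c prints each <tab> in a visible escaped form.
echo "$out" | od -c
```
.fi
.P
.RE
.P
Under these assumptions, banana and cherry should each be preceded by
exactly one <tab>, and date by none.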
.P
If the input files were ordered according to the collating sequence of
the current locale, the lines written shall be in the collating
sequence of the current locale. If the input files contained any
lines that collated equally but were not identical and within each
file those lines were ordered according to a further byte-by-byte
comparison using the collating sequence for the POSIX locale, and
.IR comm
treated them as different lines, then lines written that collate
equally but are not identical should be ordered according to a further
byte-by-byte comparison using the collating sequence for the POSIX
locale.
.SH STDERR
The standard error shall be used only for diagnostic messages.
.SH "OUTPUT FILES"
None.
.SH "EXTENDED DESCRIPTION"
None.
.SH "EXIT STATUS"
The following exit values shall be returned:
.IP "\00" 6
All input files were successfully output as specified.
.IP >0 6
An error occurred.
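.P
A script can test the exit status in the usual way. The following is a
sketch only: the file names, including the deliberately nonexistent
.BR no_such_file ,
and the variable names are hypothetical:
.sp
.RS 4
.nf

```shell
# Scratch directory and file names are hypothetical.
tmp=${TMPDIR:-/tmp}/comm_status.$$
mkdir "$tmp" && cd "$tmp" || exit 1
echo apple > a.txt
echo apple > b.txt

# A successful comparison returns 0.
if LC_ALL=POSIX comm -12 a.txt b.txt > common.txt; then
    ok_status=success
else
    ok_status=failure
fi

# An operand that cannot be opened is an error, so the status is >0.
bad_status=0
LC_ALL=POSIX comm a.txt no_such_file > /dev/null 2>&1 || bad_status=$?

echo "$ok_status $bad_status"
```
.fi
.P
.RE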
.SH "CONSEQUENCES OF ERRORS"
Default.
.LP
.IR "The following sections are informative."
.SH "APPLICATION USAGE"
If the input files are not properly presorted, the output of
.IR comm
might not be useful.
.P
When using
.IR comm
to process pathnames, it is recommended that LC_ALL, or at least
LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment,
since pathnames can contain byte sequences that do not form valid
characters in some locales, in which case the utility's behavior would
be undefined. In the POSIX locale each byte is a valid single-byte
character, and therefore this problem is avoided.
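.P
As a sketch of this recommendation, the following compares the
pathnames found under two hypothetical directories,
.BR dir1
and
.BR dir2 ,
with the C locale forced for both
.IR sort
and
.IR comm .
It assumes that no pathname contains a <newline>, since
.IR comm
operates on lines:
.sp
.RS 4
.nf

```shell
# Scratch directory, directory names, and file names are hypothetical.
tmp=${TMPDIR:-/tmp}/comm_paths.$$
mkdir "$tmp" && cd "$tmp" || exit 1
mkdir dir1 dir2
: > dir1/a
: > dir1/b
: > dir2/b
: > dir2/c

# List each tree relative to its own root, sorted byte-by-byte.
(cd dir1 && find . -type f | LC_ALL=C sort) > list1
(cd dir2 && find . -type f | LC_ALL=C sort) > list2

# Pathnames present under dir1 but absent under dir2.
missing=$(LC_ALL=C comm -23 list1 list2)
echo "$missing"
```
.fi
.P
.RE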
.P
If the collating sequence of the current locale does not have a total
ordering of all characters, this can affect the behavior of
.IR comm
in the following ways:
.IP " *" 4
If
.IR comm
treats lines as being the same only if they are identical, some lines
can be misleadingly identified as being both unique to
.IR file1
and unique to
.IR file2 .
.IP " *" 4
If
.IR comm
treats lines as being the same if they collate equally and a line from
.IR file1
collates equally with a line from
.IR file2
but is not identical to it, one of the lines is misleadingly
identified as being in both files and the other is not written to the
output at all.
.P
Such problems can be avoided by forcing the use of the POSIX locale;
for example, the following identifies lines in both
.IR file1
and
.IR file2 :
.sp
.RS 4
.nf

LC_ALL=POSIX sort file1 > file1.posix
LC_ALL=POSIX sort file2 > file2.posix
LC_ALL=POSIX comm -12 file1.posix file2.posix | sort
.fi
.P
.RE
.P
The final
.IR sort
re-sorts the output of
.IR comm
according to the collating sequence of the original locale. Doing
this might be difficult if more than one column is output and leading
<blank>s
cannot be ignored.
.SH EXAMPLES
If a file named
.BR xcu
contains a sorted list of the utilities in this volume of POSIX.1\(hy2017, a file named
.BR xpg3
contains a sorted list of the utilities specified in the X/Open
Portability Guide, Issue 3, and a file named
.BR svid89
contains a sorted list of the utilities in the System V Interface
Definition, Third Edition:
.sp
.RS 4
.nf

comm -23 xcu xpg3 | comm -23 - svid89
.fi
.P
.RE
.P
would print a list of utilities in this volume of POSIX.1\(hy2017 not specified by either of the
other documents:
.sp
.RS 4
.nf

comm -12 xcu xpg3 | comm -12 - svid89
.fi
.P
.RE
.P
would print a list of utilities specified by all three documents, and:
.sp
.RS 4
.nf

comm -12 xpg3 svid89 | comm -23 - xcu
.fi
.P
.RE
.P
would print a list of utilities specified by both XPG3 and the SVID,
but not specified in this volume of POSIX.1\(hy2017.
.SH RATIONALE
None.
.SH "FUTURE DIRECTIONS"
A future version of this standard may require that if any lines from
the input files collate equally but are not identical, then
.IR comm
treats them as different lines and expects them to be ordered
according to a further byte-by-byte comparison using the collating
sequence for the POSIX locale.
.P
A future version of this standard may require that if the input files
contained any lines that collated equally but were not identical and
within each file those lines were ordered according to a further
byte-by-byte comparison using the collating sequence for the POSIX
locale, then lines written that collate equally but are not identical
are ordered according to a further byte-by-byte comparison using the
collating sequence for the POSIX locale.
.SH "SEE ALSO"
.IR "\fIcmp\fR\^",
.IR "\fIdiff\fR\^",
.IR "\fIsort\fR\^",
.IR "\fIuniq\fR\^"
.P
The Base Definitions volume of POSIX.1\(hy2017,
.IR "Section 7.3.2" ", " "LC_COLLATE",
.IR "Chapter 8" ", " "Environment Variables",
.IR "Section 12.2" ", " "Utility Syntax Guidelines"
.\"
.SH COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1-2017, Standard for Information Technology
-- Portable Operating System Interface (POSIX), The Open Group Base
Specifications Issue 7, 2018 Edition,
Copyright (C) 2018 by the Institute of
Electrical and Electronics Engineers, Inc and The Open Group.
In the event of any discrepancy between this version and the original IEEE and
The Open Group Standard, the original IEEE and The Open Group Standard
is the referee document. The original Standard can be obtained online at
http://www.opengroup.org/unix/online.html .
.PP
Any typographical or formatting errors that appear
in this page are most likely
to have been introduced during the conversion of the source files to
man page format. To report such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .