summaryrefslogtreecommitdiffstats
path: root/man/man3/iconv.3
diff options
context:
space:
mode:
Diffstat (limited to 'man/man3/iconv.3')
-rw-r--r--man/man3/iconv.3233
1 files changed, 233 insertions, 0 deletions
diff --git a/man/man3/iconv.3 b/man/man3/iconv.3
new file mode 100644
index 000000000..12bb9b50b
--- /dev/null
+++ b/man/man3/iconv.3
@@ -0,0 +1,233 @@
+'\" t
+.\" Copyright (c) Bruno Haible <haible@clisp.cons.org>
+.\"
+.\" SPDX-License-Identifier: GPL-2.0-or-later
+.\"
+.\" References consulted:
+.\" GNU glibc-2 source code and manual
+.\" OpenGroup's Single UNIX specification
+.\" http://www.UNIX-systems.org/online.html
+.\"
+.\" 2000-06-30 correction by Yuichi SATO <sato@complex.eng.hokudai.ac.jp>
+.\" 2000-11-15 aeb, fixed prototype
+.\"
+.TH iconv 3 (date) "Linux man-pages (unreleased)"
+.SH NAME
+iconv \- perform character set conversion
+.SH LIBRARY
+Standard C library
+.RI ( libc ", " \-lc )
+.SH SYNOPSIS
+.nf
+.B #include <iconv.h>
+.P
+.BI "size_t iconv(iconv_t " cd ,
+.BI " char **restrict " inbuf ", size_t *restrict " inbytesleft ,
+.BI " char **restrict " outbuf ", size_t *restrict " outbytesleft );
+.fi
+.SH DESCRIPTION
+The
+.BR iconv ()
+function converts a sequence of characters in one character encoding
+to a sequence of characters in another character encoding.
+The
+.I cd
+argument is a conversion descriptor,
+previously created by a call to
+.BR iconv_open (3);
+the conversion descriptor defines the character encodings that
+.BR iconv ()
+uses for the conversion.
+The
+.I inbuf
+argument is the address of a variable that points to
+the first character of the input sequence;
+.I inbytesleft
+indicates the number of bytes in that buffer.
+The
+.I outbuf
+argument is the address of a variable that points to
+the first byte available in the output buffer;
+.I outbytesleft
+indicates the number of bytes available in the output buffer.
+.P
+The main case is when \fIinbuf\fP is not NULL and \fI*inbuf\fP is not NULL.
+In this case, the
+.BR iconv ()
+function converts the multibyte sequence
+starting at \fI*inbuf\fP to a multibyte sequence starting at \fI*outbuf\fP.
+At most \fI*inbytesleft\fP bytes, starting at \fI*inbuf\fP, will be read.
+At most \fI*outbytesleft\fP bytes, starting at \fI*outbuf\fP, will be written.
+.P
+The
+.BR iconv ()
+function converts one multibyte character at a time, and for
+each character conversion it increments \fI*inbuf\fP and decrements
+\fI*inbytesleft\fP by the number of converted input bytes, it increments
+\fI*outbuf\fP and decrements \fI*outbytesleft\fP by the number of converted
+output bytes, and it updates the conversion state contained in \fIcd\fP.
+If the character encoding of the input is stateful, the
+.BR iconv ()
+function can also convert a sequence of input bytes
+to an update to the conversion state without producing any output bytes;
+such input is called a \fIshift sequence\fP.
+The conversion can stop for five reasons:
+.IP \[bu] 3
+An invalid multibyte sequence is encountered in the input.
+In this case,
+it sets \fIerrno\fP to \fBEILSEQ\fP and returns
+.IR (size_t)\ \-1 .
+\fI*inbuf\fP
+is left pointing to the beginning of the invalid multibyte sequence.
+.IP \[bu]
+A multibyte sequence is encountered that is valid but that
+cannot be translated to the character encoding of the output.
+This condition depends on the implementation and on the conversion descriptor.
+In the GNU C library and GNU libiconv, if
+.I cd
+was created without the suffix
+.B //TRANSLIT
+or
+.BR //IGNORE ,
+the conversion is strict:
+lossy conversions produce this condition.
+If the suffix
+.B //TRANSLIT
+was specified,
+transliteration can avoid this condition in some cases.
+In the musl C library,
+this condition cannot occur because a conversion to
+.B \[aq]*\[aq]
+is used as a fallback.
+In the FreeBSD, NetBSD, and Solaris implementations of
+.BR iconv (),
+this condition cannot occur either,
+because a conversion to
+.B \[aq]?\[aq]
+is used as a fallback.
+When this condition is met,
+.BR iconv ()
+sets
+.I errno
+to
+.B EILSEQ
+and returns
+.IR (size_t)\ \-1 .
+.I *inbuf
+is left pointing to the beginning of the unconvertible multibyte sequence.
+.IP \[bu]
+The input byte sequence has been entirely converted,
+that is, \fI*inbytesleft\fP has gone down to 0.
+In this case,
+.BR iconv ()
+returns the number of
+nonreversible conversions performed during this call.
+.IP \[bu]
+An incomplete multibyte sequence is encountered in the input, and the
+input byte sequence terminates after it.
+In this case, it sets \fIerrno\fP to
+\fBEINVAL\fP and returns
+.IR (size_t)\ \-1 .
+\fI*inbuf\fP is left pointing to the
+beginning of the incomplete multibyte sequence.
+.IP \[bu]
+The output buffer has no more room for the next converted character.
+In this case, it sets \fIerrno\fP to \fBE2BIG\fP and returns
+.IR (size_t)\ \-1 .
+.P
+A different case is when \fIinbuf\fP is NULL or \fI*inbuf\fP is NULL, but
+\fIoutbuf\fP is not NULL and \fI*outbuf\fP is not NULL.
+In this case, the
+.BR iconv ()
+function attempts to set \fIcd\fP's conversion state to the
+initial state and store a corresponding shift sequence at \fI*outbuf\fP.
+At most \fI*outbytesleft\fP bytes, starting at \fI*outbuf\fP, will be written.
+If the output buffer has no more room for this reset sequence, it sets
+\fIerrno\fP to \fBE2BIG\fP and returns
+.IR (size_t)\ \-1 .
+Otherwise, it increments
+\fI*outbuf\fP and decrements \fI*outbytesleft\fP by the number of bytes
+written.
+.P
+A third case is when \fIinbuf\fP is NULL or \fI*inbuf\fP is NULL, and
+\fIoutbuf\fP is NULL or \fI*outbuf\fP is NULL.
+In this case, the
+.BR iconv ()
+function sets \fIcd\fP's conversion state to the initial state.
+.SH RETURN VALUE
+The
+.BR iconv ()
+function returns the number of characters converted in a
+nonreversible way during this call; reversible conversions are not counted.
+In case of error,
+.BR iconv ()
+returns
+.I (size_t)\ \-1
+and sets
+.I errno
+to indicate the error.
+.SH ERRORS
+The following errors can occur, among others:
+.TP
+.B E2BIG
+There is not sufficient room at \fI*outbuf\fP.
+.TP
+.B EILSEQ
+An invalid multibyte sequence has been encountered in the input.
+.TP
+.B EINVAL
+An incomplete multibyte sequence has been encountered in the input.
+.SH ATTRIBUTES
+For an explanation of the terms used in this section, see
+.BR attributes (7).
+.TS
+allbox;
+lbx lb lb
+l l l.
+Interface Attribute Value
+T{
+.na
+.nh
+.BR iconv ()
+T} Thread safety MT-Safe race:cd
+.TE
+.P
+The
+.BR iconv ()
+function is MT-Safe, as long as callers arrange for
+mutual exclusion on the
+.I cd
+argument.
+.SH STANDARDS
+POSIX.1-2008.
+.SH HISTORY
+glibc 2.1.
+POSIX.1-2001.
+.SH NOTES
+In each series of calls to
+.BR iconv (),
+the last should be one with \fIinbuf\fP or \fI*inbuf\fP equal to NULL,
+in order to flush out any partially converted input.
+.P
+Although
+.I inbuf
+and
+.I outbuf
+are typed as
+.IR "char\ **" ,
+this does not mean that the objects they point can be interpreted
+as C strings or as arrays of characters:
+the interpretation of character byte sequences is
+handled internally by the conversion functions.
+In some encodings, a zero byte may be a valid part of a multibyte character.
+.P
+The caller of
+.BR iconv ()
+must ensure that the pointers passed to the function are suitable
+for accessing characters in the appropriate character set.
+This includes ensuring correct alignment on platforms that have
+tight restrictions on alignment.
+.SH SEE ALSO
+.BR iconv_close (3),
+.BR iconv_open (3),
+.BR iconvconfig (8)