author Bruno Haible <bruno@clisp.org> 2023-05-21 13:05:29 +0200
committer Alejandro Colomar <alx@kernel.org> 2023-05-25 01:25:08 +0200
commit 470b97d59797b040e28103a0ba0f616d95f0ed93
tree a6731e214fe44b2c17352f684633fd7c49be7c92
parent 4ca216bacc7d185c1af3c384ab53cd1ec74830d1
List a fifth condition when iconv(3) may stop.

The wording regarding transliteration is vague, because this man page is
not the right place for going into the details of the transliteration.

Here are the details:

GNU libc and GNU libiconv support transliteration, for example, of "½"
to "1/2", or of "å" to "aa" in a Danish locale. The transliteration maps
a multibyte character of the input encoding to zero or more characters
in the output.

There are two kinds of transliteration rules:

  - Those that are valid regardless of locale. Typically this means
    that the original and the transliterated character have similar
    glyphs, such as in the case "½" to "1/2". In GNU libc, these are
    collected in the files glibc/localedata/locales/translit_*.

  - Those that are valid in a single locale only. Often such a rule
    reflects similar pronunciation of the original and the
    transliterated characters. Some locales have script-based
    transliteration, for example from the Cyrillic script to the Latin
    script. In GNU libc, these are collected in the file
    glibc/localedata/locales/<locale>. In GNU libiconv,
    transliterations of this kind are not supported.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29913#c4
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217059
Reported-by: Steffen Nurpmeso <steffen@sdaoden.eu>
Reported-by: Reuben Thomas <rrt@sc3d.org>
Signed-off-by: Bruno Haible <bruno@clisp.org>
[ fix semantic newlines ]
Signed-off-by: Alejandro Colomar <alx@kernel.org>
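
For illustration only (not part of the patch): a minimal C sketch of how a caller might observe the condition this commit documents. The helper name convert_once() and its parameters are hypothetical. It performs a single iconv(3) call and reports the resulting errno value together with the number of input bytes consumed. According to the text added below, a strict glibc or GNU libiconv descriptor should report EILSEQ for a valid but unconvertible character, while musl, FreeBSD, NetBSD, and Solaris substitute a fallback character instead, so the outcome depends on the implementation.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert the NUL-terminated string `in` from
 * encoding `from` to encoding `to` with a single iconv() call.
 * Returns 0 on success, or the errno value set by iconv_open()/iconv().
 * *consumed receives the number of input bytes that were converted,
 * i.e. the offset at which *inbuf was left pointing.
 */
static int convert_once(const char *to, const char *from,
                        const char *in, char *out, size_t outsize,
                        size_t *consumed)
{
    assert(outsize > 0);
    out[0] = '\0';
    *consumed = 0;

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return errno;

    char *inp = (char *) in;      /* iconv() takes a non-const pointer */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1; /* reserve room for the terminator */
    int err = 0;

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t) -1)
        err = errno;              /* EILSEQ, EINVAL, or E2BIG */

    *outp = '\0';
    *consumed = (size_t) (inp - in);
    iconv_close(cd);
    return err;
}
```

A caller could pass "ASCII" as `to`, "UTF-8" as `from`, and the input "a\xc2\xbd" (the letter "a" followed by "½"). If the result is EILSEQ, *consumed is the offset of the unconvertible sequence. Appending //TRANSLIT to the target name may avoid the error in some cases, as the commit message explains.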
-rw-r--r-- man3/iconv.3 38
1 files changed, 37 insertions, 1 deletions
diff --git a/man3/iconv.3 b/man3/iconv.3
index 66f59b8c3..c65f9c393 100644
--- a/man3/iconv.3
+++ b/man3/iconv.3
@@ -71,7 +71,7 @@ If the character encoding of the input is stateful, the
function can also convert a sequence of input bytes
to an update to the conversion state without producing any output bytes;
such input is called a \fIshift sequence\fP.
-The conversion can stop for four reasons:
+The conversion can stop for five reasons:
.IP \[bu] 3
An invalid multibyte sequence is encountered in the input.
In this case,
@@ -80,6 +80,42 @@ it sets \fIerrno\fP to \fBEILSEQ\fP and returns
\fI*inbuf\fP
is left pointing to the beginning of the invalid multibyte sequence.
.IP \[bu]
+A multibyte sequence is encountered that is valid but that
+cannot be translated to the character encoding of the output.
+This condition depends on the implementation and on the conversion descriptor.
+In the GNU C library and GNU libiconv, if
+.I cd
+was created without the suffix
+.B //TRANSLIT
+or
+.BR //IGNORE ,
+the conversion is strict:
+lossy conversions produce this condition.
+If the suffix
+.B //TRANSLIT
+was specified,
+transliteration can avoid this condition in some cases.
+In the musl C library,
+this condition cannot occur because a conversion to
+.B \[aq]*\[aq]
+is used as a fallback.
+In the FreeBSD, NetBSD, and Solaris implementations of
+.BR iconv (),
+this condition cannot occur either,
+because a conversion to
+.B \[aq]?\[aq]
+is used as a fallback.
+When this condition is met,
+.BR iconv ()
+sets
+.I errno
+to
+.B EILSEQ
+and returns
+.IR (size_t)\ \-1 .
+.I *inbuf
+is left pointing to the beginning of the unconvertible multibyte sequence.
+.IP \[bu]
The input byte sequence has been entirely converted,
that is, \fI*inbytesleft\fP has gone down to 0.
In this case,