From 470b97d59797b040e28103a0ba0f616d95f0ed93 Mon Sep 17 00:00:00 2001
From: Bruno Haible
Date: Sun, 21 May 2023 13:05:29 +0200
Subject: List a fifth condition when iconv(3) may stop.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The wording regarding transliteration is vague, because this man page is
not the right place for going into the details of the transliteration.

Here are the details:

GNU libc and GNU libiconv support transliteration, for example, of "½" to
"1/2", or of "å" to "aa" in a Danish locale.

The transliteration maps a multibyte character of the input encoding to
zero or more characters in the output.

There are two kinds of transliteration rules:
  - Those that are valid regardless of locale.  Typically this means that
    the original and the transliterated character have similar glyphs,
    such as in the case "½" to "1/2".  In GNU libc, these are collected
    in the files glibc/localedata/locales/translit_*.
  - Those that are valid in a single locale only.  Often such a rule
    reflects similar pronunciation of the original and the transliterated
    characters.  Some locales have script-based transliteration, for
    example from the Cyrillic script to the Latin script.  In GNU libc,
    these are collected in the file glibc/localedata/locales/.  In GNU
    libiconv, transliterations of this kind are not supported.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29913#c4
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217059
Reported-by: Steffen Nurpmeso
Reported-by: Reuben Thomas
Signed-off-by: Bruno Haible
[ fix semantic newlines ]
Signed-off-by: Alejandro Colomar
---
 man3/iconv.3 | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/man3/iconv.3 b/man3/iconv.3
index 66f59b8c3..c65f9c393 100644
--- a/man3/iconv.3
+++ b/man3/iconv.3
@@ -71,7 +71,7 @@ If the character encoding of the input is stateful, the
 function can also convert a sequence of input bytes to an update to the
 conversion state without producing any output bytes; such input is called
 a \fIshift sequence\fP.
-The conversion can stop for four reasons:
+The conversion can stop for five reasons:
 .IP \[bu] 3
 An invalid multibyte sequence is encountered in the input.
 In this case,
@@ -80,6 +80,42 @@ it sets \fIerrno\fP to \fBEILSEQ\fP and returns
 \fI*inbuf\fP
 is left pointing to the beginning of the invalid multibyte sequence.
 .IP \[bu]
+A multibyte sequence is encountered that is valid but that
+cannot be translated to the character encoding of the output.
+This condition depends on the implementation and on the conversion descriptor.
+In the GNU C library and GNU libiconv, if
+.I cd
+was created without the suffix
+.B //TRANSLIT
+or
+.BR //IGNORE ,
+the conversion is strict:
+lossy conversions produce this condition.
+If the suffix
+.B //TRANSLIT
+was specified,
+transliteration can avoid this condition in some cases.
+In the musl C library,
+this condition cannot occur because a conversion to
+.B \[aq]*\[aq]
+is used as a fallback.
+In the FreeBSD, NetBSD, and Solaris implementations of
+.BR iconv (),
+this condition cannot occur either,
+because a conversion to
+.B \[aq]?\[aq]
+is used as a fallback.
+When this condition is met,
+.BR iconv ()
+sets
+.I errno
+to
+.B EILSEQ
+and returns
+.IR (size_t)\ \-1 .
+.I *inbuf
+is left pointing to the beginning of the unconvertible multibyte sequence.
+.IP \[bu]
 The input byte sequence has been entirely converted,
 that is, \fI*inbytesleft\fP has gone down to 0.
 In this case,
-- 
cgit v1.2.3
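
Illustration (not part of the patch): the new condition can be observed
with a small C helper. This is a minimal sketch under stated assumptions.
The function name convert_from_utf8 and its parameters are hypothetical,
chosen only for this example. It assumes the implementation accepts the
encoding names "UTF-8" and "ASCII". It also skips the final flush call
that a stateful output encoding would need.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Converts the UTF-8 string `in` to the encoding `tocode` and stores a
 * NUL-terminated result in `out` (capacity `outsize`, at least 1 byte).
 * Returns 0 on success, otherwise the errno value set by iconv_open()
 * or iconv().  `*consumed` receives the number of input bytes that were
 * converted before the conversion stopped. */
int convert_from_utf8(const char *tocode, const char *in,
                      char *out, size_t outsize, size_t *consumed)
{
    *consumed = 0;
    out[0] = '\0';

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return errno;

    char *inp = (char *)in;        /* iconv() does not modify the input bytes */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;  /* keep one byte for the terminator */

    int err = 0;
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        err = errno;               /* EILSEQ, EINVAL or E2BIG */

    *outp = '\0';
    /* On EILSEQ, inp points at the start of the offending sequence. */
    *consumed = (size_t)(inp - in);

    iconv_close(cd);
    return err;
}
```

Calling the helper with tocode "ASCII" and the input "a\xe2\x82\xac" "b"
(the letter a, a euro sign, the letter b) exercises the case described in
the patch. According to the text above, a strict conversion in the GNU C
library or GNU libiconv should return EILSEQ with *consumed equal to 1,
while "ASCII//TRANSLIT" may succeed, and implementations with a fallback
character should return 0.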