author | Bruno Haible <bruno@clisp.org> | 2023-05-21 13:05:29 +0200 |
---|---|---|
committer | Alejandro Colomar <alx@kernel.org> | 2023-05-25 01:25:08 +0200 |
commit | 470b97d59797b040e28103a0ba0f616d95f0ed93 (patch) | |
tree | a6731e214fe44b2c17352f684633fd7c49be7c92 | |
parent | 4ca216bacc7d185c1af3c384ab53cd1ec74830d1 (diff) |
List a fifth condition when iconv(3) may stop.
The wording regarding transliteration is vague, because this man page is not
the right place for going into the details of the transliteration.
Here are the details:
GNU libc and GNU libiconv support transliteration, for example, of "½" to "1/2",
or of "å" to "aa" in a Danish locale. The transliteration maps a multibyte
character of the input encoding to zero or more characters in the output.
There are two kinds of transliteration rules:
- Those that are valid regardless of locale. Typically this means that the
original and the transliterated character have similar glyphs, such as
in the case "½" to "1/2".
In GNU libc, these are collected in the files
glibc/localedata/locales/translit_*.
- Those that are valid in a single locale only. Often such a rule
reflects similar pronunciation of the original and the transliterated
characters. Some locales have script-based transliteration, for example
from the Cyrillic script to the Latin script.
In GNU libc, these are collected in the file
glibc/localedata/locales/<locale>.
In GNU libiconv, transliterations of this kind are not supported.
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29913#c4
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217059
Reported-by: Steffen Nurpmeso <steffen@sdaoden.eu>
Reported-by: Reuben Thomas <rrt@sc3d.org>
Signed-off-by: Bruno Haible <bruno@clisp.org>
[ fix semantic newlines ]
Signed-off-by: Alejandro Colomar <alx@kernel.org>
-rw-r--r-- | man3/iconv.3 | 38 |
1 files changed, 37 insertions, 1 deletions
diff --git a/man3/iconv.3 b/man3/iconv.3
index 66f59b8c3..c65f9c393 100644
--- a/man3/iconv.3
+++ b/man3/iconv.3
@@ -71,7 +71,7 @@ If the character encoding of the input is stateful,
 the function can also convert a sequence of input bytes to an update to the
 conversion state without producing any output bytes;
 such input is called a \fIshift sequence\fP.
-The conversion can stop for four reasons:
+The conversion can stop for five reasons:
 .IP \[bu] 3
 An invalid multibyte sequence is encountered in the input.
 In this case,
@@ -80,6 +80,42 @@ it sets \fIerrno\fP to \fBEILSEQ\fP and returns
 \fI*inbuf\fP
 is left pointing to the beginning of the invalid multibyte sequence.
 .IP \[bu]
+A multibyte sequence is encountered that is valid but that
+cannot be translated to the character encoding of the output.
+This condition depends on the implementation and on the conversion descriptor.
+In the GNU C library and GNU libiconv, if
+.I cd
+was created without the suffix
+.B //TRANSLIT
+or
+.BR //IGNORE ,
+the conversion is strict:
+lossy conversions produce this condition.
+If the suffix
+.B //TRANSLIT
+was specified,
+transliteration can avoid this condition in some cases.
+In the musl C library,
+this condition cannot occur because a conversion to
+.B \[aq]*\[aq]
+is used as a fallback.
+In the FreeBSD, NetBSD, and Solaris implementations of
+.BR iconv (),
+this condition cannot occur either,
+because a conversion to
+.B \[aq]?\[aq]
+is used as a fallback.
+When this condition is met,
+.BR iconv ()
+sets
+.I errno
+to
+.B EILSEQ
+and returns
+.IR (size_t)\ \-1 .
+.I *inbuf
+is left pointing to the beginning of the unconvertible multibyte sequence.
+.IP \[bu]
 The input byte sequence has been entirely converted,
 that is, \fI*inbytesleft\fP has gone down to 0.
 In this case,