This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: std::codecvt::out() returns a no-op in marginal situations


Paolo, FYI, library issue 382 deals with (some of) these details:
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#382

Howard, Bill, and I discussed how to fix it a couple of years
ago, but we couldn't agree on an optimal solution. IIRC, Bill
wanted to make sure whatever text was added would accommodate
using UTF-16 as an internal encoding, which seemed to complicate
things (AFAIK, no implementation other than Bill's makes it
possible). It would be nice if we could find the time to revisit
the issue before the end of the year to see if we can nail it down
this time.

Martin

Martin Sebor wrote:
Sam Varshavchik wrote:
[...]
> Anyone know if there exists some encoding where a multibyte sequence produces more than one wchar_t?

Assuming that by multibyte sequence you mean a sequence that forms a single character (plus any shift sequences), I believe the only such encoding is UTF-16. Other than that, I haven't encountered such an encoding on any of the systems we deal with (most modern Unices).

Martin

I want to see what happens when you feed it to in(), but give it a one-element wchar_t output buffer. Reading iconv(3) closely:

4. The output buffer has no more room for the next converted character.
In this case it sets errno to E2BIG and returns (size_t)(-1).


A brief experiment:

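    #include <errno.h>
    #include <iconv.h>
    #include <stdio.h>
    #include <string.h>
    #include <wchar.h>

    int main(void)
    {
        /* Convert from the native wchar_t encoding to UTF-8. */
        iconv_t i=iconv_open("UTF-8", "WCHAR_T");
        wchar_t iarr[]={0x00e8,0x00e8};  /* U+00E8 twice, two UTF-8 bytes each */
        char carr[3];                    /* room for one and a half of them */

        char *inbuf=(char *)&iarr[0];
        char *outbuf=carr;

        size_t inbytesleft=sizeof(iarr);
        size_t outbytesleft=sizeof(carr);

        size_t n=iconv(i, &inbuf, &inbytesleft, &outbuf, &outbytesleft);

        printf("Return code %d, errno=%s\n", (int)n, strerror(errno));
        printf("inbytesleft=%d, outbytesleft=%d\n",
               (int)inbytesleft, (int)outbytesleft);
        iconv_close(i);
        return 0;
    }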

Result:

Return code -1, errno=Argument list too long
inbytesleft=4, outbytesleft=1

iconv() converted the first wchar_t, then stopped with E2BIG (which strerror() reports as "Argument list too long"), leaving unconverted input in the input buffer and one unused byte in the output buffer, since the next character needs two bytes. Since iconv(), via mbr..(), is presumably used for in() as well, I would expect the same behavior for encodings where a multibyte sequence can produce more than one wchar_t, if such an encoding exists. I'm wondering whether the current logic in basic_filebuf handles this situation; I haven't deciphered it yet.





