This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.
Paolo, FYI, library issue 382 deals with (some of) these details: http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#382
Howard, Bill, and I discussed how to fix it a couple of years ago but we couldn't agree on an optimal solution. IIRC, Bill wanted to make sure whatever text was added would accommodate using UTF-16 as an internal encoding, which seemed to complicate things (AFAIK, no implementation other than Bill's makes it possible). It would be nice if we could find the time to revisit the issue before the end of the year to see if we can nail it down this time.
Sam Varshavchik wrote:
[...]
> Anyone know if there exists some encoding where a multibyte sequence produces more than one wchar_t?
Assuming that by multibyte sequence you mean a sequence that forms a single character (plus any shift sequences), I believe the only such encoding is UTF-16. Other than that, I haven't encountered such an encoding on any of the systems we deal with (most modern Unices).
Martin
I want to see what happens when you feed it to inp(), but give it a one-element wchar_t output buffer. Reading iconv(3) closely:
4. The output buffer has no more room for the next converted character.
In this case it sets errno to E2BIG and returns (size_t)(-1).
A brief experiment:
iconv_t i=iconv_open("UTF-8", "WCHAR_T");
wchar_t iarr[]={0x00e8,0x00e8};
char carr[3];

char *inbuf=(char *)&iarr[0];
char *outbuf=carr;

size_t inbytesleft=sizeof(iarr);
size_t outbytesleft=sizeof(carr);

size_t n=iconv(i, &inbuf, &inbytesleft, &outbuf, &outbytesleft);

printf("Return code %d, errno=%s\n", (int)n, strerror(errno));
printf("inbytesleft=%d, outbytesleft=%d\n", (int)inbytesleft, (int)outbytesleft);
Result:
Return code -1, errno=Argument list too long
inbytesleft=4, outbytesleft=1
iconv() converted the first wchar_t, but then stopped and returned E2BIG with unconverted input still left in the input buffer and one unused byte left in the output buffer. Since iconv(), via mbr..(), is presumably used for inp() as well, I should expect the same behavior for encodings where multibyte sequences can produce more than one wchar_t, if such an encoding exists. I'm wondering if the current logic in basic_filebuf handles this situation -- haven't yet deciphered it.
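
The input-side analogue of the experiment above can be approximated with UTF-16 as the target, since that is the one encoding mentioned in this thread where a single multibyte character can need two output units. The following is only a sketch under assumptions: it uses a 16-bit unit buffer in place of wchar_t, it assumes the local iconv accepts the names "UTF-16LE" and "UTF-8", and struct step_result and convert_once() are hypothetical names made up for the illustration, not anything from libstdc++.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/* Outcome of a single iconv() call (hypothetical helper type). */
struct step_result {
    size_t rc;       /* value returned by iconv(), (size_t)-1 on failure */
    int    err;      /* errno captured right after a failure, otherwise 0 */
    size_t in_left;  /* input bytes not consumed */
    size_t out_left; /* output bytes not used */
};

/*
 * Convert in_len bytes of UTF-8 into a buffer of out_units 16-bit units,
 * calling iconv() exactly once so the raw result can be inspected.
 * Assumption: "UTF-16LE" and "UTF-8" are names the local iconv knows.
 */
static struct step_result convert_once(const char *in, size_t in_len,
                                       uint16_t *out, size_t out_units)
{
    struct step_result r = { (size_t)-1, 0, in_len, out_units * sizeof *out };

    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1) {
        r.err = errno;
        return r;
    }

    /* iconv() takes char **; the input bytes themselves are not modified. */
    char *inbuf = (char *)in;
    char *outbuf = (char *)out;

    errno = 0;
    r.rc = iconv(cd, &inbuf, &r.in_left, &outbuf, &r.out_left);
    r.err = (r.rc == (size_t)-1) ? errno : 0;

    iconv_close(cd);
    return r;
}
```

Calling convert_once("\xF0\x9F\x98\x80", 4, buf, 1) feeds it U+1F600, which needs a surrogate pair in UTF-16. With a one-element buffer I would expect the call to fail with E2BIG and consume nothing (in_left still 4, out_left still 2), while a two-element buffer should let the whole sequence through. A caller in the position of basic_filebuf would therefore have to retry with more output room rather than treat the zero-progress E2BIG as an error.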