The attached source file (UTF-8 encoded) demonstrates that codecvt is broken for the simplest of transformations (UTF-8 to UCS-4). This is pretty basic, and the underlying gconv code works correctly, so the bug is either in libstdc++6 or somewhere inline in the headers.
$ ./wide
wide: ../iconv/loop.c:425: utf8_internal_loop_single: Assertion `inptr - bytebuf > (state->__count & 7)' failed.
Aborted
While running:
(gdb) bt
#0  0x0fcc672c in __gconv_transform_utf8_internal () from /lib/tls/libc.so.6
#1  0x0fe0425c in ?? () from /lib/tls/libc.so.6
#2  0x0ffa6ef8 in std::codecvt<wchar_t, char, __mbstate_t>::do_in () from /usr/lib/libstdc++.so.6
#3  0x100016b4 in std::__codecvt_abstract_base<wchar_t, char, __mbstate_t>::in (this=0x100290b8, __state=@0x7fa405a8, __from=0x10013014 "fffäESC%G�ESC%@2 37»", __from_end=0x1001301d "", __from_next=@0x7fa405b0, __to=0x7fa405bc, __to_end=0x7fa406fc, __to_next=@0x7fa405b4) at /usr/lib/gcc/powerpc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/bits/codecvt.h:204
#4  0x10001244 in to_wide_string (str=@0x7fa40758, locale=@0x7fa40738) at wide.cc:22
#5  0x10001544 in main () at wide.cc:59
Program received signal SIGABRT, Aborted.
0x0fcd67bc in raise () from /lib/tls/libc.so.6
(gdb) bt
#0  0x0fcd67bc in raise () from /lib/tls/libc.so.6
#1  0x0fcd82c0 in abort () from /lib/tls/libc.so.6
#2  0x0fcce768 in __assert_fail () from /lib/tls/libc.so.6
#3  0x0fcc6c7c in __gconv_transform_utf8_internal () from /lib/tls/libc.so.6
#4  0x0fcc6c7c in __gconv_transform_utf8_internal () from /lib/tls/libc.so.6
#5  0x0fcc6c7c in __gconv_transform_utf8_internal () from /lib/tls/libc.so.6
#6  0x0fcc6c7c in __gconv_transform_utf8_internal () from /lib/tls/libc.so.6
#7  0x0fcc6c7c in __gconv_transform_utf8_internal () from /lib/tls/libc.so.6
#8  0x0fcc6c7c in __gconv_transform_utf8_internal () from /lib/tls/libc.so.6
#9  0x0fcc6c7c in __gconv_transform_utf8_internal () from /lib/tls/libc.so.6
#10 0x0fcc6c7c in __gconv_transform_utf8_internal () from /lib/tls/libc.so.6
#11 0x0fcc6c7c in __gconv_transform_utf8_internal () from /lib/tls/libc.so.6
Previous frame inner to this frame (corrupt stack?)
It affects GCC 4.2 (20060613), 4.1, 4.0 and 3.3 on Debian GNU/Linux (unstable). The program works correctly with 3.4:
$ g++-3.4 -o wide wide.cc
$ ./wide
1
fffäß»fffäß»$
Regards,
Roger
Created attachment 11679: Testcase to show codecvt crash
Compile with: g++ -o wide wide.cc
Humm, this is really puzzling, because nothing non-trivial changed in that area going from 3.4 to 4.0. Of course, we all run the testsuite daily, and it includes quite a few codecvt tests, which always pass smoothly. Could you please compare your issue with the existing testcases in testsuite/22_locale/codecvt? Anyway, if I save the attached wide.cc from the browser and compile and run it, I get "1 4 1 4..." without end. Is that the expected result? Or can you help us reproduce the problem? Thanks,
The source is UTF-8 encoded, and it assumes you are going to run it in a UTF-8 locale. That might be why you get odd output. The expected output should match the GCC 3.4 output in the original report:
$ g++-3.4 -o wide wide.cc
$ ./wide
1
fffäß»fffäß»$
where '$' is the shell prompt. This output was also verified by someone with access to an MS VC++ compiler (source recoded to the Windows character set). If the source file got corrupted by Bugzilla, it's also available from http://people.debian.org/~rleigh/wide.cc
I'll check out the testsuite next.
(In reply to comment #3)
> The source is UTF-8 encoded, and it assumes you are going to run it in a UTF-8
> locale. That might possibly be why you get odd output.
>
> The expected output should be as per the GCC 3.4 output in the original report:
>
> $ g++-3.4 -o wide wide.cc
> $ ./wide
> 1
> fffäß»fffäß»$
Ok, thanks. Then I used the "en_US.UTF-8" locale and it worked fine with both mainline and stock 4.1.1: no crashes, and apparently the same output.
$ g++ --version
g++ (GCC) 4.1.2 20060613 (prerelease) (Debian 4.1.1-5)
$ g++ -o wide wide.cc
$ time ./wide
wide: ../iconv/loop.c:425: utf8_internal_loop_single: Assertion `inptr - bytebuf > (state->__count & 7)' failed.
Aborted
real    0m12.545s
user    0m12.416s
sys     0m0.016s
All that run time is spent in __gconv_transform_utf8_internal, before it blows up.
$ locale
LANG=en_GB.UTF8
LANGUAGE=en_GB:en_US:en
LC_CTYPE="en_GB.UTF8"
LC_NUMERIC="en_GB.UTF8"
LC_TIME="en_GB.UTF8"
LC_COLLATE="en_GB.UTF8"
LC_MONETARY="en_GB.UTF8"
LC_MESSAGES="en_GB.UTF8"
LC_PAPER="en_GB.UTF8"
LC_NAME="en_GB.UTF8"
LC_ADDRESS="en_GB.UTF8"
LC_TELEPHONE="en_GB.UTF8"
LC_MEASUREMENT="en_GB.UTF8"
LC_IDENTIFICATION="en_GB.UTF8"
LC_ALL=
en_US.UTF-8 doesn't work for me either.
(In reply to comment #6)
> en_US.UTF-8 doesn't work for me either.
Nope, I just tried with "en_GB.utf8" too, and everything works fine in that case as well. All things considered, I don't think it's likely that libstdc++ is at fault.
(In reply to comment #5)
> All that run time is spent in __gconv_transform_utf8_internal, before it blows
> up.
Isn't that a strong hint that something is wrong with glibc? When you say 3.4 is fine, do you mean on the very same machine?
Humm, wait, I'm working on x86-linux! Is this target-specific? Do you see the issue only on powerpc?
Yes, this is all on the same Debian installation. 3.3, 3.4, 4.0, 4.1 and 4.2 (snapshot) are available. All but 3.4 exhibit this problem. I will test on an i686 system in a moment to check if it's powerpc-only.
(In reply to comment #9)
> Humm, wait, I'm working on x86-linux! Is that target specific? You can see the
> issue only on powerpc?
Well, in any case, all the codecvt regression tests always pass on powerpc-linux and powerpc64-linux too...
Testing on i486-linux-gnu, the results are:
3.3: fail
3.4: OK
4.0: OK
4.1: OK
4.2 snapshot: OK
So 4.0, 4.1 and the 4.2 snapshot are OK on i486-linux-gnu but not on powerpc-linux-gnu.
(In reply to comment #12)
> So 4.0, 4.1 and 4.2 snapshot are OK on i486-linux-gnu but not on
> powerpc-linux-gnu.
Ok. In the meantime I double-checked, and in fact **nothing** changed in the codecvt code going from 3.4 to 4.0. Really, I don't know what to do on the libstdc++ side. All the powerpc-linux and powerpc64-linux tests pass, on Debian too, as you can see on testresults. Frankly, I don't think there is anything in libstdc++ that could trigger only with en_GB and not en_US, and only on powerpc. And the failure is happening inside the glibc code... Could you maybe try feeding mbsnrtowcs directly on powerpc? As you can see in codecvt_members.cc, our codecvt::in is just a thin wrapper around it.
Can you please tell us the glibc version? I'm asking because I can reproduce this on an ia64 machine using glibc 2.4, but not on any of the glibc 2.3.6 systems I tried.
$ uname -a
Linux hardknott 2.6.16.17 #7 Sun May 21 15:39:23 BST 2006 ppc GNU/Linux
$ /lib/libc.so.6
GNU C Library stable release version 2.3.6, by Roland McGrath et al.
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Compiled by GNU CC version 4.0.4 20060507 (prerelease) (Debian 4.0.3-3).
Compiled on a Linux 2.6.13 system on 2006-06-08.
Available extensions:
    GNU libio by Per Bothner
    crypt add-on version 2.1 by Michael Glad and others
    GNU Libidn by Simon Josefsson
    linuxthreads-0.10 by Xavier Leroy
    BIND-8.2.3-T5B
    libthread_db work sponsored by Alpha Processor Inc
    NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
    software FPU emulation by Richard Henderson, Jakub Jelinek and others
I can reproduce this on an ia64-linux machine, so it's confirmed. It is very puzzling on the libstdc++-v3 side, though; no idea how or when we are going to deal with it...
Created attachment 11682: Use mbsnrtowcs directly.
This testcase is similar to the original, except that it uses mbsnrtowcs in place of the codecvt locale facet. It also initialises the locale with setlocale() for LC_CTYPE. It shows some interesting results, in fact the exact opposite of the original testcase:
GCC ver   powerpc  i386
3.3       fail     fail
3.4       OK       OK
4.0       OK       fail
4.1       OK       fail
4.2       OK       fail
With this test, the expected output is this:
$ ./wide2
1
fffäß»
fffäß»
The output for the failed tests:
GCC 3.3:
powerpc (GCC 3.3 was bad at wide streams; the output is "lost"):
$ ./wide2
1
fffäß»
i386:
$ ./wide2
wide2: ../iconv/loop.c:425: utf8_internal_loop_single: Assertion `inptr - bytebuf > (state->__count & 7)' failed.
Aborted
GCC 4.0/i386:
$ ./wide2
wide2: ../iconv/loop.c:425: utf8_internal_loop_single: Assertion `inptr - bytebuf > (state->__count & 7)' failed.
Aborted
GCC 4.1/i386:
$ ./wide2
wide2: ../iconv/loop.c:425: utf8_internal_loop_single: Assertion `inptr - bytebuf > (state->__count & 7)' failed.
Aborted
GCC 4.2/i386:
$ ./wide2
wide2: ../iconv/loop.c:425: utf8_internal_loop_single: Assertion `inptr - bytebuf > (state->__count & 7)' failed.
Aborted
Please do allow for the fact that one (or both) of these testcases might be buggy; I've never used these interfaces before. However... the behaviour is still highly variable between the two platforms.
Regards,
Roger
Ok, thanks. Before I go completely crazy, let's agree on at least one detail: let's not involve 3.3. In 3.3, codecvt is known to be broken, and it was completely rewritten for 3.4.
Created attachment 11683: C example using mbsnrtowcs
This testcase is the same as the last, but uses C only. It looks like this:
GCC ver   powerpc  i386
3.3       OK       OK
3.4       OK       OK
4.0       OK       fail
4.1       OK       fail
4.2       OK       fail
The expected output is:
$ ./wide3
fffäß»
1
fffäß»
On i386 (all failing versions):
$ ./wide3
fffäß»
Segmentation fault
(gdb) run
Starting program: /home/rleigh/wide3
fffäß»
Program received signal SIGSEGV, Segmentation fault.
0xa7e0e19d in __gconv_transform_utf8_internal (step=0x805ede0, data=0xafc2a8d0, inptrp=0xafc2aa80, inend=0x8048754 "", outbufstart=0x0, irreversible=0xafc2a8f8, do_flush=0, consume_incomplete=1) at ../iconv/loop.c:371
371     ../iconv/loop.c: No such file or directory.
        in ../iconv/loop.c
(gdb) bt
#0  0xa7e0e19d in __gconv_transform_utf8_internal (step=0x805ede0, data=0xafc2a8d0, inptrp=0xafc2aa80, inend=0x8048754 "", outbufstart=0x0, irreversible=0xafc2a8f8, do_flush=0, consume_incomplete=1) at ../iconv/loop.c:371
#1  0xa7e65bd9 in __mbsnrtowcs (dst=0xafc2a93c, src=0xafc2aa80, nmc=9, len=162, ps=0xafc2aa84) at mbsnrtowcs.c:106
#2  0x08048503 in print_wide (str=0x804874b "fffä�237»") at wide3.c:16
#3  0x080485f0 in main () at wide3.c:40
Both the powerpc and i386 systems are running the same version of glibc.
> Before I go completely crazy, let's agree at least about a detail:
> let's not involve 3.3: in 3.3 codecvt is known to be broken and was
> completely rewritten for 3.4.
Agreed :)
Ok, I think I have something meaningful to say: this definitely looks like a miscompilation. Please check whether you see on powerpc-linux what I'm seeing on ia64-linux: the problem goes away if I build libstdc++ (and possibly the testcase too) at "-O0 -g3". To try this, go into the libstdc++-v3 directory of your build tree, do a make clean ; make CXXFLAGS="-O0 -g3", reinstall the library alone (no need to rebuild the compiler proper), and build the testcase itself with "-O0 -g3". On ia64-linux, that makes the problem go away. If you can confirm, the difficult part begins ;) because we are supposed to prepare a reduced testcase for the compiler people...
Just to summarise the current tests:
          wide        wide2       wide3
GCC ver   ppc   i386  ppc   i386  ppc   i386
3.4       OK    OK    OK    OK    OK    fail
4.0       fail  OK    OK    fail  OK    fail
4.1       fail  OK    OK    fail  OK    fail
4.2       fail  OK    OK    fail  OK    fail
GCC 3.4 is the most reliable, but I don't understand the pattern of failures. I'll do a build in a moment as you suggest.
This will take a few more hours. I didn't have a built GCC tree to hand, so I'm still waiting on "make bootstrap".
../gcc-20060613/configure --enable-languages=c,c++ --prefix=/home/rleigh/gcc-test --enable-shared --with-system-zlib --without-included-gettext --enable-threads=posix --enable-nls --enable-__cxa_atexit --enable-libstdcxx-debug
$ ./wide
terminate called after throwing an instance of 'std::runtime_error'
  what(): locale::facet::_S_create_c_locale name not valid
Aborted
#0  0x0fcf77c8 in kill () at ../string/bits/string2.h:998
#1  0x0fcf754c in *__GI_raise (sig=6) at ../linuxthreads/sysdeps/unix/sysv/linux/raise.c:32
#2  0x0fcf8e68 in *__GI_abort () at ../sysdeps/generic/abort.c:88
#3  0x0ffb273c in __gnu_cxx::__verbose_terminate_handler () at ../../../../gcc-20060613/libstdc++-v3/libsupc++/vterminate.cc:98
#4  0x0ffaf87c in __cxxabiv1::__terminate (handler=0) at ../../../../gcc-20060613/libstdc++-v3/libsupc++/eh_terminate.cc:43
#5  0x0ffaf8b8 in std::terminate () at ../../../../gcc-20060613/libstdc++-v3/libsupc++/eh_terminate.cc:53
#6  0x0ffafa20 in __cxa_throw (obj=<value optimized out>, tinfo=<value optimized out>, dest=<value optimized out>) at ../../../../gcc-20060613/libstdc++-v3/libsupc++/eh_throw.cc:76
#7  0x0ff3a050 in std::__throw_runtime_error (__s=<value optimized out>) at ../../../../gcc-20060613/libstdc++-v3/src/functexcept.cc:84
#8  0x0ffadd64 in std::locale::facet::_S_create_c_locale (__cloc=<value optimized out>, __s=<value optimized out>) at c++locale.cc:141
#9  0x0ff40154 in _Impl (this=0x10013080, __s=0x6 <Address 0x6 out of bounds>, __refs=<value optimized out>) at ../../../../gcc-20060613/libstdc++-v3/src/localename.cc:185
#10 0x0ff41ac4 in locale (this=0x7fc83950, __s=<value optimized out>) at ../../../../gcc-20060613/libstdc++-v3/src/localename.cc:138
#11 0x100015e8 in main () at wide.cc:54
$ ./wide2
1
fffäß»
fffäß»
$ ./wide3
fffäß»
1
fffäß»
Rebuilding libstdc++v3 with 'make CXXFLAGS="-O0 -g3"':
$ ./wide
terminate called after throwing an instance of 'std::runtime_error'
  what(): locale::facet::_S_create_c_locale name not valid
Aborted
(gdb) run
Starting program: /home/rleigh/wbug/wide
terminate called after throwing an instance of 'std::runtime_error'
  what(): locale::facet::_S_create_c_locale name not valid
Program received signal SIGABRT, Aborted.
0x0fcc57c8 in kill () at ../string/bits/string2.h:998
998     ../string/bits/string2.h: No such file or directory.
        in ../string/bits/string2.h
Current language: auto; currently c
(gdb) bt
#0  0x0fcc57c8 in kill () at ../string/bits/string2.h:998
#1  0x0fcc554c in *__GI_raise (sig=6) at ../linuxthreads/sysdeps/unix/sysv/linux/raise.c:32
#2  0x0fcc6e68 in *__GI_abort () at ../sysdeps/generic/abort.c:88
#3  0x0ffaf7d4 in __gnu_cxx::__verbose_terminate_handler () at ../../../../gcc-20060613/libstdc++-v3/libsupc++/vterminate.cc:98
#4  0x0ffaa238 in __cxxabiv1::__terminate (handler=0xffaf5ac <__gnu_cxx::__verbose_terminate_handler()>) at ../../../../gcc-20060613/libstdc++-v3/libsupc++/eh_terminate.cc:43
#5  0x0ffaa288 in std::terminate () at ../../../../gcc-20060613/libstdc++-v3/libsupc++/eh_terminate.cc:53
#6  0x0ffaa534 in __cxa_throw (obj=0x10013130, tinfo=0xffe2d58, dest=0xff1ea3c <~runtime_error>) at ../../../../gcc-20060613/libstdc++-v3/libsupc++/eh_throw.cc:76
#7  0x0ff120e4 in std::__throw_runtime_error (__s=0xffb7e04 "locale::facet::_S_create_c_locale name not valid") at ../../../../gcc-20060613/libstdc++-v3/src/functexcept.cc:84
#8  0x0ffa7624 in std::locale::facet::_S_create_c_locale (__cloc=@0x7fd11824, __s=0x1001306c "en_GB.UTF8") at c++locale.cc:141
#9  0x0ff1bda4 in _Impl (this=0x10013080, __s=0x1001306c "en_GB.UTF8", __refs=1) at ../../../../gcc-20060613/libstdc++-v3/src/localename.cc:185
#10 0x0ff1de70 in locale (this=0x7fd11950, __s=0x10002364 "") at ../../../../gcc-20060613/libstdc++-v3/src/localename.cc:138
#11 0x10001748 in main () at wide.cc:54
$ ./wide2
1
fffäß»
fffäß»
$ ./wide3
fffäß»
1
fffäß»
Regards,
Roger
(In reply to comment #24)
> terminate called after throwing an instance of 'std::runtime_error'
> what(): locale::facet::_S_create_c_locale name not valid
This is the standard throw that happens when a named locale cannot be used. It has nothing to do with the issue we are discussing, and it's expected behavior. The only possible explanation is that the GNU locale model has been disabled by the configure-time tests. Do you have a full set of locales installed, in particular de_DE? See also these notes for additional details: http://gcc.gnu.org/onlinedocs/libstdc++/install.html
Anyway, at this point it's almost certain that we are dealing with a miscompilation. Nothing changed in the library code, and the problem appears with the 4.x compilers (new technology, SSA, etc.). That is another strong indication, besides my 100% reproducible tests on ia64-linux and all the other checks.
Thiemo Seufer diagnosed this as a problem with the testcases: mbstate_t needs explicitly initialising to all-bits-zero with memset. After doing this:
    std::memset(&state, 0, sizeof(mbstate_t));
all the testcases work for me on powerpc and i386. Since this is not a bug, it can be closed. Sorry about that. Perhaps the libstdc++ doxygen documentation for codecvt could document that state_type/mbstate_t needs explicit initialisation before use.
Regards,
Roger
(In reply to comment #26)
> Thiemo Seufer diagnosed this as a problem with the testcases: mbstate_t needs
> explictly initialising to all-bits-zero with memset. After doing this
>
> std::memset(&state, 0, sizeof(mbstate_t));
>
> all the testcases work for me on powerpc and i386.
Funny. Actually, we still have bugs, in the testsuite only, where we never do the initialization. I will fix that. Sorry about my part of the waste of time; I'm learning some of these details along with you, since the current codecvt was contributed by other people.
>
> Since this is not a bug, it can be closed. Sorry about that. Perhaps the
> libstdc++ doxygen documentation for codecvt could document that
> state_type/mbstate_t needs explicit initialisation before use.
>
> Regards,
> Roger
Correction: our testcases are already fine, zero_state does the job... Anyway...