Consider the following test case: #include <cstddef> #include <locale> const std::size_t max_size = 10u; const char text[] = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"; int main() { std::locale loc; std::codecvt< wchar_t, char, std::mbstate_t > const& fac = std::use_facet< std::codecvt< wchar_t, char, std::mbstate_t > >(loc); std::mbstate_t mbs = std::mbstate_t(); const char* from = text; const char* from_to = from + max_size; std::size_t max = ~static_cast< std::size_t >(0u); return static_cast< std::size_t >(fac.length(mbs, from, from_to, max)); } $ g++ -g2 -O0 -o codecvt_length_bug codecvt_length_bug.cpp Running this causes a crash with a buffer overflow: Program received signal SIGABRT, Aborted. __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348011840) at ./nptl/pthread_kill.c:44 44 ./nptl/pthread_kill.c: No such file or directory. (gdb) bt #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348011840) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=140737348011840) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=140737348011840, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00007ffff7b56476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x00007ffff7b3c7f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x00007ffff7b9d6f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7cef943 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:155 #6 0x00007ffff7c4a76a in __GI___fortify_fail (msg=msg@entry=0x7ffff7cef8e9 "buffer overflow detected") at ./debug/fortify_fail.c:26 #7 0x00007ffff7c490c6 in __GI___chk_fail () at ./debug/chk_fail.c:28 #8 0x00007ffff7c4a199 in __mbsnrtowcs_chk (dst=<optimized out>, src=<optimized out>, nmc=<optimized out>, len=<optimized out>, ps=<optimized out>, dstlen=<optimized out>) at ./debug/mbsnrtowcs_chk.c:27 #9 0x00007ffff7e290d2 in std::codecvt<wchar_t, char, __mbstate_t>::do_length(__mbstate_t&, char const*, char const*, unsigned long) const () from /lib/x86_64-linux-gnu/libstdc++.so.6 #10 0x00005555555552d3 in std::__codecvt_abstract_base<wchar_t, char, __mbstate_t>::length (this=0x7ffff7f86090, __state=..., __from=0x555555556040 <text> " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~", __end=0x55555555604a <text+10> "*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~", __max=18446744073709551615) at /usr/include/c++/11/bits/codecvt.h:219 #11 0x000055555555523d in main () at codecvt_length_bug.cpp:14 The problem appears to be that std::codecvt< wchar_t, char, std::mbstate_t >::do_length() accesses characters outside the [s, s + max_size) range, apparently using the ~static_cast< std::size_t >(0u) as the size limit. This is against the do_length() definition in the C++ standard, see [locale.codecvt.virtuals]/12-14 (http://eel.is/c++draft/locale.codecvt.virtuals#lib:codecvt,do_length): Effects: The effect on the state argument is as if it called do_in(state, from, from_end, from, to, to+max, to) for to pointing to a buffer of at least max elements. That is, max is only referred to as the size of the potential output buffer, and the source buffer is specified as [from, from_end). There is no requirement for max to be within [from, from_end) bounds. If I change max to (sizeof(text) - 1u) then the buffer overflow does not happen. (As to the purpose of this code, it is supposed to calculate the size, in bytes, of the initial sequence of complete characters not larger than max_size.) $ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.2.0-19ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
Created attachment 53089 [details] Test case to reproduce the problem.
> outside the [s, s + max_size) range This should be [from, from_to) range. Sorry, posted a little too soon.
I'm not sure about the meaning of the wrong-code keyword, but just to clear, I do not consider the code sample in the bug report to be wrong (i.e. incorrect usage of the standard library). If it is wrong, please explain how.
It means "GCC (in some form) generates incorrect code for a valid testcase".
You can click on the "Keywords" link to see the meanings.
I can't reproduce a crash, or any runtime error from valgrind, ubsan, asan, ...
Reproduces for me: $ g++ -g2 -O0 -o codecvt_length_bug codecvt_length_bug.cpp $ ./codecvt_length_bug *** buffer overflow detected ***: terminated Aborted (core dumped) $ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 13.2.0-23ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-libstdcxx-backtrace --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4) gdb shows this backtrace: #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00007ffff784526e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x00007ffff78288ff in __GI_abort () at ./stdlib/abort.c:79 #5 0x00007ffff78297b6 in __libc_message_impl (fmt=fmt@entry=0x7ffff79ce765 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:132 #6 0x00007ffff7936c19 in __GI___fortify_fail (msg=msg@entry=0x7ffff79ce74c "buffer overflow detected") at ./debug/fortify_fail.c:24 #7 0x00007ffff79365d4 in __GI___chk_fail () at ./debug/chk_fail.c:28 #8 0x00007ffff7937339 in __mbsnrtowcs_chk (dst=<optimized out>, src=<optimized out>, nmc=<optimized out>, len=<optimized out>, ps=<optimized out>, dstlen=<optimized out>) at ./debug/mbsnrtowcs_chk.c:27 #9 0x00007ffff7cdad44 in std::codecvt<wchar_t, char, __mbstate_t>::do_length(__mbstate_t&, char const*, char const*, unsigned long) const () from /lib/x86_64-linux-gnu/libstdc++.so.6 #10 0x00005555555552f7 in std::__codecvt_abstract_base<wchar_t, char, __mbstate_t>::length (this=0x7ffff7e7b2f0, __state=..., __from=0x555555556040 <text> " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~", __end=0x55555555604a <text+10> "*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~", __max=18446744073709551615) at /usr/include/c++/13/bits/codecvt.h:219 #11 0x000055555555524d in main () at codecvt_length_bug.cpp:16
I think, sanitizers won't detect it unless you build libstdc++ with them enabled. I would expect valgrind to detect it though. I think, Ubuntu builds libstdc++ with _FORTIFY_SOURCE enabled, which is why it checks for pointer validity and fails.
I can reproduce it as well.
(In reply to andysem from comment #0) > The problem appears to be that std::codecvt< wchar_t, char, std::mbstate_t > >::do_length() accesses characters outside the [s, s + max_size) range, I'm pretty sure it doesn't. > apparently using the ~static_cast< std::size_t >(0u) as the size limit. That's the maximum number of output characters to write to the output buffer, it should have no effect on the number of characters read from the input buffer. If it does, that's a glibc bug because libstdc++ just calls mbsnrtowcs: size_t __conv = mbsnrtowcs(__to, &__from, __from_chunk_end - __from, __max, &__state); > That is, max is only referred to as the size of the potential output buffer, > and the source buffer is specified as [from, from_end). Again, if there's an access outside that range, it's a bug in glibc's mbsnrtowcs. The problem is that the output range is a buffer obtained from alloca: // A dummy internal buffer is needed in order for mbsnrtocws to consider // its fourth parameter (it wouldn't with NULL as first parameter). wchar_t* __to = static_cast<wchar_t*>(__builtin_alloca(sizeof(wchar_t) * __max)); For your program, that multiplication is going to wrap and give a value smaller than __max. The fortified mbsnrtowcs has: size_t __mbsnrtowcs_chk (wchar_t *dst, const char **src, size_t nmc, size_t len, mbstate_t *ps, size_t dstlen) { if (__glibc_unlikely (dstlen < len)) __chk_fail (); So this check is noticing that the size of the alloca'd buffer (destlen) is smaller than the size passed in (len). > There is no > requirement for max to be within [from, from_end) bounds. If I change max to > (sizeof(text) - 1u) then the buffer overflow does not happen. There's no overflow. It's a fortify check saying "you told me the buffer is size N but it's actually smaller than that". Nothing ever accesses outside any buffer. But when __max is small, the alloca(sizeof(wchar_t) * __max) calculation doesn't wrap and so the buffer size matches.
I can reproduce it if I compile codecvt_members.cc with _FORTIFY_SOURCE=3 I think Fedora is built with _FORTIFY_SOURCE=2 so doesn't check mbsnrtowcs. I'll test this fix: --- a/libstdc++-v3/config/locale/gnu/codecvt_members.cc +++ b/libstdc++-v3/config/locale/gnu/codecvt_members.cc @@ -232,10 +232,12 @@ namespace // in case we advance past it and then continue, in a loop. // NB: mbsnrtowcs is a GNU extension + const size_t __to_len = 1024; // Size of alloca'd output buffer + // A dummy internal buffer is needed in order for mbsnrtocws to consider // its fourth parameter (it wouldn't with NULL as first parameter). wchar_t* __to = static_cast<wchar_t*>(__builtin_alloca(sizeof(wchar_t) - * __max)); + * __to_len)); while (__from < __end && __max) { const extern_type* __from_chunk_end; @@ -248,7 +250,8 @@ namespace const extern_type* __tmp_from = __from; size_t __conv = mbsnrtowcs(__to, &__from, __from_chunk_end - __from, - __max, &__state); + __max > __to_len ? __to_len : __max, + &__state); if (__conv == static_cast<size_t>(-1)) { // In case of error, in order to stop at the exact place we
Overflow is possible though. If you call it with max = (-1ull / 4 + 2) then the alloca length will be 4. If (from_end - from) requires more than 4 wide characters, we'll overflow the alloca buffer.
(In reply to Jonathan Wakely from comment #12) > Overflow is possible though. If you call it with max = (-1ull / 4 + 2) then > the alloca length will be 4. If (from_end - from) requires more than 4 wide Actually 4 bytes, so 1 wide character. > characters, we'll overflow the alloca buffer. Except that inside mbsnrtowcs glibc does the same len * sizeof(wchar_t) calculation with the value that libstdc++ passes it, and comes up with the same value of 4. So it won't write more than 1 wide character.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:498f9aefbf36b0e3f119d634c41d86699ce6fed2 commit r15-5718-g498f9aefbf36b0e3f119d634c41d86699ce6fed2 Author: Jonathan Wakely <jwakely@redhat.com> Date: Tue Nov 26 20:07:33 2024 +0000 libstdc++: Fix unsigned wraparound in codecvt::do_length [PR105857] When the max argument to std::codecvt<wchar_t, char, mbstate_t>::length is SIZE_MAX/4+1 or greater the multiplication with sizeof(wchar_t) will wrap to a small value, and the alloca call will have a buffer that's smaller than requested. The call to mbsnrtowcs then has a buffer that is smaller than the value passed as the buffer length. When libstdc++.so is built with -D_FORTIFY_SOURCE=3 the mismatched buffer and length will get detected and will abort inside Glibc. When it doesn't abort, there's no buffer overflow because Glibc's mbsnrtowcs has the same len * sizeof(wchar_t) calculation to determine the size of the buffer in bytes, and that will wrap to the same small number as the alloca argument. So luckily Glibc agrees with the caller about the real size of the buffer, and won't overflow it. Even when the max argument isn't large enough to wrap, it can still be much too large to safely pass to alloca, so we should limit that. We already have a loop that processes chunks so that we can handle null characters in the middle of the input. If we limit the alloca buffer to 4kB then we'll just loop each time that buffer is filled. libstdc++-v3/ChangeLog: PR libstdc++/105857 * config/locale/dragonfly/codecvt_members.cc (do_length): Limit size of alloca buffer to 4k. * config/locale/gnu/codecvt_members.cc (do_length): Likewise. * testsuite/22_locale/codecvt/length/wchar_t/105857.cc: New test.
Fixed on trunk, but this code has been present since 2003 so I want to backport it to all active branches.
Thanks for the fix. And for the explanation. Yes, backport would be nice.
Is this related? Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_PROTECTION_FAILURE at address: 0x00000000 0x9221e4e0 in std::codecvt<wchar_t, char, __mbstate_t>::do_in () (gdb) bt #0 0x9221e4e0 in std::codecvt<wchar_t, char, __mbstate_t>::do_in () #1 0x921ecb10 in std::istream::_M_extract<unsigned short> () #2 0x029866e8 in Exiv2::ValueType<unsigned short>::read () #3 0x0298380c in Exiv2::Exifdatum::operator= () #4 0x0164bd90 in gexiv2_metadata_set_exif_tag_multiple () #5 0x016460ac in gexiv2_metadata_set_tag_multiple () #6 0x00f68a58 in gimp_metadata_deserialize_text () #7 0x019bb6c4 in g_markup_parse_context_parse () #8 0x00f694a0 in gimp_metadata_deserialize () #9 0x0024325c in image_set_metadata_invoker () #10 0x0028e188 in gimp_procedure_execute () #11 0x002880a0 in gimp_pdb_execute_procedure_by_name_args () #12 0x00293f3c in gimp_plug_in_handle_message () #13 0x00292124 in gimp_plug_in_recv_message () #14 0x019b3ec0 in g_main_dispatch () #15 0x019b7030 in g_main_context_iterate_unlocked.isra.0 () #16 0x019b7aac in g_main_loop_run () #17 0x00297cdc in gimp_plug_in_manager_call_run () #18 0x0029cfd0 in gimp_plug_in_procedure_execute () #19 0x0028e188 in gimp_procedure_execute () #20 0x002880a0 in gimp_pdb_execute_procedure_by_name_args () #21 0x002882e8 in gimp_pdb_execute_procedure_by_name () #22 0x00395d04 in file_open_image () #23 0x00348570 in gimp_imagefile_create_thumbnail () #24 0x003487dc in gimp_imagefile_create_thumbnail_weak () #25 0x001ee8bc in gimp_thumb_box_auto_thumbnail () #26 0x019b3ec0 in g_main_dispatch () #27 0x019b7030 in g_main_context_iterate_unlocked.isra.0 () #28 0x019b7aac in g_main_loop_run () #29 0x0000f914 in app_run () #30 0x00408928 in main () This is on powerpc-darwin with gcc 14.3.0.
Judging just by the backtrace, I don't think so. The backtrace doesn't mention codecvt::do_length(), it crashes in do_in() with a null pointer.
The releases/gcc-14 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:2c39bf386d92a0d0c4278f14ab08b67e5d728211 commit r14-12265-g2c39bf386d92a0d0c4278f14ab08b67e5d728211 Author: Jonathan Wakely <jwakely@redhat.com> Date: Tue Nov 26 20:07:33 2024 +0000 libstdc++: Fix unsigned wraparound in codecvt::do_length [PR105857] When the max argument to std::codecvt<wchar_t, char, mbstate_t>::length is SIZE_MAX/4+1 or greater the multiplication with sizeof(wchar_t) will wrap to a small value, and the alloca call will have a buffer that's smaller than requested. The call to mbsnrtowcs then has a buffer that is smaller than the value passed as the buffer length. When libstdc++.so is built with -D_FORTIFY_SOURCE=3 the mismatched buffer and length will get detected and will abort inside Glibc. When it doesn't abort, there's no buffer overflow because Glibc's mbsnrtowcs has the same len * sizeof(wchar_t) calculation to determine the size of the buffer in bytes, and that will wrap to the same small number as the alloca argument. So luckily Glibc agrees with the caller about the real size of the buffer, and won't overflow it. Even when the max argument isn't large enough to wrap, it can still be much too large to safely pass to alloca, so we should limit that. We already have a loop that processes chunks so that we can handle null characters in the middle of the input. If we limit the alloca buffer to 4kB then we'll just loop each time that buffer is filled. libstdc++-v3/ChangeLog: PR libstdc++/105857 * config/locale/dragonfly/codecvt_members.cc (do_length): Limit size of alloca buffer to 4k. * config/locale/gnu/codecvt_members.cc (do_length): Likewise. * testsuite/22_locale/codecvt/length/wchar_t/105857.cc: New test. (cherry picked from commit 498f9aefbf36b0e3f119d634c41d86699ce6fed2)
The releases/gcc-13 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:430a5720a97a2c725e3902fdc86d7419727a8522 commit r13-10054-g430a5720a97a2c725e3902fdc86d7419727a8522 Author: Jonathan Wakely <jwakely@redhat.com> Date: Tue Nov 26 20:07:33 2024 +0000 libstdc++: Fix unsigned wraparound in codecvt::do_length [PR105857] When the max argument to std::codecvt<wchar_t, char, mbstate_t>::length is SIZE_MAX/4+1 or greater the multiplication with sizeof(wchar_t) will wrap to a small value, and the alloca call will have a buffer that's smaller than requested. The call to mbsnrtowcs then has a buffer that is smaller than the value passed as the buffer length. When libstdc++.so is built with -D_FORTIFY_SOURCE=3 the mismatched buffer and length will get detected and will abort inside Glibc. When it doesn't abort, there's no buffer overflow because Glibc's mbsnrtowcs has the same len * sizeof(wchar_t) calculation to determine the size of the buffer in bytes, and that will wrap to the same small number as the alloca argument. So luckily Glibc agrees with the caller about the real size of the buffer, and won't overflow it. Even when the max argument isn't large enough to wrap, it can still be much too large to safely pass to alloca, so we should limit that. We already have a loop that processes chunks so that we can handle null characters in the middle of the input. If we limit the alloca buffer to 4kB then we'll just loop each time that buffer is filled. libstdc++-v3/ChangeLog: PR libstdc++/105857 * config/locale/dragonfly/codecvt_members.cc (do_length): Limit size of alloca buffer to 4k. * config/locale/gnu/codecvt_members.cc (do_length): Likewise. * testsuite/22_locale/codecvt/length/wchar_t/105857.cc: New test. (cherry picked from commit 498f9aefbf36b0e3f119d634c41d86699ce6fed2)
Fixed for 13.5, 14.4, and 15.1