Bug 105857 - codecvt::do_length causes unexpected buffer overflow
Summary: codecvt::do_length causes unexpected buffer overflow
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 11.2.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on:
Reported: 2022-06-05 15:12 UTC by andysem
Modified: 2022-06-07 10:05 UTC
CC List: 1 user

See Also:
Known to work:
Known to fail:
Last reconfirmed: 2022-06-07 00:00:00

Test case to reproduce the problem. (349 bytes, text/x-csrc)
2022-06-05 15:13 UTC, andysem

Description andysem 2022-06-05 15:12:18 UTC
Consider the following test case:

#include <cstddef>
#include <locale>

const std::size_t max_size = 10u;
const char text[] = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

int main()
{
    std::locale loc;
    std::codecvt< wchar_t, char, std::mbstate_t > const& fac =
        std::use_facet< std::codecvt< wchar_t, char, std::mbstate_t > >(loc);
    std::mbstate_t mbs = std::mbstate_t();
    const char* from = text;
    const char* from_to = from + max_size;
    std::size_t max = ~static_cast< std::size_t >(0u);
    return static_cast< std::size_t >(fac.length(mbs, from, from_to, max));
}

$ g++ -g2 -O0 -o codecvt_length_bug codecvt_length_bug.cpp

Running this aborts with a buffer overflow reported by glibc's fortify checks (__mbsnrtowcs_chk calls __chk_fail):

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348011840) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348011840) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737348011840) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737348011840, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7b56476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7b3c7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7b9d6f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7cef943 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7c4a76a in __GI___fortify_fail (msg=msg@entry=0x7ffff7cef8e9 "buffer overflow detected") at ./debug/fortify_fail.c:26
#7  0x00007ffff7c490c6 in __GI___chk_fail () at ./debug/chk_fail.c:28
#8  0x00007ffff7c4a199 in __mbsnrtowcs_chk (dst=<optimized out>, src=<optimized out>, nmc=<optimized out>, len=<optimized out>, ps=<optimized out>, dstlen=<optimized out>) at ./debug/mbsnrtowcs_chk.c:27
#9  0x00007ffff7e290d2 in std::codecvt<wchar_t, char, __mbstate_t>::do_length(__mbstate_t&, char const*, char const*, unsigned long) const () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00005555555552d3 in std::__codecvt_abstract_base<wchar_t, char, __mbstate_t>::length (this=0x7ffff7f86090, __state=..., __from=0x555555556040 <text> " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~", 
    __end=0x55555555604a <text+10> "*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~", __max=18446744073709551615) at /usr/include/c++/11/bits/codecvt.h:219
#11 0x000055555555523d in main () at codecvt_length_bug.cpp:14

The problem appears to be that std::codecvt< wchar_t, char, std::mbstate_t >::do_length() accesses characters outside the [s, s + max_size) range, apparently using ~static_cast< std::size_t >(0u) as the size limit. This contradicts the definition of do_length() in the C++ standard, see [locale.codecvt.virtuals]/12-14 (http://eel.is/c++draft/locale.codecvt.virtuals#lib:codecvt,do_length):

Effects: The effect on the state argument is as if it called do_in(state, from, from_end, from, to, to+max, to) for to pointing to a buffer of at least max elements.

That is, max is referred to only as the size of the potential output buffer, while the source buffer is specified as [from, from_end). Nothing requires max to be no greater than from_end - from. If I change max to (sizeof(text) - 1u), the buffer overflow does not happen.
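To illustrate the quoted wording, here is a sketch of what length() is specified to compute: it drives in() over [from, from_end) using a small fixed-size scratch buffer, so max only caps how many wide characters are produced and never decides how much memory is touched. The helper name length_via_in and the 64-element chunk size are hypothetical, and this is not how libstdc++ implements do_length().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>

using wide_cvt = std::codecvt<wchar_t, char, std::mbstate_t>;

// Hypothetical helper (not the libstdc++ implementation): returns the number
// of bytes in [from, from_end) that in() would consume while producing at most
// `max` wide characters. Output goes to a small scratch buffer in chunks, so
// `max` only limits the character count and never sizes an allocation.
int length_via_in(const wide_cvt& fac, std::mbstate_t& state,
                  const char* from, const char* from_end, std::size_t max)
{
    assert(from <= from_end);
    constexpr std::size_t buf_size = 64;  // arbitrary chunk size
    wchar_t buf[buf_size];                // converted characters are discarded
    const char* cur = from;
    while (max > 0 && cur < from_end)
    {
        // Never request more than the remaining `max` characters.
        const std::size_t chunk = std::min(max, buf_size);
        const char* from_next = cur;
        wchar_t* to_next = buf;
        const std::codecvt_base::result r =
            fac.in(state, cur, from_end, from_next, buf, buf + chunk, to_next);
        const std::size_t produced = static_cast<std::size_t>(to_next - buf);
        const bool progressed = from_next != cur || produced != 0;
        max -= produced;
        cur = from_next;
        // Continue only when the chunk was filled: `ok` means the input is
        // exhausted, `partial` with room to spare means an incomplete trailing
        // character, and `error` stops at the first invalid sequence.
        if (!progressed || r != std::codecvt_base::partial || produced != chunk)
            break;
    }
    return static_cast<int>(cur - from);
}
```

With the arguments from the test case (max = ~static_cast< std::size_t >(0u)), this loop reads only the ten bytes in [from, from_to) and, in the classic "C" locale, should return 10.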

(As to the purpose of this code, it is supposed to calculate the size, in bytes, of the initial sequence of complete characters not larger than max_size.)
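Since only the byte range matters for this purpose, a possible caller-side workaround is to clamp max to from_end - from before calling length(). This assumes that every wide character is decoded from at least one input byte, so a limit of from_end - from characters can never be the binding one and clamping should not change the result. The helper name bounded_length is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>

using wide_cvt = std::codecvt<wchar_t, char, std::mbstate_t>;

// Hypothetical workaround: assuming each wide character consumes at least one
// byte, no more than (from_end - from) characters can be produced, so clamping
// `max` to that count should leave the result unchanged while keeping it small.
int bounded_length(const wide_cvt& fac, std::mbstate_t& state,
                   const char* from, const char* from_end, std::size_t max)
{
    assert(from <= from_end);
    const std::size_t input_bytes = static_cast<std::size_t>(from_end - from);
    return fac.length(state, from, from_end, std::min(max, input_bytes));
}
```

In the test case, calling bounded_length(fac, mbs, from, from_to, max) instead of fac.length(...) passes 10 rather than ~static_cast< std::size_t >(0u) as the limit.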

$ g++ -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.2.0-19ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
Comment 1 andysem 2022-06-05 15:13:14 UTC
Created attachment 53089
Test case to reproduce the problem.
Comment 2 andysem 2022-06-05 15:15:08 UTC
> outside the [s, s + max_size) range

This should be [from, from_to) range. Sorry, posted a little too soon.