Bug 109703 - [12/13/14 Regression] __builtin_unreachable() reached since r13-6915-gbf78b43873b0b7
Summary: [12/13/14 Regression] __builtin_unreachable() reached since r13-6915-gbf78b43...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 13.1.1
: P1 normal
Target Milestone: 12.3
Assignee: Jonathan Wakely
URL:
Keywords:
: 109706 109737 109786 110549 (view as bug list)
Depends on:
Blocks:
 
Reported: 2023-05-02 19:05 UTC by Cristian Morales Vega
Modified: 2024-07-09 09:24 UTC (History)
8 users (show)

See Also:
Host:
Target:
Build:
Known to work: 12.2.0
Known to fail: 12.2.1, 13.1.0
Last reconfirmed: 2023-05-02 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Cristian Morales Vega 2023-05-02 19:05:15 UTC
This (https://github.com/gcc-mirror/gcc/commit/bf78b43873b0b7e8f9a430df38749b8b61f9c9b8#diff-c8a656ef205ec6452ef0bed111a387dc9e7eb2404fb3222a48f9f93b2460bd55R278) __builtin_unreachable() is reached using this code

--------------------------------------
#include <boost/algorithm/string/case_conv.hpp>
#include <string>

int main() {
  auto lower = std::string{"0123456789ABCDEF"};
  auto upper = boost::algorithm::to_upper_copy(lower);
}
--------------------------------------

with Boost 1.82.

Doing "g++ -O1 -fsanitize=undefined -o main main.cpp && ./main" results in

/usr/include/c++/13/bits/basic_string.h:278:29: runtime error: execution reached an unreachable program point

This is with gcc-13.1.1-1.fc38.x86_64 from Fedora 38.


AFAICT the bug is in libstdc++.

During the first iteration, is not until https://github.com/gcc-mirror/gcc/blob/releases/gcc-13.1.0/libstdc%2B%2B-v3/include/bits/basic_string.tcc#L201 that the data pointer is replaced from the SSO to the heap one; but one line before, in

https://github.com/gcc-mirror/gcc/blob/releases/gcc-13.1.0/libstdc%2B%2B-v3/include/bits/basic_string.tcc#L200
https://github.com/gcc-mirror/gcc/blob/releases/gcc-13.1.0/libstdc%2B%2B-v3/include/bits/basic_string.h#L293
https://github.com/gcc-mirror/gcc/blob/releases/gcc-13.1.0/libstdc%2B%2B-v3/include/bits/basic_string.h#L275

, it checks whether it should destroy the data based on whether _M_data() == _M_local_data().
Comment 1 Jonathan Wakely 2023-05-02 20:35:38 UTC
See https://gcc.gnu.org/pipermail/libstdc++/2023-May/055903.html
Comment 2 Andrew Pinski 2023-05-03 02:00:19 UTC
*** Bug 109706 has been marked as a duplicate of this bug. ***
Comment 3 GCC Commits 2023-05-03 12:19:45 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:cbf6c7a1d16490a1e63e9a5ce00e9a5c44c4c2f2

commit r14-430-gcbf6c7a1d16490a1e63e9a5ce00e9a5c44c4c2f2
Author: Kefu Chai <kefu.chai@scylladb.com>
Date:   Mon May 1 21:24:26 2023 +0100

    libstdc++: Set _M_string_length before calling _M_dispose() [PR109703]
    
    This always sets _M_string_length in the constructor for ranges of input
    iterators, such as stream iterators.
    
    We copy from the source range to the local buffer, and then repeatedly
    reallocate a larger one if necessary. When disposing the old buffer,
    _M_is_local() is used to tell if the buffer is the local one or not (and
    so must be deallocated). In addition to comparing the buffer address
    with the local buffer, _M_is_local() has an optimization hint so that
    the compiler knows that for a string using the local buffer, there is an
    invariant that _M_string_length <= _S_local_capacity (added for PR109299
    via r13-6915-gbf78b43873b0b7).  But we failed to set _M_string_length in
    the constructor taking a pair of iterators, so the invariant might not
    hold, and __builtin_unreachable() is reached. This causes UBsan errors,
    and potentially misoptimization.
    
    To ensure the invariant holds, _M_string_length is initialized to zero
    before doing anything else, so that _M_is_local() doesn't see an
    uninitialized value.
    
    This issue only surfaces when constructing a string with a range of
    input iterator, and the uninitialized _M_string_length happens to be
    greater than _S_local_capacity, i.e., 15 for the std::string
    specialization.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/109703
            * include/bits/basic_string.h (basic_string(Iter, Iter, Alloc)):
            Initialize _M_string_length.
    
    Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
    Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
Comment 4 GCC Commits 2023-05-03 12:24:44 UTC
The releases/gcc-13 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:d50f2599d7b23bdba05a9102645d082ed9bcb05f

commit r13-7282-gd50f2599d7b23bdba05a9102645d082ed9bcb05f
Author: Kefu Chai <kefu.chai@scylladb.com>
Date:   Mon May 1 21:24:26 2023 +0100

    libstdc++: Set _M_string_length before calling _M_dispose() [PR109703]
    
    This always sets _M_string_length in the constructor for ranges of input
    iterators, such as stream iterators.
    
    We copy from the source range to the local buffer, and then repeatedly
    reallocate a larger one if necessary. When disposing the old buffer,
    _M_is_local() is used to tell if the buffer is the local one or not (and
    so must be deallocated). In addition to comparing the buffer address
    with the local buffer, _M_is_local() has an optimization hint so that
    the compiler knows that for a string using the local buffer, there is an
    invariant that _M_string_length <= _S_local_capacity (added for PR109299
    via r13-6915-gbf78b43873b0b7).  But we failed to set _M_string_length in
    the constructor taking a pair of iterators, so the invariant might not
    hold, and __builtin_unreachable() is reached. This causes UBsan errors,
    and potentially misoptimization.
    
    To ensure the invariant holds, _M_string_length is initialized to zero
    before doing anything else, so that _M_is_local() doesn't see an
    uninitialized value.
    
    This issue only surfaces when constructing a string with a range of
    input iterator, and the uninitialized _M_string_length happens to be
    greater than _S_local_capacity, i.e., 15 for the std::string
    specialization.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/109703
            * include/bits/basic_string.h (basic_string(Iter, Iter, Alloc)):
            Initialize _M_string_length.
    
    Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
    Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
    (cherry picked from commit cbf6c7a1d16490a1e63e9a5ce00e9a5c44c4c2f2)
Comment 5 GCC Commits 2023-05-03 13:24:40 UTC
The releases/gcc-12 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:04fbfafbd9657a36e8c3e80708a18fba08136854

commit r12-9508-g04fbfafbd9657a36e8c3e80708a18fba08136854
Author: Kefu Chai <kefu.chai@scylladb.com>
Date:   Mon May 1 21:24:26 2023 +0100

    libstdc++: Set _M_string_length before calling _M_dispose() [PR109703]
    
    This always sets _M_string_length in the constructor for ranges of input
    iterators, such as stream iterators.
    
    We copy from the source range to the local buffer, and then repeatedly
    reallocate a larger one if necessary. When disposing the old buffer,
    _M_is_local() is used to tell if the buffer is the local one or not (and
    so must be deallocated). In addition to comparing the buffer address
    with the local buffer, _M_is_local() has an optimization hint so that
    the compiler knows that for a string using the local buffer, there is an
    invariant that _M_string_length <= _S_local_capacity (added for PR109299
    via r13-6915-gbf78b43873b0b7).  But we failed to set _M_string_length in
    the constructor taking a pair of iterators, so the invariant might not
    hold, and __builtin_unreachable() is reached. This causes UBsan errors,
    and potentially misoptimization.
    
    To ensure the invariant holds, _M_string_length is initialized to zero
    before doing anything else, so that _M_is_local() doesn't see an
    uninitialized value.
    
    This issue only surfaces when constructing a string with a range of
    input iterator, and the uninitialized _M_string_length happens to be
    greater than _S_local_capacity, i.e., 15 for the std::string
    specialization.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/109703
            * include/bits/basic_string.h (basic_string(Iter, Iter, Alloc)):
            Initialize _M_string_length.
    
    Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
    Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
    (cherry picked from commit cbf6c7a1d16490a1e63e9a5ce00e9a5c44c4c2f2)
Comment 6 Jonathan Wakely 2023-05-03 13:25:03 UTC
Fixed for 12.3 and 13.2, thanks for the report.
Comment 7 Andrew Pinski 2023-05-04 15:43:53 UTC
*** Bug 109737 has been marked as a duplicate of this bug. ***
Comment 8 Jonathan Wakely 2023-05-09 11:22:24 UTC
*** Bug 109786 has been marked as a duplicate of this bug. ***
Comment 9 Jonathan Wakely 2023-07-04 16:41:00 UTC
*** Bug 110549 has been marked as a duplicate of this bug. ***