Bug 109849 - suboptimal code for vector walking loop
Summary: suboptimal code for vector walking loop
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 13.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 110378 110379 110414 114821 110287 110377 112653
Blocks: std::vector 109811
  Show dependency treegraph
 
Reported: 2023-05-13 22:26 UTC by Jan Hubicka
Modified: 2025-01-16 14:01 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-05-15 00:00:00


Attachments
patch for the missed hoisting (1.88 KB, patch)
2023-05-18 11:54 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Hubicka 2023-05-13 22:26:04 UTC
jan@localhost:/tmp> cat t.C
#include <vector>
typedef unsigned int uint32_t;
std::vector<std::pair<uint32_t, uint32_t>> stack;
void
test()
{
        while (!stack.empty()) {
                std::pair<uint32_t, uint32_t> cur = stack.back();
                stack.pop_back();
                if (cur.second)
                        break;
        }
}
jan@localhost:/tmp> gcc t.C -O3 -S 

yields to:

_Z4testv:
.LFB1264:
        .cfi_startproc
        movq    stack(%rip), %rcx
        movq    stack+8(%rip), %rax
        jmp     .L5
        .p2align 4,,10
        .p2align 3
.L6:
        movl    -4(%rax), %edx
        subq    $8, %rax
        movq    %rax, stack+8(%rip)
        testl   %edx, %edx
        jne     .L4
.L5:
        cmpq    %rax, %rcx
        jne     .L6
.L4:
        ret

We really should order the basic blocks putting cmpq before L6 saving a jump.
Moreover clang does

        .p2align        4, 0x90
.LBB1_1:                                # =>This Inner Loop Header: Depth=1
        cmpq    %rax, %rcx
        je      .LBB1_3
# %bb.2:                                #   in Loop: Header=BB1_1 Depth=1
        cmpl    $0, -4(%rcx)
        leaq    -8(%rcx), %rcx
        movq    %rcx, stack+8(%rip)
        je      .LBB1_1
.LBB1_3:
        retq

saving an instruction. Why we do not move stack+8 updating out of the loop?
Comment 1 Andrew Pinski 2023-05-13 22:32:54 UTC
Actually why didn't we copy the loop header in the first place?
Comment 2 Andrew Pinski 2023-05-13 22:40:05 UTC
(In reply to Jan Hubicka from comment #0)
> saving an instruction. Why we do not move stack+8 updating out of the loop?

Maybe because of a clobber:
  cur$second_5 = MEM[(const struct pairD.26349 &)_7 + 18446744073709551608].secondD.27577;
  # PT = nonlocal escaped 
  _4 = _7 + 18446744073709551608;
  # .MEM_9 = VDEF <.MEM_1>
  stackD.26352.D.27437._M_implD.26667.D.26744._M_finishD.26670 = _4;
  # .MEM_10 = VDEF <.MEM_9>
  MEM[(struct pairD.26349 *)_7 + -8B] ={v} {CLOBBER};
Comment 3 Alexander Monakov 2023-05-14 05:57:09 UTC
Rather, because store-motion out of a loop that might iterate zero times would create a data race.
Comment 4 Jan Hubicka 2023-05-14 09:58:39 UTC
> Rather, because store-motion out of a loop that might iterate zero times would
> create a data race.
Good point.  If we did copy loop headers all the way to the store the
problem will go away.  Also I assume we can still add a flag which is
set to true if loops iterates and then make store conditional...
Comment 5 Jan Hubicka 2023-05-14 10:01:09 UTC
> Actually why didn't we copy the loop header in the first place?
Because it is considered to be do-while loop already (thanks to
the in-loop conitional, do_while_loop_p is happy).
Comment 6 Jan Hubicka 2023-05-17 14:53:38 UTC
Here is slightly improved testcase which actually pushes into stack and measures something. It test loops 1000 times and returns.  It also makes stack to be local variable so race conditions are not a problem.

#include <vector>
typedef unsigned int uint32_t;
std::pair<uint32_t, uint32_t> pair;
void
test()
{
        std::vector<std::pair<uint32_t, uint32_t>> stack;
        stack.push_back (pair);
        while (!stack.empty()) {
                std::pair<uint32_t, uint32_t> cur = stack.back();
                stack.pop_back();
                if (!cur.first)
                {
                        cur.second++;
                        stack.push_back (cur);
                }
                if (cur.second > 10000)
                        break;
        }
}
int
main()
{
        for (int i = 0; i < 10000; i++)
          test();
}

Clang code is about twice as fast

jan@localhost:/tmp> clang++ -O2 tt.C  -fno-exceptions
jan@localhost:/tmp> g++ -O2 tt.C  -fno-exceptions -o a.out-gcc
jan@localhost:/tmp> perf stat ./a.out

 Performance counter stats for './a.out':

            434.24 msec task-clock:u                     #    0.997 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
               129      page-faults:u                    #  297.073 /sec                      
     1,003,191,657      cycles:u                         #    2.310 GHz                       
            68,927      stalled-cycles-frontend:u        #    0.01% frontend cycles idle      
       800,792,619      stalled-cycles-backend:u         #   79.82% backend cycles idle       
     1,904,682,933      instructions:u                   #    1.90  insn per cycle            
                                                  #    0.42  stalled cycles per insn   
       500,912,196      branches:u                       #    1.154 G/sec                     
            23,144      branch-misses:u                  #    0.00% of all branches           

       0.435340389 seconds time elapsed

       0.431409000 seconds user
       0.003994000 seconds sys


jan@localhost:/tmp> perf stat ./a.out-gcc

 Performance counter stats for './a.out-gcc':

          1,197.28 msec task-clock:u                     #    0.999 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
               131      page-faults:u                    #  109.415 /sec                      
     2,903,995,656      cycles:u                         #    2.425 GHz                       
            86,204      stalled-cycles-frontend:u        #    0.00% frontend cycles idle      
     2,690,907,052      stalled-cycles-backend:u         #   92.66% backend cycles idle       
     2,005,212,311      instructions:u                   #    0.69  insn per cycle            
                                                  #    1.34  stalled cycles per insn   
       401,007,320      branches:u                       #  334.932 M/sec                     
            23,290      branch-misses:u                  #    0.01% of all branches           

       1.198388186 seconds time elapsed

       1.198450000 seconds user
       0.000000000 seconds sys


The problem seems to be, like in first example, that we keep updating in-memory stack in the main loop.

.L39:
        movl    12(%rsp), %ebx
.L30:
        movq    16(%rsp), %rax
        cmpl    $10000, %ebx
        ja      .L33
.L40:
        movq    24(%rsp), %rdi
        cmpq    %rdi, %rax
        je      .L28
.L34:
        movq    -8(%rdi), %rax
        leaq    -8(%rdi), %rsi
        movq    %rsi, 24(%rsp)
        movq    %rax, 8(%rsp)
        testl   %eax, %eax
        jne     .L39

While clang does:

.LBB0_1:                                #   in Loop: Header=BB0_4 Depth=1
        movq    %rax, %r14
.LBB0_2:                                #   in Loop: Header=BB0_4 Depth=1
        movq    %rbx, %r12
        movq    %r12, %rbx
        cmpl    $10001, %r13d                   # imm = 0x2711
        jae     .LBB0_27
.LBB0_4:                                # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_16 Depth 2
                                        #     Child Loop BB0_21 Depth 2
        cmpq    %r14, %rbx
        je      .LBB0_26
# %bb.5:                                #   in Loop: Header=BB0_4 Depth=1
        leaq    -8(%r14), %rax
        movq    -8(%r14), %rcx
        movq    %rcx, %r13
        shrq    $32, %r13
        testl   %ecx, %ecx
        jne     .LBB0_1
Comment 7 Richard Biener 2023-05-17 20:36:58 UTC
There is nothing to sink really, loop header copying introduces a PHI and there's not partial redundancies but only partial-partial and those are not
obvious to CSE because of the introduced PHI.

I believe we have to teach SRA to decompose 'cur' and maybe also 'stack', there's no scalar optimization going to do it also because we have aggregate
copies involved.
Comment 8 Jan Hubicka 2023-05-18 09:35:59 UTC
We can only SRA if the address is non-escaping.  Clang does not seem to need it to optimize better:

jan@localhost:~> cat t.c
extern void q(int *);
__attribute__ ((noinline))
void
test()
{
        for (int a = 0; a < 1000;a++)
                if (!(a%100))
                        q(&a);
}
int
main()
{
        for (int a = 0; a < 1000000;a++)
                test ();
}
jan@localhost:~> cat t2.c
void q(int *a)
{
}
jan@localhost:~> gcc -O2 t.c t2.c ; perf stat ./a.out

 Performance counter stats for './a.out':

          2,916.73 msec task-clock:u                     #    0.999 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
                52      page-faults:u                    #   17.828 /sec                      
     8,344,719,833      cycles:u                         #    2.861 GHz                       
        13,561,375      stalled-cycles-frontend:u        #    0.16% frontend cycles idle      
     5,128,112,757      stalled-cycles-backend:u         #   61.45% backend cycles idle       
    10,050,172,242      instructions:u                   #    1.20  insn per cycle            
                                                  #    0.51  stalled cycles per insn   
     2,034,043,082      branches:u                       #  697.370 M/sec                     
        11,186,312      branch-misses:u                  #    0.55% of all branches           

       2.918344737 seconds time elapsed

       2.917844000 seconds user
       0.000000000 seconds sys


jan@localhost:~> clang -O2 t.c t2.c ; perf stat ./a.out

 Performance counter stats for './a.out':

            664.40 msec task-clock:u                     #    0.999 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
                54      page-faults:u                    #   81.276 /sec                      
     2,318,095,848      cycles:u                         #    3.489 GHz                       
        10,417,694      stalled-cycles-frontend:u        #    0.45% frontend cycles idle      
     1,057,731,301      stalled-cycles-backend:u         #   45.63% backend cycles idle       
    10,062,172,840      instructions:u                   #    4.34  insn per cycle            
                                                  #    0.11  stalled cycles per insn   
     2,034,042,724      branches:u                       #    3.061 G/sec                     
        10,003,620      branch-misses:u                  #    0.49% of all branches           

       0.665267996 seconds time elapsed

       0.665247000 seconds user
       0.000000000 seconds sys


We do:

        jmp     .L3
        .p2align 4,,10
        .p2align 3
.L2:
        movl    12(%rsp), %eax
        addl    $1, %eax
        movl    %eax, 12(%rsp)
        cmpl    $999, %eax
        jg      .L7
.L3:
        imull   $-1030792151, %eax, %eax
        addl    $85899344, %eax
        rorl    $2, %eax
        cmpl    $42949672, %eax
        ja      .L2
        leaq    12(%rsp), %rdi
        call    q
        jmp     .L2

Which has stupid store-to-load dpendency in the internal loop. Clang keeps the store but optimizes away the load:

        jmp     .LBB0_1
        .p2align        4, 0x90
.LBB0_3:                                #   in Loop: Header=BB0_1 Depth=1
        leal    1(%rax), %ecx
        movl    %ecx, 12(%rsp)
        cmpl    $999, %eax                      # imm = 0x3E7
        movl    %ecx, %eax
        jge     .LBB0_4
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        imull   $-1030792151, %eax, %ecx        # imm = 0xC28F5C29
        addl    $85899344, %ecx                 # imm = 0x51EB850
        rorl    $2, %ecx
        cmpl    $42949672, %ecx                 # imm = 0x28F5C28
        ja      .LBB0_3
# %bb.2:                                #   in Loop: Header=BB0_1 Depth=1
        movq    %rbx, %rdi
        callq   q@PLT
        movl    12(%rsp), %eax
        jmp     .LBB0_3

Wonder what makes clang to think it needs @PLT though.
Why we do not consider the load as partially redundant with itself?
Comment 9 Richard Biener 2023-05-18 11:54:44 UTC
Created attachment 55110 [details]
patch for the missed hoisting

For the testcase in comment#6 there is a missing code hoisting from PRE
which is caused by do_hoist_insertion doing

  /* A hoistable value must be in ANTIC_IN(block)
     but not in AVAIL_OUT(BLOCK).  */
  bitmap_initialize (&hoistable_set.values, &grand_bitmap_obstack);
  bitmap_and_compl (&hoistable_set.values,
                    &ANTIC_IN (block)->values, &AVAIL_OUT (block)->values);

but in reality we want to check ANTIC_OUT(block), not ANTIC_IN(block).
cur.second is killed by the aggregate assignment to cur at the beginning
of the block we should hoist to and that's reflected in ANTIC_IN.

The attached patch properly re-computes ANTIC_OUT and uses that.
Comment 10 Jan Hubicka 2023-05-18 13:00:48 UTC
Thanks. I tested the patch on jpegxl and it does not help there (I guess becuase the redundancy there is partial). But it is cool we compile at least the simplified testcase well.
Comment 11 GCC Commits 2023-05-23 09:57:57 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:9e2017ae6ac788d3e36999bb0f0d20ea0f62c20e

commit r14-1127-g9e2017ae6ac788d3e36999bb0f0d20ea0f62c20e
Author: Richard Biener <rguenther@suse.de>
Date:   Thu May 18 13:52:29 2023 +0200

    tree-optimization/109849 - missed code hoisting
    
    The following fixes code hoisting to properly consider ANTIC_OUT instead
    of ANTIC_IN.  That's a bit expensive to re-compute but since we no
    longer iterate we're doing this only once per BB which should be
    acceptable.  This avoids missing hoistings to the end of blocks where
    something in the block clobbers the hoisted value.
    
            PR tree-optimization/109849
            * tree-ssa-pre.cc (do_hoist_insertion): Compute ANTIC_OUT
            and use that to determine what to hoist.
    
            * gcc.dg/tree-ssa/ssa-hoist-8.c: New testcase.
Comment 12 Richard Biener 2023-05-23 10:10:14 UTC
So this fixed the missing code hoisting - partial PRE is done with -O3 only.
Comment 13 GCC Commits 2023-05-24 11:19:41 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:5476de2618ffb77f3a52e59e2c9f10b018329689

commit r14-1161-g5476de2618ffb77f3a52e59e2c9f10b018329689
Author: Richard Biener <rguenther@suse.de>
Date:   Wed May 24 12:36:28 2023 +0200

    tree-optimization/109849 - fix fallout of PRE hoisting change
    
    The PR109849 fix made us no longer hoist some memory loads because
    of the expression set intersection.  We can still avoid to compute
    the union by simply taking the first sets expressions and leave
    the pruning of expressions with values not suitable for hoisting
    to sorted_array_from_bitmap_set.
    
            PR tree-optimization/109849
            * tree-ssa-pre.cc (do_hoist_insertion): Do not intersect
            expressions but take the first sets.
    
            * gcc.dg/tree-ssa/ssa-hoist-9.c: New testcase.
Comment 14 Jan Hubicka 2023-06-16 14:20:03 UTC
One interesting situation is:
void std::vector<std::pair<unsigned int, unsigned int> >::push_back (struct vector * const this, const struct value_type & __x)
{
  struct __normal_iterator D.27894;
  struct pair * _1;
  struct pair * _2;
  struct pair * _3;

  <bb 2> [local count: 1073741824]:
  _1 = this_6(D)->D.26707._M_impl.D.26014._M_finish;
  _2 = this_6(D)->D.26707._M_impl.D.26014._M_end_of_storage;
  if (_1 != _2)
    goto <bb 3>; [82.57%]
  else
    goto <bb 4>; [17.43%]

  <bb 3> [local count: 886588625]:
  *_1 = MEM[(const struct pair &)__x_7(D)];
  _3 = _1 + 8;
  this_6(D)->D.26707._M_impl.D.26014._M_finish = _3;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 187153200]:
  D.27894._M_current = _1;
  std::vector<std::pair<unsigned int, unsigned int> >::_M_realloc_insert<const std::pair<unsigned int, unsigned int>&> (this_6(D), D.27894, __x_7(D));

  <bb 5> [local count: 1073741824]:
  return;

}
here we could do partial inlining and offline the call to _M_realloc_insert but we fail to cut since _1 is already load:

Split point at BB 4
  header time: 9.302800 header size: 9
  split time: 2.440200 split size: 5
  bbs: 4
  SSA names to pass: 1, 9, 11
  Refused: need to pass non-param values 

It should be easy to insert code loading the parameter again in the split part.
We still hit the SRA limitation since this would be still escaping, but it is another missed optimization on this simple testcase.
Comment 15 GCC Commits 2023-06-18 16:59:18 UTC
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:5a1ef1cfac005370d0a5a0f85798724cb2c9cf5e

commit r14-1909-g5a1ef1cfac005370d0a5a0f85798724cb2c9cf5e
Author: Honza <jh@ryzen3.suse.cz>
Date:   Sun Jun 18 18:58:26 2023 +0200

    Analyze SRA candidates in ipa-fnsummary
    
    this patch extends ipa-fnsummary to anticipate statements that will be removed
    by SRA.  This is done by looking for calls passing addresses of automatic
    variables.  In function body we look for dereferences from pointers of such
    variables and mark them with new not_sra_candidate condition.
    
    This is just first step which is overly optimistic.  We do not try to prove that
    given automatic variable will not be SRAed even after inlining.  We now also
    optimistically assume that the transformation will always happen.  I will restrict
    this in a followup patch, but I think it is useful to gether some data on how
    much code is affected by this.
    
    This is motivated by PR109849 where we fail to fully inline push_back.
    The patch alone does not solve the problem even for -O3, but improves
    analysis in this case.
    
    gcc/ChangeLog:
    
            PR tree-optimization/109849
            * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Add new parameter
            ES; handle ipa_predicate::not_sra_candidate.
            (evaluate_properties_for_edge): Pass es to
            evaluate_conditions_for_known_args.
            (ipa_fn_summary_t::duplicate): Handle sra candidates.
            (dump_ipa_call_summary): Dump points_to_possible_sra_candidate.
            (load_or_store_of_ptr_parameter): New function.
            (points_to_possible_sra_candidate_p): New function.
            (analyze_function_body): Initialize points_to_possible_sra_candidate;
            determine sra predicates.
            (estimate_ipcp_clone_size_and_time): Update call of
            evaluate_conditions_for_known_args.
            (remap_edge_params): Update points_to_possible_sra_candidate.
            (read_ipa_call_summary): Stream points_to_possible_sra_candidate
            (write_ipa_call_summary): Likewise.
            * ipa-predicate.cc (ipa_predicate::add_clause): Handle not_sra_candidate.
            (dump_condition): Dump it.
            * ipa-predicate.h (struct inline_param_summary): Add
            points_to_possible_sra_candidate.
    
    gcc/testsuite/ChangeLog:
    
            PR tree-optimization/109849
            * g++.dg/ipa/devirt-45.C: Update template.
Comment 16 GCC Commits 2023-06-19 16:28:57 UTC
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:7b34cacc5735385e7e2855d7c0a6fad60ef4a99b

commit r14-1951-g7b34cacc5735385e7e2855d7c0a6fad60ef4a99b
Author: Jan Hubicka <jh@suse.cz>
Date:   Mon Jun 19 18:28:17 2023 +0200

    optimize std::max early
    
    we currently produce very bad code on loops using std::vector as a stack, since
    we fail to inline push_back which in turn prevents SRA and we fail to optimize
    out some store-to-load pairs.
    
    I looked into why this function is not inlined and it is inlined by clang.  We
    currently estimate it to 66 instructions and inline limits are 15 at -O2 and 30
    at -O3.  Clang has similar estimate, but still decides to inline at -O2.
    
    I looked into reason why the body is so large and one problem I spotted is the
    way std::max is implemented by taking and returning reference to the values.
    
      const T& max( const T& a, const T& b );
    
    This makes it necessary to store the values to memory and load them later
    and max is used by code computing new size of vector on resize.
    
    We optimize this to MAX_EXPR, but only during late optimizations.  I think this
    is a common enough coding pattern and we ought to make this transparent to
    early opts and IPA.  The following is easist fix that simply adds phiprop pass
    that turns the PHI of address values into PHI of values so later FRE can
    propagate values across memory, phiopt discover the MAX_EXPR pattern and DSE
    remove the memory stores.
    
    gcc/ChangeLog:
    
            PR tree-optimization/109811
            PR tree-optimization/109849
            * passes.def: Add phiprop to early optimization passes.
            * tree-ssa-phiprop.cc: Allow clonning.
    
    gcc/testsuite/ChangeLog:
    
            PR tree-optimization/109811
            PR tree-optimization/109849
            * gcc.dg/tree-ssa/phiprop-1.c: New test.
            * gcc.dg/tree-ssa/pr21463.c: Adjust template.
Comment 17 GCC Commits 2023-06-26 16:30:06 UTC
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:c2ebccc97190a978a44e341516b488f02a78c598

commit r14-2101-gc2ebccc97190a978a44e341516b488f02a78c598
Author: Jan Hubicka <jh@suse.cz>
Date:   Mon Jun 26 18:29:39 2023 +0200

    Fix profile of forwarders produced by cd-dce
    
    compiling the testcase from PR109849 (which uses std:vector based stack to
    drive a loop) with profile feedbakc leads to profile mismatches introduced by
    tree-ssa-dce.  This is the new code to produce unified forwarder blocks for
    PHIs.
    
    I am not including the testcase itself since
    checking it for Invalid sum is probably going to be too fragile and this should
    show in our LNT testers. The patch however fixes the mismatch.
    
    Bootstrapped/regtested x86_64-linux and plan to commit it shortly.
    
    gcc/ChangeLog:
    
            PR tree-optimization/109849
            * tree-ssa-dce.cc (make_forwarders_with_degenerate_phis): Fix profile
            count of newly constructed forwarder block.
Comment 18 GCC Commits 2023-06-28 09:47:29 UTC
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:45c53768b6fa3d737ae818e31d3c50da62e0ad2b

commit r14-2157-g45c53768b6fa3d737ae818e31d3c50da62e0ad2b
Author: Jan Hubicka <jh@suse.cz>
Date:   Wed Jun 28 11:45:15 2023 +0200

    Add cold attribute to throw wrappers and terminate
    
            PR middle-end/109849
            * include/bits/c++config (std::__terminate): Mark cold.
            * include/bits/functexcept.h: Mark everything as cold.
            * libsupc++/exception: Mark terminate and unexpected as cold.
Comment 19 GCC Commits 2023-06-29 20:46:34 UTC
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:9dc18fca431626404b0692c689a2e103666e7adb

commit r14-2202-g9dc18fca431626404b0692c689a2e103666e7adb
Author: Jan Hubicka <jh@suse.cz>
Date:   Thu Jun 29 22:45:37 2023 +0200

    Compute ipa-predicates for conditionals involving __builtin_expect_p
    
    std::vector allocator looks as follows:
    
    __attribute__((nodiscard))
    struct pair * std::__new_allocator<std::pair<unsigned int, unsigned int> >::allocate (struct __new_allocator * const this, size_type __n, const void * D.27753)
    {
      bool _1;
      long int _2;
      long int _3;
      long unsigned int _5;
      struct pair * _9;
    
      <bb 2> [local count: 1073741824]:
      _1 = __n_7(D) > 1152921504606846975;
      _2 = (long int) _1;
      _3 = __builtin_expect (_2, 0);
      if (_3 != 0)
        goto <bb 3>; [10.00%]
      else
        goto <bb 6>; [90.00%]
    
      <bb 3> [local count: 107374184]:
      if (__n_7(D) > 2305843009213693951)
        goto <bb 4>; [50.00%]
      else
        goto <bb 5>; [50.00%]
    
      <bb 4> [local count: 53687092]:
      std::__throw_bad_array_new_length ();
    
      <bb 5> [local count: 53687092]:
      std::__throw_bad_alloc ();
    
      <bb 6> [local count: 966367641]:
      _5 = __n_7(D) * 8;
      _9 = operator new (_5);
      return _9;
    }
    
    So there is check for allocated block size being greater than max_size which is
    wrapper in __builtin_expect.  This makes ipa-fnsummary to give up analyzing
    predicates and it will miss the fact that the two different calls to __throw
    will be optimized out if __n is larady smaller than 1152921504606846975 which
    it is after _M_check_len.
    
    This patch extends ipa-fnsummary to understand functions that return their
    parameter.
    
    gcc/ChangeLog:
    
            PR tree-optimization/109849
            * ipa-fnsummary.cc (decompose_param_expr): Skip
            functions returning its parameter.
            (set_cond_stmt_execution_predicate): Return early
            if predicate was constructed.
    
    gcc/testsuite/ChangeLog:
    
            PR tree-optimization/109849
            * gcc.dg/ipa/pr109849.c: New test.
Comment 20 GCC Commits 2023-06-30 14:28:14 UTC
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:eab57b825bcc350e9ff44eb2fa739a80199d9bb1

commit r14-2219-geab57b825bcc350e9ff44eb2fa739a80199d9bb1
Author: Jan Hubicka <jh@suse.cz>
Date:   Fri Jun 30 16:27:27 2023 +0200

    Fix handling of __builtin_expect_with_probability and improve first-match heuristics
    
    While looking into the std::vector _M_realloc_insert codegen I noticed that
    call of __throw_bad_alloc is predicted with 10% probability. This is because
    the conditional guarding it has __builtin_expect (cond, 0) on it.  This
    incorrectly takes precedence over more reliable heuristics predicting that call
    to cold noreturn is likely not going to happen.
    
    So I reordered the predictors so __builtin_expect_with_probability comes first
    after predictors that never makes a mistake (so user can use it to always
    specify the outcome by hand).  I also downgraded malloc predictor since I do
    think user-defined malloc functions & new operators may behave funny ways and
    moved usual __builtin_expect after the noreturn cold predictor.
    
    This triggered latent bug in expr_expected_value_1 where
    
              if (*predictor < predictor2)
                *predictor = predictor2;
    
    should be:
    
              if (predictor2 < *predictor)
                *predictor = predictor2;
    
    which eventually triggered an ICE on combining heuristics.  This made me notice
    that we can do slightly better while combining expected values in case only
    one of the parameters (such as in a*b when we expect a==0) can determine
    overall result.
    
    Note that the new code may pick weaker heuristics in case that both values are
    predicted.  Not sure if this scenario is worth the extra CPU time: there is
    not correct way to combine the probabilities anyway since we do not know if
    the predictions are independent, so I think users should not rely on it.
    
    Fixing this issue uncovered another problem.  In 2018 Martin Liska added
    code predicting that MALLOC returns non-NULL but instead of that he predicts
    that it returns true (boolean 1).  This sort of works for testcase testing
     malloc (10) != NULL
    but, for example, we will predict
     malloc (10) == malloc (10)
    as true, which is not right and such comparsion may happen in real code
    
    I think proper way is to update expr_expected_value_1 to work with value
    ranges, but that needs greater surgery so I decided to postpone this and
    only add FIXME and fill PR110499.
    
    gcc/ChangeLog:
    
            PR middle-end/109849
            * predict.cc (estimate_bb_frequencies): Turn to static function.
            (expr_expected_value_1): Fix handling of binary expressions with
            predicted values.
            * predict.def (PRED_MALLOC_NONNULL): Move later in the priority queue.
            (PRED_BUILTIN_EXPECT_WITH_PROBABILITY): Move to almost top of the priority
            queue.
            * predict.h (estimate_bb_frequencies): No longer declare it.
    
    gcc/testsuite/ChangeLog:
    
            PR middle-end/109849
            * gcc.dg/predict-18.c: Improve testcase.
Comment 21 Jan Hubicka 2023-11-19 15:25:08 UTC
Patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637265.html
gets us closer to inlining _M_realloc_insert at -O3 (3 insns away)

Patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636935.html
reduces the expense when _M_realloc_insert is not inlined at -O2 (where I think we should not inline it, unlike for clang)
Comment 22 GCC Commits 2023-11-21 14:17:51 UTC
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:1d82fc2e6824bf83159389729c31a942f7b91b04

commit r14-5679-g1d82fc2e6824bf83159389729c31a942f7b91b04
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Nov 21 15:17:16 2023 +0100

    optimize std::vector::push_back
    
    this patch speeds up the push_back at -O3 significantly by making the
    reallocation to be inlined by default.  _M_realloc_insert is general
    insertion that takes iterator pointing to location where the value
    should be inserted.  As such it contains code to move other entries around
    that is quite large.
    
    Since appending to the end of array is common operation, I think we should
    have specialized code for that.  Sadly it is really hard to work out this
    from IPA passes, since we basically care whether the iterator points to
    the same place as the end pointer, which are both passed by reference.
    This is inter-procedural value numbering that is quite out of reach.
    
    I also added extra check making it clear that the new length of the vector
    is non-zero.  This saves extra conditionals.  Again it is quite hard case
    since _M_check_len seem to be able to return 0 if its parameter is 0.
    This never happens here, but we are not able to propagate this early nor
    at IPA stage.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/110287
            PR middle-end/109811
            PR middle-end/109849
            * include/bits/stl_vector.h (_M_realloc_append): New member function.
            (push_back): Use it.
            * include/bits/vector.tcc: (emplace_back): Use it.
            (_M_realloc_insert): Let compiler know that new vector size is non-zero.
            (_M_realloc_append): New member function.
Comment 23 GCC Commits 2023-11-24 16:35:15 UTC
The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363

commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363
Author: Martin Jambor <mjambor@suse.cz>
Date:   Fri Nov 24 17:32:35 2023 +0100

    sra: SRA of non-escaped aggregates passed by reference to calls
    
    PR109849 shows that a loop that heavily pushes and pops from a stack
    implemented by a C++ std::vec results in slow code, mainly because the
    vector structure is not split by SRA and so we end up in many loads
    and stores into it.  This is because it is passed by reference
    to (re)allocation methods and so needs to live in memory, even though
    it does not escape from them and so we could SRA it if we
    re-constructed it before the call and then separated it to distinct
    replacements afterwards.
    
    This patch does exactly that, first relaxing the selection of
    candidates to also include those which are addressable but do not
    escape and then adding code to deal with the calls.  The
    micro-benchmark that is also the (scan-dump) testcase in this patch
    runs twice as fast with it than with current trunk.  Honza measured
    its effect on the libjxl benchmark and it almost closes the
    performance gap between Clang and GCC while not requiring excessive
    inlining and thus code growth.
    
    The patch disallows creation of replacements for such aggregates which
    are also accessed with a precision smaller than their size because I
    have observed that this led to excessive zero-extending of data
    leading to slow-downs of perlbench (on some CPUs).  Apart from this
    case I have not noticed any regressions, at least not so far.
    
    Gimple call argument flags can tell if an argument is unused (and then
    we do not need to generate any statements for it) or if it is not
    written to and then we do not need to generate statements loading
    replacements from the original aggregate after the call statement.
    Unfortunately, we cannot symmetrically use flags that an aggregate is
    not read because to avoid re-constructing the aggregate before the
    call because flags don't tell which what parts of aggregates were not
    written to, so we load all replacements, and so all need to have the
    correct value before the call.
    
    This version of the patch also takes care to avoid attempts to modify
    abnormal edges, something which was missing in the previosu version.
    
    gcc/ChangeLog:
    
    2023-11-23  Martin Jambor  <mjambor@suse.cz>
    
            PR middle-end/109849
            * tree-sra.cc (passed_by_ref_in_call): New.
            (sra_initialize): Allocate passed_by_ref_in_call.
            (sra_deinitialize): Free passed_by_ref_in_call.
            (create_access): Add decl pool candidates only if they are not
            already candidates.
            (build_access_from_expr_1): Bail out on ADDR_EXPRs.
            (build_access_from_call_arg): New function.
            (asm_visit_addr): Rename to scan_visit_addr, change the
            disqualification dump message.
            (scan_function): Check taken addresses for all non-call statements,
            including phi nodes.  Process all call arguments, including the static
            chain, build_access_from_call_arg.
            (maybe_add_sra_candidate): Relax need_to_live_in_memory check to allow
            non-escaped local variables.
            (sort_and_splice_var_accesses): Disallow smaller-than-precision
            replacements for aggregates passed by reference to functions.
            (sra_modify_expr): Use a separate stmt iterator for adding satements
            before the processed statement and after it.
            (enum out_edge_check): New type.
            (abnormal_edge_after_stmt_p): New function.
            (sra_modify_call_arg): New function.
            (sra_modify_assign): Adjust calls to sra_modify_expr.
            (sra_modify_function_body): Likewise, use sra_modify_call_arg to
            process call arguments, including the static chain.
    
    gcc/testsuite/ChangeLog:
    
    2023-11-23  Martin Jambor  <mjambor@suse.cz>
    
            PR middle-end/109849
            * g++.dg/tree-ssa/pr109849.C: New test.
            * g++.dg/tree-ssa/sra-eh-1.C: Likewise.
            * gcc.dg/tree-ssa/pr109849.c: Likewise.
            * gcc.dg/tree-ssa/sra-longjmp-1.c: Likewise.
            * gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
Comment 24 GCC Commits 2023-11-24 17:00:05 UTC
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:c2dcfb6ba6e9a84a16e63ae73a822ae2a843170c

commit r14-5832-gc2dcfb6ba6e9a84a16e63ae73a822ae2a843170c
Author: Jan Hubicka <jh@suse.cz>
Date:   Fri Nov 24 17:59:44 2023 +0100

    Use memcpy instead of memmove in __relocate_a_1
    
    __relocate_a_1 is used to copy data after vector reizing.  This can be done by memcpy
    rather than memmove.
    
    libstdc++-v3/ChangeLog:
    
            PR middle-end/109849
            * include/bits/stl_uninitialized.h (__relocate_a_1): Use memcpy instead
            of memmove.
Comment 25 Martin Jambor 2023-11-24 17:00:57 UTC
(In reply to Richard Biener from comment #7)
> There is nothing to sink really, loop header copying introduces a PHI and
> there's not partial redundancies but only partial-partial and those are not
> obvious to CSE because of the introduced PHI.
> 

SRA now decomposes stack.
Comment 26 Jonathan Wakely 2023-11-28 09:33:50 UTC
(In reply to GCC Commits from comment #23)
> https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363
> 
> commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363
> Author: Martin Jambor <mjambor@suse.cz>
> Date:   Fri Nov 24 17:32:35 2023 +0100
> 
>     sra: SRA of non-escaped aggregates passed by reference to calls
>     

I'm seeing a large number of libstdc++ testsuite failures, bisected to this patch.

For example:

make check -C x86_64-pc-linux-gnu/libstdc++-v3 RUNTESTFLAGS="conformance.exp=21_strings/basic_string/operators/char/1.cc --target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0"

The full list of FAILs is:

FAIL: 23_containers/vector/types/1.cc  -std=gnu++98 (test for excess errors)
FAIL: 23_containers/vector/types/1.cc  -std=gnu++98 (test for excess errors)
FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-big.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-big.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-big.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-big.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-big.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++26 execution test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++11 execution test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++14 execution test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++17 execution test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++20 execution test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++23 execution test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++26 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++11 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++14 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++17 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++20 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++23 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++26 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++11 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++14 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++17 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++20 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++23 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++26 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++11 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++14 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++17 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++20 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++23 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++26 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++11 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++14 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++17 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++20 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++23 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++26 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++11 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++14 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++17 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++20 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++23 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++26 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++11 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++14 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++17 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++20 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++23 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++26 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++11 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++14 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++17 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++20 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++23 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++26 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++11 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++14 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++17 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++20 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++23 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++26 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc  -std=gnu++17 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc  -std=gnu++20 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc  -std=gnu++23 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc  -std=gnu++26 execution test
FAIL: 27_io/filesystem/operations/canonical.cc  -std=gnu++17 execution test
FAIL: 27_io/filesystem/operations/canonical.cc  -std=gnu++20 execution test
FAIL: 27_io/filesystem/operations/canonical.cc  -std=gnu++23 execution test
FAIL: 27_io/filesystem/operations/canonical.cc  -std=gnu++26 execution test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc  -std=gnu++17 execution test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc  -std=gnu++20 execution test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc  -std=gnu++23 execution test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc  -std=gnu++26 execution test
FAIL: 27_io/filesystem/path/concat/strings.cc  -std=gnu++17 execution test
FAIL: 27_io/filesystem/path/concat/strings.cc  -std=gnu++20 execution test
FAIL: 27_io/filesystem/path/concat/strings.cc  -std=gnu++23 execution test
FAIL: 27_io/filesystem/path/concat/strings.cc  -std=gnu++26 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++11 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++14 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++17 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++20 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++23 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++26 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++11 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++14 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++17 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++20 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++23 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++26 execution test
FAIL: experimental/net/internet/address/v6/members.cc  -std=gnu++14 execution test
FAIL: experimental/net/internet/address/v6/members.cc  -std=gnu++17 execution test
FAIL: experimental/net/internet/address/v6/members.cc  -std=gnu++20 execution test
FAIL: experimental/net/internet/address/v6/members.cc  -std=gnu++23 execution test
FAIL: experimental/net/internet/address/v6/members.cc  -std=gnu++26 execution test
FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++14 execution test
FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++17 execution test
FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++20 execution test
FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++23 execution test
FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/hash_map_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/hash_set_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/hash_set_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/list_update_map_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/list_update_set_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/tree_map_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_map_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/tree_set_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_set_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/trie_map_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_map_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/trie_set_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_set_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++26 execution test
FAIL: std/format/functions/format.cc  -std=gnu++20 execution test
FAIL: std/format/functions/format.cc  -std=gnu++23 execution test
FAIL: std/format/functions/format.cc  -std=gnu++26 execution test
Comment 27 Martin Jambor 2023-11-28 10:32:51 UTC
(In reply to Jonathan Wakely from comment #26)
> (In reply to GCC Commits from comment #23)
> > https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363
> > 
> > commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363
> > Author: Martin Jambor <mjambor@suse.cz>
> > Date:   Fri Nov 24 17:32:35 2023 +0100
> > 
> >     sra: SRA of non-escaped aggregates passed by reference to calls
> >     
> 
> I'm seeing a large number of libstdc++ testsuite failures, bisected to this
> patch.
> 
> For example:
> 
> make check -C x86_64-pc-linux-gnu/libstdc++-v3
> RUNTESTFLAGS="conformance.exp=21_strings/basic_string/operators/char/1.cc
> --target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0"
> 

Unfortunately I cannot reproduce this, the above (on pristine master
commit 006e90e1344 on an x86_64-linux) results in:

Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0
Running /home/mjambor/gcc/small/src/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ...
PASS: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 (test for excess errors)
PASS: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 execution test

Can you please try if
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html
fixes this?
Comment 28 Jonathan Wakely 2023-11-28 12:41:03 UTC
(In reply to Martin Jambor from comment #27)
> Unfortunately I cannot reproduce this, the above (on pristine master
> commit 006e90e1344 on an x86_64-linux) results in:
> 
> Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0
> Running
> /home/mjambor/gcc/small/src/libstdc++-v3/testsuite/libstdc++-dg/conformance.
> exp ...
> PASS: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 (test for
> excess errors)
> PASS: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 execution
> test

Oops, sorry, that particular FAIL needs either --target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG which then makes it fail for all -std modes:


Schedule of variations:
    unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG

Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/test/src/gcc/libstdc++-v3/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /home/test/src/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ...
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++26 execution test

Or just set GLIBCXX_TESTSUITE_STDS="17,20" in the env before running the test:

Schedule of variations:
    unix/-D_GLIBCXX_USE_CXX11_ABI=0

Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/test/src/gcc/libstdc++-v3/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /home/test/src/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ...
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++20 execution test

It seems I picked a bad example to give, which requires additional options to FAIL.

Many of the other FAILs do not require _GLIBCXX_DEBUG or -std=gnu++20 to FAIL, but the -D_GLIBCXX_USE_CXX11_ABI=0 option is necessary, at least for all the ones I inspected. That option isn't used by default, but I run the full testsuite with that several times a day, and with GLIBCXX_TESTSUITE_STDS=98,11,14,17,20,23,26.

 
> Can you please try if
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html
> fixes this?

Testing now ...
Comment 29 Jonathan Wakely 2023-11-28 13:29:03 UTC
(In reply to Jonathan Wakely from comment #28)
> (In reply to Martin Jambor from comment #27)
> > Can you please try if
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html
> > fixes this?
> 
> Testing now ...

Yes, this fixes 21_strings/basic_string/operators/char/* and 28_regex/basic_regex/106607.cc

Running the full testsuite now...
Comment 30 Jonathan Wakely 2023-11-28 15:29:14 UTC
So far the only FAIL is still see is:

FAIL: 23_containers/vector/types/1.cc  -std=gnu++98 (test for excess errors)

I'm not sure if this is caused by your patch or one of Honza's. The test only fails with GLIBCXX_TESTSUITE_STDS=98 defined.
Comment 31 Jonathan Wakely 2023-11-28 22:21:54 UTC
Bisection points to r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363 for that remaining FAIL as well (and it isn't fixed by the new patch).

It introduced a new warning which wasn't present before:

/tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437: warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)' writing between 2 and 9223372036854775806 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
Comment 32 Jan Hubicka 2023-11-29 12:27:02 UTC
> /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437:
> warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
> writing between 2 and 9223372036854775806 bytes into a region of size 0
> overflows the destination [-Wstringop-overflow=]

It warns on:

  template<bool _IsMove>
    struct __copy_move<_IsMove, true, random_access_iterator_tag>
    {
      template<typename _Tp, typename _Up>
        _GLIBCXX20_CONSTEXPR
        static _Up*
        __copy_m(_Tp* __first, _Tp* __last, _Up* __result)
        {
          const ptrdiff_t _Num = __last - __first;
          if (__builtin_expect(_Num > 1, true))
            __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
          else if (_Num == 1)
            std::__copy_move<_IsMove, false, random_access_iterator_tag>::
              __assign_one(__result, __first);
          return __result + _Num;
        }
    };

It is likely false positive on a code path that never happens in real
code, but we now optimize it better.

Does it show an inline path?
Comment 33 GCC Commits 2023-11-29 15:25:55 UTC
The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:302461ad9a04d82fee904bddac69811d13d5bb6a

commit r14-5971-g302461ad9a04d82fee904bddac69811d13d5bb6a
Author: Martin Jambor <mjambor@suse.cz>
Date:   Wed Nov 29 16:24:33 2023 +0100

    tree-sra: Avoid returns of references to SRA candidates
    
    The enhancement to address PR 109849 contained an importsnt thinko,
    and that any reference that is passed to a function and does not
    escape, must also not happen to be aliased by the return value of the
    function.  This has quickly transpired as bugs PR 112711 and PR
    112721.
    
    Just as IPA-modref does a good enough job to allow us to rely on the
    escaped set of variables, it sems to be doing well also on updating
    EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
    exactly the situation we need to avoid.  Of course, if a call
    statement ignores any returned value, we also do not need to check the
    flag.
    
    Hopefully this does not pessimize things too much, I have verified
    that the PR 109849 testcae remains quick and so should also the
    benchmark it is derived from.
    
    gcc/ChangeLog:
    
    2023-11-27  Martin Jambor  <mjambor@suse.cz>
    
            PR tree-optimization/112711
            PR tree-optimization/112721
            * tree-sra.cc (build_access_from_call_arg): New parameter
            CAN_BE_RETURNED, disqualify any candidate passed by reference if it is
            true.  Adjust leading comment.
            (scan_function): Pass appropriate value to CAN_BE_RETURNED of
            build_access_from_call_arg.
    
    gcc/testsuite/ChangeLog:
    
    2023-11-29  Martin Jambor  <mjambor@suse.cz>
    
            PR tree-optimization/112711
            PR tree-optimization/112721
            * g++.dg/tree-ssa/pr112711.C: New test.
            * gcc.dg/tree-ssa/pr112721.c: Likewise.
Comment 34 Martin Jambor 2024-01-03 17:41:28 UTC
(In reply to Jan Hubicka from comment #32)
> > /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437:
> > warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
> > writing between 2 and 9223372036854775806 bytes into a region of size 0
> > overflows the destination [-Wstringop-overflow=]
> 
> It warns on:
> 
>   template<bool _IsMove>
>     struct __copy_move<_IsMove, true, random_access_iterator_tag>
>     {
>       template<typename _Tp, typename _Up>
>         _GLIBCXX20_CONSTEXPR
>         static _Up*
>         __copy_m(_Tp* __first, _Tp* __last, _Up* __result)
>         {
>           const ptrdiff_t _Num = __last - __first;
>           if (__builtin_expect(_Num > 1, true))
>             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
>           else if (_Num == 1)
>             std::__copy_move<_IsMove, false, random_access_iterator_tag>::
>               __assign_one(__result, __first);
>           return __result + _Num;
>         }
>     };
> 
> It is likely false positive on a code path that never happens in real
> code, but we now optimize it better.
> 

We end up with:
  <bb 16> [local count: 64736968]:
  __builtin_memcpy (1B, v$_M_impl$D10203$_M_start_448, _354);

IIRC the statement variant is created by jump threading (specifically
thread2).

Moreover, if I understand the comment in compute_objsize_r about the
INTEGER_CST case correctly, small integers are considered potential
"result of erroneous null pointer addition/subtraction."  So not
warning on a constant 1 destination does not seem to be desirable.
Comment 35 Jonathan Wakely 2024-05-30 11:11:16 UTC
(In reply to Jonathan Wakely from comment #31)
> Bisection points to r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363 for
> that remaining FAIL as well (and it isn't fixed by the new patch).
> 
> It introduced a new warning which wasn't present before:
> 
> /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437:
> warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
> writing between 2 and 9223372036854775806 bytes into a region of size 0
> overflows the destination [-Wstringop-overflow=]

I don't know why we only get a warning for C++98 and not other modes, but this seems to fix it:

--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -933,6 +933,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
                const size_type __len =
                  _M_check_len(__n, "vector::_M_range_insert");
+               if (__len < (__n + (__old_finish - __old_start)))
+                 __builtin_unreachable();
+
                pointer __new_start(this->_M_allocate(__len));
                pointer __new_finish(__new_start);
                __try

So it looks like the compiler can't tell that _M_check_len(n) doesn't undergo unsigned wraparound.
Comment 36 GCC Commits 2024-06-03 19:38:12 UTC
The master branch has been updated by Francois Dumont <fdumont@gcc.gnu.org>:

https://gcc.gnu.org/g:0426be454448f8cfb9db21f4f669426afb7b57c8

commit r15-996-g0426be454448f8cfb9db21f4f669426afb7b57c8
Author: François Dumont <frs.dumont@gmail.com>
Date:   Sat Jun 1 22:17:19 2024 +0200

    libstdc++: Fix -Wstringop-overflow warning coming from std::vector [PR109849]
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/109849
            * include/bits/vector.tcc
            (std::vector<>::_M_range_insert(iterator, _FwdIt, _FwdIt,
            forward_iterator_tag))[__cplusplus < 201103L]: Add __builtin_unreachable
            expression to tell the compiler that the allocated buffer is large enough to
            receive current elements plus the elements of the range to insert.
Comment 37 GCC Commits 2024-06-03 19:54:10 UTC
The releases/gcc-14 branch has been updated by Francois Dumont <fdumont@gcc.gnu.org>:

https://gcc.gnu.org/g:955202eb2cdbe2bc74c626bce90ee1eac410ad4f

commit r14-10272-g955202eb2cdbe2bc74c626bce90ee1eac410ad4f
Author: François Dumont <frs.dumont@gmail.com>
Date:   Sat Jun 1 22:17:19 2024 +0200

    libstdc++: Fix -Wstringop-overflow warning coming from std::vector [PR109849]
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/109849
            * include/bits/vector.tcc
            (std::vector<>::_M_range_insert(iterator, _FwdIt, _FwdIt,
            forward_iterator_tag))[__cplusplus < 201103L]: Add __builtin_unreachable
            expression to tell the compiler that the allocated buffer is large enough to
            receive current elements plus the elements of the range to insert.
    
    (cherry picked from commit 0426be454448f8cfb9db21f4f669426afb7b57c8)
Comment 38 Arseny Kapoulkine 2025-01-15 01:10:31 UTC
The change that attempts to silence the warning breaks std::vector insert behavior in certain (most?) cases.

Specifically, the patch checks:

+               if (__len < (__n + (__old_start - __old_finish)))

... which is incorrect, as "start - finish" subtraction order is wrong - it should be "finish - start".

The consequences are that insert will trap in some cases depending on how many elements the vector already has and how many elements are being inserted. For example, if the vector size and capacity is 768 elements, and 384 elements (__n) are being inserted, __len computed via _M_check_len(__n) produces 1536 (doubling the currently allocated capacity). Since __n is 384, __n + (-768) underflows and produces a very large value that is definitely greater then __len. Depending on the codegen the trap may or may not happen; it does happen when building with -fsanitize=undefined without optimizations at least.

This is part of gcc 14 release; I can only imagine this doesn't break the world because the world generally isn't compiling for C++ versions earlier than 2011 these days.

Let me know if this should be filed as a separate bug report, or if this comment suffices.
Comment 39 rguenther@suse.de 2025-01-15 07:26:04 UTC
On Wed, 15 Jan 2025, arseny.kapoulkine at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
> 
> Arseny Kapoulkine <arseny.kapoulkine at gmail dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |arseny.kapoulkine at gmail dot com
> 
> --- Comment #38 from Arseny Kapoulkine <arseny.kapoulkine at gmail dot com> ---
> The change that attempts to silence the warning breaks std::vector insert
> behavior in certain (most?) cases.
> 
> Specifically, the patch checks:
> 
> +               if (__len < (__n + (__old_start - __old_finish)))
> 
> ... which is incorrect, as "start - finish" subtraction order is wrong - it
> should be "finish - start".
> 
> The consequences are that insert will trap in some cases depending on how many
> elements the vector already has and how many elements are being inserted. For
> example, if the vector size and capacity is 768 elements, and 384 elements
> (__n) are being inserted, __len computed via _M_check_len(__n) produces 1536
> (doubling the currently allocated capacity). Since __n is 384, __n + (-768)
> underflows and produces a very large value that is definitely greater then
> __len. Depending on the codegen the trap may or may not happen; it does happen
> when building with -fsanitize=undefined without optimizations at least.
> 
> This is part of gcc 14 release; I can only imagine this doesn't break the world
> because the world generally isn't compiling for C++ versions earlier than 2011
> these days.
> 
> Let me know if this should be filed as a separate bug report, or if this
> comment suffices.

yes, please raise a separate bugreport so we can appropriately prioritize
that as important.
Comment 40 Jonathan Wakely 2025-01-15 09:26:28 UTC
Yes please - thanks for catching it.
Comment 41 Arseny Kapoulkine 2025-01-15 16:11:04 UTC
Filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118493
Comment 42 GCC Commits 2025-01-15 21:37:45 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:6f85a97248fdff15aadc9514c1118eee0293d256

commit r15-6926-g6f85a97248fdff15aadc9514c1118eee0293d256
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Jan 15 09:33:55 2025 +0000

    libstdc++: Fix reversed args in unreachable assumption [PR109849]
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/109849
            * include/bits/vector.tcc (vector::_M_range_insert): Fix
            reversed args in length calculation.
Comment 43 GCC Commits 2025-01-16 14:00:43 UTC
The releases/gcc-14 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:7df6af205f5c9853c4d70b5b8172b0483179c891

commit r14-11215-g7df6af205f5c9853c4d70b5b8172b0483179c891
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Jan 15 09:33:55 2025 +0000

    libstdc++: Fix reversed args in unreachable assumption [PR109849]
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/109849
            * include/bits/vector.tcc (vector::_M_range_insert): Fix
            reversed args in length calculation.
    
    (cherry picked from commit 6f85a97248fdff15aadc9514c1118eee0293d256)
Comment 44 Jonathan Wakely 2025-01-16 14:01:17 UTC
Oops, I put this PR number in the commits rather than PR 118493