jan@localhost:/tmp> cat t.C #include <vector> typedef unsigned int uint32_t; std::vector<std::pair<uint32_t, uint32_t>> stack; void test() { while (!stack.empty()) { std::pair<uint32_t, uint32_t> cur = stack.back(); stack.pop_back(); if (cur.second) break; } } jan@localhost:/tmp> gcc t.C -O3 -S yields to: _Z4testv: .LFB1264: .cfi_startproc movq stack(%rip), %rcx movq stack+8(%rip), %rax jmp .L5 .p2align 4,,10 .p2align 3 .L6: movl -4(%rax), %edx subq $8, %rax movq %rax, stack+8(%rip) testl %edx, %edx jne .L4 .L5: cmpq %rax, %rcx jne .L6 .L4: ret We really should order the basic blocks putting cmpq before L6 saving a jump. Moreover clang does .p2align 4, 0x90 .LBB1_1: # =>This Inner Loop Header: Depth=1 cmpq %rax, %rcx je .LBB1_3 # %bb.2: # in Loop: Header=BB1_1 Depth=1 cmpl $0, -4(%rcx) leaq -8(%rcx), %rcx movq %rcx, stack+8(%rip) je .LBB1_1 .LBB1_3: retq saving an instruction. Why we do not move stack+8 updating out of the loop?
Actually why didn't we copy the loop header in the first place?
(In reply to Jan Hubicka from comment #0) > saving an instruction. Why we do not move stack+8 updating out of the loop? Maybe because of a clobber: cur$second_5 = MEM[(const struct pairD.26349 &)_7 + 18446744073709551608].secondD.27577; # PT = nonlocal escaped _4 = _7 + 18446744073709551608; # .MEM_9 = VDEF <.MEM_1> stackD.26352.D.27437._M_implD.26667.D.26744._M_finishD.26670 = _4; # .MEM_10 = VDEF <.MEM_9> MEM[(struct pairD.26349 *)_7 + -8B] ={v} {CLOBBER};
Rather, because store-motion out of a loop that might iterate zero times would create a data race.
> Rather, because store-motion out of a loop that might iterate zero times would > create a data race. Good point. If we did copy loop headers all the way to the store the problem will go away. Also I assume we can still add a flag which is set to true if loops iterates and then make store conditional...
> Actually why didn't we copy the loop header in the first place? Because it is considered to be do-while loop already (thanks to the in-loop conitional, do_while_loop_p is happy).
Here is slightly improved testcase which actually pushes into stack and measures something. It test loops 1000 times and returns. It also makes stack to be local variable so race conditions are not a problem. #include <vector> typedef unsigned int uint32_t; std::pair<uint32_t, uint32_t> pair; void test() { std::vector<std::pair<uint32_t, uint32_t>> stack; stack.push_back (pair); while (!stack.empty()) { std::pair<uint32_t, uint32_t> cur = stack.back(); stack.pop_back(); if (!cur.first) { cur.second++; stack.push_back (cur); } if (cur.second > 10000) break; } } int main() { for (int i = 0; i < 10000; i++) test(); } Clang code is about twice as fast jan@localhost:/tmp> clang++ -O2 tt.C -fno-exceptions jan@localhost:/tmp> g++ -O2 tt.C -fno-exceptions -o a.out-gcc jan@localhost:/tmp> perf stat ./a.out Performance counter stats for './a.out': 434.24 msec task-clock:u # 0.997 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 129 page-faults:u # 297.073 /sec 1,003,191,657 cycles:u # 2.310 GHz 68,927 stalled-cycles-frontend:u # 0.01% frontend cycles idle 800,792,619 stalled-cycles-backend:u # 79.82% backend cycles idle 1,904,682,933 instructions:u # 1.90 insn per cycle # 0.42 stalled cycles per insn 500,912,196 branches:u # 1.154 G/sec 23,144 branch-misses:u # 0.00% of all branches 0.435340389 seconds time elapsed 0.431409000 seconds user 0.003994000 seconds sys jan@localhost:/tmp> perf stat ./a.out-gcc Performance counter stats for './a.out-gcc': 1,197.28 msec task-clock:u # 0.999 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 131 page-faults:u # 109.415 /sec 2,903,995,656 cycles:u # 2.425 GHz 86,204 stalled-cycles-frontend:u # 0.00% frontend cycles idle 2,690,907,052 stalled-cycles-backend:u # 92.66% backend cycles idle 2,005,212,311 instructions:u # 0.69 insn per cycle # 1.34 stalled cycles per insn 401,007,320 branches:u # 334.932 M/sec 23,290 branch-misses:u # 0.01% of all branches 1.198388186 seconds time elapsed 1.198450000 seconds user 0.000000000 seconds sys The problem seems to be, like in first example, that we keep updating in-memory stack in the main loop. .L39: movl 12(%rsp), %ebx .L30: movq 16(%rsp), %rax cmpl $10000, %ebx ja .L33 .L40: movq 24(%rsp), %rdi cmpq %rdi, %rax je .L28 .L34: movq -8(%rdi), %rax leaq -8(%rdi), %rsi movq %rsi, 24(%rsp) movq %rax, 8(%rsp) testl %eax, %eax jne .L39 While clang does: .LBB0_1: # in Loop: Header=BB0_4 Depth=1 movq %rax, %r14 .LBB0_2: # in Loop: Header=BB0_4 Depth=1 movq %rbx, %r12 movq %r12, %rbx cmpl $10001, %r13d # imm = 0x2711 jae .LBB0_27 .LBB0_4: # =>This Loop Header: Depth=1 # Child Loop BB0_16 Depth 2 # Child Loop BB0_21 Depth 2 cmpq %r14, %rbx je .LBB0_26 # %bb.5: # in Loop: Header=BB0_4 Depth=1 leaq -8(%r14), %rax movq -8(%r14), %rcx movq %rcx, %r13 shrq $32, %r13 testl %ecx, %ecx jne .LBB0_1
There is nothing to sink really, loop header copying introduces a PHI and there's not partial redundancies but only partial-partial and those are not obvious to CSE because of the introduced PHI. I believe we have to teach SRA to decompose 'cur' and maybe also 'stack', there's no scalar optimization going to do it also because we have aggregate copies involved.
We can only SRA if the address is non-escaping. Clang does not seem to need it to optimize better: jan@localhost:~> cat t.c extern void q(int *); __attribute__ ((noinline)) void test() { for (int a = 0; a < 1000;a++) if (!(a%100)) q(&a); } int main() { for (int a = 0; a < 1000000;a++) test (); } jan@localhost:~> cat t2.c void q(int *a) { } jan@localhost:~> gcc -O2 t.c t2.c ; perf stat ./a.out Performance counter stats for './a.out': 2,916.73 msec task-clock:u # 0.999 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 52 page-faults:u # 17.828 /sec 8,344,719,833 cycles:u # 2.861 GHz 13,561,375 stalled-cycles-frontend:u # 0.16% frontend cycles idle 5,128,112,757 stalled-cycles-backend:u # 61.45% backend cycles idle 10,050,172,242 instructions:u # 1.20 insn per cycle # 0.51 stalled cycles per insn 2,034,043,082 branches:u # 697.370 M/sec 11,186,312 branch-misses:u # 0.55% of all branches 2.918344737 seconds time elapsed 2.917844000 seconds user 0.000000000 seconds sys jan@localhost:~> clang -O2 t.c t2.c ; perf stat ./a.out Performance counter stats for './a.out': 664.40 msec task-clock:u # 0.999 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 54 page-faults:u # 81.276 /sec 2,318,095,848 cycles:u # 3.489 GHz 10,417,694 stalled-cycles-frontend:u # 0.45% frontend cycles idle 1,057,731,301 stalled-cycles-backend:u # 45.63% backend cycles idle 10,062,172,840 instructions:u # 4.34 insn per cycle # 0.11 stalled cycles per insn 2,034,042,724 branches:u # 3.061 G/sec 10,003,620 branch-misses:u # 0.49% of all branches 0.665267996 seconds time elapsed 0.665247000 seconds user 0.000000000 seconds sys We do: jmp .L3 .p2align 4,,10 .p2align 3 .L2: movl 12(%rsp), %eax addl $1, %eax movl %eax, 12(%rsp) cmpl $999, %eax jg .L7 .L3: imull $-1030792151, %eax, %eax addl $85899344, %eax rorl $2, %eax cmpl $42949672, %eax ja .L2 leaq 12(%rsp), %rdi call q jmp .L2 Which has stupid store-to-load dpendency in the internal loop. Clang keeps the store but optimizes away the load: jmp .LBB0_1 .p2align 4, 0x90 .LBB0_3: # in Loop: Header=BB0_1 Depth=1 leal 1(%rax), %ecx movl %ecx, 12(%rsp) cmpl $999, %eax # imm = 0x3E7 movl %ecx, %eax jge .LBB0_4 .LBB0_1: # =>This Inner Loop Header: Depth=1 imull $-1030792151, %eax, %ecx # imm = 0xC28F5C29 addl $85899344, %ecx # imm = 0x51EB850 rorl $2, %ecx cmpl $42949672, %ecx # imm = 0x28F5C28 ja .LBB0_3 # %bb.2: # in Loop: Header=BB0_1 Depth=1 movq %rbx, %rdi callq q@PLT movl 12(%rsp), %eax jmp .LBB0_3 Wonder what makes clang to think it needs @PLT though. Why we do not consider the load as partially redundant with itself?
Created attachment 55110 [details] patch for the missed hoisting For the testcase in comment#6 there is a missing code hoisting from PRE which is caused by do_hoist_insertion doing /* A hoistable value must be in ANTIC_IN(block) but not in AVAIL_OUT(BLOCK). */ bitmap_initialize (&hoistable_set.values, &grand_bitmap_obstack); bitmap_and_compl (&hoistable_set.values, &ANTIC_IN (block)->values, &AVAIL_OUT (block)->values); but in reality we want to check ANTIC_OUT(block), not ANTIC_IN(block). cur.second is killed by the aggregate assignment to cur at the beginning of the block we should hoist to and that's reflected in ANTIC_IN. The attached patch properly re-computes ANTIC_OUT and uses that.
Thanks. I tested the patch on jpegxl and it does not help there (I guess becuase the redundancy there is partial). But it is cool we compile at least the simplified testcase well.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:9e2017ae6ac788d3e36999bb0f0d20ea0f62c20e commit r14-1127-g9e2017ae6ac788d3e36999bb0f0d20ea0f62c20e Author: Richard Biener <rguenther@suse.de> Date: Thu May 18 13:52:29 2023 +0200 tree-optimization/109849 - missed code hoisting The following fixes code hoisting to properly consider ANTIC_OUT instead of ANTIC_IN. That's a bit expensive to re-compute but since we no longer iterate we're doing this only once per BB which should be acceptable. This avoids missing hoistings to the end of blocks where something in the block clobbers the hoisted value. PR tree-optimization/109849 * tree-ssa-pre.cc (do_hoist_insertion): Compute ANTIC_OUT and use that to determine what to hoist. * gcc.dg/tree-ssa/ssa-hoist-8.c: New testcase.
So this fixed the missing code hoisting - partial PRE is done with -O3 only.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:5476de2618ffb77f3a52e59e2c9f10b018329689 commit r14-1161-g5476de2618ffb77f3a52e59e2c9f10b018329689 Author: Richard Biener <rguenther@suse.de> Date: Wed May 24 12:36:28 2023 +0200 tree-optimization/109849 - fix fallout of PRE hoisting change The PR109849 fix made us no longer hoist some memory loads because of the expression set intersection. We can still avoid to compute the union by simply taking the first sets expressions and leave the pruning of expressions with values not suitable for hoisting to sorted_array_from_bitmap_set. PR tree-optimization/109849 * tree-ssa-pre.cc (do_hoist_insertion): Do not intersect expressions but take the first sets. * gcc.dg/tree-ssa/ssa-hoist-9.c: New testcase.
One interesting situation is: void std::vector<std::pair<unsigned int, unsigned int> >::push_back (struct vector * const this, const struct value_type & __x) { struct __normal_iterator D.27894; struct pair * _1; struct pair * _2; struct pair * _3; <bb 2> [local count: 1073741824]: _1 = this_6(D)->D.26707._M_impl.D.26014._M_finish; _2 = this_6(D)->D.26707._M_impl.D.26014._M_end_of_storage; if (_1 != _2) goto <bb 3>; [82.57%] else goto <bb 4>; [17.43%] <bb 3> [local count: 886588625]: *_1 = MEM[(const struct pair &)__x_7(D)]; _3 = _1 + 8; this_6(D)->D.26707._M_impl.D.26014._M_finish = _3; goto <bb 5>; [100.00%] <bb 4> [local count: 187153200]: D.27894._M_current = _1; std::vector<std::pair<unsigned int, unsigned int> >::_M_realloc_insert<const std::pair<unsigned int, unsigned int>&> (this_6(D), D.27894, __x_7(D)); <bb 5> [local count: 1073741824]: return; } here we could do partial inlining and offline the call to _M_realloc_insert but we fail to cut since _1 is already load: Split point at BB 4 header time: 9.302800 header size: 9 split time: 2.440200 split size: 5 bbs: 4 SSA names to pass: 1, 9, 11 Refused: need to pass non-param values It should be easy to insert code loading the parameter again in the split part. We still hit the SRA limitation since this would be still escaping, but it is another missed optimization on this simple testcase.
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>: https://gcc.gnu.org/g:5a1ef1cfac005370d0a5a0f85798724cb2c9cf5e commit r14-1909-g5a1ef1cfac005370d0a5a0f85798724cb2c9cf5e Author: Honza <jh@ryzen3.suse.cz> Date: Sun Jun 18 18:58:26 2023 +0200 Analyze SRA candidates in ipa-fnsummary this patch extends ipa-fnsummary to anticipate statements that will be removed by SRA. This is done by looking for calls passing addresses of automatic variables. In function body we look for dereferences from pointers of such variables and mark them with new not_sra_candidate condition. This is just first step which is overly optimistic. We do not try to prove that given automatic variable will not be SRAed even after inlining. We now also optimistically assume that the transformation will always happen. I will restrict this in a followup patch, but I think it is useful to gether some data on how much code is affected by this. This is motivated by PR109849 where we fail to fully inline push_back. The patch alone does not solve the problem even for -O3, but improves analysis in this case. gcc/ChangeLog: PR tree-optimization/109849 * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Add new parameter ES; handle ipa_predicate::not_sra_candidate. (evaluate_properties_for_edge): Pass es to evaluate_conditions_for_known_args. (ipa_fn_summary_t::duplicate): Handle sra candidates. (dump_ipa_call_summary): Dump points_to_possible_sra_candidate. (load_or_store_of_ptr_parameter): New function. (points_to_possible_sra_candidate_p): New function. (analyze_function_body): Initialize points_to_possible_sra_candidate; determine sra predicates. (estimate_ipcp_clone_size_and_time): Update call of evaluate_conditions_for_known_args. (remap_edge_params): Update points_to_possible_sra_candidate. (read_ipa_call_summary): Stream points_to_possible_sra_candidate (write_ipa_call_summary): Likewise. * ipa-predicate.cc (ipa_predicate::add_clause): Handle not_sra_candidate. (dump_condition): Dump it. * ipa-predicate.h (struct inline_param_summary): Add points_to_possible_sra_candidate. gcc/testsuite/ChangeLog: PR tree-optimization/109849 * g++.dg/ipa/devirt-45.C: Update template.
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>: https://gcc.gnu.org/g:7b34cacc5735385e7e2855d7c0a6fad60ef4a99b commit r14-1951-g7b34cacc5735385e7e2855d7c0a6fad60ef4a99b Author: Jan Hubicka <jh@suse.cz> Date: Mon Jun 19 18:28:17 2023 +0200 optimize std::max early we currently produce very bad code on loops using std::vector as a stack, since we fail to inline push_back which in turn prevents SRA and we fail to optimize out some store-to-load pairs. I looked into why this function is not inlined and it is inlined by clang. We currently estimate it to 66 instructions and inline limits are 15 at -O2 and 30 at -O3. Clang has similar estimate, but still decides to inline at -O2. I looked into reason why the body is so large and one problem I spotted is the way std::max is implemented by taking and returning reference to the values. const T& max( const T& a, const T& b ); This makes it necessary to store the values to memory and load them later and max is used by code computing new size of vector on resize. We optimize this to MAX_EXPR, but only during late optimizations. I think this is a common enough coding pattern and we ought to make this transparent to early opts and IPA. The following is easist fix that simply adds phiprop pass that turns the PHI of address values into PHI of values so later FRE can propagate values across memory, phiopt discover the MAX_EXPR pattern and DSE remove the memory stores. gcc/ChangeLog: PR tree-optimization/109811 PR tree-optimization/109849 * passes.def: Add phiprop to early optimization passes. * tree-ssa-phiprop.cc: Allow clonning. gcc/testsuite/ChangeLog: PR tree-optimization/109811 PR tree-optimization/109849 * gcc.dg/tree-ssa/phiprop-1.c: New test. * gcc.dg/tree-ssa/pr21463.c: Adjust template.
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>: https://gcc.gnu.org/g:c2ebccc97190a978a44e341516b488f02a78c598 commit r14-2101-gc2ebccc97190a978a44e341516b488f02a78c598 Author: Jan Hubicka <jh@suse.cz> Date: Mon Jun 26 18:29:39 2023 +0200 Fix profile of forwarders produced by cd-dce compiling the testcase from PR109849 (which uses std:vector based stack to drive a loop) with profile feedbakc leads to profile mismatches introduced by tree-ssa-dce. This is the new code to produce unified forwarder blocks for PHIs. I am not including the testcase itself since checking it for Invalid sum is probably going to be too fragile and this should show in our LNT testers. The patch however fixes the mismatch. Bootstrapped/regtested x86_64-linux and plan to commit it shortly. gcc/ChangeLog: PR tree-optimization/109849 * tree-ssa-dce.cc (make_forwarders_with_degenerate_phis): Fix profile count of newly constructed forwarder block.
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>: https://gcc.gnu.org/g:45c53768b6fa3d737ae818e31d3c50da62e0ad2b commit r14-2157-g45c53768b6fa3d737ae818e31d3c50da62e0ad2b Author: Jan Hubicka <jh@suse.cz> Date: Wed Jun 28 11:45:15 2023 +0200 Add cold attribute to throw wrappers and terminate PR middle-end/109849 * include/bits/c++config (std::__terminate): Mark cold. * include/bits/functexcept.h: Mark everything as cold. * libsupc++/exception: Mark terminate and unexpected as cold.
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>: https://gcc.gnu.org/g:9dc18fca431626404b0692c689a2e103666e7adb commit r14-2202-g9dc18fca431626404b0692c689a2e103666e7adb Author: Jan Hubicka <jh@suse.cz> Date: Thu Jun 29 22:45:37 2023 +0200 Compute ipa-predicates for conditionals involving __builtin_expect_p std::vector allocator looks as follows: __attribute__((nodiscard)) struct pair * std::__new_allocator<std::pair<unsigned int, unsigned int> >::allocate (struct __new_allocator * const this, size_type __n, const void * D.27753) { bool _1; long int _2; long int _3; long unsigned int _5; struct pair * _9; <bb 2> [local count: 1073741824]: _1 = __n_7(D) > 1152921504606846975; _2 = (long int) _1; _3 = __builtin_expect (_2, 0); if (_3 != 0) goto <bb 3>; [10.00%] else goto <bb 6>; [90.00%] <bb 3> [local count: 107374184]: if (__n_7(D) > 2305843009213693951) goto <bb 4>; [50.00%] else goto <bb 5>; [50.00%] <bb 4> [local count: 53687092]: std::__throw_bad_array_new_length (); <bb 5> [local count: 53687092]: std::__throw_bad_alloc (); <bb 6> [local count: 966367641]: _5 = __n_7(D) * 8; _9 = operator new (_5); return _9; } So there is check for allocated block size being greater than max_size which is wrapper in __builtin_expect. This makes ipa-fnsummary to give up analyzing predicates and it will miss the fact that the two different calls to __throw will be optimized out if __n is larady smaller than 1152921504606846975 which it is after _M_check_len. This patch extends ipa-fnsummary to understand functions that return their parameter. gcc/ChangeLog: PR tree-optimization/109849 * ipa-fnsummary.cc (decompose_param_expr): Skip functions returning its parameter. (set_cond_stmt_execution_predicate): Return early if predicate was constructed. gcc/testsuite/ChangeLog: PR tree-optimization/109849 * gcc.dg/ipa/pr109849.c: New test.
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>: https://gcc.gnu.org/g:eab57b825bcc350e9ff44eb2fa739a80199d9bb1 commit r14-2219-geab57b825bcc350e9ff44eb2fa739a80199d9bb1 Author: Jan Hubicka <jh@suse.cz> Date: Fri Jun 30 16:27:27 2023 +0200 Fix handling of __builtin_expect_with_probability and improve first-match heuristics While looking into the std::vector _M_realloc_insert codegen I noticed that call of __throw_bad_alloc is predicted with 10% probability. This is because the conditional guarding it has __builtin_expect (cond, 0) on it. This incorrectly takes precedence over more reliable heuristics predicting that call to cold noreturn is likely not going to happen. So I reordered the predictors so __builtin_expect_with_probability comes first after predictors that never makes a mistake (so user can use it to always specify the outcome by hand). I also downgraded malloc predictor since I do think user-defined malloc functions & new operators may behave funny ways and moved usual __builtin_expect after the noreturn cold predictor. This triggered latent bug in expr_expected_value_1 where if (*predictor < predictor2) *predictor = predictor2; should be: if (predictor2 < *predictor) *predictor = predictor2; which eventually triggered an ICE on combining heuristics. This made me notice that we can do slightly better while combining expected values in case only one of the parameters (such as in a*b when we expect a==0) can determine overall result. Note that the new code may pick weaker heuristics in case that both values are predicted. Not sure if this scenario is worth the extra CPU time: there is not correct way to combine the probabilities anyway since we do not know if the predictions are independent, so I think users should not rely on it. Fixing this issue uncovered another problem. In 2018 Martin Liska added code predicting that MALLOC returns non-NULL but instead of that he predicts that it returns true (boolean 1). This sort of works for testcase testing malloc (10) != NULL but, for example, we will predict malloc (10) == malloc (10) as true, which is not right and such comparsion may happen in real code I think proper way is to update expr_expected_value_1 to work with value ranges, but that needs greater surgery so I decided to postpone this and only add FIXME and fill PR110499. gcc/ChangeLog: PR middle-end/109849 * predict.cc (estimate_bb_frequencies): Turn to static function. (expr_expected_value_1): Fix handling of binary expressions with predicted values. * predict.def (PRED_MALLOC_NONNULL): Move later in the priority queue. (PRED_BUILTIN_EXPECT_WITH_PROBABILITY): Move to almost top of the priority queue. * predict.h (estimate_bb_frequencies): No longer declare it. gcc/testsuite/ChangeLog: PR middle-end/109849 * gcc.dg/predict-18.c: Improve testcase.
Patch https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637265.html gets us closer to inlining _M_realloc_insert at -O3 (3 insns away) Patch https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636935.html reduces the expense when _M_realloc_insert is not inlined at -O2 (where I think we should not inline it, unlike for clang)
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>: https://gcc.gnu.org/g:1d82fc2e6824bf83159389729c31a942f7b91b04 commit r14-5679-g1d82fc2e6824bf83159389729c31a942f7b91b04 Author: Jan Hubicka <jh@suse.cz> Date: Tue Nov 21 15:17:16 2023 +0100 optimize std::vector::push_back this patch speeds up the push_back at -O3 significantly by making the reallocation to be inlined by default. _M_realloc_insert is general insertion that takes iterator pointing to location where the value should be inserted. As such it contains code to move other entries around that is quite large. Since appending to the end of array is common operation, I think we should have specialized code for that. Sadly it is really hard to work out this from IPA passes, since we basically care whether the iterator points to the same place as the end pointer, which are both passed by reference. This is inter-procedural value numbering that is quite out of reach. I also added extra check making it clear that the new length of the vector is non-zero. This saves extra conditionals. Again it is quite hard case since _M_check_len seem to be able to return 0 if its parameter is 0. This never happens here, but we are not able to propagate this early nor at IPA stage. libstdc++-v3/ChangeLog: PR libstdc++/110287 PR middle-end/109811 PR middle-end/109849 * include/bits/stl_vector.h (_M_realloc_append): New member function. (push_back): Use it. * include/bits/vector.tcc: (emplace_back): Use it. (_M_realloc_insert): Let compiler know that new vector size is non-zero. (_M_realloc_append): New member function.
The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>: https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363 commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363 Author: Martin Jambor <mjambor@suse.cz> Date: Fri Nov 24 17:32:35 2023 +0100 sra: SRA of non-escaped aggregates passed by reference to calls PR109849 shows that a loop that heavily pushes and pops from a stack implemented by a C++ std::vec results in slow code, mainly because the vector structure is not split by SRA and so we end up in many loads and stores into it. This is because it is passed by reference to (re)allocation methods and so needs to live in memory, even though it does not escape from them and so we could SRA it if we re-constructed it before the call and then separated it to distinct replacements afterwards. This patch does exactly that, first relaxing the selection of candidates to also include those which are addressable but do not escape and then adding code to deal with the calls. The micro-benchmark that is also the (scan-dump) testcase in this patch runs twice as fast with it than with current trunk. Honza measured its effect on the libjxl benchmark and it almost closes the performance gap between Clang and GCC while not requiring excessive inlining and thus code growth. The patch disallows creation of replacements for such aggregates which are also accessed with a precision smaller than their size because I have observed that this led to excessive zero-extending of data leading to slow-downs of perlbench (on some CPUs). Apart from this case I have not noticed any regressions, at least not so far. Gimple call argument flags can tell if an argument is unused (and then we do not need to generate any statements for it) or if it is not written to and then we do not need to generate statements loading replacements from the original aggregate after the call statement. Unfortunately, we cannot symmetrically use flags that an aggregate is not read because to avoid re-constructing the aggregate before the call because flags don't tell which what parts of aggregates were not written to, so we load all replacements, and so all need to have the correct value before the call. This version of the patch also takes care to avoid attempts to modify abnormal edges, something which was missing in the previosu version. gcc/ChangeLog: 2023-11-23 Martin Jambor <mjambor@suse.cz> PR middle-end/109849 * tree-sra.cc (passed_by_ref_in_call): New. (sra_initialize): Allocate passed_by_ref_in_call. (sra_deinitialize): Free passed_by_ref_in_call. (create_access): Add decl pool candidates only if they are not already candidates. (build_access_from_expr_1): Bail out on ADDR_EXPRs. (build_access_from_call_arg): New function. (asm_visit_addr): Rename to scan_visit_addr, change the disqualification dump message. (scan_function): Check taken addresses for all non-call statements, including phi nodes. Process all call arguments, including the static chain, build_access_from_call_arg. (maybe_add_sra_candidate): Relax need_to_live_in_memory check to allow non-escaped local variables. (sort_and_splice_var_accesses): Disallow smaller-than-precision replacements for aggregates passed by reference to functions. (sra_modify_expr): Use a separate stmt iterator for adding satements before the processed statement and after it. (enum out_edge_check): New type. (abnormal_edge_after_stmt_p): New function. (sra_modify_call_arg): New function. (sra_modify_assign): Adjust calls to sra_modify_expr. (sra_modify_function_body): Likewise, use sra_modify_call_arg to process call arguments, including the static chain. gcc/testsuite/ChangeLog: 2023-11-23 Martin Jambor <mjambor@suse.cz> PR middle-end/109849 * g++.dg/tree-ssa/pr109849.C: New test. * g++.dg/tree-ssa/sra-eh-1.C: Likewise. * gcc.dg/tree-ssa/pr109849.c: Likewise. * gcc.dg/tree-ssa/sra-longjmp-1.c: Likewise. * gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>: https://gcc.gnu.org/g:c2dcfb6ba6e9a84a16e63ae73a822ae2a843170c commit r14-5832-gc2dcfb6ba6e9a84a16e63ae73a822ae2a843170c Author: Jan Hubicka <jh@suse.cz> Date: Fri Nov 24 17:59:44 2023 +0100 Use memcpy instead of memmove in __relocate_a_1 __relocate_a_1 is used to copy data after vector reizing. This can be done by memcpy rather than memmove. libstdc++-v3/ChangeLog: PR middle-end/109849 * include/bits/stl_uninitialized.h (__relocate_a_1): Use memcpy instead of memmove.
(In reply to Richard Biener from comment #7) > There is nothing to sink really, loop header copying introduces a PHI and > there's not partial redundancies but only partial-partial and those are not > obvious to CSE because of the introduced PHI. > SRA now decomposes stack.
(In reply to GCC Commits from comment #23) > https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363 > > commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363 > Author: Martin Jambor <mjambor@suse.cz> > Date: Fri Nov 24 17:32:35 2023 +0100 > > sra: SRA of non-escaped aggregates passed by reference to calls > I'm seeing a large number of libstdc++ testsuite failures, bisected to this patch. For example: make check -C x86_64-pc-linux-gnu/libstdc++-v3 RUNTESTFLAGS="conformance.exp=21_strings/basic_string/operators/char/1.cc --target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0" The full list of FAILs is: FAIL: 23_containers/vector/types/1.cc -std=gnu++98 (test for excess errors) FAIL: 23_containers/vector/types/1.cc -std=gnu++98 (test for excess errors) FAIL: 19_diagnostics/stacktrace/output.cc -std=gnu++23 execution test FAIL: 19_diagnostics/stacktrace/output.cc -std=gnu++26 execution test FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++11 execution test FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++14 execution test FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++17 execution test FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++20 execution test FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++23 execution test FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++26 execution test FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++11 execution test FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++14 execution test FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++17 execution test FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++20 execution test FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++23 execution test FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++26 execution test FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++11 execution test FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++14 execution test FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++17 execution test FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++20 execution test FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++23 execution test FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++26 execution test FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++11 execution test FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++14 execution test FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++17 execution test FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++20 execution test FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++23 execution test FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++26 execution test FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++11 execution test FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++14 execution test FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++17 execution test FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++20 execution test FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++23 execution test FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++26 execution test FAIL: 19_diagnostics/system_error/what-big.cc -std=gnu++14 execution test FAIL: 19_diagnostics/system_error/what-big.cc -std=gnu++17 execution test FAIL: 19_diagnostics/system_error/what-big.cc -std=gnu++20 execution test FAIL: 19_diagnostics/system_error/what-big.cc -std=gnu++23 execution test FAIL: 19_diagnostics/system_error/what-big.cc -std=gnu++26 execution test FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++11 execution test FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++14 execution test FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++17 execution test FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++20 execution test FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++23 execution test FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++26 execution test FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++11 execution test FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++14 execution test FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++17 execution test FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++20 execution test FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++23 execution test FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++26 execution test FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++11 execution test FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++14 execution test FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 execution test FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++20 execution test FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++23 execution test FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++26 execution test FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++11 execution test FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++14 execution test FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++17 execution test FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++20 execution test FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++23 execution test FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++26 execution test FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++11 execution test FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++14 execution test FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++17 execution test FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++20 execution test FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++23 execution test FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++26 execution test FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++11 execution test FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++14 execution test FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++17 execution test FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++20 execution test FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++23 execution test FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++26 execution test FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++11 execution test FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++14 execution test FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++17 execution test FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++20 execution test FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++23 execution test FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++26 execution test FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++11 execution test FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++14 execution test FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++17 execution test FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++20 execution test FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++23 execution test FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++26 execution test FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++11 execution test FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++14 execution test FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++17 execution test FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++20 execution test FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++23 execution test FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++26 execution test FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++11 execution test FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++14 execution test FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++17 execution test FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++20 execution test FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++23 execution test FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++26 execution test FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++11 execution test FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++14 execution test FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++17 execution test FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++20 execution test FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++23 execution test FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++26 execution test FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++11 execution test FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++14 execution test FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++17 execution test FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++20 execution test FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++23 execution test FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++26 execution test FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++11 execution test FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++14 execution test FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++17 execution test FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++20 execution test FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++23 execution test FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++26 execution test FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++11 execution test FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++14 execution test FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++17 execution test FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++20 execution test FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++23 execution test FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++26 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++11 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++14 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++17 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++20 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++23 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++26 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++11 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++14 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++17 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++20 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++23 execution test FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++26 execution test FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++11 execution test FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++14 execution test FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++17 execution test FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++20 execution test FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++23 execution test FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++26 execution test FAIL: 27_io/filesystem/filesystem_error/cons.cc -std=gnu++17 execution test FAIL: 27_io/filesystem/filesystem_error/cons.cc -std=gnu++20 execution test FAIL: 27_io/filesystem/filesystem_error/cons.cc -std=gnu++23 execution test FAIL: 27_io/filesystem/filesystem_error/cons.cc -std=gnu++26 execution test FAIL: 27_io/filesystem/operations/canonical.cc -std=gnu++17 execution test FAIL: 27_io/filesystem/operations/canonical.cc -std=gnu++20 execution test FAIL: 27_io/filesystem/operations/canonical.cc -std=gnu++23 execution test FAIL: 27_io/filesystem/operations/canonical.cc -std=gnu++26 execution test FAIL: 27_io/filesystem/operations/copy_file_108178.cc -std=gnu++17 execution test FAIL: 27_io/filesystem/operations/copy_file_108178.cc -std=gnu++20 execution test FAIL: 27_io/filesystem/operations/copy_file_108178.cc -std=gnu++23 execution test FAIL: 27_io/filesystem/operations/copy_file_108178.cc -std=gnu++26 execution test FAIL: 27_io/filesystem/path/concat/strings.cc -std=gnu++17 execution test FAIL: 27_io/filesystem/path/concat/strings.cc -std=gnu++20 execution test FAIL: 27_io/filesystem/path/concat/strings.cc -std=gnu++23 execution test FAIL: 27_io/filesystem/path/concat/strings.cc -std=gnu++26 execution test FAIL: 28_regex/basic_regex/106607.cc -std=gnu++11 execution test FAIL: 28_regex/basic_regex/106607.cc -std=gnu++14 execution test FAIL: 28_regex/basic_regex/106607.cc -std=gnu++17 execution test FAIL: 28_regex/basic_regex/106607.cc -std=gnu++20 execution test FAIL: 28_regex/basic_regex/106607.cc -std=gnu++23 execution test FAIL: 28_regex/basic_regex/106607.cc -std=gnu++26 execution test FAIL: 30_threads/thread/id/output.cc -std=gnu++11 execution test FAIL: 30_threads/thread/id/output.cc -std=gnu++14 execution test FAIL: 30_threads/thread/id/output.cc -std=gnu++17 execution test FAIL: 30_threads/thread/id/output.cc -std=gnu++20 execution test FAIL: 30_threads/thread/id/output.cc -std=gnu++23 execution test FAIL: 30_threads/thread/id/output.cc -std=gnu++26 execution test FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++11 execution test FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++14 execution test FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++17 execution test FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++20 execution test FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++23 execution test FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++26 execution test FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++11 execution test FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++14 execution test FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++17 execution test FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++20 execution test FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++23 execution test FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++26 execution test FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++11 execution test FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++14 execution test FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++17 execution test FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++20 execution test FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++23 execution test FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++26 execution test FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++11 execution test FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++14 execution test FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++20 execution test FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++23 execution test FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++26 execution test FAIL: experimental/filesystem/operations/copy.cc -std=gnu++11 execution test FAIL: experimental/filesystem/operations/copy.cc -std=gnu++14 execution test FAIL: experimental/filesystem/operations/copy.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/copy.cc -std=gnu++20 execution test FAIL: experimental/filesystem/operations/copy.cc -std=gnu++23 execution test FAIL: experimental/filesystem/operations/copy.cc -std=gnu++26 execution test FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++11 execution test FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++14 execution test FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++20 execution test FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++23 execution test FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++26 execution test FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++11 execution test FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++14 execution test FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++20 execution test FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++23 execution test FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++26 execution test FAIL: experimental/filesystem/operations/exists.cc -std=gnu++11 execution test FAIL: experimental/filesystem/operations/exists.cc -std=gnu++14 execution test FAIL: experimental/filesystem/operations/exists.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/exists.cc -std=gnu++20 execution test FAIL: experimental/filesystem/operations/exists.cc -std=gnu++23 execution test FAIL: experimental/filesystem/operations/exists.cc -std=gnu++26 execution test FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++11 execution test FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++14 execution test FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++20 execution test FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++23 execution test FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++26 execution test FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++11 execution test FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++14 execution test FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++20 execution test FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++23 execution test FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++26 execution test FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++11 execution test FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++14 execution test FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++20 execution test FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++23 execution test FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++26 execution test FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++11 execution test FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++14 execution test FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++20 execution test FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++23 execution test FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++26 execution test FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++11 execution test FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++14 execution test FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++20 execution test FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++23 execution test FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++26 execution test FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++11 execution test FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++14 execution test FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++20 execution test FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++23 execution test FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++26 execution test FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++11 execution test FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++14 execution test FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++17 execution test FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++20 execution test FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++23 execution test FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++26 execution test FAIL: experimental/net/internet/address/v6/members.cc -std=gnu++14 execution test FAIL: experimental/net/internet/address/v6/members.cc -std=gnu++17 execution test FAIL: experimental/net/internet/address/v6/members.cc -std=gnu++20 execution test FAIL: experimental/net/internet/address/v6/members.cc -std=gnu++23 execution test FAIL: experimental/net/internet/address/v6/members.cc -std=gnu++26 execution test FAIL: experimental/net/internet/resolver/ops/lookup.cc -std=gnu++14 execution test FAIL: experimental/net/internet/resolver/ops/lookup.cc -std=gnu++17 execution test FAIL: experimental/net/internet/resolver/ops/lookup.cc -std=gnu++20 execution test FAIL: experimental/net/internet/resolver/ops/lookup.cc -std=gnu++23 execution test FAIL: experimental/net/internet/resolver/ops/lookup.cc -std=gnu++26 execution test FAIL: ext/pb_ds/regression/hash_map_rand.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc -std=gnu++17 execution test FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc -std=gnu++20 execution test FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc -std=gnu++23 execution test FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc -std=gnu++26 execution test FAIL: ext/pb_ds/regression/hash_set_rand.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/hash_set_rand.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++17 execution test FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++20 execution test FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++23 execution test FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++26 execution test FAIL: ext/pb_ds/regression/list_update_map_rand.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc -std=gnu++17 execution test FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc -std=gnu++20 execution test FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc -std=gnu++23 execution test FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc -std=gnu++26 execution test FAIL: ext/pb_ds/regression/list_update_set_rand.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc -std=gnu++17 execution test FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc -std=gnu++20 execution test FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc -std=gnu++23 execution test FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc -std=gnu++26 execution test FAIL: ext/pb_ds/regression/priority_queue_rand.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/priority_queue_rand.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++17 execution test FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++20 execution test FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++23 execution test FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++26 execution test FAIL: ext/pb_ds/regression/tree_map_rand.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/tree_map_rand.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++17 execution test FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++20 execution test FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++23 execution test FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++26 execution test FAIL: ext/pb_ds/regression/tree_set_rand.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/tree_set_rand.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++17 execution test FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++20 execution test FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++23 execution test FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++26 execution test FAIL: ext/pb_ds/regression/trie_map_rand.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/trie_map_rand.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++17 execution test FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++20 execution test FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++23 execution test FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++26 execution test FAIL: ext/pb_ds/regression/trie_set_rand.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/trie_set_rand.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++11 execution test FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++14 execution test FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++17 execution test FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++20 execution test FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++23 execution test FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++26 execution test FAIL: std/format/functions/format.cc -std=gnu++20 execution test FAIL: std/format/functions/format.cc -std=gnu++23 execution test FAIL: std/format/functions/format.cc -std=gnu++26 execution test
(In reply to Jonathan Wakely from comment #26) > (In reply to GCC Commits from comment #23) > > https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363 > > > > commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363 > > Author: Martin Jambor <mjambor@suse.cz> > > Date: Fri Nov 24 17:32:35 2023 +0100 > > > > sra: SRA of non-escaped aggregates passed by reference to calls > > > > I'm seeing a large number of libstdc++ testsuite failures, bisected to this > patch. > > For example: > > make check -C x86_64-pc-linux-gnu/libstdc++-v3 > RUNTESTFLAGS="conformance.exp=21_strings/basic_string/operators/char/1.cc > --target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0" > Unfortunately I cannot reproduce this, the above (on pristine master commit 006e90e1344 on an x86_64-linux) results in: Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0 Running /home/mjambor/gcc/small/src/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ... PASS: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 (test for excess errors) PASS: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 execution test Can you please try if https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html fixes this?
(In reply to Martin Jambor from comment #27) > Unfortunately I cannot reproduce this, the above (on pristine master > commit 006e90e1344 on an x86_64-linux) results in: > > Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0 > Running > /home/mjambor/gcc/small/src/libstdc++-v3/testsuite/libstdc++-dg/conformance. > exp ... > PASS: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 (test for > excess errors) > PASS: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 execution > test Oops, sorry, that particular FAIL needs either --target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG which then makes it fail for all -std modes: Schedule of variations: unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target. Using /usr/share/dejagnu/config/unix.exp as generic interface file for target. Using /home/test/src/gcc/libstdc++-v3/testsuite/config/default.exp as tool-and-target-specific interface file. Running /home/test/src/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ... FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++11 execution test FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++14 execution test FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 execution test FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++20 execution test FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++23 execution test FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++26 execution test Or just set GLIBCXX_TESTSUITE_STDS="17,20" in the env before running the test: Schedule of variations: unix/-D_GLIBCXX_USE_CXX11_ABI=0 Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0 Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target. Using /usr/share/dejagnu/config/unix.exp as generic interface file for target. Using /home/test/src/gcc/libstdc++-v3/testsuite/config/default.exp as tool-and-target-specific interface file. Running /home/test/src/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ... FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++20 execution test It seems I picked a bad example to give, which requires additional options to FAIL. Many of the other FAILs do not require _GLIBCXX_DEBUG or -std=gnu++20 to FAIL, but the -D_GLIBCXX_USE_CXX11_ABI=0 option is necessary, at least for all the ones I inspected. That option isn't used by default, but I run the full testsuite with that several times a day, and with GLIBCXX_TESTSUITE_STDS=98,11,14,17,20,23,26. > Can you please try if > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html > fixes this? Testing now ...
(In reply to Jonathan Wakely from comment #28) > (In reply to Martin Jambor from comment #27) > > Can you please try if > > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html > > fixes this? > > Testing now ... Yes, this fixes 21_strings/basic_string/operators/char/* and 28_regex/basic_regex/106607.cc Running the full testsuite now...
So far the only FAIL is still see is: FAIL: 23_containers/vector/types/1.cc -std=gnu++98 (test for excess errors) I'm not sure if this is caused by your patch or one of Honza's. The test only fails with GLIBCXX_TESTSUITE_STDS=98 defined.
Bisection points to r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363 for that remaining FAIL as well (and it isn't fixed by the new patch). It introduced a new warning which wasn't present before: /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437: warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)' writing between 2 and 9223372036854775806 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
> /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437: > warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)' > writing between 2 and 9223372036854775806 bytes into a region of size 0 > overflows the destination [-Wstringop-overflow=] It warns on: template<bool _IsMove> struct __copy_move<_IsMove, true, random_access_iterator_tag> { template<typename _Tp, typename _Up> _GLIBCXX20_CONSTEXPR static _Up* __copy_m(_Tp* __first, _Tp* __last, _Up* __result) { const ptrdiff_t _Num = __last - __first; if (__builtin_expect(_Num > 1, true)) __builtin_memmove(__result, __first, sizeof(_Tp) * _Num); else if (_Num == 1) std::__copy_move<_IsMove, false, random_access_iterator_tag>:: __assign_one(__result, __first); return __result + _Num; } }; It is likely false positive on a code path that never happens in real code, but we now optimize it better. Does it show an inline path?
The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>: https://gcc.gnu.org/g:302461ad9a04d82fee904bddac69811d13d5bb6a commit r14-5971-g302461ad9a04d82fee904bddac69811d13d5bb6a Author: Martin Jambor <mjambor@suse.cz> Date: Wed Nov 29 16:24:33 2023 +0100 tree-sra: Avoid returns of references to SRA candidates The enhancement to address PR 109849 contained an importsnt thinko, and that any reference that is passed to a function and does not escape, must also not happen to be aliased by the return value of the function. This has quickly transpired as bugs PR 112711 and PR 112721. Just as IPA-modref does a good enough job to allow us to rely on the escaped set of variables, it sems to be doing well also on updating EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address exactly the situation we need to avoid. Of course, if a call statement ignores any returned value, we also do not need to check the flag. Hopefully this does not pessimize things too much, I have verified that the PR 109849 testcae remains quick and so should also the benchmark it is derived from. gcc/ChangeLog: 2023-11-27 Martin Jambor <mjambor@suse.cz> PR tree-optimization/112711 PR tree-optimization/112721 * tree-sra.cc (build_access_from_call_arg): New parameter CAN_BE_RETURNED, disqualify any candidate passed by reference if it is true. Adjust leading comment. (scan_function): Pass appropriate value to CAN_BE_RETURNED of build_access_from_call_arg. gcc/testsuite/ChangeLog: 2023-11-29 Martin Jambor <mjambor@suse.cz> PR tree-optimization/112711 PR tree-optimization/112721 * g++.dg/tree-ssa/pr112711.C: New test. * gcc.dg/tree-ssa/pr112721.c: Likewise.
(In reply to Jan Hubicka from comment #32) > > /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437: > > warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)' > > writing between 2 and 9223372036854775806 bytes into a region of size 0 > > overflows the destination [-Wstringop-overflow=] > > It warns on: > > template<bool _IsMove> > struct __copy_move<_IsMove, true, random_access_iterator_tag> > { > template<typename _Tp, typename _Up> > _GLIBCXX20_CONSTEXPR > static _Up* > __copy_m(_Tp* __first, _Tp* __last, _Up* __result) > { > const ptrdiff_t _Num = __last - __first; > if (__builtin_expect(_Num > 1, true)) > __builtin_memmove(__result, __first, sizeof(_Tp) * _Num); > else if (_Num == 1) > std::__copy_move<_IsMove, false, random_access_iterator_tag>:: > __assign_one(__result, __first); > return __result + _Num; > } > }; > > It is likely false positive on a code path that never happens in real > code, but we now optimize it better. > We end up with: <bb 16> [local count: 64736968]: __builtin_memcpy (1B, v$_M_impl$D10203$_M_start_448, _354); IIRC the statement variant is created by jump threading (specifically thread2). Moreover, if I understand the comment in compute_objsize_r about the INTEGER_CST case correctly, small integers are considered potential "result of erroneous null pointer addition/subtraction." So not warning on a constant 1 destination does not seem to be desirable.
(In reply to Jonathan Wakely from comment #31) > Bisection points to r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363 for > that remaining FAIL as well (and it isn't fixed by the new patch). > > It introduced a new warning which wasn't present before: > > /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437: > warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)' > writing between 2 and 9223372036854775806 bytes into a region of size 0 > overflows the destination [-Wstringop-overflow=] I don't know why we only get a warning for C++98 and not other modes, but this seems to fix it: --- a/libstdc++-v3/include/bits/vector.tcc +++ b/libstdc++-v3/include/bits/vector.tcc @@ -933,6 +933,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER const size_type __len = _M_check_len(__n, "vector::_M_range_insert"); + if (__len < (__n + (__old_finish - __old_start))) + __builtin_unreachable(); + pointer __new_start(this->_M_allocate(__len)); pointer __new_finish(__new_start); __try So it looks like the compiler can't tell that _M_check_len(n) doesn't undergo unsigned wraparound.
The master branch has been updated by Francois Dumont <fdumont@gcc.gnu.org>: https://gcc.gnu.org/g:0426be454448f8cfb9db21f4f669426afb7b57c8 commit r15-996-g0426be454448f8cfb9db21f4f669426afb7b57c8 Author: François Dumont <frs.dumont@gmail.com> Date: Sat Jun 1 22:17:19 2024 +0200 libstdc++: Fix -Wstringop-overflow warning coming from std::vector [PR109849] libstdc++-v3/ChangeLog: PR libstdc++/109849 * include/bits/vector.tcc (std::vector<>::_M_range_insert(iterator, _FwdIt, _FwdIt, forward_iterator_tag))[__cplusplus < 201103L]: Add __builtin_unreachable expression to tell the compiler that the allocated buffer is large enough to receive current elements plus the elements of the range to insert.
The releases/gcc-14 branch has been updated by Francois Dumont <fdumont@gcc.gnu.org>: https://gcc.gnu.org/g:955202eb2cdbe2bc74c626bce90ee1eac410ad4f commit r14-10272-g955202eb2cdbe2bc74c626bce90ee1eac410ad4f Author: François Dumont <frs.dumont@gmail.com> Date: Sat Jun 1 22:17:19 2024 +0200 libstdc++: Fix -Wstringop-overflow warning coming from std::vector [PR109849] libstdc++-v3/ChangeLog: PR libstdc++/109849 * include/bits/vector.tcc (std::vector<>::_M_range_insert(iterator, _FwdIt, _FwdIt, forward_iterator_tag))[__cplusplus < 201103L]: Add __builtin_unreachable expression to tell the compiler that the allocated buffer is large enough to receive current elements plus the elements of the range to insert. (cherry picked from commit 0426be454448f8cfb9db21f4f669426afb7b57c8)
The change that attempts to silence the warning breaks std::vector insert behavior in certain (most?) cases. Specifically, the patch checks: + if (__len < (__n + (__old_start - __old_finish))) ... which is incorrect, as "start - finish" subtraction order is wrong - it should be "finish - start". The consequences are that insert will trap in some cases depending on how many elements the vector already has and how many elements are being inserted. For example, if the vector size and capacity is 768 elements, and 384 elements (__n) are being inserted, __len computed via _M_check_len(__n) produces 1536 (doubling the currently allocated capacity). Since __n is 384, __n + (-768) underflows and produces a very large value that is definitely greater then __len. Depending on the codegen the trap may or may not happen; it does happen when building with -fsanitize=undefined without optimizations at least. This is part of gcc 14 release; I can only imagine this doesn't break the world because the world generally isn't compiling for C++ versions earlier than 2011 these days. Let me know if this should be filed as a separate bug report, or if this comment suffices.
On Wed, 15 Jan 2025, arseny.kapoulkine at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 > > Arseny Kapoulkine <arseny.kapoulkine at gmail dot com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |arseny.kapoulkine at gmail dot com > > --- Comment #38 from Arseny Kapoulkine <arseny.kapoulkine at gmail dot com> --- > The change that attempts to silence the warning breaks std::vector insert > behavior in certain (most?) cases. > > Specifically, the patch checks: > > + if (__len < (__n + (__old_start - __old_finish))) > > ... which is incorrect, as "start - finish" subtraction order is wrong - it > should be "finish - start". > > The consequences are that insert will trap in some cases depending on how many > elements the vector already has and how many elements are being inserted. For > example, if the vector size and capacity is 768 elements, and 384 elements > (__n) are being inserted, __len computed via _M_check_len(__n) produces 1536 > (doubling the currently allocated capacity). Since __n is 384, __n + (-768) > underflows and produces a very large value that is definitely greater then > __len. Depending on the codegen the trap may or may not happen; it does happen > when building with -fsanitize=undefined without optimizations at least. > > This is part of gcc 14 release; I can only imagine this doesn't break the world > because the world generally isn't compiling for C++ versions earlier than 2011 > these days. > > Let me know if this should be filed as a separate bug report, or if this > comment suffices. yes, please raise a separate bugreport so we can appropriately prioritize that as important.
Yes please - thanks for catching it.
Filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118493
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:6f85a97248fdff15aadc9514c1118eee0293d256 commit r15-6926-g6f85a97248fdff15aadc9514c1118eee0293d256 Author: Jonathan Wakely <jwakely@redhat.com> Date: Wed Jan 15 09:33:55 2025 +0000 libstdc++: Fix reversed args in unreachable assumption [PR109849] libstdc++-v3/ChangeLog: PR libstdc++/109849 * include/bits/vector.tcc (vector::_M_range_insert): Fix reversed args in length calculation.
The releases/gcc-14 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:7df6af205f5c9853c4d70b5b8172b0483179c891 commit r14-11215-g7df6af205f5c9853c4d70b5b8172b0483179c891 Author: Jonathan Wakely <jwakely@redhat.com> Date: Wed Jan 15 09:33:55 2025 +0000 libstdc++: Fix reversed args in unreachable assumption [PR109849] libstdc++-v3/ChangeLog: PR libstdc++/109849 * include/bits/vector.tcc (vector::_M_range_insert): Fix reversed args in length calculation. (cherry picked from commit 6f85a97248fdff15aadc9514c1118eee0293d256)
Oops, I put this PR number in the commits rather than PR 118493