The fix for PR c++/70167 (in GCC 11.3) inadvertently introduced a code quality regression for simple range-for using initializer lists. The motivating example is an idiom from the stockfish benchmark [update_continuation_histories in src/search.cpp]: #include <initializer_list> extern void ext(int); void foo() { for (int i: {1,2,4,6}) ext(i); } which currently generates inefficient code by copying the array (to the stack) before use: foo(): pushq %rbp pushq %rbx subq $24, %rsp movdqa .LC0(%rip), %xmm0 movq %rsp, %rbx leaq 16(%rsp), %rbp movaps %xmm0, (%rsp) .L2: movl (%rbx), %edi addq $4, %rbx call ext(int) cmpq %rbp, %rbx jne .L2 addq $24, %rsp popq %rbx popq %rbp ret .LC0: .long 1 .long 2 .long 4 .long 6 In GCC 11.2 and earlier, the initializing array is efficiently used without copying: foo(): pushq %rbx movl $C.0.0, %ebx .L2: movl (%rbx), %edi addq $4, %rbx call ext(int) cmpq $C.0.0+16, %rbx jne .L2 popq %rbx ret C.0.0: .long 1 .long 2 .long 4 .long 6 The underlying cause of the code difference stems from whether the initializer is marked "static" in the middle-end, as shown by the differences between: const int init[4] = {1,2,4,6}; for (int i: init) ... // generates a copy and static const int init[4] = {1,2,4,6}; for (int i: init) ... // doesn't generate a copy Fortunately, there's already code in the depth of the C++ front=end for marking such initializer lists/constructors as static, so I initially tried fixing this myself, at first trying: diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 05df628..a91693d 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -3314,7 +3314,6 @@ finish_compound_literal (tree type, tree compound_literal, /* FIXME all C99 compound literals should be variables rather than C++ temporaries, unless they are used as an aggregate initializer. */ if ((!at_function_scope_p () || CP_TYPE_CONST_P (type)) - && fcl_context == fcl_c99 && TREE_CODE (type) == ARRAY_TYPE && !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type) && initializer_constant_valid_p (compound_literal, type)) and then trying: diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc index 2736f55..53220da 100644 --- a/gcc/cp/call.cc +++ b/gcc/cp/call.cc @@ -8557,7 +8557,10 @@ convert_like_internal (conversion *convs, tree expr, tree fn, int argnum, elttype = cp_build_qualified_type (elttype, cp_type_quals (elttype) | TYPE_QUAL_CONST); array = build_array_of_n_type (elttype, len); - array = finish_compound_literal (array, new_ctor, complain); + /* Indicate that a non-lvalue static const array is acceptable + by specifying fcl_c99. */ + array = finish_compound_literal (array, new_ctor, complain, + fcl_c99); /* Take the address explicitly rather than via decay_conversion to avoid the error about taking the address of a temporary. */ array = cp_build_addr_expr (array, complain); Both of which fix/improve code generation for this case, but break the initializer tests in the g++.dg testsuite in interesting ways. At this point I thought I'd give up and leave the fix to the experts. The range_expr passed to cp_convert_range_for is: <constructor 0x7ffff6dc9348 type <lang_type 0x7ffff6dadc78 init list VOID align:1 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff6dadc78> constant length:4 val <non_lvalue_expr 0x7ffff6dcd540 type <integer_type 0x7ffff6c415e8 int public type_6 SI size <integer_cst 0x7ffff6c43228 constant 32> unit-size <integer_cst 0x7ffff6c43240 constant 4> align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff6c415e8 precision:32 min <integer_cst 0x7ffff6c431e0 -2147483648> max <integer_cst 0x7ffff6c431f8 2147483647> pointer_to_this <pointer_type 0x7ffff6c49b28>> constant public arg:0 <integer_cst 0x7ffff6c43390 constant 1> iter.cc:7:16 start: iter.cc:7:16 finish: iter.cc:7:16> val <non_lvalue_expr 0x7ffff6dcd560 type <integer_type 0x7ffff6c415e8 int> constant public arg:0 <integer_cst 0x7ffff6c43768 constant 2> iter.cc:7:18 start: iter.cc:7:18 finish: iter.cc:7:18> val <non_lvalue_expr 0x7ffff6dcd580 type <integer_type 0x7ffff6c415e8 int> constant public arg:0 <integer_cst 0x7ffff6c43780 constant 4> iter.cc:7:20 start: iter.cc:7:20 finish: iter.cc:7:20> val <non_lvalue_expr 0x7ffff6dcd5a0 type <integer_type 0x7ffff6c415e8 int> constant public arg:0 <integer_cst 0x7ffff6c437b0 constant 6> iter.cc:7:22 start: iter.cc:7:22 finish: iter.cc:7:22>> which contains a lot of non_lvalue_expr, so it's surprising (to me) that we try to turn this into an lvalue, when it should/could be read-only. Thanks in advance. My apologies if this is a duplicate/known issue. Ideally, GCC should be able to unroll this loop, but that's a different issue.
(In reply to Roger Sayle from comment #0) > Fortunately, there's already code in the depth of the C++ front=end for > marking such initializer lists/constructors as static, Jason looked into this recently. There are problems with making it static (although in this case where the addresses of the elements don't escape maybe it would be OK).
The trunk branch has been updated by Jason Merrill <jason@gcc.gnu.org>: https://gcc.gnu.org/g:4d935f52b0d5c00fcc154461b87415ebd8791a94 commit r14-1500-g4d935f52b0d5c00fcc154461b87415ebd8791a94 Author: Jason Merrill <jason@redhat.com> Date: Wed Dec 7 11:40:53 2022 -0500 c++: make initializer_list array static again [PR110070] After the maybe_init_list_as_* patches, I noticed that we were putting the array of strings into .rodata, but then memcpying it into an automatic array, which is pointless; we should be able to use it directly. This doesn't happen automatically because TREE_ADDRESSABLE is set (since r12-657 for PR100464), and so gimplify_init_constructor won't promote the variable to static. Theoretically we could do escape analysis to recognize that the address, though taken, never leaves the function; that would allow promotion when we're only using the address for indexing within the function, as in initlist-opt2.C. But this would be a new pass. And in initlist-opt1.C, we're passing the array address to another function, so it definitely escapes; it's only safe in this case because it's calling a standard library function that we know only uses it for indexing. So, a flag seems needed. I first thought to put the flag on the TARGET_EXPR, but the VAR_DECL seems more appropriate. In a previous revision of the patch I called this flag DECL_NOT_OBSERVABLE, but I think DECL_MERGEABLE is a better name, especially if we're going to apply it to the backing array of initializer_list, which is observable. I then also check it in places that check for -fmerge-all-constants, so that multiple equivalent initializer-lists can also be combined. And then it seemed to make sense for [[no_unique_address]] to have this meaning for user-written variables. I think the note in [dcl.init.list]/6 intended to allow this kind of merging for initializer_lists, but it didn't actually work; for an explicit array with the same initializer, if the address escapes the program could tell whether the same variable in two frames have the same address. P2752 is trying to correct this defect, so I'm going to assume that this is the intent. PR c++/110070 PR c++/105838 gcc/ChangeLog: * tree.h (DECL_MERGEABLE): New. * tree-core.h (struct tree_decl_common): Mention it. * gimplify.cc (gimplify_init_constructor): Check it. * cgraph.cc (symtab_node::address_can_be_compared_p): Likewise. * varasm.cc (categorize_decl_for_section): Likewise. gcc/cp/ChangeLog: * call.cc (maybe_init_list_as_array): Set DECL_MERGEABLE. (convert_like_internal) [ck_list]: Set it. (set_up_extended_ref_temp): Copy it. * tree.cc (handle_no_unique_addr_attribute): Set it. gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/initlist-opt1.C: Check for static array. * g++.dg/tree-ssa/initlist-opt2.C: Likewise. * g++.dg/tree-ssa/initlist-opt4.C: New test. * g++.dg/opt/icf1.C: New test. * g++.dg/opt/icf2.C: New test. * g++.dg/opt/icf3.C: New test. * g++.dg/tree-ssa/array-temp1.C: Revert r12-657 change.
Fixed for GCC 14 so far.
The tests introduced by the commit above all fail on Darwin, both on Intel and ARM: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111067
Fixed in GCC 14.