Bug 110070 - Code quality regression with for (int i: {1,2,4,6})
Summary: Code quality regression with for (int i: {1,2,4,6})
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: 11.3.1
Importance: P3 normal
Target Milestone: 14.0
Assignee: Jason Merrill
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2023-06-01 07:26 UTC by Roger Sayle
Modified: 2025-03-07 01:51 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-06-02 00:00:00


Description Roger Sayle 2023-06-01 07:26:09 UTC
The fix for PR c++/70167 (in GCC 11.3) inadvertently introduced a code quality regression for simple range-based for loops over initializer lists.  The motivating example is an idiom from the stockfish benchmark [update_continuation_histories in src/search.cpp]:

#include <initializer_list>
extern void ext(int);
void foo()
{
  for (int i: {1,2,4,6})
    ext(i);
}

which currently generates inefficient code, copying the array onto the stack before use:
foo():
        pushq   %rbp
        pushq   %rbx
        subq    $24, %rsp
        movdqa  .LC0(%rip), %xmm0
        movq    %rsp, %rbx
        leaq    16(%rsp), %rbp
        movaps  %xmm0, (%rsp)
.L2:
        movl    (%rbx), %edi
        addq    $4, %rbx
        call    ext(int)
        cmpq    %rbp, %rbx
        jne     .L2
        addq    $24, %rsp
        popq    %rbx
        popq    %rbp
        ret
.LC0:
        .long   1
        .long   2
        .long   4
        .long   6

In GCC 11.2 and earlier, the initializer array is used in place, without copying:

foo():
        pushq   %rbx
        movl    $C.0.0, %ebx
.L2:
        movl    (%rbx), %edi
        addq    $4, %rbx
        call    ext(int)
        cmpq    $C.0.0+16, %rbx
        jne     .L2
        popq    %rbx
        ret
C.0.0:
        .long   1
        .long   2
        .long   4
        .long   6
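For reference, the range-for in foo() is specified by [stmt.ranged] to behave roughly like the hand-expanded loop below: the braced list becomes a std::initializer_list<int> whose backing array holds the four values, and that backing array is the object that ends up either as C.0.0 in read-only data or copied to the stack. This is only a sketch: `ext` is given a hypothetical definition that records values in `seen` so the file links, and `foo_desugared` is an illustrative name, not something GCC emits.

```cpp
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for the report's `extern void ext(int)`: it records
// each value so the two loops below can be compared.
static std::vector<int> seen;
void ext(int v) { seen.push_back(v); }

// The loop exactly as written in the report.
void foo()
{
  for (int i: {1,2,4,6})
    ext(i);
}

// Approximately what [stmt.ranged] says the loop above means.  `range` is
// deduced as std::initializer_list<int>&&; it refers to a backing array of
// four const ints, which is the array this PR is about.
void foo_desugared()
{
  auto &&range = {1,2,4,6};
  for (auto begin = range.begin(), end = range.end(); begin != end; ++begin)
    {
      int i = *begin;
      ext(i);
    }
}
```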

The code difference stems from whether the middle-end treats the initializer as "static", as shown by the difference between:

  const int init[4] = {1,2,4,6};
  for (int i: init) ... // generates a copy

and 

  static const int init[4] = {1,2,4,6};
  for (int i: init) ... // doesn't generate a copy


Fortunately, there's already code in the depths of the C++ front-end for marking such initializer lists/constructors as static, so I initially tried to fix this myself, first with:

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 05df628..a91693d 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -3314,7 +3314,6 @@ finish_compound_literal (tree type, tree compound_literal,
   /* FIXME all C99 compound literals should be variables rather than C++
      temporaries, unless they are used as an aggregate initializer.  */
   if ((!at_function_scope_p () || CP_TYPE_CONST_P (type))
-      && fcl_context == fcl_c99
       && TREE_CODE (type) == ARRAY_TYPE
       && !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type)
       && initializer_constant_valid_p (compound_literal, type))

and then trying:

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 2736f55..53220da 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -8557,7 +8557,10 @@ convert_like_internal (conversion *convs, tree expr, tree fn, int argnum,
 	    elttype = cp_build_qualified_type
 	      (elttype, cp_type_quals (elttype) | TYPE_QUAL_CONST);
 	    array = build_array_of_n_type (elttype, len);
-	    array = finish_compound_literal (array, new_ctor, complain);
+	    /* Indicate that a non-lvalue static const array is acceptable
+ 	       by specifying fcl_c99.  */
+	    array = finish_compound_literal (array, new_ctor, complain,
+					     fcl_c99);
 	    /* Take the address explicitly rather than via decay_conversion
 	       to avoid the error about taking the address of a temporary.  */
 	    array = cp_build_addr_expr (array, complain);


Both of these fix/improve code generation for this case, but break the initializer tests in the g++.dg testsuite in interesting ways.  At this point I thought I'd give up and leave the fix to the experts.  The range_expr passed to cp_convert_range_for is:

 <constructor 0x7ffff6dc9348
    type <lang_type 0x7ffff6dadc78 init list VOID
        align:1 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff6dadc78>
    constant length:4
    val <non_lvalue_expr 0x7ffff6dcd540
        type <integer_type 0x7ffff6c415e8 int public type_6 SI
            size <integer_cst 0x7ffff6c43228 constant 32>
            unit-size <integer_cst 0x7ffff6c43240 constant 4>
            align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff6c415e8 precision:32 min <integer_cst 0x7ffff6c431e0 -2147483648> max <integer_cst 0x7ffff6c431f8 2147483647>
            pointer_to_this <pointer_type 0x7ffff6c49b28>>
        constant public
        arg:0 <integer_cst 0x7ffff6c43390 constant 1>
        iter.cc:7:16 start: iter.cc:7:16 finish: iter.cc:7:16>
    val <non_lvalue_expr 0x7ffff6dcd560 type <integer_type 0x7ffff6c415e8 int>
        constant public
        arg:0 <integer_cst 0x7ffff6c43768 constant 2>
        iter.cc:7:18 start: iter.cc:7:18 finish: iter.cc:7:18>
    val <non_lvalue_expr 0x7ffff6dcd580 type <integer_type 0x7ffff6c415e8 int>
        constant public
        arg:0 <integer_cst 0x7ffff6c43780 constant 4>
        iter.cc:7:20 start: iter.cc:7:20 finish: iter.cc:7:20>
    val <non_lvalue_expr 0x7ffff6dcd5a0 type <integer_type 0x7ffff6c415e8 int>
        constant public
        arg:0 <integer_cst 0x7ffff6c437b0 constant 6>
        iter.cc:7:22 start: iter.cc:7:22 finish: iter.cc:7:22>>

which contains a lot of non_lvalue_expr nodes, so it's surprising (to me) that we try to turn this into an lvalue when it should (or could) be read-only.

Thanks in advance.  My apologies if this is a duplicate/known issue.
Ideally, GCC should be able to unroll this loop, but that's a different issue.
Comment 1 Jonathan Wakely 2023-06-01 08:28:43 UTC
(In reply to Roger Sayle from comment #0)
> Fortunately, there's already code in the depths of the C++ front-end for
> marking such initializer lists/constructors as static,

Jason looked into this recently. There are problems with making it static (although in this case, where the addresses of the elements don't escape, it might be OK).
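To illustrate why escaping addresses matter, in the hypothetical functions below the array's address is passed to a recursive call, so two instances of the automatic array are alive at the moment of comparison and must compare unequal. Promoting it to a single static object, as in the second function, makes the same comparison true, so such a promotion would change observable behaviour. The function names are illustrative only.

```cpp
// Compares the address of `arr` in the innermost call with the address of
// `arr` in its caller.  Both frames are live at the comparison, so two
// distinct automatic arrays exist and their addresses must differ.
bool same_address_automatic(const int *outer, int depth)
{
  const int arr[4] = {1,2,4,6};
  if (depth == 0)
    return outer == arr;
  return same_address_automatic(arr, depth - 1);
}

// Same shape, but every call shares one static array, so the addresses match.
bool same_address_static(const int *outer, int depth)
{
  static const int arr[4] = {1,2,4,6};
  if (depth == 0)
    return outer == arr;
  return same_address_static(arr, depth - 1);
}
```

Calling each with `(nullptr, 1)` performs exactly one nested call, so the comparison is between the caller's and the callee's array.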
Comment 2 GCC Commits 2023-06-02 15:01:30 UTC
The trunk branch has been updated by Jason Merrill <jason@gcc.gnu.org>:

https://gcc.gnu.org/g:4d935f52b0d5c00fcc154461b87415ebd8791a94

commit r14-1500-g4d935f52b0d5c00fcc154461b87415ebd8791a94
Author: Jason Merrill <jason@redhat.com>
Date:   Wed Dec 7 11:40:53 2022 -0500

    c++: make initializer_list array static again [PR110070]
    
    After the maybe_init_list_as_* patches, I noticed that we were putting the
    array of strings into .rodata, but then memcpying it into an automatic
    array, which is pointless; we should be able to use it directly.
    
    This doesn't happen automatically because TREE_ADDRESSABLE is set (since
    r12-657 for PR100464), and so gimplify_init_constructor won't promote the
    variable to static.  Theoretically we could do escape analysis to recognize
    that the address, though taken, never leaves the function; that would allow
    promotion when we're only using the address for indexing within the
    function, as in initlist-opt2.C.  But this would be a new pass.
    
    And in initlist-opt1.C, we're passing the array address to another function,
    so it definitely escapes; it's only safe in this case because it's calling a
    standard library function that we know only uses it for indexing.  So, a
    flag seems needed.  I first thought to put the flag on the TARGET_EXPR, but
    the VAR_DECL seems more appropriate.
    
    In a previous revision of the patch I called this flag DECL_NOT_OBSERVABLE,
    but I think DECL_MERGEABLE is a better name, especially if we're going to
    apply it to the backing array of initializer_list, which is observable.  I
    then also check it in places that check for -fmerge-all-constants, so that
    multiple equivalent initializer-lists can also be combined.  And then it
    seemed to make sense for [[no_unique_address]] to have this meaning for
    user-written variables.
    
    I think the note in [dcl.init.list]/6 intended to allow this kind of merging
    for initializer_lists, but it didn't actually work; for an explicit array
    with the same initializer, if the address escapes the program could tell
    whether the same variable in two frames have the same address.  P2752 is
    trying to correct this defect, so I'm going to assume that this is the
    intent.
    
            PR c++/110070
            PR c++/105838
    
    gcc/ChangeLog:
    
            * tree.h (DECL_MERGEABLE): New.
            * tree-core.h (struct tree_decl_common): Mention it.
            * gimplify.cc (gimplify_init_constructor): Check it.
            * cgraph.cc (symtab_node::address_can_be_compared_p): Likewise.
            * varasm.cc (categorize_decl_for_section): Likewise.
    
    gcc/cp/ChangeLog:
    
            * call.cc (maybe_init_list_as_array): Set DECL_MERGEABLE.
            (convert_like_internal) [ck_list]: Set it.
            (set_up_extended_ref_temp): Copy it.
            * tree.cc (handle_no_unique_addr_attribute): Set it.
    
    gcc/testsuite/ChangeLog:
    
            * g++.dg/tree-ssa/initlist-opt1.C: Check for static array.
            * g++.dg/tree-ssa/initlist-opt2.C: Likewise.
            * g++.dg/tree-ssa/initlist-opt4.C: New test.
            * g++.dg/opt/icf1.C: New test.
            * g++.dg/opt/icf2.C: New test.
            * g++.dg/opt/icf3.C: New test.
            * g++.dg/tree-ssa/array-temp1.C: Revert r12-657 change.
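The commit message mentions an array of strings whose backing array is passed to a standard-library function, so its address escapes even though the library only indexes through it. The snippet below is a hypothetical example of that general shape (it is not the contents of initlist-opt1.C); whether and how GCC's maybe_init_list_as_* handling transforms this particular list is not claimed here.

```cpp
#include <string>
#include <vector>

// Hypothetical example: a std::vector<std::string> initialised from string
// literals.  The braced list's backing array is handed to a vector
// constructor, so its address leaves make_names(), but the library only
// reads the elements through it.
std::vector<std::string> make_names()
{
  return {"alpha", "beta", "gamma"};
}
```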
Comment 3 Jason Merrill 2023-06-02 15:34:10 UTC
Fixed for GCC 14 so far.
Comment 4 Francois-Xavier Coudert 2023-08-18 17:52:19 UTC
The tests introduced by the commit above all fail on Darwin, both on Intel and ARM: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111067
Comment 5 Jason Merrill 2024-07-24 13:45:39 UTC
Fixed in GCC 14.