Bug 81719

Summary: Range-based for loop on short fixed size array generates long unrolled loop
Product: gcc Reporter: John Zwinck <jzwinck>
Component: middle-endAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: dushistov, webrown.cpp
Priority: P3 Keywords: missed-optimization
Version: 7.1.0   
Target Milestone: ---   
URL: https://stackoverflow.com/questions/45496987
Host: Target: x86_64
Build: Known to work: 8.0
Known to fail: 4.7.4, 7.1.0 Last reconfirmed: 2017-08-08 00:00:00

Description John Zwinck 2017-08-04 15:48:44 UTC
C++11 range-based for loops over arrays of size known at compile time result in  bloated, branchy, and unreachable code with -O3 optimization.  For example:

    typedef int Items[2];

    struct ItemArray
    {
        Items items;
        int sum_x2() const;
    };

    int ItemArray::sum_x2() const
    {
        int total = 0;
        for (int item : items)
        {
            total += item;
        }
        return total;
    }

Clang compiles the above to [mov, add, ret].  GCC with -O2 compiles it to a few more than that, and with -O3, a whopping 81 instructions.  Add -march=haswell and behold about 130 instructions to add two ints.

GCC (all versions, 4 to 7) generates code to handle a variable-sized array up to about 6 to 14 elements, depending on -march.  The number of elements is known at compile time to be 2 (other small values also elicit the bug).  GCC should generate three instructions in both -O2 and -O3.  It actually does, if sum_x2() is a free function instead of a member function.  The problem also goes away if you use a C-style loop.

There are lots of permutations of this, including using a range-based for loop to assign a common value to every element of an array whose size is known at compile time (120 instructions to assign a single int: https://godbolt.org/g/BGYggD).

Discussion on Stack Overflow: https://stackoverflow.com/questions/45496987/gcc-optimizes-fixed-range-based-for-loop-as-if-it-had-longer-variable-length
Comment 1 Richard Biener 2017-08-08 08:10:49 UTC
We are not able to analyze the number of iterations and end up vectorizing the function.

t.C:12:19: note: Symbolic number of iterations is (((unsigned long) __for_end_6 - (unsigned long) ((const int *) __for_range_5 + 4)) /[ex] 4 & 4611686018427387903) + 1


  <bb 2> [15.00%]:
  __for_range_5 = &this_4(D)->items;
  __for_end_6 = &MEM[(void *)this_4(D) + 8B];

  <bb 3> [85.00%]:
  # total_11 = PHI <total_9(4), 0(2)>
  # __for_begin_14 = PHI <__for_begin_10(4), __for_range_5(2)>
  item_8 = *__for_begin_14;
  total_9 = item_8 + total_11;
  __for_begin_10 = __for_begin_14 + 4;
  if (__for_end_6 == __for_begin_10)
    goto <bb 5>; [15.00%]
  else
    goto <bb 4>; [85.00%]

  <bb 4> [72.25%]:
  goto <bb 3>; [100.00%]

  <bb 5> [15.00%]:
  # total_12 = PHI <total_9(3)>

it looks like expand_simple_operations fails to handle ADDR_EXPR in

  e = gimple_assign_rhs1 (stmt);
  code = gimple_assign_rhs_code (stmt);
  if (get_gimple_rhs_class (code) == GIMPLE_SINGLE_RHS)
    {
      if (is_gimple_min_invariant (e))
        return e;

      if (code == SSA_NAME)
        return expand_simple_operations (e, stop);

      return expr;
    }

but when it is sth that just mimics a POINTER_PLUS_EXPR it should.

Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c   (revision 250813)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.
 #include "tree-chrec.h"
 #include "tree-scalar-evolution.h"
 #include "params.h"
+#include "tree-dfa.h"
 
 
 /* The maximum number of dominator BBs we search for conditions
@@ -1980,6 +1981,21 @@ expand_simple_operations (tree expr, tre
 
       if (code == SSA_NAME)
        return expand_simple_operations (e, stop);
+      else if (code == ADDR_EXPR)
+       {
+         HOST_WIDE_INT offset;
+         tree base = get_addr_base_and_unit_offset (TREE_OPERAND (e, 0),
+                                                    &offset);
+         if (base
+             && TREE_CODE (base) == MEM_REF)
+           {
+             ee = expand_simple_operations (TREE_OPERAND (base, 0), stop);
+             return fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (expr), ee,
+                                 wide_int_to_tree (sizetype,
+                                                   mem_ref_offset (base)
+                                                   + offset));
+           }
+       }
 
       return expr;
     }
Comment 2 Richard Biener 2017-08-08 12:51:34 UTC
Fixed on trunk.
Comment 3 Richard Biener 2017-08-08 12:51:53 UTC
Author: rguenth
Date: Tue Aug  8 12:51:20 2017
New Revision: 250954

URL: https://gcc.gnu.org/viewcvs?rev=250954&root=gcc&view=rev
Log:
2017-08-08  Richard Biener  <rguenther@suse.de>

	PR middle-end/81719
	* tree-ssa-loop-niter.c: Include tree-dfa.h.
	(expand_simple_operations): Also look through ADDR_EXPRs with
	MEM_REF bases treating them as POINTER_PLUS_EXPR.

	* g++.dg/tree-ssa/pr81719.C: New testcase.

Added:
    trunk/gcc/testsuite/g++.dg/tree-ssa/pr81719.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-niter.c
Comment 4 Aldy Hernandez 2017-09-13 16:40:46 UTC
Author: aldyh
Date: Wed Sep 13 16:40:14 2017
New Revision: 252344

URL: https://gcc.gnu.org/viewcvs?rev=252344&root=gcc&view=rev
Log:
2017-08-08  Richard Biener  <rguenther@suse.de>

	PR middle-end/81719
	* tree-ssa-loop-niter.c: Include tree-dfa.h.
	(expand_simple_operations): Also look through ADDR_EXPRs with
	MEM_REF bases treating them as POINTER_PLUS_EXPR.

	* g++.dg/tree-ssa/pr81719.C: New testcase.

Added:
    branches/range-gen2/gcc/testsuite/g++.dg/tree-ssa/pr81719.C
Modified:
    branches/range-gen2/gcc/ChangeLog
    branches/range-gen2/gcc/testsuite/ChangeLog
    branches/range-gen2/gcc/tree-ssa-loop-niter.c