Bug 57380 - [4.7/4.8 Regression] GCC 4.9.0 will not vectorize std::max and similar functions
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.9.0
Importance: P2 normal
Target Milestone: 4.7.4
Assignee: Richard Biener
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-22 21:42 UTC by Jeremiah Willcock
Modified: 2014-06-30 02:00 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Known to work: 4.4.7, 4.5.4, 4.9.0
Known to fail: 4.6.3, 4.7.2, 4.8.1
Last reconfirmed: 2013-05-23 00:00:00


Description Jeremiah Willcock 2013-05-22 21:42:17 UTC
It appears that a function returning a const reference to one of its arguments prevents loops that call it from being vectorized. Here is a simple test program:

struct my_array {
  int data[4] __attribute__((aligned(16)));
};

const int& my_max(const int& a, const int& b) {
  return a < b ? b : a;
}

int f(my_array a, my_array b) {
  int res = 0;
  for (int i = 0; i < 4; ++i) {
    res += my_max(a.data[i], b.data[i]);
  }
  return res;
}

The signature of my_max is that of std::max specialized for int; std::max itself has similar problems. The loop vectorizes without trouble if my_max returns "int" by value. The main errors in the vectorization report appear to be:

vec_min_max.cpp:11: note: not vectorized: not suitable for gather load _6 = *iftmp.0_12;

vec_min_max.cpp:11: note: bad data references.

Other variants of the code fail with "control flow in loop" instead. The flags I am using are:

-ftree-vectorizer-verbose=4 -Ofast -march=nocona

but the code should be vectorizable with SSE2. The GCC version I am using is "g++ (GCC) 4.9.0 20130519 (experimental)" on x86-64. GCC 4.7.2 fails with a similar error, while "4.4.7 20120313 (Red Hat 4.4.7-3)" vectorizes it without problems.
Comment 1 Jeremiah Willcock 2013-05-22 21:47:40 UTC
I tested version "(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3" and that fails as well (with "note: not vectorized: data ref analysis failed D.2097_9 = *D.2115_16;" in the vectorization report).
Comment 2 Richard Biener 2013-05-23 08:35:55 UTC
For some reason phiprop doesn't work here.  I'll investigate, but I guess
it is because the addresses are not constant but depend on the loop IV:

  <bb 3>:
  _3 = &b.data[i_2];
  _4 = &a.data[i_2];
  _10 = MEM[(const int &)&a].data[i_2];
  _11 = MEM[(const int &)&b].data[i_2];
  if (_10 < _11)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:

  <bb 5>:
  # iftmp.0_12 = PHI <_3(3), _4(4)>

  <bb 6>:
  _6 = *iftmp.0_12;
Comment 3 Richard Biener 2013-05-23 09:20:25 UTC
There was a deliberate change to require at least one invariant address or
a re-use of a previous load.

 2010-07-08  Richard Guenther  <rguenther@suse.de>
 
+       PR tree-optimization/44831
+       * tree-ssa-phiprop.c (phiprop_insert_phi): Properly build
+       a MEM_REF preserving TBAA info of the original dereference.
+       Dereference the original pointer if the address is not
+       invariant.
+       (propagate_with_phi): Fixup type checks wrt MEM_REFs.  Require
+       at least one invariant address that we are going to dereference.

I have a fix.
Comment 4 Richard Biener 2013-05-23 12:24:36 UTC
Author: rguenth
Date: Thu May 23 12:23:59 2013
New Revision: 199246

URL: http://gcc.gnu.org/viewcvs?rev=199246&root=gcc&view=rev
Log:
2013-05-23  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/57380
	* tree-ssa-phiprop.c (propagate_with_phi): Do not require at
	least one invariant or re-used load.
	* passes.c (init_optimization_passes): Move pass_phiprop before
	pass_forwprop.

	* g++.dg/tree-ssa/pr57380.C: New testcase.

Added:
    trunk/gcc/testsuite/g++.dg/tree-ssa/pr57380.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/passes.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-phiprop.c
Comment 5 Jeffrey A. Law 2014-01-16 21:04:06 UTC
Richi fixed on the trunk many months ago.
Comment 6 Jeffrey A. Law 2014-01-16 21:04:41 UTC
fixed on trunk
Comment 7 Andrew Pinski 2014-06-30 02:00:47 UTC
This testcase fails with -fPIC.