Bug 46785 - Doesn't vectorize reduction x += y*y
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.6.0
Importance: P3 normal
Target Milestone: 4.6.0
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2010-12-03 14:54 UTC by Richard Biener
Modified: 2010-12-06 10:09 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-12-05 11:30:05


Description Richard Biener 2010-12-03 14:54:36 UTC
While looking into why GCC is so slow on the himeno benchmark in the usual
Phoronix testing, I noticed that we do not vectorize the reduction in

float x[1024];
float
test (void)
{
  int i;
  float gosa = 0.0;
  for (i = 0; i < 1024; ++i)
    {
      float tem = x[i];
      gosa += tem * tem;
    }
  return gosa;
}

because at analysis time we have

D.3171_6 = __builtin_powf (tem_5, 2.0e+0);

as the def of the addition's operand, which satisfies neither
is_gimple_assign nor any of the vinfo tests:

$3 = {type = undef_vec_info_type, live = 0 '\000', in_pattern_p = 0 '\000', 
  read_write_dep = 0 '\000', stmt = 0x7ffff7edc908, loop_vinfo = 0x18f77e0, 
  vectype = 0x0, vectorized_stmt = 0x0, data_ref_info = 0x0, 
  dr_base_address = 0x0, dr_init = 0x0, dr_offset = 0x0, dr_step = 0x0, 
  dr_aligned_to = 0x0, related_stmt = 0x0, same_align_refs = 0x18cf7f0, 
  def_type = vect_internal_def, slp_type = loop_vect, first_dr = 0x0, 
  next_dr = 0x0, same_dr_stmt = 0x0, size = 0, store_count = 0, gap = 0, 
  relevant = vect_unused_in_scope, cost = {outside_of_loop = 0, 
    inside_of_loop = 0}, bb_vinfo = 0x0, vectorizable = 1 '\001'}

As we want to allow internal defs we can also just let calls slip through
here (so we vectorize reductions with veclib vectorized calls as well).

Ira?
Comment 1 Richard Biener 2010-12-03 15:39:49 UTC
Btw, I wonder why we bother to check the defs at all and not just do

     def1 && flow_bb_inside_loop_p (loop, gimple_bb (def1))

we should be able to handle all vectorizable reduction operands, and
their vectorizability will be determined anyway.
Comment 2 Ira Rosen 2010-12-05 08:31:38 UTC
 
> As we want to allow internal defs we can also just let calls slip through
> here (so we vectorize reductions with veclib vectorized calls as well).

Right.

(In reply to comment #1)
> Btw, I wonder why we bother to check the defs at all and not just do
> 
>      def1 && flow_bb_inside_loop_p (loop, gimple_bb (def1))
> 
> we should be able to handle all vectorizable reduction operands, and
> their vectorizability will be determined anyway.

This checks that the other def is not a reduction too.

Ira
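
To make the exchange above concrete, the following is a toy model of the check being discussed. It is not GCC source: the types toy_def, stmt_kind and def_kind and the functions strict_check and loose_check are hypothetical stand-ins, and the PHI clause is only an assumption standing in for "the vinfo tests". It shows that a call definition passes only once calls are allowed, and that the loosened check from comment #1 would also accept a definition that is itself a reduction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the vectorizer's data. */
enum stmt_kind { STMT_ASSIGN, STMT_CALL, STMT_PHI };
enum def_kind  { DEF_INTERNAL, DEF_REDUCTION };

struct toy_def {
    enum stmt_kind kind;      /* what kind of statement defines the operand */
    enum def_kind  def_type;  /* how the definition was classified */
    bool inside_loop;         /* is the definition inside the loop? */
};

/* Strict check: the definition must be inside the loop and of an
   accepted kind.  allow_calls models the change proposed in the report. */
bool strict_check(const struct toy_def *d, bool allow_calls)
{
    if (d == NULL || !d->inside_loop)
        return false;
    switch (d->kind) {
    case STMT_ASSIGN: return true;
    case STMT_CALL:   return allow_calls;
    case STMT_PHI:    return d->def_type == DEF_INTERNAL; /* assumed rule */
    }
    return false;
}

/* Loosened check from comment #1: only require a def inside the loop. */
bool loose_check(const struct toy_def *d)
{
    return d != NULL && d->inside_loop;
}
```

In this model, a STMT_CALL definition is rejected by strict_check when allow_calls is false and accepted when it is true, while a STMT_PHI definition marked DEF_REDUCTION is rejected by strict_check but accepted by loose_check.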
Comment 3 Richard Biener 2010-12-05 11:30:05 UTC
I have a tested patch.
Comment 4 Richard Biener 2010-12-06 10:05:11 UTC
Author: rguenth
Date: Mon Dec  6 10:05:07 2010
New Revision: 167486

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167486
Log:
2010-12-06  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/46785
	* tree-vect-loop.c (vect_is_simple_reduction_1): Also allow
	call statements as operand definition.

	* gcc.dg/vect/fast-math-vect-reduc-9.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/fast-math-vect-reduc-9.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-loop.c
Comment 5 Richard Biener 2010-12-06 10:09:08 UTC
Fixed on trunk.