Bug 46787 - Does not vectorize loop with load from scalar variable
Summary: Does not vectorize loop with load from scalar variable
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.6.0
Importance: P3 normal
Target Milestone: 4.7.0
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2010-12-03 15:27 UTC by Richard Biener
Modified: 2011-06-30 13:28 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-04-21 14:39:11


Description Richard Biener 2010-12-03 15:27:41 UTC
While looking at why GCC is so slow on the Himeno benchmark in the usual
Phoronix testing, I noticed that we do not vectorize

float *x;
float parm;
float
test (int start, int end)
{
  int i;
  for (i = start; i < end; ++i)
    {
      float tem = x[i];
      x[i] = parm * tem;
    }
}

because there is a scalar, loop-invariant load of parm that may be aliased
by the store to x[i] when the loop runs for just a single iteration.

When we vectorize, however, the vectorization factor is at least two,
which means that x cannot validly point to parm in the vectorized loop
(a vector store would exceed the scalar variable's size, something that
alias analysis after vectorization would use to disambiguate the vector
store and the load).
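
To make the aliasing argument concrete, here is a minimal sketch in C. The
names scale and alias_one_iteration are hypothetical and exist only for this
illustration. The helper points x at parm and runs exactly one iteration,
which is the only trip count for which that aliasing is valid, since a second
iteration would access memory past the scalar.

```c
float *x;
float parm;

/* Same loop shape as in the report; the name is hypothetical. */
void
scale (int start, int end)
{
  int i;
  for (i = start; i < end; ++i)
    {
      float tem = x[i];
      x[i] = parm * tem;
    }
}

/* Hypothetical helper: make x alias parm and run a single iteration.
   The store to x[0] then overwrites parm itself.  */
float
alias_one_iteration (float value)
{
  parm = value;
  x = &parm;
  scale (0, 1);   /* one iteration only; scale (0, 2) would read past parm */
  return parm;    /* value * value */
}
```

A loop body that runs only when at least two iterations remain never sees
this case, which is why the vectorized loop can assume the load of parm is
not clobbered by the store.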

Thus we can treat loads of scalar decls as if they were done outside of
the loop and vectorize it as

     D.1234_3 = parm;
     vec_tmp_4 = { D.1234_3, D.1234_3 };

that is, as if D.1234_3 were a vect_external_def, with the load statement
itself marked as not relevant.
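
As a rough source-level picture of that transformation, the sketch below
writes out by hand what a vectorization factor of two could look like. It is
not what the vectorizer emits; test_hoisted_vf2 and the two-element array
vec_p are hypothetical stand-ins for the generated code and the vector
temporary. The load of parm is hoisted only on the path that runs at least
two iterations, and a scalar epilogue handles any leftover iteration,
including the one-iteration case where x may alias parm.

```c
float *x;
float parm;

/* Hand-written illustration of the loop with the invariant load hoisted,
   assuming a vectorization factor of two.  */
void
test_hoisted_vf2 (int start, int end)
{
  int i = start;

  if (end - i >= 2)
    {
      /* At least two iterations: x cannot validly point to parm, so the
         load can be done once, outside the loop.  */
      float p = parm;
      float vec_p[2] = { p, p };   /* stands in for vec_tmp = { p, p } */

      for (; i + 1 < end; i += 2)
        {
          x[i] = vec_p[0] * x[i];
          x[i + 1] = vec_p[1] * x[i + 1];
        }
    }

  /* Scalar epilogue: reloads parm every iteration, as the original loop.  */
  for (; i < end; ++i)
    x[i] = parm * x[i];
}
```

For example, with x pointing at a five-element array and parm set to 2, the
first four elements go through the two-wide loop and the fifth through the
epilogue.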
Comment 1 Richard Biener 2010-12-03 15:37:16 UTC
To fix this, the following check needs to be lifted:

      /* FIXME -- data dependence analysis does not work correctly for objects
         with invariant addresses in loop nests.  Let us fail here until the
         problem is fixed.  */
      if (dr_address_invariant_p (dr) && nest)
        {
          free_data_ref (dr);
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "\tFAILED as dr address is invariant\n");
          ret = false;
          break;
        }

For lifting this check to work and be a real improvement, we need to pass
down the minimum number of iterations of the loop (that is, the minimum
vectorization factor in the case of the vectorizer).

We could also fix it up locally in the vectorizer if
compute_data_dependences_for_loop did not re-scan the loop for data
references (instead, we would collect them in the vectorizer and remove
the scalar loads).

We could then also defer computing data dependences until
vect_analyze_data_ref_dependences, saving some compile time for loops that
are not vectorized.
Comment 2 Richard Biener 2011-04-21 14:39:11 UTC
Mine.
Comment 3 Richard Biener 2011-06-30 13:27:47 UTC
Author: rguenth
Date: Thu Jun 30 13:27:43 2011
New Revision: 175704

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175704
Log:
2011-06-30  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/46787
	* tree-data-ref.c (dr_address_invariant_p): Remove.
	(find_data_references_in_stmt): Invariant accesses are ok now.
	* tree-vect-stmts.c (vectorizable_load): Handle invariant
	loads.
	* tree-vect-data-refs.c (vect_analyze_data_ref_access): Allow
	invariant loads.

	* gcc.dg/vect/vect-121.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/vect-121.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-data-ref.c
    trunk/gcc/tree-vect-data-refs.c
    trunk/gcc/tree-vect-stmts.c
Comment 4 Richard Biener 2011-06-30 13:28:06 UTC
Fixed.