Request permission to delete gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c

Fri Dec 11 17:44:00 GMT 2015

On Fri, 2015-12-11 at 10:47 +0100, Richard Biener wrote:
> On Thu, Dec 10, 2015 at 8:33 PM, David Edelsohn <dje.gcc@gmail.com> wrote:
> > On Thu, Dec 10, 2015 at 2:23 PM, Bill Schmidt
> > <wschmidt@linux.vnet.ibm.com> wrote:
> >> Hi,
> >>
> >> The subject test case has been failing as follows:
> >>
> >> FAIL: gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c scan-tree-dump-times vect "vectorization not profitable" 1
> >>
> >> The test has been failing since r223528, which is:
> >>
> >> 2015-05-22  Richard Biener  <rguenther@suse.de>
> >>
> >>         PR tree-optimization/65701
> >>         * tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
> >>         Move peeling cost models into one place.  Peel for alignment
> >>         for single loads only if an aligned load is cheaper than
> >>         an unaligned load.
> >>
> >> Thus with that modification, gcc now vectorizes the loop that was
> >> previously deemed unprofitable to vectorize.  As a result, the test case
> >> no longer has any reason to exist, and I would like to delete it.
> 
> Just curious - why was it not profitable before but is now?  The only
> thing that has changed is we no longer require peeling for gaps(?)
>
> Thus, did you check with -fno-vect-cost-model before/after the rev.?
> 

Right -- with the cost model disabled, we vectorize both before and after
the revision, but differently.  Previously we would vectorize by applying
peeling to force alignment.  After the change, we vectorize without
peeling because the unaligned accesses are recognized as supported by
hardware.  So it's the "Peel for alignment for single loads only if an
aligned load is cheaper than an unaligned load" that's kicking in here, I
imagine.
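
To make the two strategies concrete, here is a rough scalar sketch in C.
It is not the test case and not what the vectorizer emits; the function
names, the 16-byte vector width, and the factor of 4 floats per vector
are assumptions chosen for illustration only.  The "vector" body is
written as four scalar adds so that the loop structure is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16                        /* assumed vector width */
#define VF (VEC_BYTES / sizeof(float))      /* floats per vector: 4 */

/* Hypothetical helper: scalar iterations needed before p becomes
   16-byte aligned, capped at n.  */
size_t peel_count(const float *p, size_t n)
{
    /* A float pointer is expected to be at least float-aligned.  */
    assert((uintptr_t)p % sizeof(float) == 0);
    size_t misalign = ((uintptr_t)p % VEC_BYTES) / sizeof(float);
    size_t peel = (VF - misalign) % VF;
    return peel < n ? peel : n;
}

/* Strategy 1 (old behaviour, roughly): peel a scalar prologue so that
   the main loop only touches aligned 4-float groups.  */
float sum_peeled(const float *p, size_t n)
{
    float s = 0.0f;
    size_t i = 0;
    size_t peel = peel_count(p, n);

    for (; i < peel; i++)                   /* prologue: reach alignment */
        s += p[i];
    for (; i + VF <= n; i += VF)            /* main body: aligned groups */
        s += p[i] + p[i + 1] + p[i + 2] + p[i + 3];
    for (; i < n; i++)                      /* epilogue: leftovers */
        s += p[i];
    return s;
}

/* Strategy 2 (new behaviour, roughly): no prologue; the main loop
   starts at p[0] and relies on unaligned accesses being cheap.  */
float sum_unaligned(const float *p, size_t n)
{
    float s = 0.0f;
    size_t i = 0;

    for (; i + VF <= n; i += VF)            /* main body: possibly unaligned */
        s += p[i] + p[i + 1] + p[i + 2] + p[i + 3];
    for (; i < n; i++)                      /* epilogue: leftovers */
        s += p[i];
    return s;
}
```

Both functions compute the same sum (up to floating-point reassociation);
the second simply has one loop fewer, which is the kind of code-size
difference I am describing.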

Note that I am almost always testing on POWER8 hardware these days, for
which unaligned vector accesses are cost-effective.  With earlier
hardware, not so much, and the cost modeling reflects this.  So when
testing on earlier machines, I expect that the test would continue to
"succeed" by not vectorizing.  Alternatively, we could change the test to
require inefficient unaligned access and keep it around, but eventually
that would mean that it just becomes obsolete and nobody notices it.  Thus
I'd prefer to just kill the test.

On POWER8, the resulting code with r223528 is much tighter than with
r223527, because we no longer have the unnecessary loop peeling.  With
-fno-vect-cost-model for both before and after, static instruction
counts drop from 85 to 30, and the loop body is also much cleaner.

Thanks,
Bill

> We might also do outer loop vectorization if the inner loop is not unrolled?
> 
> Richard.
> 
> >> Ok for trunk?
> >>
> >> Thanks,
> >> Bill
> >>
> >>
> >> [gcc/testsuite]
> >>
> >> 2015-12-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> >>
> >>         * gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c:
> >>         Delete.
> >
> > Okay with me.
> >
> > Thanks, David
>