[Bug target/87561] [9 Regression] 416.gamess is slower by ~10% starting from r264866 with -Ofast

rsandifo at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Tue Oct 9 18:18:00 GMT 2018


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87561

--- Comment #5 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> Another thing is the too complicated alias check where for
> 
> (gdb) p debug_data_reference (dr_a.dr)
> #(Data Ref: 
> #  bb: 14 
> #  stmt: _28 = *xpqkl_172(D)[_27];
> #  ref: *xpqkl_172(D)[_27];
> #  base_object: *xpqkl_172(D);
> #  Access function 0: {(((integer(kind=8)) mkl_203 + 1) * stride.33_148 +
> offset.34_149) + _480, +, stride.33_148}_6
> #)
> $9 = void
> (gdb) p debug_data_reference (dr_b.dr)
> #(Data Ref: 
> #  bb: 14 
> #  stmt: *xpqkl_172(D)[_50] = _65;
> #  ref: *xpqkl_172(D)[_50];
> #  base_object: *xpqkl_172(D);
> #  Access function 0: {(((integer(kind=8)) mkl_203 + 1) * stride.33_148 +
> offset.34_149) + _486, +, stride.33_148}_6
> #)
> 
> we generate
> 
> (ssizetype) (((sizetype) ((((integer(kind=8)) mkl_203 + 1) * stride.33_148 +
> offset.34_149) + (integer(kind=8)) (_19 + jpack_161)) + (sizetype)
> stride.33_148) * 8) < (ssizetype) ((sizetype) ((((integer(kind=8)) mkl_203 +
> 1) * stride.33_148 + offset.34_149) + (integer(kind=8)) (_22 + lpack_164)) *
> 8) || (ssizetype) (((sizetype) ((((integer(kind=8)) mkl_203 + 1) *
> stride.33_148 + offset.34_149) + (integer(kind=8)) (_22 + lpack_164)) +
> (sizetype) stride.33_148) * 8) < (ssizetype) ((sizetype)
> ((((integer(kind=8)) mkl_203 + 1) * stride.33_148 + offset.34_149) +
> (integer(kind=8)) (_19 + jpack_161)) * 8)
> 
> instead of simply _480 != _486 (well, OK, not _that_ simple).
> 
> I guess we miss many of the "optimizations" we do when dealing with
> alias checks for constant steps.  In this case sth obvious would be
> to special-case DR_STEP (dra) == DR_STEP (drb).  Richard?

Not sure that would help much with the existing optimisations.
I think the closest we get is create_intersect_range_checks_index,
but "all" that avoids is scaling the index by the element size
and adding the common base.  I guess the expensive bit here is
multiplying by the stride, but the index-based check would still
do that.
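
To make the comparison concrete, here is a rough sketch in C of the two shapes of check. It is not GCC's code: the function names and the parameters a and b (standing in for _480 and _486) are made up for illustration, the element size of 8 is taken from the quoted expression, and overflow is ignored. Both versions compute (mkl + 1) * stride + offset; the index form only drops the scaling by 8.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the part shared by both references in the dump,
   i.e. (mkl + 1) * stride + offset.  The multiply by stride lives here. */
static int64_t common_index(int64_t mkl, int64_t stride, int64_t offset)
{
    return (mkl + 1) * stride + offset;
}

/* Shape of the check quoted above: each index is scaled by the
   8-byte element size before the comparison.  Returns true when the
   two one-stride-long segments are known not to overlap. */
static bool no_overlap_scaled(int64_t mkl, int64_t stride, int64_t offset,
                              int64_t a, int64_t b)
{
    int64_t idx_a = common_index(mkl, stride, offset) + a;
    int64_t idx_b = common_index(mkl, stride, offset) + b;
    return (idx_a + stride) * 8 < idx_b * 8
           || (idx_b + stride) * 8 < idx_a * 8;
}

/* Index-style variant: the same comparison without the scaling by the
   element size.  The multiply by stride is still there. */
static bool no_overlap_index(int64_t mkl, int64_t stride, int64_t offset,
                             int64_t a, int64_t b)
{
    int64_t idx_a = common_index(mkl, stride, offset) + a;
    int64_t idx_b = common_index(mkl, stride, offset) + b;
    return idx_a + stride < idx_b || idx_b + stride < idx_a;
}
```

For small positive inputs the two functions give the same answer, which is the point: the index form saves the scaling but not the work of forming the indices.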

That said, create_intersect_range_checks_index does feel like it
might be a bit *too* conservative (but I'm not brave enough to relax it).

