[Bug target/87561] [9 Regression] 416.gamess is slower by ~10% starting from r264866 with -Ofast
rsandifo at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Oct 9 18:18:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87561
--- Comment #5 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> Another thing is the too complicated alias check where for
>
> (gdb) p debug_data_reference (dr_a.dr)
> #(Data Ref:
> # bb: 14
> # stmt: _28 = *xpqkl_172(D)[_27];
> # ref: *xpqkl_172(D)[_27];
> # base_object: *xpqkl_172(D);
> # Access function 0: {(((integer(kind=8)) mkl_203 + 1) * stride.33_148 +
> offset.34_149) + _480, +, stride.33_148}_6
> #)
> $9 = void
> (gdb) p debug_data_reference (dr_b.dr)
> #(Data Ref:
> # bb: 14
> # stmt: *xpqkl_172(D)[_50] = _65;
> # ref: *xpqkl_172(D)[_50];
> # base_object: *xpqkl_172(D);
> # Access function 0: {(((integer(kind=8)) mkl_203 + 1) * stride.33_148 +
> offset.34_149) + _486, +, stride.33_148}_6
> #)
>
> we generate
>
> (ssizetype) (((sizetype) ((((integer(kind=8)) mkl_203 + 1) * stride.33_148 +
> offset.34_149) + (integer(kind=8)) (_19 + jpack_161)) + (sizetype)
> stride.33_148) * 8) < (ssizetype) ((sizetype) ((((integer(kind=8)) mkl_203 +
> 1) * stride.33_148 + offset.34_149) + (integer(kind=8)) (_22 + lpack_164)) *
> 8) || (ssizetype) (((sizetype) ((((integer(kind=8)) mkl_203 + 1) *
> stride.33_148 + offset.34_149) + (integer(kind=8)) (_22 + lpack_164)) +
> (sizetype) stride.33_148) * 8) < (ssizetype) ((sizetype)
> ((((integer(kind=8)) mkl_203 + 1) * stride.33_148 + offset.34_149) +
> (integer(kind=8)) (_19 + jpack_161)) * 8)
>
> instead of simply _480 != _486 (well, OK, not _that_ simple).
>
> I guess we miss many of the "optimizations" we do when dealing with
> alias checks for constant steps. In this case sth obvious would be
> to special-case DR_STEP (dra) == DR_STEP (drb). Richard?
Not sure that would help much with the existing optimisations.
I think the closest we get is create_intersect_range_checks_index,
but "all" that avoids is scaling the index by the element size
and adding the common base. I guess the expensive bit here is
multiplying by the stride, but the index-based check would still
do that.
That said, create_intersect_range_checks_index does feel like it
might be a bit *too* conservative (but I'm not brave enough to relax it).
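
For illustration only, here is a minimal C sketch of the reduction hinted at
in comment #4. It is not GCC code: the function names are hypothetical, a and
b stand in for _480 and _486, and plain int64_t arithmetic replaces the
sizetype/ssizetype conversions in the generated expression. It assumes that
nothing overflows, which is exactly what the real check cannot assume, so
treat it as a picture of the algebra rather than a proposed transformation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shape of the generated runtime check from comment #4: both indices
   contain the common term (mkl + 1) * stride + offset, and each side is
   scaled by the element size (8).  Assumption: no wraparound occurs.  */
static bool no_alias_generated(int64_t mkl, int64_t stride, int64_t offset,
                               int64_t a, int64_t b)
{
    int64_t idx_a = ((mkl + 1) * stride + offset) + a;
    int64_t idx_b = ((mkl + 1) * stride + offset) + b;
    return (idx_a + stride) * 8 < idx_b * 8
        || (idx_b + stride) * 8 < idx_a * 8;
}

/* The same test with the common term and the scaling cancelled.  This is
   only equivalent when the arithmetic above cannot overflow.  */
static bool no_alias_reduced(int64_t stride, int64_t a, int64_t b)
{
    return a + stride < b || b + stride < a;
}

/* Compare the two forms exhaustively over a small range of values.
   Returns true if they agree for every combination tried.  */
static bool forms_agree(int64_t limit)
{
    assert(limit > 0 && limit < 1000);  /* keep all products far from overflow */
    for (int64_t mkl = 0; mkl <= limit; mkl++)
        for (int64_t stride = 1; stride <= limit; stride++)
            for (int64_t a = -limit; a <= limit; a++)
                for (int64_t b = -limit; b <= limit; b++)
                    if (no_alias_generated(mkl, stride, 0, a, b)
                        != no_alias_reduced(stride, a, b))
                        return false;
    return true;
}
```

Even in the reduced form the condition is a pair of range comparisons rather
than a plain a != b, which matches the "not _that_ simple" caveat above.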