[Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Feb 16 10:51:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|WAITING                     |NEW
                 CC|                            |rguenth at gcc dot gnu.org
          Component|middle-end                  |tree-optimization
            Summary|GCC 4.6 performance         |GCC 4.6 performance
                   |regression (vs. 4.4/4.5)    |regression (vs. 4.4/4.5),
                   |                            |PRE increases register
                   |                            |pressure
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
As for movaps vs. movups: when movaps actually works, the choice shouldn't
make any difference on modern architectures. So I wonder if you could share
the exact CPU type you are using?
We are putting quite heavy register pressure on the thing by means of
partial redundancy elimination (PRE), so disabling PRE using -fno-tree-pre
might help (though we still spill a lot).
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts: 103296 c/s real, 103296 c/s virtual
Only one salt: 100736 c/s real, 100736 c/s virtual
improves to
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts: 126848 c/s real, 126848 c/s virtual
Only one salt: 123008 c/s real, 123008 c/s virtual
with that for me (GCC 4.8, SSE2), which is close to what 4.5.3 gets for me:
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts: 128384 c/s real, 128384 c/s virtual
Only one salt: 124800 c/s real, 124800 c/s virtual
albeit that doesn't need -fno-tree-pre to fix things.
Note that we have to use movups because DES_bs_all is not known to be
aligned as seen from DES_bs_b.c: it is defined in DES_bs.c and annotated with
CC_CACHE_ALIGN only there, not at the point of declaration in DES_bs.h. So
the unaligned moves are the source's fault. Annotating the declaration with
CC_CACHE_ALIGN as well produces the desired movaps instructions (with no
effect on performance for me).
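To make the declaration-site point concrete, here is a minimal sketch in C
(GNU attribute syntax). It is not the real John the Ripper source: the
definition of CC_CACHE_ALIGN, the 64-byte value, the struct layout and the
helper is_aligned() are all assumptions made up for illustration. What it
shows is that the alignment attribute has to be visible on the extern
declaration, because that is all a different translation unit gets to see.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* uintptr_t */

/* ASSUMPTION: stand-in for the project's CC_CACHE_ALIGN macro; the real
   definition and alignment value are not shown in this report. */
#define CC_CACHE_ALIGN __attribute__((aligned(64)))

/* ASSUMPTION: simplified, hypothetical stand-in for the real type. */
typedef struct {
    unsigned char B[64][16];
} DES_bs_combined;

/* What the header would carry: with the attribute here, every file that
   includes the header may assume the alignment when accessing the object. */
extern DES_bs_combined DES_bs_all CC_CACHE_ALIGN;

/* The definition, normally in exactly one .c file. */
DES_bs_combined DES_bs_all CC_CACHE_ALIGN;

/* Compile-time check that the alignment is known through the declaration. */
static_assert(__alignof__(DES_bs_all) == 64,
              "DES_bs_all should be declared 64-byte aligned");

/* Hypothetical helper: returns 1 if p is a multiple of align bytes. */
int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

If the attribute were only on the definition, the object would still end up
aligned in memory, but code compiled from other files could not rely on that.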
I think for the effect of PRE increasing register pressure we do have some
duplicate bugs (but no good heuristic to fix anything). LIM store-motion can
have the very same issue.
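For readers not familiar with the transformation, the following is a
hand-written C illustration of what PRE does conceptually. It is not GCC
output and not taken from the benchmark; the function names before_pre() and
after_pre() are made up. In the first version a + b is redundant on only one
of the two paths. In the second, the sum is computed once on every path and
carried in a temporary, which saves an addition but adds one more value that
must stay live across the join while a and b are still needed.

```c
/* Before: a + b is computed when c != 0 and again at the return, so the
   second computation is redundant on one path only (partially redundant). */
int before_pre(int a, int b, int c)
{
    int x = 0;
    if (c)
        x = a + b;
    return x * (a - b) + (a + b);
}

/* After (hand-written, conceptual): the sum is made available on every
   path and reused. t is live from the branch to the return, in addition
   to a and b, which are still needed for a - b. */
int after_pre(int a, int b, int c)
{
    int x = 0;
    int t;
    if (c) {
        t = a + b;
        x = t;
    } else {
        t = a + b;  /* inserted on the path that lacked the computation */
    }
    return x * (a - b) + t;
}
```

With one expression the extra live value is harmless. With many such
temporaries in a large unrolled function, the values kept live can exceed
the available registers, and the result is spilling.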