Created attachment 34196
When comparing GCC 4.4.2 with GCC 4.9.2, I noticed a performance regression in the Whetstone benchmark. It appears to have been introduced by r173250. Compiling Whetstone with the latest master (r217599), I get the following numbers:
Loops: 5000000, Iterations: 1, Duration: 54 sec.
C Converted Double Precision Whetstones: 9259.3 MIPS
Loops: 5000000, Iterations: 1, Duration: 58 sec.
C Converted Double Precision Whetstones: 8620.7 MIPS
The assembly output has also increased in size.
I have attached a preprocessed copy of the Whetstone benchmark and assembly output for i686-build_pc-linux-gnu compiled with -O3, with and without r173250.
Let me know if you need any more information.
Created attachment 34197
assembly output with r173250
Created attachment 34198
assembly output without r173250
I will have a look. Note the rev. was backported to GCC 4.5 and up.
Created attachment 34229
Ok, I see the regression introduced by that rev., but on trunk (r218479)
I get the same code generated with/without the attached fix. I'm using
-O3 -m32 -march=i686 on x86_64-linux.
Probably the regression was mitigated by the partial fix for PR63677:
2014-11-20 Richard Biener <email@example.com>
* tree-ssa-dom.c: Include gimplify.h for unshare_expr.
(avail_exprs_stack): Make a vector of pairs.
(struct hash_expr_elt): Replace stmt member with vop member.
where DOM now catches the CSE opportunities FRE no longer did.
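To illustrate the kind of redundancy being discussed, here is a minimal sketch. It is not taken from the attached Whetstone source or from any GCC testcase; the names e1 and twice_sum are invented for illustration. Both statements load the same global array elements with no store in between, so a value-numbering pass such as FRE or DOM may reuse the first result instead of recomputing it.

```c
/* Hypothetical illustration only; not the Whetstone code from the
   attachment. */
double e1[4] = { 1.0, 2.0, 3.0, 4.0 };

double twice_sum(double t)
{
    /* First computation: two loads from e1, an add and a multiply. */
    double a = (e1[0] + e1[1]) * t;

    /* Same expression again with no store to e1 in between, so the
       loads and the arithmetic are redundant and may be replaced by
       a reuse of 'a'. */
    double b = (e1[0] + e1[1]) * t;

    return a + b;
}
```

Whether a given pass actually removes the second computation depends on the compiler version and options, so the generated assembly is the thing to compare.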
> Probably the regression was mitigated by the partial fix for PR63677:
Yes, that seems to be the case for my attached example. Do you think that the regression is mitigated in general, or is your attached fix also needed?
I am testing my fix and will install it nevertheless - it is required
to get full optimistic value-numbering again. It would also be the
fix that is appropriate for backporting (if any of them is).
AFAICT, if there is any speedup with the patch in comment 5, it is negligible on a 2.8 GHz Core i7 and x86_64-apple-darwin14: ~7.1s with/without the patch. Note that the timing in the test is not very precise.
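If the coarse timing is a concern, one option is to time the workload externally with a monotonic clock. The following is a minimal sketch assuming a POSIX system that provides clock_gettime with CLOCK_MONOTONIC; time_workload and busy_loop are hypothetical names, and busy_loop merely stands in for the benchmark body.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Elapsed seconds between two timestamps taken from the same clock. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Time one call of 'workload'. Returns the elapsed wall-clock time in
   seconds, or a negative value if the clock could not be read. */
double time_workload(void (*workload)(void))
{
    struct timespec start, end;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    workload();
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    return elapsed_seconds(start, end);
}

/* Hypothetical stand-in for the benchmark body; the volatile
   accumulator keeps the loop from being optimized away. */
void busy_loop(void)
{
    volatile double acc = 0.0;
    for (int i = 0; i < 1000000; i++)
        acc += (double)i * 0.5;
}
```

Running the workload several times and comparing the minimum or median of the measurements is usually more informative than a single run.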
As said, the patch is more interesting on the 4.9 branch, where the DOM
improvements do not mitigate its effect (or with -fno-tree-dominator-opts).
Fixed on trunk so far.
Date: Tue Dec 9 14:25:09 2014
New Revision: 218515
2014-12-09 Richard Biener <firstname.lastname@example.org>
* tree-ssa-alias.c (walk_non_aliased_vuses): Add valueize parameter
and valueize the VUSE before looking up the def stmt.
* tree-ssa-alias.h (walk_non_aliased_vuses): Adjust prototype.
* tree-ssa-sccvn.c (vn_reference_lookup_pieces): Pass vn_valueize
* tree-ssa-dom.c (lookup_avail_expr): Pass NULL as valueize
callback to walk_non_aliased_vuses.
* gcc.dg/tree-ssa/ssa-fre-43.c: New testcase.
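The idea of an optional valueize callback can be sketched with a toy model. This is not GCC's implementation: every type and name below (mem_state, toy_walk, toy_valueize) is hypothetical, and the toy simply gives up at a merge point, which the real walker need not do. It only shows how mapping a memory state to its value-numbered equivalent before looking up its definition can let a backward walk continue, and how passing NULL keeps the previous behavior.

```c
#include <stddef.h>

/* Toy model only: hypothetical types, not GCC's internal structures. */
enum def_kind { DEF_ENTRY, DEF_STORE_OTHER, DEF_STORE_SAME, DEF_MERGE };

struct mem_state {
    enum def_kind kind; /* what produced this memory state */
    int prev;           /* state consumed by the definition, -1 at entry */
    int value;          /* equivalent state found by value numbering */
};

/* 0 = entry, 1 = store to the location of interest, 2 = unrelated
   store, 3 = merge assumed proven equivalent to state 2,
   4 = unrelated store. */
static const struct mem_state states[] = {
    { DEF_ENTRY,       -1, 0 },
    { DEF_STORE_SAME,   0, 1 },
    { DEF_STORE_OTHER,  1, 2 },
    { DEF_MERGE,        2, 2 },
    { DEF_STORE_OTHER,  3, 4 },
};

/* Map a state to its value-numbered representative. */
int toy_valueize(int state)
{
    return states[state].value;
}

/* Walk backwards from 'vuse' over definitions that do not touch the
   location of interest. Returns the state whose definition stores to
   it, or -1 if the walk gives up (merge point or function entry).
   'valueize' may be NULL, in which case states are used as-is. */
int toy_walk(int vuse, int (*valueize)(int))
{
    for (;;) {
        if (valueize != NULL)
            vuse = valueize(vuse); /* valueize before the def lookup */

        switch (states[vuse].kind) {
        case DEF_STORE_SAME:
            return vuse;
        case DEF_STORE_OTHER:
            vuse = states[vuse].prev;
            break;
        case DEF_MERGE:
        case DEF_ENTRY:
            return -1;
        }
    }
}
```

In this toy, toy_walk(4, NULL) stops at the merge and returns -1, while toy_walk(4, toy_valueize) steps through the merge's equivalent state and returns 1.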
GCC 4.8.4 has been released.
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
GCC 4.9.3 has been released.
Fixed in 5.0, not backporting as this is too invasive.