Bug 64193

Summary: [4.9 Regression] Decreased performance after r173250
Product: gcc
Reporter: Daniel Cederman <cederman>
Component: tree-optimization
Assignee: Richard Biener <rguenth>
Status: RESOLVED FIXED
Severity: normal
CC: ebotcazou, joel.sherrill, sebastian.huber
Priority: P2
Keywords: missed-optimization
Version: 4.9.2
Target Milestone: 5.0
Host:
Target: i686-build_pc-linux-gnu
Build:
Known to work: 5.0
Known to fail: 4.8.3, 4.9.2
Last reconfirmed: 2014-12-08 00:00:00
Bug Depends on: 64312
Bug Blocks:
Attachments:
    preprocessed source
    assembly output with r173250
    assembly output without r173250
    patch

Description Daniel Cederman 2014-12-05 12:29:18 UTC
Created attachment 34196
preprocessed source

When comparing version 4.4.2 with version 4.9.2 I noticed a performance decrease for the Whetstone benchmark. It seems to have been introduced with r173250. Compiling Whetstone with the latest master (r217599) I get the following numbers:

Without r173250:
Loops: 5000000, Iterations: 1, Duration: 54 sec.
C Converted Double Precision Whetstones: 9259.3 MIPS

With r173250:
Loops: 5000000, Iterations: 1, Duration: 58 sec.
C Converted Double Precision Whetstones: 8620.7 MIPS
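
For scale, the two runs above differ by about 7.4% in duration (58 s versus 54 s) and by about 6.9% in the reported rating. The short C sketch below reproduces that arithmetic. The rating formula in whetstone_mips is an assumption inferred from the figures reported here (it reproduces both of them); it is not quoted from the benchmark source, and both function names are made up for this illustration.

```c
/* Rating in MIPS as it appears to be derived in the output above.
   ASSUMPTION: 100 * loops * iterations / seconds, divided by 1000.
   This is inferred from the reported numbers, not from the benchmark. */
double whetstone_mips(double loops, double iterations, double seconds)
{
    return 100.0 * loops * iterations / seconds / 1000.0;
}

/* Relative change from 'before' to 'after', in percent.
   Positive means 'after' is larger. */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Calling percent_change(54.0, 58.0) gives the increase in run time, and percent_change(9259.3, 8620.7) gives the (negative) change in the rating.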

The assembly output has also increased in size.

I have attached a preprocessed copy of the Whetstone benchmark and assembly output for i686-build_pc-linux-gnu compiled with -O3, with and without r173250.

Let me know if you need any more information.
Comment 1 Daniel Cederman 2014-12-05 12:30:20 UTC
Created attachment 34197
assembly output with r173250
Comment 2 Daniel Cederman 2014-12-05 12:31:05 UTC
Created attachment 34198
assembly output without r173250
Comment 3 Richard Biener 2014-12-08 13:40:01 UTC
I will have a look.  Note the rev. was backported to GCC 4.5 and up.
Comment 4 Richard Biener 2014-12-09 10:48:04 UTC
Created attachment 34229
patch

Ok, I see the regression introduced by that rev., but on trunk (r218479)
I get the same code generated with/without the attached fix.  I'm using
-O3 -m32 -march=i686 on x86_64-linux.

Probably the regression was mitigated by the partial fix for PR63677:

2014-11-20   Richard Biener  <rguenther@suse.de>

        PR tree-optimization/63677
        * tree-ssa-dom.c: Include gimplify.h for unshare_expr.
        (avail_exprs_stack): Make a vector of pairs.
        (struct hash_expr_elt): Replace stmt member with vop member.
        (expr_elt_hasher::equal): Simplify.
...

where DOM now catches the CSE opportunities FRE no longer did.
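
To make the term concrete: a CSE opportunity of this kind is a memory load whose value is already available from an earlier load of the same location, with no store in between that could have changed it. The function below is a hypothetical illustration only. It is not taken from the Whetstone source or from the testcase added for this bug, and it makes no claim about which pass removes the second load in any particular GCC version.

```c
/* Hypothetical example of a redundant load; all names are made up. */
double load_twice(const double *e1, int *counter)
{
    double first = e1[1];   /* first load of e1[1] */

    /* Store through an int pointer. With type-based alias analysis
       (-fstrict-aliasing, enabled by default at -O2 and above) the
       compiler may assume this store cannot modify a double. */
    *counter += 1;

    double second = e1[1];  /* same location, no aliasing store in
                               between: the value equals 'first' */
    return first + second;
}
```

A redundancy-elimination pass that sees both loads can reuse the first value for the second, so only one load of e1[1] remains in the generated code.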
Comment 5 Daniel Cederman 2014-12-09 13:12:18 UTC
> Probably the regression was mitigated by the partial fix for PR63677:

Yes, that seems to be the case for my attached example. Do you think that the regression is mitigated in general, or is your attached fix also needed?
Comment 6 rguenther@suse.de 2014-12-09 13:40:28 UTC
On Tue, 9 Dec 2014, cederman at gaisler dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64193
> 
> --- Comment #5 from Daniel Cederman <cederman at gaisler dot com> ---
> > Probably the regression was mitigated by the partial fix for PR63677:
> 
> Yes, that seems to be the case for my attached example. Do you think that the
> regression is mitigated in general, or is your attached fix also needed?

I am testing my fix and will install it nevertheless - it is required
to get full optimistic value-numbering again.  It would also be the
fix that is appropriate for backporting (if any of them is).
Comment 7 Dominique d'Humieres 2014-12-09 13:57:14 UTC
AFAICT, if there is any speedup with the patch in comment 4, it is negligible on a 2.8 GHz Core i7 with x86_64-apple-darwin14: ~7.1 s with or without the patch. Note that the timing in the test is not very precise.
Comment 8 rguenther@suse.de 2014-12-09 13:58:49 UTC
On Tue, 9 Dec 2014, dominiq at lps dot ens.fr wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64193
> 
> --- Comment #7 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> AFAICT if there is any speedup with the patch in comment 4, it is negligible on
> a 2.8 GHz Core i7 and x86_64-apple-darwin14: ~7.1s with/without the patch. Note
> that the timing in the test is not very precise.

As I said, the patch is more interesting on the 4.9 branch, where the DOM
improvements do not mask the regression (or with -fno-tree-dominator-opts).
Comment 9 Richard Biener 2014-12-09 14:25:34 UTC
Fixed on trunk so far.
Comment 10 Richard Biener 2014-12-09 14:25:42 UTC
Author: rguenth
Date: Tue Dec  9 14:25:09 2014
New Revision: 218515

URL: https://gcc.gnu.org/viewcvs?rev=218515&root=gcc&view=rev
Log:
2014-12-09  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/64193
	* tree-ssa-alias.c (walk_non_aliased_vuses): Add valueize parameter
	and valueize the VUSE before looking up the def stmt.
	* tree-ssa-alias.h (walk_non_aliased_vuses): Adjust prototype.
	* tree-ssa-sccvn.c (vn_reference_lookup_pieces): Pass vn_valueize
	to walk_non_aliased_vuses.
	(vn_reference_lookup): Likewise.
	* tree-ssa-dom.c (lookup_avail_expr): Pass NULL as valueize
	callback to walk_non_aliased_vuses.

	* gcc.dg/tree-ssa/ssa-fre-43.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-43.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-alias.c
    trunk/gcc/tree-ssa-alias.h
    trunk/gcc/tree-ssa-dom.c
    trunk/gcc/tree-ssa-sccvn.c
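
The ChangeLog entry above says that the walk over virtual uses now valueizes each VUSE before looking up its defining statement. The toy model below sketches why that can matter. It is a loose illustration under stated assumptions, not GCC's implementation: every type and function in it (toy_vdef, toy_walk_non_aliased, toy_example_valueize) is hypothetical, and virtual operands are reduced to small integers that index an array of stores.

```c
#include <stddef.h>

#define TOY_NO_DEF (-1)

/* Toy stand-in for a store that defines a virtual operand. */
struct toy_vdef {
    int clobbers;   /* nonzero if this store may alias the reference */
    int prev_vuse;  /* virtual operand the store uses, or TOY_NO_DEF */
};

typedef int (*toy_valueize_fn)(int name);

/* Walk backwards from 'vuse' over stores that do not clobber the
   reference. Returns the first clobbering definition, or TOY_NO_DEF
   if the walk reaches function entry. If 'valueize' is non-NULL, each
   name is first replaced by its value-number representative. */
int toy_walk_non_aliased(const struct toy_vdef *defs, int vuse,
                         toy_valueize_fn valueize)
{
    while (vuse != TOY_NO_DEF) {
        if (valueize != NULL)
            vuse = valueize(vuse);
        if (vuse == TOY_NO_DEF)
            break;
        if (defs[vuse].clobbers)
            return vuse;
        vuse = defs[vuse].prev_vuse;
    }
    return TOY_NO_DEF;
}

/* Example mapping: value numbering found that name 2 is equivalent to
   name 1 (for instance, because the store defining 2 is redundant). */
int toy_example_valueize(int name)
{
    return name == 2 ? 1 : name;
}
```

With the chain { {1, TOY_NO_DEF}, {0, 0}, {1, 1} }, a walk from name 2 without a callback stops at store 2 immediately. With toy_example_valueize it steps to name 1 first and continues back to store 0, so a lookup can reuse the value made available there. Passing NULL, as the ChangeLog describes for the DOM caller, keeps the walk on the unvalueized names.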
Comment 11 Jakub Jelinek 2014-12-19 13:34:47 UTC
GCC 4.8.4 has been released.
Comment 12 Richard Biener 2015-06-23 08:17:32 UTC
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
Comment 13 Jakub Jelinek 2015-06-26 19:58:42 UTC
GCC 4.9.3 has been released.
Comment 14 Richard Biener 2016-02-11 09:48:27 UTC
Fixed in 5.0, not backporting as this is too invasive.