Bug 103990 - [12 Regression] 541.leela_r slower by 4.5-6% with PGO+LTO -Ofast -march=native in the first week of January 2022
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 12.0
Importance: P3 normal
Target Milestone: 12.0
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks: spec
Reported: 2022-01-12 13:15 UTC by Martin Jambor
Modified: 2024-09-21 04:35 UTC
CC List: 3 users

See Also:
Host: x86_64-linux
Target: x86_64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-01-12 00:00:00


Description Martin Jambor 2022-01-12 13:15:35 UTC
LNT reports that 541.leela_r from the SPEC 2017 intrate suite regressed
when compiled with both PGO and LTO with -Ofast -march=native on all
machines in the first week of January:

zen3: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=477.397.0
zen2: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=286.397.0
zen1: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=17.397.0
kaby: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=16.397.0

On my zen2 desktop I have bisected the regression, or at least most of
it, to r12-6208-gebc853deb7cc04:

  ebc853deb7cc0487de9ef6e891a007ba853d1933 is the first bad commit
  commit ebc853deb7cc0487de9ef6e891a007ba853d1933
  Author: Richard Biener <rguenther@suse.de>
  Date:   Tue Jan 4 11:59:35 2022 +0100

    tree-optimization/103690 - not up-to-date SSA and PRE DCE

    This avoids running simple_dce_from_worklist on partially not up-to-date
    SSA form (in unreachable code regions) by scheduling CFG cleanup
    manually as is done anyway when tail-merging runs.

    2022-01-04  Richard Biener  <rguenther@suse.de>
            
            PR tree-optimization/103690
            * tree-pass.h (tail_merge_optimize): Adjust.
            * tree-ssa-tail-merge.c (tail_merge_optimize): Pass in whether
            to re-split critical edges, move CFG cleanup ...
            * tree-ssa-pre.c (pass_pre::execute): ... here, before
            simple_dce_from_worklist and delay freeing inserted_exprs from
            ...
            (fini_pre): .. here.
Comment 1 Richard Biener 2022-01-12 13:48:51 UTC
OK, so the only effect I can think of is that simple_dce_from_worklist can end up removing the last stmt in a BB and thus _eventually_ expose BB merging CFG cleanup opportunities.  I also notice that while tail_merge_optimize altered
todo by clearing TODO_cleanup_cfg, PRE just did (and still does)

-  todo |= tail_merge_optimize (todo);
+  todo |= tail_merge_optimize (todo, need_crit_edge_split);

so it would have retained TODO_cleanup_cfg, something we no longer do.  The
code is all somewhat of a mess due to the embedded tail-merge, and I tried
to make as few changes as possible this late in the cycle.

I'll try to reproduce and see if keeping TODO_cleanup_cfg around helps.
Comment 2 Richard Biener 2022-01-12 13:59:38 UTC
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index ab24fa98a1f..2bdfae5482f 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -4442,7 +4442,6 @@ pass_pre::execute (function *fun)
   if (todo & TODO_cleanup_cfg)
     {
       cleanup_tree_cfg ();
-      todo &= ~TODO_cleanup_cfg;
       need_crit_edge_split = true;
     }

should fix that
Comment 3 Martin Jambor 2022-01-12 14:30:06 UTC
(In reply to Richard Biener from comment #2)
> 
> should fix that

I can confirm that it does.
Comment 4 GCC Commits 2022-01-12 15:18:35 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:2f62294dec1f3af59dd7505c058b0af38c2d1524

commit r12-6527-g2f62294dec1f3af59dd7505c058b0af38c2d1524
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jan 12 15:25:07 2022 +0100

    tree-optimization/103990 - fix CFG cleanup regression from PRE change
    
    This adjusts the CFG cleanup flow back to what it was before the
    last change which fixes the observed regression of 541.leela_r with
    LTO and FDO.
    
    2022-01-12  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/103990
            * tree-pass.h (tail_merge_optimize): Drop unused argument.
            * tree-ssa-tail-merge.c (tail_merge_optimize): Likewise.
            * tree-ssa-pre.c (pass_pre::execute): Retain TODO_cleanup_cfg
            and adjust call to tail_merge_optimize.
Comment 5 Richard Biener 2022-01-12 15:18:43 UTC
Fixed.