Bug 107946 - [13/14/15 Regression] 507.cactuBSSN_r regresses by ~9% on znver3 with PGO since r13-3875-g9e11ceef165bc0
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 13.0
Importance: P2 normal
Target Milestone: 13.4
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: spec
Reported: 2022-12-01 13:40 UTC by Martin Liška
Modified: 2024-12-28 22:58 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-01-26 00:00:00


Description Martin Liška 2022-12-01 13:40:41 UTC
The revision r13-3875-g9e11ceef165bc0 was supposed to speed up the benchmark, but it makes it slower w/ -O2 -flto and PGO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=694.437.0

A similar regression can be seen w/o LTO as well:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=463.437.0
Comment 1 Richard Biener 2022-12-01 15:19:29 UTC
Nope, it wasn't supposed to speed up the benchmark, but it does (with -Ofast) cause the hot loop kernels to be unswitched.

Btw, do we know whether the train and ref data line up in these loops?

Btw, with -Ofast on znver2 I didn't observe any change when benchmarking this.

I'm trying to reproduce.

OK, so with -O2 -flto -march=znver2 and FDO I get a runtime of 173s while
adding -fno-unswitch-loops gets me 188s.  There's currently no knob to
specifically disable outer loop unswitching so I have to instead patch
that up.  With -O2 -flto -funswitch-loops (w/o FDO) I get 178s.  I'm going
to add a --param to allow easier reproduction.
Comment 2 Richard Biener 2022-12-01 15:45:57 UTC
So with --param max-unswitch-depth=1 and -O2 -flto -march=znver2 + FDO I get 176s which is slower than with unswitching outer loops.

This means I cannot reproduce the regression (at least not with this specific feature, aka this revision).
Comment 3 GCC Commits 2022-12-02 07:04:28 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:5b50850c3c6f2eceb8012dcc8d3cd5ddd94fac6c

commit r13-4458-g5b50850c3c6f2eceb8012dcc8d3cd5ddd94fac6c
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Dec 1 16:14:14 2022 +0100

    Add --param max-unswitch-depth
    
    The following adds a --param to limit the depth of unswitched loop
    nests.  One can use --param max-unswitch-depth=1 to disable unswitching
    of outer loops (the innermost loop will then be unswitched).
    
            PR tree-optimization/107946
            * params.opt (-param=max-unswitch-depth=): New.
            * doc/invoke.texi (--param=max-unswitch-depth): Document.
            * tree-ssa-loop-unswitch.cc (init_loop_unswitch_info): Honor
            --param=max-unswitch-depth
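
For readers unfamiliar with the transformation discussed above, here is a minimal hand-written sketch of what unswitching a loop nest at different depths looks like. It is purely illustrative: the function names and the `use_offset` flag are hypothetical, the code is not taken from 507.cactuBSSN_r, and whether GCC emits exactly these shapes for a given input is an assumption. The second function corresponds to unswitching the outer loop; the third corresponds to unswitching only the innermost loop, which is what the commit message describes for --param max-unswitch-depth=1.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: `use_offset` is invariant in both loops, yet it is
   tested on every inner iteration. */
long sum_original(const int *a, size_t rows, size_t cols, int use_offset)
{
    long sum = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++) {
            if (use_offset)
                sum += a[i * cols + j] + 1;
            else
                sum += a[i * cols + j];
        }
    return sum;
}

/* Hand-unswitched at the outer loop: the invariant test is hoisted out
   of the whole nest, and the nest is duplicated for each outcome. */
long sum_unswitched_outer(const int *a, size_t rows, size_t cols, int use_offset)
{
    long sum = 0;
    if (use_offset) {
        for (size_t i = 0; i < rows; i++)
            for (size_t j = 0; j < cols; j++)
                sum += a[i * cols + j] + 1;
    } else {
        for (size_t i = 0; i < rows; i++)
            for (size_t j = 0; j < cols; j++)
                sum += a[i * cols + j];
    }
    return sum;
}

/* Hand-unswitched at the innermost loop only: the test stays inside the
   outer loop body and runs once per row instead of once per element. */
long sum_unswitched_inner(const int *a, size_t rows, size_t cols, int use_offset)
{
    long sum = 0;
    for (size_t i = 0; i < rows; i++) {
        if (use_offset) {
            for (size_t j = 0; j < cols; j++)
                sum += a[i * cols + j] + 1;
        } else {
            for (size_t j = 0; j < cols; j++)
                sum += a[i * cols + j];
        }
    }
    return sum;
}

/* Sanity check: all three forms must compute the same result. */
void check_versions_agree(void)
{
    static const int a[6] = {1, 2, 3, 4, 5, 6};
    for (int flag = 0; flag <= 1; flag++) {
        long expected = sum_original(a, 2, 3, flag);
        assert(sum_unswitched_outer(a, 2, 3, flag) == expected);
        assert(sum_unswitched_inner(a, 2, 3, flag) == expected);
    }
}
```

The outer form removes the branch from the nest entirely at the cost of duplicating more code; the inner form duplicates less code but keeps one test per outer iteration. Which one wins presumably depends on the loop body and the profile, which would be consistent with the mixed timings reported in this bug.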
Comment 4 Richard Biener 2022-12-02 14:40:41 UTC
The data found for other machines/flags is also rather inconclusive with ups and downs.
Comment 5 Richard Biener 2023-04-26 06:57:19 UTC
GCC 13.1 is being released, retargeting bugs to GCC 13.2.
Comment 6 Richard Biener 2023-07-27 09:24:38 UTC
GCC 13.2 is being released, retargeting bugs to GCC 13.3.
Comment 7 Martin Jambor 2024-01-26 18:06:55 UTC
This regression is still there (as the graphs linked in the summary show).
Comment 8 Jakub Jelinek 2024-05-21 09:13:18 UTC
GCC 13.3 is being released, retargeting bugs to GCC 13.4.