Bug 110757 - [14/15 Regression] 7% parest regression on zen3 -Ofast -march=native -flto between g:4dbb3af1efe55174 (2023-07-14 00:54) and g:a5088dc3f5ef73c8 (2023-07-17 03:24)
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 14.0
Importance: P2 normal
Target Milestone: 14.3
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, needs-bisection
Depends on:
Blocks: spec
Reported: 2023-07-20 21:55 UTC by Jan Hubicka
Modified: 2024-12-28 22:29 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-07-22 00:00:00


Description Jan Hubicka 2023-07-20 21:55:47 UTC
It seems there are two commits producing this regression. A run in between them is d76d19c9bc5ef113 (2023-07-16 00:16).

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=475.457.0

There are two earlier jumps: between g:52577a301ef1b86d (2023-05-30 02:20) and g:d0c064c3eabc75cf (2023-05-31 16:46),
and between g:7ebd4a1d61993c0a (2023-04-28 07:23) and g:977a3be3ccbc7f17 (2023-05-01 13:40).

An 8% regression is also seen on the zen1 machine:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.457.0
Comment 1 Martin Jambor 2023-07-21 17:17:44 UTC
The first (2%) slowdown seems to be due to r14-2524-gaa6741ef2e0c31
(Turn TODO_rebuild_frequencies to a pass), I'm now bisecting the
bigger one.
Comment 2 Martin Jambor 2023-07-21 19:14:14 UTC
The second slow-down of 4.5% was caused by r14-2546-g061f74c06735e1:

061f74c06735e1fa35b910ae0bcf01b61a74ec23 is the first bad commit
commit 061f74c06735e1fa35b910ae0bcf01b61a74ec23
Author: Jan Hubicka <jh@suse.cz>
Date:   Sun Jul 16 23:56:59 2023 +0200

    Fix profile update in scale_profile_for_vect_loop

    When vectorizing 4 times, we sometimes do
      for
        <4x vectorized body>
      for
        <2x vectorized body>
      for
        <1x vectorized body>

    Here the second two fors handling the epilogue never iterate.
    Currently the vectorizer thinks that the middle for iterates twice.
    This turns out to be scale_profile_for_vect_loop that uses
    niter_for_unrolled_loop.
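
To make the loop structure in the commit message concrete, here is a minimal C sketch, not GCC code. The names `body_counts` and `count_bodies` are hypothetical, and the trip-count checks are an assumption about how such a loop nest is typically laid out. It counts how often each body runs when n scalar iterations are split into a 4x main loop followed by 2x and 1x epilogues. Under that assumption fewer than 4 iterations remain after the main loop, so each epilogue body runs at most once and its back edge is never taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only: how many times each loop
   body executes for n scalar iterations.  */
struct body_counts { size_t main4, epil2, epil1; };

static struct body_counts
count_bodies (size_t n)
{
  struct body_counts c = { 0, 0, 0 };
  size_t i = 0;

  for (; i + 4 <= n; i += 4)  /* <4x vectorized body> */
    c.main4++;
  for (; i + 2 <= n; i += 2)  /* <2x vectorized body> */
    c.epil2++;
  for (; i < n; i++)          /* <1x vectorized body> */
    c.epil1++;

  /* At most 3 iterations are left after the main loop, so neither
     epilogue loop ever takes its back edge.  */
  assert (c.epil2 <= 1 && c.epil1 <= 1);
  return c;
}
```

For example, `count_bodies (7)` runs each of the three bodies exactly once, so a profile estimate that assumes the middle loop iterates twice overstates its weight.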
Comment 3 Martin Jambor 2023-07-22 11:41:39 UTC
And while I am at it, the 2.5% slowdown in April was caused by Richi's
r14-332-g24905a4bd1375c (Adjust costing of emulated vectorized
gather/scatter) and the 2.8% regression in May was caused by
r14-1371-ge5405f065bace0 (Handle FMA friendly in reassoc pass).

Both are small and so may not warrant their own bug reports, but together
they make up almost 6% and we are now 13% slower than GCC 13 on zen 3
and 2 (on the Intel machine in LNT it is just 2.7% and I see no
regression on the AArch64 one).
Comment 4 Jan Hubicka 2023-07-26 07:15:27 UTC
Most of the profile based regression is gone between
g:1c6231c05bdccab3 (2023-07-21 03:06)
and 
g:f33fdf9e7c038639 (2023-07-23 00:17)

This should be:
commit a31ef26b056d0c4f0a9f08b6eb81456ea257298e
Author: Jan Hubicka <jh@suse.cz>
Date:   Fri Jul 21 19:38:26 2023 +0200

    Avoid scaling flat loop profiles of vectorized loops

Which "fixes" the overactive scaling in scale_profile_for_vect_loop
for static profiles.
I am still not sure why propagating the profile later causes a regression; I will take a look.
Comment 5 Richard Biener 2024-05-07 07:41:29 UTC
GCC 14.1 is being released, retargeting bugs to GCC 14.2.
Comment 6 Jakub Jelinek 2024-08-01 09:34:11 UTC
GCC 14.2 is being released, retargeting bugs to GCC 14.3.