Bug 104762 - [12 Regression] x86_64 538.imagick_r 8%-28% regressions and 10% 525.x264_r regressions after r12-7319-g90d693bdc9d718
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 12.0
Importance: P3 normal
Target Milestone: 12.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: spec
Reported: 2022-03-02 16:20 UTC by Martin Jambor
Modified: 2022-03-11 14:04 UTC

See Also:
Host: x86_64-linux
Target: x86_64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-03-07 00:00:00


Description Martin Jambor 2022-03-02 16:20:27 UTC
Around 22 February 2022, SPEC 2017 538.imagick_r regressed on all
x86_64 systems used by our periodic benchmarker.  I have bisected only
the zen3 -Ofast -march=native -flto case, to revision
r12-7319-g90d693bdc9d718, but I think most if not all of the
regressions are caused by it:

commit 90d693bdc9d71841f51d68826ffa5bd685d7f0bc
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Feb 18 14:32:14 2022 +0100

    target/99881 - x86 vector cost of CTOR from integer regs

    This uses the now passed SLP node to the vectorizer costing hook
    to adjust vector construction costs for the cost of moving an
    integer component from a GPR to a vector register when that's
    required for building a vector from components.  A crucial difference
    here is whether the component is loaded from memory or extracted
    from a vector register as in those cases no intermediate GPR is involved.

    The pr99881.c testcase can be Un-XFAILed with this patch, the
    pr91446.c testcase now produces scalar code which looks superior
    to me so I've adjusted it as well.
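
To make the distinction in the commit message concrete, here is a minimal C sketch of the two situations it contrasts. The function names are hypothetical and this is not the pr99881.c or pr91446.c testcase; whether GCC actually SLP-vectorizes either function depends on the compiler version, flags and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only, not a testcase from the GCC testsuite. */

/* The lanes x and y arrive in integer registers.  If the two stores are
   SLP-vectorized, the vector {x, y} has to be built from GPRs, which
   is the case the commit makes more expensive. */
void store_from_gprs(int32_t *out, int32_t x, int32_t y)
{
    assert(out != NULL);
    out[0] = x + 1;
    out[1] = y + 2;
}

/* The lanes are loaded from non-adjacent memory locations.  A vector
   built from them can be filled from memory, so no intermediate GPR
   is involved. */
void store_from_memory(int32_t *out, const int32_t *in)
{
    assert(out != NULL && in != NULL);
    out[0] = in[4] + 1;
    out[1] = in[0] + 2;
}
```

Both functions perform the same arithmetic per lane; they differ only in where the vector components would come from, which is the property the costing hook looks at according to the commit message.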


List of (selected) regressions with links to LNT graphs:

zen2 -O2 regressed by 26%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=297.507.0

zen2 -O2 -flto regressed by 26% too:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=296.507.0

zen2 -Ofast -march=native by 28%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.507.0

zen2 -Ofast -march=native -flto by 23%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.507.0


zen3 -O2 by 8%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=470.507.0

zen3 -O2 -march=native by 18%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=472.507.0

zen3 -Ofast -march=native by 17%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.507.0

zen3 -Ofast -march=native -flto by 10%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=475.507.0


kabylake -O2 by 9%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=226.507.0
(though this one looks suspiciously noisy)

kabylake -O2 -march=native by 16%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=28.507.0

kabylake -Ofast -march=native by 22%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=32.507.0

kabylake -Ofast -march=native -flto by 15%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=11.507.0
Comment 1 Martin Jambor 2022-03-02 17:02:08 UTC
The same revision also regressed SPEC 2017 INTrate benchmark x264_r by
about 10% at -Ofast -march=native with and without LTO on zen3:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.377.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=475.377.0

and I suspect (unlike the above, I have not actually verified it) also
on zen2:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.377.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.377.0
Comment 2 Hongtao.liu 2022-03-03 01:20:45 UTC
For 525.x264_r, it's PR101929; for 538.imagick_r, it's an STF (store forwarding) issue.
Comment 3 Richard Biener 2022-03-07 07:20:10 UTC
See PR101929 comment#7 for two possible things we can do at this stage that might "fix" the regression.
Comment 4 GCC Commits 2022-03-11 14:03:21 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:69619acd8d9b5856f5af6e5323d9c7c4ec9ad08f

commit r12-7612-g69619acd8d9b5856f5af6e5323d9c7c4ec9ad08f
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Mar 11 11:51:13 2022 +0100

    target/104762 - vectorization costs of CONSTRUCTORs
    
    After accounting for GPR -> XMM move cost for vec_construct the
    base cost needs adjustments to not double-cost those.  This also
    lowers the cost when such move is not necessary.
    
    2022-03-11  Richard Biener  <rguenther@suse.de>
    
            PR target/104762
            * config/i386/i386.cc (ix86_builtin_vectorization_cost): Do not
            cost the first lane of SSE pieces as inserts for vec_construct.
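
The double-costing described in this commit message can be illustrated with a toy cost model. This is a sketch under stated assumptions, not the code in config/i386/i386.cc: the struct, the function names and the unit costs are hypothetical, the "all lanes" baseline exists only for comparison and is not claimed to be the previous GCC behaviour, and the cost of combining 128-bit pieces into a wider vector is ignored.

```c
#include <assert.h>

/* Hypothetical unit costs; GCC takes its real values from per-CPU
   cost tables, which are not reproduced here. */
struct toy_costs {
    int insert;      /* one element insert into an SSE register */
    int gpr_to_xmm;  /* one GPR -> XMM move */
};

/* Comparison baseline (assumption): every lane is charged as an insert,
   and lanes coming from integer registers also pay the GPR -> XMM move. */
int ctor_cost_all_lanes(struct toy_costs c, int lanes, int lanes_from_gpr)
{
    assert(lanes > 0 && lanes_from_gpr >= 0 && lanes_from_gpr <= lanes);
    return lanes * c.insert + lanes_from_gpr * c.gpr_to_xmm;
}

/* Adjusted model: the first lane of each 128-bit (SSE) piece is not
   charged as an insert, following the ChangeLog wording. */
int ctor_cost_adjusted(struct toy_costs c, int lanes, int vector_bits,
                       int lanes_from_gpr)
{
    assert(lanes > 0 && lanes_from_gpr >= 0 && lanes_from_gpr <= lanes);
    int pieces = vector_bits > 128 ? vector_bits / 128 : 1;
    int inserts = lanes > pieces ? lanes - pieces : 0;
    return inserts * c.insert + lanes_from_gpr * c.gpr_to_xmm;
}
```

With the made-up costs insert = 1 and gpr_to_xmm = 2, a four-lane 128-bit vector built entirely from GPRs costs 12 in the baseline and 11 in the adjusted model, while the same vector built from memory costs 3 in the adjusted model. That matches the two effects the commit message names: no double charge for lanes that already pay for a move, and a lower cost when no move is needed.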
Comment 5 Richard Biener 2022-03-11 14:04:08 UTC
Fixed.