[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
wilco at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Fri Nov 4 17:26:26 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
Wilco <wilco at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Last reconfirmed| |2022-11-04
Status|UNCONFIRMED |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |wilco at gcc dot gnu.org
--- Comment #10 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Rama Malladi from comment #9)
> (In reply to Rama Malladi from comment #8)
> > (In reply to Wilco from comment #7)
> > > The revert results in about 0.5% loss on Neoverse N1, so it looks like the
> > > reassociation pass is still splitting FMAs into separate MUL and ADD (which
> > > is bad for narrow cores).
> >
> > Thank you for checking on N1. Did you happen to check on V1 too to reproduce
> > the perf results I had? Any other experiments/ tests I can do to help on
> > this filing? Thanks again for the debug/ fix.
>
> I ran SPEC cpu2017 fprate 1-copy benchmark built with the patch reverted and
> using option 'neoverse-n1' on the Graviton 3 processor (which has support
> for SVE). The performance was up by 0.4%, primary contributor being
> 519.lbm_r which was up 13%.
I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1. I'll
post a patch that allows per-CPU settings for FMA reassociation, so you'll get
good performance with -mcpu=native. However reassociation really needs to be
taught about the existence of FMAs.
More information about the Gcc-bugs
mailing list