[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

wilco at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Fri Nov 4 17:26:26 GMT 2022


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-11-04
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |wilco at gcc dot gnu.org

--- Comment #10 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Rama Malladi from comment #9)
> (In reply to Rama Malladi from comment #8)
> > (In reply to Wilco from comment #7)
> > > The revert results in about 0.5% loss on Neoverse N1, so it looks like the
> > > reassociation pass is still splitting FMAs into separate MUL and ADD (which
> > > is bad for narrow cores).
> > 
> > Thank you for checking on N1. Did you happen to check on V1 too to reproduce
> > the perf results I had? Any other experiments/ tests I can do to help on
> > this filing? Thanks again for the debug/ fix.
> 
> I ran SPEC cpu2017 fprate 1-copy benchmark built with the patch reverted and
> using option 'neoverse-n1' on the Graviton 3 processor (which has support
> for SVE). The performance was up by 0.4%, primary contributor being
> 519.lbm_r which was up 13%.

I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1. I'll
post a patch that allows per-CPU settings for FMA reassociation, so you'll get
good performance with -mcpu=native. However reassociation really needs to be
taught about the existence of FMAs.


More information about the Gcc-bugs mailing list