[Bug tree-optimization/84114] global reassociation pass prevents fma usage, generates slower code

wdijkstr at arm dot com gcc-bugzilla@gcc.gnu.org
Sat Feb 10 18:38:00 GMT 2018


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114

--- Comment #3 from Wilco <wdijkstr at arm dot com> ---
(In reply to Richard Biener from comment #1)
> This is probably related to targetm.sched.reassociation_width where reassoc
> will widen a PLUS chain so several instructions will be executable in
> parallel without dependences.  Thus, (x + (y + (z + w))) -> (x + y) + (z + w).
> When all of them are fed by multiplications this goes from four fmas to two.
> 
> It's basically a target request we honor so it works as designed.
> 
> At some point I thought about integrating FMA detection with reassociation.

It should indeed understand FMA: A*B + p[0] + C*D + p[1] + E*F + p[2] can
become (((p[0] + p[1] + p[2]) + A*B) + C*D) + E*F, so that each product is
added into the running sum and can be fused into a single FMA.
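
As a rough illustration of the two shapes, here is a minimal C sketch. The
function names are made up for this example and the code is not taken from
the PR's testcase. The second function spells out the reassociated form by
hand; each "acc + x * y" statement has the shape a compiler may contract into
one fused multiply-add when contraction is allowed (for example with
-ffp-contract=fast on a target that has FMA instructions).

```c
/* Hypothetical helper names, for illustration only. */

/* Source order: A*B + p[0] + C*D + p[1] + E*F + p[2]. */
double sum_source_order(double a, double b, double c, double d,
                        double e, double f, const double p[3])
{
    return a * b + p[0] + c * d + p[1] + e * f + p[2];
}

/* Reassociated: (((p[0] + p[1] + p[2]) + A*B) + C*D) + E*F.
   The plain additions are grouped first, then every product is
   accumulated into the running sum, one multiply-add per product. */
double sum_fma_chain(double a, double b, double c, double d,
                     double e, double f, const double p[3])
{
    double acc = p[0] + p[1] + p[2];
    acc = acc + a * b;   /* candidate for one FMA */
    acc = acc + c * d;   /* candidate for one FMA */
    acc = acc + e * f;   /* candidate for one FMA */
    return acc;
}
```

Both functions compute the same sum mathematically; in floating point the
results can differ in rounding, which is why this kind of reassociation is
only done when the user has allowed it.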

Also we're missing a reassociation depth parameter. You need to be able to
specify how long a chain must be before it is worth splitting. The example
shows that a chain of 5 FMAs is not worth splitting, since FMA latency on
modern cores is low, but if these were integer operations (not MADD) then the
chain should be split.
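
To make the idea concrete, a depth parameter could act as a simple gate in
front of the existing width. The sketch below is purely hypothetical: the
struct, the field names and the function are invented here and do not
correspond to any existing GCC hook or --param, and any threshold value is
just a placeholder.

```c
/* Hypothetical policy, not an existing GCC interface. */
struct reassoc_policy {
    int width;          /* how many parallel chains the target asks for */
    int min_chain_len;  /* shortest chain considered worth splitting */
};

/* Return the width to use for a chain of chain_len operations:
   short chains stay serial (width 1), longer ones use the target width. */
int effective_width(const struct reassoc_policy *policy, int chain_len)
{
    if (chain_len < policy->min_chain_len)
        return 1;
    return policy->width;
}
```

With a policy of width 2 and a minimum length of, say, 8 for FMA chains, the
5-FMA chain from the example would be left alone, while a target could pick a
much smaller minimum for integer add chains.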

