This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] [Patch X86_64]: Pass to split FMA to MUL and ADD


> On Tue, 7 Nov 2017, Kumar, Venkataramanan wrote:
> 
> >>>The attached patch implements an RTL pass which splits generated FMA
> >>>instruction into MUL/ADD sequence.
> >>
> >>That seems wrong if the user explicitly asked for FMA in his program, unless
> >>you have a way to recognize which FMA instructions come from user calls to
> >>fma and which were invented by gcc. Why not disable the gimple
> >>transformation that creates FMA instead ?
> >We split only for reduction pattern and not all FMAs.
> >By user calls do you mean FMA in inline ASM calls? We don't split in that case.
> 
> I mean calls to the C function 'fma', or any of the intrinsics (say from
> fmaintrin.h).
> 
> >>That seems wrong if the user explicitly asked for FMA in his program
> >Do you mean using function attribute or command line option?
> 
> I mean by calling the standard function 'fma'. It has precision
> requirements that may be needed for program correctness.
> 
> >Doing in Gimple would be more generic.
> >This implementation is profitable only for few sub-targets of x86 where latency of floating point ADD is less than that of FMA (ex Zen).
> 
> The gimple pass already checks if there exists a native fma instruction on
> the subtarget, it could more specifically ask if that instruction is faster
> than add+mul (if optimizing for speed, or shorter for size) (related to
> FP_FAST_FMA as well).

We have multiple existing transformations that optimize SSE builtins into different
instructions when doing so is a win (we run the full RTL optimization queue on them and
do the usual instruction combining, simplification and splitting). So I would say that
we are OK changing the builtins into different instructions. After all, there are
asm statements if one really wants a precise instruction choice.

With FMA, however, the situation is different because there are rounding differences.
Why can we convert multiplication+add into FMA without -ffast-math in the first place?

An alternative would be to prevent the conversion in tree-ssa-mathops? (I.e. matching
the accumulation pattern and having some target hook specify whether this is a good
idea?)

This looks like a useful optimization in general - I was just looking into a similar
loop in swim from SPEC2k.

Honza

