This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



RE: [RFC] [Patch X86_64]: Pass to split FMA to MUL and ADD


On Tue, 7 Nov 2017, Kumar, Venkataramanan wrote:

>>> The attached patch implements an RTL pass which splits generated FMA
>>> instructions into a MUL/ADD sequence.
>>
>> That seems wrong if the user explicitly asked for FMA in his program, unless
>> you have a way to recognize which FMA instructions come from user calls to
>> fma and which were invented by gcc. Why not disable the gimple
>> transformation that creates FMA instead?
>
> We split only for reduction patterns, not all FMAs.
> By user calls do you mean FMA in inline ASM calls? We don't split in that case.

I mean calls to the C function 'fma', or any of the intrinsics (say from
fmaintrin.h).

>> That seems wrong if the user explicitly asked for FMA in his program
>
> Do you mean using a function attribute or a command-line option?

I mean by calling the standard function 'fma'. It has precision
requirements that may be needed for program correctness.

> Doing it in Gimple would be more generic.
> This implementation is profitable only for a few x86 sub-targets where the
> latency of floating-point ADD is lower than that of FMA (e.g. Zen).
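
A reduction of the kind presumably meant here looks like the sketch below.
The shape of the loop and the function name dot are assumptions for
illustration, not taken from the patch.

```c
#include <stddef.h>

/* Dot-product style reduction: acc is carried from one iteration to the
   next, so it forms the loop's dependency chain. */
double dot(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* If this is emitted as one FMA, acc is an input to the fused
           operation and each iteration waits a full FMA latency.
           If it is emitted as MUL then ADD, the multiply does not depend
           on acc, so only the ADD sits on the loop-carried chain. */
        acc += x[i] * y[i];
    }
    return acc;
}
```

Whether the statement in the loop becomes an FMA depends on the target
flags and on the floating-point contraction setting in effect.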

The gimple pass already checks whether a native fma instruction exists on
the subtarget. It could ask more specifically whether that instruction is
faster than add+mul when optimizing for speed, or shorter when optimizing
for size (this is related to FP_FAST_FMA as well).
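
For reference, FP_FAST_FMA is the C99 <math.h> macro that an implementation
defines when fma() generally executes about as fast as a multiply followed
by an add. A user-level sketch of the same kind of decision, with the
hypothetical helper name mul_add, could look like this:

```c
#include <math.h>

/* Compute a*b + c, using fma() only when the implementation advertises
   it as fast. Note that the two branches are not numerically identical:
   fma() rounds once, the plain expression may round twice. */
double mul_add(double a, double b, double c)
{
#ifdef FP_FAST_FMA
    return fma(a, b, c);
#else
    return a * b + c;
#endif
}
```

The macro is a yes/no hint for the whole implementation, which is coarser
than the per-subtarget latency question raised above.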

--
Marc Glisse

