[PATCH] RFC for patch to add C99 fma/fmaf/fmal builtins

Mon Oct 4 23:32:00 GMT 2010

On Mon, 4 Oct 2010, Michael Meissner wrote:

> I happen to reread parts of the C99 standard recently, and noticed the FMA
> builtins.  The current compiler defines fma{,f,l} builtins but does not provide
> any code to expand the builtin.

Indeed, such code would be useful.  Properly done - providing RTL (and 
ideally GENERIC/GIMPLE) operations for the fused operations - it would 
also pave the way for fixing a bug: on some targets, operations of the 
form a*b+c get converted to fused operations regardless of whether they 
were written like that in a source language expression.  The fix is to 
write the RTL descriptions to describe the proper semantics of the 
affected instructions, rather than pretending that they are proper 
implementations of a*b+c.  Other parts of the compiler would then deal 
with converting a*b+c to a fused operation where appropriate, rather than 
the back ends doing so.

There have been various past discussions of this.  Some people in the 
thread <http://gcc.gnu.org/ml/gcc-patches/2010-06/subjects.html#02256> 
mentioned a PAREN_EXPR approach for representing what forms of 
contracting were permitted.  I'm afraid I didn't fully understand how it 
would work, but it was supposed to allow contracting (in this context, 
converting a*b+c to a fused operation if the target has one and the 
language permits) to be done on GIMPLE, rather than needing to be done in 
the front end in order to have the required information about source 
language expressions.

I would imagine a tristate option for contracting, with the values "off", 
"on" (conforming) and "fast" (the present state, contracting even outside 
the bounds of source language expressions).
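
As a minimal sketch of the distinction between "on" and "fast", the two C 
functions below compute the same value in different source forms.  The 
function names are hypothetical and purely illustrative.  The comments 
describe what each proposed mode would be allowed to do, not what any 
particular compiler release does.

```c
/* Sketch only: both function names are hypothetical, for illustration.  */

/* The multiply and the add sit in one source language expression, so a
   conforming ("on") mode may evaluate them as a single fused operation
   where the target has one.  */
double contract_within_expr(double a, double b, double c)
{
    return a * b + c;
}

/* The product is first assigned to t, which makes it a separately rounded
   value.  Only a "fast" mode, contracting outside the bounds of source
   language expressions, would fuse across these two statements.  */
double contract_across_stmts(double a, double b, double c)
{
    double t = a * b;
    return t + c;
}
```

With "off", neither function would be contracted.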

As indicated in <http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01280.html>, 
I believe that once you've added the fma RTL operation it should be usable 
to describe all the versions with negated operands or results.  My only 
real comment about your present patch is that given the correct RTL it 
ought to be easy to extend it to cover the full range of related 
operations on Power Architecture processors (so that fma (a, b, -c) etc. 
get appropriately expanded), even without doing any of the other pieces.

> The enclosed patch adds the basic FMA support to the compiler, and adds the
> support to the powerpc backend.  In doing a quick grep of the MD files, it
> looks like the following ports may support a combined multiply/add operation
> for floating point types:
>       arm

ARM has both fused and non-fused (i.e. a single instruction that does 
a*b+c with two roundings) operations.  The fused operations are new to 
VFPv4 (Cortex-A5 and Cortex-A15; not in Cortex-A8, or in Cortex-A9 which 
added half-precision operations) and GCC only supports the non-fused 
operations at present (unsurprising, given the lack of relevant expander 
support before your patch and the presence of the non-fused operations 
which are accurately described by the present RTL).

> My question is what should we do if the port has no FMA instruction?
> 
>    1)	Always call fma external (current behavior)

This is logically correct.

>    2)	Expand to (operand[0] * operand[1]) + operand[2] always

Obviously wrong.  Yes, glibc's default fma version is useless, since the 
whole point of fma is to get a fused operation even if it is slow.

>    3)	Expand to (operand[0] * operand[1]) + operand[2] if -ffast-math

Also wrong in my view, given the purpose of fma, but I know there's been 
controversy in the past about the similar question of what is sensible to 
do with isnan in the presence of -ffast-math (which is supposed to mean 
no NaNs, but why would someone use isnan then?).

> A second issue is should we provide macros that say the port has an appropriate
> FMA instruction.  The C99 standard says that the following macros should be
> defined in math.h if a fast fma implementation is provided:
> 
> 	FP_FAST_FMA
> 	FP_FAST_FMAF
> 	FP_FAST_FMAL
> 
> I would imagine that we should provide __FAST_FMA__, __FAST_FMAF__, and
> __FAST_FMAL__ macros that the library math.h file could test and define
> FP_FAST_FMA* if desired.  Do people have an opinion on this?

Talk to the glibc maintainers about what they want.  There's the usual 
controversy about whether it's GCC's or glibc's responsibility to provide 
particular information.  Unfortunately it may be hard to get an answer out 
of them.  glibc bug 10110 deals with a similar issue of interaction 
between GCC and glibc in providing certain information, but I couldn't get 
a useful response from the glibc maintainers regarding whether a small 
patch to put __STDC_ISO_10646__ in its own header would be suitable or 
whether a mechanism based on using fixincludes would be needed.
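
For concreteness, here is a hedged sketch of how the proposed split might 
look if the compiler provided __FAST_FMA__ and the library turned it into 
the C99 macro.  The __FAST_FMA__ name is the one proposed in the quoted 
message.  Neither that name nor this division of labour is settled, and 
scaled_add is a hypothetical user function.

```c
#include <math.h>

/* Library side (assumption): map the proposed compiler macro onto the
   C99 macro, unless math.h has already defined it.  */
#if defined(__FAST_FMA__) && !defined(FP_FAST_FMA)
# define FP_FAST_FMA 1
#endif

/* User side: use fma only where it is advertised as fast, which is the
   purpose C99 gives for FP_FAST_FMA.  */
double scaled_add(double a, double x, double y)
{
#ifdef FP_FAST_FMA
    return fma(a, x, y);   /* fused, and advertised as fast */
#else
    return a * x + y;      /* no fast fma advertised */
#endif
}
```

The float and long double variants would follow the same pattern with 
FP_FAST_FMAF and FP_FAST_FMAL.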

-- 
Joseph S. Myers
joseph@codesourcery.com