How to get a vector FMA with GCC in a portable way?

Vincent Lefevre
Wed Jan 16 10:14:00 GMT 2019

On 2019-01-16 12:26:51 +0300, Alexander Monakov wrote:
> On Tue, 15 Jan 2019, Vincent Lefevre wrote:
> > I would like to know how to get a vector FMA with GCC in a portable
> > way.
> > 
> > By "portable way", I mean that the behavior must not depend on the
> > compilation options (e.g., if FP contraction is disabled, I still
> > want a true FMA) and that the code must not depend on the architecture
> > (thus intrinsics should not be used... even when restricting to x86,
> > one reason is FMA3 vs FMA4 issues).
> > 
> > For instance, for addition, one can write "a + b". But for FMA?
> In the context of autovectorized code or when using generic vector types?

It could be either (or both, see below). But it appears that I need
to use vector types due to ABI issues:

(inlining might improve things, but I prefer to avoid ABI issues in
every case).

But if I use fma() (from either <math.h> or <tgmath.h>), it must be
applied to scalars, which means that autovectorized code must also
work with vector types decomposed into their elements. Unfortunately,
while this works with structures (which are affected by ABI issues),
it doesn't work with vectors: on x86_64, I get two vfmadd132sd
instructions (plus unpack instructions) instead of a single vfmadd132pd!

I've just reported the following bug:

> When the source is supposed to be autovectorized and operates on scalar
> variables, using fma function works (GCC recognizes it as a builtin;
> __FP_FAST_FMA is predefined when the fma instruction is available).
> For generic vector types I'm afraid GCC does not provide such a facility.
> I think it would make a reasonable feature request.

I've just done it here:

Vincent Lefèvre
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

More information about the Gcc-help mailing list