This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

RE: [RFC] [Patch X86_64]: Pass to split FMA to MUL and ADD

From: Marc Glisse <marc dot glisse at inria dot fr>
To: "Kumar, Venkataramanan" <Venkataramanan dot Kumar at amd dot com>
Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "Dharmakan, Rohit arul raj" <Rohitarulraj dot Dharmakan at amd dot com>, "Jan Hubicka (hubicka at ucw dot cz)" <hubicka at ucw dot cz>, Uros Bizjak <ubizjak at gmail dot com>
Date: Tue, 7 Nov 2017 09:19:11 +0100 (CET)
Subject: RE: [RFC] [Patch X86_64]: Pass to split FMA to MUL and ADD
Authentication-results: sourceware.org; auth=none
References: <CY4PR12MB173625509B98E82B0133B39C8F510@CY4PR12MB1736.namprd12.prod.outlook.com> <alpine.DEB.2.20.1711070818330.6994@stedding.saclay.inria.fr> <CY4PR12MB1736558B3EE4D036642321758F510@CY4PR12MB1736.namprd12.prod.outlook.com>
Reply-to: gcc-patches at gcc dot gnu dot org

On Tue, 7 Nov 2017, Kumar, Venkataramanan wrote:

>>> The attached patch implements an RTL pass which splits generated FMA
>>> instruction into MUL/ADD sequence.


>> That seems wrong if the user explicitly asked for FMA in his program, unless
>> you have a way to recognize which FMA instructions come from user calls to
>> fma and which were invented by gcc. Why not disable the gimple
>> transformation that creates FMA instead?

> We split only for reduction pattern and not all FMAs.
> By user calls do you mean FMA in inline ASM calls? We don't split in that case.


I mean calls to the C function 'fma', or any of the intrinsics (say from
fmaintrin.h).

>> That seems wrong if the user explicitly asked for FMA in his program
>
> Do you mean using function attribute or command line option?


I mean by calling the standard function 'fma'. It has precision
requirements that may be needed for program correctness.

> Doing it in GIMPLE would be more generic.
> This implementation is profitable only for a few sub-targets of x86 where
> the latency of floating point ADD is less than that of FMA (e.g. Zen).

The gimple pass already checks whether a native fma instruction exists on
the subtarget; it could more specifically ask whether that instruction is
faster than add+mul (when optimizing for speed, or shorter when optimizing
for size). This is related to FP_FAST_FMA as well.
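
For readers unfamiliar with FP_FAST_FMA: it is the C99 macro that <math.h>
defines when fma on double operands generally runs about as fast as, or
faster than, a separate multiply and add. The sketch below shows how user
code might key on it; it is only an illustration of the source-level analogue
of the check discussed here, not how the gimple pass is implemented. The
helper names `mul_add` and `dot` are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: use fma() only where the implementation
   advertises it as fast, otherwise fall back to multiply then add.
   Note the two branches may round differently. */
static double mul_add(double a, double b, double c)
{
#ifdef FP_FAST_FMA
    return fma(a, b, c);
#else
    return a * b + c;
#endif
}

/* A reduction (dot product): each iteration depends on the previous
   accumulator value, so the latency of the accumulate step is on the
   critical path of the loop. */
static double dot(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = mul_add(x[i], y[i], acc);
    return acc;
}
```

In the fallback branch the compiler may still contract `a * b + c` into an
FMA depending on the floating-point contraction setting, so this sketch
expresses a preference rather than a guarantee.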


--
Marc Glisse

Follow-Ups:
- Re: [RFC] [Patch X86_64]: Pass to split FMA to MUL and ADD
  - From: Jan Hubicka

References:
- [RFC] [Patch X86_64]: Pass to split FMA to MUL and ADD
  - From: Kumar, Venkataramanan
- Re: [RFC] [Patch X86_64]: Pass to split FMA to MUL and ADD
  - From: Marc Glisse
- RE: [RFC] [Patch X86_64]: Pass to split FMA to MUL and ADD
  - From: Kumar, Venkataramanan
