This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH][AArch64] PR84114: Avoid reassociating FMA
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: James Greenhalgh <james dot greenhalgh at arm dot com>
- Cc: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>, "sje at gcc dot gnu dot org" <sje at gcc dot gnu dot org>
- Date: Tue, 27 Feb 2018 14:19:15 +0100
- Subject: Re: [PATCH][AArch64] PR84114: Avoid reassociating FMA
- References: <DB6PR0801MB20536FBEAD77DF5C8B8490FF83CD0@DB6PR0801MB2053.eurprd08.prod.outlook.com> <20180226222545.GA3086@arm.com>
On Mon, Feb 26, 2018 at 11:25 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Thu, Feb 22, 2018 at 11:38:03AM +0000, Wilco Dijkstra wrote:
>> As discussed in the PR, the reassociation phase runs before FMAs are formed
>> and so can significantly reduce FMA opportunities. Although reassociation
>> could be switched off, it helps in many cases, so a better alternative is to
>> only avoid reassociation of floating point additions. This fixes the testcase
>> and gives 1% speedup on SPECFP2017, fixing the performance regression.
>>
>> OK for commit?
>
> This is OK as a fairly safe fix for stage 4. We should fix reassociation
> properly in GCC 9.
On some targets, running two FMAs in parallel and merging them with one
non-FMA operation is faster than chaining three FMAs...
But yes, I have suggested elsewhere that FMA detection should (or at least
could) be integrated with reassociation.
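
To make the two shapes concrete, here is a rough sketch. It is not code from
the PR or the patch; the function names and the 3-term dot product are made
up for illustration. The plain multiply-add expressions are only candidates
for fusion, and whether they become FMAs depends on the target and on the
contraction settings in effect.

```c
/* Hypothetical illustration, not taken from PR84114.  */

/* Chained shape: each multiply-add consumes the previous result, so a
   compiler that fuses them ends up with three dependent FMAs.  */
double
dot3_chained (double acc, const double *a, const double *b)
{
  double t = acc + a[0] * b[0];   /* candidate FMA 1 */
  t = t + a[1] * b[1];            /* candidate FMA 2, waits for FMA 1 */
  t = t + a[2] * b[2];            /* candidate FMA 3, waits for FMA 2 */
  return t;
}

/* Split shape: two independent multiply-adds that can issue in parallel,
   merged at the end by one plain (non-FMA) addition.  Note that the
   second chain needs a separate multiply to seed its FMA.  */
double
dot3_split (double acc, const double *a, const double *b)
{
  double lo = acc + a[0] * b[0];          /* candidate FMA, chain A */
  double hi = a[1] * b[1] + a[2] * b[2];  /* multiply + candidate FMA, chain B */
  return lo + hi;                         /* plain add merging the chains */
}
```

The chained form has the fewest operations but the longest dependency chain;
the split form trades an extra multiply and add for a shorter critical path.
Which one wins depends on the FMA latency and issue width of the target.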
Richard.
> Thanks,
> James
>
>>
>> ChangeLog:
>> 2018-02-23 Wilco Dijkstra <wdijkstr@arm.com>
>>
>> PR tree-optimization/84114
>> * config/aarch64/aarch64.c (aarch64_reassociation_width):
>> Avoid reassociation of FLOAT_MODE addition.
>> --
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index b3d5fde171920e5759046a4bd61cfcf9eb78d7dd..5f9541cf700aaf18c1f1ac73054614e2932781e4 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -1109,15 +1109,16 @@ aarch64_min_divisions_for_recip_mul (machine_mode mode)
>>    return aarch64_tune_params.min_div_recip_mul_df;
>>  }
>>
>> +/* Return the reassociation width of treeop OPC with mode MODE. */
>>  static int
>> -aarch64_reassociation_width (unsigned opc ATTRIBUTE_UNUSED,
>> -                             machine_mode mode)
>> +aarch64_reassociation_width (unsigned opc, machine_mode mode)
>>  {
>>    if (VECTOR_MODE_P (mode))
>>      return aarch64_tune_params.vec_reassoc_width;
>>    if (INTEGRAL_MODE_P (mode))
>>      return aarch64_tune_params.int_reassoc_width;
>> -  if (FLOAT_MODE_P (mode))
>> +  /* Avoid reassociating floating point addition so we emit more FMAs. */
>> +  if (FLOAT_MODE_P (mode) && opc != PLUS_EXPR)
>>      return aarch64_tune_params.fp_reassoc_width;
>>    return 1;
>>  }
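
For readers without a GCC tree to hand, the decision logic of the patched
hook can be modelled in isolation. This is only a sketch: the enums, the
struct and the width values below are hypothetical stand-ins for
machine_mode, the tree codes and aarch64_tune_params, chosen so the example
compiles on its own.

```c
/* Standalone model of the patched hook.  Everything here is a
   hypothetical stand-in for GCC internals, not the real definitions.  */

enum model_mode { MODEL_VECTOR, MODEL_INT, MODEL_FLOAT, MODEL_OTHER };
enum model_op { MODEL_PLUS, MODEL_MULT };

struct model_tune_params
{
  int vec_reassoc_width;
  int int_reassoc_width;
  int fp_reassoc_width;
};

/* Illustrative widths only; real tunings differ per CPU.  */
static const struct model_tune_params model_tune = { 1, 2, 4 };

/* Return the reassociation width for operation OPC in mode class MODE.
   Scalar floating-point addition falls through to width 1, i.e. it is
   left in source order so later FMA formation still sees a chain.  */
int
model_reassociation_width (enum model_op opc, enum model_mode mode)
{
  if (mode == MODEL_VECTOR)
    return model_tune.vec_reassoc_width;
  if (mode == MODEL_INT)
    return model_tune.int_reassoc_width;
  if (mode == MODEL_FLOAT && opc != MODEL_PLUS)
    return model_tune.fp_reassoc_width;
  return 1;
}
```

Because the vector check comes first, the new PLUS_EXPR test only affects
scalar floating-point addition; vector modes keep using vec_reassoc_width
regardless of the operation.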