This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Generate fused widening multiply-and-accumulate operations only when the widening multiply has single use
- From: Richard Henderson <rth at redhat dot com>
- To: Yufeng Zhang <Yufeng dot Zhang at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Cc: ams at codesourcery dot com
- Date: Wed, 23 Oct 2013 17:29:04 -0700
- Subject: Re: [PATCH] Generate fused widening multiply-and-accumulate operations only when the widening multiply has single use
- References: <5265A43E dot 7060507 at arm dot com>
On 10/21/2013 03:01 PM, Yufeng Zhang wrote:
> This patch changes the widening_mul pass to fuse the widening multiply with
> accumulate only when the multiply has single use. The widening_mul pass
> currently does the conversion regardless of the number of the uses, which can
> cause poor code-gen in cases like the following:
>
> typedef int ArrT [10][10];
>
> void
> foo (ArrT Arr, int Idx)
> {
>   Arr[Idx][Idx] = 1;
>   Arr[Idx + 10][Idx] = 2;
> }
>
> On AArch64, after widening_mul, the IR is like
>
>   _2 = (long unsigned int) Idx_1(D);
>   _3 = Idx_1(D) w* 40;                           <----
>   _5 = Arr_4(D) + _3;
>   *_5[Idx_1(D)] = 1;
>   _8 = WIDEN_MULT_PLUS_EXPR <Idx_1(D), 40, 400>; <----
>   _9 = Arr_4(D) + _8;
>   *_9[Idx_1(D)] = 2;
>
> Where the arrows point, there are redundant widening multiplies.
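
As a rough illustration of what the two marked statements compute, here is a minimal C model. The helper names (widen_mult, widen_mult_plus, row_offset, row_plus_10_offset) are hypothetical and are not GCC functions. The sketch assumes a 4-byte int and rows of 10 elements, which is what the constants 40 and 400 in the IR suggest, and it models the widened result as a signed 64-bit value for simplicity.

```c
#include <stdint.h>

/* Hypothetical stand-in for the GIMPLE "w*" operator: both 32-bit
   operands are widened to 64 bits before the multiply.  */
static int64_t widen_mult(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Hypothetical stand-in for WIDEN_MULT_PLUS_EXPR <a, b, c>: the widened
   product plus a 64-bit addend, expressed as one operation.  */
static int64_t widen_mult_plus(int32_t a, int32_t b, int64_t c)
{
    return (int64_t)a * (int64_t)b + c;
}

/* Byte offset of row Idx, as in "_3 = Idx_1(D) w* 40".  */
static int64_t row_offset(int32_t idx)
{
    return widen_mult(idx, 40);
}

/* Byte offset of row Idx + 10, as in
   "_8 = WIDEN_MULT_PLUS_EXPR <Idx_1(D), 40, 400>".  */
static int64_t row_plus_10_offset(int32_t idx)
{
    return widen_mult_plus(idx, 40, 400);
}
```

Both helpers multiply the same operands, so row_plus_10_offset(idx) always equals row_offset(idx) + 400; that repeated multiply is the redundancy the arrows mark.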
So they're redundant. Why does this imply poor code-gen?
If a target has more than one FMA unit, then the target might
be able to issue the computation for _3 and _8 in parallel.
Even if the target has only one FMA unit, as long as that unit is
pipelined, the two computations could overlap.
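
The dependency argument can be sketched in C as two shapes that produce the same pair of offsets. The function names are hypothetical, and the sketch assumes the same constants as the quoted IR. It only illustrates the data dependencies; a C compiler is free to transform either function into the other.

```c
#include <stdint.h>

/* Shape after fusing: a multiply and a multiply-add that do not depend
   on each other, so a pipelined or duplicated multiplier could overlap
   them.  */
static void offsets_independent(int32_t idx, int64_t *o1, int64_t *o2)
{
    *o1 = (int64_t)idx * 40;        /* corresponds to _3 */
    *o2 = (int64_t)idx * 40 + 400;  /* corresponds to _8 */
}

/* Shape with the multiply shared: one multiply, followed by an add
   that cannot start until the product is available.  */
static void offsets_shared(int32_t idx, int64_t *o1, int64_t *o2)
{
    int64_t prod = (int64_t)idx * 40;
    *o1 = prod;
    *o2 = prod + 400;  /* depends on prod */
}
```

The first shape issues two multiplies, but its critical path is a single multiply-accumulate. The second saves a multiply, but its second result sits behind a multiply and then an add. Which one is cheaper depends on the target's latencies and issue resources.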