This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Don't use integer "FMA" for shifts
On Tue, Jul 30, 2019 at 12:50 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Tue, Jul 30, 2019 at 12:04 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> tree-ssa-math-opts supports FMA optabs for integers as well as
> >> floating-point types, even though there's no distinction between
> >> fused and unfused there. It turns out to be pretty handy for the
> >> IFN_COND_* handling though, so I don't want to remove it, however
> >> weird it might seem.
> >>
> >> Instead this patch makes sure that we don't apply it to integer
> >> multiplications that are actually shifts (but that are represented
> >> in gimple as multiplications because that's the canonical form).
> >>
> >> This is a preemptive strike. The test doesn't fail for SVE as-is,
> >> but would after a later patch.
> >>
> >> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> >> OK to install?
> >>
> >> Richard
> >>
> >>
> >> 2019-07-30 Richard Sandiford <richard.sandiford@arm.com>
> >>
> >> gcc/
> >> * tree-ssa-math-opts.c (convert_mult_to_fma): Reject integer
> >> multiplications by a power of 2 or a negative power of 2.
> >>
> >> gcc/testsuite/
> >> * gcc.dg/vect/vect-cond-arith-8.c: New test.
> >>
> >> Index: gcc/tree-ssa-math-opts.c
> >> ===================================================================
> >> --- gcc/tree-ssa-math-opts.c 2019-07-30 10:51:51.827405171 +0100
> >> +++ gcc/tree-ssa-math-opts.c 2019-07-30 10:52:27.327139149 +0100
> >> @@ -3074,10 +3074,20 @@ convert_mult_to_fma (gimple *mul_stmt, t
> >> && flag_fp_contract_mode == FP_CONTRACT_OFF)
> >> return false;
> >>
> >> - /* We don't want to do bitfield reduction ops. */
> >> - if (INTEGRAL_TYPE_P (type)
> >> - && (!type_has_mode_precision_p (type) || TYPE_OVERFLOW_TRAPS (type)))
> >> - return false;
> >> + if (ANY_INTEGRAL_TYPE_P (type))
> >> + {
> >> + /* We don't want to do bitfield reduction ops. */
> >> + tree itype = INTEGRAL_TYPE_P (type) ? type : TREE_TYPE (type);
> >
> > you can use element_type () for this. But I think it's easier to
> > leave the existing
> > test in-place since vector or complex types cannot have non-mode precision
> > components.
>
> Ah, yeah. Guess I was over-generalising here :-)
>
> >> + if (!type_has_mode_precision_p (itype) || TYPE_OVERFLOW_TRAPS (itype))
> >> + return false;
> >> +
> >> + /* Don't use FMA for multiplications that are actually shifts. */
> >
> > So I question this - if the FMA can do the shift "for free" and it eventually is
> > same cost/latency as an add why do we want to not allow this (for all targets)?
> > Esp. for vector types I would guess a add plus a shift may be more costly
> > (not all ISAs implement shifts anyhow).
>
> The shift doesn't really come for free. By using FMA we're converting
> the shift by a constant into a general multiplication.
Yeah, it probably "depends". OTOH the shift may end up being a (vector)
multiply anyway due to target constraints.
I wonder how we select between (vector) shift and multiply when expanding
without FMA. Ideally the scheduler would have a say here based on
port utilization ;)
> > Can this be fixed up in the target instead? (by a splitter or appropriate
> > expander?)
>
> OK, I'll try to do it that way instead.
>
> Thanks,
> Richard
>
> >
> >> + tree val = VECTOR_TYPE_P (type) ? uniform_vector_p (op2) : op2;
> >> + if (val
> >> + && TREE_CODE (val) == INTEGER_CST
> >> + && wi::popcount (wi::abs (wi::to_wide (val))) == 1)
> >> + return false;
> >> + }
> >>
> >> /* If the target doesn't support it, don't generate it. We assume that
> >> if fma isn't available then fms, fnma or fnms are not either. */
> >> Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c
> >> ===================================================================
> >> --- /dev/null 2019-07-30 08:53:31.317691683 +0100
> >> +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-8.c 2019-07-30 10:52:27.327139149 +0100
> >> @@ -0,0 +1,57 @@
> >> +/* { dg-require-effective-target scalar_all_fma } */
> >> +/* { dg-additional-options "-fdump-tree-optimized -ffp-contract=fast" } */
> >> +
> >> +#include "tree-vect.h"
> >> +
> >> +#define N (VECTOR_BITS * 11 / 64 + 3)
> >> +
> >> +#define DEF(INV) \
> >> + void __attribute__ ((noipa)) \
> >> + f_##INV (int *restrict a, int *restrict b, \
> >> + int *restrict c) \
> >> + { \
> >> + for (int i = 0; i < N; ++i) \
> >> + { \
> >> + int mb = (INV & 1 ? -b[i] : b[i]); \
> >> + int mc = (INV & 2 ? -c[i] : c[i]); \
> >> + a[i] = b[i] < 10 ? mb * 8 + mc : 10; \
> >> + } \
> >> + }
> >> +
> >> +#define TEST(INV) \
> >> + { \
> >> + f_##INV (a, b, c); \
> >> + for (int i = 0; i < N; ++i) \
> >> + { \
> >> + int mb = (INV & 1 ? -b[i] : b[i]); \
> >> + int mc = (INV & 2 ? -c[i] : c[i]); \
> >> + int truev = mb * 8 + mc; \
> >> + if (a[i] != (i % 17 < 10 ? truev : 10)) \
> >> + __builtin_abort (); \
> >> + asm volatile ("" ::: "memory"); \
> >> + } \
> >> + }
> >> +
> >> +#define FOR_EACH_INV(T) \
> >> + T (0) T (1) T (2) T (3)
> >> +
> >> +FOR_EACH_INV (DEF)
> >> +
> >> +int
> >> +main (void)
> >> +{
> >> + int a[N], b[N], c[N];
> >> + for (int i = 0; i < N; ++i)
> >> + {
> >> + b[i] = i % 17;
> >> + c[i] = i % 13 + 14;
> >> + asm volatile ("" ::: "memory");
> >> + }
> >> + FOR_EACH_INV (TEST)
> >> + return 0;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-not { = \.COND_FMA } "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-not { = \.COND_FMS } "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-not { = \.COND_FNMA } "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-not { = \.COND_FNMS } "optimized" } } */