PING Re: [PATCH, MIPS] add new peephole for 74k dspr2
Maciej W. Rozycki
macro@codesourcery.com
Mon Oct 8 23:23:00 GMT 2012
On Sun, 7 Oct 2012, Richard Sandiford wrote:
> > So I think this can't really be selected automatically for all cores,
> > some human-supplied knowledge about the MD unit used is required -- that
> > obviously affects other operations too, e.g. some multiplications
> > involving a constant that may be cheaper to do either directly or with a
> > sequence of additions depending on the MD unit present (unless optimising
> > for size, of course).
>
> Yeah, as far as this optimisation goes, I think your original suggestion
> of using the DFA to work out the best sequence is still right. If the DFA
> says that multiplications take a handful of cycles when they actually take
> 32 then we're not going to optimise multiplication very well whatever happens.
>
> In practice, code destined for non-4kc cores with iterative multipliers
> would probably tune well with -mtune=4kp.
Hmm, maybe, although I find requiring people to tune for an obsolete
MIPS32 rev. 1 processor to get the right MDU performance figures for
modern processors a bit odd (the 4Kp DFA undoubtedly lacks reservations
for newer instructions introduced with the MIPS32r2 architecture update).
I've thought of -miterative-mdu or suchlike a knob that would override
the default cost of multiplication/division as appropriate (i.e. to 32/64
plus any extra operation-specific constant as required), perhaps by
forcing the 4Kp insn reservation (extended to 64 bits as required) over
the usual one; of course there would be nothing to override the -mtune=4Kp
arrangement with. Nothing urgent though, just an idea to ponder.
> The final patch ended up being much more complicated than I'd have liked,
> but at least it should be easier to add other automatically-derived tuning
> costs in future.
Thanks for taking my concerns into account. One nit below.
> Index: gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c
> ===================================================================
> --- /dev/null 2012-10-06 09:26:21.400998949 +0100
> +++ gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c 2012-10-06 22:24:24.857713316 +0100
> @@ -0,0 +1,22 @@
> +/* { dg-options "-mdspr2 -mgp32 -mtune=4kp" } */
> +/* References to RESULT within the loop need to have a higher frequency than
> + references to RESULT outside the loop, otherwise there is no reason
> + to prefer multiply/accumulator registers over GPRs. */
> +/* { dg-skip-if "requires register frequencies" { *-*-* } { "-O0" "-Os" } { "" } } */
> +
> +/* Check that the zero-initialization of the accumulator feeding into
> + the madd is done by means of a mult instruction instead of mthi/mtlo. */
This comment looks reversed to me as in this case we do NOT want a MULT
to be used...
> +
> +NOMIPS16 long long f (int n, int *v, int m)
> +{
> + long long result = 0;
> + int i;
> +
> + for (i = 0; i < n; i++)
> + result = __builtin_mips_madd (result, v[i], m);
> + return result;
> +}
> +
> +/* { dg-final { scan-assembler-not "mult\t\[^\n\]*\\\$0" } } */
> +/* { dg-final { scan-assembler "\tmthi\t" } } */
> +/* { dg-final { scan-assembler "\tmtlo\t" } } */
... and in fact you do check that we didn't get one.
Maciej
More information about the Gcc-patches
mailing list