PING Re: [PATCH, MIPS] add new peephole for 74k dspr2

Mon Oct 8 23:23:00 GMT 2012

On Sun, 7 Oct 2012, Richard Sandiford wrote:

> >  So I think this can't really be selected automatically for all cores, 
> > some human-supplied knowledge about the MD unit used is required -- that 
> > obviously affects other operations too, e.g. some multiplications 
> > involving a constant that may be cheaper to do either directly or with a 
> > sequence of additions depending on the MD unit present (unless optimising 
> > for size, of course).
> 
> Yeah, as far as this optimisation goes, I think your original suggestion
> of using the DFA to work out the best sequence is still right.  If the DFA
> says that multiplications take a handful of cycles when they actually take
> 32 then we're not going to optimise multiplication very well whatever happens.
> 
> In practice, code destined for non-4kc cores with iterative multipliers
> would probably tune well with -mtune=4kp.

 Hmm, maybe, although I find requiring people to tune for an obsolete 
MIPS32 rev. 1 processor to get the right MDU performance figures for 
modern processors a bit odd (the 4Kp DFA undoubtedly lacks reservations 
for newer instructions introduced with the MIPS32r2 architecture update).

 I've thought of -miterative-mdu or suchlike a knob that would override 
the default cost of multiplication/division as appropriate (i.e. to 32/64 
plus any extra operation-specific constant as required), perhaps by 
forcing the 4Kp insn reservation (extended to 64 bits as required) over 
the usual one; of course there would be nothing to override the -mtune=4Kp 
arrangement with.  Nothing urgent though, just an idea to ponder.

> The final patch ended up being much more complicated than I'd have liked,
> but at least it should be easier to add other automatically-derived tuning
> costs in future.

 Thanks for taking my concerns into account.  One nit below.

> Index: gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c
> ===================================================================
> --- /dev/null	2012-10-06 09:26:21.400998949 +0100
> +++ gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c	2012-10-06 22:24:24.857713316 +0100
> @@ -0,0 +1,22 @@
> +/* { dg-options "-mdspr2 -mgp32 -mtune=4kp" } */
> +/* References to RESULT within the loop need to have a higher frequency than
> +   references to RESULT outside the loop, otherwise there is no reason
> +   to prefer multiply/accumulator registers over GPRs.  */
> +/* { dg-skip-if "requires register frequencies" { *-*-* } { "-O0" "-Os" } { "" } } */
> +
> +/* Check that the zero-initialization of the accumulator feeding into
> +   the madd is done by means of a mult instruction instead of mthi/mtlo.  */

 This comment looks reversed to me as in this case we do NOT want a MULT 
to be used...

> +
> +NOMIPS16 long long f (int n, int *v, int m)
> +{
> +  long long result = 0;
> +  int i;
> +
> +  for (i = 0; i < n; i++)
> +    result = __builtin_mips_madd (result, v[i], m);
> +  return result;
> +}
> +
> +/* { dg-final { scan-assembler-not "mult\t\[^\n\]*\\\$0" } } */
> +/* { dg-final { scan-assembler "\tmthi\t" } } */
> +/* { dg-final { scan-assembler "\tmtlo\t" } } */

 ... and in fact you do check that we didn't get one.

  Maciej