PING Re: [PATCH, MIPS] add new peephole for 74k dspr2

Wed Oct 10 20:00:00 GMT 2012

"Maciej W. Rozycki" <macro@codesourcery.com> writes:
> On Sun, 7 Oct 2012, Richard Sandiford wrote:
>
>> >  So I think this can't really be selected automatically for all cores, 
>> > some human-supplied knowledge about the MD unit used is required -- that 
>> > obviously affects other operations too, e.g. some multiplications 
>> > involving a constant that may be cheaper to do either directly or with a 
>> > sequence of additions depending on the MD unit present (unless optimising 
>> > for size, of course).
>> 
>> Yeah, as far as this optimisation goes, I think your original suggestion
>> of using the DFA to work out the best sequence is still right.  If the DFA
>> says that multiplications take a handful of cycles when they actually take
>> 32 then we're not going to optimise multiplication very well whatever happens.
>> 
>> In practice, code destined for non-4kc cores with iterative multipliers
>> would probably tune well with -mtune=4kp.
>
>  Hmm, maybe, although I find requiring people to tune for an obsolete 
> MIPS32 rev. 1 processor to get the right MDU performance figures for 
> modern processors a bit odd (the 4Kp DFA undoubtedly lacks reservations 
> for newer instructions introduced with the MIPS32r2 architecture update).
>
>  I've thought of -miterative-mdu or suchlike a knob that would override 
> the default cost of multiplication/division as appropriate (i.e. to 32/64 
> plus any extra operation-specific constant as required), perhaps by 
> forcing the 4Kp insn reservation (extended to 64 bits as required) over 
> the usual one; of course there would be nothing to override the -mtune=4Kp 
> arrangement with.  Nothing urgent though, just an idea to ponder.

FWIW, using the 4Kp reservation itself wouldn't necessarily work,
because all modern scheduling descriptions define their own CPU units.
You'd probably need to add new reservations to each affected MIPS32
and MIPS64 .md file.

>> Index: gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c
>> ===================================================================
>> --- /dev/null	2012-10-06 09:26:21.400998949 +0100
>> +++ gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c	2012-10-06 22:24:24.857713316 +0100
>> @@ -0,0 +1,22 @@
>> +/* { dg-options "-mdspr2 -mgp32 -mtune=4kp" } */
>> +/* References to RESULT within the loop need to have a higher frequency than
>> +   references to RESULT outside the loop, otherwise there is no reason
>> +   to prefer multiply/accumulator registers over GPRs.  */
>> +/* { dg-skip-if "requires register frequencies" { *-*-* } { "-O0" "-Os" } { "" } } */
>> +
>> +/* Check that the zero-initialization of the accumulator feeding into
>> +   the madd is done by means of a mult instruction instead of mthi/mtlo.  */
>
>  This comment looks reversed to me as in this case we do NOT want a MULT 
> to be used...
>
>> +
>> +NOMIPS16 long long f (int n, int *v, int m)
>> +{
>> +  long long result = 0;
>> +  int i;
>> +
>> +  for (i = 0; i < n; i++)
>> +    result = __builtin_mips_madd (result, v[i], m);
>> +  return result;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "mult\t\[^\n\]*\\\$0" } } */
>> +/* { dg-final { scan-assembler "\tmthi\t" } } */
>> +/* { dg-final { scan-assembler "\tmtlo\t" } } */
>
>  ... and in fact you do check that we didn't get one.

Thanks, fixed as follows.

Richard


gcc/testsuite/
	* gcc.target/mips/mips32-dsp-accinit-2.c: Fix test description.

Index: gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c
===================================================================

--- gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c	2012-10-07 09:45:04.000000000 +0100
+++ gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c	2012-10-10 20:51:44.402528138 +0100
@@ -5,7 +5,8 @@
 /* { dg-skip-if "requires register frequencies" { *-*-* } { "-O0" "-Os" } { "" } } */
 
 /* Check that the zero-initialization of the accumulator feeding into
-   the madd is done by means of a mult instruction instead of mthi/mtlo.  */
+   the madd is done by means of an mthi & mtlo pair instead of a
+   "mult $0,$0" instruction.  */
 
 NOMIPS16 long long f (int n, int *v, int m)
 {