This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: PING Re: [PATCH, MIPS] add new peephole for 74k dspr2
- From: Richard Sandiford <rdsandiford at googlemail dot com>
- To: "Maciej W. Rozycki" <macro at codesourcery dot com>
- Cc: Sandra Loosemore <sandra at codesourcery dot com>, <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 24 Sep 2012 21:48:48 +0100
- Subject: Re: PING Re: [PATCH, MIPS] add new peephole for 74k dspr2
- References: <502D0DF4.3070302@codesourcery.com> <87zk5qu783.fsf@talisman.home> <5034EBC6.5080602@codesourcery.com> <87bohw1ecb.fsf@talisman.home> <5058ACDA.5060706@codesourcery.com> <87lig719ig.fsf@talisman.home> <alpine.DEB.1.10.1209241548180.28358@tp.orcam.me.uk>
"Maciej W. Rozycki" <macro@codesourcery.com> writes:
> On Tue, 18 Sep 2012, Richard Sandiford wrote:
>
>> > Have you had time to think about this some more? I am not sure I can
>> > guess how you'd like me to fix this patch now without some more specific
>> > review and/or suggestions about where the optimization should happen and
>> > what cases it should be extended to detect in addition to the dsp
>> > accumulator multiplies.
>>
>> The patch below is the one I've been testing. But I got sidetracked
>> by looking into the possibility of removing the MD0_REG and MD1_REG
>> classes, in order to get more sensible costs. I think that was needed
>> for the madd-9.c test to pass.
>
> Sorry to come up with this so late -- I have only now noticed this being
> discussed.
>
>> @@ -4105,39 +4105,55 @@ mips_subword (rtx op, bool high_p)
>> return simplify_gen_subreg (word_mode, op, mode, byte);
>> }
>>
>> -/* Return true if a 64-bit move from SRC to DEST should be split into two. */
>> +/* Return true if SRC can be moved into DEST using MULT $0, $0. */
>> +
>> +static bool
>> +mips_mult_move_p (rtx dest, rtx src)
>> +{
>> +  return (src == const0_rtx
>> +          && REG_P (dest)
>> +          && GET_MODE_SIZE (GET_MODE (dest)) == 2 * UNITS_PER_WORD
>> +          && (ISA_HAS_DSP_MULT
>> +              ? ACC_REG_P (REGNO (dest))
>> +              : MD_REG_P (REGNO (dest))));
>> +}
>> +
>> +/* Return true if a move from SRC to DEST should be split into two. */
>
> Does the DSP ASE guarantee that a MULT $0, $0 is going not to be slower
> than MTHI $0/MTLO $0? The latency of multiplication varies among
> implementations, for example the original R3000 took 12 cycles (of course
> the R3000 itself is not relevant for this change, but you see the
> picture!). On the other hand in some (but not all!) processors
> multiplication runs in parallel to the main pipeline so it is the
> difference, if positive, between the number of cycles consumed by other
> instructions up to the next HI/LO access instruction and the latency of
> MULT run in the background that matters.
>
> From the context I am assuming none of this matters for the 74K (and
> presumably the 24KE/34K) and a MULT $0, $0 is indeed faster, but overall
> isn't it something that should be decided based on instruction costs from
> DFA schedulers? Is there anything that I've missed here? It doesn't
> appear to me your (and neither the original) proposal takes instruction
> cost calculation into consideration.

In practice, we only move 0 into HI and LO for MADD- and MSUB-style
operations. We deliberately don't use HI and LO as scratch space.

I think it's a reasonable default assumption that anything that supports
those instructions also has a fast path from MULT to MADD or MULT to MSUB.
I certainly don't know of any counter-examples. The decision is deliberately
centralised in one place so that the condition can be tweaked in future
if necessary.
Richard
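
To illustrate the point about keeping the decision in one place, here is a
minimal standalone sketch of the predicate from the quoted hunk. It is not
the GCC implementation: the struct and enum types are hypothetical stand-ins
for rtx and the target macros, and the slow_mult field is an invented tuning
knob that only shows where a cost-based condition could be added later if a
counter-example turned up. The sketch also assumes that ACC_REG_P accepts
the HI/LO pair as well as the extra DSP accumulators.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC internals; none of these names exist
   in GCC.  */
enum reg_kind { REG_GPR, REG_MD, REG_DSP_ACC };

struct move_query {
  bool src_is_zero;          /* models src == const0_rtx */
  enum reg_kind dest_kind;   /* models the register class of DEST */
  unsigned dest_size;        /* models GET_MODE_SIZE (GET_MODE (dest)) */
};

struct target_model {
  unsigned units_per_word;   /* models UNITS_PER_WORD */
  bool has_dsp_mult;         /* models ISA_HAS_DSP_MULT */
  bool slow_mult;            /* invented tuning flag, not in the patch */
};

/* Return true if zero can be moved into the destination with a single
   MULT $0, $0.  Every caller asks this one function, so a new tuning
   condition only needs to be added here.  */
static bool
model_mult_move_p (const struct target_model *t, const struct move_query *q)
{
  /* Hypothetical escape hatch for a core where MULT is slower than
     MTHI $0 / MTLO $0.  */
  if (t->slow_mult)
    return false;

  return (q->src_is_zero
          && q->dest_size == 2 * t->units_per_word
          && (t->has_dsp_mult
              /* Assumption: ACC_REG_P covers HI/LO and the DSP
                 accumulators.  */
              ? (q->dest_kind == REG_MD || q->dest_kind == REG_DSP_ACC)
              : q->dest_kind == REG_MD));
}
```

With units_per_word set to 4 and a zero source, an 8-byte REG_MD
destination is accepted on any target, a REG_DSP_ACC destination is
accepted only when has_dsp_mult is set, and setting slow_mult rejects
both.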