This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Mode change for bswap pattern expansion
- From: Richard Sandiford <rsandifo at linux dot vnet dot ibm dot com>
- To: Paulo Matos <pmatos at broadcom dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Mon, 27 Jan 2014 16:49:34 +0000
- Subject: Re: Mode change for bswap pattern expansion
- Authentication-results: sourceware.org; auth=none
- References: <19EB96622A777C4AB91610E763265F463F12AC at SJEXCHMB14 dot corp dot ad dot broadcom dot com> <877g9ltnoj dot fsf at sandifor-thinkpad dot stglab dot manchester dot uk dot ibm dot com> <19EB96622A777C4AB91610E763265F463F15EE at SJEXCHMB14 dot corp dot ad dot broadcom dot com>
Paulo Matos <pmatos@broadcom.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford [mailto:rdsandiford@googlemail.com]
>> Sent: 27 January 2014 16:06
>> To: Paulo Matos
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: Mode change for bswap pattern expansion
>>
>> Paulo Matos <pmatos@broadcom.com> writes:
>> > On a vector processor we can do a bswapsi with two instructions, by first
>> > rotating half-words (16 bits) by 8 and then rotating full words by 16.
>> > However, this means expanding:
>> > (set (match_operand:SI 0 "register_operand" "")
>> >      (bswap:SI (match_operand:SI 1 "register_operand" "")))
>> >
>> > to:
>> > (set (match_dup:V2HI 0)
>> >      (rotate:V2HI (match_dup:V2HI 1)
>> >                   (const_int 8)))
>> > (set (match_dup:SI 0)
>> >      (rotate:SI (match_dup:SI 0)
>> >                 (const_int 16)))
>> >
>> > This is obviously not correct, because match_dup cannot set the mode. The
>> > point I am trying to make is that I can't find a good way to deal with the
>> > mode changes. I don't think GCC is too happy if I change the modes of the
>> > same operand from one instruction to the other, right? The only other way
>> > is to emit paradoxical subregs. So something along these lines:
>> > (set (subreg:V2HI (match_dup 0) 0)
>> >      (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
>> >                   (const_int 8)))
>> > (set (match_dup 0)
>> >      (rotate:SI (match_dup 0)
>> >                 (const_int 16)))
>>
>> It's usually better not to hard-code the subregs in the pattern.
>> Instead you could use C code to create the subregs, e.g.:
>>
>>   [(set (match_dup 3)
>>         (rotate:V2HI (match_dup 2)
>>                      (const_int 8)))
>>    (set (match_dup 0)
>>         (rotate:SI (match_dup 4)
>>                    (const_int 16)))]
>>   ""
>>   {
>>     operands[2] = gen_lowpart (V2HImode, operands[1]);
>>     operands[3] = gen_reg_rtx (V2HImode);
>>     operands[4] = gen_lowpart (SImode, operands[3]);
>>   }
>>
>> so that any hard regs are correctly handled. Or it might be easier to code
>> it using emit_insn (gen_* (...))s instead.
>>
>> BTW, paradoxical subregs are where the outer mode is strictly larger
>> than the inner mode.
>>
>
> That's right. My misunderstanding.
>
>> MIPS uses essentially the same sequence, except that it has a special
>> instruction to do the first rotate (WSBH), rather than it being an instance
>> of a general vector rotate. For MIPS we just model it as an unspec SImode
>> operation. Maybe that would be easier here too.
>>
>
> I will look at how MIPS is doing it.
>
> However, the unspec SI has severe performance penalties on my port since
> it is able to issue more than one instruction per cycle, therefore
> having each instruction separately allows the scheduler to issue each of
> the bswapsi parts into different slots with other instructions.

Sorry, I meant we use an unspec for the first ("V2HI") rotate.
I.e. rather than:

  (set (subreg:V2HI (match_dup 2) 0)
       (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
                    (const_int 8)))
  (set (match_dup 0)
       (rotate:SI (match_dup 2)
                  (const_int 16)))

we have:

  (set (match_dup 2) (unspec:SI [(match_dup 1)] UNSPEC_FOO))
  (set (match_dup 0)
       (rotate:SI (match_dup 2)
                  (const_int 16)))

In your case the define_insn for the UNSPEC_FOO pattern would have the
same attributes as a V2HI rotate, so it should get scheduled in the same way.
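
As a sanity check on the two-step sequence discussed in this thread (rotate each half-word by 8, then rotate the full word by 16), here is a minimal C sketch of the arithmetic. It only models the data flow on a plain 32-bit integer; the function names are hypothetical and are not part of GCC or of any port, and the V2HI rotate is emulated with masks rather than a vector instruction.

```c
#include <stdint.h>

/* Step 1: rotate each 16-bit half of x by 8 bits.  This stands in for
   the V2HI rotate (or the unspec that replaces it); rotating a 16-bit
   lane by 8 swaps its two bytes.  Hypothetical helper name.  */
static uint32_t rotate_halfwords_by_8(uint32_t x)
{
    return ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
}

/* Step 2: rotate the full 32-bit word by 16 bits, which swaps the two
   half-words.  This stands in for the SImode rotate.  */
static uint32_t rotate_word_by_16(uint32_t x)
{
    return (x << 16) | (x >> 16);
}

/* The two steps combined reverse the byte order of a 32-bit value.  */
uint32_t bswap32_two_step(uint32_t x)
{
    return rotate_word_by_16(rotate_halfwords_by_8(x));
}
```

For instance, 0x11223344 becomes 0x22114433 after the half-word rotate and 0x44332211 after the word rotate, which is the byte-swapped value.
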
Thanks,
Richard