This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [x86 PATCH] Provide bswapdi2 intrinisic on 32-bit IA-32
- From: "Uros Bizjak" <ubizjak at gmail dot com>
- To: "GCC Patches" <gcc-patches at gcc dot gnu dot org>
- Cc: "Roger Sayle" <roger at eyesopen dot com>
- Date: Mon, 12 Feb 2007 08:56:02 +0100
- Subject: Re: [x86 PATCH] Provide bswapdi2 intrinisic on 32-bit IA-32
Hello Roger!
> * config/i386/i386.md (bswapdi2_rex): Renamed from bswapdi2.
> (bswapdi2): New define_expand to provide 32-bit implementation.
>
> * gcc.target/i386/builtin-bswap-3.c: New test case.
Grr. Operator error, sleep deprivation, new mail client...
Here's the patch. Sorry for the inconvenience.
Index: config/i386/i386.md
! (define_insn "bswapdi2_rex"
As Jakub mentioned, there is no need for named pattern there.
[(set (match_operand:DI 0 "register_operand" "=r")
(bswap:DI (match_operand:DI 1 "register_operand" "0")))
(clobber (reg:CC FLAGS_REG))]
+ (define_expand "bswapdi2"
+ [(parallel [(set (match_operand:DI 0 "register_operand" "")
+ (bswap:DI (match_operand:DI 1 "register_operand" "0")))
Also, register constraint is not needed in expanders.
+ (clobber (reg:CC FLAGS_REG))])]
+ "TARGET_BSWAP"
+ {
+ if (!TARGET_64BIT)
+ {
+ rtx tmp1, tmp2;
+ tmp1 = gen_reg_rtx (SImode);
+ tmp2 = gen_reg_rtx (SImode);
+ emit_insn (gen_bswapsi2 (tmp1, gen_lowpart (SImode, operands[1])));
+ emit_insn (gen_bswapsi2 (tmp2, gen_highpart (SImode, operands[1])));
+ emit_move_insn (gen_lowpart (SImode, operands[0]), tmp2);
+ emit_move_insn (gen_highpart (SImode, operands[0]), tmp1);
+ DONE;
+ }
+ })
IMO we don't need to generate new registers and moves. Following code
produces eqivalent asm (and also handles -mregparm=3 case):
if (!TARGET_64BIT)
{
emit_insn (gen_bswapsi2 (gen_highpart (SImode, operands[0]),
gen_lowpart (SImode, operands[1])));
emit_insn (gen_bswapsi2 (gen_lowpart (SImode, operands[0]),
gen_highpart (SImode, operands[1])));
DONE;
(However, it is up to you which style you prefer.)
Otherwise, this patch is OK for mainline.
BTW: I don't know if you are aware about some missing middle-end
optimizations regarding byte swaps. If you are interested, they are
summarised in PR middle-end/29749.
Thanks,
Uros.