This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: generic and i386 bswap improvements


On 2/15/07, Richard Henderson <rth@redhat.com> wrote:

I'd prefer you not use define_insn_and_split and just write

(define_split
  [(set (match_operand:HI 0 "q_regs_operand" "")
        (rotate:HI (match_dup 0) (const_int 8)))
   (clobber (reg:CC FLAGS_REG))]
  "(TARGET_USE_XCHGB || optimize_size) && reload_completed"
  [(set (match_dup 0) (bswap:HI (match_dup 0)))]
  "")

Please note that q_regs_operand matches ANY_QI_REG_P, which in the 64-bit case is not appropriate for the %h modifier. We should use the plain Q constraint to filter out the correct registers.

and you should arrange something for bswaphi_lowpart as well.

Attached to this message, please find another solution:


bswaphi_lowpart is divided into two separate patterns, where the first
shadows the second when (TARGET_USE_XCHGB || optimize_size) holds. Both
patterns can handle the full register set, but the first can emit xchgb
when a Q register is allocated.

Since these patterns handle the full register set, we can channel HImode
rotates of 8 through the same trap, looking for xchgb optimization
opportunities (when a Q register can be allocated). Fortunately, HImode
rotates don't touch bits 16+, so rotates and xchgb can substitute for each
other. Rotates, however, clobber FLAGS_REG, and we have to model this in
the bswaphi insn template.
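
To illustrate why the two forms are interchangeable on the value level, here is a minimal C sketch (not part of the patch). The helper names are hypothetical; they only model what "rolw $8, %ax" and "xchgb %ah, %al" do to the 32-bit register contents, and they say nothing about the flags, which is the one difference the insn template has to model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "rolw $8, %ax": rotate the low 16 bits of x
   left by 8 and leave bits 16..31 untouched. */
static uint32_t rotl16_by_8_lowpart(uint32_t x)
{
    uint16_t lo = (uint16_t)x;
    uint16_t rotated = (uint16_t)((lo << 8) | (lo >> 8));
    return (x & 0xFFFF0000u) | rotated;
}

/* Hypothetical model of "xchgb %ah, %al": exchange the two low bytes
   of x and leave bits 16..31 untouched. */
static uint32_t xchg_low_bytes(uint32_t x)
{
    uint32_t al = x & 0xFFu;
    uint32_t ah = (x >> 8) & 0xFFu;
    return (x & 0xFFFF0000u) | (al << 8) | ah;
}

/* Exhaustively compare both models over every 16-bit low part, with
   an arbitrary marker in the upper half to show it is preserved. */
static void check_lowpart_swaps_agree(void)
{
    for (uint32_t lo = 0; lo <= 0xFFFFu; lo++) {
        uint32_t x = 0xABCD0000u | lo;
        assert(rotl16_by_8_lowpart(x) == xchg_low_bytes(x));
        assert((rotl16_by_8_lowpart(x) >> 16) == 0xABCDu);
    }
}
```

Calling check_lowpart_swaps_agree() walks all 65536 low-part values, so the equivalence is checked rather than sampled.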

BTW: Because the alu1 type attribute expects operand[1], the
compilation breaks. The best solution for this is to define the correct
length attribute, so that operands are not further analyzed to determine
the values of other attributes.

__builtin_bswap32 now generates:
       rolw    $8, %ax
       roll    $16, %eax
       rolw    $8, %ax

or, for Pentium 4 or when optimizing for size:

       xchgb   %ah, %al
       roll    $16, %eax
       xchgb   %ah, %al
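
As a sanity check of the three-instruction sequence, the following C sketch (again not part of the patch, with hypothetical helper names) models each step on a 32-bit value and compares the result against __builtin_bswap32. It assumes a compiler that provides that builtin, such as GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "rolw $8, %ax" (or "xchgb %ah, %al"):
   swap the two low bytes, keep bits 16..31. */
static uint32_t swap_low_bytes(uint32_t x)
{
    return (x & 0xFFFF0000u) | ((x & 0xFFu) << 8) | ((x >> 8) & 0xFFu);
}

/* Hypothetical model of "roll $16, %eax": exchange the two 16-bit halves. */
static uint32_t rotl32_by_16(uint32_t x)
{
    return (x << 16) | (x >> 16);
}

/* The three-step sequence: swap low bytes, rotate halves, swap low bytes. */
static uint32_t bswap32_three_step(uint32_t x)
{
    x = swap_low_bytes(x);   /* 0x11223344 -> 0x11224433 */
    x = rotl32_by_16(x);     /* 0x11224433 -> 0x44331122 */
    x = swap_low_bytes(x);   /* 0x44331122 -> 0x44332211 */
    return x;
}

/* Compare against the compiler builtin on a few sample values. */
static void check_bswap32_three_step(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x11223344u, 0xDEADBEEFu, 0x000000FFu, 0xFFFFFFFFu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(bswap32_three_step(samples[i]) == __builtin_bswap32(samples[i]));
}
```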

BTW2: There is also an interesting possibility for a subreg
optimization. For the following testcase:

unsigned long long int bs(unsigned long long int x)
{
 return __builtin_bswap64(x);
}

we generate (-O2 -mregparm=3 -fomit-frame-pointer):
bs:
       movl    %eax, %ecx
       movl    %edx, %eax
       rolw    $8, %cx
       rolw    $8, %ax
       roll    $16, %ecx
       rolw    $8, %cx
       roll    $16, %eax
       rolw    $8, %ax
       movl    %ecx, %edx
       ret

In the above example, moves are not needed.
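
For reference, the data flow of the sequence above can be sketched in C as two independent 32-bit byte swaps whose results land in the opposite halves. This is only an illustration with hypothetical helper names; it models values, not register allocation, and it assumes a compiler providing __builtin_bswap64 for the comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit byte swap, used here as a stand-in for the
   rolw/roll/rolw sequence. */
static uint32_t bswap32_portable(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* 64-bit byte swap built from the two 32-bit halves: the swapped low
   half becomes the new high half, and the swapped high half becomes
   the new low half. */
static uint64_t bswap64_by_halves(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    return ((uint64_t)bswap32_portable(lo) << 32) | bswap32_portable(hi);
}

/* Compare against the compiler builtin on a few sample values. */
static void check_bswap64_by_halves(void)
{
    static const uint64_t samples[] = {
        0x0000000000000000ull, 0x0102030405060708ull,
        0xDEADBEEFCAFEBABEull, 0xFFFFFFFFFFFFFFFFull
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(bswap64_by_halves(samples[i]) == __builtin_bswap64(samples[i]));
}
```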

The attached patch was bootstrapped on i686-pc-linux-gnu and regression
tested for all default languages. A couple of builtin_bswap tests were
changed, as we now generate byte swap sequences for other targets too.

2007-02-16 Uros Bizjak <ubizjak@gmail.com>

	* config/i386/i386.h (x86_use_xchgb): New.
	(TARGET_USE_XCHGB): New macro.
	* config/i386/i386.c (x86_use_xchgb): Set for PENT4.
	* config/i386/i386.md (*rotlhi3_1 splitter, *rotrhi3_1 splitter):
	Split after reload into bswaphi for shifts of 8.
	(bswaphi_lowpart): Generate rolw insn for HImode byte swaps.
	(*bswaphi_lowpart_1): Generate xchgb for Q registers for
	TARGET_USE_XCHGB or when optimizing for size.

testsuite/ChangeLog:

2007-02-16 Uros Bizjak <ubizjak@gmail.com>

	* gcc.target/i386/builtin-bswap-1.c: Remove -march=nocona.
	* gcc.target/i386/builtin-bswap-3.c: Ditto.
	* gcc.target/i386/xchg-1.c: New test.
	* gcc.target/i386/xchg-2.c: New test.

OK for mainline?

Uros.

Attachment: i386-xchgb-2.diff
Description: Binary data

