This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
I'd prefer you not use define_insn_and_split and just write:

(define_split
  [(set (match_operand:HI 0 "q_regs_operand" "")
	(rotate:HI (match_dup 0) (const_int 8)))
   (clobber (reg:CC FLAGS_REG))]
  "(TARGET_USE_XCHGB || optimize_size)
   && reload_completed"
  [(set (match_dup 0)
	(bswap:HI (match_dup 0)))]
  "")
Please note that q_regs_operand matches ANY_QI_REG_P, which is not appropriate for the %h modifier in the 64-bit case. We should use the plain Q constraint to filter out the correct registers.
and you should arrange something for bswaphi_lowpart as well.
bswaphi_lowpart is divided into two separate patterns, where the first shadows the second in the (TARGET_USE_XCHGB || optimize_size) case. Both patterns can handle the full register set, but the first can emit xchgb when a Q register is allocated.
Since these patterns handle the full register set, we can channel HImode rotates by 8 through the same trap, looking for xchgb optimization opportunities (when a Q register can be allocated). Fortunately, HImode rotates don't touch bits 16+, so rotates and xchgb can substitute for each other. Rotates, however, touch FLAGS_REG, and we have to model this in the bswaphi insn template.
BTW: Because the alu1 type attribute expects operands[1], the compilation breaks. The best solution is to define the correct length attribute directly, so the operands are not further analyzed to determine the values of other attributes.
__builtin_bswap32 now generates:

	rolw	$8, %ax
	roll	$16, %eax
	rolw	$8, %ax

and, when xchgb is preferred (TARGET_USE_XCHGB or optimize_size):

	xchgb	%ah, %al
	roll	$16, %eax
	xchgb	%ah, %al
BTW2: There is also an interesting possibility for a subreg optimization. For the following testcase:
unsigned long long int
bs (unsigned long long int x)
{
  return __builtin_bswap64 (x);
}
we generate (-O2 -mregparm=3 -fomit-frame-pointer):

bs:
	movl	%eax, %ecx
	movl	%edx, %eax
	rolw	$8, %cx
	rolw	$8, %ax
	roll	$16, %ecx
	rolw	$8, %cx
	roll	$16, %eax
	rolw	$8, %ax
	movl	%ecx, %edx
	ret
Attached patch was bootstrapped on i686-pc-linux-gnu and regression tested for all default languages. A couple of builtin_bswap tests were changed, as we generate byte swap sequences for other targets too.
	* config/i386/i386.h (x86_use_xchgb): New.
	(TARGET_USE_XCHGB): New macro.
	* config/i386/i386.c (x86_use_xchgb): Set for PENT4.
	* config/i386/i386.md (*rotlhi3_1 splitter, *rotrhi3_1 splitter):
	Split after reload into bswaphi for shifts of 8.
	(bswaphi_lowpart): Generate rolw insn for HImode byte swaps.
	(*bswaphi_lowpart_1): Generate xchgb for Q registers for
	TARGET_USE_XCHGB or when optimizing for size.

	* gcc.target/i386/builtin-bswap-1.c: Remove -march=nocona.
	* gcc.target/i386/builtin-bswap-3.c: Ditto.
	* gcc.target/i386/xchg-1.c: New test.
	* gcc.target/i386/xchg-2.c: New test.
Attachment:
i386-xchgb-2.diff
Description: Binary data