Split from PR 53176, that changed lower-subreg to not split subregs early on x86. Following testcase --cut here-- extern long long x; void foo (long long i) { x = __builtin_bswap64 (i); } long long bar () { return __builtin_bswap64 (x); } --cut here-- compiled with -O2 -mmovbe -m32 on x86 target triggers RA to allocate non-optimal registers for "foo" (and forcing reload), while it is able to allocate optimal regs for "bar" case: bar: movbe x+4, %eax movbe x, %edx ret The situation with foo: foo: pushl %ebx movl 8(%esp), %eax movl 12(%esp), %edx movl %eax, %ebx movl %edx, %ecx bswap %ebx bswap %ecx movl %ebx, x+4 movl %ecx, x popl %ebx ret Which is a noticeable regression from 4.7: foo: movbe 4(%esp), %eax movbe 8(%esp), %edx movl %eax, x+4 movl %edx, x ret Adding -mregparm=2 does not improve things: foo: pushl %ebx movl %edx, %ecx movl %eax, %ebx bswap %ecx bswap %ebx movl %ecx, x movl %ebx, x+4 popl %ebx ret while 4.7 generates: foo: movbe %edx, x movbe %eax, x+4 ret
Confirmed.
Why do you consider this a reload/RA problem? Code before ira looks like: (insn 2 4 3 2 (set (reg/v:DI 62 [ i ]) (mem/c:DI (reg/f:SI 16 argp) [2 i+0 S8 A32])) test.i:6 63 {*movdi_internal} (expr_list:REG_EQUIV (mem/c:DI (reg/f:SI 16 argp) [2 i+0 S8 A32]) (nil))) (insn 8 7 9 2 (clobber (reg:DI 60 [ D.1367 ])) test.i:7 -1 (nil)) (insn 9 8 10 2 (set (subreg:SI (reg:DI 60 [ D.1367 ]) 0) (bswap:SI (subreg:SI (reg/v:DI 62 [ i ]) 4))) test.i:7 719 {*bswapsi2_movbe} (nil)) (insn 10 9 11 2 (set (subreg:SI (reg:DI 60 [ D.1367 ]) 4) (bswap:SI (subreg:SI (reg/v:DI 62 [ i ]) 0))) test.i:7 719 {*bswapsi2_movbe} (expr_list:REG_DEAD (reg/v:DI 62 [ i ]) (nil))) (insn 11 10 0 2 (set (mem/c:DI (symbol_ref:SI ("x") [flags 0x40] <var_decl 0xb75e6a80 x>) [2 x+0 S8 A64]) (reg:DI 60 [ D.1367 ])) test.i:7 63 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 60 [ D.1367 ]) (nil))) with the memory accesses both in DImode, but the bswap already split into SImode. This causes the two DImode registers to be live at the same time, so RA cannot allocate the same register for both. Given the limited register availability on i386, which allocation would you have suggested instead? [ Note that I'd consider this a case where the moves certainly *ought* to have been split into SImode, because: - on i386 the moves will be split later on anyway - accesses to subregs of the registers being moved already happens elsewhere. ]
(In reply to comment #2) > Why do you consider this a reload/RA problem? Code before ira looks like: Indeed. The bar case works OK since access to memory is already expanded in a split way: (insn 5 4 6 3 (set (reg:SI 65) (mem/c:SI (symbol_ref:SI ("x") [flags 0x40] <var_decl 0x7f56b41c2140 x>) [2 x+0 S4 A64])) movbe-2.c:15 -1 (nil)) (insn 6 5 7 3 (set (reg:SI 64) (bswap:SI (reg:SI 65))) movbe-2.c:15 -1 (nil)) (insn 7 6 8 3 (set (reg:SI 67) (mem/c:SI (const:SI (plus:SI (symbol_ref:SI ("x") [flags 0x40] <var_decl 0x7f56b41c2140 x>) (const_int 4 [0x4]))) [2 x+4 S4 A32])) movbe-2.c:15 -1 (nil)) (insn 8 7 9 3 (set (reg:SI 66) (bswap:SI (reg:SI 67))) movbe-2.c:15 -1 (nil)) However, reload should notice that memory could be propagated into bswap. I'm not sure this would fix the problem.
(In reply to comment #3) > However, reload should notice that memory could be propagated into bswap. Since register allocation already assigned a hard reg to the pseudo, reload is happy. Reload doesn't generally attempt to go back and search for whether the value is also available in memory (unless it has to ...). In general, it is preferable for earlier passes to leave an operand as MEM (if an insn accepts memory operands in some alternative), so that reload can then make the decision whether to use the MEM or to reload into a register.
Created attachment 27327 [details] Patch that improves DImode swap sequeces Patch in testing.
Target issue.
Author: uros Date: Sun May 6 20:47:59 2012 New Revision: 187215 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187215 Log: PR target/53227 * config/i386/i386.md (swap<mode>): Rename from *swap<mode>. (bswapdi2): Split from bswap<mode>2. Use nonnimediate_operand predicate for operand 1. Force operand 1 to register for TARGET_BSWAP. (bswapsi2): Ditto. (*bswapdi2_doubleword): New insn pattern. (*bswap<mode>2): Rename from *bswap<mode>2_1. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md
Fixed.