Bug 53227 - [4.8 Regression] FAIL: gcc.target/i386/movbe-2.c scan-assembler-times movbe[ \t] 4
Summary: [4.8 Regression] FAIL: gcc.target/i386/movbe-2.c scan-assembler-times movbe[ ...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: 4.8.0
Assignee: Uroš Bizjak
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-04 07:11 UTC by Uroš Bizjak
Modified: 2012-05-06 21:00 UTC (History)
4 users (show)

See Also:
Host:
Target: i686
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-05-04 00:00:00


Attachments
Patch that improves DImode swap sequeces (822 bytes, patch)
2012-05-06 18:08 UTC, Uroš Bizjak
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Uroš Bizjak 2012-05-04 07:11:05 UTC
Split from PR 53176, that changed lower-subreg to not split subregs early on x86.

Following testcase

--cut here--
extern long long x;

void
foo (long long i)
{
  x = __builtin_bswap64 (i);
}

long long
bar ()
{
  return __builtin_bswap64 (x);
}
--cut here--

compiled with -O2 -mmovbe -m32 on x86 target triggers RA to allocate non-optimal registers for "foo" (and forcing reload), while it is able to allocate optimal regs for "bar" case:

bar:
        movbe   x+4, %eax
        movbe   x, %edx
        ret

The situation with foo:

foo:
        pushl   %ebx
        movl    8(%esp), %eax
        movl    12(%esp), %edx
        movl    %eax, %ebx
        movl    %edx, %ecx
        bswap   %ebx
        bswap   %ecx
        movl    %ebx, x+4
        movl    %ecx, x
        popl    %ebx
        ret

Which is a noticeable regression from 4.7:

foo:
        movbe   4(%esp), %eax
        movbe   8(%esp), %edx
        movl    %eax, x+4
        movl    %edx, x
        ret

Adding -mregparm=2 does not improve things:

foo:
        pushl   %ebx
        movl    %edx, %ecx
        movl    %eax, %ebx
        bswap   %ecx
        bswap   %ebx
        movl    %ecx, x
        movl    %ebx, x+4
        popl    %ebx
        ret

while 4.7 generates:

foo:
        movbe   %edx, x
        movbe   %eax, x+4
        ret
Comment 1 Richard Biener 2012-05-04 08:58:36 UTC
Confirmed.
Comment 2 Ulrich Weigand 2012-05-04 16:03:35 UTC
Why do you consider this a reload/RA problem?  Code before ira looks like:

(insn 2 4 3 2 (set (reg/v:DI 62 [ i ])
        (mem/c:DI (reg/f:SI 16 argp) [2 i+0 S8 A32])) test.i:6 63 {*movdi_internal}
     (expr_list:REG_EQUIV (mem/c:DI (reg/f:SI 16 argp) [2 i+0 S8 A32])
        (nil)))

(insn 8 7 9 2 (clobber (reg:DI 60 [ D.1367 ])) test.i:7 -1
     (nil))

(insn 9 8 10 2 (set (subreg:SI (reg:DI 60 [ D.1367 ]) 0)
        (bswap:SI (subreg:SI (reg/v:DI 62 [ i ]) 4))) test.i:7 719 {*bswapsi2_movbe}
     (nil))

(insn 10 9 11 2 (set (subreg:SI (reg:DI 60 [ D.1367 ]) 4)
        (bswap:SI (subreg:SI (reg/v:DI 62 [ i ]) 0))) test.i:7 719 {*bswapsi2_movbe}
     (expr_list:REG_DEAD (reg/v:DI 62 [ i ])
        (nil)))

(insn 11 10 0 2 (set (mem/c:DI (symbol_ref:SI ("x") [flags 0x40]  <var_decl 0xb75e6a80 x>) [2 x+0 S8 A64])
        (reg:DI 60 [ D.1367 ])) test.i:7 63 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 60 [ D.1367 ])
        (nil)))

with the memory accesses both in DImode, but the bswap already split into SImode.  This causes the two DImode registers to be live at the same time, so RA cannot allocate the same register for both.

Given the limited register availability on i386, which allocation would you have suggested instead?

[ Note that I'd consider this a case where the moves certainly *ought* to have been split into SImode, because:
- on i386 the moves will be split later on anyway
- accesses to subregs of the registers being moved already happens elsewhere. ]
Comment 3 Uroš Bizjak 2012-05-04 16:22:57 UTC
(In reply to comment #2)
> Why do you consider this a reload/RA problem?  Code before ira looks like:

Indeed. The bar case works OK since access to memory is already expanded in a split way:

(insn 5 4 6 3 (set (reg:SI 65)
        (mem/c:SI (symbol_ref:SI ("x") [flags 0x40]  <var_decl 0x7f56b41c2140 x>) [2 x+0 S4 A64])) movbe-2.c:15 -1
     (nil))

(insn 6 5 7 3 (set (reg:SI 64)
        (bswap:SI (reg:SI 65))) movbe-2.c:15 -1
     (nil))

(insn 7 6 8 3 (set (reg:SI 67)
        (mem/c:SI (const:SI (plus:SI (symbol_ref:SI ("x") [flags 0x40]  <var_decl 0x7f56b41c2140 x>)
                    (const_int 4 [0x4]))) [2 x+4 S4 A32])) movbe-2.c:15 -1
     (nil))

(insn 8 7 9 3 (set (reg:SI 66)
        (bswap:SI (reg:SI 67))) movbe-2.c:15 -1
     (nil))

However, reload should notice that memory could be propagated into bswap. I'm not sure this would fix the problem.
Comment 4 Ulrich Weigand 2012-05-04 16:56:59 UTC
(In reply to comment #3)
> However, reload should notice that memory could be propagated into bswap.

Since register allocation already assigned a hard reg to the pseudo, reload is happy.  Reload doesn't generally attempt to go back and search for whether the value is also available in memory (unless it has to ...).

In general, it is preferable for earlier passes to leave an operand as MEM (if an insn accepts memory operands in some alternative), so that reload can then make the decision whether to use the MEM or to reload into a register.
Comment 5 Uroš Bizjak 2012-05-06 18:08:37 UTC
Created attachment 27327 [details]
Patch that improves DImode swap sequeces

Patch in testing.
Comment 6 Uroš Bizjak 2012-05-06 18:09:28 UTC
Target issue.
Comment 7 uros 2012-05-06 20:48:03 UTC
Author: uros
Date: Sun May  6 20:47:59 2012
New Revision: 187215

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187215
Log:
	PR target/53227
	* config/i386/i386.md (swap<mode>): Rename from *swap<mode>.
	(bswapdi2): Split from bswap<mode>2.  Use nonnimediate_operand
	predicate for operand 1.  Force operand 1 to register for TARGET_BSWAP.
	(bswapsi2): Ditto.
	(*bswapdi2_doubleword): New insn pattern.
	(*bswap<mode>2): Rename from *bswap<mode>2_1.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
Comment 8 Uroš Bizjak 2012-05-06 21:00:49 UTC
Fixed.