This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/22152] Poor loop optimization when using mmx builtins



------- Comment #4 from ubizjak at gmail dot com  2007-03-01 13:47 -------
Current mainline produces really horrible code:

.L4:
        movl    (%edx), %ebx
        addl    $1, %eax
        movl    4(%edx), %esi
        addl    $8, %edx
        movl    %ebx, 8(%esp)
        movl    %esi, 12(%esp)
        movq    8(%esp), %mm0
        paddq   (%ecx), %mm0
        addl    $8, %ecx
        cmpl    %edi, %eax
        movq    %mm0, 8(%esp)
        movl    8(%esp), %ebx
        movl    12(%esp), %esi
        jne     .L4

This is due to two problems:

1) For some reason, ivopts doesn't use the i386 scaled-index addressing modes
(base + index*scale). Compiling with -fno-ivopts produces slightly better code:

.L4:
        movl    (%edi,%eax,8), %edx
        movl    4(%edi,%eax,8), %ecx
        movl    %edx, 8(%esp)
        movl    %ecx, 12(%esp)
        movq    8(%esp), %mm0
        paddq   (%esi,%eax,8), %mm0
        addl    $1, %eax
        cmpl    %eax, %ebx
        movq    %mm0, 8(%esp)
        movl    8(%esp), %edx
        movl    12(%esp), %ecx
        ja      .L4
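
The difference between the two loops above can be sketched at the source level.
The testcase is not quoted in this comment, so the functions below are a
hypothetical stand-in: plain uint64_t arithmetic instead of the MMX builtins,
with made-up names. The first form keeps a single counter and leaves the
address arithmetic to a scaled-index operand such as (%edi,%eax,8); the second
mirrors what the ivopts version does, with a counter plus two pointers that are
each advanced by 8 bytes per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the loop body; not the testcase from the bug.
   Indexed form: one counter, each access is base + i*8. */
uint64_t last_sum_indexed(const uint64_t *a, const uint64_t *b, size_t count)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum = a[i] + b[i];      /* 64-bit wrapping add, like paddq */
    return sum;
}

/* Pointer-bumping form: same result, but three values are updated
   per iteration (the counter and both pointers). */
uint64_t last_sum_bumped(const uint64_t *a, const uint64_t *b, size_t count)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum = *a + *b;
        a++;                    /* corresponds to addl $8, %edx */
        b++;                    /* corresponds to addl $8, %ecx */
    }
    return sum;
}
```

Both functions compute the same value; which shape the compiler actually emits
depends on its version, target and flags.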

2) A DImode register is used in the middle of the RTL stream, which leads to
this reload sequence:

(insn:HI 21 20 53 4 (set (reg:DI 1 dx)
        (mem:DI (plus:SI (mult:SI (reg/v:SI 0 ax [orig:59 i ] [59])
                    (const_int 8 [0x8]))
                (reg/v/f:SI 5 di [orig:64 a ] [64])) [2 S8 A64])) 56 {*movdi_2} (nil)
    (nil))

(insn 53 21 54 4 (set (mem/c:DI (plus:SI (reg/f:SI 7 sp)
                (const_int 8 [0x8])) [5 S8 A8])
        (reg:DI 1 dx)) 56 {*movdi_2} (nil)
    (nil))

(insn 54 53 22 4 (set (reg:DI 29 mm0)
        (mem/c:DI (plus:SI (reg/f:SI 7 sp)
                (const_int 8 [0x8])) [5 S8 A8])) 56 {*movdi_2} (nil)
    (nil))

(insn:HI 22 54 55 4 (set (reg:DI 29 mm0)
        (unspec:DI [
                (plus:DI (reg:DI 29 mm0)
                    (mem:DI (plus:SI (mult:SI (reg/v:SI 0 ax [orig:59 i ] [59])
                                (const_int 8 [0x8]))
                            (reg/v/f:SI 4 si [orig:65 b ] [65])) [2 S8 A64]))
            ] 38)) 612 {mmx_adddi3} (insn_list:REG_DEP_TRUE 21 (nil))
    (nil))


The DImode register in insn 21 gets allocated to the dx/cx DImode pair, but
insn 22 wants an MMX register. Reload then inserts insns 53 and 54 to satisfy
the input and output constraints. The same story repeats at the end of the
loop, but this time dx/cx gets allocated to a V2SImode pseudo (?!):

(insn:HI 24 55 26 4 (set (reg/v:V2SI 1 dx [orig:60 sum ] [60])
        (mem/c:V2SI (plus:SI (reg/f:SI 7 sp)
                (const_int 8 [0x8])) [5 S8 A8])) 581 {*movv2si_internal} (insn_list:REG_DEP_TRUE 22 (nil))
    (nil))

In the above case, the mm0 register (in DImode) gets reloaded from mm0
(V2SImode) via memory. It looks like MMX DImode _really_ upsets the register
allocator, as it can be allocated either to an si/si register pair or to an MMX
register. Perhaps we need a V1DI mode to separate pure DImodes (either 2*32bit
for i686 or 64bit for x86_64) from MMX DImodes.
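
As a loose source-level analogy for that separation, and not the backend change
itself, the GCC/Clang vector extension can declare a one-element 64-bit vector
type that is distinct from a plain 64-bit integer. The type and function names
below are hypothetical, and whether such a type maps to a V1DI machine mode
depends on the compiler version and target.

```c
/* Hypothetical names; relies on the GCC/Clang vector_size extension. */

/* One 64-bit lane: a vector type, distinct from the scalar type below. */
typedef unsigned long long v1di_t __attribute__((vector_size(8)));

/* Plain 64-bit integer: a register pair on i686, one register on x86_64. */
typedef unsigned long long scalar_di_t;

/* Element-wise add on the vector type. */
static v1di_t add_v1di(v1di_t x, v1di_t y)
{
    return x + y;
}

/* Extract the single lane as a scalar. */
static scalar_di_t lane0(v1di_t v)
{
    return v[0];
}
```

Because the two types differ, values meant for vector operations are kept apart
from ordinary 64-bit integers in the type system, which is the kind of
distinction the V1DI suggestion asks for at the machine-mode level.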

It is possible to change the delicate allocation balance by changing register
preferences in the movdi_2 and mov<mode>_internal MMX move patterns, but we
really need a more robust solution for this problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22152

