This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [i386] Scalar DImode instructions on XMM registers


On 05/25/2015 09:27 AM, Ilya Enkovich wrote:
2015-05-22 15:01 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
2015-05-22 11:53 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
2015-05-21 22:08 GMT+03:00 Vladimir Makarov <vmakarov@redhat.com>:
So, Ilya, to solve the problem you need to avoid sharing subregs for the
correct LRA/reload work.



Thanks a lot for your help! I'll fix it.

Ilya

I've fixed SUBREG sharing and now get the spill that was missing. I added
--enable-checking=rtl to check for other possible bugs. The spill/fill code
still seems incorrect because different sizes are used. That shouldn't
block me, though.

.L5:
         movl    16(%esp), %eax
         addl    $8, %esi
         movl    20(%esp), %edx
         movl    %eax, (%esp)
         movl    %edx, 4(%esp)
         call    counter@PLT
         movq    -8(%esi), %xmm0
         **movdqa  16(%esp), %xmm2**
         pand    %xmm0, %xmm2
         movdqa  %xmm2, %xmm0
         movd    %xmm2, %edx
         **movq    %xmm2, 16(%esp)**
         psrlq   $32, %xmm0
         movd    %xmm0, %eax
         orl     %edx, %eax
         jne     .L5

Thanks,
Ilya

I was wrong to assume that reloads with the wrong size wouldn't block me.
These reloads require the memory to be aligned, which is not always the
case. Here is what I have in RTL now:

(insn 2 7 3 2 (set (reg/v:DI 93 [ l ])
         (mem/c:DI (reg/f:SI 16 argp) [1 l+0 S8 A32])) test.c:5 89 {*movdi_internal}
      (nil))
...
(insn 27 26 52 6 (set (subreg:V2DI (reg:DI 87 [ D.1822 ]) 0)
         (ior:V2DI (subreg:V2DI (reg:DI 99 [ D.1822 ]) 0)
             (subreg:V2DI (reg/v:DI 93 [ l ]) 0))) test.c:11 3489 {*iorv2di3}
      (expr_list:REG_DEAD (reg:DI 99 [ D.1822 ])
         (expr_list:REG_DEAD (reg/v:DI 93 [ l ])
             (nil))))

After reload I get:

(insn 2 7 75 2 (set (reg/v:DI 0 ax [orig:93 l ] [93])
         (mem/c:DI (plus:SI (reg/f:SI 7 sp)
                 (const_int 24 [0x18])) [1 l+0 S8 A32])) test.c:5 89 {*movdi_internal}
      (nil))
(insn 75 2 3 2 (set (mem/c:DI (reg/f:SI 7 sp) [3 %sfp+-16 S8 A64])
         (reg/v:DI 0 ax [orig:93 l ] [93])) test.c:5 89 {*movdi_internal}
      (nil))
...
(insn 27 26 52 6 (set (reg:V2DI 21 xmm0 [orig:87 D.1822 ] [87])
         (ior:V2DI (reg:V2DI 21 xmm0 [orig:99 D.1822 ] [99])
             (mem/c:V2DI (reg/f:SI 7 sp) [3 %sfp+-16 S16 A64]))) test.c:11 3489 {*iorv2di3}


The 'por' instruction requires its memory operand to be aligned and fails
in a bigger testcase. There is also a movdqa generated for esp by reload.
Could this mean I still have some inconsistencies in the produced RTL?
Perhaps I should somehow transform the loads and stores?

I'd start by looking at the AP->SP elimination step: what the defined stack alignment is, and whether or not dynamic stack realignment is needed. If you don't have all of that set up properly prior to the allocators, they're not going to know which objects to align or how to align them.

jeff
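
To make the alignment mismatch in the post-reload dump concrete, here is a minimal C sketch. It assumes the usual reading of the MEM attributes in RTL dumps, where S gives the size in bytes and A the known alignment in bits, so [3 %sfp+-16 S16 A64] describes a 16-byte access to a slot only known to be 8-byte aligned. The helper names is_aligned and mem_attrs_naturally_aligned are hypothetical and exist only for this illustration; they are not GCC functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of align_bytes.
   align_bytes is assumed to be a power of two. */
static bool is_aligned(uintptr_t addr, unsigned align_bytes)
{
    return (addr & (uintptr_t)(align_bytes - 1)) == 0;
}

/* Hypothetical helper: true if a MEM whose dump attributes are
   S<size_bytes> A<align_bits> is known to be aligned to its own size.
   Non-VEX SSE instructions such as por and movdqa need this for a
   16-byte memory operand. */
static bool mem_attrs_naturally_aligned(unsigned size_bytes, unsigned align_bits)
{
    return align_bits >= size_bytes * 8;
}
```

With these definitions, mem_attrs_naturally_aligned(8, 64) holds for the DImode spill slot written by insn 75, while mem_attrs_naturally_aligned(16, 64) does not hold for the V2DI operand of insn 27. An address such as 24 satisfies is_aligned(addr, 8) but not is_aligned(addr, 16), which is the kind of case where an aligned 16-byte access would fault.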

