This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/32735] i686 sse2 generates more movdqa than necessary



------- Comment #5 from ubizjak at gmail dot com  2007-07-12 08:22 -------
(In reply to comment #0)

> The loop for CallSumDeltas2 compiles to:
> 
> .L7:
>         movdqa  %xmm1, %xmm0
>         pslldq  $4, %xmm0
>         addl    $1, %eax
>         paddd   %xmm1, %xmm0
>         cmpl    $100000000, %eax
>         movdqa  %xmm0, %xmm1
>         pslldq  $8, %xmm1
>         paddd   %xmm1, %xmm0
>         movdqa  %xmm0, %xmm1
>         movdqa  %xmm0, foo1
>         jne     .L7
> 
> ===
> 
> This is two more movdqa than the hand-written code in CallSumDeltas3.

         paddd   %xmm1, %xmm0       (2)
         movdqa  %xmm0, %xmm1       (2)
         movdqa  %xmm0, foo1        (1)
         jne     .L7

(1) is the store to a global variable. I'm not sure that the compiler can move
it out of the loop, but it can be avoided by using a local temporary in
CallSumDeltas2().
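
The source of CallSumDeltas2() is not quoted in this comment, so the following
is only a sketch of the local-temporary idea, reconstructed from the assembly
above. All names here (foo1_sketch, sum_deltas_sketch, call_sum_deltas_local)
are hypothetical stand-ins, not the reporter's code. The value is accumulated
in a local __m128i and written to the global once after the loop, which should
let the compiler keep it in a register inside the loop:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical stand-in for the global foo1 from the bug report. */
static __m128i foo1_sketch;

/* Assumed shape of the per-iteration work, matching the quoted
   pslldq $4 / paddd / pslldq $8 / paddd sequence: a prefix sum
   over the four 32-bit lanes. */
static inline __m128i sum_deltas_sketch(__m128i x)
{
    x = _mm_add_epi32(x, _mm_slli_si128(x, 4));  /* x += x shifted left 4 bytes */
    x = _mm_add_epi32(x, _mm_slli_si128(x, 8));  /* x += x shifted left 8 bytes */
    return x;
}

/* Accumulate in a local temporary; store to the global only once. */
void call_sum_deltas_local(int iterations)
{
    __m128i acc = foo1_sketch;       /* single load before the loop */
    for (int i = 0; i < iterations; i++)
        acc = sum_deltas_sketch(acc);
    foo1_sketch = acc;               /* single store after the loop */
}
```

Whether the remaining register-to-register movdqa (2) also disappears depends
on the register allocator, so this sketch only addresses the store marked (1).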

(2) is probably the regmove pass failing to optimize:

(set (reg:V4SI 21 xmm0 [72])
     (plus:V4SI (reg:V4SI 21 xmm0 [69])
                (reg:V4SI 22 xmm1 [71]))) 843 {*addv4si3} (nil))

(set (reg:V2DI 22 xmm1 [orig:73 foo1 ] [73])
     (reg:V2DI 21 xmm0 [72])) 698 {*movv2di_internal} (nil))

into

(set (reg:V4SI 22 xmm1 [72])
     (plus:V4SI (reg:V4SI 22 xmm1 [69])
                (reg:V4SI 21 xmm0 [71]))) 843 {*addv4si3} (nil))


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735

