This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/32735] i686 sse2 generates more movdqa than necessary
- From: "ubizjak at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 12 Jul 2007 08:22:11 -0000
- Subject: [Bug target/32735] i686 sse2 generates more movdqa than necessary
- References: <bug-32735-11211@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #5 from ubizjak at gmail dot com 2007-07-12 08:22 -------
(In reply to comment #0)
> The loop for CallSumDeltas2 compiles to:
>
> .L7:
> movdqa %xmm1, %xmm0
> pslldq $4, %xmm0
> addl $1, %eax
> paddd %xmm1, %xmm0
> cmpl $100000000, %eax
> movdqa %xmm0, %xmm1
> pslldq $8, %xmm1
> paddd %xmm1, %xmm0
> movdqa %xmm0, %xmm1
> movdqa %xmm0, foo1
> jne .L7
>
> ===
>
> This is two more movdqa than the hand-written code in CallSumDeltas3.
paddd %xmm1, %xmm0 (2)
movdqa %xmm0, %xmm1 (2)
movdqa %xmm0, foo1 (1)
jne .L7
(1) is the store to the global variable foo1. I'm not sure that it can be
moved out of the loop, but it can be avoided by adding a local temporary in
CallSumDeltas2().
(2) is probably the regmove pass failing to optimize:
(set (reg:V4SI 21 xmm0 [72])
    (plus:V4SI (reg:V4SI 21 xmm0 [69])
        (reg:V4SI 22 xmm1 [71]))) 843 {*addv4si3} (nil))
(set (reg:V2DI 22 xmm1 [orig:73 foo1 ] [73])
    (reg:V2DI 21 xmm0 [72])) 698 {*movv2di_internal} (nil))
into
(set (reg:V4SI 22 xmm1 [72])
    (plus:V4SI (reg:V4SI 22 xmm1 [69])
        (reg:V4SI 21 xmm0 [71]))) 843 {*addv4si3} (nil))
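
For point (1), the local-temporary workaround might look roughly like the
sketch below. The source of CallSumDeltas2() is not quoted in this comment,
so the function names, the loop body, and the type of foo1 are assumptions
reconstructed from the pslldq/paddd sequence in the quoted assembly, not the
reporter's actual code. Whether the compiler really drops the in-loop store
depends on the GCC version and options.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Assumed stand-in for the global named foo1 in the quoted assembly. */
__m128i foo1;

/* Running sum across four 32-bit lanes, mirroring the two
   pslldq/paddd pairs in the quoted loop (hypothetical helper). */
static __m128i sum_deltas(__m128i x)
{
    x = _mm_add_epi32(x, _mm_slli_si128(x, 4));  /* shift by one lane  */
    x = _mm_add_epi32(x, _mm_slli_si128(x, 8));  /* shift by two lanes */
    return x;
}

/* Assigns the global on every iteration, as in (1). */
void call_sum_deltas_global(int iterations)
{
    for (int i = 0; i < iterations; i++)
        foo1 = sum_deltas(foo1);
}

/* Works on a local temporary and writes the global once after the
   loop, so no store to foo1 is required inside the loop body. */
void call_sum_deltas_local(int iterations)
{
    __m128i acc = foo1;
    for (int i = 0; i < iterations; i++)
        acc = sum_deltas(acc);
    foo1 = acc;
}
```

Both variants leave the same value in foo1; only the placement of the store
differs.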
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735