This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/44141] Redundant loads and stores generated for AMD bdver1 target
- From: "venkataramanan.kumar at amd dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 22 Mar 2012 13:17:34 +0000
- Subject: [Bug target/44141] Redundant loads and stores generated for AMD bdver1 target
- Auto-submitted: auto-generated
- References: <bug-44141-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44141
--- Comment #4 from Venkataramanan <venkataramanan.kumar at amd dot com> 2012-03-22 13:17:34 UTC ---
I don't have permission to confirm this bug.
Here is my analysis of the cause.
#(insn:TI 4886 4885 4888 132 (set (reg:V2DF 25 xmm4 [8797])
# (mult:V2DF (reg:V2DF 25 xmm4 [8795])
# (reg:V2DF 22 xmm1 [8758]))) ac.f90:499 1138 {*mulv2df3}
# (nil))
vmulpd %xmm1, %xmm4, %xmm4 # 4886 *mulv2df3/2 [length = 4]
We are forcing a conversion from V2DF to V4SF mode here for unaligned moves
when TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL is set.
(-----Snip ix86_expand_vector_move_misalign-----)
    case V2DFmode:
      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
        {
          op0 = gen_lowpart (V4SFmode, op0);
          op1 = gen_lowpart (V4SFmode, op1);
          emit_insn (gen_sse_movups (op0, op1));
          return;
        }
(-----Snip-----)
This conversion generates RTL as shown below.
#(insn:TI 4888 4886 4890 132 (set (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
# (const_int 6136 [0x17f8])) [3 MEM[(real(kind=8)[26] *)&dclroo + 152B]+0 S16 A64])
# (unspec:V4SF [
# (reg:V4SF 25 xmm4 [8797])
# ] UNSPEC_MOVU)) ac.f90:499 1104 {*sse_movups}
# (expr_list:REG_DEAD (reg:V4SF 25 xmm4 [8797])
# (nil)))
vmovups %xmm4, 6136(%rsp) # 4888 *sse_movups/2 [length = 9]
After this, GCC does not know how to get back to V2DF mode. As Uros said, it
reloads the value through memory.
#(insn 4930 4929 8259 132 (set (reg:V4SF 23 xmm2)
# (unspec:V4SF [
# (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
# (const_int 6136 [0x17f8])) [3 MEM[(real(kind=8)[26] *)&dclroo + 152B]+0 S16 A64])
# ] UNSPEC_MOVU)) ac.f90:503 1104 {*sse_movups}
# (nil))
vmovups 6136(%rsp), %xmm2 # 4930 *sse_movups/1 [length = 9]
#(insn:TI 8259 4930 8261 132 (set (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
# (const_int 240 [0xf0])) [12 %sfp+-11184 S16 A128])
# (reg:V4SF 23 xmm2)) ac.f90:503 1098 {*movv4sf_internal}
# (expr_list:REG_DEAD (reg:V4SF 23 xmm2)
# (nil)))
vmovaps %xmm2, 240(%rsp) # 8259 *movv4sf_internal/3 [length = 9]
#(insn 8261 8259 4931 132 (set (reg:V2DF 23 xmm2)
# (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
# (const_int 240 [0xf0])) [12 %sfp+-11184 S16 A128])) ac.f90:503 1100 {*movv2df_internal}
# (nil))
vmovaps 240(%rsp), %xmm2 # 8261 *movv2df_internal/2 [length = 9]
#(insn:TI 4931 8261 8260 132 (set (reg:V2DF 23 xmm2)
# (div:V2DF (reg:V2DF 23 xmm2)
# (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
# (const_int 6128 [0x17f0])) [3 MEM[(real(kind=8)[26] *)&dclroo + 144B]+0 S16 A128]))) ac.f90:503 1144 {sse2_divv2df3}
# (nil))
vdivpd 6128(%rsp), %xmm2, %xmm2 # 4931 sse2_divv2df3/2 [length = 9]
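
To illustrate why the extra store/load pair (insns 8259 and 8261) looks redundant: as far as I understand it, the V2DF to V4SF change done by gen_lowpart only reinterprets the same 128 bits, so no data has to move. Below is a minimal portable C sketch of that idea. It is not GCC code; the types v2df and v4sf and all function names are hypothetical stand-ins for the machine modes, and memcpy stands in for the mode change.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two 128-bit vector modes. */
typedef struct { double d[2]; } v2df;   /* V2DF: two doubles */
typedef struct { float  f[4]; } v4sf;   /* V4SF: four floats */

/* View the bits of a V2DF value as V4SF (assumed analogue of
   gen_lowpart (V4SFmode, op)): same 16 bytes, different type. */
static v4sf v2df_as_v4sf (v2df x)
{
  v4sf r;
  assert (sizeof r == sizeof x);   /* both are 16 bytes wide */
  memcpy (&r, &x, sizeof r);
  return r;
}

/* The reverse view, V4SF back to V2DF. */
static v2df v4sf_as_v2df (v4sf x)
{
  v2df r;
  assert (sizeof r == sizeof x);
  memcpy (&r, &x, sizeof r);
  return r;
}

/* Returns 1 when V2DF -> V4SF -> V2DF leaves every bit unchanged. */
int roundtrip_is_bit_exact (double a, double b)
{
  v2df in = { { a, b } };
  v2df out = v4sf_as_v2df (v2df_as_v4sf (in));
  return memcmp (&in, &out, sizeof in) == 0;
}
```

Since the round trip is bit-exact, the value loaded by insn 4930 into xmm2 could in principle feed the vdivpd directly; the spill to 240(%rsp) and reload appear to exist only to change the mode of the register.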