[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

xuepeng.guo at intel dot com gcc-bugzilla@gcc.gnu.org
Wed Nov 14 02:51:00 GMT 2018


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

--- Comment #4 from Terry Guo <xuepeng.guo at intel dot com> ---
(In reply to Uroš Bizjak from comment #2)
> Following testcase:
> 
> --cut here--
> typedef int V __attribute__((vector_size (8)));
> 
> void foo (int x, int y)
> {
>   register int a __asm ("xmm1");
>   register int b __asm ("xmm2");
>   register V c __asm ("xmm3");
>   a = x;
>   b = y;
>   asm volatile ("" : "+v" (a), "+v" (b));
>   c = (V) { a, b };
>   asm volatile ("" : "+v" (c));
> }
> --cut here--
> 
> gets compiled with -O2 -mavx -mtune=intel:
> 
>         vmovd   %edi, %xmm1
>         vmovd   %esi, %xmm2
>         vmovd   %xmm2, %eax
>         vpinsrd $1, %eax, %xmm1, %xmm3
>         ret
> 
> The relevant pattern is defined as:
> 
> (define_insn "*vec_concatv2si_sse4_1"
>   [(set (match_operand:V2SI 0 "register_operand"
> 	  "=Yr,*x, x, v,Yr,*x, v, v, *y,*y")
> 	(vec_concat:V2SI
> 	  (match_operand:SI 1 "nonimmediate_operand"
> 	  "  0, 0, x,Yv, 0, 0,Yv,rm,  0,rm")
> 	  (match_operand:SI 2 "nonimm_or_0_operand"
> 	  " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))]
>   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>   "@
>    pinsrd\t{$1, %2, %0|%0, %2, 1}
>    pinsrd\t{$1, %2, %0|%0, %2, 1}
>    vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
>    vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
>    punpckldq\t{%2, %0|%0, %2}
>    punpckldq\t{%2, %0|%0, %2}
>    vpunpckldq\t{%2, %1, %0|%0, %1, %2}
>    %vmovd\t{%1, %0|%0, %1}
>    punpckldq\t{%2, %0|%0, %2}
>    movd\t{%1, %0|%0, %1}"
> 
> but for some reason RA chooses alternative 2 (x<-x,rm) instead of
> alternative 6 (v<-Yv,Yv), although alternative 2 needs an extra reload from
> %xmm2 to %eax.

I dug into this a bit, and it looks like we are missing something in the
combine pass, so we fail to get a pattern that can match alternative 6. The
combine pass dump from the old GCC shows:
-------------------
      REG_UNUSED flags:CC
insn_cost 4 for    10: r82:SI=xmm16:SI
      REG_DEAD xmm16:SI
insn_cost 4 for    11: r83:SI=xmm17:SI
      REG_DEAD xmm17:SI
insn_cost 4 for    12: r87:V2SI=vec_concat(r82:SI,r83:SI)
      REG_DEAD r83:SI
      REG_DEAD r82:SI
-------------------

Then we get:
-------------------
Trying 10 -> 12:
   10: r82:SI=xmm16:SI
      REG_DEAD xmm16:SI
   12: r87:V2SI=vec_concat(r82:SI,r83:SI)
      REG_DEAD r83:SI
      REG_DEAD r82:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
    (vec_concat:V2SI (reg/v:SI 52 xmm16 [ a ])
        (reg:SI 83 [ b.1_2 ])))
allowing combination of insns 10 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 10.
modifying insn i3    12: r87:V2SI=vec_concat(xmm16:SI,r83:SI)
      REG_DEAD xmm16:SI
      REG_DEAD r83:SI
deferring rescan insn with uid = 12.

Trying 11 -> 12:
   11: r83:SI=xmm17:SI
      REG_DEAD xmm17:SI
   12: r87:V2SI=vec_concat(xmm16:SI,r83:SI)
      REG_DEAD xmm16:SI
      REG_DEAD r83:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
    (vec_concat:V2SI (reg/v:SI 52 xmm16 [ a ])
        (reg/v:SI 53 xmm17 [ b ])))
allowing combination of insns 11 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 11.
modifying insn i3    12: r87:V2SI=vec_concat(xmm16:SI,xmm17:SI)
      REG_DEAD xmm17:SI
      REG_DEAD xmm16:SI
deferring rescan insn with uid = 12.
-------------------

There are two successful combine attempts, and we end up with a pattern that
can match alternative 6.

However, the dump from current GCC trunk shows:
-------------------
insn_cost 4 for    19: r90:SI=xmm16:SI
      REG_DEAD xmm16:SI
insn_cost 4 for    10: r82:SI=r90:SI
      REG_DEAD r90:SI
insn_cost 4 for    20: r91:SI=xmm17:SI
      REG_DEAD xmm17:SI
insn_cost 4 for    11: r83:SI=r91:SI
      REG_DEAD r91:SI
insn_cost 4 for    12: r87:V2SI=vec_concat(r82:SI,r83:SI)
      REG_DEAD r83:SI
      REG_DEAD r82:SI
insn_cost 4 for    13: xmm3:V2SI=r87:V2SI
      REG_DEAD r87:V2SI
-------------------
Trying 11 -> 12:
   11: r83:SI=r91:SI
      REG_DEAD r91:SI
   12: r87:V2SI=vec_concat(r90:SI,r83:SI)
      REG_DEAD r90:SI
      REG_DEAD r83:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
    (vec_concat:V2SI (reg:SI 90)
        (reg:SI 91)))
allowing combination of insns 11 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 11.
modifying insn i3    12: r87:V2SI=vec_concat(r90:SI,r91:SI)
      REG_DEAD r91:SI
      REG_DEAD r90:SI
deferring rescan insn with uid = 12.
-------------------

We end up with "12: r87:V2SI=vec_concat(r90:SI,r91:SI)". Later, in the LRA
pass, operand r90 is replaced with an XMM register while r91 is kept in a
general register, so there is no chance to match the preferred alternative 6.

