[Bug rtl-optimization/91154] [10 Regression] 456.hmmer regression on Haswell caused by r272922

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Jul 18 12:53:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91154

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
With

Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md     (revision 273567)
+++ gcc/config/i386/i386.md     (working copy)
@@ -17681,6 +17681,23 @@ (define_insn "<code><mode>3"
    (set_attr "type" "sseadd")
    (set_attr "mode" "<MODE>")])

+(define_expand "smaxsi3"
+ [(set (match_operand:SI 0 "register_operand")
+       (smax:SI
+        (match_operand:SI 1 "register_operand")
+        (match_operand:SI 2 "register_operand")))]
+ ""
+{
+  rtx vop1 = gen_reg_rtx (V4SImode);
+  rtx vop2 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_vec_setv4si_0 (vop1, CONST0_RTX (V4SImode), operands[1]));
+  emit_insn (gen_vec_setv4si_0 (vop2, CONST0_RTX (V4SImode), operands[2]));
+  rtx tem = gen_reg_rtx (V4SImode);
+  emit_insn (gen_smaxv4si3 (tem, vop1, vop2));
+  emit_move_insn (operands[0], lowpart_subreg (SImode, tem, V4SImode));
+  DONE;
+})
+
 ;; These versions of the min/max patterns implement exactly the operations
 ;;   min = (op1 < op2 ? op1 : op2)
 ;;   max = (!(op1 < op2) ? op1 : op2)


we generate

.L3:
        addl    (%rdx,%r8,4), %r9d
        movl    (%rcx,%r8,4), %eax
        addl    (%rsi,%r8,4), %eax
        vmovd   %r9d, %xmm1
        vmovd   %eax, %xmm0
        movq    %r8, %rax
        vpmaxsd %xmm1, %xmm0, %xmm0
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0
        vpmaxsd %xmm2, %xmm0, %xmm0
        vmovd   %xmm0, 4(%rdi,%r8,4)
        vmovd   %xmm0, %r9d
        incq    %r8
        cmpq    %rax, %r10
        jne     .L3

so we manage to catch the store as well but somehow

(insn:TI 35 27 37 4 (set (reg:V4SI 20 xmm0 [114])
        (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 20 xmm0 [107]))
            (const_vector:V4SI [
                    (const_int 0 [0]) repeated x4
                ])
            (const_int 1 [0x1]))) 2740 {vec_setv4si_0}
     (nil))

fails to be elided.  Maybe vec_setv4si_0 isn't the optimal representation
choice.  Ah, of course the zeros might end up invalidated by the earlier
max...  we can't say in RTL that we actually do not care about the
upper bits - can we?

Anyhow, while the above would fix the regression on Haswell, we'd degrade
on Zen, and in more isolated cmov cases it's clearly not going to be a win
either.
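
For illustration only, the three steps of the smaxsi3 expander above
(move each SImode operand into lane 0 of a zeroed V4SI, take the vector
signed max, read back the low lane) can be approximated at the source
level with SSE4.1 intrinsics. This is a minimal sketch, not code from GCC:
the function name smax_si_via_v4si is hypothetical, and the intrinsics only
roughly correspond to the RTL patterns named in the comments.

```c
#include <immintrin.h>

/* Hypothetical helper (not part of GCC): scalar signed max computed
   through the vector unit, mirroring the expander's three steps.  */
__attribute__((target("sse4.1")))
int smax_si_via_v4si(int a, int b)
{
    /* Roughly vec_setv4si_0: lane 0 = operand, lanes 1..3 = 0.  */
    __m128i va = _mm_cvtsi32_si128(a);
    __m128i vb = _mm_cvtsi32_si128(b);

    /* Roughly smaxv4si3 (pmaxsd): per-lane signed 32-bit maximum.
       The upper lanes compute max(0, 0); their value is never used.  */
    __m128i vmax = _mm_max_epi32(va, vb);

    /* Roughly the lowpart subreg: extract lane 0 as a 32-bit integer.  */
    return _mm_cvtsi128_si32(vmax);
}
```

Only lane 0 of the result is consumed, which is the "do not care about
the upper bits" property that the vec_merge with a zero vector cannot
express.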

