[Bug rtl-optimization/91154] [10 Regression] 456.hmmer regression on Haswell caused by r272922
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Jul 18 12:53:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91154
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
With
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md (revision 273567)
+++ gcc/config/i386/i386.md (working copy)
@@ -17681,6 +17681,23 @@ (define_insn "<code><mode>3"
    (set_attr "type" "sseadd")
    (set_attr "mode" "<MODE>")])
+(define_expand "smaxsi3"
+ [(set (match_operand:SI 0 "register_operand")
+       (smax:SI
+         (match_operand:SI 1 "register_operand")
+         (match_operand:SI 2 "register_operand")))]
+ ""
+{
+  rtx vop1 = gen_reg_rtx (V4SImode);
+  rtx vop2 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_vec_setv4si_0 (vop1, CONST0_RTX (V4SImode), operands[1]));
+  emit_insn (gen_vec_setv4si_0 (vop2, CONST0_RTX (V4SImode), operands[2]));
+  rtx tem = gen_reg_rtx (V4SImode);
+  emit_insn (gen_smaxv4si3 (tem, vop1, vop2));
+  emit_move_insn (operands[0], lowpart_subreg (SImode, tem, V4SImode));
+  DONE;
+})
+
;; These versions of the min/max patterns implement exactly the operations
;; min = (op1 < op2 ? op1 : op2)
;; max = (!(op1 < op2) ? op1 : op2)
we generate
.L3:
        addl    (%rdx,%r8,4), %r9d
        movl    (%rcx,%r8,4), %eax
        addl    (%rsi,%r8,4), %eax
        vmovd   %r9d, %xmm1
        vmovd   %eax, %xmm0
        movq    %r8, %rax
        vpmaxsd %xmm1, %xmm0, %xmm0
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0
        vpmaxsd %xmm2, %xmm0, %xmm0
        vmovd   %xmm0, 4(%rdi,%r8,4)
        vmovd   %xmm0, %r9d
        incq    %r8
        cmpq    %rax, %r10
        jne     .L3
so we manage to catch the store as well, but somehow
(insn:TI 35 27 37 4 (set (reg:V4SI 20 xmm0 [114])
(vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 20 xmm0 [107]))
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(const_int 1 [0x1]))) 2740 {vec_setv4si_0}
(nil))
fails to be elided. Maybe vec_setv4si_0 isn't the best choice of
representation. Ah, of course, the zeros might end up invalidated by the
earlier max... we can't say in RTL that we actually do not care about the
upper bits, can we?
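To make the upper-bits point concrete, here is a small plain-C model of
what the expander computes. It is only a sketch and not GCC code: the v4si
struct and all function names are made up for illustration, and the arrays
merely stand in for xmm registers. Since the max is lane-wise, the value
read back from lane 0 does not depend on what lanes 1..3 hold, which is the
"don't care" property the zeroing vec_set over-specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a V4SImode register: four 32-bit lanes. */
typedef struct { int32_t lane[4]; } v4si;

/* Models vec_setv4si_0 applied to a zero vector: lane 0 = x, rest = 0. */
static v4si vec_set_0(int32_t x)
{
    v4si v = { { x, 0, 0, 0 } };
    return v;
}

/* Models a lane-wise signed max such as vpmaxsd. */
static v4si smax_v4si(v4si a, v4si b)
{
    v4si r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] > b.lane[i] ? a.lane[i] : b.lane[i];
    return r;
}

/* Models the expander: widen both scalars, take the vector max,
   then read back lane 0 (the lowpart subreg). */
static int32_t smaxsi3_model(int32_t a, int32_t b)
{
    return smax_v4si(vec_set_0(a), vec_set_0(b)).lane[0];
}

/* Same computation, but with arbitrary values in lanes 1..3. */
static int32_t smaxsi3_junk_upper(int32_t a, int32_t b, int32_t junk)
{
    v4si va = { { a, junk, ~junk, 1 } };
    v4si vb = { { b, ~junk, junk, -1 } };
    return smax_v4si(va, vb).lane[0];
}

/* Checks that both variants agree with a plain scalar max. */
void self_check(void)
{
    assert(smaxsi3_model(3, 7) == 7);
    assert(smaxsi3_model(-5, -9) == -5);
    assert(smaxsi3_junk_upper(3, 7, 12345) == smaxsi3_model(3, 7));
    assert(smaxsi3_junk_upper(-5, -9, -1) == smaxsi3_model(-5, -9));
}
```

Calling self_check() from a main() exercises both variants; the model says
nothing about how RTL could express such a don't-care, only that the scalar
result is unaffected by the upper lanes.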
Anyhow, while the above would fix the regression on Haswell, we'd degrade
on Zen, and in more isolated cmov cases it's clearly not going to be a win
either.