[committed] i386: Fix handling of SUBREGs in divv2sf3 [PR103842]
Jakub Jelinek
jakub@redhat.com
Tue Dec 28 10:02:56 GMT 2021
Hi!
The register_operand predicate accepts not just REGs but also SUBREGs of REGs,
and for the latter lowpart_subreg might fail when trying to create a paradoxical
SUBREG in some cases. For the input operand this is fixed by calling force_reg
on it first. For the output operand it is handled by always dividing into a
fresh V4SFmode temporary and then using emit_move_insn to copy the result into
the destination, which is also beneficial for combine.
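
For readers less familiar with the expander, here is a rough, lane-level model
of the data flow after the patch. It is plain C++ rather than RTL, and every
name in it (V2SF, V4SF, widen_lowpart, concat_with_ones, div_v4sf, lowpart,
div_v2sf) is made up for illustration. The zeroed upper lanes in widen_lowpart
are an assumption to keep the model deterministic; in RTL the upper half of a
paradoxical SUBREG is unspecified. The ones in the upper lanes of the divisor
mirror the CONST1_RTX in the pattern, presumably so the unused lanes never
divide by zero.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins for the V2SFmode and V4SFmode vector modes.
using V2SF = std::array<float, 2>;
using V4SF = std::array<float, 4>;

// Models the paradoxical lowpart SUBREG of op1: the low two lanes carry the
// value.  The upper lanes are zeroed here only as a modelling assumption.
V4SF widen_lowpart(const V2SF &v) { return {v[0], v[1], 0.0f, 0.0f}; }

// Models VEC_CONCAT of operands[2] with CONST1_RTX (V2SFmode): the upper
// lanes of the divisor are 1.0.
V4SF concat_with_ones(const V2SF &v) { return {v[0], v[1], 1.0f, 1.0f}; }

// Models divv4sf3: a lane-wise division on the full four-lane vector.
V4SF div_v4sf(const V4SF &a, const V4SF &b) {
  V4SF r{};
  for (int i = 0; i < 4; ++i)
    r[i] = a[i] / b[i];
  return r;
}

// Models the lowpart SUBREG that reads V2SF out of the V4SF temporary.
V2SF lowpart(const V4SF &v) { return {v[0], v[1]}; }

// Models the patched divv2sf3 flow: divide into a fresh four-lane temporary,
// then move its low half into the destination.
V2SF div_v2sf(const V2SF &a, const V2SF &b) {
  V4SF op1 = widen_lowpart(a);
  V4SF tmp = concat_with_ones(b);
  V4SF op0 = div_v4sf(op1, tmp); // fresh temporary, never the destination
  return lowpart(op0);
}
```

The point of the model is only the last function: the wide division never
writes through a widened view of the destination, so the destination being a
SUBREG no longer matters.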
Bootstrapped/regtested on x86_64-linux and i686-linux, preapproved by Uros
in the PR, committed to trunk.
2021-12-28  Jakub Jelinek  <jakub@redhat.com>

	PR target/103842
	* config/i386/mmx.md (divv2sf3): Use force_reg on op1.  Always perform
	divv4sf3 into a pseudo and emit_move_insn into operands[0].

	* g++.dg/opt/pr103842.C: New test.

--- gcc/config/i386/mmx.md.jj 2021-12-27 10:59:22.562829474 +0100
+++ gcc/config/i386/mmx.md 2021-12-27 11:54:39.802851366 +0100
@@ -529,17 +529,19 @@ (define_expand "divv2sf3"
(match_operand:V2SF 2 "register_operand")))]
"TARGET_MMX_WITH_SSE"
{
- rtx op0 = lowpart_subreg (V4SFmode, operands[0],
- GET_MODE (operands[0]));
- rtx op1 = lowpart_subreg (V4SFmode, operands[1],
- GET_MODE (operands[1]));
+ rtx op1 = lowpart_subreg (V4SFmode, force_reg (V2SFmode, operands[1]),
+ V2SFmode);
rtx op2 = gen_rtx_VEC_CONCAT (V4SFmode, operands[2],
force_reg (V2SFmode, CONST1_RTX (V2SFmode)));
rtx tmp = gen_reg_rtx (V4SFmode);
emit_insn (gen_rtx_SET (tmp, op2));
+ rtx op0 = gen_reg_rtx (V4SFmode);
+
emit_insn (gen_divv4sf3 (op0, op1, tmp));
+
+ emit_move_insn (operands[0], lowpart_subreg (V2SFmode, op0, V4SFmode));
DONE;
})
--- gcc/testsuite/g++.dg/opt/pr103842.C.jj 2021-12-27 11:37:42.692248570 +0100
+++ gcc/testsuite/g++.dg/opt/pr103842.C 2021-12-27 11:38:48.489317229 +0100
@@ -0,0 +1,31 @@
+// PR target/103842
+// { dg-do compile }
+// { dg-options "-O3 -std=c++14" }
+
+void foo (float *);
+struct M {
+ float x[3][3];
+ float *operator[](int i) { return x[i]; }
+ M();
+ M(float f, float g) {
+ x[1][0] = x[1][1] = x[1][2] = f;
+ x[2][0] = g;
+ }
+ void bar();
+ M baz() {
+ M s(x[1][2] - x[1][2], x[1][1] - x[1][1]);
+ float r = s[2][0];
+ if (r)
+ for (int i = 0; i < 3; ++i)
+ for (int j = 0; j < 3; ++j)
+ s[i][j] /= r;
+ for (int i = 0;;) {
+ float *t = s[i];
+ foo(t);
+ }
+ }
+};
+void qux() {
+ M m, i = m.baz(), j = i;
+ j.bar();
+}
Jakub