This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully
- From: Kyrill Tkachov <kyrylo dot tkachov at arm dot com>
- To: Rainer Orth <ro at cebitec dot uni-bielefeld dot de>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Jeff Law <law at redhat dot com>
- Date: Thu, 17 Sep 2015 17:32:40 +0100
- Subject: Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully
- Authentication-results: sourceware.org; auth=none
- References: <55F13D66 dot 9010207 at arm dot com> <yddpp1qcxe3 dot fsf at lokon dot CeBiTec dot Uni-Bielefeld dot DE> <55F197EB dot 3010404 at arm dot com> <yddr3m5bap3 dot fsf at lokon dot CeBiTec dot Uni-Bielefeld dot DE> <55F2F575 dot 6080609 at arm dot com> <yddmvwl8elj dot fsf at lokon dot CeBiTec dot Uni-Bielefeld dot DE>
Hi Rainer,
On 17/09/15 12:33, Rainer Orth wrote:
Hi Kyrill,
On 11/09/15 09:51, Rainer Orth wrote:
Kyrill Tkachov <kyrylo.tkachov@arm.com> writes:
On 10/09/15 12:43, Rainer Orth wrote:
Hi Kyrill,
Rainer, could you please check that this patch still fixes the SPARC
regressions?
unfortunately, it breaks sparc-sun-solaris2.10 bootstrap: compiling
stage2 libiberty/regex.c FAILs:
Thanks for providing the preprocessed file.
I've reproduced and fixed the ICE in this version of the patch.
The problem was that I was taking the mode of x before the check
of whether a and b are MEMs, after which we would change x to an
address_mode reg,
thus confusing emit_move_insn.
The fix is to take the mode of x and perform the
can_conditionally_move_p check
after that transformation.
Bootstrapped and tested on aarch64 and x86_64.
The preprocessed regex.i that Rainer provided now compiles successfully
for me
on a sparc-sun-solaris2.10 stage-1 cross-compiler.
Rainer, thanks for your help so far, could you please try out this patch?
While bootstrap succeeds again, the testsuite regression in
gcc.c-torture/execute/20071216-1.c reoccured.
Right, so I dug into the RTL dumps and I think this is a separate issue
that's being exacerbated by my patch.
The code tries to if-convert a block which contains a compare instruction
i.e. sets the CC register.
Now, bb_valid_for_noce_process_p should have caught this, and in particular
insn_valid_noce_process_p
which should check that the instruction doesn't set the CC
register. However, on SPARC the
cc_in_cond returns NULL! This is due to the canonicalize_comparison
implementation that seems to
remove the CC register from the condition expression and returns something like:
(leu (reg/v:SI 109 [ b ])
(const_int -4096 [0xfffffffffffff000])
Therefore the set_of (cc_in_cond (cond), insn) check doesn't get triggered
because cc_in_cond returns NULL.
Regardless of how the branch condition got canonicalized, I think we still
want to reject any insn in the block
that sets a condition code register, so this patch checks the destination
of every set in the block for a MODE_CC
expression and cancels if-conversion if that's the case.
Oleg pointed me to the older PR 58517 affecting SH which seems similar and
I think my previous ifcvt patch would expose
this problem more.
Anyway, with this patch the failing SPARC testcase
gcc.c-torture/execute/20071216-1.c generates the same assembly
as before r227368 and bootstrap and test on aarch64 and x86_64 passes ok for me.
Rainer, could you try this patch on top of the previous patch?
(https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00689.html)
The two together should fix all of PR 67456, 67464, 67465 and 67481.
sorry this took me so long: we've had a major switch failure and my
sparc machines are way slow.
No problem. You're doing me a huge favour by testing the iterations
of the patches. Sorry for causing the regression in the first place.
The issues I'm finding are exposed due to the way the sparc backend
does some things, so my aarch64 and x86_64 testing is unlikely to catch them.
Anyway, here's what I found: the two patches on top of each other do
bootstrap just fine and the gcc.c-torture/execute/20071216-1.c
regressions are gone. However, it introduces a new one:
FAIL: gcc.dg/torture/stackalign/sibcall-1.c -O1 -fpic execution test
It fails for both 32 and 64-bit. The testcase SEGVs:
Indeed, I can see if-conversion triggering here and doing something funky with the
first patch that I posted (https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00689.html)
In this testcase we trigger the is_mem path through noce_try_cmove_arith
and we have the 'a' and 'b' having the form reg+symbol. When we try to move
them into a register with noce_emit_move_insn the resulting move is not recognised
(presumably sparc doesn't have any instructions/patterns to do this operation)
and the alternative tricks that noce_emit_move_insn tries to create the move
end up generating a bizzare sequence that involves loading from the memory at reg+symbol
expression and adding the base reg to it!
In any case, this patch doesn't try calling noce_emit_move_insn and instead generates a simple
SET expression, emits that and relies on end_ifcvt_sequence to call recog on it and cancel the
transformation if it's not a valid instruction. IMO this is the desired behaviour since the
move in question is supposed to be a simple move that would ideally be eliminated by the register
allocator later on if the dependencies work out. If it actually expands to more complex sequences
it's not going to be a win to if-convert anyway.
TLDR: This updated patch generates the same code for the sibcall-1.c testcase on sparc as before
the bad transformation and all the previous regressions are still fixed.
Bootstrapped and tested on aarch64 and x86_64.
Rainer, could you please try this patch in combination with the one I sent earlier at:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00815.html
2015-09-17 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
PR rtl-optimization/67456
PR rtl-optimization/67464
PR rtl-optimization/67465
* ifcvt.c (noce_try_cmove_arith): Bail out if cannot conditionally
move in the mode of x. Handle combination of complex and simple
block pairs as well as the case when one is empty.
2015-09-17 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* gcc.dg/pr67465.c: New test.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
0x00010bb0 in ix86_split_ashr (mode=1)
at /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/torture/stackalign/sibcall-1.c:20
20 : gen_x86_64_shrd) (0);
(gdb) where
#0 0x00010bb0 in ix86_split_ashr (mode=1)
at /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/torture/stackalign/sibcall-1.c:20
#1 0x00010be4 in main ()
at /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/torture/stackalign/sibcall-1.c:27
1: x/i $pc
=> 0x10bb0 <ix86_split_ashr+40>: ld [ %g1 ], %g1
(gdb) p/x $g1
$1 = 0x317f8
truss reveals:
Incurred fault #6, FLTBOUNDS %pc = 0x00010BB0
siginfo: SIGSEGV SEGV_MAPERR addr=0x000317F8
Received signal #11, SIGSEGV [default]
siginfo: SIGSEGV SEGV_MAPERR addr=0x000317F8
with
#define FLTBOUNDS 6 /* Memory bounds (invalid address) */
and indeed that address isn't mapped according to
ro@apoc 58 > pmap 26536
26536: /var/gcc/regression/trunk/11-gcc/build/gcc/testsuite/gcc/sibcall-1.exe
00010000 8K r-x-- /var/gcc/regression/trunk/11-gcc/build/gcc/testsuite/gcc/sibcall-1.exe
00020000 8K rwx-- /var/gcc/regression/trunk/11-gcc/build/gcc/testsuite/gcc/sibcall-1.exe
FEE60000 696K r-x-- /lib/libm.so.2
FEF1C000 16K rwx-- /lib/libm.so.2
FF180000 1464K r-x-- /lib/libc.so.1
FF2FE000 48K rwx-- /lib/libc.so.1
FF350000 24K rwx-- [ anon ]
FF360000 8K rw--- [ anon ]
FF370000 8K rw--- [ anon ]
FF380000 8K rw--- [ anon ]
FF390000 8K rw--- [ anon ]
FF3A0000 248K r-x-- /lib/ld.so.1
FF3EE000 16K rwx-- /lib/ld.so.1
FFBFC000 16K rwx-- [ stack ]
total 2576K
Something is totally amiss here.
Rainer
commit 9c327def49735ab179b68f2301fc4623ee45c974
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date: Mon Sep 7 14:58:01 2015 +0100
[RTL-ifcvt] PR rtl-optimization/67465: Do not ifcvt complex blocks if the else block is empty
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 1e773d8..5b133f1 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2038,6 +2038,11 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
insn_a = if_info->insn_a;
insn_b = if_info->insn_b;
+ machine_mode x_mode = GET_MODE (x);
+
+ if (!can_conditionally_move_p (x_mode))
+ return FALSE;
+
unsigned int then_cost;
unsigned int else_cost;
if (insn_a)
@@ -2074,13 +2079,38 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
}
}
- if (!a_simple && then_bb && !b_simple && else_bb
+ if (then_bb && else_bb && !a_simple && !b_simple
&& (!bbs_ok_for_cmove_arith (then_bb, else_bb)
|| !bbs_ok_for_cmove_arith (else_bb, then_bb)))
return FALSE;
start_sequence ();
+ /* If one of the blocks is empty then the corresponding B or A value
+ came from the test block. The non-empty complex block that we will
+ emit might clobber the register used by B or A, so move it to a pseudo
+ first. */
+
+ if (b_simple || !else_bb)
+ {
+ rtx tmp_b = gen_reg_rtx (x_mode);
+ /* Perform the simplest kind of set. The register allocator
+ should remove it if it's not actually needed. If this set is not
+ a valid insn (can happen on the is_mem path) then end_ifcvt_sequence
+ will cancel the whole sequence. Don't try any of the fallback paths
+ from noce_emit_move_insn since we want this to be the simplest kind
+ of move. */
+ emit_insn (gen_rtx_SET (tmp_b, b));
+ b = tmp_b;
+ }
+
+ if (a_simple || !then_bb)
+ {
+ rtx tmp_a = gen_reg_rtx (x_mode);
+ emit_insn (gen_rtx_SET (tmp_a, a));
+ a = tmp_a;
+ }
+
orig_a = a;
orig_b = b;
diff --git a/gcc/testsuite/gcc.dg/pr67465.c b/gcc/testsuite/gcc.dg/pr67465.c
new file mode 100644
index 0000000..321fd38
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr67465.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -std=gnu99" } */
+
+int a, b, c, d, e, h;
+
+int
+fn1 (int p1)
+{
+ {
+ int g[2];
+ for (int i = 0; i < 1; i++)
+ g[i] = 0;
+ if (g[0] < c)
+ {
+ a = (unsigned) (1 ^ p1) % 2;
+ return 0;
+ }
+ }
+ return 0;
+}
+
+void
+fn2 ()
+{
+ for (h = 0; h < 1; h++)
+ {
+ for (int j = 0; j < 2; j++)
+ {
+ for (b = 1; b; b = 0)
+ a = 1;
+ for (; b < 1; b++)
+ ;
+ if (e)
+ continue;
+ a = 2;
+ }
+ fn1 (h);
+ short k = -16;
+ d = k > a;
+ }
+}
+
+int
+main ()
+{
+ fn2 ();
+
+ if (a != 2)
+ __builtin_abort ();
+
+ return 0;
+}
+