This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] MIPS16/GCC: Emit bounds checking as RTL in `casesi'
- From: "Maciej W. Rozycki" <macro at imgtec dot com>
- To: Catherine Moore <clm at codesourcery dot com>, Matthew Fortune <Matthew dot Fortune at imgtec dot com>
- Cc: Peter Bergner <bergner at vnet dot ibm dot com>, <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 7 Jun 2017 08:37:32 +0100
- Subject: [PATCH] MIPS16/GCC: Emit bounds checking as RTL in `casesi'
- Authentication-results: sourceware.org; auth=none
Hi,
Further to my changes made last November here is an update to the MIPS16
`casesi' pattern making it emit bounds checking as RTL rather than having
it as hardcoded assembly within the `casesi_internal_mips16_<mode>'
dispatcher. See below for how PR tree-optimization/51513 has prevented me
from proceeding back then.
This new arrangement has several advantages:
1. There is no hardcoded BTEQZ branch instruction that has a limited span
and can overflow causing an assembly failure if the target label is too
distant. GCC is able to relax out of range MIPS16 branches these days,
but obviously they have to be expressed in RTL rather than buried in
assembly code. This overflow can be easily reproduced; please enquire
for a boring example if interested.
2. The `casesi_internal_mips16_<mode>' pattern now has an accurate length
(aka instruction count) recorded as all its remaining emitted assembly
instructions are known in advance to fit in their regular (16-bit)
machine encodings. Previously there was uncertainty about the SLTU and
BTEQZ instructions used for the bounds check, which depending on their
exact arguments could have each resulted in their either regular
(16-bit) or extended (32-bit) encoding. Consequently the worst length
estimate was recorded instead, possibly causing worse code generation
(e.g. premature out-of-range branch relaxation or jump table expansion
from half-words to full words).
3. GCC now has freedom to schedule code around bounds checking as it sees
fit rather than having to adapt to the fixed assembly block. In fact
it tends to make use of it right away swapping BTEQZ for the BTNEZ
instruction and rescheduling code such that the out-of-bounds (default)
case executes linearly.
There are probably more benefits, but these are what has immediately come
to my mind.
As noted above I meant to propose this change along with the rest so as
to have it in GCC 7, however emitting the bounds check as a separate RTL
pattern triggered an unrelated bug, then unknown to me, causing a fatal
code generation regression, and the lack of time did not allow me to
investigate it further. This was easily reproduced with a piece of code
(reduced from actual Linux kernel code) like this:
$ cat frob.c
int
frob (int i)
{
switch (i)
{
case -5:
return -2;
case -3:
return -1;
case 0:
return 0;
case 3:
return 1;
case 5:
break;
default:
__builtin_unreachable ();
}
return i;
}
producing truncated assembly like this:
$ gcc -O2 -mips16 -mcode-readable=yes -dp -S frob.c
$ cat frob.s
.file 1 "frob.c"
.section .mdebug.abi32
.previous
.nan legacy
.module fp=32
.module nooddspreg
.abicalls
.option pic0
.text
.align 2
.weak frob
.set mips16
.set nomicromips
.ent frob
.type frob, @function
frob:
.frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
.mask 0x00000000,0
.fmask 0x00000000,0
addiu $2,$4,5 # 11 *addsi3_mips16/7 [length = 2]
.end frob
.size frob, .-frob
.ident "GCC: (GNU) 7.0.0 20161117 (experimental)"
$
-- where as you can see the whole switch statement has vanished along with
any return path from the function, and only the preparatory addition
emitted from `casesi' with:
emit_insn (gen_addsi3 (reg, operands[0], offset));
remained.
The causing bug has turned out to be what was filed as PR
tree-optimization/51513 and has been kindly fixed by Peter recently
(thanks, Peter!) with r247844 ("Fix PR51513, switch statement with default
case containing __builtin_unreachable leads to wild branch"),
<https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00739.html>, enabling me to
proceed with this change without having to investigate the cause of code
breakage -- which for the MIPS16 target has clearly turned out to be
graver than a mere silly branch never to be executed.
Given the previous troubles with this change I have decided to add
MIPS16 test cases to verify that code truncation has not happened,
complementing gcc.target/powerpc/pr51513.c, in case further tweaks in this
area might do something bad. This would be caught by
gcc.target/mips/insn-casesi.c added with r242424, but that test case does
not refer to PR tree-optimization/51513, so let's make it explicit. With
the PR tree-optimization/51513 fix removed the two new cases indeed cause:
FAIL: gcc.target/mips/pr51513-1.c -O2 scan-assembler \tjrc?\t\\$31\n
FAIL: gcc.target/mips/pr51513-1.c -O3 -g scan-assembler \tjrc?\t\\$31\n
FAIL: gcc.target/mips/pr51513-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler \tjrc?\t\\$31\n
FAIL: gcc.target/mips/pr51513-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler \tjrc?\t\\$31\n
FAIL: gcc.target/mips/pr51513-2.c -O2 execution test
FAIL: gcc.target/mips/pr51513-2.c -O3 -g execution test
FAIL: gcc.target/mips/pr51513-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
and overall (when executed individually):
=== gcc Summary ===
# of expected passes 21
# of unexpected failures 7
Also since all the assembly templates in `casesi_internal_mips16_<mode>'
must have been changed due to the rearrangement of operands, I have
decided to make their formatting consistent with our templates used
elsewhere, that is to remove the space between `,' and the following
operand. This in turn required a small tweak to
gcc.target/mips/data-sym-jump.c. I hope it is fine to have this along
with the rest as a single change.
As a further update we could probably emit assembly instructions
currently produced with `casesi_internal_mips16_<mode>' individually as
RTL instructions, however the result would likely not be as spectacular;
certainly not as the branch overflow fix is, though it could possibly
enable some optimisations in certain cases. So I'm leaving it as an
exercise for whoever feels compelled to do it.
gcc/
* config/mips/mips.md (MIPS16_T_REGNUM): Remove constant.
(casesi): Emit bounds checking as RTL.
(casesi_internal_mips16_<mode>): Remove bounds checking.
gcc/testsuite/
* gcc.target/mips/data-sym-jump.c: Adjust for whitespace
changes.
* gcc.target/mips/pr51513-1.c: New test.
* gcc.target/mips/pr51513-2.c: New test.
No regressions in o32 little-endian MIPS16 multilib testing. OK to
apply?
Maciej
gcc-mips16-casesi-rtl.diff
Index: gcc/gcc/config/mips/mips.md
===================================================================
--- gcc.orig/gcc/config/mips/mips.md 2017-06-06 21:53:32.979416504 +0100
+++ gcc/gcc/config/mips/mips.md 2017-06-06 21:53:48.596362464 +0100
@@ -162,7 +162,6 @@
[(TLS_GET_TP_REGNUM 3)
(GET_FCSR_REGNUM 2)
(SET_FCSR_REGNUM 4)
- (MIPS16_T_REGNUM 24)
(PIC_FUNCTION_ADDR_REGNUM 25)
(RETURN_ADDR_REGNUM 31)
(CPRESTORE_SLOT_REGNUM 76)
@@ -6389,68 +6388,57 @@
if (!arith_operand (operands[0], SImode))
operands[0] = force_reg (SImode, operands[0]);
- operands[2] = GEN_INT (INTVAL (operands[2]) + 1);
-
+ emit_cmp_and_jump_insns (operands[0], operands[2], GTU,
+ NULL_RTX, SImode, 1, operands[4]);
emit_jump_insn (PMODE_INSN (gen_casesi_internal_mips16,
- (operands[0], operands[2],
- operands[3], operands[4])));
-
+ (operands[0], operands[3])));
DONE;
})
(define_insn "casesi_internal_mips16_<mode>"
[(set (pc)
- (if_then_else
- (ltu (match_operand:SI 0 "register_operand" "d")
- (match_operand:SI 1 "arith_operand" "dI"))
- (unspec:P
- [(match_dup 0)
- (label_ref (match_operand 2 "" ""))]
- UNSPEC_CASESI_DISPATCH)
- (label_ref (match_operand 3 "" ""))))
- (clobber (match_scratch:P 4 "=d"))
- (clobber (match_scratch:P 5 "=d"))
- (clobber (reg:SI MIPS16_T_REGNUM))]
+ (unspec:P [(match_operand:SI 0 "register_operand" "d")
+ (label_ref (match_operand 1 "" ""))]
+ UNSPEC_CASESI_DISPATCH))
+ (clobber (match_scratch:P 2 "=d"))
+ (clobber (match_scratch:P 3 "=d"))]
"TARGET_MIPS16_SHORT_JUMP_TABLES"
{
- rtx diff_vec = PATTERN (NEXT_INSN (as_a <rtx_insn *> (operands[2])));
+ rtx diff_vec = PATTERN (NEXT_INSN (as_a <rtx_insn *> (operands[1])));
gcc_assert (GET_CODE (diff_vec) == ADDR_DIFF_VEC);
-
- output_asm_insn ("sltu\t%0, %1", operands);
- output_asm_insn ("bteqz\t%3", operands);
-
+
switch (GET_MODE (diff_vec))
{
case HImode:
- output_asm_insn ("sll\t%5, %0, 1", operands);
- output_asm_insn ("<d>la\t%4, %2", operands);
- output_asm_insn ("<d>addu\t%5, %4, %5", operands);
- output_asm_insn ("lh\t%5, 0(%5)", operands);
+ output_asm_insn ("sll\t%3,%0,1", operands);
+ output_asm_insn ("<d>la\t%2,%1", operands);
+ output_asm_insn ("<d>addu\t%3,%2,%3", operands);
+ output_asm_insn ("lh\t%3,0(%3)", operands);
break;
case SImode:
- output_asm_insn ("sll\t%5, %0, 2", operands);
- output_asm_insn ("<d>la\t%4, %2", operands);
- output_asm_insn ("<d>addu\t%5, %4, %5", operands);
- output_asm_insn ("lw\t%5, 0(%5)", operands);
+ output_asm_insn ("sll\t%3,%0,2", operands);
+ output_asm_insn ("<d>la\t%2,%1", operands);
+ output_asm_insn ("<d>addu\t%3,%2,%3", operands);
+ output_asm_insn ("lw\t%3,0(%3)", operands);
break;
default:
gcc_unreachable ();
}
- output_asm_insn ("<d>addu\t%4, %4, %5", operands);
+ output_asm_insn ("<d>addu\t%2,%2,%3", operands);
if (GENERATE_MIPS16E)
- return "jrc\t%4";
+ return "jrc\t%2";
else
- return "jr\t%4";
+ return "jr\t%2";
}
[(set (attr "insn_count")
(if_then_else (match_test "GENERATE_MIPS16E")
- (const_string "10")
- (const_string "11")))])
+ (const_string "6")
+ (const_string "7")))])
;; For TARGET_USE_GOT, we save the gp in the jmp_buf as well.
;; While it is possible to either pull it off the stack (in the
Index: gcc/gcc/testsuite/gcc.target/mips/data-sym-jump.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/mips/data-sym-jump.c 2017-06-06 21:53:32.995674428 +0100
+++ gcc/gcc/testsuite/gcc.target/mips/data-sym-jump.c 2017-06-06 21:53:48.642154539 +0100
@@ -25,7 +25,7 @@ frob (int i)
/* Expect assembly like:
- la $2, $L4
+ la $2,$L4
# Anything goes here.
.type __jump_frob_4, @object # Symbol # must match label.
__jump_frob_4: # The symbol must match.
@@ -47,4 +47,4 @@ __jend_frob_4: # The symbol must mat
that is `__jump_*'/`__jend_*' symbols inserted around a jump table. */
-/* { dg-final { scan-assembler "\tla\t\\\$\[0-9\]+, (.L(\[0-9\]+))\n.*\t\\.type\t(__jump_frob_\\2), @object\n\\3:\n\\1:\n(?:\t\\.(?:half|word)\t.L\[0-9\]+-\\1\n)\{11\}\t\\.type\t(__jend_frob_\\2), @function\n\\4:\n\t\\.insn\n" } } */
+/* { dg-final { scan-assembler "\tla\t\\\$\[0-9\]+,(.L(\[0-9\]+))\n.*\t\\.type\t(__jump_frob_\\2), @object\n\\3:\n\\1:\n(?:\t\\.(?:half|word)\t.L\[0-9\]+-\\1\n)\{11\}\t\\.type\t(__jend_frob_\\2), @function\n\\4:\n\t\\.insn\n" } } */
Index: gcc/gcc/testsuite/gcc.target/mips/pr51513-1.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ gcc/gcc/testsuite/gcc.target/mips/pr51513-1.c 2017-06-07 05:15:48.170762991 +0100
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-mips16 -mcode-readable=yes" } */
+
+/* PR tree-optimization/51513 verification variant for MIPS16, #1. */
+
+int __attribute__ ((weak))
+frob (int i)
+{
+ switch (i)
+ {
+ case -5:
+ return -2;
+ case -3:
+ return -1;
+ case 0:
+ return 0;
+ case 3:
+ return 1;
+ case 5:
+ break;
+ default:
+ __builtin_unreachable ();
+ }
+ return i;
+}
+
+/* Without the fix for PR tree-optimization/51513 truncated code
+ would be emitted for `frob', like:
+
+ .text
+ .align 2
+ .weak frob
+ .set mips16
+ .set nomicromips
+ .ent frob
+ .type frob, @function
+frob:
+ .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
+ .mask 0x00000000,0
+ .fmask 0x00000000,0
+ addiu $2,$4,5
+ .end frob
+ .size frob, .-frob
+
+ meaning `frob' will have no chance to return, let alone produce
+ the result expected. */
+
+/* { dg-final { scan-assembler "\tjrc?\t\\\$31\n" } } */
Index: gcc/gcc/testsuite/gcc.target/mips/pr51513-2.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ gcc/gcc/testsuite/gcc.target/mips/pr51513-2.c 2017-06-07 05:15:48.187060805 +0100
@@ -0,0 +1,56 @@
+/* { dg-do run } */
+/* { dg-options "-mips16 -mcode-readable=yes" } */
+
+/* PR tree-optimization/51513 verification variant for MIPS16, #2. */
+
+int __attribute__ ((weak))
+frob (int i)
+{
+ switch (i)
+ {
+ case -5:
+ return -2;
+ case -3:
+ return -1;
+ case 0:
+ return 0;
+ case 3:
+ return 1;
+ case 5:
+ break;
+ default:
+ __builtin_unreachable ();
+ }
+ return i;
+}
+
+int
+main (void)
+{
+ return !(frob (-5) == -2
+ & frob (-3) == -1
+ & frob (0) == 0
+ & frob (3) == 1
+ & frob (5) == 5);
+}
+
+/* Without the fix for PR tree-optimization/51513 truncated code
+ would be emitted for `frob', like:
+
+ .text
+ .align 2
+ .weak frob
+ .set mips16
+ .set nomicromips
+ .ent frob
+ .type frob, @function
+frob:
+ .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
+ .mask 0x00000000,0
+ .fmask 0x00000000,0
+ addiu $2,$4,5
+ .end frob
+ .size frob, .-frob
+
+ meaning `frob' will have no chance to return, let alone produce
+ the result expected. */