This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
- From: Richard Sandiford <rdsandiford@googlemail.com>
- To: Sandra Loosemore <sandra@codesourcery.com>
- Cc: gcc-patches@gcc.gnu.org
- Date: Tue, 21 Aug 2012 21:23:48 +0100
- Subject: Re: [PATCH, MIPS] fix MIPS16 jump table overflow
- References: <5032EAAA.2070703@codesourcery.com>
Sandra Loosemore <sandra@codesourcery.com> writes:
> In config/mips/mips.h, there is presently this comment:
>
> /* ??? 16-bit offsets can overflow in large functions. */
> #define TARGET_MIPS16_SHORT_JUMP_TABLES TARGET_MIPS16_TEXT_LOADS
>
> A while ago we had a bug report where a big switch statement did, in
> fact, overflow the range of 16-bit offsets, causing a runtime error.
>
> GCC already has generic machinery to shorten offset tables for switch
> statements that does the necessary range checking, but it only works
> with "casesi", not the lower-level "tablejump" expansion. So, this
> patch provides a "casesi" expander to handle this situation.
Nice.
> This patch has been in use on our local 4.5 and 4.6 branches for about a
> year now. When testing it on mainline, I found it tripped over the
> recent change to add MIPS16 branch overflow checking in other
> situations, causing it to get into an infinite loop. I think telling it
> to ignore these new jump insns it doesn't know how to process is the
> right thing to do, but I'm not sure if there's a better way to restrict
> the condition or make mips16_split_long_branches more robust. Richard,
> since that's your code I assume you'll suggest an alternative if this
> doesn't meet with your approval.
Changing it to:
      if (JUMP_P (insn)
	  && USEFUL_INSN_P (insn)
	  && get_attr_length (insn) > 8
	  && (any_condjump_p (insn) || any_uncond_jump_p (insn)))
should be OK.
> @@ -5937,6 +5933,91 @@
> [(set_attr "type" "jump")
> (set_attr "mode" "none")])
>
> +;; For MIPS16, we don't know whether a given jump table will use short or
> +;; word-sized offsets until late in compilation, when we are able to determine
> +;; the sizes of the insns which comprise the containing function. This
> +;; necessitates the use of the casesi rather than the tablejump pattern, since
> +;; the latter tries to calculate the index of the offset to jump through early
> +;; in compilation, i.e. at expand time, when nothing is known about the
> +;; eventual function layout.
> +
> +(define_expand "casesi"
> + [(match_operand:SI 0 "register_operand" "") ; index to jump on
> + (match_operand:SI 1 "const_int_operand" "") ; lower bound
> + (match_operand:SI 2 "const_int_operand" "") ; total range
> + (match_operand:SI 3 "" "") ; table label
> + (match_operand:SI 4 "" "")] ; out of range label
The last two are Pmode rather than SImode. Since there aren't separate
case* patterns for different Pmodes, we can't use :P here, so let's just
drop the modes on operands 3 and 4.
Would be nice to add a compile test for -mabi=64 just to make sure
that Pmode == DImode works. A copy of an existing test like
code-readable-1.c would be fine.
> +(define_insn "casesi_internal_mips16"
> + [(set (pc)
> + (if_then_else
> + (leu (match_operand:SI 0 "register_operand" "d")
> + (match_operand:SI 1 "arith_operand" "dI"))
> + (mem:SI (plus:SI (mult:SI (match_dup 0) (const_int 4))
> + (label_ref (match_operand 2 "" ""))))
> + (label_ref (match_operand 3 "" ""))))
> + (clobber (match_scratch:SI 4 "=&d"))
> + (clobber (match_scratch:SI 5 "=d"))
> + (clobber (reg:SI MIPS16_T_REGNUM))
> + (use (label_ref (match_dup 2)))]
Although this is descriptive, the MEM is probably more trouble
than it's worth. A hard-coded MEM like this will alias a lot
of things, whereas we're only reading from the function itself.
I think an unspec would be better.
This pattern should have :P for operands 4 and 5, with the pattern
name becoming:
"casesi_internal_mips16_<mode>"
PMODE_INSN should make it easy to wrap up the difference.
There shouldn't be any need for the final USE. Let me know
if you found otherwise, because that sounds like a bug.
> + "TARGET_MIPS16_SHORT_JUMP_TABLES"
> +{
> + rtx diff_vec = PATTERN (next_real_insn (operands[2]));
> +
> + gcc_assert (GET_CODE (diff_vec) == ADDR_DIFF_VEC);
> +
> + output_asm_insn ("sltu\t%0, %1", operands);
> + output_asm_insn ("bteqz\t%3", operands);
> + output_asm_insn ("la\t%4, %2", operands);
> +
> + switch (GET_MODE (diff_vec))
> + {
> + case HImode:
> + output_asm_insn ("sll\t%5, %0, 1", operands);
> + output_asm_insn ("addu\t%5, %4, %5", operands);
> + output_asm_insn ("lh\t%5, 0(%5)", operands);
> + break;
> +
> + case SImode:
> + output_asm_insn ("sll\t%5, %0, 2", operands);
> + output_asm_insn ("addu\t%5, %4, %5", operands);
> + output_asm_insn ("lw\t%5, 0(%5)", operands);
> + break;
> +
> + default:
> + gcc_unreachable ();
> + }
> +
> + output_asm_insn ("addu\t%4, %4, %5", operands);
> +
> + return "j\t%4";
> +}
> + [(set_attr "length" "32")])
The "addu"s here ought to be "<d>addu"s after the :P change.
I think we can avoid the earlyclobber on operand 4 by moving the LA
after the SLL.
> +#define CASE_VECTOR_MODE ptr_mode
> +
> +/* Only use short offsets if their range will not overflow. */
> +#define CASE_VECTOR_SHORTEN_MODE(MIN, MAX, BODY) \
> + (TARGET_MIPS16_SHORT_JUMP_TABLES && ((MIN) >= -32768 && (MAX) < 32768) \
> + ? HImode : ptr_mode)
ptr_mode should be SImode here, to match the switch above.
We don't even begin to support functions bigger than 2GB. :-)
> Index: gcc/testsuite/gcc.target/mips/code-readable-1.c
> ===================================================================
> --- gcc/testsuite/gcc.target/mips/code-readable-1.c (revision 190463)
> +++ gcc/testsuite/gcc.target/mips/code-readable-1.c (working copy)
> @@ -25,7 +25,7 @@ bar (void)
> }
>
> /* { dg-final { scan-assembler "\tla\t" } } */
> -/* { dg-final { scan-assembler "\t\\.half\t" } } */
> +/* { dg-final { scan-assembler "\t\\.\(half\|word\)\t\\.L\[0-9\]*-\\.L\[0-9\]\*" } } */
> /* { dg-final { scan-assembler-not "%hi\\(\[^)\]*L" } } */
> /* { dg-final { scan-assembler-not "%lo\\(\[^)\]*L" } } */
>
I'd prefer to add -O and stick to requiring .half.
We call shorten_branches several times, so I was worried that we
might be shortening the cases prematurely. It looks like later
calls are already able to undo the shortening though, so hopefully
that isn't a problem.
Richard