This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH] MIPS16/GCC: Emit bounds checking as RTL in `casesi'


Maciej Rozycki <Maciej.Rozycki@imgtec.com> writes:
>  Further to my changes made last November here is an update to the MIPS16
> `casesi' pattern making it emit bounds checking as RTL rather than having
> it as hardcoded assembly within the `casesi_internal_mips16_<mode>'
> dispatcher.  See below for how PR tree-optimization/51513 has prevented me
> from proceeding back then.
> 
>  This new arrangement has several advantages:
> 
> 1. There is no hardcoded BTEQZ branch instruction that has a limited span
>    and can overflow causing an assembly failure if the target label is too
>    distant.  GCC is able to relax out of range MIPS16 branches these days,
>    but obviously they have to be expressed in RTL rather than buried in
>    assembly code.  This overflow can be easily reproduced; please enquire
>    for a boring example if interested.
> 
> 2. The `casesi_internal_mips16_<mode>' pattern now has an accurate length
>    (aka instruction count) recorded as all its remaining emitted assembly
>    instructions are known in advance to fit in their regular (16-bit)
>    machine encodings.  Previously there was uncertainty about the SLTU and
>    BTEQZ instructions used for the bounds check, which depending on their
>    exact arguments could have each resulted in their either regular
>    (16-bit) or extended (32-bit) encoding.  Consequently the worst length
>    estimate was recorded instead, possibly causing worse code generation
>    (e.g. premature out-of-range branch relaxation or jump table expansion
>    from half-words to full words).
> 
> 3. GCC now has freedom to schedule code around bounds checking as it sees
>    fit rather than having to adapt to the fixed assembly block.  In fact
>    it tends to make use of it right away swapping BTEQZ for the BTNEZ
>    instruction and rescheduling code such that the out-of-bounds (default)
>    case executes linearly.
> 
> There are probably more benefits, but these are what has immediately come
> to my mind.

Sounds good, I certainly agree that we will see the benefits you list above.

>  As noted above I meant to propose this change along with the rest so as
> to have it in GCC 7, however emitting the bounds check as a separate RTL
> pattern triggered an unrelated bug, then unknown to me, causing a fatal
> code generation regression, and the lack of time did not allow me to
> investigate it further.  This was easily reproduced with a piece of code
> (reduced from actual Linux kernel code) like this:
> 
> $ cat frob.c
> int
> frob (int i)
> {
>   switch (i)
>     {
>     case -5:
>       return -2;
>     case -3:
>       return -1;
>     case 0:
>       return 0;
>     case 3:
>       return 1;
>     case 5:
>       break;
>     default:
>       __builtin_unreachable ();
>     }
>   return i;
> }
> 
> producing truncated assembly like this:
> 
> $ gcc -O2 -mips16 -mcode-readable=yes -dp -S frob.c
> $ cat frob.s
> 	.file	1 "frob.c"
> 	.section .mdebug.abi32
> 	.previous
> 	.nan	legacy
> 	.module	fp=32
> 	.module	nooddspreg
> 	.abicalls
> 	.option	pic0
> 	.text
> 	.align	2
> 	.weak	frob
> 	.set	mips16
> 	.set	nomicromips
> 	.ent	frob
> 	.type	frob, @function
> frob:
> 	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
> 	.mask	0x00000000,0
> 	.fmask	0x00000000,0
> 	addiu	$2,$4,5	 # 11	*addsi3_mips16/7	[length = 2]
> 	.end	frob
> 	.size	frob, .-frob
> 	.ident	"GCC: (GNU) 7.0.0 20161117 (experimental)"
> $
> 
> -- where as you can see the whole switch statement has vanished along with
> any return path from the function, and only the preparatory addition
> emitted from `casesi' with:
> 
>       emit_insn (gen_addsi3 (reg, operands[0], offset));
> 
> remained.
> 
>  The causing bug has turned out to be what was filed as PR
> tree-optimization/51513 and has been kindly fixed by Peter recently
> (thanks, Peter!) with r247844 ("Fix PR51513, switch statement with default
> case containing __builtin_unreachable leads to wild branch"),
> <https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00739.html>, enabling me to
> proceed with this change without having to investigate the cause of code
> breakage -- which for the MIPS16 target has clearly turned out to be
> graver than a mere silly branch never to be executed.
> 
>  Given the previous troubles with this change I have decided to add
> MIPS16 test cases to verify that code truncation has not happened,
> complementing gcc.target/powerpc/pr51513.c, in case further tweaks in this
> area might do something bad.  This would be caught by
> gcc.target/mips/insn-casesi.c added with r242424, but that test case does
> not refer to PR tree-optimization/51513, so let's make it explicit.  With
> the PR tree-optimization/51513 fix removed the two new cases indeed cause:
> 
> FAIL: gcc.target/mips/pr51513-1.c   -O2   scan-assembler \tjrc?\t\\$31\n
> FAIL: gcc.target/mips/pr51513-1.c   -O3 -g   scan-assembler \tjrc?\t\\$31\n
> FAIL: gcc.target/mips/pr51513-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none
> scan-assembler \tjrc?\t\\$31\n
> FAIL: gcc.target/mips/pr51513-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> scan-assembler \tjrc?\t\\$31\n
> FAIL: gcc.target/mips/pr51513-2.c   -O2  execution test
> FAIL: gcc.target/mips/pr51513-2.c   -O3 -g  execution test
> FAIL: gcc.target/mips/pr51513-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none
> execution test
> 
> and overall (when executed individually):
> 
>                 === gcc Summary ===
> 
> # of expected passes            21
> # of unexpected failures        7
> 
>  Also since all the assembly templates in `casesi_internal_mips16_<mode>'
> must have been changed due to the rearrangement of operands, I have
> decided to make their formatting consistent with our templates used
> elsewhere, that is to remove the space between `,' and the following
> operand.  This in turn required a small tweak to
> gcc.target/mips/data-sym-jump.c.  I hope it is fine to have this along
> with the rest as a single change.
> 
>  As a further update we could probably emit assembly instructions
> currently produced with `casesi_internal_mips16_<mode>' individually as
> RTL instructions, however the result would likely not be as spectacular;
> certainly not as the branch overflow fix is, though it could possibly
> enable some optimisations in certain cases.  So I'm leaving it as an
> exercise for whoever feels compelled to do it.

The only benefit would be if something could fill the load latency between
obtaining the offset and adding to the base which seems low probability.

> 	gcc/
> 	* config/mips/mips.md (MIPS16_T_REGNUM): Remove constant.
> 	(casesi): Emit bounds checking as RTL.
> 	(casesi_internal_mips16_<mode>): Remove bounds checking.
> 
> 	gcc/testsuite/
> 	* gcc.target/mips/data-sym-jump.c: Adjust for whitespace
> 	changes.
> 	* gcc.target/mips/pr51513-1.c: New test.
> 	* gcc.target/mips/pr51513-2.c: New test.
> 
>  No regressions in o32 little-endian MIPS16 multilib testing.  OK to
> apply?

OK.

Thanks for the intricate commentary on this patch, makes it very easy to
review.

Thanks,
Matthew


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]