This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[patch, arm] Minor optimization on thumb2 tail call
- From: "Joey Ye" <joey dot ye at arm dot com>
- To: <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 19 Nov 2014 10:43:00 +0800
- Subject: [patch, arm] Minor optimization on thumb2 tail call
- Authentication-results: sourceware.org; auth=none
Current thumb2 -Os generates suboptimal code for following tail call case:
int f4(int b, int a, int c, int d);
int g(int a, int b, int c, int d)
{ return f4(b, a, c, d); }
arm-none-eabi-gcc -Os -mthumb -mcpu=cortex-m3 test.c
push
{r4, lr}
mov r4, r1
mov r1, r0
mov r0, r4
pop {r4, lr}
b f4
There are two issues: The first one is that saving/restoring lr is not
necessary, as there is no return via pop pc. The second one is that even if
we managed to avoid lr push/pop, ldmia.w sp!, {r4} is still emitted as there
is a missing pattern for pop single and code size is not optimal.
This patch fixes these two issues and introduces a shared test case. CSiBE
thumb2 -Os shows cross board code size reduction, except for one case with 4
bytes regression. The case is like:
void f ()
{
if ()
...
else if ()
...
else g();
}
There are N=2 non-sibcall returns and S=1 sibcall return. Originally the
non-sibcall returns are just pop {r4, r5, pc}, now they become
b.n .Lreturn
.Lreturn:
pop {r4, r5}
bx lr
The one byte save from sibcall return does not win the non-sibcall return
regressions back. In general scenario, number of N non-sibcall returns use
b.n branching to merged tail, number of S sibcalls save 2 bytes by avoid
poping lr. It results in 4-2*S bytes regression. In the worst scenario, each
non-sibcall return has to use b.w branching to merged tail, resulting in
(N-S)*2 bytes regression. The worst scenario is rare, according to CSiBE.
The general regression scenario can only regress 2 bytes at most. So I would
not introduce additional complexity to handle the regression case.
Make check cortex-m3: pass
thumb2 bootstrap (O2/Os): pass
* config/arm/arm.c (arm_compute_save_reg_mask):
Do not save lr in case of tail call.
* config/arm/thumb2.md (*thumb2_pop_single): New pattern.
* gcc.target/arm/thumb2-pop-single.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4f04707..20d0b9e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19190,6 +19190,7 @@ arm_compute_save_reg_mask (void)
|| (save_reg_mask
&& optimize_size
&& ARM_FUNC_TYPE (func_type) == ARM_FT_NORMAL
+ && !crtl->tail_call_emit
&& !crtl->calls_eh_return))
save_reg_mask |= 1 << LR_REGNUM;
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 64acfea..29cfb17 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -267,6 +267,17 @@
(set_attr "type" "multiple")]
)
+;; Pop a single register as its size is preferred over a post-incremental
load
+(define_insn "*thumb2_pop_single"
+ [(set (match_operand:SI 0 "low_register_operand" "=r")
+ (mem:SI (post_inc:SI (reg:SI SP_REGNUM))))]
+ "TARGET_THUMB2 && (reload_in_progress || reload_completed)"
+ "pop\t{%0}"
+ [(set_attr "type" "load1")
+ (set_attr "length" "2")
+ (set_attr "predicable" "yes")]
+)
+
;; We have two alternatives here for memory loads (and similarly for
stores)
;; to reflect the fact that the permissible constant pool ranges differ
;; between ldr instructions taking low regs and ldr instructions taking
high