81541 – Potential size optimisation: reusing common function endings

Bug 81541 - Potential size optimisation: reusing common function endings

Summary: Potential size optimisation: reusing common function endings

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	ipa (show other bugs)
Version:	8.0

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	icf
	Show dependency tree / graph

Reported:	2017-07-24 20:28 UTC by jpakkane
Modified:	2021-12-25 11:54 UTC (History)
CC List:	0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2021-12-25 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description jpakkane 2017-07-24 20:28:32 UTC

In embedded development code size optimizations are very important and there are some optimizations that commercial compilers do but GCC does not seem to. As an example compiling the following code:

int func5() {
  int i = 0;
  i+=func2();
  i+=func3();
  i+=func4();
  return i+func1();
}

int func6() {
  int i=1;
  i+=func3();
  i+=func3();
  i+=func4();
  return i+func1();
}

On ARM using -Os produces the following assembly:

func5():
        push    {r4, lr}
        bl      func2()
        mov     r4, r0
        bl      func3()
        add     r4, r4, r0
        bl      func4()
        add     r4, r4, r0
        bl      func1()
        add     r0, r4, r0
        pop     {r4, lr}
        bx      lr
func6():
        push    {r4, lr}
        bl      func3()
        add     r4, r0, #1
        bl      func3()
        add     r4, r4, r0
        bl      func4()
        add     r4, r4, r0
        bl      func1()
        add     r0, r4, r0
        pop     {r4, lr}
        bx      lr

Note how the ends of both of these functions have identical endings. This could be converted to the following:

func5():
        push    {r4, lr}
        bl      func2()
        mov     r4, r0
common_tail:
        bl      func3()
        add     r4, r4, r0
        bl      func4()
        add     r4, r4, r0
        bl      func1()
        add     r0, r4, r0
        pop     {r4, lr}
        bx      lr
func6():
        push    {r4, lr}
        bl      func3()
        add     r4, r0, #1
        b common_tail

which has smaller code size.

I did a simple experiment and based on that this might not be all that useful on x86: http://nibblestew.blogspot.com/2017/07/experiment-binary-size-reduction-by.html

On ARM and similar platforms this would probably be beneficial, especially given that commercial compilers perform this optimization.

Comment 1 Richard Biener 2017-07-25 07:43:47 UTC

Both crossjumping (on RTL) and tail-merging on GIMPLE perform the desired transform inside a function.  You are looking for the IPA side of this
(which only can common whole functions at the moment), basically function
part outlining plus ICF.

Comment 2 Martin Liška 2017-07-26 11:28:38 UTC

Thanks for filling this, it will probably need some sharing of infrastructure of current IPA split and IPA ICF. I will read the post and think about it more.