Bug 81541 - Potential size optimisation: reusing common function endings
Summary: Potential size optimisation: reusing common function endings
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: ipa (show other bugs)
Version: 8.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: icf
  Show dependency treegraph
 
Reported: 2017-07-24 20:28 UTC by jpakkane
Modified: 2021-12-25 11:54 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-25 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description jpakkane 2017-07-24 20:28:32 UTC
In embedded development code size optimizations are very important and there are some optimizations that commercial compilers do but GCC does not seem to. As an example compiling the following code:

int func5() {
  int i = 0;
  i+=func2();
  i+=func3();
  i+=func4();
  return i+func1();
}

int func6() {
  int i=1;
  i+=func3();
  i+=func3();
  i+=func4();
  return i+func1();
}

On ARM using -Os produces the following assembly:

func5():
        push    {r4, lr}
        bl      func2()
        mov     r4, r0
        bl      func3()
        add     r4, r4, r0
        bl      func4()
        add     r4, r4, r0
        bl      func1()
        add     r0, r4, r0
        pop     {r4, lr}
        bx      lr
func6():
        push    {r4, lr}
        bl      func3()
        add     r4, r0, #1
        bl      func3()
        add     r4, r4, r0
        bl      func4()
        add     r4, r4, r0
        bl      func1()
        add     r0, r4, r0
        pop     {r4, lr}
        bx      lr

Note how the ends of both of these functions have identical endings. This could be converted to the following:

func5():
        push    {r4, lr}
        bl      func2()
        mov     r4, r0
common_tail:
        bl      func3()
        add     r4, r4, r0
        bl      func4()
        add     r4, r4, r0
        bl      func1()
        add     r0, r4, r0
        pop     {r4, lr}
        bx      lr
func6():
        push    {r4, lr}
        bl      func3()
        add     r4, r0, #1
        b common_tail

which has smaller code size.

I did a simple experiment and based on that this might not be all that useful on x86: http://nibblestew.blogspot.com/2017/07/experiment-binary-size-reduction-by.html

On ARM and similar platforms this would probably be beneficial, especially given that commercial compilers perform this optimization.
Comment 1 Richard Biener 2017-07-25 07:43:47 UTC
Both crossjumping (on RTL) and tail-merging on GIMPLE perform the desired transform inside a function.  You are looking for the IPA side of this
(which only can common whole functions at the moment), basically function
part outlining plus ICF.
Comment 2 Martin Liška 2017-07-26 11:28:38 UTC
Thanks for filling this, it will probably need some sharing of infrastructure of current IPA split and IPA ICF. I will read the post and think about it more.