Compile the attached source code with options -Os -march=armv5te -mthumb. GCC generates the following code snippet:

  ...
        cmp     r0, r2
        bne     .L5
        b       .L15            <--- A
.L9:
        ldr     r3, [r1]
        cmp     r3, #0
        beq     .L7
        str     r0, [r1, #8]
        b       .L8
.L7:
        str     r3, [r1, #8]
.L8:
        ldr     r1, [r1, #4]
        b       .L12            <--- C
.L15:
        mov     r0, #1          <--- B
.L12:
        cmp     r1, r2          <--- D
        bne     .L9
  ...

Instruction A jumps to B, which then falls through to D. Instruction C jumps to D. No other instruction jumps to B, so we can move B to just before A, make A jump directly to D, and remove C.

There are two functions that could potentially perform this optimization: merge_blocks_move and try_forward_edges. try_forward_edges can only redirect branches across a series of forwarder blocks; it cannot move the target block before the forwarder blocks. merge_blocks_move merges blocks b and c only when neither of them is a forwarder block; in this case block A is a forwarder block, so they are not merged.
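For illustration, here is a hand-edited sketch (not actual compiler output) of what the code could look like after the transformation described above: B is hoisted before A, A is retargeted to D, and C is deleted, saving one branch overall.

  ...
        cmp     r0, r2
        bne     .L5
        mov     r0, #1          @ B, hoisted before A
        b       .L12            @ A, now jumps straight to D
.L9:
        ldr     r3, [r1]
        cmp     r3, #0
        beq     .L7
        str     r0, [r1, #8]
        b       .L8
.L7:
        str     r3, [r1, #8]
.L8:
        ldr     r1, [r1, #4]    @ falls through to .L12; C is no longer needed
.L12:
        cmp     r1, r2          @ D
        bne     .L9
  ...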
Created attachment 18326
test case
Why does the basic block reordering pass also not handle this?
I would suggest looking closely at SSA expansion as well. I think there is still a regression relative to pre-SSA expand in that PHI nodes are not expanded optimally at -Os (pre-SSA expand split common PHI args into extra blocks to minimize copy instructions, which is something that could be done as a tree-level transform). For example, splitting

  BB5: # a_1 = PHI <a_2(2), a_2(3), b_3(4)>

into

  BB6: # tmp_4 = PHI <a_2(2), a_2(3)>
  BB5: # a_1 = PHI <tmp_4(6), b_3(4)>

is something we usually don't want and in fact undo during mergephi, but it needs one less copy if a_2 and tmp_4 can be coalesced together.
(In reply to comment #2)
> Why does the basic block reordering pass also not handle this?

Basic block reordering is disabled with -Os. The basic block reordering algorithm is for performance only and usually increases code size, so it is not run when optimizing for size. For this specific case, though, the extra branch is removed when I compile with -O2.
Basic block reordering *should* be altered to work with -Os. It's definitely required to clean up silly jump sequences that exception handling creates. There's no reason, in principle, that the ordering algorithm can't be adjusted to minimize the number of branches, or the total size of all of the branch instructions (though that could require some clever changes to the shorten_branches code).
Re. comment #5 -- rth, any suggestions what an algorithm would look like to "minimize the number of branches, or the total size of all of the branch instructions"? And what do you mean with "some clever changes to the shorten_branches code"? Did you mean the calculation of insn sizes, or avoiding padding, or something else completely? Hope you can help a bit...
First off, I think cselim should also be run at -Os. And then we have the issue of tree-ssa-tail-merge not running at -Os.
Ehm, why does tree tail-merge not run at -Os? It's a size optimization, after all!
May be fixed by the patch that was applied for PR54364.
Resulting code from trunk r193358:

primal_update_flow:
        push    {r4, r5, lr}
        mov     r3, #1
        mov     r4, #0
.L2:
        cmp     r0, r2
        beq     .L11
        ldr     r5, [r0]
        cmp     r5, #0
        beq     .L3
        str     r4, [r0, #8]
        b       .L4
.L3:
        str     r3, [r0, #8]
.L4:
        ldr     r0, [r0, #4]
        b       .L2
.L11:
        mov     r0, #1
.L6:
        cmp     r1, r2
        beq     .L12
        ldr     r3, [r1]
        cmp     r3, #0
        beq     .L7
        str     r0, [r1, #8]
        b       .L8
.L7:
        str     r3, [r1, #8]
.L8:
        ldr     r1, [r1, #4]
        b       .L6
.L12:
        @ sp needed
        pop     {r4, r5, pc}

So the issue of comment #0 is fixed. With "-march=armv7 -mthumb" the two stores to "[r0, #8]" are properly if-converted; I don't see anything that could be improved further. So, fixed.
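For reference, with -march=armv7 -mthumb the if-converted diamond in each loop collapses into a Thumb-2 IT block with conditional stores. An illustrative, hand-written shape for the first loop body (not the actual compiler output, which is not quoted in this report) would be roughly:

        ldr     r5, [r0]
        cmp     r5, #0
        ite     ne
        strne   r4, [r0, #8]    @ taken when [r0] != 0
        streq   r3, [r0, #8]    @ taken when [r0] == 0
        ldr     r0, [r0, #4]

This removes the two branches around the conditional stores.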