Serious code size regression from 3.0.2 to now part two

tm tm@mail.kloo.net
Mon Jul 29 21:12:00 GMT 2002


On Fri, 26 Jul 2002, Joern Rennecke wrote:

> tm wrote:
> > 
> > Okay, I've started using -fno-reorder-blocks on my testcase map_fog.i, and
> > the code size is still about 10% worse than 3.0.x.
> > 
> > I think I've tracked this down to really bad branches being generated by
> > gcc. Take a look at this code sequence:
> > 
> >  5124                   .L320:
> >  5125 2a02 4011                 cmp/pz  r0
> >  5126 2a04 8F0C                 bf/s    .L322
> >  5127 2a06 6813                 mov     r1,r8
> >  5128 2a08 9226                 mov.w   .L683,r2
> >  5129 2a0a 3027                 cmp/gt  r2,r0
> >  5130 2a0c 8F09                 bf/s    .L323
> >  5131 2a0e 6103                 mov     r0,r1
> >  5132 2a10 A007                 bra     .L323
> >  5133 2a12 6123                 mov     r2,r1
> >  5134 2a14 00090009             .align 5
> >  5134      00090009
> >  5134      00090009
> >  5135                   .L322:
> >  5136 2a20 E100                 mov     #0,r1
> >  5137                   .L323:
> >  5138 2a22 6013                 mov     r1,r0
> >  5139 2a24 4818                 shll8   r8
> > ..
> > 
> > This is really twisted branch logic.
> 
> It appears the code is geared towards the r0 < 0 case.  Assuming both
> r1 and r0 need to contain the result, optimized code would be:
>  cmp/pz r0
>  mov r1,r8
>  bf/s .L322
>  mov #0,r1
>  mov.w .L683,r2
>  mov r0,r1
>  cmp/gt r2,r0
>  bf L322
>  mov r2,r1
> L323:
>  mov r1,r0
> L322:

I've tracked down the problem, and it appears jump_optimize() in 3.0.4 did
a really good job of optimizing the branches.

For this sample:

(insn 7680 7679 7681 (set (reg:SI 18 t)
        (ge:SI (reg/v:SI 64)
            (const_int 0 [0x0]))) -1 (nil)
    (nil))

(jump_insn 7681 7680 7685 (set (pc)
        (if_then_else (eq (reg:SI 18 t)
                (const_int 0 [0x0]))
            (label_ref 7695)
            (pc))) -1 (nil)
    (nil))

(insn 7685 7681 7687 (set (reg:SI 3207)
        (reg/v:SI 64)) -1 (nil)
    (nil))

(insn 7687 7685 7688 (set (reg:SI 3209)
        (const_int 255 [0xff])) -1 (nil)
    (expr_list:REG_EQUAL (const_int 255 [0xff])
        (nil)))

(insn 7688 7687 7689 (set (reg:SI 18 t)
        (gt:SI (reg:SI 3207)
            (reg:SI 3209))) -1 (nil)
    (nil))

(jump_insn 7689 7688 7691 (set (pc)
        (if_then_else (eq (reg:SI 18 t)
                (const_int 0 [0x0]))
            (label_ref 7692)
            (pc))) -1 (nil)
    (nil))

(insn 7691 7689 7692 (set (reg:SI 3207)
        (const_int 255 [0xff])) -1 (nil)
    (nil))

(code_label 7692 7691 7693 324 "" "" [0 uses])

(jump_insn 7693 7692 7694 (set (pc)
        (label_ref 7698)) -1 (nil)
    (nil))

(barrier 7694 7693 7695)

(code_label 7695 7694 7697 322 "" "" [0 uses])

(insn 7697 7695 7698 (set (reg:SI 3207)
        (const_int 0 [0x0])) -1 (nil)
    (nil))

(code_label 7698 7697 7700 323 "" "" [0 uses])

...it appears jump_optimize() makes ten passes over the function and hoists
insn 7697 to a different location, which simplifies jump_insn 7693 into a
jump to the next instruction; that jump is then immediately deleted.

It appears cfg_cleanup() in GCC CVS is the replacement for
jump_optimize(), but it does not optimize as well as its predecessor:
it fails to simplify the branching here.

Is it reasonable to expect cfg_cleanup to perform this optimization?

Toshi



