Serious code size regression from 3.0.2 to now part two
tm
tm@mail.kloo.net
Mon Jul 29 21:12:00 GMT 2002
On Fri, 26 Jul 2002, Joern Rennecke wrote:
> tm wrote:
> >
> > Okay, I've started using -fno-reorder-blocks on my testcase map_fog.i, and
> > the code size is still about 10% worse than 3.0.x.
> >
> > I think I've tracked this down to really bad branches being generated by
> > gcc. Take a look at this code sequence:
> >
> > 5124 .L320:
> > 5125 2a02 4011 cmp/pz r0
> > 5126 2a04 8F0C bf/s .L322
> > 5127 2a06 6813 mov r1,r8
> > 5128 2a08 9226 mov.w .L683,r2
> > 5129 2a0a 3027 cmp/gt r2,r0
> > 5130 2a0c 8F09 bf/s .L323
> > 5131 2a0e 6103 mov r0,r1
> > 5132 2a10 A007 bra .L323
> > 5133 2a12 6123 mov r2,r1
> > 5134 2a14 00090009 .align 5
> > 5134 00090009
> > 5134 00090009
> > 5135 .L322:
> > 5136 2a20 E100 mov #0,r1
> > 5137 .L323:
> > 5138 2a22 6013 mov r1,r0
> > 5139 2a24 4818 shll8 r8
> > ..
> >
> > This is really twisted branch logic.
>
> It appears the code is geared towards the r0 < 0 case. Assuming both
> r1 and r0 need to contain the result, optimized code would be:
> 	cmp/pz	r0
> mov r1,r8
> bf/s .L322
> mov #0,r1
> mov.w .L683,r2
> mov r0,r1
> cmp/gt r2,r0
> 	bf	.L322
> 	mov	r2,r1
> .L323:
> 	mov	r1,r0
> .L322:
I've tracked down the problem, and it appears jump_optimize() in 3.0.4 did
a really good job of optimizing the branches.
For this sample:
(insn 7680 7679 7681 (set (reg:SI 18 t)
(ge:SI (reg/v:SI 64)
(const_int 0 [0x0]))) -1 (nil)
(nil))
(jump_insn 7681 7680 7685 (set (pc)
(if_then_else (eq (reg:SI 18 t)
(const_int 0 [0x0]))
(label_ref 7695)
(pc))) -1 (nil)
(nil))
(insn 7685 7681 7687 (set (reg:SI 3207)
(reg/v:SI 64)) -1 (nil)
(nil))
(insn 7687 7685 7688 (set (reg:SI 3209)
(const_int 255 [0xff])) -1 (nil)
(expr_list:REG_EQUAL (const_int 255 [0xff])
(nil)))
(insn 7688 7687 7689 (set (reg:SI 18 t)
(gt:SI (reg:SI 3207)
(reg:SI 3209))) -1 (nil)
(nil))
(jump_insn 7689 7688 7691 (set (pc)
(if_then_else (eq (reg:SI 18 t)
(const_int 0 [0x0]))
(label_ref 7692)
(pc))) -1 (nil)
(nil))
(insn 7691 7689 7692 (set (reg:SI 3207)
(const_int 255 [0xff])) -1 (nil)
(nil))
(code_label 7692 7691 7693 324 "" "" [0 uses])
(jump_insn 7693 7692 7694 (set (pc)
(label_ref 7698)) -1 (nil)
(nil))
(barrier 7694 7693 7695)
(code_label 7695 7694 7697 322 "" "" [0 uses])
(insn 7697 7695 7698 (set (reg:SI 3207)
(const_int 0 [0x0])) -1 (nil)
(nil))
(code_label 7698 7697 7700 323 "" "" [0 uses])
...it appears that jump_optimize() makes ten passes over the function and
hoists insn 7697 to a different location, which simplifies jump_insn 7693
into a jump to the next instruction that is then immediately deleted.
It appears that cfg_cleanup() in GCC CVS is the replacement for
jump_optimize(), but it does not optimize as well as its predecessor:
it fails to simplify this branching.
Is it reasonable to expect cfg_cleanup to perform this optimization?
Toshi