Bug 60537

Summary: Loop header copying code bloat for simple loops that don't benefit
Product: gcc
Reporter: Oleg Endo <olegendo>
Component: tree-optimization
Assignee: Not yet assigned to anyone <unassigned>
Status: NEW ---
Severity: enhancement
CC: hp
Priority: P3
Keywords: missed-optimization
Version: 4.9.0
Target Milestone: ---
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55190
Host:
Target: sh*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-03-17 00:00:00

Description Oleg Endo 2014-03-15 13:10:14 UTC
I noticed this on SH; it may also apply to other targets (checked on 4.9 r208241).

The following simple loop (simple strlen implementation):

unsigned int test (const char* s0)
{
  const char* s1 = s0;

  while (*s1)
    s1++;

  return s1 - s0;
}

With -O2 -m4 gets compiled to:

        mov.b   @r4,r1
        tst     r1,r1
        bt/s    .L4
        mov     r4,r1
        add     #1,r1
        .align 2
.L3:
        mov     r1,r0
        mov.b   @r0,r2
        tst     r2,r2
        bf/s    .L3
        add     #1,r1
        rts
        sub     r4,r0
        .align 1
.L4:
        rts
        mov     #0,r0


With -Os -m4 it is basically just the inner loop:

        mov     r4,r1
.L2:
        mov     r1,r0
        mov.b   @r0,r2
        tst     r2,r2
        bf/s    .L2
        add     #1,r1
        rts
        sub     r4,r0


The additional loop test in the loop header in the -O2 version seems a bit pointless.  If the loop exits at the first iteration, it simply falls through.  The additional test and jump around the loop don't gain anything in this case and just increase code size unnecessarily.

Comment 1 Richard Biener 2014-03-17 09:43:36 UTC
For -O2 we do this to enable loop optimizations which almost all require
do { } while style loops.  This canonicalization can sometimes peel an
entire iteration as you can see here, and this canonicalization is 
not done at -Os unless the loop is determined as hot (so with -Os
and profile-feedback some loops may get this treatment).
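
For illustration, the canonicalization can be approximated at source level as follows.  This is only a hand-written sketch: GCC performs the transform on its internal representation, not on C source, and the function names test_while and test_copied_header are made up for this example.  The first function is the loop from the description; the second duplicates the exit test in front of the loop so that the loop body becomes a do { } while loop.

```c
/* Original form: the exit test sits at the top of the loop.  */
unsigned int test_while (const char* s0)
{
  const char* s1 = s0;

  while (*s1)
    s1++;

  return s1 - s0;
}

/* Hand-written approximation of the loop after header copying
   (hypothetical name, not GCC output): the exit test is duplicated
   in front of the loop, and the loop itself becomes a do { } while
   loop that is entered only when at least one iteration will run.  */
unsigned int test_copied_header (const char* s0)
{
  const char* s1 = s0;

  if (*s1)
    {
      do
        s1++;
      while (*s1);
    }

  return s1 - s0;
}
```

Both functions return the same value for every input string.  The extra if corresponds to the additional test and branch in front of .L3 in the -O2 output above.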

It's hard to undo this transform but that's what would be needed here ...
(or make more passes deal with number-of-iterations == n or zero)