Bug 60537 - Loop header copying code bloat for simple loops that don't benefit
Summary: Loop header copying code bloat for simple loops that don't benefit
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.9.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2014-03-15 13:10 UTC by Oleg Endo
Modified: 2021-11-29 08:14 UTC

See Also:
Host:
Target: sh*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-03-17 00:00:00


Description Oleg Endo 2014-03-15 13:10:14 UTC
I noticed this on SH; it may also apply to other targets (checked on 4.9 r208241).

The following simple loop (simple strlen implementation):

unsigned int test (const char* s0)
{
  const char* s1 = s0;

  while (*s1)
    s1++;

  return s1 - s0;
}

With -O2 -m4 gets compiled to:

        mov.b   @r4,r1
        tst     r1,r1
        bt/s    .L4
        mov     r4,r1
        add     #1,r1
        .align 2
.L3:
        mov     r1,r0
        mov.b   @r0,r2
        tst     r2,r2
        bf/s    .L3
        add     #1,r1
        rts
        sub     r4,r0
        .align 1
.L4:
        rts
        mov     #0,r0


With -Os -m4 it is basically just the inner loop:
        mov     r4,r1
.L2:
        mov     r1,r0
        mov.b   @r0,r2
        tst     r2,r2
        bf/s    .L2
        add     #1,r1
        rts
        sub     r4,r0


The additional loop test in the loop header in the -O2 version seems a bit pointless.  If the loop exits at the first iteration, it simply falls through.  The additional test and the jump around the loop don't gain anything in this case; they just increase code size unnecessarily.
Comment 1 Richard Biener 2014-03-17 09:43:36 UTC
For -O2 we do this to enable loop optimizations, almost all of which
require do { } while style loops.  This canonicalization can sometimes
peel an entire iteration, as you can see here.  It is not done at -Os
unless the loop is determined to be hot (so with -Os and
profile feedback some loops may get this treatment).

It's hard to undo this transform but that's what would be needed here ...
(or make more passes deal with number-of-iterations == n or zero)
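
For illustration, here is a hand-written C sketch of what the canonicalization amounts to at the source level for the test case from the description. It is only an approximation of the idea, not GCC's actual output or internal representation, and the function names test_while and test_copied_header are made up for this example. The copied header test corresponds to the extra mov.b/tst/bt/s sequence ahead of .L3 in the -O2 code.

```c
/* Original form: the exit test sits at the top of the loop.  */
unsigned int test_while (const char* s0)
{
  const char* s1 = s0;

  while (*s1)
    s1++;

  return s1 - s0;
}

/* Sketch of the form after loop header copying (hypothetical,
   written by hand): the header test is duplicated as a guard in
   front of the loop, and the loop itself becomes do { } while.  */
unsigned int test_copied_header (const char* s0)
{
  const char* s1 = s0;

  if (*s1)            /* copied header test, runs once */
    {
      do
        s1++;
      while (*s1);    /* exit test now at the bottom */
    }

  return s1 - s0;
}
```

Both functions return the same result for every input. The second form executes the guard once and then enters a loop body that is known to run at least one time, which is the shape the loop optimizers want. For a loop this small the guard is pure overhead, which is the code size cost reported here.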