Bug 65483

Summary: bzip2 bsR/bsW should be auto-inlined
Product: gcc Reporter: Jan Hubicka <hubicka>
Component: ipa    Assignee: Not yet assigned to anyone <unassigned>
Status: UNCONFIRMED ---    
Severity: normal CC: izamyatin, jeffreyalaw, rguenth, rth
Priority: P3 Keywords: missed-optimization
Version: 5.0   
Target Milestone: ---   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed:
Bug Depends on:    
Bug Blocks: 26163, 84613    

Description Jan Hubicka 2015-03-20 02:51:46 UTC
bzip2 contains:
INLINE UInt32 bsR ( Int32 n )
{
   UInt32 v;
   bsNEEDR ( n );
   v = (bsBuff >> (bsLive-n)) & ((1 << n)-1);
   bsLive -= n;
   return v;
}

and

INLINE void bsW ( Int32 n, UInt32 v )
{
   bsNEEDW ( n );
   bsBuff |= (v << (32 - bsLive - n));
   bsLive += n;
}

which should both be inlined. INLINE, however, is defined to nothing for SPEC.
The catch is that we instead inline fgetc/fputc (in the SPEC build, fgetc is spec_getc) into these functions via the following macros:

#define bsNEEDR(nz)                           \
{                                             \
   while (bsLive < nz) {                      \
      Int32 zzi = fgetc ( bsStream );         \
      if (zzi == EOF) compressedStreamEOF();  \
      bsBuff = (bsBuff << 8) | (zzi & 0xffL); \
      bsLive += 8;                            \
   }                                          \
}


/*---------------------------------------------*/
#define bsNEEDW(nz)                           \
{                                             \
   while (bsLive >= 8) {                      \
      fputc ( (UChar)(bsBuff >> 24),          \
               bsStream );                    \
      bsBuff <<= 8;                           \
      bsLive -= 8;                            \
      bytesOut++;                             \
   }                                          \
}
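For readers without the bzip2 sources at hand, here is a minimal, self-contained sketch of the bit buffer that bsR/bsW and the bsNEEDR/bsNEEDW macros implement. It is not the SPEC code. It assumes an in-memory byte array with hypothetical mem_getc/mem_putc helpers in place of bsStream/fgetc/fputc, drops the bytesOut counter, adds a hypothetical bsFlushW for the final partial byte, and restricts n to 1..24. The writer keeps pending bits at the top of bsBuff, the reader at the bottom.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t UInt32;
typedef int32_t  Int32;
typedef unsigned char UChar;

/* Hypothetical in-memory stream replacing bsStream + fgetc/fputc. */
static UChar mem[64];
static Int32 memLen = 0, memPos = 0;

static UInt32 bsBuff = 0;   /* bit buffer */
static Int32  bsLive = 0;   /* number of valid bits in bsBuff */

static int  mem_getc(void)    { return memPos < memLen ? mem[memPos++] : EOF; }
static void mem_putc(UChar c) { if (memLen < (Int32)sizeof mem) mem[memLen++] = c; }

static void compressedStreamEOF(void)
{
   fputs("compressed stream ended early\n", stderr);
   exit(1);
}

/* Append v (which must fit in n bits), most significant bit first. */
static inline void bsW(Int32 n, UInt32 v)
{
   assert(n >= 1 && n <= 24);
   while (bsLive >= 8) {              /* bsNEEDW: flush complete bytes */
      mem_putc((UChar)(bsBuff >> 24));
      bsBuff <<= 8;
      bsLive -= 8;
   }
   bsBuff |= (v << (32 - bsLive - n));
   bsLive += n;
}

/* Hypothetical helper: write out the remaining bits, zero-padded. */
static void bsFlushW(void)
{
   while (bsLive > 0) {
      mem_putc((UChar)(bsBuff >> 24));
      bsBuff <<= 8;
      bsLive -= 8;
   }
   bsBuff = 0;
   bsLive = 0;
}

/* Read the next n bits, most significant bit first. */
static inline UInt32 bsR(Int32 n)
{
   UInt32 v;
   assert(n >= 1 && n <= 24);
   while (bsLive < n) {               /* bsNEEDR: refill one byte at a time */
      int zzi = mem_getc();
      if (zzi == EOF) compressedStreamEOF();
      bsBuff = (bsBuff << 8) | (UInt32)(zzi & 0xff);
      bsLive += 8;
   }
   v = (bsBuff >> (bsLive - n)) & ((1u << n) - 1);
   bsLive -= n;
   return v;
}
```

Writing 3, 8 and 24 bits produces 35 bits, i.e. five bytes after bsFlushW, and reading them back with the same widths returns the original values. Each refill iteration in bsR calls the byte source once; in the SPEC build that call is spec_getc.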

Considering spec_getc/285 with 33 size
 to be inlined into bsR/98 in unknown:-1
 Estimated badness is -21.814074, frequency 21.04.
    Badness calculation for bsR/98 -> spec_getc/285
      size growth 27, time 22 inline hints: cross_module big_speedup
      -10.907037: guessed profile. frequency 21.035000, count 0 caller count 0 time w/o inlining 1063.840001, time w inlining 769.350000 overall growth 0 (current) 0 (original)
      Adjusted by hints -21.814074
                Accounting size:20.00, time:304.69 on predicate:(true)
...
 Inlined into bsR which now has time 767 and size 55,net change of +27.

which makes bsR reach the inline-insns-auto limit.

bsR is estimated as:

Inline summary for bsR/98 inlinable
  self time:       559
  global time:     0
  self size:       28
  global size:     0
  min size:       0
  self stack:      0
  global stack:    0
    size:21.000000, time:304.328000, predicate:(true)
    size:3.000000, time:1.982000, predicate:(not inlined)
  calls:
    compressedStreamEOF/143 function not considered for inlining
      loop depth: 0 freq:   8 size: 1 time: 10 callee size:12 stack: 0
    spec_getc/153 function body not available
      loop depth: 1 freq:21035 size: 3 time: 12 callee size: 0 stack: 0

spec_getc is implemented as:

int spec_getc (int fd) {
    int rc = 0;
    debug1(4,"spec_getc: %d = ", fd);
    if (fd > MAX_SPEC_FD) {
        fprintf(stderr, "spec_read: fd=%d, > MAX_SPEC_FD!\n", fd);
        exit (1);
    }
    if (spec_fd[fd].pos >= spec_fd[fd].len) {
        debug(4,"EOF\n");
        return EOF;
    }
    rc = spec_fd[fd].buf[spec_fd[fd].pos++];
    debug1(4,"%d\n", rc);
    return rc;
}

However, we split the error handling out into spec_getc.part and get:

Inline summary for spec_getc/38 inlinable
  self time:       24
  global time:     0
  self size:       33
  global size:     0
  min size:       0
  self stack:      0
  global stack:    0
    size:20.000000, time:14.485000, predicate:(true)
    size:3.000000, time:1.998000, predicate:(not inlined)

which makes it quite a good inline candidate, especially because the call appears within what we consider an internal loop of bsR.
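To illustrate what the split means at source level, here is a hand-written approximation: a small hot function that keeps the fd check and the buffer read, and a separate cold function holding the fprintf/exit path. This is only a sketch of the idea, not GCC's actual output. The names spec_getc_split and spec_getc_bad_fd are hypothetical, the debug1/debug tracing is dropped, MAX_SPEC_FD is assumed to be 3 (from the `cmp $0x3` in the disassembly in comment 2), the spec_fd_t layout is inferred from the 0x4/0x8/0x10 offsets used there, and noinline/noreturn/cold are GCC/Clang attributes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_SPEC_FD 3   /* assumed, matching "cmp $0x3" in the disassembly */

/* Assumed layout: len at offset 4, pos at 8, buf at 16 (24-byte stride). */
struct spec_fd_t {
    int limit;
    int len;
    int pos;
    unsigned char *buf;
};
static struct spec_fd_t spec_fd[MAX_SPEC_FD + 1];

/* Cold part: roughly what ends up out of line in spec_getc.part.N. */
static __attribute__((noinline, noreturn, cold)) void spec_getc_bad_fd(int fd)
{
    fprintf(stderr, "spec_read: fd=%d, > MAX_SPEC_FD!\n", fd);
    exit(1);
}

/* Hot part: small enough to be a cheap inline candidate. */
static inline int spec_getc_split(int fd)
{
    assert(fd >= 0);                  /* sketch-only sanity check */
    if (fd > MAX_SPEC_FD)
        spec_getc_bad_fd(fd);         /* rarely taken */
    if (spec_fd[fd].pos >= spec_fd[fd].len)
        return EOF;
    return spec_fd[fd].buf[spec_fd[fd].pos++];
}
```

In GCC this outlining is done automatically by the function-splitting pass (-fpartial-inlining); writing it by hand only shows why the remaining body is cheap to inline. Because buf is unsigned char, a 0xff byte comes back as 255 rather than colliding with EOF.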

Apparently clang gets lucky here because it inlines more at compile time, and spec_getc lives in a different translation unit.
Comment 1 Jan Hubicka 2015-03-20 03:15:42 UTC
Benchmarking a build with -O3 -flto -Ofast -funroll-loops.

For mainline (running on input.graphic) I get:

real    0m35.673s
user    0m35.556s
sys     0m0.133s

and with --param early-inlining-insns=80, which gets bsR/bsW inlined before LTO:

real    0m31.975s
user    0m31.867s
sys     0m0.124s

-fno-ipa-cp:

real    0m34.232s
user    0m34.132s
sys     0m0.117s


For GCC 4.9 I get:

real    0m32.719s
user    0m32.615s
sys     0m0.124s

Oddly enough, GCC 4.9 does not inline bsR/bsW either.
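The relative differences implied by the "real" times above can be computed directly. A small sketch, treating each configuration's single reported run as representative (no repeat runs or variance are given, so small differences may be noise):

```c
#include <assert.h>
#include <stdio.h>

/* "real" times reported above, in seconds (one run each). */
struct run { const char *config; double real_s; };

static const struct run runs[] = {
    { "mainline",                35.673 },
    { "early-inlining-insns=80", 31.975 },
    { "-fno-ipa-cp",             34.232 },
    { "GCC 4.9",                 32.719 },
};

/* Percentage of run time saved relative to the baseline (positive = faster). */
static double pct_faster(double baseline_s, double t_s)
{
    assert(baseline_s > 0.0);
    return 100.0 * (baseline_s - t_s) / baseline_s;
}

/* Print each configuration against the mainline baseline. */
static void print_comparison(void)
{
    for (size_t i = 1; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-24s %7.3f s  %5.1f%% faster than %s\n",
               runs[i].config, runs[i].real_s,
               pct_faster(runs[0].real_s, runs[i].real_s), runs[0].config);
}
```

Relative to mainline this gives roughly 10.4% for early-inlining-insns=80, 4.0% for -fno-ipa-cp and 8.3% for GCC 4.9, so the early-inlining build is also faster than the GCC 4.9 one.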
Comment 2 Jan Hubicka 2015-03-20 03:30:41 UTC
The difference between 4.9 and 5.0 seems to be the unrolling of the decoder loop and the resulting increase in register pressure.

4.9 does:
0000000000406d60 <bsR>:
  406d60:       8b 35 32 14 01 00       mov    0x11432(%rip),%esi        # 418198 <bsLive>
  406d66:       53                      push   %rbx
  406d67:       8b 05 27 14 01 00       mov    0x11427(%rip),%eax        # 418194 <bsBuff>
  406d6d:       39 f7                   cmp    %esi,%edi
  406d6f:       0f 8e f3 00 00 00       jle    406e68 <bsR+0x108>
  406d75:       48 63 05 20 14 01 00    movslq 0x11420(%rip),%rax        # 41819c <bsStream>
  406d7c:       83 f8 03                cmp    $0x3,%eax
  406d7f:       0f 8f 03 01 00 00       jg     406e88 <bsR+0x128>
  406d85:       4c 8d 0c 40             lea    (%rax,%rax,2),%r9
  406d89:       49 c1 e1 03             shl    $0x3,%r9
  406d8d:       49 8d 91 c0 81 41 00    lea    0x4181c0(%r9),%rdx
  406d94:       8b 4a 08                mov    0x8(%rdx),%ecx
  406d97:       39 4a 04                cmp    %ecx,0x4(%rdx)
  406d9a:       0f 8e c0 00 00 00       jle    406e60 <bsR+0x100>
  406da0:       44 8d 56 08             lea    0x8(%rsi),%r10d
  406da4:       89 fe                   mov    %edi,%esi
  406da6:       48 63 d9                movslq %ecx,%rbx
  406da9:       48 03 5a 10             add    0x10(%rdx),%rbx
  406dad:       8b 05 e1 13 01 00       mov    0x113e1(%rip),%eax        # 418194 <bsBuff>
  406db3:       44 8d 59 01             lea    0x1(%rcx),%r11d
  406db7:       44 29 d6                sub    %r10d,%esi
  406dba:       83 c6 07                add    $0x7,%esi
  406dbd:       83 e6 08                and    $0x8,%esi
  406dc0:       74 3e                   je     406e00 <bsR+0xa0>
  406dc2:       45 89 d8                mov    %r11d,%r8d
  406dc5:       44 89 5a 08             mov    %r11d,0x8(%rdx)
  406dc9:       44 0f b6 1b             movzbl (%rbx),%r11d
  406dcd:       c1 e0 08                shl    $0x8,%eax
  406dd0:       44 89 d6                mov    %r10d,%esi
  406dd3:       44 89 15 be 13 01 00    mov    %r10d,0x113be(%rip)        # 418198 <bsLive>
  406dda:       44 09 d8                or     %r11d,%eax
  406ddd:       44 39 d7                cmp    %r10d,%edi
  406de0:       89 05 ae 13 01 00       mov    %eax,0x113ae(%rip)        # 418194 <bsBuff>
  406de6:       0f 8e 7c 00 00 00       jle    406e68 <bsR+0x108>
  406dec:       41 83 c2 08             add    $0x8,%r10d
  406df0:       48 83 c3 01             add    $0x1,%rbx
  406df4:       44 39 42 04             cmp    %r8d,0x4(%rdx)
  406df8:       44 8d 59 02             lea    0x2(%rcx),%r11d
  406dfc:       7e 62                   jle    406e60 <bsR+0x100>
  406dfe:       66 90                   xchg   %ax,%ax
  406e00:       49 8d 91 c0 81 41 00    lea    0x4181c0(%r9),%rdx
  406e07:       c1 e0 08                shl    $0x8,%eax
  406e0a:       44 89 d6                mov    %r10d,%esi
  406e0d:       44 89 5a 08             mov    %r11d,0x8(%rdx)
  406e11:       0f b6 0b                movzbl (%rbx),%ecx
  406e14:       44 89 15 7d 13 01 00    mov    %r10d,0x1137d(%rip)        # 418198 <bsLive>
  406e1b:       09 c8                   or     %ecx,%eax
  406e1d:       44 39 d7                cmp    %r10d,%edi
  406e20:       89 05 6e 13 01 00       mov    %eax,0x1136e(%rip)        # 418194 <bsBuff>
  406e26:       7e 40                   jle    406e68 <bsR+0x108>
  406e28:       44 39 5a 04             cmp    %r11d,0x4(%rdx)
  406e2c:       45 8d 42 08             lea    0x8(%r10),%r8d
  406e30:       41 8d 73 01             lea    0x1(%r11),%esi
  406e34:       7e 2a                   jle    406e60 <bsR+0x100>
  406e36:       89 72 08                mov    %esi,0x8(%rdx)
  406e39:       0f b6 4b 01             movzbl 0x1(%rbx),%ecx
  406e3d:       c1 e0 08                shl    $0x8,%eax
  406e40:       41 83 c2 10             add    $0x10,%r10d
  406e44:       41 83 c3 02             add    $0x2,%r11d
  406e48:       48 83 c3 02             add    $0x2,%rbx
  406e4c:       44 89 05 45 13 01 00    mov    %r8d,0x11345(%rip)        # 418198 <bsLive>
  406e53:       09 c8                   or     %ecx,%eax
  406e55:       39 72 04                cmp    %esi,0x4(%rdx)
  406e58:       89 05 36 13 01 00       mov    %eax,0x11336(%rip)        # 418194 <bsBuff>
  406e5e:       7f a0                   jg     406e00 <bsR+0xa0>
  406e60:       e8 3b 28 00 00          callq  4096a0 <compressedStreamEOF>
  406e65:       0f 1f 00                nopl   (%rax)
  406e68:       89 f1                   mov    %esi,%ecx
  406e6a:       41 b9 01 00 00 00       mov    $0x1,%r9d
  406e70:       29 f9                   sub    %edi,%ecx
  406e72:       d3 e8                   shr    %cl,%eax
  406e74:       89 0d 1e 13 01 00       mov    %ecx,0x1131e(%rip)        # 418198 <bsLive>
  406e7a:       89 f9                   mov    %edi,%ecx
  406e7c:       41 d3 e1                shl    %cl,%r9d
  406e7f:       41 83 e9 01             sub    $0x1,%r9d
  406e83:       44 21 c8                and    %r9d,%eax
  406e86:       5b                      pop    %rbx
  406e87:       c3                      retq   
  406e88:       89 c7                   mov    %eax,%edi
  406e8a:       e8 35 9c ff ff          callq  400ac4 <spec_getc.part.1.lto_priv.1>
  406e8f:       90                      nop

While 5.0 does:
00000000004071e0 <bsR>:
  4071e0:       41 55                   push   %r13
  4071e2:       41 54                   push   %r12
  4071e4:       55                      push   %rbp
  4071e5:       53                      push   %rbx
  4071e6:       48 83 ec 08             sub    $0x8,%rsp
  4071ea:       8b 05 dc a5 01 00       mov    0x1a5dc(%rip),%eax        # 4217cc <bsLive>
  4071f0:       8b 15 da a5 01 00       mov    0x1a5da(%rip),%edx        # 4217d0 <bsBuff>
  4071f6:       39 c7                   cmp    %eax,%edi
  4071f8:       0f 8e 92 01 00 00       jle    407390 <bsR+0x1b0>
  4071fe:       48 63 15 cf a5 01 00    movslq 0x1a5cf(%rip),%rdx        # 4217d4 <bsStream>
  407205:       41 89 fc                mov    %edi,%r12d
  407208:       83 fa 03                cmp    $0x3,%edx
  40720b:       0f 8f de 01 00 00       jg     4073ef <bsR+0x20f>
  407211:       48 8d 0c 52             lea    (%rdx,%rdx,2),%rcx
  407215:       48 8d 1c cd 80 17 42    lea    0x421780(,%rcx,8),%rbx
  40721c:       00 
  40721d:       8b 6b 08                mov    0x8(%rbx),%ebp
  407220:       44 8b 5b 04             mov    0x4(%rbx),%r11d
  407224:       41 39 eb                cmp    %ebp,%r11d
  407227:       0f 8e 53 01 00 00       jle    407380 <bsR+0x1a0>
  40722d:       44 8d 48 08             lea    0x8(%rax),%r9d
  407231:       41 89 fd                mov    %edi,%r13d
  407234:       4c 63 d5                movslq %ebp,%r10
  407237:       41 83 c3 01             add    $0x1,%r11d
  40723b:       4c 03 53 10             add    0x10(%rbx),%r10
  40723f:       8b 15 8b a5 01 00       mov    0x1a58b(%rip),%edx        # 4217d0 <bsBuff>
  407245:       45 29 cd                sub    %r9d,%r13d
  407248:       8d 75 01                lea    0x1(%rbp),%esi
  40724b:       41 83 c5 07             add    $0x7,%r13d
  40724f:       41 c1 ed 03             shr    $0x3,%r13d
  407253:       41 83 e5 03             and    $0x3,%r13d
  407257:       0f 84 a0 00 00 00       je     4072fd <bsR+0x11d>
  40725d:       89 73 08                mov    %esi,0x8(%rbx)
  407260:       41 0f b6 32             movzbl (%r10),%esi
  407264:       c1 e2 08                shl    $0x8,%edx
  407267:       44 89 c8                mov    %r9d,%eax
  40726a:       44 89 0d 5b a5 01 00    mov    %r9d,0x1a55b(%rip)        # 4217cc <bsLive>
  407271:       09 f2                   or     %esi,%edx
  407273:       44 39 cf                cmp    %r9d,%edi
  407276:       89 15 54 a5 01 00       mov    %edx,0x1a554(%rip)        # 4217d0 <bsBuff>
  40727c:       0f 8e 0e 01 00 00       jle    407390 <bsR+0x1b0>
  407282:       8d 75 02                lea    0x2(%rbp),%esi
  407285:       41 83 c1 08             add    $0x8,%r9d
  407289:       49 83 c2 01             add    $0x1,%r10
  40728d:       44 39 de                cmp    %r11d,%esi
  407290:       0f 84 ea 00 00 00       je     407380 <bsR+0x1a0>
  407296:       41 83 fd 01             cmp    $0x1,%r13d
  4072a0:       74 2e                   je     4072d0 <bsR+0xf0>
  4072a2:       89 73 08                mov    %esi,0x8(%rbx)
  4072a5:       45 0f b6 02             movzbl (%r10),%r8d
  4072a9:       8d 75 03                lea    0x3(%rbp),%esi
  4072ac:       c1 e2 08                shl    $0x8,%edx
  4072af:       44 89 0d 16 a5 01 00    mov    %r9d,0x1a516(%rip)        # 4217cc <bsLive>
  4072b6:       49 83 c2 01             add    $0x1,%r10
  4072ba:       41 83 c1 08             add    $0x8,%r9d
  4072be:       44 09 c2                or     %r8d,%edx
  4072c1:       44 39 de                cmp    %r11d,%esi
  4072c4:       89 15 06 a5 01 00       mov    %edx,0x1a506(%rip)        # 4217d0 <bsBuff>
  4072ca:       0f 84 b0 00 00 00       je     407380 <bsR+0x1a0>
  4072d0:       89 73 08                mov    %esi,0x8(%rbx)
  4072d3:       41 0f b6 0a             movzbl (%r10),%ecx
  4072d7:       c1 e2 08                shl    $0x8,%edx
  4072da:       83 c6 01                add    $0x1,%esi
  4072dd:       44 89 0d e8 a4 01 00    mov    %r9d,0x1a4e8(%rip)        # 4217cc <bsLive>
  4072e4:       49 83 c2 01             add    $0x1,%r10
  4072e8:       41 83 c1 08             add    $0x8,%r9d
  4072ec:       09 ca                   or     %ecx,%edx
  4072ee:       44 39 de                cmp    %r11d,%esi
  4072f1:       89 15 d9 a4 01 00       mov    %edx,0x1a4d9(%rip)        # 4217d0 <bsBuff>
  4072f7:       0f 84 83 00 00 00       je     407380 <bsR+0x1a0>
  4072fd:       89 73 08                mov    %esi,0x8(%rbx)
  407300:       41 0f b6 2a             movzbl (%r10),%ebp
  407304:       c1 e2 08                shl    $0x8,%edx
  407307:       44 89 c8                mov    %r9d,%eax
  40730a:       44 89 0d bb a4 01 00    mov    %r9d,0x1a4bb(%rip)        # 4217cc <bsLive>
  407311:       09 ea                   or     %ebp,%edx
  407313:       45 39 cc                cmp    %r9d,%r12d
  407316:       89 15 b4 a4 01 00       mov    %edx,0x1a4b4(%rip)        # 4217d0 <bsBuff>
  40731c:       7e 72                   jle    407390 <bsR+0x1b0>
  40731e:       8d 46 01                lea    0x1(%rsi),%eax
  407321:       45 8d 69 08             lea    0x8(%r9),%r13d
  407325:       44 39 d8                cmp    %r11d,%eax
  407328:       74 56                   je     407380 <bsR+0x1a0>
  40732a:       89 43 08                mov    %eax,0x8(%rbx)
  40732d:       45 0f b6 42 01          movzbl 0x1(%r10),%r8d
  407332:       8d 6e 02                lea    0x2(%rsi),%ebp
  407335:       c1 e2 08                shl    $0x8,%edx
  407338:       44 89 2d 8d a4 01 00    mov    %r13d,0x1a48d(%rip)        # 4217cc <bsLive>
  40733f:       41 8d 49 10             lea    0x10(%r9),%ecx
  407343:       44 09 c2                or     %r8d,%edx
  407346:       44 39 dd                cmp    %r11d,%ebp
  407349:       89 15 81 a4 01 00       mov    %edx,0x1a481(%rip)        # 4217d0 <bsBuff>
  40734f:       74 2f                   je     407380 <bsR+0x1a0>
  407354:       45 0f b6 6a 02          movzbl 0x2(%r10),%r13d
  407359:       8d 46 03                lea    0x3(%rsi),%eax
  40735c:       c1 e2 08                shl    $0x8,%edx
  40735f:       89 0d 67 a4 01 00       mov    %ecx,0x1a467(%rip)        # 4217cc <bsLive>
  407365:       45 8d 41 18             lea    0x18(%r9),%r8d
  407369:       44 09 ea                or     %r13d,%edx
  40736c:       44 39 d8                cmp    %r11d,%eax
  40736f:       89 15 5b a4 01 00       mov    %edx,0x1a45b(%rip)        # 4217d0 <bsBuff>
  407375:       75 49                   jne    4073c0 <bsR+0x1e0>
  407377:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  40737e:       00 00 
  407380:       e8 0b e1 ff ff          callq  405490 <compressedStreamEOF>
  407385:       66 66 2e 0f 1f 84 00    data32 nopw %cs:0x0(%rax,%rax,1)
  40738c:       00 00 00 00 
  407390:       29 f8                   sub    %edi,%eax
  407392:       89 f9                   mov    %edi,%ecx
  407394:       41 bc 01 00 00 00       mov    $0x1,%r12d
  40739a:       41 d3 e4                shl    %cl,%r12d
  40739d:       89 c1                   mov    %eax,%ecx
  40739f:       89 05 27 a4 01 00       mov    %eax,0x1a427(%rip)        # 4217cc <bsLive>
  4073a5:       d3 ea                   shr    %cl,%edx
  4073a7:       48 83 c4 08             add    $0x8,%rsp
  4073ab:       41 83 ec 01             sub    $0x1,%r12d
  4073af:       89 d0                   mov    %edx,%eax
  4073b1:       5b                      pop    %rbx
  4073b2:       44 21 e0                and    %r12d,%eax
  4073b5:       5d                      pop    %rbp
  4073b6:       41 5c                   pop    %r12
  4073b8:       41 5d                   pop    %r13
  4073ba:       c3                      retq   
  4073bb:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  4073c0:       89 43 08                mov    %eax,0x8(%rbx)
  4073c3:       41 0f b6 4a 03          movzbl 0x3(%r10),%ecx
  4073c8:       c1 e2 08                shl    $0x8,%edx
  4073cb:       83 c6 04                add    $0x4,%esi
  4073ce:       41 83 c1 20             add    $0x20,%r9d
  4073d2:       49 83 c2 04             add    $0x4,%r10
  4073d6:       44 89 05 ef a3 01 00    mov    %r8d,0x1a3ef(%rip)        # 4217cc <bsLive>
  4073dd:       09 ca                   or     %ecx,%edx
  4073df:       44 39 de                cmp    %r11d,%esi
  4073e2:       89 15 e8 a3 01 00       mov    %edx,0x1a3e8(%rip)        # 4217d0 <bsBuff>
  4073e8:       74 96                   je     407380 <bsR+0x1a0>
  4073ea:       e9 0e ff ff ff          jmpq   4072fd <bsR+0x11d>
  4073ef:       89 d7                   mov    %edx,%edi
  4073f1:       e8 ce 96 ff ff          callq  400ac4 <spec_getc.part.1.lto_priv.19>

0000000000407400 <bsR.constprop.3>:
  407400:       48 83 ec 08             sub    $0x8,%rsp
  407404:       8b 0d c2 a3 01 00       mov    0x1a3c2(%rip),%ecx        # 4217cc <bsLive>
  40740a:       8b 05 c0 a3 01 00       mov    0x1a3c0(%rip),%eax        # 4217d0 <bsBuff>
  407410:       83 f9 07                cmp    $0x7,%ecx
  407413:       0f 8f 87 01 00 00       jg     4075a0 <bsR.constprop.3+0x1a0>
  407419:       48 63 3d b4 a3 01 00    movslq 0x1a3b4(%rip),%rdi        # 4217d4 <bsStream>
  407420:       83 ff 03                cmp    $0x3,%edi
  407423:       0f 8f c6 01 00 00       jg     4075ef <bsR.constprop.3+0x1ef>
  407429:       48 8d 04 7f             lea    (%rdi,%rdi,2),%rax
  40742d:       4c 8d 14 c5 80 17 42    lea    0x421780(,%rax,8),%r10
  407434:       00
  407435:       41 8b 7a 08             mov    0x8(%r10),%edi
  407439:       45 8b 4a 04             mov    0x4(%r10),%r9d
  40743d:       44 39 cf                cmp    %r9d,%edi
  407440:       0f 8d 4a 01 00 00       jge    407590 <bsR.constprop.3+0x190>
  407446:       8d 71 08                lea    0x8(%rcx),%esi
  407449:       41 bb 0f 00 00 00       mov    $0xf,%r11d
  40744f:       4c 63 c7                movslq %edi,%r8
  407452:       4d 03 42 10             add    0x10(%r10),%r8
  407456:       8b 05 74 a3 01 00       mov    0x1a374(%rip),%eax        # 4217d0 <bsBuff>
  40745c:       41 29 f3                sub    %esi,%r11d
  40745f:       41 c1 eb 03             shr    $0x3,%r11d
  407463:       41 83 e3 03             and    $0x3,%r11d
  407467:       0f 84 9e 00 00 00       je     40750b <bsR.constprop.3+0x10b>
  40746d:       83 c7 01                add    $0x1,%edi
  407470:       c1 e0 08                shl    $0x8,%eax
  407473:       41 89 7a 08             mov    %edi,0x8(%r10)
  407477:       41 0f b6 08             movzbl (%r8),%ecx
  40747b:       89 35 4b a3 01 00       mov    %esi,0x1a34b(%rip)        # 4217cc <bsLive>
  407481:       09 c8                   or     %ecx,%eax
  407483:       83 fe 07                cmp    $0x7,%esi
  407486:       89 f1                   mov    %esi,%ecx
  407488:       89 05 42 a3 01 00       mov    %eax,0x1a342(%rip)        # 4217d0 <bsBuff>
  40748e:       0f 8f 0c 01 00 00       jg     4075a0 <bsR.constprop.3+0x1a0>
  407494:       83 c6 08                add    $0x8,%esi
  407497:       49 83 c0 01             add    $0x1,%r8
  40749b:       44 39 cf                cmp    %r9d,%edi
  40749e:       0f 84 ec 00 00 00       je     407590 <bsR.constprop.3+0x190>
  4074a4:       41 83 fb 01             cmp    $0x1,%r11d
  4074a8:       74 61                   je     40750b <bsR.constprop.3+0x10b>
  4074aa:       41 83 fb 02             cmp    $0x2,%r11d
  4074ae:       74 2d                   je     4074dd <bsR.constprop.3+0xdd>
  4074b0:       83 c7 01                add    $0x1,%edi
  4074b3:       c1 e0 08                shl    $0x8,%eax
  4074b6:       49 83 c0 01             add    $0x1,%r8
  4074ba:       41 89 7a 08             mov    %edi,0x8(%r10)
  4074be:       41 0f b6 50 ff          movzbl -0x1(%r8),%edx
  4074c3:       89 35 03 a3 01 00       mov    %esi,0x1a303(%rip)        # 4217cc <bsLive>
  4074c9:       83 c6 08                add    $0x8,%esi
  4074cc:       09 d0                   or     %edx,%eax
  4074ce:       44 39 cf                cmp    %r9d,%edi
  4074d1:       89 05 f9 a2 01 00       mov    %eax,0x1a2f9(%rip)        # 4217d0 <bsBuff>
  4074d7:       0f 84 b3 00 00 00       je     407590 <bsR.constprop.3+0x190>
  4074dd:       83 c7 01                add    $0x1,%edi
  4074e0:       c1 e0 08                shl    $0x8,%eax
  4074e3:       49 83 c0 01             add    $0x1,%r8
  4074e7:       41 89 7a 08             mov    %edi,0x8(%r10)
  4074eb:       45 0f b6 58 ff          movzbl -0x1(%r8),%r11d
  4074f0:       89 35 d6 a2 01 00       mov    %esi,0x1a2d6(%rip)        # 4217cc <bsLive>
  4074f6:       83 c6 08                add    $0x8,%esi
  4074f9:       44 09 d8                or     %r11d,%eax
  4074fc:       44 39 cf                cmp    %r9d,%edi
  4074ff:       89 05 cb a2 01 00       mov    %eax,0x1a2cb(%rip)        # 4217d0 <bsBuff>
  407505:       0f 84 85 00 00 00       je     407590 <bsR.constprop.3+0x190>
  40750b:       8d 57 01                lea    0x1(%rdi),%edx
  40750e:       c1 e0 08                shl    $0x8,%eax
  407511:       41 89 52 08             mov    %edx,0x8(%r10)
  407515:       41 0f b6 08             movzbl (%r8),%ecx
  407519:       89 35 ad a2 01 00       mov    %esi,0x1a2ad(%rip)        # 4217cc <bsLive>
  40751f:       09 c8                   or     %ecx,%eax
  407521:       83 fe 07                cmp    $0x7,%esi
  407524:       89 f1                   mov    %esi,%ecx
  407526:       89 05 a4 a2 01 00       mov    %eax,0x1a2a4(%rip)        # 4217d0 <bsBuff>
  40752c:       7f 72                   jg     4075a0 <bsR.constprop.3+0x1a0>
  40752e:       44 39 ca                cmp    %r9d,%edx
  407531:       44 8d 5e 08             lea    0x8(%rsi),%r11d
  407535:       74 59                   je     407590 <bsR.constprop.3+0x190>
  407537:       8d 4f 02                lea    0x2(%rdi),%ecx
  40753a:       c1 e0 08                shl    $0x8,%eax
  40753d:       41 89 4a 08             mov    %ecx,0x8(%r10)
  407541:       41 0f b6 50 01          movzbl 0x1(%r8),%edx
  407546:       44 89 1d 7f a2 01 00    mov    %r11d,0x1a27f(%rip)        # 4217cc <bsLive>
  40754d:       44 8d 5e 10             lea    0x10(%rsi),%r11d
  407551:       09 d0                   or     %edx,%eax
  407553:       44 39 c9                cmp    %r9d,%ecx
  407556:       89 05 74 a2 01 00       mov    %eax,0x1a274(%rip)        # 4217d0 <bsBuff>
  40755c:       74 32                   je     407590 <bsR.constprop.3+0x190>
  40755e:       8d 4f 03                lea    0x3(%rdi),%ecx
  407561:       c1 e0 08                shl    $0x8,%eax
  407564:       41 89 4a 08             mov    %ecx,0x8(%r10)
  407568:       41 0f b6 50 02          movzbl 0x2(%r8),%edx
  40756d:       44 89 1d 58 a2 01 00    mov    %r11d,0x1a258(%rip)        # 4217cc <bsLive>
  407574:       44 8d 5e 18             lea    0x18(%rsi),%r11d
  407578:       09 d0                   or     %edx,%eax
  40757a:       44 39 c9                cmp    %r9d,%ecx
  40757d:       89 05 4d a2 01 00       mov    %eax,0x1a24d(%rip)        # 4217d0 <bsBuff>
  407583:       75 3b                   jne    4075c0 <bsR.constprop.3+0x1c0>
  407585:       66 66 2e 0f 1f 84 00    data32 nopw %cs:0x0(%rax,%rax,1)
  40758c:       00 00 00 00 
  407590:       e8 fb de ff ff          callq  405490 <compressedStreamEOF>
  407595:       66 66 2e 0f 1f 84 00    data32 nopw %cs:0x0(%rax,%rax,1)
  40759c:       00 00 00 00 
  4075a0:       83 e9 08                sub    $0x8,%ecx
  4075a3:       d3 e8                   shr    %cl,%eax
  4075a5:       89 0d 21 a2 01 00       mov    %ecx,0x1a221(%rip)        # 4217cc <bsLive>
  4075ab:       48 83 c4 08             add    $0x8,%rsp
  4075af:       0f b6 c0                movzbl %al,%eax
  4075b2:       c3                      retq   
  4075b3:       66 66 66 66 2e 0f 1f    data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  4075ba:       84 00 00 00 00 00 
  4075c0:       83 c7 04                add    $0x4,%edi
  4075c3:       c1 e0 08                shl    $0x8,%eax
  4075c6:       83 c6 20                add    $0x20,%esi
  4075c9:       41 89 7a 08             mov    %edi,0x8(%r10)
  4075cd:       41 0f b6 48 03          movzbl 0x3(%r8),%ecx
  4075d2:       49 83 c0 04             add    $0x4,%r8
  4075d6:       44 89 1d ef a1 01 00    mov    %r11d,0x1a1ef(%rip)        # 4217cc <bsLive>
  4075dd:       09 c8                   or     %ecx,%eax
  4075df:       44 39 cf                cmp    %r9d,%edi
  4075e2:       89 05 e8 a1 01 00       mov    %eax,0x1a1e8(%rip)        # 4217d0 <bsBuff>
  4075e8:       74 a6                   je     407590 <bsR.constprop.3+0x190>
  4075ea:       e9 1c ff ff ff          jmpq   40750b <bsR.constprop.3+0x10b>
  4075ef:       e8 d0 94 ff ff          callq  400ac4 <spec_getc.part.1.lto_priv.19>
  4075f4:       66 66 66 2e 0f 1f 84    data32 data32 nopw %cs:0x0(%rax,%rax,1)
  4075fb:       00 00 00 00 00 


which, given the fast path through the function, is quite overkill.

Richard, perhaps we can somehow derive the value range and the fact that the number of iterations is at most 4?
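The bound asked about follows from two facts: bsLive is never negative, and each trip through the bsNEEDR loop adds 8 bits, so bsNEEDR(n) runs ceil((n - bsLive) / 8) times when bsLive < n. If n is at most 32, that is at most 4 trips (at most 3 if n is at most 24). The report does not say which range of n the compiler could actually prove. The sketch below only checks the closed form against the loop itself and brute-forces the worst case; the helper names are hypothetical.

```c
#include <assert.h>

/* Count bsNEEDR(n) trips by running the loop (bit counts only, no I/O). */
static int trips_by_loop(int bsLive, int n)
{
    int trips = 0;
    while (bsLive < n) {
        bsLive += 8;
        trips++;
    }
    return trips;
}

/* Closed form: ceil((n - bsLive) / 8) when bsLive < n, else 0. */
static int trips_closed_form(int bsLive, int n)
{
    assert(bsLive >= 0 && n >= 1);
    return bsLive >= n ? 0 : (n - bsLive + 7) / 8;
}

/* Worst case over 0 <= bsLive and 1 <= n <= max_n.  Only bsLive < n
   matters; larger bsLive values give 0 trips. */
static int max_trips(int max_n)
{
    int worst = 0;
    for (int n = 1; n <= max_n; n++)
        for (int live = 0; live < n; live++) {
            int t = trips_closed_form(live, n);
            assert(t == trips_by_loop(live, n));
            if (t > worst)
                worst = t;
        }
    return worst;
}
```

For n > bsLive, (n - (bsLive + 8) + 7) >> 3 equals trips - 1, and masking it with 3 gives the remainder needed to enter a 4-way unrolled loop; this appears to be what the `shr $0x3` / `and $0x3` sequence in the 5.0 bsR computes. For the bsR.constprop.3 clone (n = 8, judging by the `cmp $0x7`), any bsLive in 0..7 gives exactly one trip, so its unrolled loop never runs more than one iteration as long as bsLive is non-negative.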
Comment 3 Richard Biener 2015-03-20 10:07:08 UTC
Testcase?  I suppose you are talking about the loops in the bsNEEDR/W macros?
Comment 4 Jan Hubicka 2015-03-20 18:11:35 UTC
> Testcase?  I suppose you are talking about the loops in the bsNEEDR/W macros?
bzip2 is quite small by itself, but I will take a look later today. Yes, it is the loops in the bsNEEDR/W macros
that get unrolled.

Honza
Comment 5 Andrew Pinski 2016-08-10 07:31:22 UTC
Does this still happen, or do we still need to crank up the inlining limits?