This is the mail archive of the mailing list for the GCC project.

Re: RFC: Idea for code size reduction

On Fri, Mar 07, 2008 at 01:05:03PM +0100, Philipp Marek wrote:
> When wouldn't that be possible? My script currently splits on an
> instruction-level -- although I would see no problem that some branch
> jumps into a "half" opcode of another branch, if the byte sequence
> matches.

00000000 <bar>:
       0:       b8 a4 00 00 00          mov    $0xa4,%eax
       5:       ba fc 04 00 00          mov    $0x4fc,%edx
       a:       f7 e2                   mul    %edx
       c:       05 d2 04 00 00          add    $0x4d2,%eax
      11:       c3                      ret    

00020012 <foo>:
   20012:       39 d2                   cmp    %edx,%edx
   20014:       75 07                   jne    2001d <foo+0xb>
   20016:       ba fc 04 00 00          mov    $0x4fc,%edx
   2001b:       f7 e2                   mul    %edx
   2001d:       05 d2 04 00 00          add    $0x4d2,%eax
   20022:       c3                      ret    

If you merge the mov/mul/add/ret sequences by replacing the foo tail
sequence with jmp bar+5, then the jne will branch to the wrong place, or,
if you try to adjust it, the target is too far away for its 8-bit
displacement to reach.
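To make the reach problem concrete, here is a small sketch that checks whether a short conditional jump (opcode 75, 2 bytes, signed 8-bit displacement) can reach a given target. The helper name rel8_displacement is made up for illustration; the addresses are taken from the listing above. A longer jne form with a 32-bit displacement does exist (0f 85, 6 bytes), but switching to it changes the instruction size, so the byte sequences no longer match as they did.

```python
def rel8_displacement(insn_addr, insn_len, target):
    """Displacement a relative jump at insn_addr would need to hit target.

    x86 relative jumps are measured from the end of the jump instruction.
    Hypothetical helper, for illustration only.
    """
    return target - (insn_addr + insn_len)


def fits_rel8(disp):
    """True if disp fits in a signed 8-bit displacement."""
    return -128 <= disp <= 127


# Original jne at 0x20014 (2 bytes) targeting the add at 0x2001d.
original = rel8_displacement(0x20014, 2, 0x2001D)

# After merging, the add only exists in bar, at address 0xc.
adjusted = rel8_displacement(0x20014, 2, 0x0C)

print(original, fits_rel8(original))
print(adjusted, fits_rel8(adjusted))
```

The first displacement is the 07 byte seen in the encoding of the jne; the second is far outside the range a short jump can express.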

> > but even jmp argument is relative, not absolute.
> That's why I take only jumps with 32bit arguments - these are absolute.

No, they are relative.

00000000 <bar>:
       0:       e9 05 00 02 00          jmp    2000a <foo>
       5:       e9 00 00 02 00          jmp    2000a <foo>

0002000a <foo>:
   2000a:       90                      nop    

See how they are encoded.
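As a sketch of what the encodings show: both jumps go to the same address, yet their displacement bytes differ, because the 32-bit operand of e9 is added to the address of the next instruction. The function name jmp_rel32_target is hypothetical; the bytes and addresses are the ones from the listing above.

```python
import struct


def jmp_rel32_target(address, encoding):
    """Return the target of an x86 `jmp rel32` (opcode e9) located at address.

    Hypothetical helper, for illustration only.
    """
    if len(encoding) != 5 or encoding[0] != 0xE9:
        raise ValueError("expected a 5-byte e9 rel32 jump")
    # The operand is a little-endian signed 32-bit displacement.
    (disp,) = struct.unpack("<i", encoding[1:])
    # The displacement is relative to the end of the jump instruction.
    return address + len(encoding) + disp


first = jmp_rel32_target(0x0, bytes.fromhex("e905000200"))
second = jmp_rel32_target(0x5, bytes.fromhex("e900000200"))

print(hex(first), hex(second))
```

If the operands were absolute, the two instructions would carry identical bytes; instead they differ by exactly the 5 bytes that separate the two jumps.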

> Yes ... I think doing a second compile pass might be the easiest way, and
> not much slower than other solutions.
>   (We could always remember which object files could be optimized, and
>   only recompile *those*. After all, it's just an additional
>   optimization.)

BTW, have you tried compiling the whole kernel with --combine, or at least
e.g. each kernel directory with --combine?  I guess that will give you
bigger savings than 30K.  Also, stop defining inline as inline
__attribute__((always_inline)); I think Ingo also added such a patch recently
and it saved 120K.

