gcc-3.0 -Os wastes a byte in the following code on i386: a: pushl %ebp movl %esp, %ebp movl 8(%ebp), %eax andl $1, %eax cwtl popl %ebp ret The cwtl is unnecessary since %eax has just been anded with 1. The source is: int a(short i) { return i & 1; } Release: 3.0 20010526 (Debian prerelease) (Debian testing/unstable) Environment: System: Linux smile 2.2.17 #1 Sun Oct 8 19:26:41 MEST 2000 i686 unknown Architecture: i686 host: i386-pc-linux-gnu build: i386-pc-linux-gnu target: i386-pc-linux-gnu configured with: ../src/configure -v --enable-languages=c,c++,java,f77,proto,objc --prefix=/usr --infodir=/share/info --mandir=/share/man --enable-shared --with-gnu-as --with-gnu-ld --with-system-zlib --enable-long-long --enable-nls --without-x --without-included-gettext --disable-checking --enable-threads=posix --enable-java-gc=boehm --with-cpp-install-dir=bin --enable-objc-gc i386-linux
State-Changed-From-To: open->analyzed State-Changed-Why: The combiner creates (and:HI (mem:HI ...) (const_int 1)) at which point it isn't allowed to widen the operation. Need to adjust ix86_binary_operator_ok such that it doesn't allow a memory source without a register source that we could possibly match.
State-Changed-From-To: analyzed->closed State-Changed-Why: This has been fixed on mainline by the recent patch 2002-04-17 Roger Sayle <roger@eyesopen.com> Jakub Jelinek <jakub@redhat.com> * fold-const.c (fold) [NOP_EXPR]: Convert (T)(x&c) into ((T)x&(T)c) for integer constant c (if x has unsigned type or sign bit is not set in c). This folds the zero/sign extension into the bit-wise and operation. We now no longer generate the unnecessary "cwtl" instruction for the test case in the PR. [Unfortunately with -Os, there is no net space saving for this example function as mainline also decides to use a longer but faster load instruction].
Herbert Xu writes: I appologise for all the reopening. Be rest assured that your efforts in gettings these issues addressed is most appreciated. Unfortunately as in other cases this bug hasn't been addressed in full by the upstream. Although the cwtl instructions are gone, it has been replaced by the equivalent mov?wl: -- a.c -- #include <ctype.h> int a(short i) { return i & 1; } int foo(int c) { return isdigit(c); } -- -- gcc -S -O2 a.c -- a: pushl %ebp movl %esp, %ebp movswl 8(%ebp),%eax andl $1, %eax leave ret ... foo: pushl %ebp movl %esp, %ebp subl $8, %esp call __ctype_b_loc movl (%eax), %edx movl 8(%ebp), %eax movzwl (%edx,%eax,2), %eax andl $2048, %eax leave ret .size foo, .-foo .ident "GCC: (GNU) 3.3 20030509 (Debian prerelease)" In both instances the extension (mov?wl) is unnecessary as it is followed immediately by an and which zaps the high bits.
The problem is that gcc is producing a sign_extend and not removing based on the "and": (insn 4 3 6 0 (set (reg/v:SI 59 [ i ]) (sign_extend:SI (mem/f:HI (reg/f:SI 16 argp) [2 i+0 S2 A16]))) 90 {extendhisi2} (nil) (nil)) (note 6 4 13 0 NOTE_INSN_FUNCTION_BEG) (insn 13 6 18 0 (parallel [ (set (reg:SI 61) (and:SI (reg/v:SI 59 [ i ]) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ]) 212 {*andsi_1} (insn_list 4 (nil)) (expr_list:REG_DEAD (reg/v:SI 59 [ i ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))))
To produce the same problem on PPC the follow code is needed: int a(short *i) { int t = *i; return t & 1; }
Note even though on PPC, there is only two instructions, lha is slower on the Power4/970 than a lhz would be because it is cracked aka takes two slots in the instrcution group.
I created a patch to do this with peephole2 for i686. I will also try combine-and-split-after-reload route.
A better way to deal with this is to convert the trees from a (i) { short int T.0; int t; T.0 = *i; t = (int)T.0; return t & 1; } into a1 (i) { short int t; short int tt; int ttt; t = *i; tt = t & 1; ttt = (int)tt; return ttt; } But that would only fix the problem on PPC. To fix the problem on x86, mode changing needs to happen with subregisters instead of full registers which is what is causing these problems.
Mine for the problem on PPC.
I almost think the combiner should run on the instruction stream backwards.
Note that either movzbl or movzwl is required on i686 to avoid a partial register stall. Which is why things are generated this way. It would perhaps be possible to avoid this for earlier processors, but I don't believe this is worth the effort.
Subject: Re: inefficient code with and On Mon, Sep 13, 2004 at 03:37:03AM -0000, rth at gcc dot gnu dot org wrote: > > Note that either movzbl or movzwl is required on i686 to avoid a > partial register stall. Which is why things are generated this way. > It would perhaps be possible to avoid this for earlier processors, > but I don't believe this is worth the effort. Could we have it optimised away with -Os though? I don't think the stall would matter there.
Thank you for the additional information you have supplied regarding this Bug report. This is an automatically generated reply to let you know your message has been received. Your message has not been forwarded to the package maintainers or other interested parties; you should ensure that the developers are aware of the problem you have entered into the system - preferably quoting the Bug reference number, #95318. If you wish to submit further information on this problem, please send it to 95318-quiet@bugs.debian.org. Please do not send mail to owner@bugs.debian.org unless you wish to report a problem with the Bug-tracking system.
Fixed on the trunk.