Bug 2962 - inefficient code with and
Summary: inefficient code with and
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 3.0
: P3 enhancement
Target Milestone: ---
Assignee: Kazu Hirata
URL:
Keywords: missed-optimization
Depends on: 14792
Blocks:
  Show dependency treegraph
 
Reported: 2001-05-26 11:45 UTC by 95318-quiet
Modified: 2023-12-30 02:09 UTC (History)
6 users (show)

See Also:
Host: i386-pc-linux-gnu
Target: i386-pc-linux-gnu
Build: i386-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2007-07-02 21:50:25


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description 95318-quiet 2001-05-26 11:45:59 UTC
gcc-3.0 -Os wastes a byte in the following code on i386:

a:
	pushl	%ebp
	movl	%esp, %ebp
	movl	8(%ebp), %eax
	andl	$1, %eax
	cwtl
	popl	%ebp
	ret

The cwtl is unnecessary since %eax has just been anded with 1.

The source is:

int a(short i) {
	return i & 1;
}

Release:
3.0 20010526 (Debian prerelease) (Debian testing/unstable)

Environment:
System: Linux smile 2.2.17 #1 Sun Oct 8 19:26:41 MEST 2000 i686 unknown
Architecture: i686

	
host: i386-pc-linux-gnu
build: i386-pc-linux-gnu
target: i386-pc-linux-gnu
configured with: ../src/configure -v --enable-languages=c,c++,java,f77,proto,objc --prefix=/usr --infodir=/share/info --mandir=/share/man --enable-shared --with-gnu-as --with-gnu-ld --with-system-zlib --enable-long-long --enable-nls --without-x --without-included-gettext --disable-checking --enable-threads=posix --enable-java-gc=boehm --with-cpp-install-dir=bin --enable-objc-gc i386-linux
Comment 1 Richard Henderson 2002-04-02 01:15:31 UTC
State-Changed-From-To: open->analyzed
State-Changed-Why: The combiner creates
      (and:HI (mem:HI ...) (const_int 1))
    at which point it isn't allowed to widen the operation.
    
    Need to adjust ix86_binary_operator_ok such that it doesn't
    allow a memory source without a register source that we could
    possibly match.
Comment 2 Roger Sayle 2002-05-06 20:34:42 UTC
State-Changed-From-To: analyzed->closed
State-Changed-Why: This has been fixed on mainline by the recent patch
    
    2002-04-17  Roger Sayle  <roger@eyesopen.com>
                Jakub Jelinek  <jakub@redhat.com>
    
        * fold-const.c (fold) [NOP_EXPR]: Convert (T)(x&c) into ((T)x&(T)c)
        for integer constant c (if x has unsigned type or sign bit is not
        set in c).  This folds the zero/sign extension into the bit-wise
        and operation.
    
    We now no longer generate the unnecessary "cwtl" instruction for the
    test case in the PR.  [Unfortunately with -Os, there is no net space
    saving for this example function as mainline also decides to use a
    longer but faster load instruction].
Comment 3 Debian GCC Maintainers 2003-08-10 15:28:15 UTC
Herbert Xu writes:

I appologise for all the reopening.  Be rest assured that your efforts 
in gettings these issues addressed is most appreciated. 
 
Unfortunately as in other cases this bug hasn't been addressed in 
full by the upstream.  Although the cwtl instructions are gone, it has 
been replaced by the equivalent mov?wl: 
 
-- a.c -- 
#include <ctype.h> 
 
int a(short i) { 
        return i & 1; 
} 
 
int foo(int c) { 
        return isdigit(c); 
} 
-- 
 
-- gcc -S -O2 a.c -- 
a: 
        pushl   %ebp 
        movl    %esp, %ebp 
        movswl  8(%ebp),%eax 
        andl    $1, %eax 
        leave 
        ret 
 
... 
 
foo: 
        pushl   %ebp 
        movl    %esp, %ebp 
        subl    $8, %esp 
        call    __ctype_b_loc 
        movl    (%eax), %edx 
        movl    8(%ebp), %eax 
        movzwl  (%edx,%eax,2), %eax 
        andl    $2048, %eax 
        leave 
        ret 
        .size   foo, .-foo 
        .ident  "GCC: (GNU) 3.3 20030509 (Debian prerelease)" 
 
In both instances the extension (mov?wl) is unnecessary as 
it is followed immediately by an and which zaps the high bits. 
Comment 4 Andrew Pinski 2003-08-10 15:34:50 UTC
The problem is that gcc is producing a sign_extend and not removing based on the "and":
(insn 4 3 6 0 (set (reg/v:SI 59 [ i ])
        (sign_extend:SI (mem/f:HI (reg/f:SI 16 argp) [2 i+0 S2 A16]))) 90 {extendhisi2} (nil)
    (nil))

(note 6 4 13 0 NOTE_INSN_FUNCTION_BEG)

(insn 13 6 18 0 (parallel [
            (set (reg:SI 61)
                (and:SI (reg/v:SI 59 [ i ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) 212 {*andsi_1} (insn_list 4 (nil))
    (expr_list:REG_DEAD (reg/v:SI 59 [ i ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
Comment 5 Andrew Pinski 2003-12-26 04:09:21 UTC
To produce the same problem on PPC the follow code is needed:
int a(short *i) {
        int t = *i;
        return t & 1;
}
Comment 6 Andrew Pinski 2003-12-26 04:21:14 UTC
Note even though on PPC, there is only two instructions, lha is slower on the Power4/970 
than a lhz would be because it is cracked aka takes two slots in the instrcution group.
Comment 7 Kazu Hirata 2003-12-26 04:24:06 UTC
I created a patch to do this with peephole2 for i686.
I will also try combine-and-split-after-reload route.


Comment 8 Andrew Pinski 2004-03-29 05:22:49 UTC
A better way to deal with this is to convert the trees from
a (i)
{
  short int T.0;
  int t;

  T.0 = *i;
  t = (int)T.0;
  return t & 1;
}

into
a1 (i)
{
  short int t;
  short int tt;
  int ttt;

  t = *i;
  tt = t & 1;
  ttt = (int)tt;
  return ttt;
}
But that would only fix the problem on PPC.
To fix the problem on x86, mode changing needs to happen with subregisters instead of 
full registers which is what is causing these problems.
Comment 9 Andrew Pinski 2004-03-31 05:44:09 UTC
Mine for the problem on PPC.
Comment 10 Andrew Pinski 2004-06-03 04:15:49 UTC
I almost think the combiner should run on the instruction stream backwards.
Comment 11 Richard Henderson 2004-09-13 03:37:01 UTC
Note that either movzbl or movzwl is required on i686 to avoid a
partial register stall.  Which is why things are generated this way.
It would perhaps be possible to avoid this for earlier processors,
but I don't believe this is worth the effort.
Comment 12 herbert 2004-09-13 04:25:42 UTC
Subject: Re:  inefficient code with and

On Mon, Sep 13, 2004 at 03:37:03AM -0000, rth at gcc dot gnu dot org wrote:
> 
> Note that either movzbl or movzwl is required on i686 to avoid a
> partial register stall.  Which is why things are generated this way.
> It would perhaps be possible to avoid this for earlier processors,
> but I don't believe this is worth the effort.

Could we have it optimised away with -Os though? I don't think the
stall would matter there.
Comment 13 owner 2012-02-12 22:09:22 UTC
Thank you for the additional information you have supplied regarding
this Bug report.

This is an automatically generated reply to let you know your message
has been received.

Your message has not been forwarded to the package maintainers or
other interested parties; you should ensure that the developers are
aware of the problem you have entered into the system - preferably
quoting the Bug reference number, #95318.

If you wish to submit further information on this problem, please
send it to 95318-quiet@bugs.debian.org.

Please do not send mail to owner@bugs.debian.org unless you wish
to report a problem with the Bug-tracking system.
Comment 14 Andrew Pinski 2012-02-12 22:20:27 UTC
Fixed on the trunk.
Comment 15 owner 2012-02-12 22:24:21 UTC
Thank you for the additional information you have supplied regarding
this Bug report.

This is an automatically generated reply to let you know your message
has been received.

Your message has not been forwarded to the package maintainers or
other interested parties; you should ensure that the developers are
aware of the problem you have entered into the system - preferably
quoting the Bug reference number, #95318.

If you wish to submit further information on this problem, please
send it to 95318-quiet@bugs.debian.org.

Please do not send mail to owner@bugs.debian.org unless you wish
to report a problem with the Bug-tracking system.
Comment 16 owner 2012-02-13 06:36:24 UTC
Thank you for the additional information you have supplied regarding
this Bug report.

This is an automatically generated reply to let you know your message
has been received.

Your message has not been forwarded to the package maintainers or
other interested parties; you should ensure that the developers are
aware of the problem you have entered into the system - preferably
quoting the Bug reference number, #95318.

If you wish to submit further information on this problem, please
send it to 95318-quiet@bugs.debian.org.

Please do not send mail to owner@bugs.debian.org unless you wish
to report a problem with the Bug-tracking system.
Comment 17 owner 2012-03-22 11:14:10 UTC
Thank you for the additional information you have supplied regarding
this Bug report.

This is an automatically generated reply to let you know your message
has been received.

Your message has not been forwarded to the package maintainers or
other interested parties; you should ensure that the developers are
aware of the problem you have entered into the system - preferably
quoting the Bug reference number, #95318.

If you wish to submit further information on this problem, please
send it to 95318-quiet@bugs.debian.org.

Please do not send mail to owner@bugs.debian.org unless you wish
to report a problem with the Bug-tracking system.