Bug 45670 - [4.6 Regression] Less efficient x86 addressing mode selection on 4.6, causes -Os size regression from 4.5
Summary: [4.6 Regression] Less efficient x86 addressing mode selection on 4.6, causes ...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.6.0
: P2 normal
Target Milestone: 4.6.0
Assignee: Jakub Jelinek
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2010-09-14 20:03 UTC by Steinar H. Gunderson
Modified: 2010-11-05 19:10 UTC (History)
3 users (show)

See Also:
Host: i486-linux-gnu
Target: i486-linux-gnu
Build: i486-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2010-09-15 05:25:27


Attachments
gcc46-pr45670.patch (714 bytes, patch)
2010-11-05 10:48 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Steinar H. Gunderson 2010-09-14 20:03:09 UTC
Hi,

Given the following test C++ file:

class Class
{
public:
	void func();

        float *buf;
        int size;
};

void Class::func()
{
        for (int i = 0; i < size; ++i) {
                buf[i] = 0;
	}
}

4.6 (see below for exact version) will generate larger code (36 vs. 30 bytes) than 4.5.1 (Debian 4.5.1-6) given -Os. The output is

00000000 <Class::func()>:
   0:	55                   	push   %ebp
   1:	31 c0                	xor    %eax,%eax
   3:	89 e5                	mov    %esp,%ebp
   5:	8b 4d 08             	mov    0x8(%ebp),%ecx
   8:	53                   	push   %ebx
   9:	8b 59 04             	mov    0x4(%ecx),%ebx
   c:	eb 10                	jmp    1e <Class::func()+0x1e>
   e:	8d 14 85 00 00 00 00 	lea    0x0(,%eax,4),%edx
  15:	40                   	inc    %eax
  16:	03 11                	add    (%ecx),%edx
  18:	c7 02 00 00 00 00    	movl   $0x0,(%edx)
  1e:	39 d8                	cmp    %ebx,%eax
  20:	7c ec                	jl     e <Class::func()+0xe>
  22:	5b                   	pop    %ebx
  23:	5d                   	pop    %ebp
  24:	c3                   	ret    

Basically the problem is that the lea is large (due to the zero immediate taking up 32 bits); 4.5 uses a variation where the address calculation takes both a base and an index register, which has a shorter form not requiring to store the zero. (The joys of x86; lea edx, [eax*4 + ecx] takes less space then lea edx, [eax*4]...)

===

Configured with: ../src/configure -v --with-pkgversion='Debian 20100828-1' --with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs --enable-languages=c,ada,c++,fortran,objc,obj-c++ --prefix=/usr/lib/gcc-snapshot --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --disable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --with-plugin-ld=ld.gold --enable-objc-gc --enable-targets=all --with-arch-32=i586 --with-tune=generic --disable-werror --enable-checking=yes --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.6.0 20100828 (experimental) [trunk revision 163616] (Debian 20100828-1)
Comment 1 H.J. Lu 2010-09-15 05:25:26 UTC
It is caused by revision 162618:

http://gcc.gnu.org/ml/gcc-cvs/2010-07/msg00972.html
Comment 2 Eric Botcazou 2010-09-15 05:38:31 UTC
> It is caused by revision 162618:
> 
> http://gcc.gnu.org/ml/gcc-cvs/2010-07/msg00972.html

No, it isn't, this commit reverted an incorrect change done in revision 161907.
Comment 3 Jakub Jelinek 2010-11-05 10:03:15 UTC
Well, it is actually very much related to that reversion.  While it didn't regress in between 161907 and 162617, it regressed from the MEM_REF merge till 161906 and from 162618 onwards.
The problem is that due to the MEM load combiner isn't able to fix this up, so if we don't get it right during expansion, we are out of luck.  Looking at it now...
Comment 4 Jakub Jelinek 2010-11-05 10:48:39 UTC
Created attachment 22291 [details]
gcc46-pr45670.patch

Untested fix.
Comment 5 Eric Botcazou 2010-11-05 11:17:11 UTC
> Well, it is actually very much related to that reversion.  While it didn't
> regress in between 161907 and 162617, it regressed from the MEM_REF merge till
> 161906 and from 162618 onwards.

Yes, that isn't different from what I said, it had been caused by something else.

> The problem is that due to the MEM load combiner isn't able to fix this up, so
> if we don't get it right during expansion, we are out of luck.  Looking at it
> now...

Thanks!
Comment 6 Jakub Jelinek 2010-11-05 19:00:32 UTC
Author: jakub
Date: Fri Nov  5 19:00:27 2010
New Revision: 166371

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=166371
Log:
	PR target/45670
	* expr.c (expand_expr_real_1) <case MEM_REF>: Use EXPAND_SUM
	instead of EXPAND_NORMAL for base expansion.

	* gcc.target/i386/pr45670.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr45670.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expr.c
    trunk/gcc/testsuite/ChangeLog
Comment 7 Jakub Jelinek 2010-11-05 19:10:57 UTC
Fixed.