This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Performance gain through dereferencing?
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Peter Schneider <schneiderp at gmx dot net>,gcc at schneiderp dot de,gcc at gcc dot gnu dot org
- Date: Wed, 16 Apr 2014 20:22:26 +0200
- Subject: Re: Performance gain through dereferencing?
- Authentication-results: sourceware.org; auth=none
- References: <534E932E dot 6000004 at schneiderp dot de> <534EC1D3 dot 1000205 at gmx dot net>
On April 16, 2014 7:45:55 PM CEST, Peter Schneider <schneiderp@gmx.net> wrote:
>In order to see what difference a different processor makes I also
>tried the same code on a fairly old 32 bit "AMD Athlon(tm) XP 3000+"
>with the current stable gcc (4.7.2). The difference is even more
>striking (dereferencing is much faster). I see that the size of the
>code inside the loop for the faster pointer access is exactly 8. No
>idea whether that has any significance.
Alignment of jump targets is important. I don't think we do anything special there at -O0, so the result will be pure luck.
Richard.
>Here as well I performed several runs with similar results. Statistical
>significance was established around n=2 ;-).
>
>gcc -v
>Using built-in specs.
>COLLECT_GCC=gcc
>COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.7/lto-wrapper
>Target: i486-linux-gnu
>Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5'
>--with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs
>--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr
>--program-suffix=-4.7 --enable-shared --enable-linker-build-id
>--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
>--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7
>--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
>--enable-libstdcxx-debug --enable-libstdcxx-time=yes
>--enable-gnu-unique-object --enable-plugin --enable-objc-gc
>--enable-targets=all --with-arch-32=i586 --with-tune=generic
>--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
>--target=i486-linux-gnu
>Thread model: posix
>gcc version 4.7.2 (Debian 4.7.2-5)
>
>ppeterr@www:~/src/test/obj-vs-ptr$ cat t
>#!/bin/bash
>cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1
>
>ppeterr@www:~/src/test/obj-vs-ptr$ ./t obj
>int main()
>{
>    int localInt;
>    for (int i = 0; i < 100000000; ++i)
>        localInt = i;
>    return 0;
>}
>
>real 0m0.418s
>user 0m0.416s
>sys 0m0.004s
>ppeterr@www:~/src/test/obj-vs-ptr$ ./t ptr
>int main()
>{
>    int localInt;
>    int *localP = &localInt;
>    for (int i = 0; i < 100000000; ++i)
>        *localP = i;
>    return 0;
>}
>
>real 0m0.243s
>user 0m0.240s
>sys 0m0.000s
>
>===============================================================
>
>The disassembly for the direct access (slower) is:
>
>        localInt = i;
> 80483eb:   8b 45 fc    mov    -0x4(%ebp),%eax
> 80483ee:   89 45 f8    mov    %eax,-0x8(%ebp)
>
>And for the pointer access (faster):
>
>        *localP = i;
> 80483f1:   8b 45 f8    mov    -0x8(%ebp),%eax
> 80483f4:   8b 55 fc    mov    -0x4(%ebp),%edx
> 80483f7:   89 10       mov    %edx,(%eax)
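
The comparison quoted above relies on two separate binaries and a handful of `time` runs. As a rough sketch only, both loop variants could be timed inside one program and each repeated several times, which makes run-to-run noise easier to see. Nothing below comes from the original thread: the helper names `cpu_seconds`, `store_direct`, `store_through_pointer` and `best_time` are hypothetical, and the `volatile` qualifier is an added assumption so that the stores survive if the file is built with optimization. The two loops now live in different functions at different addresses, so whatever alignment luck applies to the quoted binaries may not carry over.

```c
#include <assert.h>
#include <time.h>

/* Iteration count taken from the quoted test programs. */
#define ITERATIONS 100000000

/* CPU time used by this process so far, in seconds (standard C clock()). */
static double cpu_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Direct store to a local, modelled on the quoted obj.c.
   volatile is an addition: it keeps the store alive under optimization. */
static int store_direct(int n)
{
    volatile int localInt = 0;
    for (int i = 0; i < n; ++i)
        localInt = i;
    return localInt;
}

/* Store through a pointer to a local, modelled on the quoted ptr.c. */
static int store_through_pointer(int n)
{
    volatile int localInt = 0;
    volatile int *localP = &localInt;
    for (int i = 0; i < n; ++i)
        *localP = i;
    return localInt;
}

/* Smallest CPU time observed over `trials` calls of fn(n).
   Taking the minimum is one simple way to damp scheduling noise. */
static double best_time(int (*fn)(int), int n, int trials)
{
    assert(trials > 0);
    double best = -1.0;
    for (int t = 0; t < trials; ++t) {
        double start = cpu_seconds();
        (void)fn(n);
        double elapsed = cpu_seconds() - start;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

A `main` function could call `best_time(store_direct, ITERATIONS, 5)` and `best_time(store_through_pointer, ITERATIONS, 5)` and print both values. If the gap between the two variants changes or flips when unrelated code is added or the functions are reordered, that would be consistent with the alignment explanation rather than with any real cost difference between the two kinds of store.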