This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Performance gain through dereferencing?


Hi,

You cannot learn useful timing information from a single run of a short
test like this - there are far too many other factors that come into play.

You cannot learn useful timing information from unoptimised code.

There is too much luck involved in a test like this for it to be useful.
You need optimised code (at least -O1), longer run times, repeated runs,
and varied code before you can conclude anything.  Otherwise the result
could be nothing more than a quirk of the way caching worked out.

mvh.,

David


On 16/04/14 16:26, Peter Schneider wrote:
> I have made a curious performance observation with gcc under 64 bit
> cygwin on a corei7. I'm genuinely puzzled and couldn't find any
> information about it. Perhaps this is only indirectly a gcc question
> though, bear with me.
> 
> I have two trivial programs which assign a loop variable to a local
> variable 10^8 times. One does it the obvious way, the other one accesses
> the variable through a pointer, which means it must dereference the
> pointer first. This is reflected nicely in the disassembly snippets of
> the respective loop bodies below. Funny enough, the loop with the extra
> dereferencing runs considerably faster than the loop with the direct
> assignment (>10%). While the issue (indeed the whole program ;-) ) goes
> away with optimization, in less trivial scenarios that may not be so.
> 
> My first question is: What makes the smaller code slower?
> The gcc question is: Should assignment always be performed through a
> pointer if it is faster? (Probably not, but why not?) A session
> transcript including the compilable source is below.
> 
> Here are the disassembled loop bodies:
> 
> Direct access
> =====================================================
>         localInt = i;
>    1004010e6:   8b 45 fc                mov    -0x4(%rbp),%eax
>    1004010e9:   89 45 f8                mov    %eax,-0x8(%rbp)
> 
> 
> Pointer access
> =====================================================
>         *localP = i;
>    1004010ee:   48 8b 45 f0             mov    -0x10(%rbp),%rax
>    1004010f2:   8b 55 fc                mov    -0x4(%rbp),%edx
>    1004010f5:   89 10                   mov    %edx,(%rax)
> 
> Note the first instruction which moves the address into %rax. The other
> two are similar to the direct assignment above.
> 
> Here is a session transcript:
> 
> $ gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/lto-wrapper.exe
> Target: x86_64-pc-cygwin
> Configured with:
> /cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2/configure
> --srcdir=/cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2
> --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin
> --libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var
> --sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share
> --docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C
> --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin
> --target=x86_64-pc-cygwin --without-libiconv-prefix
> --without-libintl-prefix --enable-shared --enable-shared-libgcc
> --enable-static --enable-version-specific-runtime-libs
> --enable-bootstrap --disable-__cxa_atexit --with-dwarf2
> --with-tune=generic
> --enable-languages=ada,c,c++,fortran,lto,objc,obj-c++ --enable-graphite
> --enable-threads=posix --enable-libatomic --enable-libgomp
> --disable-libitm --enable-libquadmath --enable-libquadmath-support
> --enable-libssp --enable-libada --enable-libgcj-sublibs
> --disable-java-awt --disable-symvers
> --with-ecj-jar=/usr/share/java/ecj.jar --with-gnu-ld --with-gnu-as
> --with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix
> --without-libintl-prefix --with-system-zlib --libexecdir=/usr/lib
> Thread model: posix
> gcc version 4.8.2 (GCC)
> 
> peter@peter-lap ~/src/test/obj_vs_ptr
> $ cat ./t
> #!/bin/bash
> 
> cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1
> 
> 
> peter@peter-lap ~/src/test/obj_vs_ptr
> $ ./t obj
> int main()
> {
>     int localInt;
>     for (int i = 0; i < 100000000; ++i)
>         localInt = i;
>     return 0;
> }
> real    0m0.248s
> user    0m0.234s
> sys     0m0.015s
> 
> peter@peter-lap ~/src/test/obj_vs_ptr
> $ ./t ptr
> int main()
> {
>     int localInt;
>     int *localP = &localInt;
>     for (int i = 0; i < 100000000; ++i)
>         *localP = i;
>     return 0;
> }
> 
> real    0m0.215s
> user    0m0.203s
> sys     0m0.000s
> 
> 