This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Performance gain through dereferencing?


Hello,

I completely agree with David.
Note that your results will vary greatly depending on the machine you
run the tests on. Performance in tests like this is very
machine-dependent, so the conclusion cannot be generalized.

David

2014-04-16 16:49 GMT+02:00 David Brown <david@westcontrol.com>:
>
> Hi,
>
> You cannot learn useful timing information from a single run of a short
> test like this - there are far too many other factors that come into play.
>
> You cannot learn useful timing information from unoptimised code.
>
> There is too much luck involved in a test like this to be useful.  You
> need optimised code (at least -O1), longer times, more tests, varied
> code, etc., before being able to conclude anything.  Otherwise the
> result could be nothing more than a quirk of the way caching worked out.
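David's advice (optimised code, longer runs, repeated measurements) can be turned into a small timing harness. The sketch below assumes a POSIX system that provides `clock_gettime` with `CLOCK_MONOTONIC` (Cygwin does); the names `benchmark`, `volatile_store_loop` and `MAX_RUNS` are hypothetical and not from the thread. It runs one untimed warm-up, times several repetitions, and reports the fastest and the median run, so that a single lucky or unlucky run does not decide the result. The example workload stores into a `volatile` variable so that the loop is not deleted at `-O1` and above.

```c
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime under -std=c99 */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define MAX_RUNS 64 /* hypothetical upper bound on timed repetitions */

/* Hypothetical workload: n volatile stores, which the optimizer must keep. */
void volatile_store_loop(long n)
{
    volatile long sink = 0;
    for (long i = 0; i < n; ++i)
        sink = i;
}

/* Elapsed seconds between two monotonic timestamps. */
static double seconds_between(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* qsort comparator for ascending doubles. */
static int cmp_double(const void *pa, const void *pb)
{
    double a = *(const double *)pa, b = *(const double *)pb;
    return (a > b) - (a < b);
}

/* Run workload(n) once untimed, then `runs` timed times.
   Writes the fastest and the median run time (in seconds). */
void benchmark(void (*workload)(long), long n, int runs,
               double *min_s, double *median_s)
{
    double t[MAX_RUNS];
    assert(runs >= 1 && runs <= MAX_RUNS);
    workload(n); /* untimed warm-up run */
    for (int r = 0; r < runs; ++r) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        workload(n);
        clock_gettime(CLOCK_MONOTONIC, &end);
        t[r] = seconds_between(start, end);
    }
    qsort(t, (size_t)runs, sizeof t[0], cmp_double);
    *min_s = t[0];
    *median_s = t[runs / 2]; /* upper median when runs is even */
}
```

For example, calling `benchmark(volatile_store_loop, 100000000L, 11, &fastest, &median)` from a program built with `-O2` times eleven runs of 10^8 stores. The minimum tends to filter out interference from other processes, while a large gap between the minimum and the median suggests the measurement is too noisy to draw conclusions from.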
>
> Best regards,
>
> David
>
>
> On 16/04/14 16:26, Peter Schneider wrote:
>> I have made a curious performance observation with gcc under 64-bit
>> Cygwin on a Core i7. I'm genuinely puzzled and couldn't find any
>> information about it. Perhaps this is only indirectly a gcc question,
>> though; bear with me.
>>
>> I have two trivial programs which assign a loop variable to a local
>> variable 10^8 times. One does it the obvious way, the other one accesses
>> the variable through a pointer, which means it must dereference the
>> pointer first. This is reflected nicely in the disassembly snippets of
>> the respective loop bodies below. Funnily enough, the loop with the extra
>> dereferencing runs considerably faster than the loop with the direct
>> assignment (>10%). While the issue (indeed the whole program ;-) ) goes
>> away with optimization, in less trivial scenarios that may not be so.
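The loops disappear under optimization because `localInt` is never read, so GCC is allowed to remove them entirely. A minimal sketch of one common way to keep such a loop alive at `-O1`/`-O2`, assuming the goal is to time the store itself: declare the target `volatile`, so each store is a side effect the compiler must actually perform. The function names `store_direct` and `store_via_pointer` are hypothetical, not from the original post.

```c
/* Direct variant: each iteration stores i into a volatile local,
   so the optimizer may not drop the loop. */
int store_direct(int n)
{
    volatile int localInt = 0;
    for (int i = 0; i < n; ++i)
        localInt = i;
    return localInt; /* last value stored, or 0 if n <= 0 */
}

/* Pointer variant: the same stores, made through a pointer to volatile int. */
int store_via_pointer(int n)
{
    volatile int localInt = 0;
    volatile int *localP = &localInt;
    for (int i = 0; i < n; ++i)
        *localP = i;
    return localInt; /* last value stored, or 0 if n <= 0 */
}
```

Compiling this with `gcc -std=c99 -O2 -S` and comparing the two loops in the generated assembly is one way to check whether the extra pointer load still appears once the optimizer runs; the sketch itself makes no promise either way.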
>>
>> My first question is: What makes the smaller code slower?
>> The gcc question is: Should assignment always be performed through a
>> pointer if it is faster? (Probably not, but why not?) A session
>> transcript including the compilable source is below.
>>
>> Here are the disassembled loop bodies:
>>
>> Direct access
>> =====================================================
>>         localInt = i;
>>    1004010e6:   8b 45 fc                mov    -0x4(%rbp),%eax
>>    1004010e9:   89 45 f8                mov    %eax,-0x8(%rbp)
>>
>>
>> Pointer access
>> =====================================================
>>         *localP = i;
>>    1004010ee:   48 8b 45 f0             mov    -0x10(%rbp),%rax
>>    1004010f2:   8b 55 fc                mov    -0x4(%rbp),%edx
>>    1004010f5:   89 10                   mov    %edx,(%rax)
>>
>> Note the first instruction, which loads the pointer value (the address
>> of localInt) into %rax. The other two are similar to the direct
>> assignment above.
>>
>> Here is a session transcript:
>>
>> $ gcc -v
>> Using built-in specs.
>> COLLECT_GCC=gcc
>> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/lto-wrapper.exe
>> Target: x86_64-pc-cygwin
>> Configured with:
>> /cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2/configure
>> --srcdir=/cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2
>> --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin
>> --libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var
>> --sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share
>> --docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C
>> --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin
>> --target=x86_64-pc-cygwin --without-libiconv-prefix
>> --without-libintl-prefix --enable-shared --enable-shared-libgcc
>> --enable-static --enable-version-specific-runtime-libs
>> --enable-bootstrap --disable-__cxa_atexit --with-dwarf2
>> --with-tune=generic
>> --enable-languages=ada,c,c++,fortran,lto,objc,obj-c++ --enable-graphite
>> --enable-threads=posix --enable-libatomic --enable-libgomp
>> --disable-libitm --enable-libquadmath --enable-libquadmath-support
>> --enable-libssp --enable-libada --enable-libgcj-sublibs
>> --disable-java-awt --disable-symvers
>> --with-ecj-jar=/usr/share/java/ecj.jar --with-gnu-ld --with-gnu-as
>> --with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix
>> --without-libintl-prefix --with-system-zlib --libexecdir=/usr/lib
>> Thread model: posix
>> gcc version 4.8.2 (GCC)
>>
>> peter@peter-lap ~/src/test/obj_vs_ptr
>> $ cat ./t
>> #!/bin/bash
>>
>> cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1
>>
>>
>> peter@peter-lap ~/src/test/obj_vs_ptr
>> $ ./t obj
>> int main()
>> {
>>     int localInt;
>>     for (int i = 0; i < 100000000; ++i)
>>         localInt = i;
>>     return 0;
>> }
>>
>> real    0m0.248s
>> user    0m0.234s
>> sys     0m0.015s
>>
>> peter@peter-lap ~/src/test/obj_vs_ptr
>> $ ./t ptr
>> int main()
>> {
>>     int localInt;
>>     int *localP = &localInt;
>>     for (int i = 0; i < 100000000; ++i)
>>         *localP = i;
>>     return 0;
>> }
>>
>> real    0m0.215s
>> user    0m0.203s
>> sys     0m0.000s
>>
>>
>

