This is the mail archive of the
fortran@gcc.gnu.org
mailing list for the GNU Fortran project.
Re: Polyhedron benchmark on Opteron
- From: Tim Prince <timothyprince at sbcglobal dot net>
- Cc: Fortran List <fortran at gcc dot gnu dot org>, burnus at net-b dot de
- Date: Sat, 30 Sep 2006 07:02:54 -0700
- Subject: Re: Polyhedron benchmark on Opteron
- References: <19c433eb0609290713x64f74089m45e5ea291343e1d7@mail.gmail.com>
- Reply-to: tprince at myrealbox dot com
François-Xavier Coudert wrote:
> I wanted to report some results for the Polyhedron benchmark** on
> Opteron (hardware details at the bottom of this mail). I used gfortran
> mainline (4.2.0 on 2006-09-28) and Intel 9.1.037 for comparison.
> Options used are :
> * gfortran -march=k8 -ffast-math -funroll-loops -static -O3
> * ifort -O3 -xW -ipo -static -V
> Unfortunately, there are also tests for which Intel is a clear winner:
> -- fatigue, by 22%
> -- gas_dyn, by 115%
I used a Core 2 Duo at 2.93 GHz with 4 GB of DDR2-667 and a somewhat older
version of gfortran 4.2, with these options:
gfortran -funroll-loops -ftree-vectorize -pg
ifort -xW -fp-model precise -pg
gfortran profile of fatigue:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  s/call   s/call  name
 52.75      5.49     5.49 28446735     0.00     0.00  __perdida_m__perdida
 34.01      9.03     3.54 31712641     0.00     0.00  __perdida_m__generalized_hookes_law
 13.07     10.39     1.36        1     1.36    10.41  MAIN__
  0.19     10.41     0.02  1443280     0.00     0.00  __perdida_m__damage_rate
ifort profile:
 65.84      6.32     6.32 28446735     0.00     0.00  perdida_m_mp_perdida_
 15.00      7.76     1.44 31712641     0.00     0.00  perdida_m_mp_generalized_hookes_law_
 14.48      9.15     1.39        1     1.39     9.18  MAIN__
  2.19      9.36     0.21                             cos.L
  2.08      9.56     0.20                             sin.L
So gfortran loses time only in generalized_hookes_law (3.54 s of self
time versus 1.44 s for ifort).
gfortran profile of gas_dyn:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  s/call   s/call  name
 86.71      5.15     5.15    10002     0.00     0.00  eos_
  6.40      5.53     0.38    10001     0.00     0.00  chozdt_
  3.70      5.75     0.22  1434725     0.00     0.00  area_
  3.20      5.94     0.19        1     0.19     5.94  MAIN__
  0.00      5.94     0.00    50000     0.00     0.00  drag_
ifort:
 39.53      0.83     0.83    10002     0.00     0.00  eos_
 28.57      1.43     0.60    10001     0.00     0.00  chozdt_
 15.24      1.75     0.32  1434730     0.00     0.00  area_
  5.71      1.87     0.12        1     0.12     1.87  MAIN__
  3.33      1.94     0.07                             for_write_seq_fmt_xmit
By fixing those silly loops in which the last value is set to the
next-to-last value inside the loop, rather than after it, gfortran can
chop off 0.30 seconds. That leaves the monster array assignment with
vector sqrt in eos as the one remaining performance difference.
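
For anyone who has not looked at the source, the kind of loop fix meant
here can be sketched roughly as follows. This is a made-up illustration,
not code from gas_dyn: the program name, the arrays a and b, and the
size n are all hypothetical. It only shows that copying the next-to-last
value into the last element once, after the loop, gives the same result
as doing it on every iteration.

```fortran
! Hypothetical illustration only; not taken from the gas_dyn source.
program hoist_last_value
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n)
  integer :: i

  ! Original style: the copy into the last element is repeated on
  ! every iteration, although only its final execution matters.
  a = 0.0
  do i = 1, n - 1
     a(i) = real(i)
     a(n) = a(n-1)
  end do

  ! Fixed style: the copy is done once, after the loop has finished.
  do i = 1, n - 1
     b(i) = real(i)
  end do
  b(n) = b(n-1)

  ! Both forms must leave identical arrays behind.
  if (any(a /= b)) stop 1
  print *, 'both forms agree'
end program hoist_last_value
```

Whether a compiler moves the store out of the loop by itself depends on
the compiler and the options, so writing it after the loop in the source
is the safer choice.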