This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.

Re: Polyhedron tests on Intel Darwin8/9

From: dominiq at lps dot ens dot fr (Dominique Dhumieres)
To: paulthomas2 at wanadoo dot fr, fortran at gcc dot gnu dot org, dominiq at lps dot ens dot fr
Date: Mon, 26 Nov 2007 15:42:57 +0100
Subject: Re: Polyhedron tests on Intel Darwin8/9
References: <20071120141552.0C95B5BB6C@mailhost.lps.ens.fr> <474367C4.60802@wanadoo.fr>

Replacing

                  denominator = sqrt(dot_product(rot_c_vector-rot_q_vector,                 &
                                                 rot_c_vector-rot_q_vector))

with

                  denominator = sqrt((rot_c_vector(1)-rot_q_vector(1)) * &
                                     (rot_c_vector(1)-rot_q_vector(1)) + &
                                     (rot_c_vector(2)-rot_q_vector(2)) * &
                                     (rot_c_vector(2)-rot_q_vector(2)) + &
                                     (rot_c_vector(3)-rot_q_vector(3)) * &
                                     (rot_c_vector(3)-rot_q_vector(3)))
                  l12_lower = l12_lower + numerator/denominator

everywhere in induct.f90 decreases the execution time (-O3 -ffast-math
-funroll-loops, Core 2 Duo 2.16 GHz) from

93.207u 0.101s 1:33.32 99.9%	0+0k 0+1io 0pf+0w

to

73.361u 0.061s 1:13.42 100.0%	0+0k 0+0io 0pf+0w

Then, replacing everywhere

                  numerator = w1gauss(j) * w2gauss(k) *                                     &
                              dot_product(coil_current_vec,current_vector)

with

                  numerator = w1gauss(j) * w2gauss(k) *                                     &
                              (coil_current_vec(1)*current_vector(1) + &
                               coil_current_vec(2)*current_vector(2) + &
                               coil_current_vec(3)*current_vector(3))

further decreases the execution time to

65.560u 0.080s 1:05.64 100.0%	0+0k 0+1io 0pf+0w

Although I can understand that the first optimization is missed, I don't
understand why the second is missed if the dot product is inlined.
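
For anyone wanting to check this outside induct.f90, here is a minimal
sketch of a stand-alone comparison. It is not taken from induct.f90: the
module, function, and variable names, the double-precision kind, the loop
count, and the input values are all assumptions made for illustration.
It only times the DOT_PRODUCT form against the hand-expanded form for
3-element vectors; results will depend on the compiler version and flags.

```fortran
! Hypothetical micro-benchmark: all names and constants are illustrative
! and do not come from induct.f90.
module dot3_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed precision
contains

  ! Distance between two 3-vectors using the DOT_PRODUCT intrinsic.
  pure function dist_intrinsic(c, q) result(d)
    real(dp), intent(in) :: c(3), q(3)
    real(dp) :: d
    d = sqrt(dot_product(c - q, c - q))
  end function dist_intrinsic

  ! Same distance with the three terms written out by hand.
  pure function dist_expanded(c, q) result(d)
    real(dp), intent(in) :: c(3), q(3)
    real(dp) :: d
    d = sqrt((c(1)-q(1)) * (c(1)-q(1)) + &
             (c(2)-q(2)) * (c(2)-q(2)) + &
             (c(3)-q(3)) * (c(3)-q(3)))
  end function dist_expanded

end module dot3_sketch

program compare_dot3
  use dot3_sketch
  implicit none
  integer, parameter :: n = 50000000       ! arbitrary iteration count
  real(dp) :: c(3), q(3), s1, s2, t0, t1, t2
  integer :: i

  c = [1.0_dp, 2.0_dp, 3.0_dp]
  q = [0.5_dp, -1.0_dp, 2.0_dp]
  s1 = 0.0_dp
  s2 = 0.0_dp

  call cpu_time(t0)
  do i = 1, n
    ! Vary one component so the call cannot be hoisted out of the loop.
    q(1) = 0.5_dp + 1.0e-9_dp * real(i, dp)
    s1 = s1 + 1.0_dp / dist_intrinsic(c, q)
  end do
  call cpu_time(t1)
  do i = 1, n
    q(1) = 0.5_dp + 1.0e-9_dp * real(i, dp)
    s2 = s2 + 1.0_dp / dist_expanded(c, q)
  end do
  call cpu_time(t2)

  ! Printing the sums keeps the compiler from discarding the loops.
  print '(a,f8.3,a,es23.15)', 'intrinsic: ', t1 - t0, ' s, sum = ', s1
  print '(a,f8.3,a,es23.15)', 'expanded:  ', t2 - t1, ' s, sum = ', s2
end program compare_dot3
```

Compiling with the same flags as above (-O3 -ffast-math -funroll-loops)
and comparing the two timings should show whether the gap reproduces in
isolation. With -ffast-math the two sums may differ in the last digits.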

Dominique

Follow-Ups:
- Re: Polyhedron tests on Intel Darwin8/9
  - From: Paul Thomas

References:
- Polyhedron tests on Intel Darwin8/9
  - From: Dominique Dhumieres
- Re: Polyhedron tests on Intel Darwin8/9
  - From: Paul Thomas