[PATCH][RFC] Add versioning for constant strides for vectorization
Dominique Dhumieres
dominiq@lps.ens.fr
Sun Jan 25 12:12:00 GMT 2009
Richard,
> This patch adds the capability to the vectorizer to perform versioning
> for the case of a constant (suitable) stride.
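For readers unfamiliar with the term, a hand-written analogue of stride versioning might look like the sketch below. It is only an illustration, not the code the patch generates: the helper `scale_strided`, its arguments, and the choice of stride 1 as the "suitable" value are all assumptions made for this example.

```fortran
program stride_versioning
  implicit none
  integer, parameter :: n = 8
  real :: x(2*n), y(n)
  integer :: i

  x = [(real(i), i = 1, 2*n)]

  call scale_strided(x, 1, y, n)   ! run-time stride 1: takes the contiguous version
  print '(8F6.1)', y
  call scale_strided(x, 2, y, n)   ! run-time stride 2: takes the generic fallback
  print '(8F6.1)', y

contains

  ! Hypothetical helper: y(i) = 2 * x(1 + (i-1)*s), where s is only known at run time.
  subroutine scale_strided(x, s, y, n)
    real, intent(in) :: x(*)
    integer, intent(in) :: s, n
    real, intent(out) :: y(n)
    integer :: i

    if (s == 1) then
      ! Versioned copy: inside this branch the stride is the constant 1,
      ! so the accesses are contiguous and easy to vectorize.
      do i = 1, n
        y(i) = 2.0 * x(i)
      end do
    else
      ! Fallback copy: same computation with the general strided index.
      do i = 1, n
        y(i) = 2.0 * x(1 + (i-1)*s)
      end do
    end if
  end subroutine scale_strided

end program stride_versioning
```

Both branches compute the same result; the run-time test only selects which copy of the loop executes.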
I have applied the patch on i686-apple-darwin9 (Core2 2.1GHz, 4MB cache,
2GB RAM). It regtested without regression. However, the following test:
program mymatmul
  implicit none
  integer, parameter :: n = 2000
  real, dimension(n,n) :: rr, ri
  complex, dimension(n,n) :: a,b,c
  real :: t1, t2
  integer :: i, j, k
  call random_number (rr)
  call random_number (ri)
  a = cmplx (rr, ri)
  call random_number (rr)
  call random_number (ri)
  b = cmplx (rr, ri)
  call cpu_time (t1)
  c = cmplx (0., 0.)
  do j = 1, n
    do k = 1, n
      do i = 1, n
        c(i,j) = c(i,j) + a(i,k) * b(k,j)
      end do
    end do
  end do
  call cpu_time (t2)
  write (*,'(F8.4)') t2-t1
end program mymatmul
did not vectorize:
[ibook-dhum] bug/timing% gfc -m64 -O3 -ffast-math -funroll-loops
    -fomit-frame-pointer -ftree-vectorizer-verbose=2 mymatmul_db.f90
mymatmul_db.f90:24: note: not vectorized: can't calculate alignment for data ref.
mymatmul_db.f90:14: note: not vectorized: complicated access pattern.
mymatmul_db.f90:14: note: not vectorized: can't calculate alignment for data ref.
mymatmul_db.f90:11: note: not vectorized: complicated access pattern.
mymatmul_db.f90:11: note: not vectorized: can't calculate alignment for data ref.
mymatmul_db.f90:1: note: vectorized 0 loops in function.
Is this expected?
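One possible contributing factor, offered as an assumption rather than a diagnosis of the vectorizer's messages: a Fortran complex value is stored as a (real, imaginary) pair of consecutive reals, so the real parts of a(:,k) sit two reals apart in memory, and each complex multiply-add mixes both halves. The sketch below spells out the inner-loop update on the interleaved real view; the program name and the small test values are made up for illustration.

```fortran
program complex_layout
  implicit none
  integer, parameter :: n = 4
  complex :: a(n), b, c(n), c_ref(n)
  real :: ar(2*n), cr(2*n)
  real :: br, bi
  integer :: i

  a = [(cmplx(real(i), real(-i)), i = 1, n)]
  b = cmplx(0.5, 2.0)

  ! Reference: the complex update as written in the test case, starting from zero.
  c_ref = cmplx(0., 0.)
  c_ref = c_ref + a * b

  ! View the complex array as 2*n reals laid out as (re, im, re, im, ...).
  ar = transfer(a, ar)
  cr = 0.0
  br = real(b)
  bi = aimag(b)

  ! Same update on the interleaved reals: every access has stride 2.
  do i = 1, n
    cr(2*i-1) = cr(2*i-1) + ar(2*i-1)*br - ar(2*i)*bi
    cr(2*i)   = cr(2*i)   + ar(2*i-1)*bi + ar(2*i)*br
  end do

  ! Reinterpret the reals as complex again and compare with the reference.
  c = transfer(cr, c)
  print *, 'max abs difference:', maxval(abs(c - c_ref))
end program complex_layout
```

The printed difference should be at or very near zero, which confirms that the two forms compute the same thing.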
> I didn't yet performance test this extensively, but it might need
> cost-model adjustments and/or need to wait until we have profile feedback
> to properly seed vectorizer analysis here. A micro-benchmark based on
> the above loop shows around 15% improvement on AMD K10.
I can only report some timings from the Polyhedron test suite:
================================================================================
Test Name : pbharness
Compile Command : gfc %n.f90 -m64 -O3 -ffast-math -funroll-loops
-ftree-loop-linear -fomit-frame-pointer
-finline-limit=600 --param min-vect-loop-bound=2
-o %n
Benchmarks : ac aermod air capacita channel doduc fatigue gas_dyn induct
linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times : 300.0
Target Error % : 0.200
Minimum Repeats : 2
Maximum Repeats : 5
Date & Time : 21 Jan 2009 14:06:52 24 Jan 2009 9:16:15 (patched)
Bench.    Comp.   Exec. Ave Run   #  Estim    Comp.    Exec. Ave Run    #  Estim
Name     (secs) (bytes)  (secs) Run  Err %   (secs)  (bytes)  (secs)  Run  Err %
-------- ------ ------- ------- --- ------   ------  ------- -------  --- ------
ac         2.33   42560   12.27   2 0.0081     2.51    42560   12.43    5 0.3163
aermod    86.99 1270544   29.94   3 0.1371    92.59+ 1331976+  30.08    3 0.1636
air        5.60   77336    8.40   2 0.0060     5.49    77336    8.35    2 0.0060
capacita   3.46   72760   55.41   2 0.0794     5.41+  105528+  51.79-   2 0.1690
channel    2.11   38648    2.26   2 0.0442     2.13    38648    2.28    5 0.0683
doduc     11.65  200024   43.07   2 0.0441    11.67   200024   42.97    2 0.0093
fatigue    5.13   89024   10.78   5 0.3519     4.95    89024   11.87+   5 0.3516
gas_dyn    6.45  708584   10.32   5 0.3332     6.51   708584   10.28    5 0.7988
induct    10.03  181168   34.37   2 0.1222    10.37   181168   34.30    2 0.0087
linpk      1.64   42536   27.63   2 0.0290     1.54    42536   27.67    2 0.0397
mdbx       3.37   73000   14.74   2 0.0000     3.29    73000   14.80    2 0.0169
nf        24.10  161416   31.91   2 0.0627    18.61-  140936-  32.06    2 0.0764
protein   10.55  126424   47.05   2 0.0000    10.34   126424   46.24    3 0.1754
rnflow    11.09  179616   36.14   2 0.0982    13.15+  191904+  36.61    2 0.1065
test_fpu  10.16  166512   12.39   2 0.0403    10.05   162416-  12.43    2 0.1006
tfft       1.14   26432    2.82   2 0.0177     1.15    26432    2.84    3 0.1960
Geom. Mean Exec. Time =   17.01s                               17.07s
================================================================================
Polyhedron Benchmark Validator
Copyright (C) Polyhedron Software Ltd - 2004 - All rights reserved
The timings show a ~7% improvement for capacita.f90 (55.41s to 51.79s), offset
by a ~10% degradation for fatigue.f90 (10.78s to 11.87s). All the other times
are within the noise.
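As a quick check on those two figures, the relative change in average run time can be recomputed from the table. The small program below is just that arithmetic; the function name `pct` is made up for this sketch, and the four numbers are the "Ave Run" values for capacita and fatigue before and after the patch.

```fortran
program rel_change
  implicit none

  ! Average run times (secs) taken from the table: unpatched, then patched.
  print '(A,F7.2,A)', 'capacita: ', pct(55.41, 51.79), ' %'
  print '(A,F7.2,A)', 'fatigue:  ', pct(10.78, 11.87), ' %'

contains

  ! Relative change in percent; negative means the patched build is faster.
  pure real function pct(before, after)
    real, intent(in) :: before, after
    pct = 100.0 * (after - before) / before
  end function pct

end program rel_change
```

This works out to roughly -6.5% for capacita and +10.1% for fatigue.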
Thanks for the patch.
Dominique
PS: Most of the time in capacita and tfft is spent in FFT subroutines that
are not vectorized. Can anything be done to change that?