Regular gcc benchmark runs for sparse-matrix vector multiplication?
Harald Anlauf
anlauf@gmx.de
Mon Dec 17 02:19:00 GMT 2018
On 12/15/18 17:50, Harald Anlauf wrote:
>> 1. Do you have a small test case that shows the problem ?
>
> not yet. I'd need to sample from the data (matrices) used,
> in case it turns out due to different tuning in gcc-9.
I have been able to reduce my application so that I better
understand when the apparent performance degradation shows up.
For the code below, performance is very similar for gcc 7, 8,
and 9 if no bounds-checking is used. (I use bounds-checking
for development). However, if bounds-checking is enabled,
I am seeing roughly the following penalty:
  gcc 7, 8: +60% runtime
  gcc 9:    +90% runtime
Thus with bounds-checking enabled, the code runs roughly 20-25%
slower with gcc 9 than with gcc 7 or 8, which I find hard to
understand. To my untrained eyes, the -fdump-tree-original output
is essentially the same for all three compiler versions, but the
-fdump-tree-optimized output differs significantly between gcc 9
and the earlier versions.
Here's the code and compiler options used:
module csc
  implicit none
  integer, parameter :: sp = 4, dp = 8, mp = sp, wp = dp, ip = 4
contains
  subroutine csc_times_vector (a, ja, ia, x, y, n)
    real(mp)   ,intent(in)    :: a  (:) ! coefficients of matrix A
    integer(ip),intent(in)    :: ja (:) ! row indices of matrix A
    integer    ,intent(in)    :: ia (:) ! indices into a,ja for each column
    real(wp)   ,intent(in)    :: x  (:) ! right hand side
    real(wp)   ,intent(inout) :: y  (:) ! left hand side
    integer    ,intent(in)    :: n      ! number of columns
    integer :: i, j, k
    do j=1,n                      ! Outer loop j: columns of A
!CDIR ALTCODE=LOOPCNT
!CDIR NODEP
!DIR$ IVDEP
      do k = ia(j), ia(j+1)-1     ! Inner loop i: rows of (sparse) A
        i = ja(k)                 ! (the i's are distinct for different j's)
        y(i) = y(i) + a(k) * x(j)
      end do
    end do
  end subroutine csc_times_vector
end module csc
FFLAGS="-O2 -g -march=skylake -mfpmath=sse -ftree-vectorize
-funroll-loops -fno-realloc-lhs -fopt-info -fcheck=bounds"
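A driver along the following lines should be enough to time the
routine. This is only a sketch: it assumes the module above is
compiled in the same file or before it, it builds a synthetic banded
matrix rather than using my real data, and the names and sizes
(time_csc, n, nnz_col, nrep) are arbitrary choices for illustration.

```fortran
program time_csc
  use csc, only: csc_times_vector, mp, wp, ip
  implicit none
  integer, parameter :: i8      = selected_int_kind (18)
  integer, parameter :: n       = 100000  ! number of columns (arbitrary)
  integer, parameter :: nnz_col = 8       ! nonzeros per column (arbitrary)
  integer, parameter :: nrep    = 200     ! timing repetitions (arbitrary)
  real(mp),    allocatable :: a(:)
  integer(ip), allocatable :: ja(:)
  integer,     allocatable :: ia(:)
  real(wp),    allocatable :: x(:), y(:)
  integer     :: j, k, m, irep
  integer(i8) :: t0, t1, rate

  allocate (a(n*nnz_col), ja(n*nnz_col), ia(n+1), x(n), y(n))

  ! Synthetic pattern: column j holds rows j, j+1, ... (wrapped around),
  ! so the row indices within one column are distinct.
  k = 0
  do j = 1, n
    ia(j) = k + 1
    do m = 0, nnz_col - 1
      k = k + 1
      ja(k) = modulo (j - 1 + m, n) + 1
      a(k)  = 1.0_mp / real (m + 1, mp)
    end do
  end do
  ia(n+1) = k + 1
  x = 1.0_wp
  y = 0.0_wp

  call system_clock (t0, rate)
  do irep = 1, nrep
    call csc_times_vector (a, ja, ia, x, y, n)
  end do
  call system_clock (t1)

  print '(a,f10.3,a)', 'elapsed: ', real (t1 - t0, wp) / real (rate, wp), ' s'
  print *, 'checksum: ', sum (y)  ! use the result so the work is not discarded
end program time_csc
```

Building this once with the FFLAGS above and once without
-fcheck=bounds, for each compiler version, gives the comparison I
describe. A regular synthetic pattern like this may well behave
differently from my real matrices.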
If there's interest, I can create a bugzilla with test program
and test data. If people think that bounds-checking must be
expensive, then I will not waste anybody's time.
Thanks,
Harald