[patch,libgfortran] PR51119 - MATMUL slow for large matrices
Richard Biener
richard.guenther@gmail.com
Tue Nov 15 08:21:00 GMT 2016
On Mon, Nov 14, 2016 at 11:13 PM, Jerry DeLisle <jvdelisle@charter.net> wrote:
> On 11/13/2016 11:03 PM, Thomas Koenig wrote:
>>
>> Hi Jerry,
>>
>> I think this
>>
>> + /* Parameter adjustments */
>> + c_dim1 = m;
>> + c_offset = 1 + c_dim1;
>>
>> should be
>>
>> + /* Parameter adjustments */
>> + c_dim1 = rystride;
>> + c_offset = 1 + c_dim1;
>>
>> Regarding options for matmul: It is possible to add the
>> options to the lines in Makefile.in
>>
>> # Turn on vectorization and loop unrolling for matmul.
>> $(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS +=
>> -ftree-vectorize
>> -funroll-loops
>>
>> This is a great step forward. I think we can close most matmul-related
>> PRs once this patch has been applied.
>>
>> Regards
>>
>> Thomas
>>
>
> With Thomas suggestion, I can remove the #pragma optimize from the source
> code. Doing this: (long lines wrapped as shown)
>
> diff --git a/libgfortran/Makefile.am b/libgfortran/Makefile.am
> index 39d3e11..9ee17f9 100644
> --- a/libgfortran/Makefile.am
> +++ b/libgfortran/Makefile.am
> @@ -850,7 +850,7 @@ intrinsics/dprod_r8.f90 \
> intrinsics/f2c_specifics.F90
>
> # Turn on vectorization and loop unrolling for matmul.
> -$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize
> -funroll-loops
> +$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ffast-math
> -fno-protect-parens -fstack-arrays -ftree-vectorize -funroll-loops --param
> max-unroll-times=4 -ftree-loop-vectorize
-ftree-vectorize turns on -ftree-loop-vectorize and
-ftree-slp-vectorize already.
> # Logical matmul doesn't vectorize.
> $(patsubst %.c,%.lo,$(notdir $(i_matmull_c))): AM_CFLAGS += -funroll-loops
>
>
> Comparing gfortran 6 vs 7: (test program posted in PR51119)
>
> $ gfc6 -static -Ofast -finline-matmul-limit=32 -funroll-loops --param
> max-unroll-times=4 compare.f90
> $ ./a.out
> =========================================================
> ================ MEASURED GIGAFLOPS =
> =========================================================
> Matmul Matmul
> fixed Matmul variable
> Size Loops explicit refMatmul assumed explicit
> =========================================================
> 2 2000 11.928 0.047 0.082 0.138
> 4 2000 1.455 0.220 0.371 0.316
> 8 2000 1.476 0.737 0.704 1.574
> 16 2000 4.536 3.755 2.825 3.820
> 32 2000 6.070 5.443 3.124 5.158
> 64 2000 5.423 5.355 5.405 5.413
> 128 2000 5.913 5.841 5.917 5.917
> 256 477 5.865 5.252 5.863 5.862
> 512 59 2.794 2.841 2.794 2.791
> 1024 7 1.662 1.356 1.662 1.661
> 2048 1 1.753 1.724 1.753 1.754
>
> $ gfc -static -Ofast -finline-matmul-limit=32 -funroll-loops --param
> max-unroll-times=4 compare.f90
> $ ./a.out
> =========================================================
> ================ MEASURED GIGAFLOPS =
> =========================================================
> Matmul Matmul
> fixed Matmul variable
> Size Loops explicit refMatmul assumed explicit
> =========================================================
> 2 2000 12.146 0.042 0.090 0.146
> 4 2000 1.496 0.232 0.384 0.325
> 8 2000 2.330 0.765 0.763 0.965
> 16 2000 4.611 4.120 2.792 3.830
> 32 2000 6.068 5.265 3.102 4.859
> 64 2000 6.527 5.329 6.425 6.495
> 128 2000 8.207 5.643 8.336 8.441
> 256 477 9.210 4.967 9.367 9.299
> 512 59 8.330 2.772 8.422 8.342
> 1024 7 8.430 1.378 8.511 8.424
> 2048 1 8.339 1.718 8.425 8.322
>
> I do think we need to adjust the default inline limit and should do this
> separately from this patch.
>
> With these changes, OK for trunk?
>
> Regards,
>
> Jerry
>
More information about the Gcc-patches
mailing list