[patch,libgfortran] PR51119 - MATMUL slow for large matrices

Richard Biener richard.guenther@gmail.com
Tue Nov 15 08:21:00 GMT 2016


On Mon, Nov 14, 2016 at 11:13 PM, Jerry DeLisle <jvdelisle@charter.net> wrote:
> On 11/13/2016 11:03 PM, Thomas Koenig wrote:
>>
>> Hi Jerry,
>>
>> I think this
>>
>> +      /* Parameter adjustments */
>> +      c_dim1 = m;
>> +      c_offset = 1 + c_dim1;
>>
>> should be
>>
>> +      /* Parameter adjustments */
>> +      c_dim1 = rystride;
>> +      c_offset = 1 + c_dim1;
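In the f2c-style indexing used here, `c_dim1` is the distance in elements between the starts of consecutive columns of the result. That distance equals `m` only when the result is stored contiguously. The sketch below is a hypothetical, naive (unblocked) kernel, not the patch itself. The name `naive_matmul` and the `lda`/`ldb`/`ldc` parameters are assumptions. `ldc` stands in for `rystride`, which is assumed here to be the result's column stride.

```c
#include <assert.h>

/* Hypothetical naive kernel (not the libgfortran code):
   C(1:m,1:n) = A(1:m,1:k) * B(1:k,1:n), all column-major.
   lda, ldb and ldc are column strides (leading dimensions) in elements;
   ldc plays the role of rystride and may be larger than m. */
static void naive_matmul(int m, int n, int k,
                         const double *a, int lda,
                         const double *b, int ldb,
                         double *c, int ldc)
{
    /* f2c-style parameter adjustments: 1-based element (i,j)
       lives at index i + j*dim1 - offset. */
    const int a_dim1 = lda, a_offset = 1 + a_dim1;
    const int b_dim1 = ldb, b_offset = 1 + b_dim1;
    const int c_dim1 = ldc, c_offset = 1 + c_dim1; /* the stride, not m */

    assert(lda >= m && ldb >= k && ldc >= m);

    for (int j = 1; j <= n; ++j) {
        for (int i = 1; i <= m; ++i)
            c[i + j * c_dim1 - c_offset] = 0.0;
        /* j-l-i order keeps the innermost loop contiguous in A and C. */
        for (int l = 1; l <= k; ++l) {
            const double blj = b[l + j * b_dim1 - b_offset];
            for (int i = 1; i <= m; ++i)
                c[i + j * c_dim1 - c_offset] += a[i + l * a_dim1 - a_offset] * blj;
        }
    }
}
```

Here is an example. A = [[1,2],[3,4]] is stored column-major as {1,3,2,4}, B is the 2x2 identity, and C is a 3x2 buffer filled with -1. The call `naive_matmul(2, 2, 2, a, 2, b, 2, c, 3)` leaves c[2] and c[5] (the padding row) untouched. Passing `ldc = m = 2` instead would write the second column into c[2] and c[3].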
>>
>> Regarding options for matmul:  It is possible to add the
>> options to the lines in Makefile.in
>>
>> # Turn on vectorization and loop unrolling for matmul.
>> $(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize -funroll-loops
>>
>> This is a great step forward.  I think we can close most matmul-related
>> PRs once this patch has been applied.
>>
>> Regards
>>
>>     Thomas
>>
>
> With Thomas's suggestion, I can remove the #pragma optimize from the source
> code by doing this:
>
> diff --git a/libgfortran/Makefile.am b/libgfortran/Makefile.am
> index 39d3e11..9ee17f9 100644
> --- a/libgfortran/Makefile.am
> +++ b/libgfortran/Makefile.am
> @@ -850,7 +850,7 @@ intrinsics/dprod_r8.f90 \
>  intrinsics/f2c_specifics.F90
>
>  # Turn on vectorization and loop unrolling for matmul.
> -$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize -funroll-loops
> +$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ffast-math -fno-protect-parens -fstack-arrays -ftree-vectorize -funroll-loops --param max-unroll-times=4 -ftree-loop-vectorize

-ftree-vectorize already turns on both -ftree-loop-vectorize and
-ftree-slp-vectorize, so listing -ftree-loop-vectorize separately is redundant.

>  # Logical matmul doesn't vectorize.
>  $(patsubst %.c,%.lo,$(notdir $(i_matmull_c))): AM_CFLAGS += -funroll-loops
>
>
> Comparing gfortran 6 and 7 (test program posted in PR51119):
>
> $ gfc6 -static -Ofast -finline-matmul-limit=32 -funroll-loops --param max-unroll-times=4 compare.f90
> $ ./a.out
>  =========================================================
>  ================            MEASURED GIGAFLOPS          =
>  =========================================================
>                  Matmul                           Matmul
>                  fixed                 Matmul     variable
>  Size  Loops     explicit   refMatmul  assumed    explicit
>  =========================================================
>     2  2000     11.928      0.047      0.082      0.138
>     4  2000      1.455      0.220      0.371      0.316
>     8  2000      1.476      0.737      0.704      1.574
>    16  2000      4.536      3.755      2.825      3.820
>    32  2000      6.070      5.443      3.124      5.158
>    64  2000      5.423      5.355      5.405      5.413
>   128  2000      5.913      5.841      5.917      5.917
>   256   477      5.865      5.252      5.863      5.862
>   512    59      2.794      2.841      2.794      2.791
>  1024     7      1.662      1.356      1.662      1.661
>  2048     1      1.753      1.724      1.753      1.754
>
> $ gfc -static -Ofast -finline-matmul-limit=32 -funroll-loops --param max-unroll-times=4 compare.f90
> $ ./a.out
>  =========================================================
>  ================            MEASURED GIGAFLOPS          =
>  =========================================================
>                  Matmul                           Matmul
>                  fixed                 Matmul     variable
>  Size  Loops     explicit   refMatmul  assumed    explicit
>  =========================================================
>     2  2000     12.146      0.042      0.090      0.146
>     4  2000      1.496      0.232      0.384      0.325
>     8  2000      2.330      0.765      0.763      0.965
>    16  2000      4.611      4.120      2.792      3.830
>    32  2000      6.068      5.265      3.102      4.859
>    64  2000      6.527      5.329      6.425      6.495
>   128  2000      8.207      5.643      8.336      8.441
>   256   477      9.210      4.967      9.367      9.299
>   512    59      8.330      2.772      8.422      8.342
>  1024     7      8.430      1.378      8.511      8.424
>  2048     1      8.339      1.718      8.425      8.322
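The benchmark source (compare.f90) is not reproduced in this message, so the following is only a sketch. It assumes the usual convention of 2*n^3 floating-point operations (n^3 multiplies plus n^3 adds) per n x n matrix product. Under that assumption, a reported GFLOPS figure converts back to wall time as shown. The helper names are hypothetical.

```c
#include <assert.h>

/* Assumed convention: an n x n by n x n product costs 2*n^3 flops. */
static double matmul_flops(int n)
{
    return 2.0 * (double)n * (double)n * (double)n;
}

/* GFLOPS achieved when `loops` products of size n take `seconds` in total. */
static double gflops(int n, int loops, double seconds)
{
    assert(seconds > 0.0);
    return matmul_flops(n) * loops / seconds / 1e9;
}

/* Inverse: total seconds implied by a reported GFLOPS figure. */
static double seconds_for(int n, int loops, double gflops_value)
{
    assert(gflops_value > 0.0);
    return matmul_flops(n) * loops / (gflops_value * 1e9);
}
```

For the 2048 row, 1.753 GFLOPS (gfortran 6) and 8.339 GFLOPS (gfortran 7) correspond to roughly 9.8 s and 2.06 s per product, about a 4.8x speedup. The ratio itself does not depend on the flop-count convention.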
>
> I do think we need to adjust the default inline limit (-finline-matmul-limit),
> but that should be done separately from this patch.
>
> With these changes, OK for trunk?
>
> Regards,
>
> Jerry
>


