This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[fortran,patch] BLAS-enabled matmul


:ADDPATCH fortran:

Hi all,

This rather long mail is divided in 3 parts : the first describe the usage of the new options I have implemented, the second describes the inner working of the patch for reviewers, and the third asks a few questions to any GCC or gfortran developers interested to help (about linking order, static and dynamic linking, GCC drivers, const and restrict keywords and vectorisation).

And, by the way, the patch includes a fix for PR libfortran/26985 (matmul gives wrong result), which I found by staring at this code so long trying to understand how it could work out (the answer is: it doesn't).



*** What this patch enables you to do ***

$ cat b.f90
  integer,parameter :: n = 700
  real(8) :: a(n,n), b(n,n), c(n,n)
  integer :: i, j

  do i = 1, n ; do j = 1, n
    a(i,j) = sin(dble(i*j)) ; b(i,j) = cos(dble(i*j))
  end do ; end do
  c = matmul(a,b)
  print *, sum(c)
  end

[Using the standard no-option compile line]

$ gfortran b.f90 && time ./a.out
   88.7353056829913

real    0m6.366s
user    0m6.316s
sys     0m0.048s

[Using a external BLAS library, here a non-optimized static one]

$ gfortran b.f90 -fexternal-blas=./libblas_nonoptimized.a && time ./a.out
   88.7353056829913

real    0m7.791s
user    0m7.739s
sys     0m0.044s

[Using a complex argument to -fexternal-blas, to link in an optimized ATLAS version of BLAS]

$ ./bin/gfortran b.f90 -fexternal-blas="-llapack -lcblas -lf77blas -latlas -L$HOME/ATLAS/lib/Linux_PIIISSE1" && time ./a.out
88.7353056829957


real    0m0.979s
user    0m0.923s
sys     0m0.054s

And that "0.979s" doesn't sound too bad!




*** How does it work? ***


On the library side: we create a new library called libgfortran_blas, which is static only, and which contains a minimal BLAS implentation.
The intrinsics/libgfortran_blas.c file contains this minimal implementation of sgemm/dgemm/cgemm/zgemm, based on the code we currently use for matmul. Then, change the matmul_??.c code to generate BLAS calls under certain conditions (no stride except for transposed arguments, size above a given size that can be specified at compile-time) and otherwise use the current code. So, when we link, the driver either links with libgfortran_blas or with the optimized blas library provided.


As for how the matmul_??.c code is changed, I modified the m4 to:
1. know which numeric types have a BLAS routine associated, and what is the letter associated (this is in iparm.m4)
2. protect the BLAS call with an "#if 0" in the case there is actually no BLAS routine for that type, that is, for all but real(4 or 8) and complex(4 or 8); that might not be a very nice solution (maybe the if-clause could be done in m4 rather than preprocessing), so I find a smarter way to do this, I'll tell you
3. I added some debug code, commented out with "#if 0", that prints out the differents strides and counts of all arrays; i'd like to let that in, since it was wonderfully useful during my debugging of this patch (and to write the testcase for PR libfortran/26985)



Now there's one thing slightly subtle about all this: suppose a user compiles some code that calls DGEMM, and forgets to link in his favorite BLAS library. Well, the program will compile and link fine, because we provided a symbol called dgemm_ in our libgfortran_blas library. But since this later is only a minimalist implementation, it does not handle all possibles cases (for example, it asserts that alpha=1 and beta=0), so this is very bad. To avoid that, I created a symbol libgfortran_called_blas in libgfortran, that is set to 1 before every BLAS call initiated by our library, and is set back to 0 afterwards. Similarly, the ?gemm routines in libgfortran_blas first to check if libgfortran_called_blas has value 1, else issue a runtime error message explaining the situation. I'm not happy with this situation, but couldn't think of a better way to handle this. If you have any idea, please share it!




On the front-end side: we look for the new options -fexternal-blas and -fblas-matmul-limit. If the second one is used for compilation of the main program, a call to set the library compile_options.blas_matmul_limit value accordingly. As for the first one:

1. if it's not specified, the driver adds "-lgfortran_blas -lgfortran" to the options, making the full final libraries order "-lgfortranbegin -lgfortran -lgfortran_blas -lgfortran -lm"; this is needed because libgfortran_blas references a symbol from libgfortran, namely libgfortran_called_blas (see above about this one)

2. if -fexternal-blas is specified, its argument is added instead of -lgfortran_blas, but everything else from point 1 is still valid; append_blas() breaks the space-separated words in
-fexternal-blas argument before appending them to the arguments. As for the order of linking, -lgfortran still needs to be added after the external blas library, because if the external library was compiled with gfortran, it might have some libgfortran symbols referred in it :)




*** How you can help this patch ***

1. I'd like some careful examination of my changes to gcc/fortran/gfortranspec.c since it's the first time I actually change the gfortran driver. Comments, both on the driver changes and on the global linking order scheme, are welcome!


2. The following show that the current matmul code is faster than its rewrite in libgfortran_blas.c:


$ ./bin/gfortran b.f90 && time ./a.out
   88.7353056829913

real    0m5.589s
user    0m5.549s
sys     0m0.038s
$ ./bin/gfortran b.f90 -fblas-matmul-limit=800 && time ./a.out
   88.7353056829913

real    0m3.826s
user    0m3.789s
sys     0m0.036s

Almost two seconds difference is too much to be explained by the additional system calls. I believe that, although libgfortran_blas.c is compiled with -ftree-vectorize -funroll-loops and I thought I had kept all the restrict/const qualifiers from the original code, something escaped my understanding. Could someone qualified in that (Janne maybe, since the original vectorization was your work) have a look at that code?


3. If you're one of the gfortran Happy Few, review that patch, or part of it (front-end, library, m4...) ;-)


So, I'll proceed with the magic ritual:
Bootstrapped and regtested on i686-linux. OK for 4.2?
2006-04-03  Francois-Xavier Coudert  <coudert@clipper.ens.fr>

	* lang.opt: Add options -fexternal-blas and -fblas-matmul-limit.
	* gfortran.h (gfc_options): Add new fields for the new options.
	* gfortranspec.c (Option): New case for -fexternal-blas option.
	(lookup_option): New case for -fexternal-blas option.
	(append_blas): New function.
	(lang_specific_driver): Append libgfortran_blas or external BLAS
	library depending on the -fexternal-blas option.
	* trans-decl.c (gfc_build_builtin_function_decls): Create
	function decl for set_blas_matmul_limit.
	(gfc_generate_function_code): Add function call to
	set_blas_matmul_limit if -fblas-matmul-limit option is used.
	* options.c (gfc_handle_option,gfc_init_options): Handle the
	new options.
	* trans.h: Add prototype for gfor_fndecl_set_blas_matmul_limit.


2006-04-03  Francois-Xavier Coudert  <coudert@clipper.ens.fr>

	* m4/iparm.m4: Add $1_hasblas and $1_blasletter (like
	rtype_hasblas and rtype_blasletter) for each numeric type.
	* m4/matmul.m4: Add debugging code protected by #if 0.
	Insert BLAS calls when the size of the arrays exceed
	compile_options.blas_matmul_limit. Fix the condition for
	contigous arrays on which we can use memset.
	* runtime/compile_options.c (set_blas_matmul_limit): New function.
	* runtime/main.c (libgfortran_called_blas): Add
	libgfortran_called_blas.
	* intrinsics/libgfortran_blas.c: New file.
	* libgfortran.h: Add prototype for libgfortran_called_blas.
	(compile_options_t): Add blas_matmul_limit field.
	* Makefile.am: Add definitions for libgfortran_blas.{la,a}.
Index: gcc/fortran/gfortran.h
===================================================================
--- gcc/fortran/gfortran.h	(revision 112542)
+++ gcc/fortran/gfortran.h	(working copy)
@@ -1630,6 +1630,8 @@
   int flag_f2c;
   int flag_automatic;
   int flag_backslash;
+  int flag_external_blas;
+  int blas_matmul_limit;
   int flag_cray_pointer;
   int flag_d_lines;
   int flag_openmp;
Index: gcc/fortran/lang.opt
===================================================================
--- gcc/fortran/lang.opt	(revision 112542)
+++ gcc/fortran/lang.opt	(working copy)
@@ -85,6 +85,14 @@
 Fortran
 Specify that backslash in string introduces an escape character
 
+fexternal-blas=
+Fortran RejectNegative JoinedOrMissing
+Specify that an optimized BLAS library will be provided at link time
+
+fblas-matmul-limit=
+Fortran RejectNegative Joined UInteger
+-fblas-matmul-limit=<n>        Size of the smallest matrix for which matmul will use BLAS
+
 fdefault-double-8
 Fortran
 Set the default double precision kind to an 8 byte wide type
Index: gcc/fortran/trans.h
===================================================================
--- gcc/fortran/trans.h	(revision 112542)
+++ gcc/fortran/trans.h	(working copy)
@@ -470,6 +470,7 @@
 extern GTY(()) tree gfor_fndecl_runtime_error;
 extern GTY(()) tree gfor_fndecl_set_fpe;
 extern GTY(()) tree gfor_fndecl_set_std;
+extern GTY(()) tree gfor_fndecl_set_blas_matmul_limit;
 extern GTY(()) tree gfor_fndecl_ttynam;
 extern GTY(()) tree gfor_fndecl_ctime;
 extern GTY(()) tree gfor_fndecl_fdate;
Index: gcc/fortran/gfortranspec.c
===================================================================
--- gcc/fortran/gfortranspec.c	(revision 112542)
+++ gcc/fortran/gfortranspec.c	(working copy)
@@ -65,6 +65,10 @@
 #define FORTRAN_LIBRARY "-lgfortran"
 #endif
 
+#ifndef BLAS_LIBRARY
+#define BLAS_LIBRARY "-lgfortran_blas"
+#endif
+
 /* Options this driver needs to recognize, not just know how to
    skip over.  */
 typedef enum
@@ -73,6 +77,7 @@
   OPTION_B,			/* Aka --target.  */
   OPTION_c,			/* Aka --compile.  */
   OPTION_E,			/* Aka --preprocess.  */
+  OPTION_fexternal_blas_,	/* -fexternal_blas= */
   OPTION_help,			/* --help.  */
   OPTION_i,			/* -imacros, -include, -include-*.  */
   OPTION_l,
@@ -171,6 +176,8 @@
 	opt = OPTION_syntax_only;
       else if (!strcmp (text, "-dumpversion"))
 	opt = OPTION_version;
+      else if (!strncmp (text, "-fexternal-blas=",16))
+	opt = OPTION_fexternal_blas_;
       else if (!strcmp (text, "-fversion"))	/* Really --version!! */
 	opt = OPTION_version;
       else if (!strcmp (text, "-Xlinker") || !strcmp (text, "-specs"))
@@ -232,6 +239,40 @@
   g77_newargv[g77_newargc++] = arg;
 }
 
+static void
+append_blas (const char * external_blas, const char * library)
+{
+  char * s;
+  int u, i, len;
+
+  if (external_blas == NULL)
+    {
+      append_arg (BLAS_LIBRARY);
+      append_arg (library);
+      return;
+    }
+
+  s = xstrdup (external_blas);
+  len = strlen(s);
+
+  for (i = 0, u = 0; i < len; i++)
+    if (s[i] == ' ')
+      {
+	s[i] = '\0';
+	append_arg (s+u);
+	i++;
+	u = i;
+	while (s[u] == ' ')
+	  u++, i++;
+      }
+
+  if (s[u] != '\0')
+    append_arg (s+u);
+
+  if (library)
+    append_arg (library);
+}
+
 void
 lang_specific_driver (int *in_argc, const char *const **in_argv,
 		      int *in_added_libraries ATTRIBUTE_UNUSED)
@@ -240,6 +281,7 @@
   const char *const *argv = *in_argv;
   int i;
   int verbose = 0;
+  const char * has_external_blas = NULL;
   Option opt;
   int skip;
   const char *arg;
@@ -359,6 +401,10 @@
 	     cool facility for handling --help and --verbose --help.  */
 	  return;
 
+	case OPTION_fexternal_blas_:
+	  has_external_blas = argv[i] + 16;
+	  break;
+
 	default:
 	  break;
 	}
@@ -416,6 +462,9 @@
 	{
 	  /* Not a filename or library.  */
 
+	  if (saw_library == 1)
+	    append_blas (has_external_blas, library);
+
 	  if (saw_library == 1 && need_math)	/* -l<library>.  */
 	    append_arg (MATH_LIBRARY);
 
@@ -442,7 +491,8 @@
 	      saw_speclang = (strcmp (lang, "none") != 0);
 	    }
 
-	  append_arg (argv[i]);
+	  if (opt != OPT_fexternal_blas_)
+	    append_arg (argv[i]);
 
 	  for (; skip != 0; --skip)
 	    append_arg (argv[++i]);
@@ -459,7 +509,10 @@
 	  if (strcmp (argv[i], MATH_LIBRARY) == 0)
 	    {
 	      if (saw_library == 1)
-		saw_library = 2;	/* -l<library> -lm.  */
+		{
+		  saw_library = 2;	/* -l<library> -lm.  */
+		  append_blas (has_external_blas, library);
+		}
 	      else
 		{
 		  if (0 == use_init)
@@ -468,12 +521,15 @@
 		      use_init = 1;
 		    }
 		  append_arg (FORTRAN_LIBRARY);
+		  append_blas (has_external_blas, FORTRAN_LIBRARY);
 		}
 	    }
 	  else if (strcmp (argv[i], FORTRAN_LIBRARY) == 0)
 	    saw_library = 1;	/* -l<library>.  */
 	  else
 	    {			/* Other library, or filename.  */
+	      if (saw_library == 1)
+		append_blas (has_external_blas, library);
 	      if (saw_library == 1 && need_math)
 		append_arg (MATH_LIBRARY);
 	      saw_library = 0;
@@ -499,6 +555,7 @@
 	    }
 	  append_arg (library);
 	case 1:
+	  append_blas (has_external_blas, library);
 	  if (need_math)
 	    append_arg (MATH_LIBRARY);
 	default:
Index: gcc/fortran/trans-decl.c
===================================================================
--- gcc/fortran/trans-decl.c	(revision 112542)
+++ gcc/fortran/trans-decl.c	(working copy)
@@ -92,6 +92,7 @@
 tree gfor_fndecl_runtime_error;
 tree gfor_fndecl_set_fpe;
 tree gfor_fndecl_set_std;
+tree gfor_fndecl_set_blas_matmul_limit;
 tree gfor_fndecl_set_convert;
 tree gfor_fndecl_set_record_marker;
 tree gfor_fndecl_ctime;
@@ -2294,6 +2295,12 @@
 				    gfc_int4_type_node,
 				    gfc_int4_type_node);
 
+  gfor_fndecl_set_blas_matmul_limit =
+    gfc_build_library_function_decl (get_identifier
+				      (PREFIX("set_blas_matmul_limit")),
+				       void_type_node, 1,
+				       gfc_c_int_type_node);
+
   gfor_fndecl_set_convert =
     gfc_build_library_function_decl (get_identifier (PREFIX("set_convert")),
 				     void_type_node, 1, gfc_c_int_type_node);
@@ -2933,6 +2940,21 @@
       gfc_add_expr_to_block (&body, tmp);
     }
 
+  /* If this is the main program and the -fblas-matmul-limit was used,
+     add a call to set_blas_matmul_limit.  */
+  if (sym->attr.is_main_program && gfc_option.blas_matmul_limit != 0)
+    {
+      tree arglist, gfc_c_int_type_node;
+
+      gfc_c_int_type_node = gfc_get_int_type (gfc_c_int_kind);
+      arglist = gfc_chainon_list (NULL_TREE,
+				  build_int_cst (gfc_c_int_type_node,
+						 gfc_option.blas_matmul_limit));
+      tmp = build_function_call_expr (gfor_fndecl_set_blas_matmul_limit,
+				      arglist);
+      gfc_add_expr_to_block (&body, tmp);
+    }
+
   /* If this is the main program and an -fconvert option was provided,
      add a call to set_convert.  */
 
Index: gcc/fortran/options.c
===================================================================
--- gcc/fortran/options.c	(revision 112542)
+++ gcc/fortran/options.c	(working copy)
@@ -77,6 +77,8 @@
   gfc_option.flag_preprocessed = 0;
   gfc_option.flag_automatic = 1;
   gfc_option.flag_backslash = 1;
+  gfc_option.flag_external_blas = 0;
+  gfc_option.blas_matmul_limit = 0;
   gfc_option.flag_cray_pointer = 0;
   gfc_option.flag_d_lines = -1;
   gfc_option.flag_openmp = 0;
@@ -445,6 +447,14 @@
       gfc_option.flag_backslash = value;
       break;
 
+    case OPT_fexternal_blas_:
+      gfc_option.flag_external_blas = 1;
+      break;
+
+    case OPT_fblas_matmul_limit_:
+      gfc_option.blas_matmul_limit = value;
+      break;
+
     case OPT_fd_lines_as_code:
       gfc_option.flag_d_lines = 1;
       break;
Index: libgfortran/m4/iparm.m4
===================================================================
--- libgfortran/m4/iparm.m4	(revision 112610)
+++ libgfortran/m4/iparm.m4	(working copy)
@@ -15,6 +15,11 @@
 define($1_kind,$2)dnl
 ')dnl
 define($1_code,$1_letter`'$1_kind)dnl
+define($1_hasblas,ifelse($1_letter,`r',dnl
+ifelse($1_kind,`4',`1',ifelse($1_kind,`8',`1',`0')),dnl
+ifelse($1_letter,`c',dnl
+ifelse($1_kind,`4',`1',ifelse($1_kind,`8',`1',`0')),`0')))dnl
+define($1_blasletter,ifelse($1_code,`r4',`s',ifelse($1_code,`r8',`d',ifelse($1_code,`c4',`c',ifelse($1_code,`c8',`z',`')))))dnl
 define($1,get_arraytype($1_letter,$1_kind))dnl
 define($1_name, get_typename($1_letter, $1_kind))')dnl
 dnl
Index: libgfortran/m4/matmul.m4
===================================================================
--- libgfortran/m4/matmul.m4	(revision 112610)
+++ libgfortran/m4/matmul.m4	(working copy)
@@ -188,37 +188,73 @@
   bbase = b->data;
   dest = retarray->data;
 
-  if (rxstride == 1 && axstride == 1 && bxstride == 1)
+#if 0
+  printf ("rxstride = %d\t", rxstride);
+  printf ("axstride = %d\t", axstride);
+  printf ("bxstride = %d\n", bxstride);
+  printf ("rystride = %d\t", rystride);
+  printf ("aystride = %d\t", aystride);
+  printf ("bystride = %d\n", bystride);
+  printf ("xcount = %d\t", xcount);
+  printf ("ycount = %d\n", ycount);
+  printf ("count = %d\n", count);
+#endif
+
+#define POW3(x) ((x)*(x)*(x))
+
+`#if 'rtype_hasblas
+  if (rxstride == 1 && (axstride == 1 || aystride == 1)
+      && (bxstride == 1 || bystride == 1)
+      && (xcount > POW3((float) compile_options.blas_matmul_limit)
+	  || xcount > POW3((float) compile_options.blas_matmul_limit)
+	  || count > POW3((float) compile_options.blas_matmul_limit)
+	  || (float) xcount * (float) ycount * (float) count
+	     > POW3((float) compile_options.blas_matmul_limit)))
     {
-      const rtype_name * restrict bbase_y;
-      rtype_name * restrict dest_y;
-      const rtype_name * restrict abase_n;
-      rtype_name bbase_yn;
+      const int m = xcount, n = ycount, k = count, ldc = rystride;
+      const double one = 1, zero = 0;
+      const int lda = (axstride == 1) ? aystride : axstride,
+		ldb = (bxstride == 1) ? bystride : bxstride;
 
-      if (rystride == ycount)
-	memset (dest, 0, (sizeof (rtype_name) * size0((array_t *) retarray)));
-      else
+      libgfortran_called_blas = 1;
+      rtype_blasletter`'gemm_ (axstride == 1 ? "N" : "T",
+	bxstride == 1 ? "N" : "T", &m, &n, &k, &one, abase, &lda,
+	bbase, &ldb, &zero, dest, &ldc);
+      libgfortran_called_blas = 0;
+    }
+  else
+#endif
+    {
+
+      if (rxstride == 1 && axstride == 1 && bxstride == 1)
 	{
-	  for (y = 0; y < ycount; y++)
-	    for (x = 0; x < xcount; x++)
-	      dest[x + y*rystride] = (rtype_name)0;
-	}
+	  const rtype_name * restrict bbase_y;
+	  rtype_name * restrict dest_y;
+	  const rtype_name * restrict abase_n;
+	  rtype_name bbase_yn;
 
-      for (y = 0; y < ycount; y++)
-	{
-	  bbase_y = bbase + y*bystride;
-	  dest_y = dest + y*rystride;
-	  for (n = 0; n < count; n++)
+	  if (rystride == xcount)
+	    memset (dest, 0, (sizeof (rtype_name) * size0((array_t *) retarray)));
+	  else
 	    {
-	      abase_n = abase + n*aystride;
-	      bbase_yn = bbase_y[n];
-	      for (x = 0; x < xcount; x++)
+	      for (y = 0; y < ycount; y++)
+		for (x = 0; x < xcount; x++)
+		  dest[x + y*rystride] = (rtype_name)0;
+	    }
+
+	  for (y = 0; y < ycount; y++)
+	    {
+	      bbase_y = bbase + y*bystride;
+	      dest_y = dest + y*rystride;
+	      for (n = 0; n < count; n++)
 		{
-		  dest_y[x] += abase_n[x] * bbase_yn;
+		  abase_n = abase + n*aystride;
+		  bbase_yn = bbase_y[n];
+		  for (x = 0; x < xcount; x++)
+		    dest_y[x] += abase_n[x] * bbase_yn;
 		}
 	    }
 	}
-    }
   else if (rxstride == 1 && aystride == 1 && bxstride == 1)
     {
       const rtype_name *restrict abase_x;
@@ -273,6 +309,7 @@
 	    }
 	}
     }
+  }
 }
 
 #endif
Index: libgfortran/runtime/compile_options.c
===================================================================
--- libgfortran/runtime/compile_options.c	(revision 112610)
+++ libgfortran/runtime/compile_options.c	(working copy)
@@ -50,7 +50,6 @@
   compile_options.allow_std = allow_std;
 }
 
-
 /* Default values for the compile-time options.  Keep in sync with
    gcc/fortran/options.c (gfc_init_options).  */
 void
@@ -61,8 +60,24 @@
   compile_options.allow_std = GFC_STD_F95_OBS | GFC_STD_F95_DEL
     | GFC_STD_F2003 | GFC_STD_F95 | GFC_STD_F77 | GFC_STD_GNU | GFC_STD_LEGACY;
   compile_options.pedantic = 0;
+
+  /* This provides a not-so-well-tuned limit for calling BLAS instead
+     of our own vectorized matmul code.  */
+  compile_options.blas_matmul_limit = 50;
 }
 
+
+/* -fblas-matmul-limit=<n> */
+
+extern void  set_blas_matmul_limit(int);
+export_proto(set_blas_matmul_limit);
+
+void
+set_blas_matmul_limit (int value)
+{
+  compile_options.blas_matmul_limit = value;
+}
+
 /* Function called by the front-end to tell us the
    default for unformatted data conversion.  */
 
Index: libgfortran/runtime/main.c
===================================================================
--- libgfortran/runtime/main.c	(revision 112610)
+++ libgfortran/runtime/main.c	(working copy)
@@ -49,6 +49,12 @@
 int l8_to_l4_offset = 0;
 
 
+/* This is a static parameter used by our fallback BLAS implentation
+   to detect if it is called by libgfortran, or if it was called by user
+   code, which is the result of a mistake at link-time.  */
+int libgfortran_called_blas = 0;
+
+
 /* Figure out endianness for this machine.  */
 
 static void
Index: libgfortran/intrinsics/libgfortran_blas.c
===================================================================
--- libgfortran/intrinsics/libgfortran_blas.c	(revision 0)
+++ libgfortran/intrinsics/libgfortran_blas.c	(revision 0)
@@ -0,0 +1,184 @@
+/* Fallback implementation of the BLAS intrinsics used by libgfortran
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is part of the GNU Fortran 95 runtime library (libgfortran).
+
+Libgfortran is free software; you can redistribute it and/or
+modify it under the terms of the GNU General Public
+License as published by the Free Software Foundation; either
+version 2 of the License, or (at your option) any later version.
+
+In addition to the permissions in the GNU General Public License, the
+Free Software Foundation gives you unlimited permission to link the
+compiled version of this file into combinations with other programs,
+and to distribute those combinations without any restriction coming
+from the use of this file.  (The General Public License restrictions
+do apply in other respects; for example, they cover modification of
+the file, and distribution when not linked into a combine
+executable.)
+
+Libgfortran is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public
+License along with libgfortran; see the file COPYING.  If not,
+write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+#include "config.h"
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+#include <stdio.h>
+#include "libgfortran.h"
+
+
+/* This routine checks that we are really called by libgfortran,
+   and that we do not interfere with a user call to ?GEMM.  */
+static void
+sanity_check (void)
+{
+  if (libgfortran_called_blas == 1)
+    return;
+  fprintf (stderr, "Fortran runtime error: "
+	   "libgfortran_blas fallback BLAS routines called by user code.\n");
+  exit (2);
+}
+
+
+/* This routine implements the BLAS ?GEMM routines in only a very few 
+   specific cases, to be called by libgfortran in case no optimized
+   BLAS is specified at link-time.
+ 
+   For details about the assertions made in this code, read
+   code leading to it in libgfortran/generated/matmul_r8.c  */
+
+#define GEMM_ROUTINE(LETTER,INDEX,TYPE) \
+\
+void LETTER ## gemm_ (const char * const, const char * const, \
+		      const INDEX * const, const INDEX * const, \
+		      const INDEX * const, const TYPE * const, \
+		      const TYPE * restrict, const INDEX * const, \
+		      const TYPE * restrict, \
+		      const INDEX * const, const TYPE * const, \
+		      TYPE * restrict, const INDEX * const); \
+\
+void LETTER ## gemm_ (const char * const transa, const char * const transb,\
+	     const INDEX * const m, const INDEX * const n,\
+	     const INDEX * const k, const TYPE * const alpha,\
+	     const TYPE * restrict a, const INDEX * const lda,\
+	     const TYPE * restrict b,\
+	     const INDEX * const ldb, const TYPE * const beta,\
+	     TYPE * restrict c, const INDEX * const ldc)\
+{\
+  sanity_check();\
+\
+  if (*transa == 'N' && *transb == 'N')\
+    {\
+      INDEX x, y, u; \
+      const TYPE * restrict bbase_y;\
+      TYPE * restrict dest_y;\
+      const TYPE * restrict abase_n;\
+      TYPE bbase_yn;\
+\
+      assert(*alpha == (TYPE) 1);\
+      assert(*beta == (TYPE) 0);\
+\
+      /* used to be (ldc == n) ? */\
+      if (*ldc == *m)\
+	memset (c, 0, sizeof(TYPE) * (*n) * (*m));\
+      else\
+	{\
+	  for (y = 0; y < *n; y++)\
+	    for (x = 0; x < *m; x++)\
+	      c[x + y * (*ldc)] = (TYPE) 0;\
+	}\
+\
+      for (y = 0; y < *n; y++)\
+	{\
+	  bbase_y = b + y * (*ldb);\
+	  dest_y = c + y * (*ldc);\
+	  for (u = 0; u < *k; u++)\
+	    {\
+	      abase_n = a + u * (*lda);\
+	      bbase_yn = bbase_y[u];\
+	      for (x = 0; x < *m; x++)\
+		dest_y[x] += abase_n[x] * bbase_yn;\
+	    }\
+	}\
+    }\
+  else if (*transa == 'T' && *transb == 'N')\
+    {\
+      INDEX x, y, u; \
+      const TYPE * restrict abase_x;\
+      const TYPE * restrict bbase_y;\
+      TYPE * restrict dest_y;\
+      TYPE s;\
+\
+      for (y = 0; y < *n; y++)\
+        {\
+          bbase_y = &b[y * (*ldb)];\
+          dest_y = &c[y * (*ldc)];\
+          for (x = 0; x < *m; x++)\
+            {\
+              abase_x = &a[x * (*lda)];\
+              s = (TYPE) 0;\
+              for (u = 0; u < *k; u++)\
+                s += abase_x[u] * bbase_y[u];\
+              dest_y[x] = s;\
+            }\
+        }\
+    }\
+  else if (*transa == 'N')\
+    {\
+      /* rxstride == 1 && axstride == 1 */\
+      INDEX x, y, u;\
+      const INDEX bxstride = (*transb == 'N' ? 1 : (*ldb));\
+      const INDEX bystride = (*transb == 'T' ? 1 : (*ldb));\
+\
+      for (y = 0; y < *n; y++)\
+	for (x = 0; x < *m; x++)\
+	  c[x + y * (*ldc)] = (TYPE) 0;\
+\
+      for (y = 0; y < *n; y++)\
+	for (u = 0; u < *k; u++)\
+	  for (x = 0; x < *m; x++)\
+	    /* dest[x,y] += a[x,u] * b[u,y] */\
+	    c[x + y * (*ldc)] += a[x + u * (*lda)]\
+	                         * b[u*bxstride + y*bystride];\
+    }\
+  else\
+    {\
+      /* rxstride == 1 && aystride == 1 */\
+      INDEX x, y, u;\
+      const INDEX bxstride = (*transb == 'N' ? 1 : (*ldb));\
+      const INDEX bystride = (*transb == 'T' ? 1 : (*ldb));\
+      const TYPE *restrict abase_x;\
+      const TYPE *restrict bbase_y;\
+      TYPE *restrict dest_y;\
+      TYPE s;\
+\
+      for (y = 0; y < *n; y++)\
+        {\
+          bbase_y = &b[y*bystride];\
+          dest_y = &c[y * (*ldc)];\
+          for (x = 0; x < *m; x++)\
+            {\
+              abase_x = &a[x * (*lda)];\
+              s = (TYPE) 0;\
+              for (u = 0; u < *k; u++)\
+                s += abase_x[u] * bbase_y[u*bxstride];\
+              dest_y[x] = s;\
+            }\
+        }\
+    }\
+}
+
+
+GEMM_ROUTINE(s,int,float)
+GEMM_ROUTINE(d,int,double)
+GEMM_ROUTINE(c,int,complex)
+GEMM_ROUTINE(z,int,complex double)
+
Index: libgfortran/libgfortran.h
===================================================================
--- libgfortran/libgfortran.h	(revision 112610)
+++ libgfortran/libgfortran.h	(working copy)
@@ -207,6 +207,10 @@
 extern int l8_to_l4_offset;
 internal_proto(l8_to_l4_offset);
 
+/* This will be set to 1 when libgfortran calls a BLAS routine.  */
+extern int libgfortran_called_blas;
+iexport_data_proto(libgfortran_called_blas);
+
 #define GFOR_POINTER_L8_TO_L4(p8) \
   (l8_to_l4_offset + (GFC_LOGICAL_4 *)(p8))
 
@@ -337,6 +341,7 @@
   int warn_std;
   int allow_std;
   int pedantic;
+  int blas_matmul_limit;
   int convert;
   size_t record_marker;
 }
Index: Makefile.am
===================================================================
--- Makefile.am (revision 112610)
+++ Makefile.am (working copy)
@@ -6,7 +6,7 @@
 ## May be used by toolexeclibdir.
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 
-toolexeclib_LTLIBRARIES = libgfortran.la libgfortranbegin.la
+toolexeclib_LTLIBRARIES = libgfortran.la libgfortranbegin.la libgfortran_blas.la
 
 libgfortran_la_LDFLAGS = -version-info `grep -v '^\#' $(srcdir)/libtool-version` -lm $(extra_ldflags_libgfortran)
 
@@ -14,6 +14,10 @@
 libgfortranbegin_la_SOURCES = fmain.c
 libgfortranbegin_la_LDFLAGS = -static
 
+libgfortran_blas_la_SOURCES = intrinsics/libgfortran_blas.c
+libgfortran_blas_la_CFLAGS = $(AM_CFLAGS) -ftree-vectorize -funroll-loops
+libgfortran_blas_la_LDFLAGS = -static
+
 ## io.h conflicts with some a system header on some platforms, so
 ## use -iquote
 AM_CPPFLAGS = -iquote$(srcdir)/io -I$(srcdir)/$(MULTISRCTOP)../gcc \
@@ -110,7 +114,6 @@
 runtime/stop.c \
 runtime/string.c \
 runtime/select.c \
-gfortypes.h \
 libgfortran.h
 
 i_all_c= \

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]