Target Specific Optimization

The target specific optimization work is divided into several internal stages, which can be delivered in different GCC releases. The first two stages are geared towards people who need to build high performance libraries that must span several different underlying architectures, while the third stage is meant to be usable by the majority of programmers, since it requires no source code modifications. While the focus of this work is to allow ix86 programmers to code for various AMD and Intel platforms, other GCC backends will be able to use target specific optimization by adding the appropriate machine dependent parts. Stages 1 and 2 were checked into the GCC mainline at revision 138082.

The stages are:

Stage1: Compile single function with specific options using attributes

Stage1: Objective of compiling a single function with specific options

Stage1: Details of compiling a single function with specific options

Stage1: Syntax for target specific option using attributes

I propose we add a new attribute, option, that allows the user to select certain ix86 options for a single function. Other backends that wish to provide function specific options to their users can use the same syntax (but of course will have different options). The option attribute takes one or more strings that are parsed by the backend. In the case of the x86, each string holds one or more options separated by commas. Each option is equivalent to the corresponding -m command line option, written without the -m prefix. The fpmath=sse,387 option must be passed as fpmath=sse+387 or fpmath=both, since the comma would otherwise be taken as a separator between options. The options that would be provided are:

Stage1: Example using attribute

Here is an example of how you might use target specific functions using attributes. It uses the GCC intrinsics. The code calculates a minimum of a vector of 32-bit signed integers, using the pcomd and pcmov instructions under SSE5 and the pminsd instruction under SSE4.1.

    typedef int __v4si __attribute__ ((__vector_size__ (16), __may_alias__));
    void sse5_min (__v4si *, __v4si *, __v4si *, int) __attribute__ ((__option__("sse5")));
    void sse4_1_min (__v4si *, __v4si *, __v4si *, int) __attribute__ ((__option__("sse4.1")));
    void generic_min (__v4si *, __v4si *, __v4si *, int);

    /* Compiled with SSE5 enabled: compare, then select with a conditional move.  */
    void sse5_min (__v4si *a, __v4si *b, __v4si *c, int n) {
        int i;
        for (i = 0; i < n; i++) {
            __v4si test = __builtin_ia32_pcomltd (b[i], c[i]);
            a[i] = __builtin_ia32_pcmov_v4si (b[i], c[i], test);
        }
    }

    /* Compiled with SSE4.1 enabled: a single pminsd per vector.  */
    void sse4_1_min (__v4si *a, __v4si *b, __v4si *c, int n) {
        int i;
        for (i = 0; i < n; i++) {
            a[i] = __builtin_ia32_pminsd (b[i], c[i]);
        }
    }

    /* Compiled with the default options: one 32-bit element at a time.  */
    void generic_min (__v4si *a, __v4si *b, __v4si *c, int n) {
        int i;
        int n_int = 4 * n;
        int *a_int = (int *) a;
        int *b_int = (int *) b;
        int *c_int = (int *) c;
        for (i = 0; i < n_int; i++) {
            a_int[i] = (b_int[i] < c_int[i]) ? b_int[i] : c_int[i];
        }
    }

    /* HAVE_SSE5 and HAVE_SSE4_1 are assumed to be set elsewhere, e.g. from CPUID.  */
    void do_min (__v4si *a, __v4si *b, __v4si *c, int n) {
        if (HAVE_SSE5) {
            sse5_min (a, b, c, n);
        } else if (HAVE_SSE4_1) {
            sse4_1_min (a, b, c, n);
        } else {
            generic_min (a, b, c, n);
        }
    }
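
The example above leaves HAVE_SSE5 and HAVE_SSE4_1 undefined and uses the proposed __option__ spelling. As a rough sketch of the same dispatch pattern that should build with a released compiler, the version below assumes GCC 4.9 or later on x86, where the per-function attribute is spelled target and __builtin_cpu_supports can stand in for the HAVE_* tests. Only the SSE4.1 and generic paths are shown. The helper have_sse4_1, the HAVE_X86 macro and the plain int * signatures are choices made for this illustration, not part of the proposal.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>   /* SSE4.1 intrinsics; usable per function with GCC >= 4.9 */
#define HAVE_X86 1       /* hypothetical guard macro for this sketch */
#endif

/* Portable fallback: one element at a time. */
static void generic_min(int *a, const int *b, const int *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = (b[i] < c[i]) ? b[i] : c[i];
}

#ifdef HAVE_X86
/* Only this function is compiled with SSE4.1 enabled. */
__attribute__((target("sse4.1")))
static void sse4_1_min(int *a, const int *b, const int *c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vc = _mm_loadu_si128((const __m128i *)(c + i));
        _mm_storeu_si128((__m128i *)(a + i), _mm_min_epi32(vb, vc));
    }
    for (; i < n; i++)   /* scalar tail for the last n % 4 elements */
        a[i] = (b[i] < c[i]) ? b[i] : c[i];
}
#endif

/* Stand-in for the HAVE_SSE4_1 test: asks the running CPU, not the build flags. */
static int have_sse4_1(void) {
#ifdef HAVE_X86
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") != 0;
#else
    return 0;
#endif
}

/* Pick the best available version at run time. */
void do_min(int *a, const int *b, const int *c, size_t n) {
    if (have_sse4_1()) {
#ifdef HAVE_X86
        sse4_1_min(a, b, c, n);
        return;
#endif
    }
    generic_min(a, b, c, n);
}
```

The rest of the file is still compiled with the default options, so the binary can run on processors without SSE4.1 as long as the check in do_min is what guards the call.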

Stage1: Syntax for optimization option using attributes

In addition to setting target options, users would like to be able to change the optimization level for individual functions. For example, you might want to use -O3 -funroll-loops for functions that are executed all of the time and -Os for functions that are rarely executed. I propose we add a new attribute, optimize, that allows the user to change the optimization options. This would be supported for all targets. The hot attribute would be modified to set the -O3 option and the cold attribute would be modified to set the -Os option. The optimize attribute takes one or more strings or a number. Commas can separate multiple options within a string. Each string option is equivalent to the corresponding -f option, written without the -f prefix, unless the string begins with 'O'. Numbers are equivalent to the appropriate -O level.
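
As a small illustration of the syntax described above, the sketch below marks one function for -O3 -funroll-loops and another for -Os. It assumes a GCC release that implements the optimize attribute. The OPT_FAST and OPT_SMALL macros and the two functions are made up for this example, and the macros expand to nothing on other compilers so the file still builds.

```c
#include <stddef.h>

/* The optimize attribute is GCC specific; fall back to no attribute elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define OPT_FAST  __attribute__((optimize("O3", "unroll-loops")))
#define OPT_SMALL __attribute__((optimize("Os")))
#else
#define OPT_FAST
#define OPT_SMALL
#endif

/* Frequently executed: "O3" selects -O3, "unroll-loops" means -funroll-loops. */
OPT_FAST
long sum_squares(const int *v, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (long)v[i] * v[i];
    return total;
}

/* Rarely executed: "Os" selects -Os, favouring code size. */
OPT_SMALL
int first_negative(const int *v, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (v[i] < 0)
            return (int)i;
    return -1;   /* no negative element found */
}
```

Both functions live in the same translation unit and return the same results as they would without the attribute; only the code generated for each one differs.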

Stage1: Work items

This section is an attempt to break down the stage1 work into smaller chunks, with separate deliverables. It has now been rewritten after the fact to describe the work that was done.

  1. A subversion branch (function-specific-branch) will be created at the FSF to host this project. All work will be done in this branch. All people contributing to this branch must have the appropriate FSF paperwork so that their work can be incorporated into the mainstream GCC. All FSF coding guidelines will be used. Merges from the mainline will occur at least monthly. It will take 1 day to create the branch. It is anticipated that each merge, including any updates needed to the target specific work, will take 1 day.
  2. Modify the opt*.awk scripts so that there is a new flag, Save, which indicates which variables need to be saved and restored. A structure, cl_option_attr, will be created to hold these options. Two functions, cl_options_save and cl_options_restore, will be created to save and restore the options.
  3. Add support in c-common.c for __attribute__((option(...))), and call a back end hook, valid_option_attribute_p, to validate the option.
  4. Add a new field, function_specific, to the tree_function_decl node to hold the back end information needed for each function that has function specific options.
  5. Use the set current function hook to change the target options when they differ from those of the previous function. Call target_reinit to reinitialize things like which registers are allowed to be used in the current ISA.
  6. Change the inliner in ipa-inline.c to call tree_can_inline_p to validate each potential inline candidate. Add tree_can_inline_p to tree-inline.c to forward the query to the target hook. Add a new target hook, can_inline_p, which vets the inline. Add the hook to the x86 port.
  7. Merge all of the ix86 isa options that use independent variables into the ix86_isa_flags flags word. Merge other boolean options into the target_flags word.
  8. Modify the builtin function handling so that most builtin functions which map into x86 instructions are added to the list of declarations, and issue an error if the user tries to use a builtin function without having the proper isa enabled.
  9. Write tests.
  10. Submit patches to the gcc-patches mailing list.
  11. Deal with the comments and modify the code appropriately.
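
Items 2, 5 and 6 above can be pictured with a toy model. The sketch below is not GCC code: the struct, the flag names and every function in it are hypothetical stand-ins for cl_option_attr, cl_options_save, cl_options_restore, the set current function hook, target_reinit and can_inline_p. The inlining rule it uses (the callee may not need any ISA bit the caller lacks) is an assumption chosen for illustration, since the text only says that the hook vets the inline.

```c
/* Hypothetical stand-in for the set of options marked with the Save flag. */
struct saved_options {
    unsigned isa_flags;       /* one bit per ISA extension */
    int      optimize_level;  /* the -O level */
};

enum { ISA_SSE2 = 1, ISA_SSE4_1 = 2 };   /* made-up flag bits */

struct saved_options global_opts;   /* options currently in effect */
int reinit_count;                   /* how often target state was rebuilt */

/* Analogue of cl_options_save: snapshot the options in effect. */
void options_save(struct saved_options *dst) {
    *dst = global_opts;
}

/* Analogue of cl_options_restore: make a snapshot current again. */
void options_restore(const struct saved_options *src) {
    global_opts = *src;
}

/* Analogue of the set current function hook: switch options only when the
   new function's options differ, then redo the target setup (the part
   target_reinit would handle, counted here instead of performed). */
void set_current_function(const struct saved_options *fn_opts) {
    if (fn_opts->isa_flags == global_opts.isa_flags
        && fn_opts->optimize_level == global_opts.optimize_level)
        return;                     /* same options as the previous function */
    options_restore(fn_opts);
    reinit_count++;
}

/* Analogue of can_inline_p under the assumed rule: inlining is allowed only
   if every ISA bit the callee needs is also enabled in the caller. */
int can_inline_p(const struct saved_options *caller,
                 const struct saved_options *callee) {
    return (callee->isa_flags & ~caller->isa_flags) == 0;
}
```

Skipping the switch when consecutive functions share the same options keeps the cost of reinitialization proportional to the number of option changes rather than the number of functions.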

Stage2: Compile single function with specific options using pragmas

The attribute syntax is kind of clunky if you are defining multiple functions using the same function specific options. I would propose adding new #pragmas that change the default options for the functions defined after the #pragma. Internally, the #pragma would save the appropriate information and then add __attribute__((option(...))) to each such function. Ideally the preprocessor macros like __SSE__, etc. should also be changed by the #pragma.

Stage2: pragma syntax

Stage2: Example using #pragma

Here is an example of how you might use target specific functions using #pragma. It uses the common compiler intrinsics include files (and needs the pragma because bmmintrin.h and smmintrin.h check for __SSE5__ and __SSE4_1__ being defined). The code calculates a minimum of a vector of 32-bit signed integers, using the pcomd and pcmov instructions under SSE5 and the pminsd instruction under SSE4.1.

    #pragma GCC option(push)
    #pragma GCC option("sse5")
    #include <bmmintrin.h>

    void sse5_min (__m128i *a, __m128i *b, __m128i *c, int n) {
        int i;
        for (i = 0; i < n; i++) {
            __m128i test = _mm_comlt_epi32 (b[i], c[i]);
            a[i] = _mm_cmov_si128 (b[i], c[i], test);
        }
    }

    #pragma GCC option(pop)
    #pragma GCC option(push)
    #pragma GCC option("sse4.1")
    #include <smmintrin.h>

    void sse4_1_min (__m128i *a, __m128i *b, __m128i *c, int n) {
        int i;
        for (i = 0; i < n; i++) {
            a[i] = _mm_min_epi32 (b[i], c[i]);
        }
    }

    #pragma GCC option(pop)

    void generic_min (__m128i *a, __m128i *b, __m128i *c, int n) {
        int i;
        int n_int = 4 * n;
        int *a_int = (int *) a;
        int *b_int = (int *) b;
        int *c_int = (int *) c;
        for (i = 0; i < n_int; i++) {
            a_int[i] = (b_int[i] < c_int[i]) ? b_int[i] : c_int[i];
        }
    }

    void do_min (__m128i *a, __m128i *b, __m128i *c, int n) {
        if (HAVE_SSE5) {
            sse5_min (a, b, c, n);
        } else if (HAVE_SSE4_1) {
            sse4_1_min (a, b, c, n);
        } else {
            generic_min (a, b, c, n);
        }
    }
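
Because the intrinsic headers test macros such as __SSE4_1__, it matters whether the pragma changes what the preprocessor sees. The sketch below records that at three points in a file. It assumes a released GCC on x86, where the pragmas are spelled push_options, target and pop_options rather than option(push) and option(pop) as proposed above. On other compilers or targets the pragmas are skipped and all three values are simply equal. The SEEN_* macros and the three functions are invented for this illustration.

```c
/* Is __SSE4_1__ visible before the region? */
#ifdef __SSE4_1__
#define SEEN_BEFORE 1
#else
#define SEEN_BEFORE 0
#endif

#if defined(__GNUC__) && !defined(__clang__) \
    && (defined(__x86_64__) || defined(__i386__))
#pragma GCC push_options          /* save the current options */
#pragma GCC target("sse4.1")      /* enable SSE4.1 for what follows */
#endif

/* Is __SSE4_1__ visible inside the region? */
#ifdef __SSE4_1__
#define SEEN_INSIDE 1
#else
#define SEEN_INSIDE 0
#endif

/* Defined inside the region, so it is compiled with the region's options. */
int inside_region(void) { return SEEN_INSIDE; }

#if defined(__GNUC__) && !defined(__clang__) \
    && (defined(__x86_64__) || defined(__i386__))
#pragma GCC pop_options           /* back to the saved options */
#endif

/* Is __SSE4_1__ visible after the region? */
#ifdef __SSE4_1__
#define SEEN_AFTER 1
#else
#define SEEN_AFTER 0
#endif

int before_region(void) { return SEEN_BEFORE; }
int after_region(void)  { return SEEN_AFTER; }
```

If the pragmas behave as the proposal intends, the value seen after the region matches the value seen before it, and the value inside is never lower than either.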

Stage3: Details of compiling a single function multiple times manually

Stage3: Example

If you have a function declared as a clone, such as:

    void my_min (int *, int *, int *, int) __attribute__((__clone__));
    void my_min (int *a, int *b, int *c, int n) {
        int i;
        for (i = 0; i < n; i++) {
            a[i] = (b[i] < c[i]) ? b[i] : c[i];
        }
    }

The compiler would logically generate code that would be equivalent to:

    static void __do_cpuid (void) __attribute__ ((__constructor__));
    static void my_min__clone_generic (int *, int *, int *, int);
    static void my_min__clone_sse5 (int *, int *, int *, int) __attribute__((__sse5__));
    static void my_min__clone_sse4_1 (int *, int *, int *, int) __attribute__((__sse4_1__));
    static void (*my_min__clone_ptr)(int *, int *, int *, int) = my_min__clone_generic;

    /* Runs before main and selects the best clone for the running processor.  */
    static void __do_cpuid (void) {
        int have_sse5;
        int have_sse4_1;
        /* code to initialize have_sse5 and have_sse4_1 via CPUID.  */
        /* Update all clone pointers generated in this module */
        if (have_sse5) {
            my_min__clone_ptr = my_min__clone_sse5;
        } else if (have_sse4_1) {
            my_min__clone_ptr = my_min__clone_sse4_1;
        } else {
            my_min__clone_ptr = my_min__clone_generic;
        }
    }

    /* The public entry point just forwards through the clone pointer.  */
    void my_min (int *a, int *b, int *c, int n) {
        (* my_min__clone_ptr) (a, b, c, n);
    }

    /* compile with the default options.  */
    static void my_min__clone_generic (int *a, int *b, int *c, int n) {
        int i;
        for (i = 0; i < n; i++) {
            a[i] = (b[i] < c[i]) ? b[i] : c[i];
        }
    }

    /* compile with -msse5 as per the attribute in the declaration.  */
    static void my_min__clone_sse5 (int *a, int *b, int *c, int n) {
        int i;
        for (i = 0; i < n; i++) {
            a[i] = (b[i] < c[i]) ? b[i] : c[i];
        }
    }

    /* compile with -msse4.1 as per the attribute in the declaration.  */
    static void my_min__clone_sse4_1 (int *a, int *b, int *c, int n) {
        int i;
        for (i = 0; i < n; i++) {
            a[i] = (b[i] < c[i]) ? b[i] : c[i];
        }
    }
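
The expansion above leaves the CPUID step as a comment and relies on attributes that exist only in this proposal. A hand-written approximation that should build with a released GCC on x86 might look like the sketch below. It assumes the target attribute spelling and the __get_cpuid helper from cpuid.h, and it covers only the SSE4.1 and generic clones. All function names, the X86_TARGET macro and the MIN_BODY macro are made up for this example.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#define X86_TARGET(isa) __attribute__((target(isa)))
#else
#define X86_TARGET(isa)   /* no per-function ISA selection on other targets */
#endif

/* One loop body, expanded once per clone so each copy can be compiled
   with different options. */
#define MIN_BODY                                   \
    for (size_t i = 0; i < n; i++)                 \
        a[i] = (b[i] < c[i]) ? b[i] : c[i];

/* Clone built with the default options. */
static void my_min_generic(int *a, const int *b, const int *c, size_t n) {
    MIN_BODY
}

/* Clone built with SSE4.1 enabled, so the compiler may use pminsd here. */
X86_TARGET("sse4.1")
static void my_min_sse4_1(int *a, const int *b, const int *c, size_t n) {
    MIN_BODY
}

/* Starts at the generic clone, so calls made before the constructor runs
   are still safe. */
static void (*my_min_ptr)(int *, const int *, const int *, size_t) = my_min_generic;

/* CPUID leaf 1 reports SSE4.1 support in ECX bit 19. */
static int cpu_has_sse4_1(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 19) & 1u);
#else
    return 0;
#endif
}

/* Runs before main and updates the clone pointer once. */
__attribute__((constructor))
static void select_clone(void) {
    if (cpu_has_sse4_1())
        my_min_ptr = my_min_sse4_1;
}

/* Public entry point: an indirect call through the selected clone. */
void my_min(int *a, const int *b, const int *c, size_t n) {
    my_min_ptr(a, b, c, n);
}
```

Callers pay for one indirect call per invocation, while the CPUID query happens only once at startup.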

Stage4: Compile functions with multiple different options automatically

Stage4: Objective of compiling a single function multiple times automatically

Branch

None: FunctionSpecificOpt (last edited 2011-03-16 10:47:23 by ManuelLópezIbáñez)