This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH][1/n] Re-organize -fvect-cost-model, enable basic vectorization at -O2

From: Richard Biener <rguenther at suse dot de>
To: gcc-patches at gcc dot gnu dot org
Date: Fri, 10 May 2013 09:47:45 +0200 (CEST)
Subject: Re: [PATCH][1/n] Re-organize -fvect-cost-model, enable basic vectorization at -O2
References: <alpine dot LNX dot 2 dot 00 dot 1305071442171 dot 24881 at zhemvz dot fhfr dot qr>
On Tue, 7 May 2013, Richard Biener wrote:

> 
> The following patch is a first step towards being able to enable
> vectorization of a subset of all vectorizable functions at -O2 by
> default.  Analysis of Polyhedron (loop-heavy code) shows that
> the cost of the vectorizer analysis itself is in the noise, while
> compile time (and binary size) grows only with the number of loops
> we emit vectorized code for (because we generate up to 3 extra loops
> for each vectorized loop, which then have to go down the rest of the
> pass pipeline).
> 
> This very first patch makes sure that the runtime cost-model check
> comes first, and not after the alias or alignment versioning checks.
> That part of the patch would be a no-op without the rest, because
> currently peeling for alignment and versioning cannot coexist
> (well, they could not before I introduced
> STMT_VINFO_LOOP_PHI_EVOLUTION_PART a few releases ago ...).  Thus
> the patch enables doing both, which may eventually speed things up.
> 
> So far tested on the testsuite and Polyhedron; a full bootstrap and
> regtest is underway.
> 
> Patches in this series will change -fvect-cost-model to take
> an additional parameter (not sure yet what that will end up looking
> like) to be able to control vectorization in the following 'steps':
> 
>  1 vectorize loops that do not require a runtime cost check to be
>    profitable and that do not require versioning (to be enabled at -O2)
>  2 vectorize loops like we do now
>  3 vectorize loops like we do now but assume the runtime cost check
>    will always succeed (thus, omit it)  [-fno-vect-cost-model]
> 
> I'm not sure yet whether restricting the versioning makes sense
> (it's supposed to reduce code bloat and compile time, of course),
> especially considering that peeling for alignment could be disabled
> as well (but then hardware without unaligned accesses will likely
> vectorize nothing).  So the main complication seems to be the
> code-size considerations (the cost model is currently set up to
> compare runtime cost only).
>
> Richard.
> 
> 2013-05-07  Richard Biener  <rguenther@suse.de>
> 
> 	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Do not
> 	disable peeling when we version for aliasing.
> 	* tree-vect-loop-manip.c (vect_can_advance_ivs_p): Use
> 	STMT_VINFO_LOOP_PHI_EVOLUTION_PART instead of recomputing it.
> 	* tree-vect-loop.c (vect_transform_loop): First apply versioning,
> 	then peeling to arrange for the cost-model check to come first.
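
To make the loop structure described above concrete, here is a rough hand-written C sketch. It models a loop that needs both versioning for alias and peeling for alignment. VF, TH, VEC_ALIGN, add_const and no_overlap are illustrative assumptions and not GCC internals. The real transformation happens on GIMPLE, and the real versioning condition may also contain alignment checks. The sketch only shows the intended ordering: the profitability test comes first in the versioning condition, then come the peeled prologue, the vector body and the epilogue, with the original scalar loop kept as the fallback. That adds up to the "up to 3 extra loops" mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustration only: assumed values, not what GCC picks for any target.  */
#define VF 4          /* assumed vectorization factor */
#define TH 8          /* assumed cost-model threshold (iterations) */
#define VEC_ALIGN 32  /* assumed vector alignment in bytes */

/* Runtime alias check of the versioned loop: nonzero if the n-element
   ranges at a and b do not overlap.  Compared as integers to avoid
   relational comparison of unrelated pointers.  */
static int
no_overlap (const double *a, const double *b, size_t n)
{
  uintptr_t pa = (uintptr_t) a, pb = (uintptr_t) b;
  uintptr_t bytes = n * sizeof (double);
  return pa + bytes <= pb || pb + bytes <= pa;
}

/* Hand-written equivalent of what the vectorizer conceptually emits for
     for (i = 0; i < n; i++) a[i] = b[i] + c;  */
void
add_const (double *a, const double *b, double c, size_t n)
{
  /* Versioning condition with the cost-model check tested first, so
     short trip counts never evaluate the alias check.  */
  if (n >= TH && no_overlap (a, b, n))
    {
      size_t i = 0;
      /* Extra copy 1 of 3: prologue peeled until the store is aligned.  */
      while (i < n && (uintptr_t) (a + i) % VEC_ALIGN != 0)
        {
          a[i] = b[i] + c;
          i++;
        }
      /* The original loop, now vectorized: VF elements per iteration
         (the inner scalar loop stands in for one vector operation).  */
      for (; i + VF <= n; i += VF)
        for (size_t k = 0; k < VF; k++)
          a[i + k] = b[i + k] + c;
      /* Extra copy 2 of 3: epilogue for the remaining iterations.  */
      for (; i < n; i++)
        a[i] = b[i] + c;
    }
  else
    /* Extra copy 3 of 3: scalar fallback created by versioning.  */
    for (size_t i = 0; i < n; i++)
      a[i] = b[i] + c;
}
```

Calling add_const with overlapping arrays, or with fewer than TH elements, takes the scalar fallback. All other calls take the peeled and vectorized path.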

This runs into issues with gcc.target/i386/l_fma_double_1.c and friends,
because those tests now get peeling for alignment instead of just
versioning for aliasing.  Peeling for alignment is applied to double
only on x86_64, because on i686 the alignment might not be reachable
by peeling.  Trying to fix this up in the testcases with a properly
aligned double type showed that the vectorizer does not honor user
alignment for this purpose; that is fixed as well, which allows
reasonably clean adjustments of the fma testcases.
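
The reachability question behind vector_alignment_reachable_p can be illustrated with simple arithmetic. Each peeled iteration advances the address by the element size. So peeling can land on a vector-aligned boundary only if the access is already aligned to its element size. This assumes power-of-two sizes with the element no larger than the vector alignment. Roughly, the patch lets an explicit user alignment on a non-packed type (TYPE_USER_ALIGN) count as that guarantee, instead of relying only on the target hook.

The sketch below is minimal. The helper names alignment_reachable_by_peeling and peel_count are hypothetical and are not GCC functions. Only the adouble typedef is taken from the adjusted testcases.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Element type used by the adjusted l_fma_double_*.c testcases.  */
typedef double adouble __attribute__((aligned(sizeof (double))));

_Static_assert (_Alignof (adouble) == sizeof (double),
                "adouble is explicitly aligned to its size");

/* Hypothetical helper: can peeling whole iterations move addr onto a
   vec_align boundary?  Assuming power-of-two sizes with
   elem_size <= vec_align, this holds exactly when addr is already a
   multiple of elem_size.  */
static bool
alignment_reachable_by_peeling (uintptr_t addr, size_t elem_size,
                                size_t vec_align)
{
  return vec_align % elem_size == 0 && addr % elem_size == 0;
}

/* Hypothetical helper: number of iterations to peel; only meaningful
   when alignment_reachable_by_peeling returned true.  */
static size_t
peel_count (uintptr_t addr, size_t elem_size, size_t vec_align)
{
  size_t mis = addr % vec_align;
  return mis == 0 ? 0 : (vec_align - mis) / elem_size;
}
```

Consider an 8-byte element at address 0x1008 with 32-byte vectors. Three peeled iterations reach 0x1020. An address like 0x1004 can never be fixed by peeling.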

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2013-05-10  Richard Biener  <rguenther@suse.de>

	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Do not
	disable peeling when we version for aliasing.
	(vector_alignment_reachable_p): Honor explicit user alignment.
	(vect_supportable_dr_alignment): Likewise.
	* tree-vect-loop-manip.c (vect_can_advance_ivs_p): Use
	STMT_VINFO_LOOP_PHI_EVOLUTION_PART instead of recomputing it.
	* tree-vect-loop.c (vect_transform_loop): First apply versioning,
	then peeling to arrange for the cost-model check to come first.

	* gcc.target/i386/avx256-unaligned-load-2.c: Make well-defined.
	* gcc.target/i386/l_fma_double_1.c: Adjust.
	* gcc.target/i386/l_fma_double_2.c: Likewise.
	* gcc.target/i386/l_fma_double_3.c: Likewise.
	* gcc.target/i386/l_fma_double_4.c: Likewise.
	* gcc.target/i386/l_fma_double_5.c: Likewise.
	* gcc.target/i386/l_fma_double_6.c: Likewise.
	* gcc.target/i386/l_fma_float_1.c: Likewise.
	* gcc.target/i386/l_fma_float_2.c: Likewise.
	* gcc.target/i386/l_fma_float_3.c: Likewise.
	* gcc.target/i386/l_fma_float_4.c: Likewise.
	* gcc.target/i386/l_fma_float_5.c: Likewise.
	* gcc.target/i386/l_fma_float_6.c: Likewise.
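
The old avx256-unaligned-load-2.c loop read through an uninitialized pointer (cp). The adjusted version takes its pointers as parameters and states the destination alignment with the GCC builtin __builtin_assume_aligned. The function avx_copy below has the same loop shape; the name is hypothetical. It is shown so that the caller's side of the contract is explicit.

```c
/* Same shape as the adjusted testcase loop.  __builtin_assume_aligned
   promises the compiler that ep is 32-byte aligned; passing a less
   aligned pointer makes that promise false and the behaviour undefined.  */
void
avx_copy (char **cp, char **ep)
{
  int i;
  char **ap = __builtin_assume_aligned (ep, 32);
  for (i = 128; i > 0; i--)   /* exactly 128 iterations */
    *ap++ = *cp++;
}
```

A conforming caller passes a destination of at least 128 pointers declared with sufficient alignment, for example `_Alignas (32) char *dst[128];`.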

Index: gcc/tree-vect-data-refs.c
===================================================================
*** gcc/tree-vect-data-refs.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/tree-vect-data-refs.c	2013-05-08 14:23:42.937265437 +0200
*************** vector_alignment_reachable_p (struct dat
*** 1024,1030 ****
        if (dump_enabled_p ())
  	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, 
                           "Unknown misalignment, is_packed = %d",is_packed);
!       if (targetm.vectorize.vector_alignment_reachable (type, is_packed))
  	return true;
        else
  	return false;
--- 1024,1031 ----
        if (dump_enabled_p ())
  	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, 
                           "Unknown misalignment, is_packed = %d",is_packed);
!       if ((TYPE_USER_ALIGN (type) && !is_packed)
! 	  || targetm.vectorize.vector_alignment_reachable (type, is_packed))
  	return true;
        else
  	return false;
*************** vect_enhance_data_refs_alignment (loop_v
*** 1323,1329 ****
    bool stat;
    gimple stmt;
    stmt_vec_info stmt_info;
-   int vect_versioning_for_alias_required;
    unsigned int npeel = 0;
    bool all_misalignments_unknown = true;
    unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
--- 1324,1329 ----
*************** vect_enhance_data_refs_alignment (loop_v
*** 1510,1524 ****
          }
      }
  
!   vect_versioning_for_alias_required
!     = LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo);
! 
!   /* Temporarily, if versioning for alias is required, we disable peeling
!      until we support peeling and versioning.  Often peeling for alignment
!      will require peeling for loop-bound, which in turn requires that we
!      know how to adjust the loop ivs after the loop.  */
!   if (vect_versioning_for_alias_required
!       || !vect_can_advance_ivs_p (loop_vinfo)
        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
      do_peeling = false;
  
--- 1510,1517 ----
          }
      }
  
!   /* Check if we can possibly peel the loop.  */
!   if (!vect_can_advance_ivs_p (loop_vinfo)
        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
      do_peeling = false;
  
*************** vect_supportable_dr_alignment (struct da
*** 4722,4730 ****
        if (!known_alignment_for_access_p (dr))
  	is_packed = not_size_aligned (DR_REF (dr));
  
!       if (targetm.vectorize.
! 	  support_vector_misalignment (mode, type,
! 				       DR_MISALIGNMENT (dr), is_packed))
  	/* Can't software pipeline the loads, but can at least do them.  */
  	return dr_unaligned_supported;
      }
--- 4715,4724 ----
        if (!known_alignment_for_access_p (dr))
  	is_packed = not_size_aligned (DR_REF (dr));
  
!       if ((TYPE_USER_ALIGN (type) && !is_packed)
! 	  || targetm.vectorize.
! 	       support_vector_misalignment (mode, type,
! 					    DR_MISALIGNMENT (dr), is_packed))
  	/* Can't software pipeline the loads, but can at least do them.  */
  	return dr_unaligned_supported;
      }
*************** vect_supportable_dr_alignment (struct da
*** 4736,4744 ****
        if (!known_alignment_for_access_p (dr))
  	is_packed = not_size_aligned (DR_REF (dr));
  
!      if (targetm.vectorize.
!          support_vector_misalignment (mode, type,
! 				      DR_MISALIGNMENT (dr), is_packed))
         return dr_unaligned_supported;
      }
  
--- 4730,4739 ----
        if (!known_alignment_for_access_p (dr))
  	is_packed = not_size_aligned (DR_REF (dr));
  
!      if ((TYPE_USER_ALIGN (type) && !is_packed)
! 	 || targetm.vectorize.
! 	      support_vector_misalignment (mode, type,
! 					   DR_MISALIGNMENT (dr), is_packed))
         return dr_unaligned_supported;
      }
  
Index: gcc/tree-vect-loop.c
===================================================================
*** gcc/tree-vect-loop.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/tree-vect-loop.c	2013-05-08 13:33:14.445119190 +0200
*************** vect_transform_loop (loop_vec_info loop_
*** 5499,5517 ****
        check_profitability = true;
      }
  
!   /* Peel the loop if there are data refs with unknown alignment.
!      Only one data ref with unknown store is allowed.  */
  
!   if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
      {
!       vect_do_peeling_for_alignment (loop_vinfo, th, check_profitability);
        check_profitability = false;
      }
  
!   if (LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo)
!       || LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
      {
!       vect_loop_versioning (loop_vinfo, th, check_profitability);
        check_profitability = false;
      }
  
--- 5499,5520 ----
        check_profitability = true;
      }
  
!   /* Version the loop first, if required, so the profitability check
!      comes first.  */
  
!   if (LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo)
!       || LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
      {
!       vect_loop_versioning (loop_vinfo, th, check_profitability);
        check_profitability = false;
      }
  
!   /* Peel the loop if there are data refs with unknown alignment.
!      Only one data ref with unknown store is allowed.  */
! 
!   if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
      {
!       vect_do_peeling_for_alignment (loop_vinfo, th, check_profitability);
        check_profitability = false;
      }
  
Index: gcc/tree-vect-loop-manip.c
===================================================================
*** gcc/tree-vect-loop-manip.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/tree-vect-loop-manip.c	2013-05-08 13:33:14.446119201 +0200
*************** vect_can_advance_ivs_p (loop_vec_info lo
*** 1555,1561 ****
      dump_printf_loc (MSG_NOTE, vect_location, "vect_can_advance_ivs_p:");
    for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
      {
-       tree access_fn = NULL;
        tree evolution_part;
  
        phi = gsi_stmt (gsi);
--- 1555,1560 ----
*************** vect_can_advance_ivs_p (loop_vec_info lo
*** 1588,1618 ****
  
        /* Analyze the evolution function.  */
  
!       access_fn = instantiate_parameters
! 	(loop, analyze_scalar_evolution (loop, PHI_RESULT (phi)));
! 
!       if (!access_fn)
! 	{
! 	  if (dump_enabled_p ())
! 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!                              "No Access function.");
! 	  return false;
! 	}
! 
!       STRIP_NOPS (access_fn);
!       if (dump_enabled_p ())
!         {
! 	  dump_printf_loc (MSG_NOTE, vect_location,
!                            "Access function of PHI: ");
! 	  dump_generic_expr (MSG_NOTE, TDF_SLIM, access_fn);
!         }
! 
!       evolution_part = evolution_part_in_loop_num (access_fn, loop->num);
! 
        if (evolution_part == NULL_TREE)
          {
  	  if (dump_enabled_p ())
! 	    dump_printf (MSG_MISSED_OPTIMIZATION, "No evolution.");
  	  return false;
          }
  
--- 1587,1599 ----
  
        /* Analyze the evolution function.  */
  
!       evolution_part
! 	= STMT_VINFO_LOOP_PHI_EVOLUTION_PART (vinfo_for_stmt (phi));
        if (evolution_part == NULL_TREE)
          {
  	  if (dump_enabled_p ())
! 	    dump_printf (MSG_MISSED_OPTIMIZATION,
! 			 "No access function or evolution.");
  	  return false;
          }
  
Index: gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c
===================================================================
*** gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c	2013-05-08 13:33:14.446119201 +0200
***************
*** 1,26 ****
  /* { dg-do compile { target { ! ia32 } } } */
  /* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-load" } */
  
- #define N 1024
- 
- char **ep;
- char **fp;
- 
  void
! avx_test (void)
  {
    int i;
!   char **ap;
!   char **bp;
!   char **cp;
! 
!   ap = ep;
!   bp = fp;
!   for (i = 128; i >= 0; i--)
!     {
!       *ap++ = *cp++;
!       *bp++ = 0;
!     }
  }
  
  /* { dg-final { scan-assembler-not "avx_loaddqu256" } } */
--- 1,13 ----
  /* { dg-do compile { target { ! ia32 } } } */
  /* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-load" } */
  
  void
! avx_test (char **cp, char **ep)
  {
    int i;
!   char **ap = __builtin_assume_aligned (ep, 32);
!   for (i = 128; i > 0; i--)
!     *ap++ = *cp++;
  }
  
  /* { dg-final { scan-assembler-not "avx_loaddqu256" } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_1.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_1.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_1.c	2013-05-08 14:05:06.520670173 +0200
***************
*** 4,26 ****
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_1.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd231pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfmsub132pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmsub231pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd231pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub231pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfmadd213sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfmsub213sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfnmadd213sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfnmsub213sd" 16  } } */
--- 4,27 ----
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_1.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd213pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfmsub132pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmsub213pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd231pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub231pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 28  } } */
! /* { dg-final { scan-assembler-times "vfmadd213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfmsub213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmadd213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmsub213sd" 28 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_2.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_2.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_2.c	2013-05-08 14:24:27.513768881 +0200
***************
*** 4,10 ****
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_2.h"
  
--- 4,11 ----
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_2.h"
  
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 32  } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 32  } } */
--- 13,19 ----
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 56  } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 56 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_3.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_3.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_3.c	2013-05-08 14:24:43.541949905 +0200
***************
*** 4,26 ****
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_3.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd231pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfmsub132pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmsub231pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd231pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub231pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfmadd213sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfmsub213sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfnmadd213sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 16  } } */
! /* { dg-final { scan-assembler-times "vfnmsub213sd" 16  } } */
--- 4,27 ----
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_3.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd213pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfmsub132pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmsub213pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd231pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub231pd" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfmadd213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfmsub213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmadd213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmsub213sd" 28 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_4.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_4.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_4.c	2013-05-08 14:24:47.562995313 +0200
***************
*** 4,10 ****
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_4.h"
  
--- 4,11 ----
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_4.h"
  
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 32  } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 32  } } */
--- 13,19 ----
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 56 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_5.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_5.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_5.c	2013-05-08 14:24:54.507073710 +0200
***************
*** 4,10 ****
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_5.h"
  
--- 4,11 ----
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_5.h"
  
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 32  } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 32  } } */
--- 13,19 ----
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 56  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 56  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 56  } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_6.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_6.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_6.c	2013-05-08 14:24:57.838111351 +0200
***************
*** 4,10 ****
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_6.h"
  
--- 4,11 ----
  /* Test that the compiler properly optimizes floating point multiply
     and add instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_6.h"
  
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 32  } } */
--- 13,19 ----
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 56  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 56  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 56  } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_1.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_1.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_1.c	2013-05-08 13:33:14.447119212 +0200
***************
*** 9,26 ****
  #include "l_fma_1.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd231ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfmsub132ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmsub231ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd231ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub231ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfmadd213ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfmsub213ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmadd213ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmsub213ss" 32  } } */
--- 9,26 ----
  #include "l_fma_1.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd213ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfmsub132ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmsub213ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd231ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub231ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfmadd213ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfmsub213ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmadd213ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmsub213ss" 60 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_2.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_2.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_2.c	2013-05-08 13:33:14.448119223 +0200
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 64  } } */
--- 12,18 ----
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 120  } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 120  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 120  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 120  } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_3.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_3.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_3.c	2013-05-08 13:33:14.448119223 +0200
***************
*** 9,26 ****
  #include "l_fma_3.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd231ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfmsub132ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmsub231ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd231ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub231ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfmadd213ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfmsub213ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmadd213ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 32  } } */
! /* { dg-final { scan-assembler-times "vfnmsub213ss" 32  } } */
--- 9,26 ----
  #include "l_fma_3.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd213ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfmsub132ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmsub213ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmadd231ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 4  } } */
  /* { dg-final { scan-assembler-times "vfnmsub231ps" 4  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 60  } } */
! /* { dg-final { scan-assembler-times "vfmadd213ss" 60  } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 60  } } */
! /* { dg-final { scan-assembler-times "vfmsub213ss" 60  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 60  } } */
! /* { dg-final { scan-assembler-times "vfnmadd213ss" 60  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 60  } } */
! /* { dg-final { scan-assembler-times "vfnmsub213ss" 60  } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_4.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_4.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_4.c	2013-05-08 13:33:14.448119223 +0200
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 64  } } */
--- 12,18 ----
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 120  } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 120  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 120  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 120  } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_5.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_5.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_5.c	2013-05-08 13:33:14.448119223 +0200
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 64  } } */
--- 12,18 ----
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 120  } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 120  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 120  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 120  } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_6.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_6.c.orig	2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_6.c	2013-05-08 13:33:14.448119223 +0200
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 64  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 64  } } */
--- 12,18 ----
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 120  } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 120  } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 120  } } */
References:
- [PATCH][1/n] Re-organize -fvect-cost-model, enable basic vectorization at -O2
  - From: Richard Biener