This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH] Enabling Software Prefetching by Default at -O3


Hi, 

Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus
only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays.
If this flag is not explicitly set,  (for -O3) we turn it on in gcc/config/i386/i386.c 
(override_options).

Is this OK to commit now?

Thanks, 

Changpeng

________________________________________
From: Mark Mitchell [mark@codesourcery.com]
Sent: Saturday, June 19, 2010 3:05 PM
To: Christian Borntraeger
Cc: gcc-patches@gcc.gnu.org; H.J. Lu; Fang, Changpeng; rguenther@suse.de; sebpop@gmail.com; Zdenek Dvorak; Maxim Kuvyrkov
Subject: Re: [PATCH] Enabling Software Prefetching by Default at -O3

Christian Borntraeger wrote:

> It also might be worth to investigate if overriding the parameters per
> -mtune=XXX results in an overall win for -fprefetch-loop-arrays. We did
> that on s390 since the default values were not ideal

Yes, that might be a good idea for i7.

But, in the meantime, I think we should get a version of the patch that
turns on prefetching on AMD CPUs with -O3.  There's no reason to demand
consistency for all CPUs and it clearly benefits the AMD CPUs.
Changpeng, would you please submit a patch that activates this
optimization only with tuning for AMD CPUs?

Thanks,

--
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

From 7f48bc625b0e451dd8c05a3a3cc20f68dcaa695c Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@pathscale.(none)>
Date: Wed, 23 Jun 2010 17:05:59 -0700
Subject: [PATCH 3/3] Enable prefetching at -O3 for AMD cpus

	* gcc/common.opt (fprefetch-loop-arrays): Re-define
	-fprefetch-loop-arrays as a tri-state option with the
	initial value of -1.
	* gcc/tree-ssa-loop.c (gate_tree_ssa_loop_prefetch): Invoke
	prefetch pass only when flag_prefetch_loop_arrays > 0.
	* gcc/toplev.c (process_options): Note that, with tri-states,
	flag_prefetch_loop_arrays>0 means prefetching is enabled.
	* gcc/config/i386/i386.c (override_options): Enable prefetching
	at -O3 for a set of CPUs that sw prefetching is helpful.
	(software_prefetching_beneficial_p): New.  Return TRUE if
	software prefetching is beneficial for the given CPU.
---
 gcc/common.opt         |    2 +-
 gcc/config/i386/i386.c |   27 +++++++++++++++++++++++++++
 gcc/toplev.c           |    6 +++---
 gcc/tree-ssa-loop.c    |    2 +-
 4 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4904481..74fbd1d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -937,7 +937,7 @@ Common Report Var(flag_predictive_commoning) Optimization
 Run predictive commoning optimization.
 
 fprefetch-loop-arrays
-Common Report Var(flag_prefetch_loop_arrays) Optimization
+Common Report Var(flag_prefetch_loop_arrays) Init(-1) Optimization
 Generate prefetch instructions, if available, for arrays in loops
 
 fprofile
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2a46f89..605e57b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2691,6 +2691,26 @@ ix86_target_string (int isa, int flags, const char *arch, const char *tune,
   return ret;
 }
 
+/* Return TRUE if software prefetching is beneficial for the
+   given CPU. */
+
+static bool
+software_prefetching_beneficial_p (void)
+{
+  switch (ix86_tune)
+    {
+    case PROCESSOR_GEODE:
+    case PROCESSOR_K6:
+    case PROCESSOR_ATHLON:
+    case PROCESSOR_K8:
+    case PROCESSOR_AMDFAM10:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
 /* Function that is callable from the debugger to print the current
    options.  */
 void
@@ -3531,6 +3551,13 @@ override_options (bool main_args_p)
   if (!PARAM_SET_P (PARAM_L2_CACHE_SIZE))
     set_param_value ("l2-cache-size", ix86_cost->l2_cache_size);
 
+  /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
+  if (flag_prefetch_loop_arrays < 0
+      && HAVE_prefetch
+      && optimize >= 3
+      && software_prefetching_beneficial_p())
+    flag_prefetch_loop_arrays = 1;
+
   /* If using typedef char *va_list, signal that __builtin_va_start (&ap, 0)
      can be optimized to ap = __builtin_next_arg (0).  */
   if (!TARGET_64BIT)
diff --git a/gcc/toplev.c b/gcc/toplev.c
index ff4c850..369820b 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -2078,13 +2078,13 @@ process_options (void)
     }
 
 #ifndef HAVE_prefetch
-  if (flag_prefetch_loop_arrays)
+  if (flag_prefetch_loop_arrays > 0)
     {
       warning (0, "-fprefetch-loop-arrays not supported for this target");
       flag_prefetch_loop_arrays = 0;
     }
 #else
-  if (flag_prefetch_loop_arrays && !HAVE_prefetch)
+  if (flag_prefetch_loop_arrays > 0 && !HAVE_prefetch)
     {
       warning (0, "-fprefetch-loop-arrays not supported for this target (try -march switches)");
       flag_prefetch_loop_arrays = 0;
@@ -2093,7 +2093,7 @@ process_options (void)
 
   /* This combination of options isn't handled for i386 targets and doesn't
      make much sense anyway, so don't allow it.  */
-  if (flag_prefetch_loop_arrays && optimize_size)
+  if (flag_prefetch_loop_arrays > 0 && optimize_size)
     {
       warning (0, "-fprefetch-loop-arrays is not supported with -Os");
       flag_prefetch_loop_arrays = 0;
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 344cfa8..c9c5bbd 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -600,7 +600,7 @@ tree_ssa_loop_prefetch (void)
 static bool
 gate_tree_ssa_loop_prefetch (void)
 {
-  return flag_prefetch_loop_arrays != 0;
+  return flag_prefetch_loop_arrays > 0;
 }
 
 struct gimple_opt_pass pass_loop_prefetch =
-- 
1.6.3.3


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]