This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
RE: [PATCH] Enabling Software Prefetching by Default at -O3
Hi,
Attached is the version of the patch that turns prefetching on at -O3 only when
tuning for AMD CPUs. As discussed elsewhere in this thread, -fprefetch-loop-arrays
is now a tri-state option (initial value -1, meaning "not set"). If the flag is not
set explicitly, we turn it on at -O3 in gcc/config/i386/i386.c (override_options).
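To make the tri-state logic concrete, here is a minimal stand-alone C sketch of the decision the patch adds to override_options. It assumes the encoding -1 = not set, 0 = disabled, 1 = enabled. The enum `tune` and the functions `prefetch_beneficial` and `resolve_prefetch_flag` are hypothetical stand-ins for ix86_tune, software_prefetching_beneficial_p and the new block in override_options; they are not GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's PROCESSOR_* tuning values.  The first
   five mirror the CPUs listed in software_prefetching_beneficial_p.  */
enum tune
{
  TUNE_GEODE, TUNE_K6, TUNE_ATHLON, TUNE_K8, TUNE_AMDFAM10,
  TUNE_CORE2, TUNE_GENERIC
};

/* Mirrors the switch in the patch: true only for the listed CPUs.  */
static bool
prefetch_beneficial (enum tune t)
{
  switch (t)
    {
    case TUNE_GEODE:
    case TUNE_K6:
    case TUNE_ATHLON:
    case TUNE_K8:
    case TUNE_AMDFAM10:
      return true;
    default:
      return false;
    }
}

/* FLAG is -1 (option not given), 0 (-fno-prefetch-loop-arrays) or
   1 (-fprefetch-loop-arrays).  Only an unset flag is changed, and only
   at -O3 or higher on a target with a prefetch instruction whose tuning
   benefits from software prefetching.  An explicit user choice always wins.  */
static int
resolve_prefetch_flag (int flag, int optimize, bool have_prefetch, enum tune t)
{
  assert (flag >= -1 && flag <= 1);
  if (flag < 0 && have_prefetch && optimize >= 3 && prefetch_beneficial (t))
    return 1;
  return flag;
}
```

Note that a flag left at -1 is not forced to 0 here; as in the patch, later consumers simply treat anything other than a positive value as "off".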
Is this OK to commit now?
Thanks,
Changpeng
________________________________________
From: Mark Mitchell [mark@codesourcery.com]
Sent: Saturday, June 19, 2010 3:05 PM
To: Christian Borntraeger
Cc: gcc-patches@gcc.gnu.org; H.J. Lu; Fang, Changpeng; rguenther@suse.de; sebpop@gmail.com; Zdenek Dvorak; Maxim Kuvyrkov
Subject: Re: [PATCH] Enabling Software Prefetching by Default at -O3
Christian Borntraeger wrote:
> It might also be worth investigating whether overriding the parameters per
> -mtune=XXX results in an overall win for -fprefetch-loop-arrays. We did
> that on s390 since the default values were not ideal.
Yes, that might be a good idea for i7.
But, in the meantime, I think we should get a version of the patch that
turns on prefetching on AMD CPUs with -O3. There's no reason to demand
consistency for all CPUs and it clearly benefits the AMD CPUs.
Changpeng, would you please submit a patch that activates this
optimization only with tuning for AMD CPUs?
Thanks,
--
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713
From 7f48bc625b0e451dd8c05a3a3cc20f68dcaa695c Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@pathscale.(none)>
Date: Wed, 23 Jun 2010 17:05:59 -0700
Subject: [PATCH 3/3] Enable prefetching at -O3 for AMD CPUs
* gcc/common.opt (fprefetch-loop-arrays): Re-define
-fprefetch-loop-arrays as a tri-state option with the
initial value of -1.
* gcc/tree-ssa-loop.c (gate_tree_ssa_loop_prefetch): Invoke
prefetch pass only when flag_prefetch_loop_arrays > 0.
* gcc/toplev.c (process_options): Since the flag is now tri-state,
treat only flag_prefetch_loop_arrays > 0 as enabled.
* gcc/config/i386/i386.c (override_options): Enable prefetching
at -O3 for the CPUs on which software prefetching is helpful.
(software_prefetching_beneficial_p): New. Return TRUE if
software prefetching is beneficial for the CPU being tuned for.
---
gcc/common.opt | 2 +-
gcc/config/i386/i386.c | 27 +++++++++++++++++++++++++++
gcc/toplev.c | 6 +++---
gcc/tree-ssa-loop.c | 2 +-
4 files changed, 32 insertions(+), 5 deletions(-)
diff --git a/gcc/common.opt b/gcc/common.opt
index 4904481..74fbd1d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -937,7 +937,7 @@ Common Report Var(flag_predictive_commoning) Optimization
Run predictive commoning optimization.
fprefetch-loop-arrays
-Common Report Var(flag_prefetch_loop_arrays) Optimization
+Common Report Var(flag_prefetch_loop_arrays) Init(-1) Optimization
Generate prefetch instructions, if available, for arrays in loops
fprofile
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2a46f89..605e57b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2691,6 +2691,26 @@ ix86_target_string (int isa, int flags, const char *arch, const char *tune,
return ret;
}
+/* Return TRUE if software prefetching is beneficial for the
+ given CPU. */
+
+static bool
+software_prefetching_beneficial_p (void)
+{
+ switch (ix86_tune)
+ {
+ case PROCESSOR_GEODE:
+ case PROCESSOR_K6:
+ case PROCESSOR_ATHLON:
+ case PROCESSOR_K8:
+ case PROCESSOR_AMDFAM10:
+ return true;
+
+ default:
+ return false;
+ }
+}
+
/* Function that is callable from the debugger to print the current
options. */
void
@@ -3531,6 +3551,13 @@ override_options (bool main_args_p)
if (!PARAM_SET_P (PARAM_L2_CACHE_SIZE))
set_param_value ("l2-cache-size", ix86_cost->l2_cache_size);
+ /* Enable sw prefetching at -O3 for CPUs on which prefetching is helpful. */
+ if (flag_prefetch_loop_arrays < 0
+ && HAVE_prefetch
+ && optimize >= 3
+ && software_prefetching_beneficial_p ())
+ flag_prefetch_loop_arrays = 1;
+
/* If using typedef char *va_list, signal that __builtin_va_start (&ap, 0)
can be optimized to ap = __builtin_next_arg (0). */
if (!TARGET_64BIT)
diff --git a/gcc/toplev.c b/gcc/toplev.c
index ff4c850..369820b 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -2078,13 +2078,13 @@ process_options (void)
}
#ifndef HAVE_prefetch
- if (flag_prefetch_loop_arrays)
+ if (flag_prefetch_loop_arrays > 0)
{
warning (0, "-fprefetch-loop-arrays not supported for this target");
flag_prefetch_loop_arrays = 0;
}
#else
- if (flag_prefetch_loop_arrays && !HAVE_prefetch)
+ if (flag_prefetch_loop_arrays > 0 && !HAVE_prefetch)
{
warning (0, "-fprefetch-loop-arrays not supported for this target (try -march switches)");
flag_prefetch_loop_arrays = 0;
@@ -2093,7 +2093,7 @@ process_options (void)
/* This combination of options isn't handled for i386 targets and doesn't
make much sense anyway, so don't allow it. */
- if (flag_prefetch_loop_arrays && optimize_size)
+ if (flag_prefetch_loop_arrays > 0 && optimize_size)
{
warning (0, "-fprefetch-loop-arrays is not supported with -Os");
flag_prefetch_loop_arrays = 0;
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 344cfa8..c9c5bbd 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -600,7 +600,7 @@ tree_ssa_loop_prefetch (void)
static bool
gate_tree_ssa_loop_prefetch (void)
{
- return flag_prefetch_loop_arrays != 0;
+ return flag_prefetch_loop_arrays > 0;
}
struct gimple_opt_pass pass_loop_prefetch =
--
1.6.3.3
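Because common.opt now initialises the flag to -1, every consumer that used to test it for non-zero has to test for > 0 instead; otherwise an unset flag would count as enabled. The following minimal sketch illustrates this. The helpers `gate_old`, `gate_new` and `check_optimize_size` are hypothetical models of gate_tree_ssa_loop_prefetch before and after the patch and of the -Os check in process_options, not the GCC functions themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Tri-state encoding assumed from Init(-1) in common.opt.  */
enum { PREFETCH_UNSET = -1, PREFETCH_OFF = 0, PREFETCH_ON = 1 };

/* Old gate: any non-zero value runs the pass, so the "unset" value -1
   would wrongly enable prefetching whenever the option was not given.  */
static bool
gate_old (int flag)
{
  return flag != 0;
}

/* New gate, as in the patch: only a positive value runs the pass.  */
static bool
gate_new (int flag)
{
  return flag > 0;
}

/* Model of the -Os check in process_options: warn and disable only when
   the flag was actually enabled; an unset flag is left alone silently.  */
static int
check_optimize_size (int flag, bool optimize_size, bool *warned)
{
  assert (flag >= PREFETCH_UNSET && flag <= PREFETCH_ON);
  assert (warned);
  *warned = false;
  if (flag > 0 && optimize_size)
    {
      *warned = true;
      return PREFETCH_OFF;
    }
  return flag;
}
```

With the old test, gate_old (PREFETCH_UNSET) is true, so once the default became -1 the prefetch pass would have run even when nobody asked for it, and the HAVE_prefetch checks in toplev.c would have warned on every target lacking a prefetch pattern. The `> 0` comparison keeps "not set" and "explicitly off" equivalent for all consumers.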