This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [hsa 7/12] Disabling the vectorizer for GPU kernels/functions
- From: Martin Jambor <mjambor at suse dot cz>
- To: Richard Biener <rguenther at suse dot de>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Martin Liska <mliska at suse dot cz>
- Date: Tue, 10 Nov 2015 15:48:53 +0100
- Subject: Re: [hsa 7/12] Disabling the vectorizer for GPU kernels/functions
- References: <20151105215108 dot GC9264 at virgil dot suse dot cz> <20151105220105 dot GJ9264 at virgil dot suse dot cz> <alpine dot LSU dot 2 dot 11 dot 1511060936320 dot 10078 at zhemvz dot fhfr dot qr>
On Fri, Nov 06, 2015 at 09:38:21AM +0100, Richard Biener wrote:
> On Thu, 5 Nov 2015, Martin Jambor wrote:
>
> > Hi,
> >
> > in the previous email I wrote we need to "change behavior" of a few
> > optimization passes. One was the flattening of GPU functions and the
> > other two are in the patch below. It all comes down to this: at the
> > moment, we need to switch off the vectorizer (only for the GPU
> > functions, of course).
> >
> > We are actually quite close to being able to handle gimple vector
> > input in HSA back-end but not all the way yet, and before allowing the
> > vectorizer again, we will have to make sure it never produces vectors
> > bigger than 128 bits (in GPU functions).
>
> Hmm. I'd rather have this modify
> DECL_FUNCTION_SPECIFIC_OPTIMIZATION of the hsa function to get this
> effect. I think I mentioned this to the OACC guys as well, for a
> similar need of theirs.

I see, that is a good idea. I have reverted the changes to
tree-ssa-loop.c and tree-vectorizer.c and, on top of that, committed
the following patch to the branch. It modifies HSA fndecls at a more
convenient spot and disables vectorization in the following way:

  tree gdecl = gpu->decl;
  tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
  if (fn_opts == NULL_TREE)
    fn_opts = optimization_default_node;
  fn_opts = copy_node (fn_opts);
  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;

I hope that is what you meant. I have also verified that it works.
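
To spell out why the snippet copies the node before clearing the flags:
when a decl has no options of its own, it falls back to the shared
default node, so writing to that node directly would switch the
vectorizer off for host functions too. Below is a minimal stand-alone
sketch of the same copy-then-modify pattern. It is only a model: OptFlags,
FnDecl and disable_vectorizer are hypothetical names made up for
illustration and are not GCC types or functions.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the option set held by an optimization node.
struct OptFlags {
  bool loop_vectorize = true;
  bool slp_vectorize = true;
};

// Hypothetical stand-in for a function decl.  A null specific_opts means
// "use the shared defaults", like a NULL_TREE
// DECL_FUNCTION_SPECIFIC_OPTIMIZATION in the real code.
struct FnDecl {
  std::shared_ptr<OptFlags> specific_opts;
};

// Give DECL a private copy of its effective options with both vectorizer
// flags cleared.  The shared DEFAULTS object is never written to.
void disable_vectorizer(FnDecl &decl,
                        const std::shared_ptr<OptFlags> &defaults) {
  assert(defaults != nullptr);

  // Start from the decl's own options if it has any, else the defaults.
  std::shared_ptr<OptFlags> base = decl.specific_opts;
  if (!base)
    base = defaults;

  // Copy first, then modify only the copy (the copy_node step above).
  auto copy = std::make_shared<OptFlags>(*base);
  copy->loop_vectorize = false;
  copy->slp_vectorize = false;
  decl.specific_opts = copy;
}
```

Calling disable_vectorizer on the model of a GPU decl leaves the defaults,
and therefore every decl that still relies on them, with both flags set.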
Thanks,
Martin

2015-11-10  Martin Jambor  <mjambor@suse.cz>

	* hsa.h (hsa_summary_t): Add a comment to method link_functions.
	(hsa_summary_t::link_functions): Moved...
	* hsa.c (hsa_summary_t::link_functions): ...here.  Added common fndecl
	modifications.
	Include stringpool.h.
	* ipa-hsa.c (process_hsa_functions): Do not add flatten attribute
	here.  Fixed comments.

diff --git a/gcc/hsa.c b/gcc/hsa.c
index ab05a1d..e63be95 100644
--- a/gcc/hsa.c
+++ b/gcc/hsa.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see
#include "alloc-pool.h"
#include "cgraph.h"
#include "print-tree.h"
+#include "stringpool.h"
#include "symbol-summary.h"
#include "hsa.h"
@@ -693,6 +694,40 @@ hsa_get_declaration_name (tree decl)
return NULL;
}
+/* Couple GPU and HOST as gpu-specific and host-specific implementation of the
+ same function. KIND determines whether GPU is a host-invokable kernel or
+ gpu-callable function. */
+
+void
+hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
+ hsa_function_kind kind)
+{
+ hsa_function_summary *gpu_summary = get (gpu);
+ hsa_function_summary *host_summary = get (host);
+
+ gpu_summary->m_kind = kind;
+ host_summary->m_kind = kind;
+
+ gpu_summary->m_gpu_implementation_p = true;
+ host_summary->m_gpu_implementation_p = false;
+
+ gpu_summary->m_binded_function = host;
+ host_summary->m_binded_function = gpu;
+
+ tree gdecl = gpu->decl;
+ DECL_ATTRIBUTES (gdecl)
+ = tree_cons (get_identifier ("flatten"), NULL_TREE,
+ DECL_ATTRIBUTES (gdecl));
+
+ tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
+ if (fn_opts == NULL_TREE)
+ fn_opts = optimization_default_node;
+ fn_opts = copy_node (fn_opts);
+ TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
+ TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
+ DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
+}
+
/* Add a HOST function to HSA summaries. */
void
diff --git a/gcc/hsa.h b/gcc/hsa.h
index 025de67..b6855ea 100644
--- a/gcc/hsa.h
+++ b/gcc/hsa.h
@@ -1161,27 +1161,14 @@ public:
hsa_summary_t (symbol_table *table):
function_summary<hsa_function_summary *> (table) { }
+ /* Couple GPU and HOST as gpu-specific and host-specific implementation of
+ the same function. KIND determines whether GPU is a host-invokable kernel
+ or gpu-callable function. */
+
void link_functions (cgraph_node *gpu, cgraph_node *host,
hsa_function_kind kind);
};
-inline void
-hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
- hsa_function_kind kind)
-{
- hsa_function_summary *gpu_summary = get (gpu);
- hsa_function_summary *host_summary = get (host);
-
- gpu_summary->m_kind = kind;
- host_summary->m_kind = kind;
-
- gpu_summary->m_gpu_implementation_p = true;
- host_summary->m_gpu_implementation_p = false;
-
- gpu_summary->m_binded_function = host;
- host_summary->m_binded_function = gpu;
-}
-
/* in hsa.c */
extern struct hsa_function_representation *hsa_cfun;
extern hash_map <tree, vec <const char *> *> *hsa_decl_kernel_dependencies;
diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
index b4cb58e..d77fa6b 100644
--- a/gcc/ipa-hsa.c
+++ b/gcc/ipa-hsa.c
@@ -90,16 +90,12 @@ process_hsa_functions (void)
cgraph_node *clone = node->create_virtual_clone
(vec <cgraph_edge *> (), NULL, NULL, "hsa");
TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
- if (s->m_kind == HSA_KERNEL)
- DECL_ATTRIBUTES (clone->decl)
- = tree_cons (get_identifier ("flatten"), NULL_TREE,
- DECL_ATTRIBUTES (clone->decl));
clone->force_output = true;
hsa_summaries->link_functions (clone, node, s->m_kind);
if (dump_file)
- fprintf (dump_file, "HSA creates a new clone: %s, type: %s\n",
+ fprintf (dump_file, "Created a new HSA clone: %s, type: %s\n",
clone->name (),
s->m_kind == HSA_KERNEL ? "kernel" : "function");
}
@@ -116,7 +112,7 @@ process_hsa_functions (void)
hsa_summaries->link_functions (clone, node, HSA_FUNCTION);
if (dump_file)
- fprintf (dump_file, "HSA creates a new function clone: %s\n",
+ fprintf (dump_file, "Created a new HSA function clone: %s\n",
clone->name ());
}
}