[gcc r15-1211] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]
Victor Do Nascimento
victorldn@gcc.gnu.org
Wed Jun 12 13:39:24 GMT 2024
https://gcc.gnu.org/g:adcc815a01ae009d2768b6afb546e357bd37bbd2
commit r15-1211-gadcc815a01ae009d2768b6afb546e357bd37bbd2
Author: Victor Do Nascimento <victor.donascimento@arm.com>
Date: Wed May 22 12:14:11 2024 +0100
middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]
At present the autovectorizer fails to vectorize simple loops
involving calls to `__builtin_prefetch'. A simple example of such a
loop is given below:
void foo(double * restrict a, double * restrict b, int n){
  int i;
  for(i=0; i<n; ++i){
    a[i] = a[i] + b[i];
    __builtin_prefetch(&(b[i+8]));
  }
}
The failure stems from two issues:
1. Given that it is typically not possible to fully reason about a
function call due to the possibility of side effects, the
autovectorizer does not attempt to vectorize loops which make such
calls.
Given the memory reference passed to `__builtin_prefetch', in the
absence of assurances about its effect on the passed memory
location the compiler deems the function unsafe to vectorize,
marking it as clobbering memory in `vect_find_stmt_data_reference'.
This leads to the failure in autovectorization.
2. Notwithstanding the above issue, even though the prefetch statement
would be classed as `vect_unused_in_scope', the induction variable
used in the address of the prefetch is the scalar loop's and not the
vector loop's. That is, it still uses `i' and not `vec_iv' because
the instruction was not vectorized, so DCE considers the value live
and both the vector and the scalar induction variables remain
actively used in the loop.
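In GIMPLE terms, the leftover prefetch pins the scalar IV. A simplified, hand-written sketch of the vectorized loop body before this patch (illustrative SSA names, not an actual dump):

```
  # i_1 = PHI <0, i_2>              ;; scalar IV, kept live by the prefetch
  # vectp_b_3 = PHI <b, vectp_b_4>  ;; vector IV
  vect__5 = MEM <vector(2) double> [(double *)vectp_b_3];
  ...
  _6 = &b[i_1 + 8];                 ;; address still built from the scalar IV
  __builtin_prefetch (_6);
  i_2 = i_1 + 1;
```

Once the prefetch is dropped, `i_1'/`i_2' become dead and DCE can eliminate the scalar IV entirely.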
This patch addresses both of these:
1. About the issue regarding the memory clobber: a data prefetch
does not generate a fault if its address argument is invalid, and it
does not write to memory. It therefore never alters the internal
state of the program or its control flow. As such, it is reasonable
to mark the function as not affecting memory contents.
To achieve this, we add the necessary logic to
`get_references_in_stmt' to give builtin functions the same
treatment as internal functions: if the gimple call is to a builtin
function and its function code is `BUILT_IN_PREFETCH', we mark
`clobbers_memory' as false.
2. Following the precedent set by clobber statements, which the
vectorizer drops from both the scalar and vectorized versions of a
given loop, we choose to drop prefetch hints in the same fashion.
This seems appropriate given that processors across architectures
typically ignore software prefetch hints, which seldom yield a
performance gain over their hardware counterparts.
gcc/ChangeLog:
PR tree-optimization/114061
* tree-data-ref.cc (get_references_in_stmt): Set
`clobbers_memory' to false for __builtin_prefetch.
* tree-vect-loop.cc (vect_transform_loop): Drop all
__builtin_prefetch calls from loops.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-prefetch-drop.c: New test.
* gcc.target/aarch64/vect-prefetch-drop.c: Likewise.
Diff:
---
gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 12 ++++++++++++
gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c | 13 +++++++++++++
gcc/tree-data-ref.cc | 2 ++
gcc/tree-vect-loop.cc | 6 ++++--
4 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
new file mode 100644
index 000000000000..7a8915eb716b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+void foo(int * restrict a, int * restrict b, int n){
+  int i;
+  for(i=0; i<n; ++i){
+    a[i] = a[i] + b[i];
+    __builtin_prefetch(&(b[i+8]));
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c b/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
new file mode 100644
index 000000000000..e654b99fde86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-O3 -march=armv9.2-a+sve --std=c99" { target { aarch64*-*-* } } } */
+
+void foo(double * restrict a, double * restrict b, int n){
+  int i;
+  for(i=0; i<n; ++i){
+    a[i] = a[i] + b[i];
+    __builtin_prefetch(&(b[i+8]));
+  }
+}
+
+/* { dg-final { scan-assembler-not "prfm" } } */
+/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m, z\[0-9\]+.d, z\[0-9\]+.d" } } */
diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index 7b5f2d162386..bd61069b6316 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -5868,6 +5868,8 @@ get_references_in_stmt (gimple *stmt, vec<data_ref_loc, va_heap> *references)
             clobbers_memory = true;
             break;
           }
+      else if (gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH))
+        clobbers_memory = false;
       else
         clobbers_memory = true;
     }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5b1ad06eca66..bbd5d261907c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12185,8 +12185,10 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
            !gsi_end_p (si);)
         {
           stmt = gsi_stmt (si);
-          /* During vectorization remove existing clobber stmts.  */
-          if (gimple_clobber_p (stmt))
+          /* During vectorization remove existing clobber stmts and
+             prefetches.  */
+          if (gimple_clobber_p (stmt)
+              || gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH))
             {
               unlink_stmt_vdef (stmt);
               gsi_remove (&si, true);