* Works when compiled with -O0 * Fails when compiled with -O1 → "result" is 0 instead of 1. Observations: * Issue occurs with '||' and '&&' – and for char/short/int/long. * Note: -O1 implies that omp_max_vf() returns != 1 Thus, only with -O1 there is SIMT, which seems to cause the problem. * When replacing 'reduction(||:' by 'reduction(|:' the code passes. Regarding the latter: Given that the order is not determined (i.e. 'a || b' and 'b || a' can occur), I think '||' can always be replaced by '|' in the reduction. @jakub: ^ does this make sense? * * * Long testcase is tests/5.0/loop/test_loop_reduction_or_device.c from https://github.com/SOLLVE/sollve_vv/ [the ..._and_... ('&&') testcase fails in the same way.] — Short testcase is below. Short testcase (runs into abort with -O1, works with -O0 or with "reduction:(|"): * * * int main () { char arr[100]; int result = 0; for (int i = 0; i < 100; ++i) arr[i] = 0; arr[5] = 1; #pragma omp target parallel map(tofrom:arr,result) #pragma omp loop reduction(||: result) for (int i = 0; i < 100; ++i) result = result || arr[i]; if (result != 1) __builtin_abort (); return 0; }
I can reproduce the problem. I've made the simd explicit (I hope that's still valid openmp code): ... $ cat libgomp/testsuite/libgomp.c/test.c #define N 32 static char arr[N]; int main (void) { unsigned int result = 0; for (unsigned int i = 0; i < N; ++i) arr[i] = 0; arr[5] = 1; #pragma omp target map(tofrom:result) map(to:arr) #pragma omp simd reduction(||: result) for (unsigned int i = 0; i < N; ++i) result = result || arr[i]; if (result != 1) return 1; return 0; } ... Easy workaround: ... diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index d932d74cb03..bf6845d654e 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -4641,6 +4641,15 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, sctx->max_vf = 1; break; } + + if (OMP_CLAUSE_REDUCTION_CODE (c) == TRUTH_ANDIF_EXPR + || OMP_CLAUSE_REDUCTION_CODE (c) == TRUTH_ORIF_EXPR) + { + sctx->max_vf = 1; + break; + } } } if (maybe_gt (sctx->max_vf, 1U)) ...
I think the problem can be seen already at omp-lower, in the body of the butterfly loop. Let's first look at what we have if we use reduction op '|': ... D.2173 = .GOMP_SIMT_VF (); D.2164 = 1; D.2161 = 0; goto <D.2175>; <D.2174>: D.2165 = D.2163; D.2165 = D.2163; D.2166 = .GOMP_SIMT_XCHG_BFLY (D.2165, D.2164); D.2167 = D.2165 | D.2166; D.2163 = D.2167; D.2164 = D.2164 << 1; <D.2175>: if (D.2164 < D.2173) goto <D.2174>; else goto <D.2176>; <D.2176>: ... Fairly straightforward, we have a loop, runs a couple of times, first a shuffle (GOMP_SIMT_XCHG_BFLY), then an update (D.2167 = D.2165 | D.2166). Now compare that with reduction op '||': ... D.2183 = .GOMP_SIMT_VF (); D.2164 = 1; D.2161 = 0; goto <D.2185>; <D.2184>: D.2169 = D.2163; D.2170 = (_Bool) D.2169; if (D.2170 != 0) goto <D.2166>; else goto <D.2171>; <D.2171>: D.2169 = D.2163; D.2172 = .GOMP_SIMT_XCHG_BFLY (D.2169, D.2164); D.2173 = (_Bool) D.2172; if (D.2173 != 0) goto <D.2166>; else goto <D.2167>; <D.2166>: iftmp.5 = 1; goto <D.2168>; <D.2167>: iftmp.5 = 0; <D.2168>: D.2163 = iftmp.5; D.2164 = D.2164 << 1; <D.2185>: if (D.2164 < D.2183) goto <D.2184>; else goto <D.2186>; <D.2186>: ... The shuffle is now conditional. I think the shuffle is inserted too late, in the middle of the update rather than before.
Hmm, that seems to be actually due to: ... if (sctx.is_simt) { if (!simt_lane) simt_lane = create_tmp_var (unsigned_type_node); x = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, TREE_TYPE (ivar), 2, ivar, simt_lane); x = build2 (code, TREE_TYPE (ivar), ivar, x); gimplify_assign (ivar, x, &llist[2]); } ... which gimplifies assigning: ... (gdb) call debug_generic_expr (x) D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164) ... to: ... (gdb) call debug_generic_expr (ivar) D.2163 ...
This fixes it: ... diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index d932d74cb03..f2ac8f98e32 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -6734,7 +6734,21 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimpl e_seq *dlist, x = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, TREE_TYPE (ivar), 2, ivar, simt_lane); - x = build2 (code, TREE_TYPE (ivar), ivar, x); + /* Make sure x is evaluated unconditionally. */ + enum tree_code update_code; + switch (OMP_CLAUSE_REDUCTION_CODE (c)) + { + case TRUTH_ANDIF_EXPR: + update_code = TRUTH_AND_EXPR; + break; + case TRUTH_ORIF_EXPR: + update_code = TRUTH_OR_EXPR; + break; + default: + update_code = code; + break; + } + x = build2 (update_code, TREE_TYPE (ivar), ivar, x); gimplify_assign (ivar, x, &llist[2]); } tree ivar2 = ivar; ...
Note, I guess this is related to PR94366 and r12-438-g1580fc764423bf89e9b which was fixing it for non-SIMT and quite possibly left out the SIMT stuff out. Using the TRUTH_{AND,OR}_EXPR instead of TRUTH_{AND,OR}IF_EXPR ought to be fine, it is only merging the private vars into the original var, so neither has really side-effects, all we want is that we are actually merging orig = (orig != 0) & (private != 0) rather than orig & private etc.
And yes, #c1 is valid. But would be nice to have similar test with && and initial result = 2; and arr[] say { 1, 2, 3, 4, 5, 6, 7, ..., 32 } and test result is 1 at the end to make sure we don't actually do just orig = orig & (private != 0) style merging or even just orig = orig & private;
Alternative fix that doesn't require fiddling with the 'code' var: ... diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index d932d74cb03..d0ddd4a6142 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -6734,7 +6734,10 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimpl e_seq *dlist, x = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, TREE_TYPE (ivar), 2, ivar, simt_lane); - x = build2 (code, TREE_TYPE (ivar), ivar, x); + /* Make sure x is evaluated unconditionally. */ + tree bfly_var = create_tmp_var (TREE_TYPE (ivar)); + gimplify_assign (bfly_var, x, &llist[2]); + x = build2 (code, TREE_TYPE (ivar), ivar, bfly_var); gimplify_assign (ivar, x, &llist[2]); } tree ivar2 = ivar; ...
(In reply to Jakub Jelinek from comment #6) > And yes, #c1 is valid. Thanks for confirming. > But would be nice to have similar test with && and > initial result = 2; and arr[] say { 1, 2, 3, 4, 5, 6, 7, ..., 32 } and test > result is 1 at the end to make sure we don't actually do just > orig = orig & (private != 0) > style merging or even just > orig = orig & private; Ack, will add that.
Created attachment 52647 [details] Tentative patch with test-cases, rationale and changelog I'll put this through testing, and submit if no problems found.
Comment on attachment 52647 [details] Tentative patch with test-cases, rationale and changelog Please change arr[5] = 1; to arr[5] = 42; or so also to test it is doing != 0 comparisons.
The master branch has been updated by Tom de Vries <vries@gcc.gnu.org>: https://gcc.gnu.org/g:093cdadbce30ce2d36846a05d979b8afc2eff618 commit r12-7702-g093cdadbce30ce2d36846a05d979b8afc2eff618 Author: Tom de Vries <tdevries@suse.de> Date: Thu Mar 17 14:37:28 2022 +0100 [openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR Consider test-case pr104952-1.c, included in this commit, containing: ... #pragma omp target map(tofrom:result) map(to:arr) #pragma omp simd reduction(||: result) ... When run on x86_64 with nvptx accelerator, the test-case either aborts or hangs. The reduction clause is translated by the SIMT code (active for nvptx) as a butterfly reduction loop with this butterfly shuffle / update pair: ... D.2163 = D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164) ... in the loop body. The problem is that the butterfly shuffle is possibly not executed, while it needs to be executed unconditionally. Fix this by translating instead as: ... D.tmp_bfly = .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164) D.2163 = D.2163 || D.tmp_bfly ... Tested on x86_64-linux with nvptx accelerator. gcc/ChangeLog: 2022-03-17 Tom de Vries <tdevries@suse.de> PR target/104952 * omp-low.cc (lower_rec_input_clauses): Make sure GOMP_SIMT_XCHG_BFLY is executed unconditionally. libgomp/ChangeLog: 2022-03-17 Tom de Vries <tdevries@suse.de> PR target/104952 * testsuite/libgomp.c/pr104952-1.c: New test. * testsuite/libgomp.c/pr104952-2.c: New test.
Fixed by "[openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR".