Created attachment 30145 [details] Test case for vectorization of loops containing max and if Use of the Fortran min or max intrinsic functions within a loop appears to prevent vectorization of the loop. Replacement of min or max with conditional assignment using if statements allows vectorization. A simple test case using max is attached. If compiled with "gfortran -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=1 -c max_vs_ifs_in_loop.F90", I get (with extraneous output snipped): ... max_vs_ifs_in_loop.F90:1: note: vectorized 0 loops in function. ... max_vs_ifs_in_loop.F90:17: note: LOOP VECTORIZED. ... gfortran should be able to vectorize loops containing min or max, using any number of arguments to these intrinsics, e.g. "tmp = max(r1, r2, r3, r4)". Compiler info: user@host $ gfortran --version GNU Fortran (GCC) 4.8.0 Copyright (C) 2013 Free Software Foundation, Inc. GNU Fortran comes with NO WARRANTY, to the extent permitted by law. You may redistribute copies of GNU Fortran under the terms of the GNU General Public License. For more information about these matters, see the file named COPYING
subroutine max_in_loop(rin, rout) integer :: rin(1000), rout(1000), tmp !real :: rin(1000), rout(1000), tmp integer :: i do i = 2, 1000 tmp = min(rin(i-1), rin(i)) rout(i) = tmp end do end subroutine Is vectorized. The floating point number makes it special in some way. Looking in trans-intrinic.c , it is special. /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to __builtin_isnan might be made dependent on that module being loaded, to help performance of programs that don't rely on IEEE semantics. */
(In reply to Bud Davis from comment #1) > The floating point number makes it special in some way. My suspicion is that this is due to special handling for IEEE 754:2008, which requires that MAX (NaN, x) = MAX (x, NaN) = x MIN (NaN, x) = MIN (x, NaN) = x
Yes, you generally need -ffast-math here (or -ffinite-math-only at least).
(In reply to Richard Biener from comment #3) > Yes, you generally need -ffast-math here (or -ffinite-math-only at least). SSE2 has an unord comparison instruction (aka isnan) though, so vectorizing the full version of min/max should still work, and be even more worth it than for the finite-only min/max... Maybe a target issue?
But vectorization reorders the loop iterations, thus say if some value is sNaN, you'd get exceptions in different order. So, I'm afraid without -ffast-math you can vectorize this only if the user says that the order of iterations doesn't matter (say using OpenMP 4.0 #pragma omp simd or Cilk+ #pragma simd).
(In reply to Jakub Jelinek from comment #5) > But vectorization reorders the loop iterations, thus say if some value is > sNaN, you'd get exceptions in different order. So, I'm afraid without > -ffast-math you can vectorize this only if the user says that the order of > iterations doesn't matter (say using OpenMP 4.0 #pragma omp simd or Cilk+ > #pragma simd). Ah, I was only thinking of quiet nans. -fno-signaling-nans should be enough though, no? (I checked and it doesn't help, which makes sense since it is the default) I think it is quite common to care about quiet nans but not use signaling nans.
(In reply to Jakub Jelinek from comment #5) > But vectorization reorders the loop iterations, thus say if some value is > sNaN, you'd get exceptions in different order. So, I'm afraid without > -ffast-math you can vectorize this only if the user says that the order of > iterations doesn't matter (say using OpenMP 4.0 #pragma omp simd or Cilk+ > #pragma simd). I'm not sure this is actually a problem (or perhaps there is a another bug), because as I noted in the PR replacing min or max with a "functionally equivalent" sequence of if statements allows vectorization.
The compiler generates code for min and max that checks if an argument is NaN. (floating point numbers only, of course). This is different than the example you posted, as it would not give the correct answer when an argument is NaN.
The difficulty seems to be with vectorizing an AND or OR of 2 conditions. a<b || b unord b leaves control flow around a<b | b unord b complains that bit-precision arithmetic is not supported The second one in particular looks like a limitation in the vectorizer that would be nice to lift. I get the same issue with a loop using a[i]<0&b[i]<=0, it isn't related to unord in particular.
(In reply to Marc Glisse from comment #9) > The difficulty seems to be with vectorizing an AND or OR of 2 conditions. > a<b || b unord b leaves control flow around > a<b | b unord b complains that bit-precision arithmetic is not supported > > The second one in particular looks like a limitation in the vectorizer that > would be nice to lift. I get the same issue with a loop using > a[i]<0&b[i]<=0, it isn't related to unord in particular. Interestingly enough, using cond=a[i]<0&b[i]<=0 in a cond_expr fails to vectorize, but using (double)cond!=0 in the same cond_expr does vectorize (to a horrible result), thanks to vect_recog_bool_pattern, which is triggered by a conversion of the result of AND. I assume we want a similar pattern thing triggered by cond_expr. What pattern should it present to the vectorizer? Something like c1=(a[i]<0)?-1:0 (for c1, -1 and 0 of an integer type of the same size as the cond_expr), c2=(b[i]<=0)?-1:0, c=c1&c2, cond_expr(c!=0,...) maybe, so it looks as close as possible to the desired vectorized form? (we don't have to use -1 instead of 1 here, sticking to 1 may allow to share more code with the existing pattern, but then we would want a later pass to change it)
Confirming that we currently can't vectorize code that involves a (con|dis)jonction of several conditions.
The testcase is vectorized just fine for me with -Ofast for 4.8, 4.9 and trunk. With -O3 half of it is vectorized as reported. On trunk this is fixed and we vectorize both functions (after the fix for PR61194).