Bug 57328 - Missed optimization: Unable to vectorize Fortran min and max intrinsics
Summary: Missed optimization: Unable to vectorize Fortran min and max intrinsics
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
  Show dependency treegraph
 
Reported: 2013-05-19 02:39 UTC by Brian Taylor
Modified: 2014-05-19 08:09 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.10.0
Known to fail:
Last reconfirmed: 2013-06-13 00:00:00


Attachments
Test case for vectorization of loops containing max and if (149 bytes, text/plain)
2013-05-19 02:39 UTC, Brian Taylor
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Brian Taylor 2013-05-19 02:39:43 UTC
Created attachment 30145 [details]
Test case for vectorization of loops containing max and if

Use of the Fortran min or max intrinsic functions within a loop appears to prevent vectorization of the loop.  Replacement of min or max with conditional assignment using if statements allows vectorization.

A simple test case using max is attached.  If compiled with "gfortran -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=1 -c max_vs_ifs_in_loop.F90", I get (with extraneous output snipped):

...
max_vs_ifs_in_loop.F90:1: note: vectorized 0 loops in function.
...
max_vs_ifs_in_loop.F90:17: note: LOOP VECTORIZED.
...


gfortran should be able to vectorize loops containing min or max, using any number of arguments to these intrinsics, e.g. "tmp = max(r1, r2, r3, r4)".


Compiler info:
user@host $ gfortran --version
GNU Fortran (GCC) 4.8.0
Copyright (C) 2013 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
Comment 1 Bud Davis 2013-05-21 01:03:52 UTC
subroutine max_in_loop(rin, rout)
integer :: rin(1000), rout(1000), tmp
!real :: rin(1000), rout(1000), tmp
integer :: i

do i = 2, 1000
  tmp = min(rin(i-1), rin(i))
  rout(i) = tmp
end do

end subroutine

Is vectorized.
The floating point number makes it special in some way.

Looking in trans-intrinic.c , it is special.

   /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
         __builtin_isnan might be made dependent on that module being loaded,
         to help performance of programs that don't rely on IEEE semantics.  */
Comment 2 Tobias Burnus 2013-05-21 07:08:39 UTC
(In reply to Bud Davis from comment #1)
> The floating point number makes it special in some way.

My suspicion is that this is due to special handling for IEEE 754:2008, which requires that
   MAX (NaN, x) = MAX (x, NaN) = x
   MIN (NaN, x) = MIN (x, NaN) = x
Comment 3 Richard Biener 2013-05-21 09:15:59 UTC
Yes, you generally need -ffast-math here (or -ffinite-math-only at least).
Comment 4 Marc Glisse 2013-05-21 09:55:54 UTC
(In reply to Richard Biener from comment #3)
> Yes, you generally need -ffast-math here (or -ffinite-math-only at least).

SSE2 has an unord comparison instruction (aka isnan) though, so vectorizing the full version of min/max should still work, and be even more worth it than for the finite-only min/max... Maybe a target issue?
Comment 5 Jakub Jelinek 2013-05-21 10:15:38 UTC
But vectorization reorders the loop iterations, thus say if some value is sNaN, you'd get exceptions in different order.  So, I'm afraid without -ffast-math you can vectorize this only if the user says that the order of iterations doesn't matter (say using OpenMP 4.0 #pragma omp simd or Cilk+ #pragma simd).
Comment 6 Marc Glisse 2013-05-21 11:20:39 UTC
(In reply to Jakub Jelinek from comment #5)
> But vectorization reorders the loop iterations, thus say if some value is
> sNaN, you'd get exceptions in different order.  So, I'm afraid without
> -ffast-math you can vectorize this only if the user says that the order of
> iterations doesn't matter (say using OpenMP 4.0 #pragma omp simd or Cilk+
> #pragma simd).

Ah, I was only thinking of quiet nans. -fno-signaling-nans should be enough though, no? (I checked and it doesn't help, which makes sense since it is the default) I think it is quite common to care about quiet nans but not use signaling nans.
Comment 7 Brian Taylor 2013-05-21 18:16:49 UTC
(In reply to Jakub Jelinek from comment #5)
> But vectorization reorders the loop iterations, thus say if some value is
> sNaN, you'd get exceptions in different order.  So, I'm afraid without
> -ffast-math you can vectorize this only if the user says that the order of
> iterations doesn't matter (say using OpenMP 4.0 #pragma omp simd or Cilk+
> #pragma simd).

I'm not sure this is actually a problem (or perhaps there is a another bug), because as I noted in the PR replacing min or max with a "functionally equivalent" sequence of if statements allows vectorization.
Comment 8 Bud Davis 2013-05-21 20:25:12 UTC
The compiler generates code for min and max that checks if an argument is NaN. 
(floating point numbers only, of course).

This is different than the example you posted, as it would not give the correct answer when an argument is NaN.
Comment 9 Marc Glisse 2013-05-21 21:25:38 UTC
The difficulty seems to be with vectorizing an AND or OR of 2 conditions.
a<b || b unord b leaves control flow around
a<b | b unord b complains that bit-precision arithmetic is not supported

The second one in particular looks like a limitation in the vectorizer that would be nice to lift. I get the same issue with a loop using a[i]<0&b[i]<=0, it isn't related to unord in particular.
Comment 10 Marc Glisse 2013-05-22 14:22:10 UTC
(In reply to Marc Glisse from comment #9)
> The difficulty seems to be with vectorizing an AND or OR of 2 conditions.
> a<b || b unord b leaves control flow around
> a<b | b unord b complains that bit-precision arithmetic is not supported
> 
> The second one in particular looks like a limitation in the vectorizer that
> would be nice to lift. I get the same issue with a loop using
> a[i]<0&b[i]<=0, it isn't related to unord in particular.

Interestingly enough, using cond=a[i]<0&b[i]<=0 in a cond_expr fails to vectorize, but using (double)cond!=0 in the same cond_expr does vectorize (to a horrible result), thanks to vect_recog_bool_pattern, which is triggered by a conversion of the result of AND. I assume we want a similar pattern thing triggered by cond_expr. What pattern should it present to the vectorizer? Something like c1=(a[i]<0)?-1:0 (for c1, -1 and 0 of an integer type of the same size as the cond_expr), c2=(b[i]<=0)?-1:0, c=c1&c2, cond_expr(c!=0,...) maybe, so it looks as close as possible to the desired vectorized form? (we don't have to use -1 instead of 1 here, sticking to 1 may allow to share more code with the existing pattern, but then we would want a later pass to change it)
Comment 11 Marc Glisse 2013-06-13 20:29:11 UTC
Confirming that we currently can't vectorize code that involves a (con|dis)jonction of several conditions.
Comment 12 Richard Biener 2014-05-19 08:09:30 UTC
The testcase is vectorized just fine for me with -Ofast for 4.8, 4.9 and trunk.
With -O3 half of it is vectorized as reported.

On trunk this is fixed and we vectorize both functions (after the fix
for PR61194).