Per IEEE 754:2008, one has max(x,NaN) == max(NaN,x) == x. Gfortran's inline version of maxloc, maxval and max (ditto for min*) follows this. However, the libgfortran version does not: real :: r(4), z z = 0.0 r = (/ z/z, -1., -42., 849. /) print *,r print *, minloc(r), minval(r) print *, maxloc(r), maxval(r) end Produces: NaN -1.0000000 -42.000000 849.00000 1 -42.000000 1 849.00000 Correct is - as ifort has -: NaN -1.000000 -42.00000 849.0000 3 -42.00000 4 849.0000
Even the inline version is wrong I think. real :: r(4), z z = 0.0 r = (/ z/z, z/z, z/z, z/z /) print *,r print *, minloc(r,dim=1), minval(r,dim=1) print *, maxloc(r,dim=1), maxval(r,dim=1) end Not sure what minval/maxval should be in this case, but minloc/maxloc 0 looks wrong (given that for non-empty arrays it is supposed to be 1 .. arrayextent).
Created attachment 18174 [details] gcc45-pr40643.patch Inline minmaxloc patch. Is this the behavior we want? If NaNs aren't supported and array size is known, no mask, then inline maxloc/minloc should be quite a bit faster (condition should be just val < limit instead of val < limit || (pos == 0 && val == limit)). On the other side, if array size isn't known at compile time (or non-scalar mask is used) and NaNs must be supported, the loop is more expensive. I haven't touched minmaxval inline expansion yet, there I'm afraid for floating point without -ffast-math the loop will have to be quite a bit slower than what we do ATM, as we'd need to return -inf for maxval on array full of -infs (or full of nans with at least one -inf), or nan for array full of nans, but for empty array or completely masked away array -huge. The patch also disables the inline expansion for -O0, I think at -O0 we should aim at generating smaller and more easily debuggable code. Probably we shouldn't be expanding this inline for optimize_size either.
Created attachment 18175 [details] Results for the testcase with ifort, sunf95, NAG f95, openf95, g95 > Is this the behavior we want? Good question. See attachment for the results of other compilers.
Created attachment 18192 [details] gcc45-pr40643.patch Slightly adjusted patch, so that even when array size isn't known compile time constant it can avoid expensive runtime tests in the loop.
Created attachment 18193 [details] gcc45-pr40643.patch And now a patch which uses two loops instead of one if needed for performance (in the honor nans case or when mask is used). The first loop will stop after changing pos the first time, the second loop then can just use val < limit comparison. E.g. on subroutine bar(a, n) real, allocatable :: a(:) integer :: n end subroutine program main implicit none interface bar subroutine bar(a, n) real, allocatable :: a(:) integer :: n end subroutine end interface integer :: n real, allocatable :: a(:) integer :: i allocate (a(100)) call random_number(a) do i=1, 10000000 n = minloc(a, dim=1) call bar(a, n) end do end program main this patch shows very noticeable difference both for -O3 -ffast-math -funroll-loops and for -O3 -funroll-loops. If you think this is the way we should go, I can change also the inline minval/maxval version and attempt to change the library routines as well.
See also PR 30694.
Subject: Bug 40643 Author: jakub Date: Fri Jul 24 07:57:13 2009 New Revision: 150041 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=150041 Log: PR fortran/40643 PR fortran/31067 * trans-intrinsic.c (gfc_conv_intrinsic_minmaxloc, gfc_conv_intrinsic_minmaxval): Handle Infinities and NaNs properly, optimize. * trans-array.c (gfc_trans_scalarized_loop_end): No longer static. * trans-array.h (gfc_trans_scalarized_loop_end): New prototype. * libgfortran.h (GFC_REAL_4_INFINITY, GFC_REAL_8_INFINITY, GFC_REAL_10_INFINITY, GFC_REAL_16_INFINITY, GFC_REAL_4_QUIET_NAN, GFC_REAL_8_QUIET_NAN, GFC_REAL_10_QUIET_NAN, GFC_REAL_16_QUIET_NAN): Define. * m4/iparm.m4 (atype_inf, atype_nan): Define. * m4/ifunction.m4: Formatting. * m4/iforeach.m4: Likewise. (START_FOREACH_FUNCTION): Initialize dest to all 1s, not all 0s. (START_FOREACH_BLOCK, FINISH_FOREACH_FUNCTION, FINISH_MASKED_FOREACH_FUNCTION): Run foreach block inside a loop until count[0] == extent[0]. * m4/minval.m4: Formatting. Handle NaNs and infinities. Optimize. * m4/maxval.m4: Likewise. * m4/minloc0.m4: Likewise. * m4/maxloc0.m4: Likewise. * m4/minloc1.m4: Likewise. * m4/maxloc1.m4: Likewise. * generated/maxloc0_16_i16.c: Regenerated. * generated/maxloc0_16_i1.c: Likewise. * generated/maxloc0_16_i2.c: Likewise. * generated/maxloc0_16_i4.c: Likewise. * generated/maxloc0_16_i8.c: Likewise. * generated/maxloc0_16_r10.c: Likewise. * generated/maxloc0_16_r16.c: Likewise. * generated/maxloc0_16_r4.c: Likewise. * generated/maxloc0_16_r8.c: Likewise. * generated/maxloc0_4_i16.c: Likewise. * generated/maxloc0_4_i1.c: Likewise. * generated/maxloc0_4_i2.c: Likewise. * generated/maxloc0_4_i4.c: Likewise. * generated/maxloc0_4_i8.c: Likewise. * generated/maxloc0_4_r10.c: Likewise. * generated/maxloc0_4_r16.c: Likewise. * generated/maxloc0_4_r4.c: Likewise. * generated/maxloc0_4_r8.c: Likewise. * generated/maxloc0_8_i16.c: Likewise. * generated/maxloc0_8_i1.c: Likewise. * generated/maxloc0_8_i2.c: Likewise. * generated/maxloc0_8_i4.c: Likewise. * generated/maxloc0_8_i8.c: Likewise. * generated/maxloc0_8_r10.c: Likewise. * generated/maxloc0_8_r16.c: Likewise. * generated/maxloc0_8_r4.c: Likewise. * generated/maxloc0_8_r8.c: Likewise. * generated/maxloc1_16_i16.c: Likewise. * generated/maxloc1_16_i1.c: Likewise. * generated/maxloc1_16_i2.c: Likewise. * generated/maxloc1_16_i4.c: Likewise. * generated/maxloc1_16_i8.c: Likewise. * generated/maxloc1_16_r10.c: Likewise. * generated/maxloc1_16_r16.c: Likewise. * generated/maxloc1_16_r4.c: Likewise. * generated/maxloc1_16_r8.c: Likewise. * generated/maxloc1_4_i16.c: Likewise. * generated/maxloc1_4_i1.c: Likewise. * generated/maxloc1_4_i2.c: Likewise. * generated/maxloc1_4_i4.c: Likewise. * generated/maxloc1_4_i8.c: Likewise. * generated/maxloc1_4_r10.c: Likewise. * generated/maxloc1_4_r16.c: Likewise. * generated/maxloc1_4_r4.c: Likewise. * generated/maxloc1_4_r8.c: Likewise. * generated/maxloc1_8_i16.c: Likewise. * generated/maxloc1_8_i1.c: Likewise. * generated/maxloc1_8_i2.c: Likewise. * generated/maxloc1_8_i4.c: Likewise. * generated/maxloc1_8_i8.c: Likewise. * generated/maxloc1_8_r10.c: Likewise. * generated/maxloc1_8_r16.c: Likewise. * generated/maxloc1_8_r4.c: Likewise. * generated/maxloc1_8_r8.c: Likewise. * generated/maxval_i16.c: Likewise. * generated/maxval_i1.c: Likewise. * generated/maxval_i2.c: Likewise. * generated/maxval_i4.c: Likewise. * generated/maxval_i8.c: Likewise. * generated/maxval_r10.c: Likewise. * generated/maxval_r16.c: Likewise. * generated/maxval_r4.c: Likewise. * generated/maxval_r8.c: Likewise. * generated/minloc0_16_i16.c: Likewise. * generated/minloc0_16_i1.c: Likewise. * generated/minloc0_16_i2.c: Likewise. * generated/minloc0_16_i4.c: Likewise. * generated/minloc0_16_i8.c: Likewise. * generated/minloc0_16_r10.c: Likewise. * generated/minloc0_16_r16.c: Likewise. * generated/minloc0_16_r4.c: Likewise. * generated/minloc0_16_r8.c: Likewise. * generated/minloc0_4_i16.c: Likewise. * generated/minloc0_4_i1.c: Likewise. * generated/minloc0_4_i2.c: Likewise. * generated/minloc0_4_i4.c: Likewise. * generated/minloc0_4_i8.c: Likewise. * generated/minloc0_4_r10.c: Likewise. * generated/minloc0_4_r16.c: Likewise. * generated/minloc0_4_r4.c: Likewise. * generated/minloc0_4_r8.c: Likewise. * generated/minloc0_8_i16.c: Likewise. * generated/minloc0_8_i1.c: Likewise. * generated/minloc0_8_i2.c: Likewise. * generated/minloc0_8_i4.c: Likewise. * generated/minloc0_8_i8.c: Likewise. * generated/minloc0_8_r10.c: Likewise. * generated/minloc0_8_r16.c: Likewise. * generated/minloc0_8_r4.c: Likewise. * generated/minloc0_8_r8.c: Likewise. * generated/minloc1_16_i16.c: Likewise. * generated/minloc1_16_i1.c: Likewise. * generated/minloc1_16_i2.c: Likewise. * generated/minloc1_16_i4.c: Likewise. * generated/minloc1_16_i8.c: Likewise. * generated/minloc1_16_r10.c: Likewise. * generated/minloc1_16_r16.c: Likewise. * generated/minloc1_16_r4.c: Likewise. * generated/minloc1_16_r8.c: Likewise. * generated/minloc1_4_i16.c: Likewise. * generated/minloc1_4_i1.c: Likewise. * generated/minloc1_4_i2.c: Likewise. * generated/minloc1_4_i4.c: Likewise. * generated/minloc1_4_i8.c: Likewise. * generated/minloc1_4_r10.c: Likewise. * generated/minloc1_4_r16.c: Likewise. * generated/minloc1_4_r4.c: Likewise. * generated/minloc1_4_r8.c: Likewise. * generated/minloc1_8_i16.c: Likewise. * generated/minloc1_8_i1.c: Likewise. * generated/minloc1_8_i2.c: Likewise. * generated/minloc1_8_i4.c: Likewise. * generated/minloc1_8_i8.c: Likewise. * generated/minloc1_8_r10.c: Likewise. * generated/minloc1_8_r16.c: Likewise. * generated/minloc1_8_r4.c: Likewise. * generated/minloc1_8_r8.c: Likewise. * generated/minval_i16.c: Likewise. * generated/minval_i1.c: Likewise. * generated/minval_i2.c: Likewise. * generated/minval_i4.c: Likewise. * generated/minval_i8.c: Likewise. * generated/minval_r10.c: Likewise. * generated/minval_r16.c: Likewise. * generated/minval_r4.c: Likewise. * generated/minval_r8.c: Likewise. * generated/product_c10.c: Likewise. * generated/product_c16.c: Likewise. * generated/product_c4.c: Likewise. * generated/product_c8.c: Likewise. * generated/product_i16.c: Likewise. * generated/product_i1.c: Likewise. * generated/product_i2.c: Likewise. * generated/product_i4.c: Likewise. * generated/product_i8.c: Likewise. * generated/product_r10.c: Likewise. * generated/product_r16.c: Likewise. * generated/product_r4.c: Likewise. * generated/product_r8.c: Likewise. * generated/sum_c10.c: Likewise. * generated/sum_c16.c: Likewise. * generated/sum_c4.c: Likewise. * generated/sum_c8.c: Likewise. * generated/sum_i16.c: Likewise. * generated/sum_i1.c: Likewise. * generated/sum_i2.c: Likewise. * generated/sum_i4.c: Likewise. * generated/sum_i8.c: Likewise. * generated/sum_r10.c: Likewise. * generated/sum_r16.c: Likewise. * generated/sum_r4.c: Likewise. * generated/sum_r8.c: Likewise. * gfortran.dg/maxlocval_2.f90: New test. * gfortran.dg/maxlocval_3.f90: New test. * gfortran.dg/maxlocval_4.f90: New test. * gfortran.dg/minlocval_1.f90: New test. * gfortran.dg/minlocval_2.f90: New test. * gfortran.dg/minlocval_3.f90: New test. * gfortran.dg/minlocval_4.f90: New test. Added: trunk/gcc/testsuite/gfortran.dg/maxlocval_2.f90 trunk/gcc/testsuite/gfortran.dg/maxlocval_3.f90 trunk/gcc/testsuite/gfortran.dg/maxlocval_4.f90 trunk/gcc/testsuite/gfortran.dg/minlocval_1.f90 trunk/gcc/testsuite/gfortran.dg/minlocval_2.f90 trunk/gcc/testsuite/gfortran.dg/minlocval_3.f90 trunk/gcc/testsuite/gfortran.dg/minlocval_4.f90 Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/trans-array.c trunk/gcc/fortran/trans-array.h trunk/gcc/fortran/trans-intrinsic.c trunk/gcc/testsuite/ChangeLog trunk/libgfortran/ChangeLog trunk/libgfortran/generated/maxloc0_16_i1.c trunk/libgfortran/generated/maxloc0_16_i16.c trunk/libgfortran/generated/maxloc0_16_i2.c trunk/libgfortran/generated/maxloc0_16_i4.c trunk/libgfortran/generated/maxloc0_16_i8.c trunk/libgfortran/generated/maxloc0_16_r10.c trunk/libgfortran/generated/maxloc0_16_r16.c trunk/libgfortran/generated/maxloc0_16_r4.c trunk/libgfortran/generated/maxloc0_16_r8.c trunk/libgfortran/generated/maxloc0_4_i1.c trunk/libgfortran/generated/maxloc0_4_i16.c trunk/libgfortran/generated/maxloc0_4_i2.c trunk/libgfortran/generated/maxloc0_4_i4.c trunk/libgfortran/generated/maxloc0_4_i8.c trunk/libgfortran/generated/maxloc0_4_r10.c trunk/libgfortran/generated/maxloc0_4_r16.c trunk/libgfortran/generated/maxloc0_4_r4.c trunk/libgfortran/generated/maxloc0_4_r8.c trunk/libgfortran/generated/maxloc0_8_i1.c trunk/libgfortran/generated/maxloc0_8_i16.c trunk/libgfortran/generated/maxloc0_8_i2.c trunk/libgfortran/generated/maxloc0_8_i4.c trunk/libgfortran/generated/maxloc0_8_i8.c trunk/libgfortran/generated/maxloc0_8_r10.c trunk/libgfortran/generated/maxloc0_8_r16.c trunk/libgfortran/generated/maxloc0_8_r4.c trunk/libgfortran/generated/maxloc0_8_r8.c trunk/libgfortran/generated/maxloc1_16_i1.c trunk/libgfortran/generated/maxloc1_16_i16.c trunk/libgfortran/generated/maxloc1_16_i2.c trunk/libgfortran/generated/maxloc1_16_i4.c trunk/libgfortran/generated/maxloc1_16_i8.c trunk/libgfortran/generated/maxloc1_16_r10.c trunk/libgfortran/generated/maxloc1_16_r16.c trunk/libgfortran/generated/maxloc1_16_r4.c trunk/libgfortran/generated/maxloc1_16_r8.c trunk/libgfortran/generated/maxloc1_4_i1.c trunk/libgfortran/generated/maxloc1_4_i16.c trunk/libgfortran/generated/maxloc1_4_i2.c trunk/libgfortran/generated/maxloc1_4_i4.c trunk/libgfortran/generated/maxloc1_4_i8.c trunk/libgfortran/generated/maxloc1_4_r10.c trunk/libgfortran/generated/maxloc1_4_r16.c trunk/libgfortran/generated/maxloc1_4_r4.c trunk/libgfortran/generated/maxloc1_4_r8.c trunk/libgfortran/generated/maxloc1_8_i1.c trunk/libgfortran/generated/maxloc1_8_i16.c trunk/libgfortran/generated/maxloc1_8_i2.c trunk/libgfortran/generated/maxloc1_8_i4.c trunk/libgfortran/generated/maxloc1_8_i8.c trunk/libgfortran/generated/maxloc1_8_r10.c trunk/libgfortran/generated/maxloc1_8_r16.c trunk/libgfortran/generated/maxloc1_8_r4.c trunk/libgfortran/generated/maxloc1_8_r8.c trunk/libgfortran/generated/maxval_i1.c trunk/libgfortran/generated/maxval_i16.c trunk/libgfortran/generated/maxval_i2.c trunk/libgfortran/generated/maxval_i4.c trunk/libgfortran/generated/maxval_i8.c trunk/libgfortran/generated/maxval_r10.c trunk/libgfortran/generated/maxval_r16.c trunk/libgfortran/generated/maxval_r4.c trunk/libgfortran/generated/maxval_r8.c trunk/libgfortran/generated/minloc0_16_i1.c trunk/libgfortran/generated/minloc0_16_i16.c trunk/libgfortran/generated/minloc0_16_i2.c trunk/libgfortran/generated/minloc0_16_i4.c trunk/libgfortran/generated/minloc0_16_i8.c trunk/libgfortran/generated/minloc0_16_r10.c trunk/libgfortran/generated/minloc0_16_r16.c trunk/libgfortran/generated/minloc0_16_r4.c trunk/libgfortran/generated/minloc0_16_r8.c trunk/libgfortran/generated/minloc0_4_i1.c trunk/libgfortran/generated/minloc0_4_i16.c trunk/libgfortran/generated/minloc0_4_i2.c trunk/libgfortran/generated/minloc0_4_i4.c trunk/libgfortran/generated/minloc0_4_i8.c trunk/libgfortran/generated/minloc0_4_r10.c trunk/libgfortran/generated/minloc0_4_r16.c trunk/libgfortran/generated/minloc0_4_r4.c trunk/libgfortran/generated/minloc0_4_r8.c trunk/libgfortran/generated/minloc0_8_i1.c trunk/libgfortran/generated/minloc0_8_i16.c trunk/libgfortran/generated/minloc0_8_i2.c trunk/libgfortran/generated/minloc0_8_i4.c trunk/libgfortran/generated/minloc0_8_i8.c trunk/libgfortran/generated/minloc0_8_r10.c trunk/libgfortran/generated/minloc0_8_r16.c trunk/libgfortran/generated/minloc0_8_r4.c trunk/libgfortran/generated/minloc0_8_r8.c trunk/libgfortran/generated/minloc1_16_i1.c trunk/libgfortran/generated/minloc1_16_i16.c trunk/libgfortran/generated/minloc1_16_i2.c trunk/libgfortran/generated/minloc1_16_i4.c trunk/libgfortran/generated/minloc1_16_i8.c trunk/libgfortran/generated/minloc1_16_r10.c trunk/libgfortran/generated/minloc1_16_r16.c trunk/libgfortran/generated/minloc1_16_r4.c trunk/libgfortran/generated/minloc1_16_r8.c trunk/libgfortran/generated/minloc1_4_i1.c trunk/libgfortran/generated/minloc1_4_i16.c trunk/libgfortran/generated/minloc1_4_i2.c trunk/libgfortran/generated/minloc1_4_i4.c trunk/libgfortran/generated/minloc1_4_i8.c trunk/libgfortran/generated/minloc1_4_r10.c trunk/libgfortran/generated/minloc1_4_r16.c trunk/libgfortran/generated/minloc1_4_r4.c trunk/libgfortran/generated/minloc1_4_r8.c trunk/libgfortran/generated/minloc1_8_i1.c trunk/libgfortran/generated/minloc1_8_i16.c trunk/libgfortran/generated/minloc1_8_i2.c trunk/libgfortran/generated/minloc1_8_i4.c trunk/libgfortran/generated/minloc1_8_i8.c trunk/libgfortran/generated/minloc1_8_r10.c trunk/libgfortran/generated/minloc1_8_r16.c trunk/libgfortran/generated/minloc1_8_r4.c trunk/libgfortran/generated/minloc1_8_r8.c trunk/libgfortran/generated/minval_i1.c trunk/libgfortran/generated/minval_i16.c trunk/libgfortran/generated/minval_i2.c trunk/libgfortran/generated/minval_i4.c trunk/libgfortran/generated/minval_i8.c trunk/libgfortran/generated/minval_r10.c trunk/libgfortran/generated/minval_r16.c trunk/libgfortran/generated/minval_r4.c trunk/libgfortran/generated/minval_r8.c trunk/libgfortran/generated/product_c10.c trunk/libgfortran/generated/product_c16.c trunk/libgfortran/generated/product_c4.c trunk/libgfortran/generated/product_c8.c trunk/libgfortran/generated/product_i1.c trunk/libgfortran/generated/product_i16.c trunk/libgfortran/generated/product_i2.c trunk/libgfortran/generated/product_i4.c trunk/libgfortran/generated/product_i8.c trunk/libgfortran/generated/product_r10.c trunk/libgfortran/generated/product_r16.c trunk/libgfortran/generated/product_r4.c trunk/libgfortran/generated/product_r8.c trunk/libgfortran/generated/sum_c10.c trunk/libgfortran/generated/sum_c16.c trunk/libgfortran/generated/sum_c4.c trunk/libgfortran/generated/sum_c8.c trunk/libgfortran/generated/sum_i1.c trunk/libgfortran/generated/sum_i16.c trunk/libgfortran/generated/sum_i2.c trunk/libgfortran/generated/sum_i4.c trunk/libgfortran/generated/sum_i8.c trunk/libgfortran/generated/sum_r10.c trunk/libgfortran/generated/sum_r16.c trunk/libgfortran/generated/sum_r4.c trunk/libgfortran/generated/sum_r8.c trunk/libgfortran/libgfortran.h trunk/libgfortran/m4/iforeach.m4 trunk/libgfortran/m4/ifunction.m4 trunk/libgfortran/m4/iparm.m4 trunk/libgfortran/m4/maxloc0.m4 trunk/libgfortran/m4/maxloc1.m4 trunk/libgfortran/m4/maxval.m4 trunk/libgfortran/m4/minloc0.m4 trunk/libgfortran/m4/minloc1.m4 trunk/libgfortran/m4/minval.m4
As far as I can see, this is fixed. Closing.