This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch,Fortran] PR 33197 Add NORM2 and PARITY
- From: Daniel Kraft <d at domob dot eu>
- To: Tobias Burnus <burnus at net-b dot de>
- Cc: gcc patches f <gcc-patches at gcc dot gnu dot org>, gfortran <fortran at gcc dot gnu dot org>
- Date: Fri, 27 Aug 2010 12:14:43 +0200
- Subject: Re: [Patch,Fortran] PR 33197 Add NORM2 and PARITY
- References: <4C766E7C.7000408@net-b.de>
Hi Tobias,
Tobias Burnus wrote:
This patch adds NORM2 (= sqrt(sum(array**2))) and PARITY(logical_array)
- and thus the last F2008 math intrinsics, which means that PR 33197 can
be closed afterwards. (Though, there are still lots of bits-related
intrinsics missing.)
All algorithms are implemented three times: (a) in simplify, (b) in
trans-intrinsics (for scalar results, i.e. rank-1 arrays or if no dim=
has been given), and (c) in libgfortran.
Please check when reading/testing the patch that it works correctly for
zero-sized arrays, size-1 arrays and (for NORM2) arrays with values >
sqrt(huge(a)).
For norm2 (L2 norm), one essentially does:
tmp = max(array) ! = L_infinity norm
norm2 = tmp * sqrt( (array/tmp)**2)
to avoid an overflow. However, it is implemented using a single pass.
For NORM2, an algorithm based on the one in BLAS is used though I have
realized that I start with SCALE == 1, RESULT = 0 while BLAS uses the
opposite starting values. Both seem to lead to the same result, thus, I
do not know which one is better. Quickly thinking about it, mine seems
to be better (faster + more accurate) for small values (<= 1.0), but I
might be wrong. Cf. http://www.netlib.org/blas/snrm2.f
For simplify, NORM2 does not divide by max(array) as seemingly no
overflow (truncation) occurs until the norm2 calculation has been
completed. Thus, lazy as I was, I did not change that part to the more
complicated algorithm.
Build and regtested on x86-64-linux.
OK for the trunk?
In general ok, but:
+ resolve_mask_arg (array); /* FIXME: Does this make sense? */
I think this is ok.
else
- gfc_init_block (&block);
+ gfc_init_block (&block);
/* Do the actual summation/product. */
I'd rather have the blank line *after* than *before* the gfc_init_block.
+ absX = fold_build1 (ABS_EXPR, type, arrayse.expr);
+ cond = fold_build2 (GT_EXPR, boolean_type_node, absX, scale);
+
+ gfc_add_expr_to_block (&block, tmp);
+ res1 = fold_build2 (MULT_EXPR, type, scale, scale);
+ res1 = fold_build2 (RDIV_EXPR, type, res1,
Are you sure you want the gfc_add_expr_to_block here? tmp is only used
last a long way above... What does this do?
+Calculates the Euclidean vector norm (@math{L_2} norm of
This seems to miss a closing parenthesis.
And most importantly:
+ if (absX > scale)
+ {
+ result = 1.0 + result * (scale*scale) / (absX * absX);
+ scale = absX;
+ }
+ else
+ result += absX * absX / (scale*scale)
I think you should not use (scale*scale)/(absX*absX) here (or the other
way round) but rather (scale/absX)**2. This may lead to an overflow.
See the test:
real(10) :: a(2)
real(10) :: val
val = sqrt(huge(1.0_10)) * 2.0_10
a = (/ val, val /)
print *, norm2(a) / val / sqrt(2.0_10)
end
This prints NaN with your patch but the expected 1.00000... when the
calculation is done as suggested above. Note that I had to use KIND=10
here because probably otherwise the internal calculation is done with
"higher presicion" than the values (on my system). The same is true I
guess for the library version.
Yours,
Daniel
--
http://www.pro-vegan.info/
--
Done: Arc-Bar-Cav-Kni-Ran-Rog-Sam-Tou-Val-Wiz
To go: Hea-Mon-Pri