Using gcc-4.5.0-RC-20100406.tar.bz2

/************************************************************/
#include <stdio.h>

void __attribute__((noinline)) f(float * __restrict c,
                                 float * __restrict a,
                                 float * __restrict b)
{
    int i;
    for (i = 0; i < 4; i++) {
        c[i] = a[i] * b[i];
    }
}

int main()
{
    float a[4], b[4], c[4];

    a[0] = 1e-40;
    b[0] = 1e+38;

    f(c, a, b);

    printf("c[0]=%f\n", (double)c[0]);
    if (c[0] < 0.001)
        printf("precision problem: c[0] was flushed to zero\n");

    return 0;
}
/************************************************************/

# gcc -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -O2 -fno-fast-math test.c
# ./a.out
c[0]=0.010000

# gcc -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -O3 -fno-fast-math test.c
# ./a.out
c[0]=0.000000
precision problem: c[0] was flushed to zero

Using -O3 option turns on autovectorization, and the results of operations involving denormals get flushed to zero. This happens even if no "-ffast-math" or any other precision sacrificing options are enabled.
This is expected really. Denormals are a weird case in general. Plus your testcase depends on uninitialized values.
(In reply to comment #1)
> This is expected really. Denormals are a weird case in general.

Well, denormals may be weird. But what about NaNs, infs and the other IEEE stuff, which is not supported by the NEON unit? The compiler here takes the liberty of using NEON whenever it likes, and NEON does not fully support IEEE for sure. After reading man gcc, I had the impression that this should have been controlled by -ffast-math and the related options.

Floating point performance of the VFP Lite unit is a disaster, and using NEON where appropriate is definitely needed. But IMHO this should be controlled somehow. For example by selectively using pragma optimize to set the -ffast-math option in the critical parts of the code. Also I don't know how realistic it is, but having a special data type, something like 'fast_float' with relaxed precision requirements and suitable for use with NEON, would be really nice.

> Plus your testcase depends on uninitialized values.

Yes, the testcase is not quite clean, but it is easily fixable. Though this should not cause any problems unless floating point exceptions are enabled; those extra values are just irrelevant. Should I post an updated testcase?
Could you post a cleaned-up testcase? I tried a cleaned-up testcase with the values appropriately zero-initialized, and gcc ends up generating the vectorized value in this case.
Created attachment 20913
a fixed testcase

A fixed testcase is attached. The main problem here is that denormals are not handled in a 'civilized' way by gcc at the moment. They are just silently and unconditionally treated in a relaxed way, and that might be neither wanted nor expected by the user.

And 'readelf -A' shows the following EABI tags for the generated object file, which is not even marked in any special way with regard to denormal handling:

Tag_ABI_FP_denormal: Needed
Tag_ABI_FP_exceptions: Needed
Tag_ABI_FP_number_model: IEEE 754
I am working on this.
Julian's patch overlapped some other NEON changes I was already preparing for submission, so I did some refactoring before posting it for review. Here's the main part of the fix: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02102.html
Subject: Bug 43703

Author: sandra
Date: Sat Jul 3 00:46:51 2010
New Revision: 161763

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=161763
Log:
2010-07-02  Julian Brown  <julian@codesourcery.com>
            Sandra Loosemore  <sandra@codesourcery.com>

        PR target/43703

        gcc/
        * config/arm/vec-common.md (add<mode>3, sub<mode>3, smin<mode>3)
        (smax<mode>3): Disable for NEON float modes when
        flag_unsafe_math_optimizations is false.
        * config/arm/neon.md (*add<mode>3_neon, *sub<mode>3_neon)
        (*mul<mode>3_neon)
        (mul<mode>3add<mode>_neon, mul<mode>3neg<mode>add<mode>_neon)
        (reduc_splus_<mode>, reduc_smin_<mode>, reduc_smax_<mode>): Disable
        for NEON float modes when flag_unsafe_math_optimizations is false.
        (quad_halves_<code>v4sf): Only enable if
        flag_unsafe_math_optimizations is true.
        * doc/invoke.texi (ARM Options): Add note about floating point
        vectorization requiring -funsafe-math-optimizations.

        gcc/testsuite/
        * gcc.dg/vect/vect.exp: Add -ffast-math for NEON.
        * gcc.dg/vect/vect-reduc-6.c: Add XFAIL for NEON.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/neon.md
    trunk/gcc/config/arm/vec-common.md
    trunk/gcc/doc/invoke.texi
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-6.c
    trunk/gcc/testsuite/gcc.dg/vect/vect.exp
Are you planning to backport this to all release branches, since this affects all of them?

cheers
Ramana
Fixed in 4.6.0. 4.5 is no longer being maintained.