43703 – Unexpected floating point precision loss due to ARM NEON autovectorization

Summary: Unexpected floating point precision loss due to ARM NEON autovectorization

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.5.0

Importance:	P3 normal
Target Milestone:	4.6.0
Assignee:	jules

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2010-04-09 14:34 UTC by Siarhei Siamashka
Modified:	2012-07-10 13:22 UTC
CC List:	1 user

See Also:
Host:	armv7l-unknown-linux-gnueabi
Target:	armv7l-unknown-linux-gnueabi
Build:	armv7l-unknown-linux-gnueabi
Known to work:
Known to fail:	4.4.3, 4.5.0
Last reconfirmed:	2010-06-16 11:41:32

Attachments
a fixed testcase (272 bytes, text/plain) 2010-06-15 10:34 UTC, Siarhei Siamashka

Description Siarhei Siamashka 2010-04-09 14:34:21 UTC

Using gcc-4.5.0-RC-20100406.tar.bz2

/************************************************************/
#include <stdio.h>

void __attribute__((noinline)) f(float * __restrict c,
                                 float * __restrict a,
                                 float * __restrict b)
{
    int i;
    for (i = 0; i < 4; i++) {
        c[i] = a[i] * b[i];
    }
}

int main()
{
    float a[4], b[4], c[4];

    a[0] = 1e-40;
    b[0] = 1e+38;

    f(c, a, b);

    printf("c[0]=%f\n", (double)c[0]);
    if (c[0] < 0.001)
        printf("precision problem: c[0] was flushed to zero\n");

    return 0;
}
/************************************************************/

# gcc -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -O2 -fno-fast-math test.c
# ./a.out
c[0]=0.010000

# gcc -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -O3 -fno-fast-math test.c
# ./a.out
c[0]=0.000000
precision problem: c[0] was flushed to zero


Using the -O3 option turns on autovectorization, and the results of operations involving denormals are then flushed to zero. This happens even when neither "-ffast-math" nor any other precision-sacrificing option is enabled.
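To make the effect concrete without NEON hardware, here is a minimal sketch that emulates flush-to-zero in portable C. The helper names (flush_to_zero, mul_ieee, mul_ftz) are hypothetical and not part of the report, and returning a zero with the sign of the input is a modelling choice of this sketch rather than a statement about the hardware. It assumes the host itself performs IEEE 754 single-precision arithmetic with denormal support.

```c
#include <float.h>

/* Emulate flush-to-zero: any nonzero value smaller in magnitude than
   FLT_MIN (i.e. a denormal) is replaced by a zero of the same sign. */
float flush_to_zero(float x)
{
    float ax = (x < 0.0f) ? -x : x;
    if (ax != 0.0f && ax < FLT_MIN)
        return (x < 0.0f) ? -0.0f : 0.0f;
    return x;
}

/* Plain multiplication, as performed by a unit with denormal support. */
float mul_ieee(float a, float b)
{
    return a * b;
}

/* Multiplication with inputs and result flushed, as a flush-to-zero
   unit is modelled in this sketch. */
float mul_ftz(float a, float b)
{
    return flush_to_zero(flush_to_zero(a) * flush_to_zero(b));
}
```

With the values from the testcase, mul_ieee(1e-40f, 1e+38f) gives roughly 0.01, while mul_ftz(1e-40f, 1e+38f) gives zero because 1e-40f is below FLT_MIN and is discarded before the multiplication.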

Comment 1 Andrew Pinski 2010-04-09 19:55:06 UTC

This is expected really.  Denormals are a weird case in general.  Plus your testcase depends on uninitialized values.

Comment 2 Siarhei Siamashka 2010-04-09 20:34:09 UTC

(In reply to comment #1)
> This is expected really.  Denormals are a weird case in general.

Well, denormals may be weird. But what about NaNs, infinities and the other IEEE features that are not supported by the NEON unit? The compiler here takes the liberty of using NEON whenever it likes, and NEON certainly does not fully support IEEE. After reading man gcc, I was under the impression that this should be controlled by -ffast-math and the related options.

Floating point performance of the VFP Lite unit is a disaster, so using NEON where appropriate is definitely needed. But IMHO this should be controllable somehow, for example by selectively using pragma optimize to set the -ffast-math option in the critical parts of the code.
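A minimal sketch of that idea, using GCC's optimize function attribute (the per-function counterpart of "#pragma GCC optimize"). The RELAXED_FP macro and the function names are hypothetical. Whether the NEON vectorizer honours the setting per function is an assumption here, and the GCC manual itself recommends the optimize attribute for debugging rather than production use.

```c
#include <stddef.h>

/* GCC only: on other compilers the macro expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__)
#define RELAXED_FP __attribute__((optimize("unsafe-math-optimizations")))
#else
#define RELAXED_FP
#endif

/* Hypothetical hot loop where flushed denormals are acceptable. */
RELAXED_FP
void mul_relaxed(float * __restrict c, const float * __restrict a,
                 const float * __restrict b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* Same loop compiled with the translation unit's default settings. */
void mul_strict(float * __restrict c, const float * __restrict a,
                const float * __restrict b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}
```

Only mul_relaxed opts in to relaxed arithmetic; the rest of the file keeps strict semantics, which is the kind of selective control asked for above.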

Also, I don't know how realistic this is, but a special data type, something like 'fast_float', with relaxed precision requirements and suitable for use with NEON, would be really nice.

> Plus your testcase depends on uninitialized values.

Yes, the testcase is not quite clean, but that is easily fixable. It should not cause any problems unless floating point exceptions are enabled, though, since those extra values are simply irrelevant to the result. Should I post an updated testcase?

Comment 3 Ramana Radhakrishnan 2010-04-12 09:17:11 UTC

Could you post a cleaned-up testcase? I tried a cleaned-up testcase with the values appropriately zero-initialized, and gcc ends up generating the vectorized value in this case.

Comment 4 Siarhei Siamashka 2010-06-15 10:34:47 UTC

Created attachment 20913
a fixed testcase

A fixed testcase attached.
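The attachment itself is not reproduced in this thread. As a rough sketch only, a cleaned-up variant of the original testcase could initialize every array element so that no result depends on indeterminate values. The run_testcase wrapper and the choice of 1.0f for the unused elements are assumptions of this sketch, not the contents of the attachment.

```c
#include <stdio.h>

/* Same kernel as in the description: element-wise multiplication. */
void __attribute__((noinline)) f(float * __restrict c,
                                 float * __restrict a,
                                 float * __restrict b)
{
    int i;
    for (i = 0; i < 4; i++) {
        c[i] = a[i] * b[i];
    }
}

/* Runs the kernel with fully initialized inputs and returns c[0]. */
float run_testcase(void)
{
    float a[4] = { 1e-40f, 1.0f, 1.0f, 1.0f };
    float b[4] = { 1e+38f, 1.0f, 1.0f, 1.0f };
    float c[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

    f(c, a, b);

    printf("c[0]=%f\n", (double)c[0]);
    if (c[0] < 0.001)
        printf("precision problem: c[0] was flushed to zero\n");

    return c[0];
}
```

Calling run_testcase() from main and building once with -O2 and once with -O3 reproduces the comparison from the description without relying on uninitialized data.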

The main problem here is that denormals are not handled in a 'civilized' way by gcc at the moment. They are silently and unconditionally treated in a relaxed way, which may be neither wanted nor expected by the user. Moreover, 'readelf -A' shows the following EABI tags for the generated object file, so the file is not even marked in any special way with regard to denormal handling:
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754

Comment 5 jules 2010-06-16 11:41:32 UTC

I am working on this.

Comment 6 Sandra Loosemore 2010-06-22 01:55:05 UTC

Julian's patch overlapped some other NEON changes I was already preparing for submission, so I did some refactoring before posting it for review.  Here's the main part of the fix:

http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02102.html

Comment 7 sandra 2010-07-03 00:47:03 UTC

Subject: Bug 43703

Author: sandra
Date: Sat Jul  3 00:46:51 2010
New Revision: 161763

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=161763
Log:
2010-07-02  Julian Brown  <julian@codesourcery.com>
	    Sandra Loosemore <sandra@codesourcery.com>

	PR target/43703

	gcc/
	* config/arm/vec-common.md (add<mode>3, sub<mode>3, smin<mode>3)
	(smax<mode>3): Disable for NEON float modes when
	flag_unsafe_math_optimizations is false.
	* config/arm/neon.md (*add<mode>3_neon, *sub<mode>3_neon)
	(*mul<mode>3_neon)
	(mul<mode>3add<mode>_neon, mul<mode>3neg<mode>add<mode>_neon)
	(reduc_splus_<mode>, reduc_smin_<mode>, reduc_smax_<mode>): Disable
	for NEON float modes when flag_unsafe_math_optimizations is false.
	(quad_halves_<code>v4sf): Only enable if flag_unsafe_math_optimizations
	is true.
	* doc/invoke.texi (ARM Options): Add note about floating point
	vectorization requiring -funsafe-math-optimizations.

	gcc/testsuite/
	* gcc.dg/vect/vect.exp: Add -ffast-math for NEON.
	* gcc.dg/vect/vect-reduc-6.c: Add XFAIL for NEON.



Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/neon.md
    trunk/gcc/config/arm/vec-common.md
    trunk/gcc/doc/invoke.texi
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-6.c
    trunk/gcc/testsuite/gcc.dg/vect/vect.exp

Comment 8 Ramana Radhakrishnan 2010-07-05 12:22:02 UTC

Are you planning to backport this to all release branches, since this affects all of them?

cheers
Ramana

Comment 9 Richard Earnshaw 2012-07-10 13:22:33 UTC

Fixed in 4.6.0.  4.5 is no longer being maintained.