Bug 54703

Summary:	_mm_sub_pd is incorrectly substituted with vandnps
Product:	gcc	Reporter:	Matthias Kretz (Vir) <mkretz>
Component:	target	Assignee:	Jakub Jelinek <jakub>
Status:	RESOLVED FIXED
Severity:	major	CC:	jakub
Priority:	P3	Keywords:	wrong-code
Version:	4.8.0
Target Milestone:	4.7.3
Host:		Target:	x86_64--, i?86--
Build:		Known to work:	4.7.3, 4.8.0
Known to fail:	4.6.2, 4.7.0, 4.7.1, 4.7.2	Last reconfirmed:	2012-09-25 00:00:00
Attachments:	gcc48-pr54703.patch

Description Matthias Kretz (Vir) 2012-09-25 13:29:16 UTC

The following testcase:

#include <xmmintrin.h>
__attribute__((aligned(16))) static const unsigned long long mask[2] = { 0xffffffffff000000ull, 0xffffffffff000000ull };
inline __m128d foo(__m128d v1) {
    const __m128d h1 = _mm_and_pd(v1, _mm_load_pd(reinterpret_cast<const double *>(&mask)));
    const __m128d l1 = _mm_sub_pd(v1, h1);
    return _mm_mul_pd(h1, l1);
}
__m128d test() {
    __m128d a = _mm_set1_pd(2.);
    return foo(foo(a));
}

compiles to

        .cfi_startproc
        vmovaps _ZL4mask(%rip), %xmm0
        vandps  .LC0(%rip), %xmm0, %xmm2
        vandnps .LC0(%rip), %xmm0, %xmm1
        vmulpd  %xmm1, %xmm2, %xmm1
        vandps  %xmm0, %xmm1, %xmm0
        vsubpd  %xmm0, %xmm1, %xmm1
        vmulpd  %xmm1, %xmm0, %xmm0
        ret
        .cfi_endproc

The second foo call is correct: vandps and vsubpd are used. But the first call uses vandps and vandnps. This pattern would be correct for integers, but is obviously wrong for floating point numbers.

Comment 1 Matthias Kretz (Vir) 2012-09-25 13:32:27 UTC

Um, sorry. Forgot to note the compiler switches:

gcc -O1 -march=bdver1

I can't reproduce the error with corei7-avx or any other non-AVX target.

Comment 2 Richard Biener 2012-09-25 13:47:48 UTC

Confirmed.

Comment 3 Jakub Jelinek 2012-09-25 13:48:10 UTC

Looking into it.

Comment 4 Jakub Jelinek 2012-09-25 14:34:32 UTC

Created attachment 28270 [details]
gcc48-pr54703.patch

Untested fix.

Comment 5 Andrew Pinski 2012-09-25 18:04:00 UTC

(In reply to comment #4)
> Created attachment 28270 [details]
> gcc48-pr54703.patch

Since I wrote the code which breaks this, this looks good.  The only question I have is why is AND allowed on non integer types anyways?

Comment 6 Jakub Jelinek 2012-09-25 21:56:53 UTC

Yeah, the generic code is not buggy, it seems sane to assume AND is integral/vector integer operation only, but I think i?86 isn't the only backend that uses AND on vector floating point modes extensively.

Comment 7 Matthias Kretz (Vir) 2012-09-26 10:52:38 UTC

Thanks for the quick response! You guys are cool! :)

The pattern here is for calculation with extended precision:

xh = x & mask;
xl = x - xh;
yh = y & mask
yl = y - yh;
/* x * y + small = */ xh * yh + ((xh * yl + xl * yh) + (xl * yl + small));

Other than that, yes there's a lot of ANDPS is SSE code, just this pattern which gets miscompiled is not very common.

Comment 8 Jakub Jelinek 2012-09-27 10:48:14 UTC

Author: jakub
Date: Thu Sep 27 10:48:07 2012
New Revision: 191801

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191801
Log:
	PR target/54703
	* simplify-rtx.c (simplify_binary_operation_1): Perform
	(x - (x & y)) -> (x & ~y) optimization only for integral
	modes.

	* gcc.target/i386/pr54703.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr54703.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/simplify-rtx.c
    trunk/gcc/testsuite/ChangeLog

Comment 9 Jakub Jelinek 2012-09-27 10:53:47 UTC

Author: jakub
Date: Thu Sep 27 10:53:42 2012
New Revision: 191802

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191802
Log:
	PR target/54703
	* simplify-rtx.c (simplify_binary_operation_1): Perform
	(x - (x & y)) -> (x & ~y) optimization only for integral
	modes.

	* gcc.target/i386/pr54703.c: New test.

Added:
    branches/gcc-4_7-branch/gcc/testsuite/gcc.target/i386/pr54703.c
Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/simplify-rtx.c
    branches/gcc-4_7-branch/gcc/testsuite/ChangeLog

Comment 10 Jakub Jelinek 2012-09-27 10:54:36 UTC

Fixed for 4.7.3+ so far.

Comment 11 Jakub Jelinek 2013-04-12 16:29:14 UTC

The 4.6 branch has been closed, fixed in GCC 4.7.3.