Bug 61423 - Incorrect conversion from unsigned int to floating point
Summary: Incorrect conversion from unsigned int to floating point
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: unknown
: P3 normal
Target Milestone: 4.8.4
Assignee: Uroš Bizjak
URL:
Keywords: wrong-code
Depends on: 61446
Blocks:
  Show dependency treegraph
 
Reported: 2014-06-05 15:52 UTC by lvqcl.mail
Modified: 2014-06-18 20:03 UTC (History)
4 users (show)

See Also:
Host:
Target: i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-06-06 00:00:00


Attachments
test program (333 bytes, text/plain)
2014-06-05 15:54 UTC, lvqcl.mail
Details
Proposed patch (931 bytes, patch)
2014-06-06 15:30 UTC, Uroš Bizjak
Details | Diff
Testcase that fails in REE pass (29.14 KB, application/x-bzip)
2014-06-06 15:37 UTC, Uroš Bizjak
Details

Note You need to log in before you can comment on or make changes to this bug.
Description lvqcl.mail 2014-06-05 15:52:04 UTC
The attached program works incorrectly when compiled for i686 with '-O3 -msse2' or '-O1 -ftree-vectorize -msse2'.

GCC compiles the function

#define N 1024
static unsigned int A[N];
double func (void)
{
	unsigned int sum = 0;
	unsigned i;
	double t;

	for (i = 0; i < N; i++) sum += A[i];

	t = sum; /* uint32 -> double */
	return t;
}

into the following:

01:	pxor	%xmm0, %xmm0
02:	movl	$_A, %eax
03: L2:
04:	paddd	(%eax), %xmm0
05:	addl	$16, %eax
06:	cmpl	$_A+4096, %eax
07:	jne	L2
08:	movdqa	%xmm0, %xmm1
09:	subl	$28, %esp
10:	psrldq	$8, %xmm1
11:	paddd	%xmm1, %xmm0
12:	movdqa	%xmm0, %xmm1
13:	psrldq	$4, %xmm1
14:	paddd	%xmm1, %xmm0
15:	movq	%xmm0, 8(%esp)
16:	fildq	8(%esp)
17:	addl	$28, %esp
18:	ret

After the line 07: xmm0 contains four partial sums.
After the line 14: lower 4 bytes of xmm0 contain the total sum, the rest 12 bytes contain garbage.
Lines 15 and 16: *eight* bytes from xmm0 are stored in memory and then loaded into an FPU register.

According to the message from Robert Kausch ( http://lists.xiph.org/pipermail/flac-dev/2014-June/004723.html ) this bug exists since GCC 4.4.
Comment 1 lvqcl.mail 2014-06-05 15:54:16 UTC
Created attachment 32897 [details]
test program
Comment 2 Richard Biener 2014-06-06 08:07:37 UTC
Confirmed.  Works with -mfpmath=sse.  Initial RTL looks ok to me:

(insn 21 20 22 (set (reg:SI 99 [ stmp_sum_5.8 ])
        (vec_select:SI (reg:V4SI 98 [ vect_sum_5.9 ])
            (parallel [
                    (const_int 0 [0])
                ]))) -1
     (nil))

(insn 22 21 23 (parallel [
            (set (reg:DF 97 [ t ])
                (unsigned_float:DF (reg:SI 99 [ stmp_sum_5.8 ])))
            (clobber (mem/c:DI (plus:SI (reg/f:SI 78 virtual-stack-vars)
                        (const_int -8 [0xfffffffffffffff8])) [0  S8 A64]))
            (clobber (scratch:SI))
        ]) t.c:18 -1
     (nil))

(insn 23 22 24 (set (reg:DF 94 [ <retval> ])
        (reg:DF 97 [ t ])) t.c:19 -1
     (nil))

So it must be a bogus *floatunssidf2_1 pattern

(insn 22 21 28 4 (parallel [
            (set (reg:DF 8 st [orig:97 t ] [97])
                (unsigned_float:DF (reg:SI 21 xmm0 [orig:99 stmp_sum_5.8 ] [99])))
            (clobber (mem/c:DI (plus:SI (reg/f:SI 7 sp)
                        (const_int 8 [0x8])) [0  S8 A64]))
            (clobber (scratch:SI))
        ]) t.c:18 211 {*floatunssidf2_1}
     (nil))

split to

(insn 39 38 40 4 (set (mem/c:DI (plus:SI (reg/f:SI 7 sp)
                (const_int 8 [0x8])) [0  S8 A64])
        (reg:DI 21 xmm0 [orig:99 stmp_sum_5.8 ] [99])) t.c:18 89 {*movdi_internal}   
     (nil))
(insn 40 39 28 4 (set (reg:DF 8 st [orig:97 t ] [97])
        (float:DF (mem/c:DI (plus:SI (reg/f:SI 7 sp)
                    (const_int 8 [0x8])) [0  S8 A64]))) t.c:18 206 {*floatdidf2_i387}
     (nil))

note the use of a DImode memory but the missing zero-extend of xmm0:DI.
Comment 3 Uroš Bizjak 2014-06-06 10:28:04 UTC
Looking into it.
Comment 4 Uroš Bizjak 2014-06-06 15:30:40 UTC
Created attachment 32901 [details]
Proposed patch

This patch fies invalid code gen, but unfortunately uncovers a problem in REE pass and fails bootstrap when configured "--with-arch=core-avx-i --with-cpu=core-avx-i".
Comment 5 Uroš Bizjak 2014-06-06 15:37:49 UTC
Created attachment 32903 [details]
Testcase that fails in REE pass

This testcase fails with patched gcc in REE pass:

cc1 -O2 -m32 -march=corei7 libgcc2.i

/home/uros/gcc-svn/trunk/libgcc/libgcc2.c: In function ‘__fixunssfdi’:
/home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1492:1: error: insn does not satisfy its constraints:
 }
 ^
(insn 54 11 47 2 (set (reg:DI 0 ax)
        (reg:DI 21 xmm0)) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1444 89 {*movdi_internal}
     (expr_list:REG_UNUSED (reg:DI 0 ax)
        (nil)))
/home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1492:1: internal compiler error: in copyprop_hardreg_forward_1, at regcprop.c:776
...

_split2 pass produces correct sequence:

(insn 46 11 47 2 (set (reg:DI 21 xmm0 [118])
        (zero_extend:DI (reg/v:SI 0 ax [orig:85 hi ] [85]))) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1452 133 {*zero_extendsidi2}
     (nil))
(insn 47 46 48 2 (set (mem/c:DI (reg/f:SI 7 sp) [0  S8 A64])
        (reg:DI 21 xmm0 [118])) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1452 89 {*movdi_internal}
     (nil))
(insn 48 47 13 2 (set (reg:DF 9 st(1) [orig:101 D.6895 ] [101])
        (float:DF (mem/c:DI (reg/f:SI 7 sp) [0  S8 A64]))) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1452 206 {*floatdidf2_i387}
     (nil))

and _ree pass converts this to:

(insn 11 45 54 2 (set (reg:DI 21 xmm0)
        (zero_extend:DI (reg:SI 2 cx [99]))) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1444 133 {*zero_extendsidi2}
     (nil))
(insn 54 11 47 2 (set (reg:DI 0 ax)
        (reg:DI 21 xmm0)) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1444 -1
     (nil))
(insn 47 54 48 2 (set (mem/c:DI (reg/f:SI 7 sp) [0  S8 A64])
        (reg:DI 21 xmm0 [118])) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1452 89 {*movdi_internal}
     (nil))
(insn 48 47 13 2 (set (reg:DF 9 st(1) [orig:101 D.6895 ] [101])
        (float:DF (mem/c:DI (reg/f:SI 7 sp) [0  S8 A64]))) /home/uros/gcc-svn/trunk/libgcc/libgcc2.c:1452 206 {*floatdidf2_i387}
     (nil))


Please note ivalid (insn 54).
Comment 6 Uroš Bizjak 2014-06-06 15:42:46 UTC
Jeff, can you please look at what goes wrong in REE pass?
Comment 7 uros 2014-06-06 17:45:42 UTC
Author: uros
Date: Fri Jun  6 17:45:10 2014
New Revision: 211321

URL: http://gcc.gnu.org/viewcvs?rev=211321&root=gcc&view=rev
Log:
	PR target/61423
	* config/i386/i386.md (*floatunssi<mode>2_i387_with_xmm): New
	define_insn_and_split pattern, merged from *floatunssi<mode>2_1
	and corresponding splitters.  Zero extend general register
	or memory input operand to XMM temporary.  Enable for
	TARGET_SSE2 and TARGET_INTER_UNIT_MOVES_TO_VEC only.
	(floatunssi<mode>2): Update expander predicate.

testsuite/ChangeLog:

	PR target/61423
	* gcc.target/i386/pr61423.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr61423.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog
Comment 8 Uroš Bizjak 2014-06-08 08:21:36 UTC
(In reply to Uroš Bizjak from comment #5)
> Created attachment 32903 [details]
> Testcase that fails in REE pass
> 
> This testcase fails with patched gcc in REE pass:

This is now PR61446.
Comment 9 Jakub Jelinek 2014-06-11 16:17:34 UTC
Shouldn't this patch be reverted until REE is fixed?
Comment 10 Jeffrey A. Law 2014-06-11 21:27:09 UTC
Either way will cause a failure.  This is #1 on my hit list after the regex problem that's affecting mysql.
Comment 11 Jeffrey A. Law 2014-06-12 21:20:35 UTC
I expect the REE bug to be fixed tomorrow AM (my time).  It's a trivial issue.
Comment 12 uros 2014-06-17 05:01:24 UTC
Author: uros
Date: Tue Jun 17 05:00:52 2014
New Revision: 211723

URL: https://gcc.gnu.org/viewcvs?rev=211723&root=gcc&view=rev
Log:
	Backport from mainline
	2014-06-06  Uros Bizjak  <ubizjak@gmail.com>

	PR target/61423
	* config/i386/i386.md (*floatunssi<mode>2_i387_with_xmm): New
	define_insn_and_split pattern, merged from *floatunssi<mode>2_1
	and corresponding splitters.  Zero extend general register
	or memory input operand to XMM temporary.  Enable for
	TARGET_SSE2 and TARGET_INTER_UNIT_MOVES_TO_VEC only.
	(floatunssi<mode>2): Update expander predicate.

testsuite/ChangeLog:

	Backport from mainline
	2014-06-06  Uros Bizjak  <ubizjak@gmail.com>

	PR target/61423
	* gcc.target/i386/pr61423.c: New test.


Added:
    branches/gcc-4_9-branch/gcc/testsuite/gcc.target/i386/pr61423.c
Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/config/i386/i386.md
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
Comment 13 uros 2014-06-18 20:02:08 UTC
Author: uros
Date: Wed Jun 18 20:01:37 2014
New Revision: 211803

URL: https://gcc.gnu.org/viewcvs?rev=211803&root=gcc&view=rev
Log:
	Backport from mainline
	2014-06-06  Uros Bizjak  <ubizjak@gmail.com>

	PR target/61423
	* config/i386/i386.md (*floatunssi<mode>2_i387_with_xmm): New
	define_insn_and_split pattern, merged from *floatunssi<mode>2_1
	and corresponding splitters.  Zero extend general register
	or memory input operand to XMM temporary.  Enable for
	TARGET_SSE2 and TARGET_INTER_UNIT_MOVES_TO_VEC only.
	(floatunssi<mode>2): Update expander predicate.

testsuite/ChangeLog:

	Backport from mainline
	2014-06-13  Ilya Enkovich  <ilya.enkovich@intel.com>

	PR rtl-optimization/61094
	PR rtl-optimization/61446
	* gcc.target/i386/pr61446.c : New.

	Backport from mainline
	2014-06-06  Uros Bizjak  <ubizjak@gmail.com>

	PR target/61423
	* gcc.target/i386/pr61423.c: New test.


Added:
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr61423.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr61446.c
Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/config/i386/i386.md
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
Comment 14 Uroš Bizjak 2014-06-18 20:03:32 UTC
Fixed in all release branches.