Bug 42542 - Vectorizer produces incorrect results on max/min of unsigned intergers
Summary: Vectorizer produces incorrect results on max/min of unsigned intergers
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.3.5
: P3 normal
Target Milestone: 4.3.6
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2009-12-29 16:11 UTC by Debian GCC Maintainers
Modified: 2010-01-17 19:00 UTC (History)
8 users (show)

See Also:
Host:
Target: x86, ia64
Build:
Known to work:
Known to fail: 4.3.4 4.4.2 4.5.0
Last reconfirmed: 2009-12-29 19:41:02


Attachments
A patch (744 bytes, patch)
2009-12-29 23:50 UTC, H.J. Lu
Details | Diff
An updated patch (982 bytes, patch)
2009-12-30 04:49 UTC, H.J. Lu
Details | Diff
ia64 patch (fixes int, not short or char) (501 bytes, patch)
2010-01-11 23:32 UTC, Steve Ellcey
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Debian GCC Maintainers 2009-12-29 16:11:12 UTC
seen at least on x86_64. bug submitter writes:

g++ appears to be (incorrectly) using a signed int comparison when -O3 optimization is enabled. Compiling with -O2 or lower produces the correct output.

#include <numeric>
#include <iostream>

template <typename T>
struct maximum
{
    T operator()(T& a, T& b)
    { 
        return (a < b) ? b : a;
    }
};

int main(void)
{
    typedef unsigned int T;

    T data[13] = {2411691434,  187758716, 2874577865, 1532192406,
                   850395381, 3670100461, 1052104929,  352534891,
                  1000719294, 2219234747, 4264598888, 4166615811,
                  1898580612};

    T init = 0;

    T result = std::accumulate(data, data + 13, init, maximum<T>());

    std::cout << result << std::endl;

    return 0;
}

$ g++ -O2 -Wall bug.cpp && ./a.out 
4264598888
g++ -O3 -Wall bug.cpp && ./a.out 
1898580612
Comment 1 H.J. Lu 2009-12-29 19:41:02 UTC
With -Wall, icc 11.1 complains:

pr42542.cc(10): remark #981: operands are evaluated in unspecified order
          return (a < b) ? b : a;
                    ^
          detected during instantiation of "_Tp std::accumulate(_InputIterator, _InputIterator, _Tp, _BinaryOperation) [with _InputIterator=unsigned int *, _Tp=unsigned int, _BinaryOperation=maximum<unsigned int>]" at line 25

Add "-fno-tree-vectorize" generates the correct result.
Comment 2 H.J. Lu 2009-12-29 19:44:03 UTC
Add "-ftree-vectorize" will cause it to fail.
Comment 3 H.J. Lu 2009-12-29 19:45:26 UTC
It could be a target issue.
Comment 4 H.J. Lu 2009-12-29 19:51:55 UTC
There are no unsigned integer vector compare insns on x86.
Comment 5 H.J. Lu 2009-12-29 20:48:05 UTC
"-ftree-vectorize -msse4" works fine since it generates pmaxud in SSE4.1.
It seems that gcc doesn't properly emulate pmaxud.
Comment 6 H.J. Lu 2009-12-29 20:52:12 UTC
The bug may be in "umaxv4si3" pattern.
Comment 7 H.J. Lu 2009-12-29 21:40:08 UTC
Here is a testcase in C:

[hjl@gnu-6 tmp]$ cat y.c
unsigned int foo[] __attribute__ ((aligned(16))) =
{
  0x80000000, 1, 0xa0000000, 2,
  3, 0xd0000000, 0xf0000000, 0xe0000000
};
unsigned int bar[] __attribute__ ((aligned(16))) =
{
  4, 0xb0000000, 5, 0xc0000000,
  0xd0000000, 6, 7, 8
};

unsigned int val[] =
{
  0x80000000, 0xb0000000, 0xa0000000, 0xc0000000,
  0xd0000000, 0xd0000000, 0xf0000000, 0xe0000000
};

extern void abort ();

void
xxxx ()
{
  int i;

  for (i = 0; i < 8; i++)
    foo[i] = foo[i] < bar [i] ? bar [i] : foo[i];
}

int
main ()
{
  int i;

  xxxx ();
  for (i = 0; i < 8; i++)
    if (val[i] != foo[i])
      abort ();

  return 0;
}
[hjl@gnu-6 tmp]$ gcc /tmp/y.c -O2 -ftree-vectorize
[hjl@gnu-6 tmp]$ ./a.out 
Aborted (core dumped)
[hjl@gnu-6 tmp]$ 
Comment 8 H.J. Lu 2009-12-29 23:50:01 UTC
Created attachment 19420 [details]
A patch

I don't see how we can easily check unsigned underflow in vector
integer subtraction. This patch simply disables it.
Comment 9 H.J. Lu 2009-12-30 04:47:32 UTC
Hi Richard, the code in question is added by

http://gcc.gnu.org/ml/gcc-patches/2005-06/msg02185.html

I don't quite understand how it can properly handle unsigned V4SI
underflow in V4SI vector subtraction.
Comment 10 H.J. Lu 2009-12-30 04:49:20 UTC
Created attachment 19422 [details]
An updated patch

I am testing this patch now.
Comment 11 H.J. Lu 2009-12-30 15:50:16 UTC
A patch is posted at

http://gcc.gnu.org/ml/gcc-patches/2009-12/msg01208.html
Comment 12 hjl@gcc.gnu.org 2010-01-04 15:14:51 UTC
Subject: Bug 42542

Author: hjl
Date: Mon Jan  4 15:14:31 2010
New Revision: 155618

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155618
Log:
Don't convert GTU to GT for V4SI and V2DI

gcc/

2010-01-04  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/42542
	* config/i386/i386.c (ix86_expand_int_vcond): Don't convert
	GTU to GT for V4SI and V2DI.

	* config/i386/sse.md (umaxv4si3): Enabled for SSE4.1 and XOP.
	(umin<mode>3): Removed.
	(uminv8hi3): New.
	(uminv4si3): Likewise.

gcc/testsuite/

2010-01-04  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/42542
	* gcc.target/i386/pr42542-1.c: New.
	* gcc.target/i386/pr42542-1a.c: Likewise.
	* gcc.target/i386/pr42542-1b.c: Likewise.
	* gcc.target/i386/pr42542-2.c: Likewise.
	* gcc.target/i386/pr42542-2a.c: Likewise.
	* gcc.target/i386/pr42542-2b.c: Likewise.
	* gcc.target/i386/pr42542-3.c: Likewise.
	* gcc.target/i386/pr42542-3a.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr42542-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr42542-1a.c
    trunk/gcc/testsuite/gcc.target/i386/pr42542-1b.c
    trunk/gcc/testsuite/gcc.target/i386/pr42542-2.c
    trunk/gcc/testsuite/gcc.target/i386/pr42542-2a.c
    trunk/gcc/testsuite/gcc.target/i386/pr42542-2b.c
    trunk/gcc/testsuite/gcc.target/i386/pr42542-3.c
    trunk/gcc/testsuite/gcc.target/i386/pr42542-3a.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog

Comment 13 hjl@gcc.gnu.org 2010-01-05 20:44:28 UTC
Subject: Bug 42542

Author: hjl
Date: Tue Jan  5 20:44:14 2010
New Revision: 155660

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155660
Log:
Properly convert GTU to GT for V4SI and V2DI

gcc/

2010-01-05  Paolo Bonzini  <bonzinI@gnu.rg>
	    H.J. Lu  <hongjiu.lu@intel.com>

	PR target/42542
	* config/i386/i386.c (ix86_expand_int_vcond): Convert GTU to GT
	for V4SI and V2DI by subtracting (-(INT MAX) - 1) from both
	operands to make them signed.

	* config/i386/sse.md (umaxv4si3): Revert the last change.
	(umin<mode>3): Likewise.
	(uminv8hi3): Removed.
	(uminv4si3): Likewise.

gcc/testsuite/

2010-01-05  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/pr42542-1.c (res): Make it 8 elements.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/pr42542-1.c

Comment 14 hjl@gcc.gnu.org 2010-01-05 23:53:44 UTC
Subject: Bug 42542

Author: hjl
Date: Tue Jan  5 23:53:29 2010
New Revision: 155666

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155666
Log:
Add smaxv2di3, umaxv2di3, sminv2di3 and uminv2di3

gcc/

2010-01-05  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/42542
	* config/i386/sse.md (smaxv2di3): New.
	(umaxv2di3): Likewise.
	(sminv2di3): Likewise.
	(uminv2di3): Likewise.

gcc/testsuite/

2010-01-05  H.J. Lu  <hongjiu.lu@intel.com>
 
	PR target/42542
	* gcc.target/i386/pr42542-4.c: New.
	* gcc.target/i386/pr42542-4a.c: Likewise.
	* gcc.target/i386/pr42542-5.c: Likewise.
	* gcc.target/i386/pr42542-5a.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr42542-4.c
    trunk/gcc/testsuite/gcc.target/i386/pr42542-4a.c
    trunk/gcc/testsuite/gcc.target/i386/pr42542-5.c
    trunk/gcc/testsuite/gcc.target/i386/pr42542-5a.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog

Comment 15 Ozkan Sezer 2010-01-06 08:55:17 UTC
Can we expect a 4.4 backport for this, at least in the ix86/4.4 branch if not in the main 4.4 branch?
Comment 16 H.J. Lu 2010-01-06 14:59:19 UTC
(In reply to comment #15)
> Can we expect a 4.4 backport for this, at least in the ix86/4.4 branch if not
> in the main 4.4 branch?
> 

It will be backported to 4.3/4.4 in a few days.
Comment 17 hjl@gcc.gnu.org 2010-01-07 19:55:55 UTC
Subject: Bug 42542

Author: hjl
Date: Thu Jan  7 19:55:44 2010
New Revision: 155707

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155707
Log:
Properly convert GTU to GT for V4SI and V2DI

gcc/

2010-01-07  H.J. Lu  <hongjiu.lu@intel.com>

	Backport from mainline
	2010-01-05  Paolo Bonzini  <bonzinI@gnu.rg>
		    H.J. Lu  <hongjiu.lu@intel.com>

	PR target/42542
	* config/i386/i386.c (ix86_expand_int_vcond): Convert GTU to GT
	for V4SI and V2DI by subtracting (-(INT MAX) - 1) from both
	operands to make them signed.

gcc/testsuite/

2010-01-07  H.J. Lu  <hongjiu.lu@intel.com>

	Backport from mainline
	2010-01-05  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/42542
	* gcc.target/i386/pr42542-1.c: New.
	* gcc.target/i386/pr42542-1a.c: Likewise.
	* gcc.target/i386/pr42542-1b.c: Likewise.
	* gcc.target/i386/pr42542-2.c: Likewise.
	* gcc.target/i386/pr42542-2a.c: Likewise.
	* gcc.target/i386/pr42542-2b.c: Likewise.
	* gcc.target/i386/pr42542-3.c: Likewise.
	* gcc.target/i386/pr42542-3a.c: Likewise.

Added:
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-1.c
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-1a.c
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-1b.c
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-2.c
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-2a.c
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-2b.c
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-3.c
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-3a.c
Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/config/i386/i386.c
    branches/gcc-4_4-branch/gcc/testsuite/ChangeLog

Comment 18 hjl@gcc.gnu.org 2010-01-07 19:58:26 UTC
Subject: Bug 42542

Author: hjl
Date: Thu Jan  7 19:58:16 2010
New Revision: 155709

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155709
Log:
Properly convert GTU to GT for V4SI and V2DI

gcc/

2010-01-07  H.J. Lu  <hongjiu.lu@intel.com>

	Backport from mainline
	2010-01-05  Paolo Bonzini  <bonzinI@gnu.rg>
		    H.J. Lu  <hongjiu.lu@intel.com>

	PR target/42542
	* config/i386/i386.c (ix86_expand_int_vcond): Convert GTU to GT
	for V4SI and V2DI by subtracting (-(INT MAX) - 1) from both
	operands to make them signed.

gcc/testsuite/

2010-01-07  H.J. Lu  <hongjiu.lu@intel.com>

	Backport from mainline
	2010-01-05  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/42542
	* gcc.target/i386/pr42542-1.c: New.
	* gcc.target/i386/pr42542-1a.c: Likewise.
	* gcc.target/i386/pr42542-1b.c: Likewise.
	* gcc.target/i386/pr42542-2.c: Likewise.
	* gcc.target/i386/pr42542-2a.c: Likewise.
	* gcc.target/i386/pr42542-2b.c: Likewise.
	* gcc.target/i386/pr42542-3.c: Likewise.
	* gcc.target/i386/pr42542-3a.c: Likewise.

Added:
    branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-1.c
    branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-1a.c
    branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-1b.c
    branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-2.c
    branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-2a.c
    branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-2b.c
    branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-3.c
    branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-3a.c
Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/config/i386/i386.c
    branches/gcc-4_3-branch/gcc/testsuite/ChangeLog

Comment 19 H.J. Lu 2010-01-07 20:00:12 UTC
Fixed.
Comment 20 Uroš Bizjak 2010-01-08 07:48:33 UTC
According to http://gcc.gnu.org/ml/gcc-patches/2010-01/msg00213.html, ia64 should be fixed in the same way as x86.

The wrong code is located in ia64/ia64.c ia64_expand_vecint_compare, around line 1730. Correct code is in i386/i386.c ix86_expand_int_vcond, around line 16250.

So, adding ia64 maintainer to CC and reopening PR as ia64 target bug.
Comment 21 Steve Ellcey 2010-01-11 23:32:32 UTC
Created attachment 19544 [details]
ia64 patch (fixes int, not short or char)
Comment 22 Steve Ellcey 2010-01-11 23:33:00 UTC
I am looking at this on IA64 and fixing it for V2SI seems simple enough.  
I will attach a patch.  But I am not sure what to do for V4HI and V8QI. 
The current code uses an 'unsigned saturating subtraction' and that
seems to be what x86 is using for V16QI and V8HI.  But when I run
pr42542-2.c and pr42542-3.c (short and char) on IA64 they fail.  My
patch does make pr42542-1.c (int) work.  I tried handling V4HI and V8QI
in the same way as V2SI but that didn't seem to work either.
Comment 23 H.J. Lu 2010-01-12 00:41:22 UTC
(In reply to comment #22)
> I am looking at this on IA64 and fixing it for V2SI seems simple enough.  
> I will attach a patch.  But I am not sure what to do for V4HI and V8QI. 
> The current code uses an 'unsigned saturating subtraction' and that
> seems to be what x86 is using for V16QI and V8HI.  But when I run
> pr42542-2.c and pr42542-3.c (short and char) on IA64 they fail.  My
> patch does make pr42542-1.c (int) work.  I tried handling V4HI and V8QI
> in the same way as V2SI but that didn't seem to work either.
> 

What was the problem with V4HI and V8QI? The existing code works
OK on pr42542-2.c and pr42542-3.c for me.
Comment 24 Steve Ellcey 2010-01-12 00:58:07 UTC
Never mind, when I copied (and modified) the x86 tests for ia64 I forgot to put a 'return 0' at the end of the main program so I was getting a non-zero exit.  I will test my patch tonight and if all looks good submit it tomorrow.
Comment 25 hjl@gcc.gnu.org 2010-01-17 18:52:12 UTC
Subject: Bug 42542

Author: hjl
Date: Sun Jan 17 18:51:47 2010
New Revision: 155988

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155988
Log:
Backport ia64 fix for PR target/42542 from mainline.

gcc/

2010-01-17  H.J. Lu  <hongjiu.lu@intel.com>

	Backport from mainline:
	2010-01-13  Steve Ellcey  <sje@cup.hp.com>

	PR target/42542
	* config/ia64/ia64.c (ia64_expand_vecint_compare): Convert GTU to GT
	for V2SI by subtracting (-(INT MAX) - 1) from both operands to make
	them signed.

gcc/testsuite/

2010-01-17  H.J. Lu  <hongjiu.lu@intel.com>

	Backport from mainline:
	2010-01-13  Steve Ellcey  <sje@cup.hp.com>

	PR target/42542
	* gcc.target/ia64/pr42542-1.c: New.
	* gcc.target/ia64/pr42542-2.c: New.
	* gcc.target/ia64/pr42542-3.c: New.

Added:
    branches/ix86/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-1.c
      - copied unchanged from r155987, trunk/gcc/testsuite/gcc.target/ia64/pr42542-1.c
    branches/ix86/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-2.c
      - copied unchanged from r155987, trunk/gcc/testsuite/gcc.target/ia64/pr42542-2.c
    branches/ix86/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-3.c
      - copied unchanged from r155987, trunk/gcc/testsuite/gcc.target/ia64/pr42542-3.c
Modified:
    branches/ix86/gcc-4_4-branch/gcc/ChangeLog
    branches/ix86/gcc-4_4-branch/gcc/config/ia64/ia64.c
    branches/ix86/gcc-4_4-branch/gcc/testsuite/ChangeLog

Comment 26 hjl@gcc.gnu.org 2010-01-17 18:55:13 UTC
Subject: Bug 42542

Author: hjl
Date: Sun Jan 17 18:55:03 2010
New Revision: 155989

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155989
Log:
Backport ia64 fix for PR target/42542 from mainline.

gcc/

2010-01-17  H.J. Lu  <hongjiu.lu@intel.com>

	Backport from mainline:
	2010-01-13  Steve Ellcey  <sje@cup.hp.com>

	PR target/42542
	* config/ia64/ia64.c (ia64_expand_vecint_compare): Convert GTU to GT
	for V2SI by subtracting (-(INT MAX) - 1) from both operands to make
	them signed.

gcc/testsuite/

2010-01-17  H.J. Lu  <hongjiu.lu@intel.com>

	Backport from mainline:
	2010-01-13  Steve Ellcey  <sje@cup.hp.com>

	PR target/42542
	* gcc.target/ia64/pr42542-1.c: New.
	* gcc.target/ia64/pr42542-2.c: New.
	* gcc.target/ia64/pr42542-3.c: New.

Added:
    branches/gcc-4_3-branch/gcc/testsuite/gcc.target/ia64/pr42542-1.c
      - copied unchanged from r155988, trunk/gcc/testsuite/gcc.target/ia64/pr42542-1.c
    branches/gcc-4_3-branch/gcc/testsuite/gcc.target/ia64/pr42542-2.c
      - copied unchanged from r155988, trunk/gcc/testsuite/gcc.target/ia64/pr42542-2.c
    branches/gcc-4_3-branch/gcc/testsuite/gcc.target/ia64/pr42542-3.c
      - copied unchanged from r155988, trunk/gcc/testsuite/gcc.target/ia64/pr42542-3.c
Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/config/ia64/ia64.c
    branches/gcc-4_3-branch/gcc/testsuite/ChangeLog

Comment 27 hjl@gcc.gnu.org 2010-01-17 18:57:46 UTC
Subject: Bug 42542

Author: hjl
Date: Sun Jan 17 18:57:33 2010
New Revision: 155990

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155990
Log:
Backport ia64 fix for PR target/42542 from mainline.

gcc/

2010-01-17  H.J. Lu  <hongjiu.lu@intel.com>

	Backport from mainline:
	2010-01-13  Steve Ellcey  <sje@cup.hp.com>

	PR target/42542
	* config/ia64/ia64.c (ia64_expand_vecint_compare): Convert GTU to GT
	for V2SI by subtracting (-(INT MAX) - 1) from both operands to make
	them signed.

gcc/testsuite/

2010-01-17  H.J. Lu  <hongjiu.lu@intel.com>

	Backport from mainline:
	2010-01-13  Steve Ellcey  <sje@cup.hp.com>

	PR target/42542
	* gcc.target/ia64/pr42542-1.c: New.
	* gcc.target/ia64/pr42542-2.c: New.
	* gcc.target/ia64/pr42542-3.c: New.

Added:
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-1.c
      - copied unchanged from r155989, trunk/gcc/testsuite/gcc.target/ia64/pr42542-1.c
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-2.c
      - copied unchanged from r155989, trunk/gcc/testsuite/gcc.target/ia64/pr42542-2.c
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-3.c
      - copied unchanged from r155989, trunk/gcc/testsuite/gcc.target/ia64/pr42542-3.c
Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/config/ia64/ia64.c
    branches/gcc-4_4-branch/gcc/testsuite/ChangeLog

Comment 28 H.J. Lu 2010-01-17 19:00:22 UTC
Fixed.