seen at least on x86_64. bug submitter writes: g++ appears to be (incorrectly) using a signed int comparison when -O3 optimization is enabled. Compiling with -O2 or lower produces the correct output. #include <numeric> #include <iostream> template <typename T> struct maximum { T operator()(T& a, T& b) { return (a < b) ? b : a; } }; int main(void) { typedef unsigned int T; T data[13] = {2411691434, 187758716, 2874577865, 1532192406, 850395381, 3670100461, 1052104929, 352534891, 1000719294, 2219234747, 4264598888, 4166615811, 1898580612}; T init = 0; T result = std::accumulate(data, data + 13, init, maximum<T>()); std::cout << result << std::endl; return 0; } $ g++ -O2 -Wall bug.cpp && ./a.out 4264598888 g++ -O3 -Wall bug.cpp && ./a.out 1898580612
With -Wall, icc 11.1 complains: pr42542.cc(10): remark #981: operands are evaluated in unspecified order return (a < b) ? b : a; ^ detected during instantiation of "_Tp std::accumulate(_InputIterator, _InputIterator, _Tp, _BinaryOperation) [with _InputIterator=unsigned int *, _Tp=unsigned int, _BinaryOperation=maximum<unsigned int>]" at line 25 Add "-fno-tree-vectorize" generates the correct result.
Add "-ftree-vectorize" will cause it to fail.
It could be a target issue.
There are no unsigned integer vector compare insns on x86.
"-ftree-vectorize -msse4" works fine since it generates pmaxud in SSE4.1. It seems that gcc doesn't properly emulate pmaxud.
The bug may be in "umaxv4si3" pattern.
Here is a testcase in C: [hjl@gnu-6 tmp]$ cat y.c unsigned int foo[] __attribute__ ((aligned(16))) = { 0x80000000, 1, 0xa0000000, 2, 3, 0xd0000000, 0xf0000000, 0xe0000000 }; unsigned int bar[] __attribute__ ((aligned(16))) = { 4, 0xb0000000, 5, 0xc0000000, 0xd0000000, 6, 7, 8 }; unsigned int val[] = { 0x80000000, 0xb0000000, 0xa0000000, 0xc0000000, 0xd0000000, 0xd0000000, 0xf0000000, 0xe0000000 }; extern void abort (); void xxxx () { int i; for (i = 0; i < 8; i++) foo[i] = foo[i] < bar [i] ? bar [i] : foo[i]; } int main () { int i; xxxx (); for (i = 0; i < 8; i++) if (val[i] != foo[i]) abort (); return 0; } [hjl@gnu-6 tmp]$ gcc /tmp/y.c -O2 -ftree-vectorize [hjl@gnu-6 tmp]$ ./a.out Aborted (core dumped) [hjl@gnu-6 tmp]$
Created attachment 19420 [details] A patch I don't see how we can easily check unsigned underflow in vector integer subtraction. This patch simply disables it.
Hi Richard, the code in question is added by http://gcc.gnu.org/ml/gcc-patches/2005-06/msg02185.html I don't quite understand how it can properly handle unsigned V4SI underflow in V4SI vector subtraction.
Created attachment 19422 [details] An updated patch I am testing this patch now.
A patch is posted at http://gcc.gnu.org/ml/gcc-patches/2009-12/msg01208.html
Subject: Bug 42542 Author: hjl Date: Mon Jan 4 15:14:31 2010 New Revision: 155618 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155618 Log: Don't convert GTU to GT for V4SI and V2DI gcc/ 2010-01-04 H.J. Lu <hongjiu.lu@intel.com> PR target/42542 * config/i386/i386.c (ix86_expand_int_vcond): Don't convert GTU to GT for V4SI and V2DI. * config/i386/sse.md (umaxv4si3): Enabled for SSE4.1 and XOP. (umin<mode>3): Removed. (uminv8hi3): New. (uminv4si3): Likewise. gcc/testsuite/ 2010-01-04 H.J. Lu <hongjiu.lu@intel.com> PR target/42542 * gcc.target/i386/pr42542-1.c: New. * gcc.target/i386/pr42542-1a.c: Likewise. * gcc.target/i386/pr42542-1b.c: Likewise. * gcc.target/i386/pr42542-2.c: Likewise. * gcc.target/i386/pr42542-2a.c: Likewise. * gcc.target/i386/pr42542-2b.c: Likewise. * gcc.target/i386/pr42542-3.c: Likewise. * gcc.target/i386/pr42542-3a.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/pr42542-1.c trunk/gcc/testsuite/gcc.target/i386/pr42542-1a.c trunk/gcc/testsuite/gcc.target/i386/pr42542-1b.c trunk/gcc/testsuite/gcc.target/i386/pr42542-2.c trunk/gcc/testsuite/gcc.target/i386/pr42542-2a.c trunk/gcc/testsuite/gcc.target/i386/pr42542-2b.c trunk/gcc/testsuite/gcc.target/i386/pr42542-3.c trunk/gcc/testsuite/gcc.target/i386/pr42542-3a.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
Subject: Bug 42542 Author: hjl Date: Tue Jan 5 20:44:14 2010 New Revision: 155660 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155660 Log: Properly convert GTU to GT for V4SI and V2DI gcc/ 2010-01-05 Paolo Bonzini <bonzinI@gnu.rg> H.J. Lu <hongjiu.lu@intel.com> PR target/42542 * config/i386/i386.c (ix86_expand_int_vcond): Convert GTU to GT for V4SI and V2DI by subtracting (-(INT MAX) - 1) from both operands to make them signed. * config/i386/sse.md (umaxv4si3): Revert the last change. (umin<mode>3): Likewise. (uminv8hi3): Removed. (uminv4si3): Likewise. gcc/testsuite/ 2010-01-05 H.J. Lu <hongjiu.lu@intel.com> * gcc.target/i386/pr42542-1.c (res): Make it 8 elements. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/pr42542-1.c
Subject: Bug 42542 Author: hjl Date: Tue Jan 5 23:53:29 2010 New Revision: 155666 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155666 Log: Add smaxv2di3, umaxv2di3, sminv2di3 and uminv2di3 gcc/ 2010-01-05 H.J. Lu <hongjiu.lu@intel.com> PR target/42542 * config/i386/sse.md (smaxv2di3): New. (umaxv2di3): Likewise. (sminv2di3): Likewise. (uminv2di3): Likewise. gcc/testsuite/ 2010-01-05 H.J. Lu <hongjiu.lu@intel.com> PR target/42542 * gcc.target/i386/pr42542-4.c: New. * gcc.target/i386/pr42542-4a.c: Likewise. * gcc.target/i386/pr42542-5.c: Likewise. * gcc.target/i386/pr42542-5a.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/pr42542-4.c trunk/gcc/testsuite/gcc.target/i386/pr42542-4a.c trunk/gcc/testsuite/gcc.target/i386/pr42542-5.c trunk/gcc/testsuite/gcc.target/i386/pr42542-5a.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
Can we expect a 4.4 backport for this, at least in the ix86/4.4 branch if not in the main 4.4 branch?
(In reply to comment #15) > Can we expect a 4.4 backport for this, at least in the ix86/4.4 branch if not > in the main 4.4 branch? > It will be backported to 4.3/4.4 in a few days.
Subject: Bug 42542 Author: hjl Date: Thu Jan 7 19:55:44 2010 New Revision: 155707 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155707 Log: Properly convert GTU to GT for V4SI and V2DI gcc/ 2010-01-07 H.J. Lu <hongjiu.lu@intel.com> Backport from mainline 2010-01-05 Paolo Bonzini <bonzinI@gnu.rg> H.J. Lu <hongjiu.lu@intel.com> PR target/42542 * config/i386/i386.c (ix86_expand_int_vcond): Convert GTU to GT for V4SI and V2DI by subtracting (-(INT MAX) - 1) from both operands to make them signed. gcc/testsuite/ 2010-01-07 H.J. Lu <hongjiu.lu@intel.com> Backport from mainline 2010-01-05 H.J. Lu <hongjiu.lu@intel.com> PR target/42542 * gcc.target/i386/pr42542-1.c: New. * gcc.target/i386/pr42542-1a.c: Likewise. * gcc.target/i386/pr42542-1b.c: Likewise. * gcc.target/i386/pr42542-2.c: Likewise. * gcc.target/i386/pr42542-2a.c: Likewise. * gcc.target/i386/pr42542-2b.c: Likewise. * gcc.target/i386/pr42542-3.c: Likewise. * gcc.target/i386/pr42542-3a.c: Likewise. Added: branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-1.c branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-1a.c branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-1b.c branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-2.c branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-2a.c branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-2b.c branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-3.c branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr42542-3a.c Modified: branches/gcc-4_4-branch/gcc/ChangeLog branches/gcc-4_4-branch/gcc/config/i386/i386.c branches/gcc-4_4-branch/gcc/testsuite/ChangeLog
Subject: Bug 42542 Author: hjl Date: Thu Jan 7 19:58:16 2010 New Revision: 155709 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155709 Log: Properly convert GTU to GT for V4SI and V2DI gcc/ 2010-01-07 H.J. Lu <hongjiu.lu@intel.com> Backport from mainline 2010-01-05 Paolo Bonzini <bonzinI@gnu.rg> H.J. Lu <hongjiu.lu@intel.com> PR target/42542 * config/i386/i386.c (ix86_expand_int_vcond): Convert GTU to GT for V4SI and V2DI by subtracting (-(INT MAX) - 1) from both operands to make them signed. gcc/testsuite/ 2010-01-07 H.J. Lu <hongjiu.lu@intel.com> Backport from mainline 2010-01-05 H.J. Lu <hongjiu.lu@intel.com> PR target/42542 * gcc.target/i386/pr42542-1.c: New. * gcc.target/i386/pr42542-1a.c: Likewise. * gcc.target/i386/pr42542-1b.c: Likewise. * gcc.target/i386/pr42542-2.c: Likewise. * gcc.target/i386/pr42542-2a.c: Likewise. * gcc.target/i386/pr42542-2b.c: Likewise. * gcc.target/i386/pr42542-3.c: Likewise. * gcc.target/i386/pr42542-3a.c: Likewise. Added: branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-1.c branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-1a.c branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-1b.c branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-2.c branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-2a.c branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-2b.c branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-3.c branches/gcc-4_3-branch/gcc/testsuite/gcc.target/i386/pr42542-3a.c Modified: branches/gcc-4_3-branch/gcc/ChangeLog branches/gcc-4_3-branch/gcc/config/i386/i386.c branches/gcc-4_3-branch/gcc/testsuite/ChangeLog
Fixed.
According to http://gcc.gnu.org/ml/gcc-patches/2010-01/msg00213.html, ia64 should be fixed in the same way as x86. The wrong code is located in ia64/ia64.c ia64_expand_vecint_compare, around line 1730. Correct code is in i386/i386.c ix86_expand_int_vcond, around line 16250. So, adding ia64 maintainer to CC and reopening PR as ia64 target bug.
Created attachment 19544 [details] ia64 patch (fixes int, not short or char)
I am looking at this on IA64 and fixing it for V2SI seems simple enough. I will attach a patch. But I am not sure what to do for V4HI and V8QI. The current code uses an 'unsigned saturating subtraction' and that seems to be what x86 is using for V16QI and V8HI. But when I run pr42542-2.c and pr42542-3.c (short and char) on IA64 they fail. My patch does make pr42542-1.c (int) work. I tried handling V4HI and V8QI in the same way as V2SI but that didn't seem to work either.
(In reply to comment #22) > I am looking at this on IA64 and fixing it for V2SI seems simple enough. > I will attach a patch. But I am not sure what to do for V4HI and V8QI. > The current code uses an 'unsigned saturating subtraction' and that > seems to be what x86 is using for V16QI and V8HI. But when I run > pr42542-2.c and pr42542-3.c (short and char) on IA64 they fail. My > patch does make pr42542-1.c (int) work. I tried handling V4HI and V8QI > in the same way as V2SI but that didn't seem to work either. > What was the problem with V4HI and V8QI? The existing code works OK on pr42542-2.c and pr42542-3.c for me.
Never mind, when I copied (and modified) the x86 tests for ia64 I forgot to put a 'return 0' at the end of the main program so I was getting a non-zero exit. I will test my patch tonight and if all looks good submit it tomorrow.
Subject: Bug 42542 Author: hjl Date: Sun Jan 17 18:51:47 2010 New Revision: 155988 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155988 Log: Backport ia64 fix for PR target/42542 from mainline. gcc/ 2010-01-17 H.J. Lu <hongjiu.lu@intel.com> Backport from mainline: 2010-01-13 Steve Ellcey <sje@cup.hp.com> PR target/42542 * config/ia64/ia64.c (ia64_expand_vecint_compare): Convert GTU to GT for V2SI by subtracting (-(INT MAX) - 1) from both operands to make them signed. gcc/testsuite/ 2010-01-17 H.J. Lu <hongjiu.lu@intel.com> Backport from mainline: 2010-01-13 Steve Ellcey <sje@cup.hp.com> PR target/42542 * gcc.target/ia64/pr42542-1.c: New. * gcc.target/ia64/pr42542-2.c: New. * gcc.target/ia64/pr42542-3.c: New. Added: branches/ix86/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-1.c - copied unchanged from r155987, trunk/gcc/testsuite/gcc.target/ia64/pr42542-1.c branches/ix86/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-2.c - copied unchanged from r155987, trunk/gcc/testsuite/gcc.target/ia64/pr42542-2.c branches/ix86/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-3.c - copied unchanged from r155987, trunk/gcc/testsuite/gcc.target/ia64/pr42542-3.c Modified: branches/ix86/gcc-4_4-branch/gcc/ChangeLog branches/ix86/gcc-4_4-branch/gcc/config/ia64/ia64.c branches/ix86/gcc-4_4-branch/gcc/testsuite/ChangeLog
Subject: Bug 42542 Author: hjl Date: Sun Jan 17 18:55:03 2010 New Revision: 155989 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155989 Log: Backport ia64 fix for PR target/42542 from mainline. gcc/ 2010-01-17 H.J. Lu <hongjiu.lu@intel.com> Backport from mainline: 2010-01-13 Steve Ellcey <sje@cup.hp.com> PR target/42542 * config/ia64/ia64.c (ia64_expand_vecint_compare): Convert GTU to GT for V2SI by subtracting (-(INT MAX) - 1) from both operands to make them signed. gcc/testsuite/ 2010-01-17 H.J. Lu <hongjiu.lu@intel.com> Backport from mainline: 2010-01-13 Steve Ellcey <sje@cup.hp.com> PR target/42542 * gcc.target/ia64/pr42542-1.c: New. * gcc.target/ia64/pr42542-2.c: New. * gcc.target/ia64/pr42542-3.c: New. Added: branches/gcc-4_3-branch/gcc/testsuite/gcc.target/ia64/pr42542-1.c - copied unchanged from r155988, trunk/gcc/testsuite/gcc.target/ia64/pr42542-1.c branches/gcc-4_3-branch/gcc/testsuite/gcc.target/ia64/pr42542-2.c - copied unchanged from r155988, trunk/gcc/testsuite/gcc.target/ia64/pr42542-2.c branches/gcc-4_3-branch/gcc/testsuite/gcc.target/ia64/pr42542-3.c - copied unchanged from r155988, trunk/gcc/testsuite/gcc.target/ia64/pr42542-3.c Modified: branches/gcc-4_3-branch/gcc/ChangeLog branches/gcc-4_3-branch/gcc/config/ia64/ia64.c branches/gcc-4_3-branch/gcc/testsuite/ChangeLog
Subject: Bug 42542 Author: hjl Date: Sun Jan 17 18:57:33 2010 New Revision: 155990 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155990 Log: Backport ia64 fix for PR target/42542 from mainline. gcc/ 2010-01-17 H.J. Lu <hongjiu.lu@intel.com> Backport from mainline: 2010-01-13 Steve Ellcey <sje@cup.hp.com> PR target/42542 * config/ia64/ia64.c (ia64_expand_vecint_compare): Convert GTU to GT for V2SI by subtracting (-(INT MAX) - 1) from both operands to make them signed. gcc/testsuite/ 2010-01-17 H.J. Lu <hongjiu.lu@intel.com> Backport from mainline: 2010-01-13 Steve Ellcey <sje@cup.hp.com> PR target/42542 * gcc.target/ia64/pr42542-1.c: New. * gcc.target/ia64/pr42542-2.c: New. * gcc.target/ia64/pr42542-3.c: New. Added: branches/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-1.c - copied unchanged from r155989, trunk/gcc/testsuite/gcc.target/ia64/pr42542-1.c branches/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-2.c - copied unchanged from r155989, trunk/gcc/testsuite/gcc.target/ia64/pr42542-2.c branches/gcc-4_4-branch/gcc/testsuite/gcc.target/ia64/pr42542-3.c - copied unchanged from r155989, trunk/gcc/testsuite/gcc.target/ia64/pr42542-3.c Modified: branches/gcc-4_4-branch/gcc/ChangeLog branches/gcc-4_4-branch/gcc/config/ia64/ia64.c branches/gcc-4_4-branch/gcc/testsuite/ChangeLog