Bug 51534

Summary: Bad code gen for vcgtq_u32 NEON intrinsic
Product: gcc Reporter: Ryan Mansfield <rmansfield>
Component: targetAssignee: Not yet assigned to anyone <unassigned>
Severity: normal    
Priority: P3    
Version: 4.7.0   
Target Milestone: ---   
Host: i686-unknown-linux-gnu Target: arm-unknown-linux-gnueabi
Build: i686-unknown-linux-gnu Known to work:
Known to fail: Last reconfirmed: 2011-12-14 00:00:00

Description Ryan Mansfield 2011-12-13 19:28:12 UTC
$ ./xgcc -v
Using built-in specs.
Target: arm-unknown-linux-gnueabi
Configured with: ../configure --target=arm-unknown-linux-gnueabi --prefix=/home/ryan/x-tools/arm-unknown-linux-gnueabi --with-sysroot=/home/ryan/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi//sys-root --disable-multilib --with-local-prefix=/home/ryan/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi/sys-root --disable-nls --enable-threads=posix --enable-symvers=gnu --enable-c99 --enable-long-long --enable-target-optspace target_alias=arm-unknown-linux-gnueabi --enable-languages=c++ --disable-shared --disable-libmudflap --disable-libssp
Thread model: posix
gcc version 4.7.0 20111213 (experimental) [trunk revision 182291] (GCC) 

$ cat ~/foo.c
#include <arm_neon.h>

void foo (unsigned * src, unsigned *dst, int width)
  const int32x4_t vec_alpha_shift = vdupq_n_s32 (0);
  const uint32x4_t vec_one = vdupq_n_u32 (1u);
  const uint32x4_t vec_zero = vdupq_n_u32 (0u);

while (width >= 4)
      uint32x4_t s0 = vld1q_u32 (src);
      uint32x4_t d0 = vld1q_u32 (dst);
      uint32x4_t vec_alpha = vshlq_u32 (s0, vec_alpha_shift);
      vec_alpha =
	vaddq_u32 (vec_alpha,
		   vandq_u32 (vcgtq_u32 (vec_alpha, vec_zero), vec_one));
      s0 = vmulq_u32 (s0, vec_alpha);
      d0 = vaddq_u32 (s0, d0);
      vst1q_u32 (dst, d0);

$ ./xgcc -B. -O3 -ftree-vectorize -mfpu=neon -mfloat-abi=softfp ~/foo.c  -march=armv7-a -c

Changing the code from:

const uint32x4_t vec_zero = vdupq_n_u32 (0u)


const uint32x4_t vec_zero = vdupq_n_u32 (1u)

results in a proper reg load and operand to vcgt. 

 	vmov.i32	q9, #0  @ v4si
 	vld1.32	{d16-d17}, [r8]
+	vmov.i32	q12, #1  @ v4si
 	mov	r0, sl
 	vld1.32	{d20-d21}, [sl]
 	vshl.u32	q9, q8, q9
-	vcgt.u32	q11, q9, #0
+	vcgt.u32	q11, q9, q12
 	vand	q11, q11, q4
 	vadd.i32	q9, q9, q11
 	vmul.i32	q8, q8, q9

Also happens on the 4.6 branch. Compiles OK with 4.4 branch. I haven't checked 4.5 yet.
Comment 1 Ryan Mansfield 2011-12-13 19:47:28 UTC
I truncated the actual error emitted by the assembler.

$ ./xgcc -B. -O3 -ftree-vectorize -mfpu=neon -mfloat-abi=softfp ~/foo.c  -march=armv7-a -c
/tmp/ccCPCd9Z.s: Assembler messages:
/tmp/ccCPCd9Z.s:30: Error: bad type in Neon instruction -- `vcgt.u32 q11,q9,#0'
Comment 2 Richard Earnshaw 2011-12-14 14:36:51 UTC

The VCGT ..., #0 instruction only operates on signed types.  Also applies to VCLE.

Alternatives for unsigned are:
- load 0 into a register
- convert the comparison to VCEQ then, for GT only, invert the result.
Comment 3 mgretton 2012-02-28 16:14:03 UTC
Author: mgretton
Date: Tue Feb 28 16:13:52 2012
New Revision: 184629

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184629
	PR target/51534
	* gcc/config/arm/arm.c (neon_builtin_data): Add entries for vcgeu
	and vcgtu.
	* gcc/config/arm/arm_neon.h: Regenerate.
	* gcc/config/arm/neon.md (unspec): Add UNSPEC_VCGEU, and UNSPEC_VCGTU.
	(neon_vcgeu): New insn.
	(neon_vcgtu): Likewise.
	* gcc/config/arm/neon.ml (s_8_32, u_8_32): New lists.
	(ops): Unsigned comparison intrinsics call a different
	* gcc/testsuite/gcc.target/arm/neon/pr51534.c: New testcase.

Comment 4 mgretton 2012-02-28 16:17:44 UTC
Author: mgretton
Date: Tue Feb 28 16:17:36 2012
New Revision: 184630

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184630
	PR target/51534
	Add testcase forgotten in last commit, ChangeLog entry already present.

Comment 5 Ryan Mansfield 2014-08-10 16:46:07 UTC
Fixed awhile ago.