Bug 46419

Summary: [4.4, 4.5, 4.6 Regression] _mm_cvtpu16_ps (and hence _mm_cvtpu8_ps) returns false result
Product: gcc Reporter: release_candidate
Component: targetAssignee: Uroš Bizjak <ubizjak>
Status: RESOLVED FIXED    
Severity: normal CC: cck0011, release_candidate
Priority: P3 Keywords: ssemmx, wrong-code
Version: 4.4.5   
Target Milestone: 4.4.6   
URL: http://gcc.gnu.org/ml/gcc-patches/2010-11/msg01090.html
Host: Target: x86
Build: Known to work: 4.3.5
Known to fail: 4.4.5, 4.5.1 Last reconfirmed: 2010-11-10 22:14:13
Attachments: example code

Description release_candidate 2010-11-10 20:39:03 UTC
Created attachment 22367 [details]
example code

Dear GCC developers,

I guess the patch set <http://gcc.gnu.org/viewcvs?view=revision&revision=134558> broke the _mm_cvtpu16_ps() and _mm_cvtpu8_ps() intrinsics.

For demonstration, please refer to the attached example. It is intended to convert four chars (1,2,3,4) into a SSE float vector type (__m128) by using the Intel intrinsics _mm_cvtpu8_ps() and _mm_setr_pi8().

The output of the program compiled with gcc-4.3 is:
    image: 1 2 3 4
    out4:  1 2 3 4
This result is correct, and complies with Intel's intrinsic docs 
<http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_mmx_set.htm>
// <http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse_conversion.htm>, as well as the output of icc compilation.

The output of gcc-4.4 and gcc-4.5 compilation is:
    image: 1 2 3 4
    out4:  3 4 1 2


I was able to trace this back the change set referred above. If I include the old xmmintrin.h instead of the new header when using gcc-4.4, the result is correct again. I didn't study the changes of rev. 134558  in detail, and I do not know if the new algorithm is theoretically correct at all.


Could you please fix this bug?
I don't know about the other intrinsics touched by that patch.
Within this context, concerning the bug <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37496> might also be worth while.


Thanks,

Dirk
Comment 1 Uroš Bizjak 2010-11-10 22:14:13 UTC
Ugh...

Patch in testing:

Index: xmmintrin.h
===================================================================
--- xmmintrin.h	(revision 166558)
+++ xmmintrin.h	(working copy)
@@ -626,13 +626,13 @@
   __sign = __builtin_ia32_pcmpgtw ((__v4hi)0LL, (__v4hi)__A);
 
   /* Convert the four words to doublewords.  */
+  __losi = (__v2si) __builtin_ia32_punpcklwd ((__v4hi)__A, __sign);
   __hisi = (__v2si) __builtin_ia32_punpckhwd ((__v4hi)__A, __sign);
-  __losi = (__v2si) __builtin_ia32_punpcklwd ((__v4hi)__A, __sign);
 
   /* Convert the doublewords to floating point two at a time.  */
   __zero = (__v4sf) _mm_setzero_ps ();
-  __ra = __builtin_ia32_cvtpi2ps (__zero, __hisi);
-  __rb = __builtin_ia32_cvtpi2ps (__ra, __losi);
+  __ra = __builtin_ia32_cvtpi2ps (__zero, __losi);
+  __rb = __builtin_ia32_cvtpi2ps (__ra, __hisi);
 
   return (__m128) __builtin_ia32_movlhps (__ra, __rb);
 }
@@ -645,13 +645,13 @@
   __v4sf __zero, __ra, __rb;
 
   /* Convert the four words to doublewords.  */
+  __losi = (__v2si) __builtin_ia32_punpcklwd ((__v4hi)__A, (__v4hi)0LL);
   __hisi = (__v2si) __builtin_ia32_punpckhwd ((__v4hi)__A, (__v4hi)0LL);
-  __losi = (__v2si) __builtin_ia32_punpcklwd ((__v4hi)__A, (__v4hi)0LL);
 
   /* Convert the doublewords to floating point two at a time.  */
   __zero = (__v4sf) _mm_setzero_ps ();
-  __ra = __builtin_ia32_cvtpi2ps (__zero, __hisi);
-  __rb = __builtin_ia32_cvtpi2ps (__ra, __losi);
+  __ra = __builtin_ia32_cvtpi2ps (__zero, __losi);
+  __rb = __builtin_ia32_cvtpi2ps (__ra, __hisi);
 
   return (__m128) __builtin_ia32_movlhps (__ra, __rb);
 }
Comment 2 uros 2010-11-10 23:00:08 UTC
Author: uros
Date: Wed Nov 10 23:00:01 2010
New Revision: 166569

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=166569
Log:
	PR middle-end/46419
	* config/i386/xmmintrin.h (_mm_cvtpi16_ps): Swap __hisi and __losi.
	(_mm_cvtpu16_ps): Ditto.

testsuite/ChangeLog:

	PR target/46419
	* gcc-target/i386/pr46419.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr46419.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/xmmintrin.h
    trunk/gcc/testsuite/ChangeLog
Comment 3 uros 2010-11-10 23:26:54 UTC
Author: uros
Date: Wed Nov 10 23:26:49 2010
New Revision: 166571

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=166571
Log:
	PR middle-end/46419
	* config/i386/xmmintrin.h (_mm_cvtpi16_ps): Swap __hisi and __losi.
	(_mm_cvtpu16_ps): Ditto.

testsuite/ChangeLog:

	PR target/46419
	* gcc-target/i386/pr46419.c: New test.


Added:
    branches/gcc-4_5-branch/gcc/testsuite/gcc.target/i386/pr46419.c
      - copied unchanged from r166569, trunk/gcc/testsuite/gcc.target/i386/pr46419.c
Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/config/i386/xmmintrin.h
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
Comment 4 uros 2010-11-10 23:28:10 UTC
Author: uros
Date: Wed Nov 10 23:28:03 2010
New Revision: 166572

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=166572
Log:
	PR middle-end/46419
	* config/i386/xmmintrin.h (_mm_cvtpi16_ps): Swap __hisi and __losi.
	(_mm_cvtpu16_ps): Ditto.

testsuite/ChangeLog:

	PR target/46419
	* gcc-target/i386/pr46419.c: New test.


Added:
    branches/gcc-4_4-branch/gcc/testsuite/gcc.target/i386/pr46419.c
      - copied unchanged from r166569, trunk/gcc/testsuite/gcc.target/i386/pr46419.c
Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/config/i386/xmmintrin.h
    branches/gcc-4_4-branch/gcc/testsuite/ChangeLog
Comment 5 Uroš Bizjak 2010-11-10 23:29:51 UTC
Fixed.
Comment 6 Uroš Bizjak 2011-03-09 16:42:11 UTC
*** Bug 48036 has been marked as a duplicate of this bug. ***