Bug 27562 - SSE instruction selection wrong for Athlon processors.
Summary: SSE instruction selection wrong for Athlon processors.
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.1.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-05-11 18:32 UTC by Ramon Garcia
Modified: 2006-05-11 23:42 UTC

See Also:
Host:
Target: i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Ramon Garcia 2006-05-11 18:32:43 UTC
GCC lets the programmer use SIMD instructions through primitives (intrinsics), which are defined in header files. Unfortunately, GCC follows the Intel compiler's conventions. These conventions, dictated by Intel's interests, do not always reflect reality correctly, especially for AMD processors.

According to these conventions, there are four levels of SIMD instructions:

Pentium MMX: MMX instructions.
Pentium III and higher: MMX + SSE single precision floating point instructions.
Pentium IV and higher: MMX + SSE single precision + SSE2 integer and double precision
Newer Pentium IV: MMX + SSE + SSE2 + SSE3

This model does not correctly reflect AMD Athlon processors, which support SSE integer SIMD instructions but do not support SSE2 double precision instructions.

While it is natural that the Intel compiler is aligned with the company's interests, it is not acceptable that a vendor-neutral free software project follows them.

Thus it is necessary to add a new level that includes integer SSE instructions without including double precision SIMD instructions.

What I request is:

- A flag to support the Athlon SSE instruction set. Since it sits between SSE and SSE2, it might be called SSE1.5, with the option -msse1.5.

- A header file that defines the SSE integer instructions without defining the SSE double precision instructions. Since GCC already ships xmmintrin.h, a new header file immintrin.h could be added. This second part is trivial to do and in my opinion should be included in GCC 4.1.1.

Both changes are backward compatible.
Comment 1 Andrew Pinski 2006-05-11 18:37:56 UTC
Which Athlon processor? There are so many, and some (K8) also support full SSE2.
Comment 2 Ramon Garcia 2006-05-11 18:42:10 UTC
That is correct. However, for developers who want to target the existing installed base of Pentium IV and Athlon processors, a set of flags and headers that provides their common subset is useful.
Comment 3 Andrew Pinski 2006-05-11 18:45:40 UTC
Have you tried using the -march=athlonXXXX options (where XXXX is replaced with your CPU type)?

Also, do you have a list of which instructions are supported on which Athlon?
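
One way to see which instruction sets a particular processor reports is to query CPUID leaf 1. The following is only a minimal sketch in C, assuming GCC's <cpuid.h> header and its __get_cpuid helper are available; the struct simd_levels type and both function names are invented for illustration and are not part of GCC. On non-x86 targets the sketch simply reports nothing.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid() */
#endif

/* Hypothetical helper type for this sketch: one flag per SIMD level. */
struct simd_levels {
    int mmx, sse, sse2, sse3;
};

/* Read CPUID leaf 1 and decode the standard feature bits:
   EDX bit 23 = MMX, EDX bit 25 = SSE, EDX bit 26 = SSE2, ECX bit 0 = SSE3. */
struct simd_levels query_simd_levels(void)
{
    struct simd_levels s = {0, 0, 0, 0};
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        s.mmx  = (int)((edx >> 23) & 1u);
        s.sse  = (int)((edx >> 25) & 1u);
        s.sse2 = (int)((edx >> 26) & 1u);
        s.sse3 = (int)(ecx & 1u);
    }
#endif
    return s;
}

/* Name of the highest level reported, checked from newest to oldest. */
const char *highest_simd_level(const struct simd_levels *s)
{
    assert(s != NULL);
    if (s->sse3) return "SSE3";
    if (s->sse2) return "SSE2";
    if (s->sse)  return "SSE";
    if (s->mmx)  return "MMX";
    return "none";
}
```

Calling query_simd_levels() and passing the result to highest_simd_level() gives a one-word summary. Note that these bits only say what the CPU advertises; they say nothing about how it handles encodings it does not advertise.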
Comment 4 Ramon Garcia 2006-05-11 23:42:35 UTC
Sorry, this bug was based on a misunderstanding.

I was playing with some integer SSE on an Athlon and was surprised because it seemed to work, that is, there was no illegal instruction fault. Googling turns up pages about AMD Athlon supporting integer SSE, and some AMD promotional materials mention compatibility with Intel's integer SSE: http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_756_3734%5E3738,00.html But in GDB the xmm registers did not seem to be affected.
What was happening?

The AMD processors "reinterpret" integer SSE instructions and map them to "equivalent" MMX instructions!!! So they "sort of" work, but restricted to 64 bits and using MMX registers.

Sorry for wasting your time, and for the tone of the bug report.


Here is an assembly program:

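.data
.align 16               # movdqa requires 16-byte-aligned memory operands
a:
.word 100, 200, 300, 400, 500, 600, 700, 800
b:
.word 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000
c:
.fill 8, 2, 0           # eight 16-bit result slots, zero-initialized
.text
.globl main
main:
push   %ebp
mov    %esp,%ebp
movdqa a,%xmm0          # load the eight words of a
movdqa b,%xmm1          # load the eight words of b
paddw  %xmm1,%xmm0      # packed 16-bit add: xmm0 += xmm1
movdqa %xmm0,c          # store the eight sums to c
mov    $8,%ecx
1:
movzwl (c-2)(%ecx,%ecx,1),%eax   # load word c[ecx-1], zero-extended
pushl   %eax            # c[7] is pushed first, so c[0] becomes the first printf argument
loop   1b
.section .rodata
2:
.asciz "%d %d %d %d %d %d %d %d\n"
.text
pushl  $2b              # format string
call   printf
mov    $0,%eax
leave                   # restores %esp, discarding the pushed arguments
ret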


that produces the correct output on a Pentium IV:

1100 2200 3300 4400 5500 6600 7700 8800

but only the first four numbers are added on an AMD Athlon:

1100 2200 3300 4400 0 0 0 0
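
For comparison, the same eight-word addition can be written in C with intrinsics. This is only a sketch, assuming GCC's emmintrin.h (the SSE2 header) is available; add_words8 is a made-up name. The intrinsic path is used only when the compiler is targeting SSE2 (for example with -msse2 on 32-bit x86); otherwise the sketch falls back to a plain loop.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics, including 128-bit integer operations */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for eight 16-bit words. */
void add_words8(const uint16_t *a, const uint16_t *b, uint16_t *out)
{
    assert(a && b && out);
#if defined(__SSE2__)
    /* Unaligned loads and stores, so callers need not align their arrays. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vc = _mm_add_epi16(va, vb);   /* packed 16-bit add, the paddw operation */
    _mm_storeu_si128((__m128i *)out, vc);
#else
    /* Scalar fallback when the compiler is not targeting SSE2. */
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

Called with the a and b arrays from the assembly program, a correct 128-bit addition fills all eight output words, matching the Pentium IV output above.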

By the way, the texinfo documentation of GCC builtin functions at http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/X86-Built_002din-Functions.html#X86-Built_002din-Functions seems to need a review. I think there is a typo: -msse is mentioned twice, and there is no mention of special intrinsic functions for -msse2.

Ramon