Will GCC eventually support SSE2 or SSE4.1?

Stefan Kanthak stefan.kanthak@nexgo.de
Fri May 26 11:28:00 GMT 2023

"Jakub Jelinek" <jakub@redhat.com> wrote:

> On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote:
>> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>>    That's bad, REALITY CHECK, please!
> You're wrong.
> SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions
> didn't have it.

That's correct; I failed to see this difference.

> The supported CPU names don't distinguish between core2 submodels,
> so if you have core2 with sse4.1, you should either be using -march=native
> if compiling on such a machine, or use -march=core2 -msse4.1,

This is one of the combinations I hadn't tested so far; with it (and with
-m32 -msse4.1 too), GCC generates SSE4.1 instructions, but FAILS to optimise:

# Compilation provided by Compiler Explorer at https://godbolt.org/
ispowerof2(unsigned long long):
        movq    xmm1, QWORD PTR [esp+4]
        pcmpeqd xmm0, xmm0
        xor     eax, eax
        paddq   xmm0, xmm1
        pand    xmm0, xmm1            # SUPERFLUOUS!
        punpcklqdq      xmm0, xmm0    # SUPERFLUOUS!
        ptest   xmm0, xmm0            # should be: ptest xmm0, xmm1
        sete    al
        ret

9 instructions in 36 bytes instead of 7 instructions in 26 bytes.
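The post doesn't show the C source. Judging from the assembly (x - 1 formed by PADDQ with all ones, then ANDed with x and tested for zero), it is presumably the usual bit trick below, written in C here. The second function is a hypothetical sketch (its name ispowerof2_sse41 is made up) that spells out the shorter sequence from the annotations above with SSE2/SSE4.1 intrinsics; it is only compiled when __SSE4_1__ is defined, e.g. with -m32 -msse4.1.

```c
#include <stdbool.h>

#ifdef __SSE4_1__
#include <smmintrin.h>  /* SSE4.1 intrinsics (includes the SSE2 ones) */
#endif

/* Presumed source: true when at most one bit of x is set.
   Like the assembly, it also returns true for x == 0. */
bool ispowerof2(unsigned long long x)
{
    return (x & (x - 1)) == 0;
}

#ifdef __SSE4_1__
/* Hypothetical variant spelling out the shorter sequence with intrinsics. */
bool ispowerof2_sse41(unsigned long long x)
{
    __m128i v    = _mm_loadl_epi64((const __m128i *)&x); /* MOVQ: high qword cleared */
    __m128i ones = _mm_cmpeq_epi32(v, v);                /* PCMPEQD: all bits set */
    __m128i xm1  = _mm_add_epi64(v, ones);               /* PADDQ: low qword = x - 1 */
    return _mm_testz_si128(xm1, v);                      /* PTEST: 1 iff (xm1 & v) == 0 */
}
#endif
```

Both functions should agree for every input; whether a particular GCC version then emits exactly the 7-instruction sequence for the intrinsics variant is not guaranteed.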

JFTR: the documentation of MOVQ specifies

| when the destination operand is an XMM register, the quadword is
| stored to the low quadword of the register, and the high quadword
| is cleared to all 0s.
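Because MOVQ clears the high quadword of xmm1, the high quadword of the PAND result is already 0, so PUNPCKLQDQ cannot change the outcome of PTEST; and since PTEST ANDs its two operands itself, PAND is not needed either. The sketch below checks this argument with a hypothetical model of an XMM register as two 64-bit lanes (the type xmm_t and all function names are made up for illustration; they only mimic the lane-wise semantics of the instructions in the listing above).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an XMM register as two 64-bit lanes. */
typedef struct { uint64_t lo, hi; } xmm_t;

static xmm_t movq_load(uint64_t m)        { xmm_t r = { m, 0 }; return r; }  /* high qword cleared */
static xmm_t all_ones(void)               { xmm_t r = { UINT64_MAX, UINT64_MAX }; return r; }
static xmm_t paddq(xmm_t a, xmm_t b)      { xmm_t r = { a.lo + b.lo, a.hi + b.hi }; return r; }
static xmm_t pand(xmm_t a, xmm_t b)       { xmm_t r = { a.lo & b.lo, a.hi & b.hi }; return r; }
static xmm_t punpcklqdq(xmm_t a, xmm_t b) { xmm_t r = { a.lo, b.lo }; return r; }
/* PTEST sets ZF when (a AND b) is zero in all 128 bits. */
static bool ptest_zf(xmm_t a, xmm_t b)    { return ((a.lo & b.lo) | (a.hi & b.hi)) == 0; }

/* The sequence GCC emitted. */
bool gcc_sequence(uint64_t x)
{
    xmm_t xmm1 = movq_load(x);
    xmm_t xmm0 = paddq(all_ones(), xmm1);  /* low: x - 1, high: all ones */
    xmm0 = pand(xmm0, xmm1);               /* high lane becomes 0 since xmm1.hi == 0 */
    xmm0 = punpcklqdq(xmm0, xmm0);         /* copies low lane into high lane */
    return ptest_zf(xmm0, xmm0);
}

/* The shorter sequence suggested in the post. */
bool short_sequence(uint64_t x)
{
    xmm_t xmm1 = movq_load(x);
    xmm_t xmm0 = paddq(all_ones(), xmm1);
    return ptest_zf(xmm0, xmm1);           /* PTEST performs the AND itself */
}
```

For every x both functions return ((x - 1) & x) == 0, which is what the SETE after PTEST turns into the return value.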

> there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
>> 4) If the documentation is right, then the behaviour of GCC is wrong: it
>>    doesn't allow using SSE4.1 without SSE4.2!
> If you aren't able to read the documentation, it is hard to argue.

When the documentation is wrong or incomplete, it's hard to trust it!

| -m32
| The -m32 option sets int, long, and pointer types to 32 bits, and
| generates code that runs on any i386 system.

OUCH: as shown in https://godbolt.org/z/b43cjGdY9, -m32 ALONE already
      generates SSE2 instructions, which DON'T run on ANY i386 system!

OOPS: as shown above, -m32 -msse4.1 (or any other -msse*) also generates
      code that does NOT run on ANY i386 system!
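Which instruction-set extensions a given combination of -m options actually enables can be checked without reading any assembly: GCC predefines __SSE__, __SSE2__, __SSE3__, __SSSE3__, __SSE4_1__ and __SSE4_2__ for the extensions it has been told it may use. The sketch below just turns those macros into a bit mask; the enum and the names enabled_isa and print_isa are made up for illustration. Compiled with -m32 -msse4.1 it is expected to report SSE4.1 and everything below it, but not SSE4.2; what plain -m32 reports depends on how the compiler was configured.

```c
#include <stdio.h>

/* Hypothetical bit mask of the SSE levels enabled for this translation unit. */
enum {
    ISA_SSE    = 1 << 0,
    ISA_SSE2   = 1 << 1,
    ISA_SSE3   = 1 << 2,
    ISA_SSSE3  = 1 << 3,
    ISA_SSE4_1 = 1 << 4,
    ISA_SSE4_2 = 1 << 5,
};

/* Collects GCC's predefined __SSE*__ macros into a bit mask. */
unsigned enabled_isa(void)
{
    unsigned mask = 0;
#ifdef __SSE__
    mask |= ISA_SSE;
#endif
#ifdef __SSE2__
    mask |= ISA_SSE2;
#endif
#ifdef __SSE3__
    mask |= ISA_SSE3;
#endif
#ifdef __SSSE3__
    mask |= ISA_SSSE3;
#endif
#ifdef __SSE4_1__
    mask |= ISA_SSE4_1;
#endif
#ifdef __SSE4_2__
    mask |= ISA_SSE4_2;
#endif
    return mask;
}

/* Prints one line per SSE level, in the bit order of the enum above. */
void print_isa(unsigned mask)
{
    static const char *const names[] = { "SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2" };
    for (unsigned i = 0; i < sizeof names / sizeof names[0]; ++i)
        printf("%-7s %s\n", names[i], (mask & (1u << i)) ? "enabled" : "-");
}
```

The same information is available from the preprocessor alone, e.g. with gcc -m32 -msse4.1 -dM -E - </dev/null | grep SSE.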

Where is the precedence of the different -m* options for the CPU type
documented? Where is their influence on each other documented?

| -march=cpu-type
|   Specifying -march=cpu-type implies -mtune=cpu-type, except where noted
|   otherwise.
| -mtune=cpu-type
|   [...] the compiler does not generate any code that cannot run on the
|   default machine type unless you use a -march=cpu-type option.

Why is the "default machine type" neither mentioned nor specified in the
documentation of -march=?

