Will GCC eventually support SSE2 or SSE4.1?

Jonathan Wakely jwakely.gcc@gmail.com
Fri May 26 11:42:47 GMT 2023

On Fri, 26 May 2023 at 12:29, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
> "Jakub Jelinek" <jakub@redhat.com> wrote:
> > On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote:
> >> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
> >>    That's bad, REALITY CHECK, please!
> >
> > You're wrong.
> > SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions
> > didn't have it.
> That's correct, I failed to see this difference.


> > The supported CPU names don't distinguish between core2 submodels,
> > so if you have core2 with sse4.1, you should either be using -march=native
> > if compiling on such a machine, or use -march=core2 -msse4.1,
> This is one of the combinations I didn't test until now; with it (and with
> -m32 -msse4.1 too) GCC generates SSE4.1 instructions, but FAILS to optimise:
> # Compilation provided by Compiler Explorer at https://godbolt.org/
> ispowerof2(unsigned long long):
>         movq    xmm1, QWORD PTR [esp+4]
>         pcmpeqd xmm0, xmm0
>         xor     eax, eax
>         paddq   xmm0, xmm1
>         pand    xmm0, xmm1            # SUPERFLUOUS!
>         punpcklqdq      xmm0, xmm0    # SUPERFLUOUS!
>         ptest   xmm0, xmm0            #    ptest    xmm0, xmm1
>         sete    al
>         ret
> 9 instructions in 36 bytes instead of 7 instructions in 26 bytes.
> JFTR: the documentation of MOVQ specifies
> | when the destination operand is an XMM register, the quadword is
> | stored to the low quadword of the register, and the high quadword
> | is cleared to all 0s.
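
The C source behind that listing is not quoted in this message. Judging
only from the instructions (add all-ones, i.e. subtract 1, AND with the
original value, test for zero), it is presumably equivalent to the sketch
below. This is a reconstruction under that assumption, not the code from
the Compiler Explorer link.

```c
/* Reconstructed from the quoted assembly: returns 1 when x has at most
   one bit set. Note that x == 0 also yields 1, matching what the
   paddq/pand/ptest sequence computes. */
int ispowerof2(unsigned long long x)
{
    return (x & (x - 1)) == 0;
}
```

With this shape, the complaint is that the AND could be folded into
ptest itself instead of a separate pand followed by punpcklqdq.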
> > there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
> >
> >> 4) If the documentation is right, then the behaviour of GCC is wrong: it
> >>    doesn't allow to use SSE4.1 without SSE4.2!
> >
> > If you aren't able to read the documentation, it is hard to argue.
> When the documentation is wrong or incomplete it's hard to trust it!

Just like when you make incorrect statements and assume everybody else is wrong.

The documentation isn't perfect, but you should not just ignore it and
assume you know better in all cases.

> | -m32
> ...
> | The -m32 option sets int, long, and pointer types to 32 bits, and
> | generates code that runs on any i386 system.
>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> OUCH: as shown in https://godbolt.org/z/b43cjGdY9 -m32 ALONE but
>       generates SSE2 instructions which DONT run on ANY i386 system!

That's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109954

> OOPS: as shown above, -m32 -msse4.1 (or another -msse*) also generates
>       code that does NOT run on ANY i386 system!
> Where is the precedence of the different -m* options for the CPU type
> documented?
> Where is their influence on each other documented?

-march enables the instructions listed for the relevant cpu family,
then using -mxxx or -mno-xxx adds or removes particular instruction
sets from the ones enabled by -march.

If you give an option twice, e.g. -march=core2 -march=nehalem, then
the second one wins. If you use -msse2 -mno-sse2 then the second one
also wins.

You can check this using e.g.

gcc -Q --help=target -march=core2 -msse2
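
Another way to see which instruction sets a given combination of options
ends up enabling is to look at the macros GCC predefines (__SSE2__,
__SSE4_1__, __SSE4_2__). A minimal sketch, where the helper names are
made up for illustration and are not part of GCC:

```c
#include <assert.h>
#include <stdio.h>

/* Each helper returns 1 when GCC predefines the matching macro for the
   options in effect, and 0 otherwise. Helper names are hypothetical. */
static int sse2_enabled(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

static int sse41_enabled(void)
{
#ifdef __SSE4_1__
    return 1;
#else
    return 0;
#endif
}

static int sse42_enabled(void)
{
#ifdef __SSE4_2__
    return 1;
#else
    return 0;
#endif
}

/* Prints the SSE levels the compiler was allowed to use. */
static void print_isa_report(void)
{
    /* GCC treats these levels as a chain: enabling a later one also
       enables the earlier ones, and disabling an earlier one disables
       the later ones. */
    assert(!sse42_enabled() || sse41_enabled());
    assert(!sse41_enabled() || sse2_enabled());

    printf("SSE2:   %d\n", sse2_enabled());
    printf("SSE4.1: %d\n", sse41_enabled());
    printf("SSE4.2: %d\n", sse42_enabled());
}
```

Calling print_isa_report() from main and building with -march=core2
-msse4.1 should report SSE2 and SSE4.1 as enabled and SSE4.2 as not
enabled; appending -mno-sse4.1 should turn the SSE4.1 line back to 0.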

> | -march=cpu-type
> ...
> |   Specifying -march=cpu-type implies -mtune=cpu-type, except where noted
> |   otherwise.
> ...
> | -mtune=cpu-type
> ...
> |    the compiler does not generate any code that cannot run on the default
> |    machine type unless you use a -march=cpu-type option.
> Why is the "default machine type" not mentioned/specified with -march=?

Using -march overrides it. The default is set during configure. Adding
-v to the compilation will show what -march option is used by cc1 by
default.
