This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: PATCH: Add SSE4.2 support
On 5/30/07, H. J. Lu <hjl@lucon.org> wrote:
> > BTW: If it is not too much trouble, could the string/text processing
> > intrinsics be split out into a separate patch? The first patch would then
> > implement only SSE4.2 flags handling and logic/CRC operations that we
> > are all somehow familiar with, and the second will add string
> > processing.
> >
>
> I prefer to use one patch for SSE4.2 if possible at all. But I will
> try to use 2 if there is no way around it.
>
Here is the updated patch. I added OPTION_MASK_ISA_XXX_UNSET so
that we only need to change one macro when we add a new ISA. Tested
on Linux/Intel64.
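
To illustrate the idea of chained "unset" masks, here is a minimal sketch. It is not the code from the patch: the bit values, the ISA_* names and the disable_isa() helper are all hypothetical, and the real OPTION_MASK_ISA_XXX_UNSET definitions are not shown in this thread. The assumption is that each _UNSET mask names the next ISA plus that ISA's own _UNSET mask, so disabling one ISA also disables everything built on top of it.

```c
/* Hypothetical bit assignments; the real OPTION_MASK_ISA_* values
   are not shown in this thread.  */
enum
{
  ISA_SSE    = 1 << 0,
  ISA_SSE2   = 1 << 1,
  ISA_SSE3   = 1 << 2,
  ISA_SSSE3  = 1 << 3,
  ISA_SSE4_1 = 1 << 4,
  ISA_SSE4_2 = 1 << 5
};

/* Each _UNSET mask lists the ISAs that must be switched off together
   with the named one.  Every mask reuses the next one in the chain,
   so appending a newer ISA only requires changing the last macro
   (here ISA_SSE4_2_UNSET).  */
#define ISA_SSE4_2_UNSET 0
#define ISA_SSE4_1_UNSET (ISA_SSE4_2 | ISA_SSE4_2_UNSET)
#define ISA_SSSE3_UNSET  (ISA_SSE4_1 | ISA_SSE4_1_UNSET)
#define ISA_SSE3_UNSET   (ISA_SSSE3 | ISA_SSSE3_UNSET)
#define ISA_SSE2_UNSET   (ISA_SSE3 | ISA_SSE3_UNSET)
#define ISA_SSE_UNSET    (ISA_SSE2 | ISA_SSE2_UNSET)

/* Hypothetical helper: clear ISA and everything that depends on it.  */
static unsigned
disable_isa (unsigned flags, unsigned isa, unsigned isa_unset)
{
  return flags & ~(isa | isa_unset);
}
```

With these definitions, disable_isa (flags, ISA_SSE2, ISA_SSE2_UNSET) leaves only ISA_SSE set when all six bits were set on entry.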
H.J.,
The reason for asking for a separate text/string processing patch is that
the implementation looks fundamentally wrong to me (I'll discuss this in a
separate mail), and text/string processing _could_ be moved into a
separate, orthogonal patch. It is OK to include smmintrin.h and
nmmintrin.h unmodified, because unsupported __builtin_ia32_pcmp*
functions will be emitted as normal calls to __builtin_ia32_pcmp*()
functions; that is, at the moment, they won't be expanded to special
RTL sequences. Also, the documentation can be added as it is in the
patch, including the currently unimplemented __builtin_* string/text
functions. So, for now, just remove all pcmpstr* handling from
i386-modes.def (new modes), i386.c, i386.h (new modes), predicates.md
(new modes handling) and sse.md.
+ def_builtin (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_crc32qi",
+ ftype, IX86_BUILTIN_CRC32QI);
These def_builtin() calls should each be written on one line, to
maintain some consistency with the other def_builtin() calls.
+ /* Only SSE4.1/SSE4.2 supports V2DImode. */
+ if (mode == V2DImode)
+ {
+ switch (code)
+ {
+ case EQ:
+ /* SSE4.1 supports EQ. */
+ if (!TARGET_SSE4_1)
+ return false;
+ break;
+
+ case GT:
+ case GTU:
+ /* SSE4.2 supports GT/GTU. */
+ if (!TARGET_SSE4_2)
+ return false;
+ break;
You have to add supporting code to convert V2DI GTU into GT in the
code just below the chunk you added, since pcmpgtq only performs a
signed comparison. Something similar to the existing V4SI mode
handling is needed.
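
For reference, one common way to turn an unsigned greater-than into a signed one is to flip the sign bit of both operands before comparing. The scalar sketch below shows the identity on a single 64-bit element; the vector case applies the same step to each element. The helper name gtu_via_gt and the SIGN_BIT_64 macro are made up for illustration, and this is not the RTL expansion code itself.

```c
#include <stdint.h>

/* Mask that flips the sign bit of a 64-bit element.  */
#define SIGN_BIT_64 UINT64_C (0x8000000000000000)

/* Hypothetical helper: compute the unsigned comparison a > b using
   only a signed comparison.  Flipping the top bit maps the unsigned
   range [0, 2^64) onto the signed range [-2^63, 2^63) while
   preserving order.  The conversion to int64_t of an out-of-range
   value is implementation-defined in C; GCC reduces it modulo 2^64,
   which is what this sketch relies on.  */
static int
gtu_via_gt (uint64_t a, uint64_t b)
{
  int64_t sa = (int64_t) (a ^ SIGN_BIT_64);
  int64_t sb = (int64_t) (b ^ SIGN_BIT_64);
  return sa > sb;
}
```

For example, gtu_via_gt (UINT64_MAX, 1) is true, whereas a plain signed comparison of the same bit patterns would treat UINT64_MAX as -1 and report it as smaller.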
Uros.