PATCH: Add SSE4.2 support

Uros Bizjak ubizjak@gmail.com
Thu May 31 09:49:00 GMT 2007


On 5/30/07, H. J. Lu <hjl@lucon.org> wrote:

> > > BTW: If it is not too much trouble, could string/text processing
> > > intrinsic be split out into separate patch? The first patch would then
> > > implement only SSE4.2 flags handling and logic/CRC operations that we
> > > are all somehow familiar with, and the second will add string
> > > processing.
> > >
> >
> > I prefer to use one patch for SSE4.2 if possible at all. But I will
> > try to use 2 if there is no way around it.
> >
>
> Here is the updated patch. I added OPTION_MASK_ISA_XXX_UNSET so
> that we only need to change one macro when we add a new ISA.  Tested
> on Linux/Intel64.

H.J.,

The reason for a separate text/string processing patch is that the
implementation looks fundamentally wrong to me (I'll discuss this in a
separate mail), and text/string processing _could_ be moved into a
separate, orthogonal patch. It is OK to include smmintrin.h and
nmmintrin.h unmodified, because unsupported __builtin_ia32_pcmp*
functions will be emitted as normal calls to __builtin_ia32_pcmp*()
functions; that is, at the moment they won't be expanded to special
RTL sequences. Also, the documentation can be added as it is in the
patch, including the currently unimplemented __builtin_* string/text
functions. So, for now, just remove all pcmpstr* handling from
i386-modes.def (new modes), i386.c, i386.h (new modes), predicates.md
(new modes handling) and sse.md.

> +  def_builtin (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_crc32qi",
> +              ftype, IX86_BUILTIN_CRC32QI);

These def_builtin() calls should each be written on one line, to stay
consistent with the other def_builtin() calls.

> +  /* Only SSE4.1/SSE4.2 supports V2DImode.  */
> +  if (mode == V2DImode)
> +    {
> +      switch (code)
> +	{
> +	case EQ:
> +	  /* SSE4.1 supports EQ.  */
> +	  if (!TARGET_SSE4_1)
> +	    return false;
> +	  break;
> +
> +	case GT:
> +	case GTU:
> +	  /* SSE4.2 supports GT/GTU.  */
> +	  if (!TARGET_SSE4_2)
> +	    return false;
> +	  break;

You have to add supporting code that converts V2DI GTU into GT in the
code just below the chunk you added, similar to the existing V4SI mode
handling.
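
To illustrate the idea behind the GTU -> GT conversion, here is a
minimal standalone sketch in plain C. It is not GCC code and not the
form the expander change should take: the helper names
(lane_gt_signed, lane_gt_unsigned, v2di_gtu) are hypothetical, and
lane_gt_signed only stands in for a signed 64-bit element compare.
Flipping the sign bit of both operands maps unsigned order onto signed
order, so a signed GT on the biased values gives the unsigned result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a signed 64-bit element compare:
   returns an all-ones mask when a > b, and zero otherwise.  */
static int64_t
lane_gt_signed (int64_t a, int64_t b)
{
  return a > b ? -1 : 0;
}

/* Unsigned greater-than built from the signed compare.  XORing both
   operands with the sign bit maps unsigned order onto signed order.
   The uint64_t -> int64_t conversion assumes two's complement
   wraparound (which is how GCC defines it).  */
static int64_t
lane_gt_unsigned (uint64_t a, uint64_t b)
{
  const uint64_t bias = UINT64_C (1) << 63;

  return lane_gt_signed ((int64_t) (a ^ bias), (int64_t) (b ^ bias));
}

/* Two-element analogue of a V2DI GTU compare: one mask per lane.  */
static void
v2di_gtu (const uint64_t a[2], const uint64_t b[2], int64_t mask[2])
{
  assert (a != NULL && b != NULL && mask != NULL);

  for (size_t i = 0; i < 2; i++)
    mask[i] = lane_gt_unsigned (a[i], b[i]);
}
```

For example, comparing UINT64_MAX against 1 has to report "greater",
even though the same bit pattern read as a signed value is -1.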

Uros.


