[PATCH, libcpp]: Use asm flag outputs in search_line_sse42 main loop

Richard Henderson rth@redhat.com
Tue Jun 30 06:12:00 GMT 2015


On 06/29/2015 08:07 PM, Uros Bizjak wrote:
> Index: lex.c
> ===================================================================
> --- lex.c	(revision 225138)
> +++ lex.c	(working copy)
> @@ -450,15 +450,30 @@ search_line_sse42 (const uchar *s, const uchar *en
>         s = (const uchar *)((si + 16) & -16);
>       }
>
> -  /* Main loop, processing 16 bytes at a time.  By doing the whole loop
> -     in inline assembly, we can make proper use of the flags set.  */
> -  __asm (      "sub $16, %1\n"
> -	"	.balign 16\n"
> +  /* Main loop, processing 16 bytes at a time.  */
> +#ifdef __GCC_ASM_FLAG_OUTPUTS__
> +  while (1)
> +    {
> +      char f;
> +      __asm ("%vpcmpestri\t$0, %2, %3"
> +	     : "=c"(index), "=@ccc"(f)
> +	     : "m"(*s), "x"(search), "a"(4), "d"(16));
> +      if (f)
> +	break;
> +
> +      s += 16;
> +    }

This change looks good.  Modulo keeping a comment mentioning why we can't use 
the builtin.

> +#else
> +  s -= 16;
> +  /* By doing the whole loop in inline assembly,
> +     we can make proper use of the flags set.  */
> +  __asm (      ".balign 16\n"
>   	"0:	add $16, %1\n"
> -	"	%vpcmpestri $0, (%1), %2\n"
> +	"	%vpcmpestri\t$0, (%1), %2\n"
>   	"	jnc 0b"
>   	: "=&c"(index), "+r"(s)
>   	: "x"(search), "a"(4), "d"(16));
> +#endif

I do wonder about keeping this bit around.  Surely we only really care about 
the performance of search_line after a full bootstrap, at which point we've got 
the new path.

I think maybe better to adjust the #ifdef HAVE_SSE4 line above to include the 
G_A_F_O check.


r~



More information about the Gcc-patches mailing list