[PATCH, libcpp]: Use asm flag outputs in search_line_sse42 main loop
Richard Henderson
rth@redhat.com
Tue Jun 30 06:12:00 GMT 2015
On 06/29/2015 08:07 PM, Uros Bizjak wrote:
> Index: lex.c
> ===================================================================
> --- lex.c (revision 225138)
> +++ lex.c (working copy)
> @@ -450,15 +450,30 @@ search_line_sse42 (const uchar *s, const uchar *en
> s = (const uchar *)((si + 16) & -16);
> }
>
> - /* Main loop, processing 16 bytes at a time. By doing the whole loop
> - in inline assembly, we can make proper use of the flags set. */
> - __asm ( "sub $16, %1\n"
> - " .balign 16\n"
> + /* Main loop, processing 16 bytes at a time. */
> +#ifdef __GCC_ASM_FLAG_OUTPUTS__
> + while (1)
> + {
> + char f;
> + __asm ("%vpcmpestri\t$0, %2, %3"
> + : "=c"(index), "=@ccc"(f)
> + : "m"(*s), "x"(search), "a"(4), "d"(16));
> + if (f)
> + break;
> +
> + s += 16;
> + }
This change looks good, modulo keeping a comment that explains why we can't use
the builtin here.
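Here is a minimal C sketch of the point such a comment would make. It assumes x86 with SSE4.2. The helper names make_search, first_match_builtin and first_match_asm, and the four-byte search set, are hypothetical and not taken from lex.c. The _mm_cmpestrc and _mm_cmpestri intrinsics each return only one result of pcmpestri: either the carry flag ("some byte matched") or the index in ECX. Getting both therefore takes two calls, and whether they collapse into one instruction is left to the optimizer. By contrast, a single asm statement with a "=@ccc" flag output hands back both results from one pcmpestri.

```c
#include <assert.h>
#include <nmmintrin.h>  /* SSE4.2 string intrinsics (x86 only) */

/* Mode 0: unsigned bytes, "equal any", positive polarity, least-significant
   index; the same $0 immediate the patch passes to pcmpestri.  */
#define CMP_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY)

/* Hypothetical search set: 4 significant bytes, as with "a"(4) in the patch.  */
__attribute__((target ("sse4.2")))
static __m128i
make_search (void)
{
  return _mm_setr_epi8 ('\n', '\r', '\\', '?', 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0);
}

/* Builtin route: one intrinsic yields CF, another yields the index, so
   both results cost two calls.  Returns the offset of the first match in
   the 16-byte block, or -1.  */
__attribute__((target ("sse4.2")))
static int
first_match_builtin (const unsigned char *block)
{
  __m128i search = make_search ();
  __m128i data = _mm_loadu_si128 ((const __m128i *) block);
  if (!_mm_cmpestrc (search, 4, data, 16, CMP_MODE))
    return -1;                          /* CF clear: no byte matched.  */
  return _mm_cmpestri (search, 4, data, 16, CMP_MODE);
}

#ifdef __GCC_ASM_FLAG_OUTPUTS__
/* Asm route: a single pcmpestri; "=@ccc" exposes CF as a C value next to
   the index the instruction leaves in ECX.  */
__attribute__((target ("sse4.2")))
static int
first_match_asm (const unsigned char *block)
{
  __m128i search = make_search ();
  int index;
  char found;
  __asm__ ("pcmpestri\t$0, %2, %3"
           : "=c"(index), "=@ccc"(found)
           : "m"(*(const __m128i *) block), "x"(search), "a"(4), "d"(16));
  if (!found)
    return -1;
  assert (index >= 0 && index < 16);   /* ECX is 0..15 whenever CF is set.  */
  return index;
}
#endif
```

When the CPU supports SSE4.2 (checked at run time, e.g. with __builtin_cpu_supports ("sse4.2")), both functions should return the same offset for any 16-byte block. The asm variant is only compiled when the compiler defines __GCC_ASM_FLAG_OUTPUTS__.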
> +#else
> + s -= 16;
> + /* By doing the whole loop in inline assembly,
> + we can make proper use of the flags set. */
> + __asm ( ".balign 16\n"
> "0: add $16, %1\n"
> - " %vpcmpestri $0, (%1), %2\n"
> + " %vpcmpestri\t$0, (%1), %2\n"
> " jnc 0b"
> : "=&c"(index), "+r"(s)
> : "x"(search), "a"(4), "d"(16));
> +#endif
I do wonder whether this #else fallback is worth keeping. Surely we only care
about the performance of search_line after a full bootstrap, at which point the
compiler defines __GCC_ASM_FLAG_OUTPUTS__ and we take the new path.
I think it would be better to adjust the #ifdef HAVE_SSE4 line above so that it
also checks __GCC_ASM_FLAG_OUTPUTS__.
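The suggested restructuring might look roughly like this. This is a simplified, hypothetical layout and not the lex.c code. HAVE_SSE4 is defined locally as a stand-in for the configure result, and search_line_generic and search_line are illustrative names. The run-time CPU check through __builtin_cpu_supports is an assumption of this sketch. The SSE4.2 loop also stops at the last whole 16-byte block and hands the tail to the portable loop, rather than relying on lex.c's buffer padding. The key idea is that the SSE4.2 scanner is only compiled when flag outputs are available, so the all-asm fallback loop disappears. A stage-1 build made with an older host compiler simply uses the generic scanner.

```c
#include <assert.h>

/* Hypothetical stand-in for the configure-time HAVE_SSE4 macro.  */
#if !defined (HAVE_SSE4) && (defined (__x86_64__) || defined (__i386__))
# define HAVE_SSE4 1
#endif

typedef unsigned char uchar;

/* Portable scanner, always compiled: returns a pointer to the first
   '\n', '\r', '\\' or '?' in [s, end), or end if there is none.  */
static const uchar *
search_line_generic (const uchar *s, const uchar *end)
{
  for (; s < end; s++)
    if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
      return s;
  return end;
}

/* The suggested gate: without flag outputs the SSE4.2 scanner is not
   built at all, so no all-asm fallback loop is needed.  */
#if defined (HAVE_SSE4) && defined (__GCC_ASM_FLAG_OUTPUTS__)
# include <nmmintrin.h>

__attribute__((target ("sse4.2")))
static const uchar *
search_line_sse42 (const uchar *s, const uchar *end)
{
  const __m128i search = _mm_setr_epi8 ('\n', '\r', '\\', '?', 0, 0, 0, 0,
                                        0, 0, 0, 0, 0, 0, 0, 0);
  /* Whole 16-byte blocks only; pcmpestri does not fault on an
     unaligned memory operand.  */
  while (end - s >= 16)
    {
      int index;
      char found;
      __asm__ ("pcmpestri\t$0, %2, %3"
               : "=c"(index), "=@ccc"(found)
               : "m"(*(const __m128i *) s), "x"(search), "a"(4), "d"(16));
      if (found)
        {
          assert (index >= 0 && index < 16);
          return s + index;
        }
      s += 16;
    }
  /* Fewer than 16 bytes left: finish with the portable loop.  */
  return search_line_generic (s, end);
}
#endif

/* Hypothetical dispatcher: take the SSE4.2 path only if it was compiled
   in and the CPU supports it.  */
static const uchar *
search_line (const uchar *s, const uchar *end)
{
#if defined (HAVE_SSE4) && defined (__GCC_ASM_FLAG_OUTPUTS__)
  if (__builtin_cpu_supports ("sse4.2"))
    return search_line_sse42 (s, end);
#endif
  return search_line_generic (s, end);
}
```

Either path should return the same pointer for a given buffer; only the speed differs.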
r~