[Bug target/104611] memcmp/strcmp/strncmp can be optimized when the result is tested for [in]equality with 0 on aarch64
wilco at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Feb 21 12:07:30 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104611
Wilco <wilco at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wilco at gcc dot gnu.org
--- Comment #1 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #0)
> Take:
>
> bool f(char *a)
> {
>   char t[] = "0123456789012345678901234567890";
>   return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> }
>
> Right now GCC uses branches to optimize this but this could be done via a
> few loads followed by xor (eor) of the two sides and then oring the results
> of xor and then umaxv and then comparing that to 0. This can be done for the
> middle-end code too if there is a max reduction opcode.
It's not worth optimizing a small inline memcmp using vector instructions - the
umaxv and the move back to the integer side add extra latency.
However, the expansion could be more efficient and use the same sequence as
GLIBC memcmp:
        ldp     data1, data3, [src1, 16]
        ldp     data2, data4, [src2, 16]
        cmp     data1, data2
        ccmp    data3, data4, 0, eq
        b.ne    L(return2)
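For illustration only, the 16-byte equality step above can be sketched in portable C. The helper name equal16 is hypothetical and does not come from GCC or GLIBC. The sketch assumes that the fixed-size memcpy calls are turned into plain loads, which is common but not guaranteed, and it makes no claim about whether a given compiler emits cmp/ccmp or eor/orr for the final test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of GCC or GLIBC): returns true when the
   16 bytes at p and q are identical.  Only equality is computed, not the
   signed ordering that a full memcmp has to return.  */
static bool equal16(const void *p, const void *q)
{
    uint64_t a0, a1, b0, b1;

    /* memcpy is the portable way to express a possibly unaligned load.  */
    memcpy(&a0, p, 8);
    memcpy(&a1, (const unsigned char *)p + 8, 8);
    memcpy(&b0, q, 8);
    memcpy(&b1, (const unsigned char *)q + 8, 8);

    /* Both word pairs must match; there is no branch per word.  */
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}

/* Small self-check showing the intended use.  */
void equal16_selfcheck(void)
{
    unsigned char x[16], y[16];

    memset(x, 7, sizeof x);
    memset(y, 7, sizeof y);
    assert(equal16(x, y));

    y[15] = 8;              /* differ in the second word */
    assert(!equal16(x, y));
}
```

A 32-byte comparison such as the one in comment #0 would chain two such steps. Because only the zero/non-zero result is needed, byte order does not matter here.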
Also, the array t[] gets copied onto the stack instead of the string literal
being used directly.
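As a source-level illustration of the stack-copy point (not a fix for the compiler issue itself), declaring the array static const gives it static storage, so there is no automatic array for the compiler to initialize on each call. The function name f_static is hypothetical, and the code actually generated still depends on the compiler and options.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical variant of f() from comment #0.  The array has static
   storage duration, so no per-call copy of the string is required.  */
bool f_static(const char *a)
{
    static const char t[] = "0123456789012345678901234567890";

    /* sizeof(t) is 32: 31 characters plus the terminating NUL.  */
    return memcmp(a, t, sizeof(t)) == 0;
}

/* Small self-check showing the intended use.  */
void f_static_selfcheck(void)
{
    char buf[32] = "0123456789012345678901234567890";

    assert(f_static(buf));
    buf[5] = 'x';
    assert(!f_static(buf));
}
```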