[Bug target/104611] memcmp/strcmp/strncmp can be optimized when the result is tested for [in]equality with 0 on aarch64

Mon Feb 21 12:07:30 GMT 2022

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104611

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #1 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #0)
> Take:
> 
> bool f(char *a)
> {
>     char t[] = "0123456789012345678901234567890";
>     return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> }
> 
> Right now GCC uses branches to optimize this, but this could be done via a
> few loads followed by an XOR (eor) of the two sides, then ORing the results
> of the XORs together, then a umaxv, and then comparing that to 0. This can
> be done in the middle-end code too if there is a max reduction opcode.
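The quoted idea is a branch-free reduction: XOR the two inputs word by word, OR all the differences together, and test the result against 0. A minimal sketch of that reduction using 64-bit integer words instead of vector registers (so no umaxv step) follows. `load64` and `eq32` are hypothetical names; 32 is `sizeof(t)` in the example (31 characters plus the terminating NUL).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 8 bytes without alignment/aliasing problems. */
static uint64_t load64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Branch-free equality test for a 32-byte block: XOR each pair of
   8-byte words and OR the differences together.  Any differing bit
   survives the OR, so the result is 0 only if all 32 bytes match. */
static bool eq32(const void *a, const void *b)
{
    const unsigned char *pa = a, *pb = b;
    uint64_t diff = 0;
    for (size_t i = 0; i < 32; i += 8)
        diff |= load64(pa + i) ^ load64(pb + i);
    return diff == 0;
}
```

Only the final `diff == 0` needs a flag-setting compare; whether a compiler keeps the loop branch-free after unrolling is up to the optimizer.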

It's not worth optimizing small inline memcmp using vector instructions: the
umaxv and the move back to the integer side add extra latency.

However, the expansion could be more efficient by using the same sequence as
glibc's memcmp:

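        ldp     data1, data3, [src1, 16]    // load bytes 16..31 of src1
        ldp     data2, data4, [src2, 16]    // load bytes 16..31 of src2
        cmp     data1, data2                // compare the first 8-byte words
        ccmp    data3, data4, 0, eq         // if equal, compare the second words; else force NE
        b.ne    L(return2)                  // any mismatch: branch out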
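The same structure can be written in C for the 32-byte case: each 16-byte chunk is loaded as two 8-byte words (the role of one ldp per source), the two word compares are chained with `&&` (the role of cmp followed by ccmp ... eq), and there is one branch per chunk. This is a sketch only; `eq16_at` and `memeq32` are hypothetical names, and whether a particular GCC version lowers the `&&` chain to cmp + ccmp is an assumption, not a guarantee.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Compare the 16 bytes at offset off of p and q as two 8-byte words. */
static bool eq16_at(const unsigned char *p, const unsigned char *q, size_t off)
{
    uint64_t d1, d2, d3, d4;
    memcpy(&d1, p + off, 8);      /* like ldp data1, data3, [src1, off] */
    memcpy(&d3, p + off + 8, 8);
    memcpy(&d2, q + off, 8);      /* like ldp data2, data4, [src2, off] */
    memcpy(&d4, q + off + 8, 8);
    /* Second compare only matters if the first was equal: cmp, ccmp ... eq. */
    return d1 == d2 && d3 == d4;
}

/* 32-byte equality: one early exit per 16-byte chunk, like b.ne. */
static bool memeq32(const void *a, const void *b)
{
    const unsigned char *p = a, *q = b;
    return eq16_at(p, q, 0) && eq16_at(p, q, 16);
}
```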

Also, the array t[] is copied onto the stack instead of the string literal
being used directly.
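Until the compiler handles this, one source-level workaround (not the compiler fix being discussed) is to give the array static storage. A `static const` array is placed in read-only data once, so there is no per-call copy onto the stack. `f_static` is a hypothetical name for the modified version of `f`.

```c
#include <stdbool.h>
#include <string.h>

/* Same test as f(), but t[] has static storage and is never copied
   onto the stack; sizeof t is still 32 (31 characters + NUL). */
bool f_static(const char *a)
{
    static const char t[] = "0123456789012345678901234567890";
    return memcmp(a, t, sizeof t) == 0;
}
```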