On GIMPLE we should expand memcmp/strcmp/strncmp to inline integral compares
if the result is only tested against equality with 0.
See PR43052 where we hint at that these may be the only profitable cases
to inline for memcmp at least.
Created attachment 26610 [details]
prototype for memcmp
Patch that handles
int cmp (char *p, char *q)
char *pa = __builtin_assume_aligned (p, 4);
char *qa = __builtin_assume_aligned (q, 4);
if (__builtin_memcmp (pa, qa, 4) != 0)
from forwprops simplify_builtin_call.
Possibly we could consider vector modes / compares (for greater size) and
allow misaligned accesses on !STRICT_ALIGNMENT targets. We can also
handle non-power-of-two sizes by using (A ^ B) & mask, but we have to
think about endianess there.
IMHO you should also handle the case when one of the operand is STRING_CST, then you don't need to test its alignment, you compare the memory content with something read from the string at compile time.
For comparison against constants it might be worthwhile to still allow two
loads to improve the number of alignment cases we can handle. Well, I'm not
sure we can expect anything more than 1-byte alignment for the most cases.
This really sounds like the same as PR 12086.
*** Bug 53253 has been marked as a duplicate of this bug. ***
Happens in insn-preds.c quite often:
if (!strncmp (str, "Yi", 2))
if (!strncmp (str, "Ym", 2))
if (!strncmp (str, "Yp", 2))
if (!strncmp (str, "Ya", 2))
and later calls are predicted as cold and thus not expanded inline by
[(set (match_operand:SI 0 "register_operand" "")
(compare:SI (match_operand:BLK 1 "general_operand" "")
(match_operand:BLK 2 "general_operand" "")))
(use (match_operand 3 "general_operand" ""))
(use (match_operand 4 "immediate_operand" ""))]
rtx addr1, addr2, out, outlow, count, countreg, align;
if (optimize_insn_for_size_p () && !TARGET_INLINE_ALL_STRINGOPS)
the insn-preds.c code could also be optimized, for length 2 a simple
char equality test would be enough ...
(In reply to comment #6)
> Happens in insn-preds.c quite often:
> case 'Y':
> if (!strncmp (str, "Yi", 2))
> return CONSTRAINT_Yi;
> if (!strncmp (str, "Ym", 2))
> return CONSTRAINT_Ym;
> if (!strncmp (str, "Yp", 2))
> return CONSTRAINT_Yp;
> if (!strncmp (str, "Ya", 2))
> return CONSTRAINT_Ya;
Well, for insn-preds.c we could also argue that the generator should emit for this
if (str == 'i') return CONSTRAINT_Yi;
if (str == 'm') return CONSTRAINT_Ym;
Date: Fri Jun 3 14:20:53 2016
New Revision: 237069
* builtins.c (expand_cmpstrn_or_cmpmem): Delete, moved elsewhere.
(expand_builtin_memcmp): New arg RESULT_EQ. All callers changed.
Look for constant strings. Move some code to emit_block_cmp_hints
and use it.
* builtins.def (BUILT_IN_MEMCMP_EQ): New.
* defaults.h (COMPARE_MAX_PIECES): New macro.
* expr.c (move_by_pieces_d, store_by_pieces_d): Remove old structs.
(move_by_pieces_1, store_by_pieces_1, store_by_pieces_2): Remvoe.
(clear_by_pieces_1): Don't declare. Move definition before use.
(can_do_by_pieces): New static function.
(can_move_by_pieces): Use it. Return bool.
(by_pieces_ninsns): Renamed from move_by_pieces_ninsns. New arg
OP. All callers changed. Handle COMPARE_BY_PIECES.
(class pieces_addr); New.
pieces_addr::maybe_predec, pieces_addr::maybe_postinc): New member
functions for it.
(class op_by_pieces_d): New.
(op_by_pieces_d::op_by_pieces_d, op_by_pieces_d::run): New member
functions for it.
(class move_by_pieces_d, class compare_by_pieces_d,
class store_by_pieces_d): New subclasses of op_by_pieces_d.
compare_by_pieces_d::finish_mode): New member functions.
(compare_by_pieces, emit_block_cmp_via_cmpmem): New static
(expand_cmpstrn_or_cmpmem): Moved here from builtins.c.
(emit_block_cmp_hints): New function.
(move_by_pieces, store_by_pieces, clear_by_pieces): Rewrite to just
use the newly defined classes.
* expr.h (by_pieces_constfn): New typedef.
(can_store_by_pieces, store_by_pieces): Use it in arg declarations.
(emit_block_cmp_hints, expand_cmpstrn_or_cmpmem): Declare.
(move_by_pieces_ninsns): Don't declare.
(can_move_by_pieces): Change return value to bool.
* target.def (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Update docs.
(compare_by_pieces_branch_ratio): New hook.
* target.h (enum by_pieces_operation): Add COMPARE_BY_PIECES.
* targethooks.c (default_use_by_pieces_infrastructure_p): Handle
(default_compare_by_pieces_branch_ratio): New function.
* targhooks.h (default_compare_by_pieces_branch_ratio): Declare.
* doc/tm.texi.in (STORE_MAX_PIECES, COMPARE_MAX_PIECES): Document.
* doc/tm.texi: Regenerate.
* tree-ssa-strlen.c: Include "builtins.h".
(handle_builtin_memcmp): New static function.
(strlen_optimize_stmt): Call it for BUILT_IN_MEMCMP.
* tree.c (build_common_builtin_nodes): Create __builtin_memcmp_eq.
* gcc.dg/pr52171.c: New test.
* gcc.target/i386/pr52171.c: New test.
Doesn't this partly solve bug 12086 also?
../../gcc/expr.c:1146:60: error: unused parameter 'mode' [-Werror=unused-parameter]
move_by_pieces_d::generate (rtx op0, rtx op1, machine_mode mode)
Date: Sat Jun 4 11:00:58 2016
New Revision: 237090
* config/sh/sh.c (sh_use_by_pieces_infrastructure_p): Use
by_pieces_ninsns instead of move_by_pieces_ninsns.
This breaks ada bootstrap on ia64, causing bootstrap comparison failure.
Bootstrap comparison failure!
make: *** [compare] Error 1
Can the bug be marked as resolved?
I think the gimple part isn't done fully? Only single-char compares are inlined?
Note, unlike comment #0 seems to imply it is incorrect to "optimize"
to an unaligned 4-byte load from str and a compare because the pointed-to string may be shorter than 4 and the wide load might fault. Likewise for strncmp. For memcmp it's ok.
I think Qing did a fair amount of tuning on this work. If we're not handling a particular case it's likely because it wasn't generally profitable.