Bug 52171 - memcmp/strcmp/strncmp can be optimized when the result is tested for [in]equality with 0
Summary: memcmp/strcmp/strncmp can be optimized when the result is tested for [in]equa...
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.7.0
: P3 enhancement
Target Milestone: ---
Assignee: Richard Biener
URL:
Keywords: missed-optimization
: 53253 (view as bug list)
Depends on: 71413
Blocks: 12086 43052
  Show dependency treegraph
 
Reported: 2012-02-08 12:37 UTC by Richard Biener
Modified: 2019-06-13 20:59 UTC (History)
7 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-02-08 00:00:00


Attachments
prototype for memcmp (767 bytes, text/plain)
2012-02-08 13:05 UTC, Richard Biener
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2012-02-08 12:37:44 UTC
On GIMPLE we should expand memcmp/strcmp/strncmp to inline integral compares
if the result is only tested against equality with 0.

See PR43052 where we hint at that these may be the only profitable cases
to inline for memcmp at least.
Comment 1 Richard Biener 2012-02-08 13:05:23 UTC
Created attachment 26610 [details]
prototype for memcmp

Patch that handles

int cmp (char *p, char *q)
{
  char *pa = __builtin_assume_aligned (p, 4);
  char *qa = __builtin_assume_aligned (q, 4);
  if (__builtin_memcmp (pa, qa, 4) != 0)
    return 1;
  return 0;
}

from forwprops simplify_builtin_call.

Possibly we could consider vector modes / compares (for greater size) and
allow misaligned accesses on !STRICT_ALIGNMENT targets.  We can also
handle non-power-of-two sizes by using (A ^ B) & mask, but we have to
think about endianess there.
Comment 2 Jakub Jelinek 2012-02-08 13:50:18 UTC
IMHO you should also handle the case when one of the operand is STRING_CST, then you don't need to test its alignment, you compare the memory content with something read from the string at compile time.
Comment 3 Richard Biener 2012-02-08 14:37:24 UTC
For comparison against constants it might be worthwhile to still allow two
loads to improve the number of alignment cases we can handle.  Well, I'm not
sure we can expect anything more than 1-byte alignment for the most cases.
Comment 4 Andrew Pinski 2012-05-06 04:55:13 UTC
This really sounds like the same as PR 12086.
Comment 5 Richard Biener 2012-05-07 09:22:57 UTC
*** Bug 53253 has been marked as a duplicate of this bug. ***
Comment 6 Richard Biener 2013-02-19 14:50:52 UTC
Happens in insn-preds.c quite often:

    case 'Y':
      if (!strncmp (str, "Yi", 2))
        return CONSTRAINT_Yi;
      if (!strncmp (str, "Ym", 2))
        return CONSTRAINT_Ym;
      if (!strncmp (str, "Yp", 2))
        return CONSTRAINT_Yp;
      if (!strncmp (str, "Ya", 2))
        return CONSTRAINT_Ya;
...

and later calls are predicted as cold and thus not expanded inline by

(define_expand "cmpstrnsi"
  [(set (match_operand:SI 0 "register_operand" "")
        (compare:SI (match_operand:BLK 1 "general_operand" "")
                    (match_operand:BLK 2 "general_operand" "")))
   (use (match_operand 3 "general_operand" ""))
   (use (match_operand 4 "immediate_operand" ""))]
  ""
{
  rtx addr1, addr2, out, outlow, count, countreg, align;

  if (optimize_insn_for_size_p () && !TARGET_INLINE_ALL_STRINGOPS)
    FAIL;

the insn-preds.c code could also be optimized, for length 2 a simple
char equality test would be enough ...
Comment 7 Jakub Jelinek 2013-02-19 15:05:01 UTC
(In reply to comment #6)
> Happens in insn-preds.c quite often:
> 
>     case 'Y':
>       if (!strncmp (str, "Yi", 2))
>         return CONSTRAINT_Yi;
>       if (!strncmp (str, "Ym", 2))
>         return CONSTRAINT_Ym;
>       if (!strncmp (str, "Yp", 2))
>         return CONSTRAINT_Yp;
>       if (!strncmp (str, "Ya", 2))
>         return CONSTRAINT_Ya;

Well, for insn-preds.c we could also argue that the generator should emit for this
  case 'Y':
    if (str[1] == 'i') return CONSTRAINT_Yi;
    if (str[1] == 'm') return CONSTRAINT_Ym;
etc.
Comment 8 Bernd Schmidt 2016-06-03 14:21:25 UTC
Author: bernds
Date: Fri Jun  3 14:20:53 2016
New Revision: 237069

URL: https://gcc.gnu.org/viewcvs?rev=237069&root=gcc&view=rev
Log:
        PR tree-optimization/52171
        * builtins.c (expand_cmpstrn_or_cmpmem): Delete, moved elsewhere.
        (expand_builtin_memcmp): New arg RESULT_EQ.  All callers changed.
        Look for constant strings.  Move some code to emit_block_cmp_hints
        and use it.
        * builtins.def (BUILT_IN_MEMCMP_EQ): New.
        * defaults.h (COMPARE_MAX_PIECES): New macro.
        * expr.c (move_by_pieces_d, store_by_pieces_d): Remove old structs.
        (move_by_pieces_1, store_by_pieces_1, store_by_pieces_2): Remvoe.
        (clear_by_pieces_1): Don't declare.  Move definition before use.
        (can_do_by_pieces): New static function.
        (can_move_by_pieces): Use it.  Return bool.
        (by_pieces_ninsns): Renamed from move_by_pieces_ninsns.  New arg
        OP.  All callers changed.  Handle COMPARE_BY_PIECES.
        (class pieces_addr); New.
        (pieces_addr::pieces_addr, pieces_addr::decide_autoinc,
        pieces_addr::adjust, pieces_addr::increment_address,
        pieces_addr::maybe_predec, pieces_addr::maybe_postinc): New member
        functions for it.
        (class op_by_pieces_d): New.
        (op_by_pieces_d::op_by_pieces_d, op_by_pieces_d::run): New member
        functions for it.
        (class move_by_pieces_d, class compare_by_pieces_d,
        class store_by_pieces_d): New subclasses of op_by_pieces_d.
        (move_by_pieces_d::prepare_mode, move_by_pieces_d::generate,
        move_by_pieces_d::finish_endp, store_by_pieces_d::prepare_mode,
        store_by_pieces_d::generate, store_by_pieces_d::finish_endp,
        compare_by_pieces_d::generate, compare_by_pieces_d::prepare_mode,
        compare_by_pieces_d::finish_mode): New member functions.
        (compare_by_pieces, emit_block_cmp_via_cmpmem): New static
        functions.
        (expand_cmpstrn_or_cmpmem): Moved here from builtins.c.
        (emit_block_cmp_hints): New function.
        (move_by_pieces, store_by_pieces, clear_by_pieces): Rewrite to just
        use the newly defined classes.
        * expr.h (by_pieces_constfn): New typedef.
        (can_store_by_pieces, store_by_pieces): Use it in arg declarations.
        (emit_block_cmp_hints, expand_cmpstrn_or_cmpmem): Declare.
        (move_by_pieces_ninsns): Don't declare.
        (can_move_by_pieces): Change return value to bool.
        * target.def (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Update docs.
        (compare_by_pieces_branch_ratio): New hook.
        * target.h (enum by_pieces_operation): Add COMPARE_BY_PIECES.
        (by_pieces_ninsns): Declare.
        * targethooks.c (default_use_by_pieces_infrastructure_p): Handle
        COMPARE_BY_PIECES.
        (default_compare_by_pieces_branch_ratio): New function.
        * targhooks.h (default_compare_by_pieces_branch_ratio): Declare.
        * doc/tm.texi.in (STORE_MAX_PIECES, COMPARE_MAX_PIECES): Document.
        * doc/tm.texi: Regenerate.
        * tree-ssa-strlen.c: Include "builtins.h".
        (handle_builtin_memcmp): New static function.
        (strlen_optimize_stmt): Call it for BUILT_IN_MEMCMP.
        * tree.c (build_common_builtin_nodes): Create __builtin_memcmp_eq.

testsuite/
        PR tree-optimization/52171
        * gcc.dg/pr52171.c: New test.
        * gcc.target/i386/pr52171.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/pr52171.c
    trunk/gcc/testsuite/gcc.target/i386/pr52171.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c
    trunk/gcc/builtins.def
    trunk/gcc/defaults.h
    trunk/gcc/doc/tm.texi
    trunk/gcc/doc/tm.texi.in
    trunk/gcc/expr.c
    trunk/gcc/expr.h
    trunk/gcc/target.def
    trunk/gcc/target.h
    trunk/gcc/targhooks.c
    trunk/gcc/targhooks.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-strlen.c
    trunk/gcc/tree.c
Comment 9 Andrew Pinski 2016-06-04 06:07:13 UTC
Doesn't this partly solve bug 12086 also?
Comment 10 Andreas Schwab 2016-06-04 06:32:36 UTC
../../gcc/expr.c:1146:60: error: unused parameter 'mode' [-Werror=unused-parameter]
 move_by_pieces_d::generate (rtx op0, rtx op1, machine_mode mode)
                                                            ^~~~
Comment 11 Oleg Endo 2016-06-04 11:01:30 UTC
Author: olegendo
Date: Sat Jun  4 11:00:58 2016
New Revision: 237090

URL: https://gcc.gnu.org/viewcvs?rev=237090&root=gcc&view=rev
Log:
gcc/ChangeLog
	PR tree-optimization/52171
	* config/sh/sh.c (sh_use_by_pieces_infrastructure_p): Use
	by_pieces_ninsns instead of move_by_pieces_ninsns.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.c
Comment 12 Andreas Schwab 2016-06-05 19:52:59 UTC
This breaks ada bootstrap on ia64, causing bootstrap comparison failure.

Bootstrap comparison failure!
gcc/ada/sem_ch13.o differs
make[1]: *** [compare] Error 1
Comment 13 Martin Liška 2018-11-19 12:01:20 UTC
Can the bug be marked as resolved?
Comment 14 Richard Biener 2018-11-19 12:57:12 UTC
I think the gimple part isn't done fully?  Only single-char compares are inlined?
Comment 15 Alexander Monakov 2018-11-19 13:45:35 UTC
Note, unlike comment #0 seems to imply it is incorrect to "optimize"

  !strcmp(str, "abc")

to an unaligned 4-byte load from str and a compare because the pointed-to string may be shorter than 4 and the wide load might fault. Likewise for strncmp. For memcmp it's ok.
Comment 16 Jeffrey A. Law 2018-11-19 14:51:22 UTC
I think Qing did a fair amount of tuning on this work.  If we're not handling a particular case it's likely because it wasn't generally profitable.