This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)

From: Martin Sebor <msebor at gmail dot com>
To: Qing Zhao <qing dot zhao at oracle dot com>, gcc at gcc dot gnu dot org
Cc: Wilco Dijkstra <wilco dot dijkstra at arm dot com>, Richard Biener <rguenther at suse dot de>, jakub at redhat dot com, rearnsha at gcc dot gnu dot org, hubicka at ucw dot cz
Date: Thu, 16 Nov 2017 16:55:54 -0700
Subject: Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)
Authentication-results: sourceware.org; auth=none
References: <BE7FD4A1-01CD-4A89-A393-439FD43735B3@oracle.com>

On 11/03/2017 08:59 AM, Qing Zhao wrote:

Hi,

This is the first time I am asking for a design review for fixing a GCC enhancement request, Let me know if I need to send this email to other mailing list as well.

I have been studying PR 78809 for some time
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809>

with a lot of help from Wilco and other people, and detailed study about the previous discussion and current GCC behavior, I was able to come up with the following writeup
(basically serve as a design doc), and ready for implementation.

Please take a look at it, and let me know any comments and suggestions:

thanks a lot.

Qing

========================================
str(n)cmp and memcmp optimization in gcc
-- A design document for PR78809

11/01/2017

Qing Zhao
===========================================================

0. Summary:

   For PR 78809 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809,
   Will add the following str(n)cmp and memcmp optimizations into GCC 8:

   A. for strncmp (s1, s2, n)
      if one of "s1" or "s2" is a constant string, "n" is a constant, and larger than the length of the constant string:
      change strncmp (s1, s2, n) to strcmp (s1, s2);


Here and I think in some (all?) the other cases below, N doesn't
strictly have to be constant but can be in a range whose lower
bound is greater than the string length.

FWIW, I also recently noticed bug 82950 in a related area.


   B. for strncmp (s1, s2, n) (!)= 0 or strcmp (s1, s2) (!)= 0
      if the result is ONLY used to do a simple equality test against zero, one of "s1" or "s2" is a small constant string, n is a constant, and the other non-constant string is guaranteed to not read beyond the end of the string:
      change strncmp (s1, s2, n) or strcmp (s1, s2) to corresponding memcmp (s1, s2, n);


It's probably not important but I'm not sure I understand what you
mean by

  "the other non-constant string is guaranteed to not read beyond
  the end of the string."

      (NOTE, currently, memcmp(s1, s2, N) (!)=0 has been optimized to a simple sequence to access all bytes and accumulate the overall result in GCC by
       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171
      )
      as a result, such str(n)cmp call would like to be replaced by simple sequences to access all types and accumulate the overall results.


Thinking about this case made me realize there's another opportunity
here that could be exploited in tree-ssa-strlen even for non-constant
strings by relying on its knowledge of string lengths.   I opened bug
83026 with the details.


   C. for strcmp (s1, s2), strncmp (s1, s2, n), and memcmp (s1, s2, n)
      if the result is NOT used to do simple equality test against zero, one of "s1" or "s2" is a small constant string, n is a constant, and the Min value of the length of the constant string and "n" is smaller than a predefined threshold T,
      inline the call by a byte-to-byte comparision sequence to avoid calling overhead.


The str{,n}cmp and memcmp cases must be treated differently because
the former compares up to N bytes while the latter exactly N bytes.
With that, I'm wondering if the precondition "one of "s1" or "s2"
is a small constant string" is necessary for memcmp.  I.e., why
not inline it regardless of whether s1 or s2 are constant?


   A would like to be put in gimple fold phase (in routine "gimple_fold_builtin_string_compare" of gimple-fold.c)


Except for constant strings gimple-fold doesn't know about string
lengths so it will only be able to do so much.  I'm wondering what
the benefits are of doing the transformation there instead of just
in tree-ssa-strlen (along with B).

   B would like to be put in strlen phase (add new "handle_builtin_str(n)cmp" routines in tree-ssa-strlen.c)
   C would like to be put in expand phase (tree-ssa-strlen.c or builtins.c): run-time performance testing is needed to decide the predefined threshold T.

   The reasons to put the optimizations in the above order are:

   * A needs very simple information from source level, and A should be BEFORE B to simplify algorithm in B, gimple-fold phase should be a good place to do it. (Currently, similar transformation such as replacing strncat with strcat is put in this phase);


I'm not sure doing (A) in the folder will simplify the likely much
more involved implementation in tree-ssa-strlen.  That said, other
that making the folder more complex, I can't think of any downside
of folding strncmp to strcmp there either (although there certainly
are downsides with folding other builtin calls where doing so
defeats possible strlen optimizations, such as pr81704 and pr81703).

Thanks for the writeup!

Martin

   * B needs type information (size and alignment) for the safety checking, and B should be BEFORE expand phase to utilize the available memcmp optimization in expand phase, strlen phase is a good place to do it.
   * C would like to replace the call to byte-to-byte comparision, since we want some high-level optimization been applied first on the calls, the inlining of the call might be better to be done in the late stage of the tree optimization stage.  expand phase is a good place to do it.

   These 3 optimization can be implemented seperated, 3 patches might be needed.

the following are details to explain the above.

1. some background:

   #include <string.h>
   int strcmp(const char *s1, const char *s2);
   int strncmp(const char *s1, const char *s2, size_t n);
   int memcmp(const void *s1, const void *s2, size_t n);

   • strcmp compares null-terminated C strings
   • strncmp compares at most N characters of null-terminated C strings
   • memcmp compares binary byte buffers of N bytes.

The major common part among these three is:

   * they all return an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.

The major different part among these three is:

   * both strcmp and strncmp might early stop at NULL terminator of the compared strings. but memcmp will NOT early stop, it means to compare exactly N bytes of both buffers.
   * strcmp compare the whole string, but strncmp only compare the first n chars (or fewer, if the string ends sooner) of the string.

So, when optimizing memcmp and str(n)cmp, we need to consider the following:

   * The compiler can compare multiple bytes at the same time and doesn't have to worry about beyond the end of a string for memcmp, but have to worry about read beyond the end of a string for str(n)cmp when comparing multiple bytes at the same time because it might early stop at Null terminator.
   * when the "n" of strncmp is sufficiently large, strncmp means to compare the whole strings, its behavior is same as strcmp. however, strncmp is likely to be moderately slower because it has to keep track of a counter. therefore, using strcmp to replace strncmp when "n" is big enough might be better for the performance.


2. Optimization:

   Three possible optimization applied on calls to str(n)cmp and memcmp:

   A.  When the "n" of strncmp is larger than the length of constant string "s1" or "s2", it can be replaced with corresponding strcmp.

   B.  When the result of the str(n)cmp or memcmp is ONLY used to do a simple equality test against zero, the compiler is allowed to generate a simpler sequence to access all bytes and accumulate the overall result when guarantee that no read beyond the end of a string.

   C.  When the result of the str(n)cmp or memcmp is NOT used to do simple equality test against zero, if one of the s1 or s2 is a small constant string, it might be better for the compiler to inline the call by a byte-to-byte comparision sequence to avoid calling overhead.

3. The current situation of GCC:

   For 2.A strncmp is replaced by strcmp when "n" is large enough:
    * strncmp(p, "fishi", 6)
    * __builtin_strncmp("fishi", q, 6)

   GLIBC used to replace the strncmp with strcmp when "n" is large enough, but this optimization was removed in:
   https://sourceware.org/git/?p=glibc.git;a=commit;h=f7db120f67d853e0cfa2

   The reasons to delete it from glibc and move to GCC are:
      ** when doing it in glibc, the user cannot disable it by using -fno-builtin-strcmp/strncmp;
      ** __builtin_strncmp cannot be replaced similarly;
      ** when compile C++ or not using glibc, the transformation is not applied.

   For 2.B when the result is used to do equality test against zero:

    * memcmp (!)= 0
      is currently optimized by GCC to use a simple sequcence to access all bytes and accumulate the overall result.

      all the following patterns are optimized:
      memcmp (pa, qa, 5) != 0
      __builtin_memcmp (pa, qa, 29) != 0
      memcmp (p, "fishi", 5) != 0
      __builtin_memcmp (p, "fishiiii", 8) != 0

    * strncmp (!)= 0
      the following pattersn are optimized ONLY when size is 1
      strncmp (pa, qa, 1) != 0
      strncmp (p, "f", 1) != 0
      __builtin_strncmp (pa, qa, 1) != 0
      __builtin_strncmp (p, "f", 1) != 0

    * strcmp (!)== 0
      the following patterns is optimized Only when size is 0 (by gimple)
      __builtin_strcmp (p, "") != 0

      the following patterns is optimized when size <= 3 (by glibc), this is for 2.B.
      strcmp (p, "abc") != 0

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171

   For 2.C when the result is Not used to do equality test against zero, but one of the s1 or s2 is small constant string, it might be better to inline the call by a byte-to-byte comparision sequence to avoid calling overhead.

    * memcmp
      is currently optimized by glibc ONLY when size is 1;

    * strncmp
      is currently optimized by gimple-fold phase ONLY when size is 1;

    * strcmp
      was inlined by byte-to-byte comparison by glibc when size <= 3, but this functionality was removed from the recent Glibc in:
      https://sourceware.org/git/?p=glibc.git;a=commit;h=f7db120f67d853e0cfa2

      the reasons to delete it from glibc and move to GCC are:
      ** when doing it in glibc, the user cannot disable it by using -fno-builtin-strcmp/strncmp;
      ** __builtin_str(n)cmp cannot be expanded similarly;
      ** when compile C++ or not using glibc, the expansion is not applied.

4. Missing optimization for str(n)cmp and memcmp:

   For 2.A, we miss the whole optimization in GCC.

   For 2.B, we miss the following optimization:

   str(n)cmp compare with constant string, when the result is used to do equality test against zero, if satisfy safety checking to read beyond the end of the string, we can transform them to corresponding memcmp, which has been optimized well in GCC.

   For 2.C, we miss the whole optimization in GCC.

5. Implementation:
    * for 2.A strncmp is replaced by strcmp when "n" is large enough

      strncmp (s, STR, C) in which, s is a pointer to a string, STR is a constant string, C is a constant.

      if (strlen(STR) < C)
      {
        it can be safely transformed to
        strcmp (s, STR)
      }

    * for 2.B str(n)cmp (!)= 0 is replaced by memcmp (!)= 0 when safety checking is passed.

      strcmp (s, STR) (!)= 0  in which, s is a pointer to a string, STR is a constant string.

      if  (max(sizeof_array(s), alignments(s)) > strlen(STR))
      {
        it can be safely transformed to
        memcmp (s, STR, strlen(STR)+1) (!)= 0
      }

      strncmp (s, STR, C) (!)= 0  in which, s is a pointer to a string, STR is a constant string, C is a constant.

      if (C <= strlen(STR) && max(sizeof_array(s), alignments(s)) > C)
      {
        it can be safely transformed to
        memcmp (s, STR, C) (!)= 0
      }

      (when C > strlen(STR), the strncmp has been replaced with corresponding strcmp already).

    * for 2.C strcmp is expanded when n is small than threshold T.

      strcmp (s, STR) in which, s is a pointer to a string, STR is a constant string

      if ((n = strlen(STR)) <= T)   /* T might be 3, we can do experiment to decide it */
      {
        result = s[0] - STR[0];
        if (result == 0)
        {
          result = s[1] - STR[1];
          ...
          ...
          if (result == 0)
            result = s[n];
        }
      }

      strncmp (s, STR, C) in which, s is a pointer to a string, STR is a constant string, C is a constant.

      if ((C <= strlen(STR) && C <= T)  /* T might be 3, do experiments to decide it */
      {
        result = s[0] - STR[0];
        if (result == 0)
        {
          result = s[1] - STR[1];
          ...
          ...
          if (result == 0)
            result = s[C];
        }
      }

      memcmp (s, STR, C) in which, s is a pointer to a string, STR is a constant string, C is a constant
      similar as strncmp.

Follow-Ups:
- Re: Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)
  - From: Qing Zhao

References:
- Please review writeup for fixing PR 78809 (inline strcmp for small constant strings)
  - From: Qing Zhao

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]