Bug 95556

Summary: Not replacing __builtin___memcpy_chk() as documented
Product: gcc Reporter: Jeff Davis <pgsql>
Component: middle-end    Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED INVALID    
Severity: normal CC: jakub
Priority: P3    
Version: 9.2.1   
Target Milestone: ---   
Attachments: Example 2
Example 1
Example 3

Description Jeff Davis 2020-06-06 01:22:38 UTC
Created attachment 48686
Example 2

GCC's Object Size Checking doc says:

  "There are built-in functions added for many common
   string operation functions, e.g., for memcpy 
   __builtin___memcpy_chk built-in is provided. This
   built-in has an additional last argument, which is
   the number of bytes remaining in the object the dest
   argument points to or (size_t) -1 if the size is not
   known. The built-in functions are optimized into the
   normal string functions like memcpy if the last
   argument is (size_t) -1 or if it is known at compile
   time that the destination object will not be
   overflowed..."

https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html
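The two cases the quoted documentation describes can be sketched as follows. This is a minimal illustration, not one of the attached examples; the function names and struct record are hypothetical. In both calls the fourth argument is the number of bytes remaining in the destination object.

```c
#include <stddef.h>
#include <string.h> /* declares memcpy, the function the builtin is folded to */

/* Case 1: the destination size is unknown, so (size_t) -1 is passed.
   Per the documentation, GCC replaces this with an ordinary memcpy;
   how that memcpy is then expanded is a separate decision. */
void copy_unknown_size(void *dst, const void *src, size_t n)
{
    __builtin___memcpy_chk(dst, src, n, (size_t) -1);
}

/* Hypothetical record type with a 16-byte field. */
struct record {
    char name[16];
};

/* Case 2: the destination size (16) is known and the constant length
   (8) provably fits, so no run-time check is needed either. */
void set_name(struct record *r, const char *src)
{
    __builtin___memcpy_chk(r->name, src, 8, sizeof r->name);
}
```

Neither call should leave a __memcpy_chk call in the generated code; what remains is a plain memcpy, which GCC may still emit as a library call or expand inline.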

In the attached example1.c, __builtin___memcpy_chk() is optimized into the normal memcpy(), as expected.

But in the slightly different example2.c, it is not, even though the object size argument is (size_t) -1.

When the checked version is left in place (as in example2.c), it causes a significant regression in my case.

This is important because Ubuntu 18.04 uses _FORTIFY_SOURCE, which ends up turning memcpy calls into __builtin___memcpy_chk(). If gcc arbitrarily leaves the checked call in place when, according to the docs, it should be optimized away, that affects a lot of code.
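A rough sketch of where the last argument comes from under _FORTIFY_SOURCE: glibc's fortified memcpy passes __builtin_object_size(dest, 0), which evaluates to (size_t) -1 when GCC cannot determine the destination size. The FORTIFIED_MEMCPY macro and both functions below are hypothetical and only loosely modeled on that header; the real glibc code differs in detail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a fortified memcpy: the object size is
   computed by GCC, or is (size_t) -1 if it cannot be determined. */
#define FORTIFIED_MEMCPY(dst, src, n) \
    __builtin___memcpy_chk((dst), (src), (n), __builtin_object_size((dst), 0))

/* dst is an opaque pointer, so unless this function is inlined into a
   caller that knows the buffer, its object size is (size_t) -1. */
void append_bytes(char *dst, const char *src, size_t n)
{
    FORTIFIED_MEMCPY(dst, src, n);
}

/* Here the destination is a local array, so its size (32) is known.
   Whether GCC can drop the run-time check depends on what it can prove
   about n; with glibc, a checked call with n > 32 would abort. */
size_t copy_into_local(const char *src, size_t n, char *out)
{
    char buf[32];
    if (n > sizeof buf)
        n = sizeof buf;
    FORTIFIED_MEMCPY(buf, src, n);
    memcpy(out, buf, n);
    return n;
}
```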

I'm seeing this on Ubuntu 18.04 with both:

  gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  gcc-9 (Ubuntu 9.2.1-19ubuntu1~18.04.york0) 9.2.1 20191109

It happens with or without -fno-builtin-memcpy (which is not a surprise, since I am directly calling the builtin version anyway).

Compiled using:
  gcc-9 -O2 -c -S -o example1.S example1.c
  gcc-9 -O2 -c -S -o example2.S example2.c

example1.S:50:
        call    memcpy@PLT

example2.S:75:
        rep movsq
Comment 1 Jeff Davis 2020-06-06 01:23:29 UTC
Created attachment 48687
Example 1
Comment 2 Jeff Davis 2020-06-06 01:50:01 UTC
Created attachment 48688
Example 3

Another example that works (i.e., the builtin is properly replaced by memcpy as described in the documentation).

The only difference between this working example and the failing example2.c is that I replaced the sizeof() with a constant.
Comment 3 Jeff Davis 2020-06-06 01:53:03 UTC
Original larger case was discovered in PostgreSQL:

https://www.postgresql.org/message-id/99b2eab335c1592c925d8143979c8e9e81e1575f.camel@j-davis.com
Comment 4 Jakub Jelinek 2020-06-06 09:42:31 UTC
It is unclear what you are complaining about.

for i in gcc-7 gcc-8 gcc-9 gcc-10 gcc; do echo $i; for j in 1 2 3; do /usr/src/$i/obj/gcc/cc1 -quiet -O2 pr95556-$j.c; done; grep 'memcpy\|rep.movs' pr95556-*.s; done
gcc-7
pr95556-1.s:	rep movsq
pr95556-2.s:	call	memcpy
pr95556-3.s:	call	memcpy
gcc-8
pr95556-1.s:	rep movsq
pr95556-2.s:	call	memcpy
pr95556-3.s:	call	memcpy
gcc-9
pr95556-1.s:	rep movsq
pr95556-2.s:	call	memcpy
pr95556-3.s:	call	memcpy
gcc-10
pr95556-1.s:	rep movsq
pr95556-2.s:	rep movsq
pr95556-3.s:	call	memcpy
gcc
pr95556-1.s:	rep movsq
pr95556-2.s:	rep movsq
pr95556-3.s:	call	memcpy

There are no __memcpy_chk calls, which means GCC did in all cases what is documented: it replaced the __builtin___memcpy_chk calls with the corresponding __builtin_memcpy calls and handled those as usual. That isn't always a library call; there are many different ways a builtin memcpy can be expanded, and one can fine-tune that through various command-line options. It depends on what CPU the code is tuned for, whether it is considered hot or cold code, whether the size is constant (and which constant) or variable, and what alignment guarantees the destination and source have.
Comment 5 Jakub Jelinek 2020-06-06 09:55:34 UTC
And note that
-      if (lt->pos >= (8192-sizeof(S)))
+      if (lt->pos >= (8192-16))
is not an insignificant change: the first is an unsigned comparison, the second a signed one.
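The difference can be shown in isolation. The attachments are not reproduced here, so this sketch assumes S is a 16-byte struct and pos is a signed int; struct lt_state and the two function names are hypothetical. Because sizeof yields a size_t, 8192 - sizeof(S) is unsigned and pos is converted to size_t before the comparison, while 8192 - 16 is an int and the comparison stays signed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed shapes: S is 16 bytes, pos is a signed int. */
typedef struct { char bytes[16]; } S;
struct lt_state { int pos; };

/* Unsigned comparison: a negative pos converts to a huge size_t value. */
bool full_unsigned(const struct lt_state *lt)
{
    return lt->pos >= (8192 - sizeof(S));
}

/* Signed comparison: a negative pos stays negative. */
bool full_signed(const struct lt_state *lt)
{
    return lt->pos >= (8192 - 16);
}
```

The two functions agree for 0 <= pos, but differ for pos == -1: the unsigned form returns true and the signed form returns false. Conversely, when the unsigned test is false, the compiler knows 0 <= pos < 8176, which the signed test does not tell it; that kind of difference can plausibly change how later code using pos is compiled.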
Comment 6 Jakub Jelinek 2020-06-06 09:59:03 UTC
See -mno-align-stringops, -minline-all-stringops, -minline-stringops-dynamically, -mstringop-strategy= , -mmemcpy-strategy= options and their documentation in the GCC manual.
Comment 7 Jeff Davis 2020-06-06 17:09:33 UTC
"...built-in functions are optimized into the normal string functions like memcpy if the last argument is (size_t) -1..."

My reading of the documentation led me to believe that a last argument of -1 *would* be a normal library call. And certainly should be with -fno-builtin-memcpy, right?

If that's not what's happening, should the document be clarified?
Comment 8 Jakub Jelinek 2020-06-06 17:18:38 UTC
(In reply to Jeff Davis from comment #7)
> "...built-in functions are optimized into the normal string functions like
> memcpy if the last argument is (size_t) -1..."
> 
> My reading of the documentation led me to believe that a last argument of -1
> *would* be a normal library call. And certainly should be with
> -fno-builtin-memcpy, right?

No. -fno-builtin-memcpy only disables the special behavior for calls to memcpy itself; a call to __builtin_memcpy always behaves as a builtin. And you are using __builtin___memcpy_chk, which is also a builtin and thus not affected by -fno-builtin*.
You can use -fno-builtin-__memcpy_chk but then you'll get __memcpy_chk calls if you call it that way.
As I wrote, if for whatever reason you want the library call, e.g. always, you can just use -mmemcpy-strategy=libcall:-1:1 or so, but then even very small copies will not be done inline, which is not really beneficial.
Comment 9 Jeff Davis 2020-06-07 15:31:54 UTC
I still feel like the documentation is misleading on this point.

Regardless, it doesn't seem like you think there is any bug here, so go ahead and close.
Comment 10 Jakub Jelinek 2020-06-07 15:37:34 UTC
.