Bug 56888 - memcpy implementation optimized as a call to memcpy
Summary: memcpy implementation optimized as a call to memcpy
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: ---
Assignee: Richard Biener
URL:
Keywords:
: 60998 70798 82845 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-04-08 23:40 UTC by Max Reitz
Modified: 2017-11-05 20:51 UTC (History)
24 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-04-09 00:00:00


Attachments
Naive memcpy implementation (147 bytes, application/octet-stream)
2013-04-08 23:40 UTC, Max Reitz
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Max Reitz 2013-04-08 23:40:53 UTC
Created attachment 29833 [details]
Naive memcpy implementation

Compiling the attached trivial memcpy implementation with -O3 -ffreestanding -fno-builtin -nodefaultlibs -nostdlib yields a memcpy which calls itself. Although the man page explicitly supports this behavior (“The compiler may generate calls to "memcmp", "memset", "memcpy" and "memmove".”), I find it hard to write a working compiler-optimized memcpy under these circumstances.
Using -O2 instead of -O3 “fixes” this.

Everything worked for me up until GCC 4.7.3, I'm using Arch Linux, my GCC has been configured with: /build/src/gcc-4.8.0/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --enable-gnu-unique-object --enable-linker-build-id --enable-cloog-backend=isl --disable-cloog-version-check --enable-lto --enable-gold --enable-ld=default --enable-plugin --with-plugin-ld=ld.gold --with-linker-hash-style=gnu --disable-install-libiberty --enable-multilib --disable-libssp --disable-werror --enable-checking=release.
Comment 1 Mikael Pettersson 2013-04-09 07:52:07 UTC
I can reproduce the problem on x86-64 Linux with 4.8-20130404.  This issue would be fatal for one of my projects which includes an embedded libc.
Comment 2 Mikael Pettersson 2013-04-09 09:59:20 UTC
Started with Richard Biener's http://gcc.gnu.org/r188261 aka PR53081 fix, which added or improved memcpy recognition.  I'm guess the new code fails to check for whatever option is supposed to disable this sort of transformation.
Comment 3 Jakub Jelinek 2013-04-09 10:01:31 UTC
Just add -fno-tree-loop-distribute-patterns to the already long list of options you need for compilation of certain routines in your C library.
Comment 4 Richard Biener 2013-04-09 10:01:48 UTC
I will have a look.
Comment 5 Max Reitz 2013-04-09 13:02:19 UTC
(In reply to comment #3)
> Just add -fno-tree-loop-distribute-patterns to the already long list of options
> you need for compilation of certain routines in your C library.

This works for me, however, I don't see this parameter documented in gcc's manpage (which might prove helpful).
Comment 6 rguenther@suse.de 2013-04-09 13:17:10 UTC
On Tue, 9 Apr 2013, xanclic at gmail dot com wrote:

> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
> 
> --- Comment #5 from Max Reitz <xanclic at gmail dot com> 2013-04-09 13:02:19 UTC ---
> (In reply to comment #3)
> > Just add -fno-tree-loop-distribute-patterns to the already long list of options
> > you need for compilation of certain routines in your C library.
> 
> This works for me, however, I don't see this parameter documented in gcc's
> manpage (which might prove helpful).

It is documented in it's "positive" form, -ftree-loop-distribute-patterns
Comment 7 Max Reitz 2013-04-09 13:20:06 UTC
(In reply to comment #6)
> On Tue, 9 Apr 2013, xanclic at gmail dot com wrote:
> 
> > 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
> > 
> > --- Comment #5 from Max Reitz <xanclic at gmail dot com> 2013-04-09 13:02:19 UTC ---
> > (In reply to comment #3)
> > > Just add -fno-tree-loop-distribute-patterns to the already long list of options
> > > you need for compilation of certain routines in your C library.
> > 
> > This works for me, however, I don't see this parameter documented in gcc's
> > manpage (which might prove helpful).
> 
> It is documented in it's "positive" form, -ftree-loop-distribute-patterns

Oh, now that's embarrassing…

Sorry :-/

Well then, this seems to be exactly the thing I've been looking for. Thanks!
Comment 8 Richard Biener 2013-04-11 11:29:39 UTC
-fno-builtin-XXX does not prevent GCC from emitting calls to XXX.  It only
makes GCC not assume anything about existing calls to XXX.

For example to avoid transforming printf to puts in

extern int printf(const char *, ...);
int main()
{
  printf ("Hello World\n");
  return 0;
}

it does not work to specify -fno-builtin-puts, but instead you need to
provide -fno-builtin-printf.

Note that -fno-builtin only prevents the C family parsers from recognizing
XXX as builtin decls.  The fact that -fno-builtin was specified or not
cannot be queried in any way from the middle-end.

I consider the inability to specify this to the GCC middle-end as bug
but I am not going to work on it.  The requirement to be able to
generate calls to memset. memcpy and memmove is deep-rooted into
code-expansion as well for aggregate init and assignment.
Comment 9 Jeff Cook 2013-06-23 00:01:41 UTC
FYI this issue affects WINE. A workaround has been contributed as a modification to WINE's configuration scripts. See http://bugs.winehq.org/show_bug.cgi?id=33521 .
Comment 10 Brooks Moses 2013-07-17 01:55:48 UTC
FWIW, this issue also affected GLIBC.  Pointer to discussion, along with fixes, here:
http://sourceware.org/ml/libc-alpha/2013-07/msg00306.html

It seems to me -- based on my own experience, as well as Max's -- that the -ftree-distribute-patterns documentation could be notably improved.  In my case, I read it clearly and understood it to mean that it was only responsible for the loop-distribution portion of the rearrangement in the code examples, and that the replacement of a loop by a memcpy call was some other optimization pass.

Other than the documentation issues, this seems like a non-bug.
Comment 11 Paulo J. Matos 2013-07-17 07:50:50 UTC
(In reply to Brooks Moses from comment #10)
> Other than the documentation issues, this seems like a non-bug.

A non-bug? If you write a memcpy function by hand and call it memcpy, gcc replaces the function body by a call to memcpy which generates an infinite loop. How come it's a non-bug?
Comment 12 Brooks Moses 2013-07-17 17:08:43 UTC
(In reply to Paulo J. Matos from comment #11)
> A non-bug? If you write a memcpy function by hand and call it memcpy, gcc
> replaces the function body by a call to memcpy which generates an infinite
> loop. How come it's a non-bug?

Because if you do that you're invoking undefined behavior.  There's already a memcpy function in the standard library, so naming your own function memcpy violates the one-definition-per-function rule.  Even if it "worked", naming your own function memcpy would likely break other standard library functions that call the "real" memcpy.

Now, if this replacement still happens when you compile with -nostdlib, that would be a bug since it becomes legal code in that case.  But that's somewhat of a separate issue and should be filed separately if it happens.  (We should arguably also have a test for it, if we don't already.)
Comment 13 Max Reitz 2013-07-17 17:26:05 UTC
(In reply to Brooks Moses from comment #12)
> Now, if this replacement still happens when you compile with -nostdlib, that
> would be a bug since it becomes legal code in that case.  But that's
> somewhat of a separate issue and should be filed separately if it happens. 
> (We should arguably also have a test for it, if we don't already.)

Actually, that's why I filed this report in the first place. The test case you request is in fact given in my OP.
Comment 14 Andreas Schwab 2013-07-17 20:34:19 UTC
The relevant option is -ffreestanding, not -nostdlib.
Comment 15 Max Reitz 2013-07-17 20:57:33 UTC
(In reply to Andreas Schwab from comment #14)
> The relevant option is -ffreestanding, not -nostdlib.

If you're referring to me, I'll be glad to cite my OP for you :D

> Compiling the attached trivial memcpy implementation with -O3 -ffreestanding
> -fno-builtin -nodefaultlibs -nostdlib yields a memcpy which calls itself.
Comment 16 Andreas Schwab 2013-07-17 22:28:57 UTC
That's exactly what I wrote.
Comment 17 Max Reitz 2013-07-18 00:24:15 UTC
(In reply to Andreas Schwab from comment #16)
> That's exactly what I wrote.

Ah, okay, sorry I misunderstood.
Comment 18 Paulo J. Matos 2013-07-18 10:32:55 UTC
I notice(In reply to Brooks Moses from comment #12)
> 
> Now, if this replacement still happens when you compile with -nostdlib, that
> would be a bug since it becomes legal code in that case.  But that's
> somewhat of a separate issue and should be filed separately if it happens. 
> (We should arguably also have a test for it, if we don't already.)


I noticed this in the gcc testsuite with my port. File ./gcc.c-torture/execute/builtins/lib/memset.c contains an implementation of memset called memset and gcc goes into recursion when it finds this for the reasons mentioned above. This causes builtin/memset test to fail.
Comment 19 Rich Felker 2013-07-28 03:30:03 UTC
We are not presently experiencing this issue in musl libc, probably because the current C memcpy code is sufficiently overcomplicated to avoid getting detected by the optimizer as memcpy. However, I'm trying to switch to a new simpler implementation that's much faster when compiled with GCC 4.7.1 (on ARM), but hit this bug when testing on another system using GCC 4.6.1 (ARM). On the latter, even -fno-tree-loop-distribute-patterns does not make any difference. Unless there's a reliable workaround for this bug or at least a known blacklist of bad GCC versions where this bug can't be worked around, I'm afraid we're going to have to resort to generating the asm for each supported arch using a known-good GCC and including that asm in the distribution.

This is EXTREMELY frustrating.
Comment 20 Bernd Edlinger 2013-10-02 07:59:06 UTC
Just for the record:
This happens also for eCos on ARM
but only if it is compiled with -O3 and not with -O2.
We certainly need a way to tell GCC if this kind of
optimization is OK for us.
Comment 21 Richard Biener 2013-10-02 08:17:18 UTC
-fno-tree-loop-distribute-patterns is the reliable way to not transform loops
into library calls.

As of the trivial case of generating a recursion - yes, that's reasonably
easy to avoid in simple cases.  But if you consider

t1.c
----

mymemcpy_impl (...)
{
  for (...)
   ...
}

t2.c
----

memcpy ()
{
  mymemcpy_impl ()
}

then it's no longer possible to detect conservatively without severely
restricting the set of functions we can operate on.

Not sure if/how other compilers avoid the above situation (or if they
do this at all or rather use private entries into the library functions).
Comment 22 Bernd Edlinger 2013-10-02 08:48:56 UTC
(In reply to Richard Biener from comment #21)
> -fno-tree-loop-distribute-patterns is the reliable way to not transform loops
> into library calls.

Thanks!

Adding this fixed the generated code:

#pragma GCC optimize ("no-tree-loop-distribute-patterns")

BTW their memset.c looks like this:

externC void *
memset( void *s, int c, size_t n ) __attribute__ ((weak, alias("_memset")));

void *
_memset( void *s, int c, size_t n )
{
  while (...)
}
Comment 23 rguenther@suse.de 2013-10-02 08:59:30 UTC
On Wed, 2 Oct 2013, bernd.edlinger at hotmail dot de wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
> 
> --- Comment #22 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
> (In reply to Richard Biener from comment #21)
> > -fno-tree-loop-distribute-patterns is the reliable way to not transform loops
> > into library calls.
> 
> Thanks!
> 
> Adding this fixed the generated code:
> 
> #pragma GCC optimize ("no-tree-loop-distribute-patterns")
> 
> BTW their memset.c looks like this:
> 
> externC void *
> memset( void *s, int c, size_t n ) __attribute__ ((weak, alias("_memset")));
> 
> void *
> _memset( void *s, int c, size_t n )
> {
>   while (...)
> }

I suspect this is the most common form - glibc also uses aliases but
IIRC they are using global asms for them :/

The above would be still detectable with the new symbol table / alias
handling in GCC 4.9 (and maybe 4.8, I'm not sure).  So it may be
worth special-casing the "direct recursion" case as a QOI measure.
Comment 24 Janosch Rux 2014-02-17 03:48:36 UTC
When upgrading our build environment we ran into this. We worked around the way mentioned in the comments.

No Problems with: 4.6.3 
Broken with:  4.8.2
Comment 25 Richard Biener 2014-04-29 13:50:16 UTC
*** Bug 60998 has been marked as a duplicate of this bug. ***
Comment 26 Richard Biener 2014-04-29 13:57:07 UTC
(In reply to Janosch Rux from comment #24)
> When upgrading our build environment we ran into this. We worked around the
> way mentioned in the comments.
> 
> No Problems with: 4.6.3 
> Broken with:  4.8.2

-ftree-loop-distribute-patterns is on by default at -O3 since GCC 4.6, a
change from GCC 4.5 and before which needed explicit enabling of this.

More recent GCC may have become more clever in recognizing them though
(for example non-zero memset support is quite recent).
Comment 27 Richard Biener 2014-04-29 14:16:38 UTC
Ok, so looking at this again.

We don't have a cgraph node for builtin_decl_(implicit|explicit) (BUILT_IN_MEMSET).

But it seems that decl has DECL_ASSEMBLER_NAME_SET_P (not sure if set
"correctly" though).

So we can use symtab_node_for_asm (DECL_ASSEMBLER_NAME ()) and
eventually get to symtab_alias_target of that and check if it
is equal to the current function.

Index: gcc/tree-loop-distribution.c
===================================================================
--- gcc/tree-loop-distribution.c        (revision 209892)
+++ gcc/tree-loop-distribution.c        (working copy)
@@ -71,6 +71,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "gimple-pretty-print.h"
 #include "tree-vectorizer.h"
+#include "cgraph.h"
 
 
 /* A Reduced Dependence Graph (RDG) vertex representing a statement.  */
@@ -1084,6 +1085,15 @@ classify_partition (loop_p loop, struct
          || !dominated_by_p (CDI_DOMINATORS,
                              loop->latch, gimple_bb (stmt)))
        return;
+      tree fn = builtin_decl_implicit (BUILT_IN_MEMSET);
+      if (DECL_ASSEMBLER_NAME_SET_P (fn))
+       {
+         symtab_node *n1 = symtab_node_for_asm (DECL_ASSEMBLER_NAME (fn));
+         symtab_node *n2 = symtab_get_node (cfun->decl);
+         if (n1 == n2
+             || (n1->alias && symtab_alias_target (n1) == n2))
+           return;
+       }
       partition->kind = PKIND_MEMSET;
       partition->main_dr = single_store;
       partition->niter = nb_iter;


fixes the following testcase:

typedef __SIZE_TYPE__ size_t;
extern void *
memset (void *s, int c, size_t n) __attribute__ ((weak, alias("_memset")));

void *
_memset(void *s, int c, size_t n)
{
  char *q = (char *)s;
  while (n != 0)
    {
      *(q++) = c;
      n--;
    }
}

it won't fix glibc as that uses

asm(".alias ....");

for the aliases which we don't parse.

It of course also fixes the very direct recursion.  At least if
the assembler name of the builtin agrees with that of the function.

Honza, is there a more "fancy" way of doing this?
Comment 28 Rich Felker 2014-04-29 14:47:36 UTC
On Tue, Apr 29, 2014 at 02:16:38PM +0000, rguenth at gcc dot gnu.org wrote:
> Honza, is there a more "fancy" way of doing this?

The only correct way to fix this is to honor -ffreestanding and never
generate references to hosted-C functions (which include memset) when
-ffreestanding is used.
Comment 29 Richard Biener 2014-05-06 10:50:59 UTC
(In reply to Rich Felker from comment #28)
> On Tue, Apr 29, 2014 at 02:16:38PM +0000, rguenth at gcc dot gnu.org wrote:
> > Honza, is there a more "fancy" way of doing this?
> 
> The only correct way to fix this is to honor -ffreestanding and never
> generate references to hosted-C functions (which include memset) when
> -ffreestanding is used.

Done that for 4.8+ now (bah, forgot to reference the PR in the changelog
so the commits don't appear here).  But I still like to fix the obvious
wrong cases in some way.
Comment 30 Richard Biener 2014-05-06 10:52:01 UTC
Thus, from 4.8.3, 4.9.1 and 4.10.0 on -ffreestanding, -fno-hosted and -fno-builtin
will cause -ftree-loop-distribute-patterns to _not_ be enabled by default
with -O3+ (you can still enable it manually).
Comment 31 M Welinder 2014-06-06 10:40:22 UTC
Extra complication: the C library's memcpy may change errno to any non-zero
value if it so desires.  (C99 section 7.5 #5.)

That means that raw calls to memcpy (and friends) cannot be generated anywhere
where the compiler is unable to prove that the value of errno isn't used.
Extra code to store and restore errno must be emitted otherwise.
Comment 32 rguenther@suse.de 2014-06-06 11:54:18 UTC
On Fri, 6 Jun 2014, terra at gnome dot org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
> 
> M Welinder <terra at gnome dot org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |terra at gnome dot org
> 
> --- Comment #31 from M Welinder <terra at gnome dot org> ---
> Extra complication: the C library's memcpy may change errno to any non-zero
> value if it so desires.  (C99 section 7.5 #5.)

That's news to me.

> That means that raw calls to memcpy (and friends) cannot be generated anywhere
> where the compiler is unable to prove that the value of errno isn't used.

That's almost impossible.

> Extra code to store and restore errno must be emitted otherwise.

That is not possible.

Note that the compiler emits calls to memcpy for struct copies anyway,
so if there is a problem it is a long-standing one.
Comment 33 Jakub Jelinek 2014-06-06 12:19:01 UTC
Yeah, I'd say we could document that gcc doesn't support any implementations where memcpy/memmove/memset clobber errno.
Comment 34 Evan Langlois 2014-11-03 07:09:07 UTC
Grub-2.00 (grub-mkimage utility) will crash with -O3 because of this bug, using gcc 4.8.2.  GDB shows it going into an infinite loop calling memset() until it segfaults.  I added the -fno-tree-loop-distribute-patterns and it created code that doesn't barf on itself.

This is definately a bug and a pretty serious one.
Comment 35 Richard Biener 2016-02-19 14:17:48 UTC
Let's try "fixing" this finally for GCC 6.  Still waiting for Honza for comment #27 (lets put that in a symtab->equal_to (enum built_in_function) function).

Similar issue is present for malloc + memset -> calloc.
Comment 36 Richard Biener 2016-04-26 09:33:55 UTC
*** Bug 70798 has been marked as a duplicate of this bug. ***
Comment 37 Marc Glisse 2017-11-05 16:59:06 UTC
*** Bug 82845 has been marked as a duplicate of this bug. ***
Comment 38 Marc Glisse 2017-11-05 20:51:59 UTC
*** Bug 82845 has been marked as a duplicate of this bug. ***