Bug 49872 - Missed optimization: Could coalesce neighboring memsets
Summary: Missed optimization: Could coalesce neighboring memsets
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.7.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Duplicates: 86017 114797
Depends on:
Blocks:
 
Reported: 2011-07-27 16:51 UTC by Steinar H. Gunderson
Modified: 2024-04-21 19:26 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-11-09 00:00:00


Description Steinar H. Gunderson 2011-07-27 16:51:03 UTC
Given the following code:

#include <string.h>

struct S {
	int f[1024];
	int g[1024];
};

void func(struct S* s)
{
	memset(s->f, 0, sizeof(s->f));
	memset(s->g, 0, sizeof(s->g));
}

GCC currently generates two memsets. The code with -O2 is a bit hard to read, so I'm just pasting the -Os assembly for clarity:

00000000 <func>:
   0:	55                   	push   %ebp
   1:	31 c0                	xor    %eax,%eax
   3:	89 e5                	mov    %esp,%ebp
   5:	b9 00 04 00 00       	mov    $0x400,%ecx
   a:	57                   	push   %edi
   b:	8b 7d 08             	mov    0x8(%ebp),%edi
   e:	f3 ab                	rep stos %eax,%es:(%edi)
  10:	8b 55 08             	mov    0x8(%ebp),%edx
  13:	66 b9 00 04          	mov    $0x400,%cx
  17:	81 c2 00 10 00 00    	add    $0x1000,%edx
  1d:	89 d7                	mov    %edx,%edi
  1f:	f3 ab                	rep stos %eax,%es:(%edi)
  21:	5f                   	pop    %edi
  22:	5d                   	pop    %ebp
  23:	c3                   	ret    

Ideally GCC should also be able to coalesce this with zeroing stores that are not written as memset calls, e.g. s->g[0] = 0;.
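As a minimal sketch of the requested transformation (not GCC's implementation), the two calls can be compared against a single call spanning both arrays. The names clear_separately, clear_coalesced and variants_agree are hypothetical, and the merge is assumed to be valid only because the static assertion confirms that g starts exactly where f ends:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct S {
	int f[1024];
	int g[1024];
};

/* Assumption: the merge is only valid when g starts exactly where f ends. */
_Static_assert(offsetof(struct S, g) ==
               offsetof(struct S, f) + sizeof(((struct S *)0)->f),
               "f and g must be adjacent");

/* What func() in the report does: two separate calls. */
void clear_separately(struct S *s)
{
	memset(s->f, 0, sizeof(s->f));
	memset(s->g, 0, sizeof(s->g));
}

/* Hypothetical coalesced form: one call covering both arrays. */
void clear_coalesced(struct S *s)
{
	memset((char *)s + offsetof(struct S, f), 0,
	       sizeof(s->f) + sizeof(s->g));
}

/* Returns 1 if both variants leave identical bytes behind. */
int variants_agree(void)
{
	static struct S a, b;

	memset(&a, 0xff, sizeof a);
	memset(&b, 0xff, sizeof b);
	clear_separately(&a);
	clear_coalesced(&b);
	return memcmp(&a, &b, sizeof a) == 0;
}
```

The coalesced variant addresses the range through the struct pointer rather than through s->f, so the length passed to memset never exceeds the object the pointer was derived from.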
Comment 1 Richard Biener 2011-07-28 10:00:34 UTC
Confirmed.

It should not be hard to implement this, but did you see this in real-world code
(to assess the importance of this optimization)?

Other cases would include (partially) overlapping memsets (or related
functions such as memory copying routines and the respective string variants).
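To illustrate the partially overlapping case, here is a small sketch with hypothetical function names and arbitrarily chosen offsets and lengths. Two memsets that write the same value over overlapping ranges have the same effect as one memset over the union of the ranges:

```c
#include <assert.h>
#include <string.h>

/* Two calls: bytes [0, 8) and [4, 12) overlap in [4, 8). */
void zero_overlapping(unsigned char *p)
{
	memset(p, 0, 8);
	memset(p + 4, 0, 8);
}

/* One call over the union of both ranges, [0, 12). */
void zero_union(unsigned char *p)
{
	memset(p, 0, 12);
}

/* Returns 1 if both variants leave identical bytes behind. */
int overlap_agrees(void)
{
	unsigned char a[16], b[16];

	memset(a, 0xff, sizeof a);
	memset(b, 0xff, sizeof b);
	zero_overlapping(a);
	zero_union(b);
	return memcmp(a, b, sizeof a) == 0;
}
```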

We already optimize some cases (that appeared in GCC IIRC) in tree-ssa-forwprop.c,
namely

/* *GSI_P is a GIMPLE_CALL to a builtin function.
   Optimize
   memcpy (p, "abcd", 4);
   memset (p + 4, ' ', 3);
   into
   memcpy (p, "abcd   ", 7);
   call if the latter can be stored by pieces during expansion.  */
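The equivalence that this comment describes can be checked at the source level. The sketch below uses hypothetical function names and only compares the resulting bytes; it says nothing about how forwprop decides whether the merged store can be expanded by pieces:

```c
#include <assert.h>
#include <string.h>

/* Original form: copy four characters, then pad with three spaces. */
void fill_two_calls(char *p)
{
	memcpy(p, "abcd", 4);
	memset(p + 4, ' ', 3);
}

/* Merged form: a single copy from a 7-character literal. */
void fill_one_call(char *p)
{
	memcpy(p, "abcd   ", 7);
}

/* Returns 1 if both forms write the same 7 bytes and nothing else. */
int fills_agree(void)
{
	char a[8], b[8];

	memset(a, 'x', sizeof a);
	memset(b, 'x', sizeof b);
	fill_two_calls(a);
	fill_one_call(b);
	return memcmp(a, b, sizeof a) == 0;
}
```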
Comment 2 Steinar H. Gunderson 2011-07-28 10:09:51 UTC
I'm not sure if I've seen exactly this construction in real-world code, but I've certainly seen examples of the hybrid I sketched out (looking at one was what motivated me to file the bug), i.e. something like:

struct S {
    int f[1024];
    int g;
};

void func(struct S* s)
{
    memset(s->f, 0, sizeof(s->f));
    s->g = 0;
}

which I would argue should be rewritten to

void func(struct S* s)
{
    memset(s->f, 0, sizeof(s->f) + sizeof(s->g));
}

I'd argue that programmers should not be doing this kind of optimization themselves, since it is prone to breaking when the structure changes, especially once alignment and padding come into play.
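A small sketch of why the hand-written length is fragile: struct P below is a hypothetical variant in which the compiler will typically insert padding between f and g, so sizeof(f) + sizeof(g) would stop short of the end of g. Computing the span from offsetof instead (the helper names are assumptions for illustration) covers any padding as well:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct: padding is likely between f and g. */
struct P {
	char f[3];
	int g;
};

/* Byte count from the start of f to the end of g, padding included. */
size_t span_f_to_g(void)
{
	return offsetof(struct P, g) + sizeof(((struct P *)0)->g)
	       - offsetof(struct P, f);
}

/* Zero f, g and whatever lies between them with one call. */
void clear_span(struct P *p)
{
	memset((char *)p + offsetof(struct P, f), 0, span_f_to_g());
}

/* Returns 1 if every member is zero after clear_span(). */
int span_clears_everything(void)
{
	struct P p;

	memset(&p, 0xff, sizeof p);
	clear_span(&p);
	return p.f[0] == 0 && p.f[1] == 0 && p.f[2] == 0 && p.g == 0;
}
```

A compiler performing the merge itself knows the layout exactly, which is the argument for leaving this to the optimizer rather than to the programmer.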
Comment 3 Andrew Pinski 2021-08-21 23:57:14 UTC
*** Bug 86017 has been marked as a duplicate of this bug. ***
Comment 4 Andrew Pinski 2024-04-21 19:26:47 UTC
*** Bug 114797 has been marked as a duplicate of this bug. ***