Bug 83022 - malloc & memset -> calloc is not always an optimization
Summary: malloc & memset -> calloc is not always an optimization
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 8.0
Importance: P3 normal
Target Milestone: ---
Assignee: Nathan Sidwell
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2017-11-16 18:40 UTC by Nathan Sidwell
Modified: 2021-08-20 18:21 UTC (History)
1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-11-17 00:00:00


Attachments
exemplar (118 bytes, text/x-csrc)
2017-11-16 18:40 UTC, Nathan Sidwell

Description Nathan Sidwell 2017-11-16 18:40:43 UTC
Created attachment 42623 [details]
exemplar

We like to optimize malloc followed by memset into a calloc call, even when the memset is conditional.  That is well formed, but it pessimizes the code, noticeably so when the size is large and we end up doing unnecessary clearing.

The attached example, compiled on x86_64 with -O results in:
_Z1mmb:
	movl	$1, %esi
	jmp	calloc

But it causes a noticeable performance regression when 'c' is false sufficiently often and 's' is sufficiently large.
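The attachment itself is not inlined in this report. Judging from the mangled name _Z1mmb, which demangles to m(unsigned long, bool), the exemplar is presumably close to the following sketch (my reconstruction, not the actual attachment):

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical reconstruction of the exemplar: the memset is guarded by
// a flag that says nothing about the malloc result, yet GCC still folds
// the malloc/memset pair into a single tail call to calloc.
void *m(unsigned long s, bool c)
{
    void *p = std::malloc(s);
    if (c)                    // 'c' is unrelated to p being non-null
        std::memset(p, 0, s);
    return p;
}
```

When 'c' is usually false, the substituted calloc clears 's' bytes that the original code would never have touched.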
Comment 1 Marc Glisse 2017-11-16 21:24:29 UTC
I am pretty sure this was discussed when the patch was reviewed. IIRC the original patch specifically pattern-matched if (p != 0) (with p the result of malloc) as the only acceptable condition between malloc and memset for this transformation, but the move to the strlen pass made that inconvenient, and it was decided that always doing the transformation was ok (I hope I am not rewriting history). If there is a way to check the probability of reaching the call to memset from the call to malloc (preferably conditional on malloc having returned something != 0), that could be checked before generating calloc, but it may not be easy... (compare the local counts of the two BBs? Those are 10000 and 3300 here, but I see you have already filed PR 83023 to improve that.)

Did you actually hit a measurable slowdown in a real application?
Comment 2 Nathan Sidwell 2017-11-16 22:10:06 UTC
Yes, this is a measurable degradation going from gcc 4.9 to 5.0 with the MyRocks DB: apparently 25% more queries/sec with gcc 4.9 (I think that's from a benchmark).  Profiling (by others) has fingered this optimization.

I am in the process of implementing a check to see if the only condition between the malloc and memset is 'ptr != 0'.

I noticed the crazy default probabilities while investigating this.  I think that's really a second-order problem.
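The pattern such a check would still accept looks like the sketch below ('zalloc' is an illustrative name, not from any patch): the only condition between the malloc and the memset is the null test on the pointer itself, so every successful allocation gets cleared and substituting calloc never adds work.

```c
#include <stdlib.h>
#include <string.h>

/* The guard here is exactly 'p != 0', the one shape Comment 2 proposes
 * to keep accepting: the memset runs on every successful allocation,
 * so malloc+memset -> calloc is a pure win for this function. */
void *zalloc(size_t s)
{
    void *p = malloc(s);
    if (p != 0)
        memset(p, 0, s);
    return p;
}
```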
Comment 3 Nathan Sidwell 2017-11-17 12:26:33 UTC
Working on a patch.  This also occurred in mysql 5.6; the 5.7 release refactored the code to avoid the problem.
Comment 4 Marc Glisse 2017-11-17 13:04:19 UTC
(In reply to Nathan Sidwell from comment #2)
> I noticed the crazy default probabilities during investigating this.  I
> think that's really a second-order problem.

It's just that comparing the frequencies (or whatever they are called now) could be an easier and less restrictive check than exact CFG matching.
Comment 5 Fangrui Song 2021-08-20 18:20:13 UTC
void *my_malloc(size_t size, int my_flags)
{
  void* point = malloc(size);
  if (my_flags & 32) memset(point, 0, size);
  return point;
}
=>
my_malloc(unsigned long, int):
        mov     esi, 1
        jmp     calloc

---

If GCC ever supports -fsanitize=memory, note that this transformation should be disabled under it as well, so as not to lose error checking.
Comment 6 Fangrui Song 2021-08-20 18:21:32 UTC
BTW, this issue did manage to waste some MySQL developers' time:
http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html