Created attachment 42623 [details] exemplar We like to optimize malloc followed by memset into a calloc call. Even when the memset is conditional. That's well formed, but pessimizes, and noticeable when the size is large and we do unnecessary clearing. The attached example, compiled on x86_64 with -O results in: _Z1mmb: movl $1, %esi jmp calloc But, it causes a noticeable performance regression, as 'c' is false sufficiently often and 's' is large sufficiently often.
I am pretty sure this was discussed when the patch was reviewed. IIRC the original patch was specifically pattern-matching if(p!=0) (with p the result of malloc) as the only acceptable condition between malloc and memset for this transformation, but the move to the strlen pass made that inconvenient and it was decided that always doing the transformation was ok (I hope I am not rewriting history). If there is a way to check the probability of reaching the call to memset from the call to malloc (preferably conditional to the fact that malloc returned something != 0), that could be checked before generating calloc, but that may not be easy... (compare the local count of the 2 BBs? That's 10000 and 3300 here, but I see you have already filed PR 83023 to improve it) Did you actually hit a measurable slowdown in a real application?
Yes, this is a measurable degradation in going from gcc 4.9 -> 5.0 with myrocks DB. Apparently 25% more queries/sec with gcc 4.9 (I think it's from a benchmark). Profiling (by others) has fingered this optimization. I am in the process of implementing a check to see if the only condition between the malloc and memset is 'ptr != 0'. I noticed the crazy default probabilities during investigating this. I think that's really a second-order problem.
working on a patch. This also occurred in mysql 5.6, and the 5.7 release refactored the code to avoid the problem.
(In reply to Nathan Sidwell from comment #2) > I noticed the crazy default probabilities during investigating this. I > think that's really a second-order problem. It's just that comparing the frequencies (or however they are called now) could be an easier and less restrictive check than exact CFG matching.
void *my_malloc(size_t size, int my_flags) { void* point = malloc(size); if (my_flags & 32) memset(point, 0, size); return point; } => my_malloc(unsigned long, int): mov esi, 1 jmp calloc --- If GCC supports -fsanitize=memory, note that this transformation should be disabled as well to not lose error checking.
The issue succeeded to waste some time of MySQL developers BTW: http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html