This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PR tree-optimization/83022] malloc/memset->calloc too aggressive

From: Jeff Law <law at redhat dot com>
To: Nathan Sidwell <nathan at acm dot org>, GCC Patches <gcc-patches at gcc dot gnu dot org>
Cc: marc dot glisse at inria dot fr
Date: Fri, 17 Nov 2017 12:09:26 -0700
Subject: Re: [PR tree-optimization/83022] malloc/memset->calloc too aggressive
Authentication-results: sourceware.org; auth=none
References: <f4b5d106-8176-b7bd-709b-d435188783b0@acm.org> <c5c80857-2bf0-93df-4835-c3f75ea05786@redhat.com> <23d2990c-6b10-ec7b-e0b7-e638c0d0a62a@acm.org>

On 11/17/2017 11:57 AM, Nathan Sidwell wrote:
> On 11/17/2017 01:37 PM, Jeff Law wrote:
> 
>> ISTM the better way to drive this is to query the branch probabilities.
>> It'd probably be simpler too.  Is there some reason that's not a good
>> solution?
> 
> (a) I'd have to learn how to do that
Yea, but that's easy :-)  I think you just need to look at the edge
> 
> (b) in the case where the condition is just a null check,
> ma.cc.046t.profile_estimate  considers the memset reachable 53.47% of
> the time (see defect 83023)
Which seems like an obvious goof to me.

> 
> when the condition is 'ptr && some_bool' we think it reachable 33% of
> the time.
> 
> It's not clear to me what a sensible threshold might be. I suppose more
> realistic probabilities are 99.99% in the first case and 50% in the
> second case?
Yea, in isolation that's what I'd expect.  Jan or Martin L might be able
to provide guidance on how to fix this.  Odds are we just need a
predictor that says allocators almost always return non-null.


> 
> (c) the performance skew is presumably proportional to the size
> parameter. small size is probably swamped by the allocation effort
> itself.  A large size, the memset cost might start to dominate.
> Profiling shows that it is the kernel burning this in flushing the tlb
> during a syscall.
Agreed.  I'd come to the same conclusion as well -- it'd have to be a
combination of the branch probably and the size.  Given we don't have
any good data, I'd be willing to go with using the benchmark to guide
selection.


> My guess is that the useage pattern repeatedly allocates and frees a
> large chunk of uninitialized memory.  That ends up not being syscally at
> all.  With the change to use calloc, each of those allocations turns out
> to be a large TLB churn getting read-as-zero anonymous pages.  And
> possibly similar churn returning freed pages to the OS.
Right.  As the request gets large glibc will switch you to just getting
one or more mmap'd pages which are likely backed anonymously.  So it's
relatively cheap.  But once the memset touches all of them, well, you lose.

I guess ultimately I'm not rejecting the patch, I just think that
there's a better solution that (in theory) isn't a ton of work.  We have
to fix the predictor, query the probabilities and plug the result into
the heuristic.  If you could poke a little here it'd be appreciated.  If
it gets ugly, then I'm comfortable returning to your patch.

jeff

References:
- [PR tree-optimization/83022] malloc/memset->calloc too aggressive
  - From: Nathan Sidwell
- Re: [PR tree-optimization/83022] malloc/memset->calloc too aggressive
  - From: Jeff Law
- Re: [PR tree-optimization/83022] malloc/memset->calloc too aggressive
  - From: Nathan Sidwell

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]