This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: using scratchpads to enhance RTL-level if-conversion: revised patch


On Tue, Oct 20, 2015 at 7:43 AM, Jeff Law <law@redhat.com> wrote:
> On 10/14/2015 01:15 PM, Bernd Schmidt wrote:
>>
>> On 10/14/2015 07:43 PM, Jeff Law wrote:
>>>
>>> Obviously some pessimization relative to current code is necessary to
>>> fix some of the problems WRT thread safety and avoiding things like
>>> introducing faults in code which did not previously fault.
>>
>>
>> Huh? This patch is purely an (attempt at) optimization, not something
>> that fixes any problems.
>
> Then I must be mentally merging two things Abe has been working on.
> He's certainly had an if-converter patch that was designed to avoid
> introducing races in code that didn't previously have races.
>
> Looking back through the archives, that appears to be the case. His patches
> to avoid introducing races are for the tree-level if-converter, not the RTL
> if-converter.

Even at the tree level this wasn't the case; he just ran into a bug in
the existing converter that I've since fixed.

> Sigh, sorry for the confusion.  It's totally my fault.  Assuming Abe doesn't
> have a correctness case at all here, then I don't see any way for the code
> to go forward as-is since it's likely making things significantly worse.
>
>>
>> I can't test valgrind right now (it fails to run on my machine), but I
>> guess it could be adapted to allow stores slightly below the stack (maybe
>> warning once)? It seems like a bit of an edge case to worry about, but
>> if supporting it is critical and valgrind can't be changed to adapt to new
>> optimizations, then I think we're probably better off entirely without
>> this scratchpad transformation.
>>
>> Alternatively I can think of a few other possible approaches which
>> wouldn't require this kind of bloat:
>>   * add support for allocating space in the stack redzone. That could be
>>     interesting for the register allocator as well. It would only help
>>     x86_64, but that's a large fraction of gcc's userbase.
>>   * add support for opportunistically finding unused alignment padding
>>     in the existing stack frame. Less likely to work but would produce
>>     better results when it does.
>>   * on embedded targets we probably don't have to worry about valgrind,
>>     so do the optimal (sp - x) thing there
>>   * allocate a single global as the dummy target. It might be more
>>     expensive to load its address on some targets, though.
>>   * at least find a way to express costs for this transformation.
>>     This is difficult since you don't necessarily know yet whether the
>>     function is going to have a stack frame. Hence, IMO this approach is
>>     flawed. (You'll still want cost estimates even when not allocating
>>     stuff in the normal stack frame, because the generated code will
>>     still execute between two and four extra instructions.)
>
> One could argue these should all be on the table.  However, I really
> dislike using the area beyond the current stack.  I realize it's throw-away
> data, but it just seems like a bad idea to me, even on embedded targets
> that don't support valgrind.
>
>

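The scratchpad transformation debated above roughly turns a conditional store into an unconditional one whose address is redirected to a dummy "scratchpad" location when the condition is false. The following is a minimal source-level sketch in C, assuming a local variable stands in for the compiler-allocated stack slot. The function names are hypothetical, and the real transformation happens inside GCC's RTL if-converter, not in user code.

```c
#include <stdbool.h>

/* Original shape: a store guarded by a branch. */
void store_branchy(int *p, bool cond, int value)
{
    if (cond)
        *p = value;
}

/* If-converted shape (illustrative only): the store always executes.
   When COND is false it lands in a throw-away slot, so *p is never
   touched and p is never dereferenced (it may even be NULL).  */
void store_branchless(int *p, bool cond, int value)
{
    int scratch;                      /* hypothetical stand-in for the scratchpad slot */
    int *target = cond ? p : &scratch;
    *target = value;                  /* unconditional store */
}
```

Both functions have the same observable effect on `*p`. The branchless form trades the branch for an address selection plus an always-executed store, which is the kind of extra instruction cost the thread argues should be estimated before applying the transformation.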
