[PATCH] i386: Don't use frame pointer without stack access

Uros Bizjak ubizjak@gmail.com
Tue Aug 8 17:05:00 GMT 2017


On Tue, Aug 8, 2017 at 6:38 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>>>> When the Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer,
>>>>> this optimization removes more than 730
>>>>>
>>>>> pushq %rbp
>>>>> movq %rsp, %rbp
>>>>> popq %rbp
>>>>
>>>> If you don't want the frame pointer, why are you compiling with
>>>> -fno-omit-frame-pointer?  Are you going to add
>>>> -fforce-no-omit-frame-pointer or something similar so that people can
>>>> actually get what they are asking for?  This doesn't really make sense.
>>>> It is perfectly fine to omit frame pointer by default, when it isn't
>>>> required for something, but if the user asks for it, we shouldn't ignore his
>>>> request.
>>>>
>>>
>>>
>>> Wanting a frame pointer is very nice and desired... but if the
>>> optimizer/insn scheduler moves instructions outside of the frame'd
>>> portion (it does so for cases like the one below as well), the frame
>>> pointer is already of no value for these functions that don't use the stack.
>>>
>>> <MPIDU_Sched_are_pending@@Base>:
>>> mov    all_schedules@@Base-0x38460,%rax
>>> push   %rbp
>>> mov    %rsp,%rbp
>>> pop    %rbp
>>> cmpq   $0x0,(%rax)
>>> setne  %al
>>> movzbl %al,%eax
>>> retq
>>
>> Yeah, and it could be even weirder for big single-block functions.
>> I think GCC has been doing this kind of scheduling of prologue and
>> epilogue instructions for a while, so there hasn't really been a
>> guarantee as to which parts of the function will have a new FP and which
>> will still have the old one.
>>
>> Also, with an arbitrarily-picked host compiler (GCC 6.3.1), shrink-wrapping
>> kicks in when the following is compiled with -O3 -fno-omit-frame-pointer:
>>
>>     void f (int *);
>>     void
>>     g (int *x)
>>     {
>>       for (int i = 0; i < 1000; ++i)
>>         x[i] += 1;
>>       if (x[0])
>>         {
>>           int temp;
>>           f (&temp);
>>         }
>>     }
>>
>> so only the block with the call to f sets up FP.  The relatively
>> long-running loop runs with the caller's FP.
>>
>> I hope we can go for a target-independent position that what HJ's
>> patch does is OK...
>>
>
> In light of this, I am resubmitting my patch. I added 3 more testcases,
> and the patch now also handles:
>
> typedef int v8si __attribute__ ((vector_size (32)));
>
> void
> foo (v8si *out_start, v8si *out_end, v8si *regions)
> {
>     v8si base = regions[3];
>     *out_start = base;
>     *out_end = base;
> }
>
> OK for trunk?

I think that the patch doesn't worsen the situation with FP debugging;
a couple of cases were presented where a function operates on the caller's
frame. Let's wait a bit for counter-examples where the patch hurts
debugging. IMO, the patch is the way to go, as shrink-wrapping is more
toxic than the presented patch.

Uros.
