This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)


On 28.01.2016 23:17, Richard Sandiford wrote:
> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>> On 26.01.2016 22:18, Richard Sandiford wrote:
>>> [cc-ing Eric as RTL maintainer]
>>>
>>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>>>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>>>>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>>>>>> Has the patch been tested beyond just building GCC? I can do a
>>>>>> test run for you if you don't have things set up to do one yourself.
>>>>>
>>>>> I built a cross-gcc with all languages and a cross-glibc, but I have
>>>>> not set up an emulation environment, so if you could give it a test
>>>>> that would be highly welcome.
>>>>
>>>> mipsel-linux-gnu test results are the same before and after this patch.
>>>>
>>>> Please go ahead and commit.
>>>
>>> I still object to this.  And it feels like the patch was posted
>>> as though it was a new one in order to avoid answering the objections
>>> that were raised when it was last posted:
>>>
>>>     https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html
>>>
>>
>> Richard, I am really sorry when you feel now like I did not take your
>> objections seriously.  Let me first explain what happened from my point
>> of view:
>>
>> When I posted this response to your objections here:
>>
>>    https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00235.html
>>
>> I was waiting for your response, but nothing happened, so I kind of
>> forgot about the issue.  In the meantime Ubuntu and Debian began to
>> roll out GCC-6 and they got stuck at the same issue, but instead of
>> waiting for pr69012 to be eventually resolved they created a duplicate
>> pr69129, and then last Thursday Nick applied my initial patch without
>> my intervention, probably because of the pressure they put on us.
>
> Ah, I'd missed that, sorry.  It's not obvious from the email thread
> that the patch was actually approved.  I just see a message from Matthias
> saying that it worked for him and then a message from Nick saying that
> he'd applied it.
>
> Ah well.  I guess at some point I have to get over the fact that I'm no
> longer the MIPS maintainer :-)
>
>> That changed significantly how I looked at the issue after that point,
>> as there was no actual regression anymore, the revised patch still made
>> sense, but for another reason.  When you look at the 20+ targets in the
>> gcc tree you'll see that almost all of them have a frame-layout
>> computation function and all except mips have a shortcut
>> "if (reload_completed) return;" in that function.  And OTOH mips has
>> one of the most complicated frame layout functions of all targets.
>>
>> For all of these reasons I posted a new patch which tries to resolve
>> differences between mips and other targets inital_elimination_offset
>> functions.
>
> OK.  But the point still stands that the patch is only useful because
> we're now calling mips_compute_frame_info in cases where we wouldn't
> previously, because of the rtx_addr_can_trap_p changes.
>

Yes of course, but it cannot hurt to have all targets behave
identical in such a central point.  Even if at some point also the
implementation in rtx_addr_can_trap_p will eventually be improved in
one way or the other.

>> I still think that it is not OK for the mips target to do the frame
>> layout already in mips_frame_pointer_required because the frame layout
>> will change several times, until reload is completed, and that function
>> is only called right in the beginning.
>
> I don't think it's any better or worse than doing the frame layout in
> INITIAL_ELIMINATION_OFFSET (which is common practice and pretty much
> required).  They're both part of the initial setup phase --
> targetm.frame_layout_required determines frame_pointer_needed, which is
> a vital input to the code that decides which eliminations to make.
>

Hmm... That can also be a difference between LRA and traditional reload.
Reload calls targetm.frame_pointer_required in update_eliminables, and
can apparently handle 0=>1 and 1=>0 transitions here.

While LRA calls frame_pointer_required only once in
ira_setup_eliminable_regset and can only handle 0=>1 transitions of
frame_pointer_needed in setup_can_eliminate, but not based on
targetm.frame_pointer_required but
targetm.can_eliminate(FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM).

At least it I read the source correctly, I could be wrong of course.

> And this is an example of us doing the kind of caching that I was
> suggesting.  Code that wants to know whether the frame pointer is needed
> for the current function should use frame_pointer_needed.  Only the code
> that sets up frame_pointer_needed should call frame_pointer_required
> directly.
>

But frame_pointer_needed adds at least some value to the raw
targetm.frame_pointer_required, while a cached result of
initial_elimination_offset would not, just conserve the current
interface exactly as it is.

>> And I think that it is not really well-designed to have a frame layout
>> function F(x,y)->z unless you assume F to be a trivial O(1) function
>> that has no significant computation overhead.  When you assume F to be
>> an expensive O(N) or higher complexity you would design the interface
>> like compute_frame_layout_now() and
>> get_cached_initial_elimination_offset(x,y)->z.
>>
>> Also note, that with the current design, of initial_elimination_offset
>> the frame layout is already computed several times, because reload has
>> to call the function with different parameters, and there is no way how
>> to know when the cached value can be used and when not.
>
> Agreed -- the current interface is pretty poor.  But that's even more
> reason not to add uses of it after LRA/reload.  Caching would be both
> simpler and more efficient.
>
>> I do also think that having to cache the initial elimination offset
>> values in the back-end that are already cached in the target does not
>> improve anything.  Then I would prefer to have a set of new target
>> callbacks that makes it easy to access the cached target values.
>
> I don't agree.  Asking each backend to cache a particular value in
> its frame_info structure and then providing a hook to query that
> cached value means adding code to every target.  And calling an
> indirect function is going to much less efficient than accessing
> a structure field.
>
> It seems better to have one piece of code (or maybe two, until
> reload goes away) that caches the information that the target-independent
> code needs.
>

It is for sure a matter of personal taste, nothing more.

And yes, it is do-able, and not even really complicated, but it would
not improve anything, just conserve the interface as it is (or as it was
before I spoiled it :).

I would rather like to improve something at this point.  For instance
I think it would be good to split the initial_elimination_offset
function in an optional compute_initial_frame_layout function, doing
only the O(N) computation and leave the actual O(1) querying of the
result in the initial_elimination_offset function.  If the target does
not implement the compute_initial_frame_layout the default just does
nothing, and the initial_elimination_offset function does not have
to change, but if the target implements the optional
compute_initial_frame_layout, it can rely on that function to be called
immediately before the initial_elimination_offset, which will then be
only an O(1) function.  That would at least improve a tiny bit over the
current state of the initial_elimination_offset function.


Tanks
Bernd.

> I think that applies to other frame-related information too.  At the
> moment there's no simple way for the postreload passes to tell which
> registers are available for use and which have to be left untouched,
> so we get complicated tests like:
>
>    for (i = nregs - 1; i >= 0; --i)
>      if (TEST_HARD_REG_BIT (this_unavailable, new_reg + i)
> 	|| fixed_regs[new_reg + i]
> 	|| global_regs[new_reg + i]
> 	/* Can't use regs which aren't saved by the prologue.  */
> 	|| (! df_regs_ever_live_p (new_reg + i)
> 	    && ! call_used_regs[new_reg + i])
> #ifdef LEAF_REGISTERS
> 	/* We can't use a non-leaf register if we're in a
> 	   leaf function.  */
> 	|| (crtl->is_leaf
> 	    && !LEAF_REGISTERS[new_reg + i])
> #endif
> 	|| ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i))
>        return false;
>
> This is an example of relying (to some extent) on querying the target
> code every time we want to know something, but surely it would be
> better to have LRA/reload create a single "usable registers" HARD_REG_SET.
> (I have an old patch series for that but never found time to clean it up
> and submit it.)
>
>> If you want to propose a patch for that I will not object, but I doubt
>> it will be suitable for Stage 4.
>
> Fair enough.  Guess I just have to learn to live with it.
>
> Thanks,
> Richard
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]