This is the mail archive of the
mailing list for the GCC project.
[RFD] Simplifying subregs in LRA
- From: Matthew Fortune <Matthew dot Fortune at imgtec dot com>
- To: "Eric Botcazou (ebotcazou at adacore dot com)" <ebotcazou at adacore dot com>, "Alan Modra (amodra at gmail dot com)" <amodra at gmail dot com>, Vladimir Makarov <vmakarov at redhat dot com>, "wmi at google dot com" <wmi at google dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Wed, 1 Feb 2017 23:52:17 +0000
- Subject: [RFD] Simplifying subregs in LRA
- Authentication-results: sourceware.org; auth=none
I've copied you as you have each made some significant change to a function
in LRA which I guess makes you de-facto experts.
I've spent a while researching the history of simplify_operand_subreg and
in particular the behaviour for subregs of memory. For my sake if no-one
else here is a rundown of its evolution; corrections welcome.
The original code identified a few special cases where a subreg could
be trivially eliminated. Otherwise it introduced a reload for the inner
expression in expectation of the new subreg(reload-reg) to be handled
as part of operand reloading or get eliminated on the next iteration
A special case for an LRA introduced subreg was added (LRA_SUBREG_P) that
should always be considered valid. This I believe is to cope with cases
where a there are operands required to match but with different modes and,
presumably, one of the modes is not actually allowed. Not 100% sure what
this is though!
A special case for a paradoxical subreg (wider than a word) was added to
handle the case where the outer mode requires more registers than are
available in the allocno class. There is a seemingly pointless call to
PUT_MODE in this which from what I can see will set the same mode as
the reload register was just created with: "PUT_MODE (new_reg, mode);"
A special case to ensure a subreg(pseudo-reg) will get spilled to avoid
looping in LRA.
It now starts to get interesting.
Only simplify a subreg of memory if the new address is valid or that the
original address wasn't valid.
Introduces 'innermode' local which as far as I can tell is going to be
identical to the reg_mode argument to simplify_operand_subreg. References
to reg_mode are not fixed as part of this. This is a significant source
of confusion in the code.
Also simplifies subreg of CONSTANT_P expressions.
Also allow elimination of a subreg of memory when the address component
of the new MEM can be reloaded. Note that at this point we are specifically
allowing new MEMs to exist with invalid addresses that will be fixed up
later but this is OK as they will get reloaded.
Introduces MEM subreg simplification if the inner mode access would have
been slow anyway (a check that gets fixed in d7671d7). Also introduces a
double reload to satisfy some very specific rs6000 backend requirements.
This case requires an outer-mode reload but is implemented as a double
reload rather than just doing the outer mode reload and letting the next
iteration take care of the following step. It is not clear whether this
change actually fixes the same reported issues in two different ways due
to the change on the condition for simplifying the MEM subreg and the
double reload trick.
This change also affects almost all MEM subreg reloads as previously
any MEM subreg that was not simplified would have used the common innermode
reload logic later in the function; see code guarded by:
|| CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg))
but now many, if not all, use the double reload logic which is not always
necessary. I believe this change needs partly reverting and redoing as a
special case to insert an outermode reload for a MEM subreg if the goal
class for the operand has no register suitable for inner mode, then wait for
the following iteration to deal with the MEM subreg simplification. (Just a
theory for now.)
Fixes the new case of MEM subreg simplification from (r239342 git:2d2b78a)
This is another case of multiple changes where some were not critical and
overall there is a dangerous one here I believe. The primary aim of this
change is to reload the address before reloading the inner subreg. This
appears to be a dormant bug since day1 as the original logic would have
failed when reloading an inner mem if its address was not already valid.
The potentially fatal part of this change is the introduction of a
"return false" in the MEM subreg simplification which is placed immediately
after restoring the original subreg expression. I believe that if control
ever actually reaches this statement then LRA would infinite loop as the
MEM subreg would never be simplified. With the additional cases handled
by virtue of (401bd0c) then I believe there are very few reasons why control
would reach this point but presumably a MEM subreg on an operand with
a special memory constraint is one such case that would be fatal.
So what are the next steps!
1) [BUG] Add an exclusion for WORD_REGISTER_OPERATIONS because MIPS is currently
broken by the existing code. PR78660
2) [BUG] Remove the return false introduced in (r243782 git:856bd6f).
3) [CLEANUP] Remove reg_mode argument and replace all uses of reg_mode with
innermode. Rename 'reg' to 'inner' and 'operand' to 'outer' and 'mode' to
4) [OPTIMISATION] Change double-reload logic so that it just deals with the
special outermode reload without adjusting the subreg.
5) [??] Determine if big endian still needs a special case like in reload?
In an attempt to make a minimal change I propose the following as it allows
WORD_REGISTER_OPERATIONS targets to benefit from the invalid address reloading
fix. I think the check would be more appropriately placed on the outer-most
if (MEM_P (reg)) but this would affect the handling of many more subregs which
seems too dangerous at this point in release.
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 393cc69..771475a 100644
@@ -1512,10 +1512,11 @@ simplify_operand_subreg (int nop, machine_mode reg_mode)
equivalences in function lra_constraints) and because for spilled
pseudos we allocate stack memory enough for the biggest
corresponding paradoxical subreg. */
- if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
- && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
- || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
- && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))
+ if (!WORD_REGISTER_OPERATIONS
+ && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
+ && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
+ || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
+ && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))))
*curr_id->operand_loc[nop] = operand;
The change will affect at least arc,mips,rx,sh,sparc though I haven't checked
which of these default on for LRA just that they can turn on LRA.
I'll post this as a patch with appropriate updates to comments unless anyone
raises some issues.