This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [rfc] multi-word subreg lowering pass


On Mon, 29 May 2006, Daniel Jacobowitz wrote:
> If a value is worth loading into the coprocessor, then it should be
> treated as 64-bit.  If it's not, though, it would be wonderful to split
> it and handle the two halves separately.  Will splitting 64-bit values
> at expand time make it hard to use 64-bit registers for them later?

Interesting.  I'll admit that I've not given much thought to using
this strategy on targets that have hard registers that support "mode",
but on which some operations may be better implemented via decomposing
that mode.

It sounds as though the iWMMXt ARM is a more pronounced example of x86_64
where you have all of the original 32-bit operations, but additionally
64-bit registers and operations on them.  In my current experiments, I've
been completely disabling SUBREG/CONCAT lowering during RTL expansion on
such TARGET_64BIT targets.  I've been thinking more about processors that
currently "white lie to reload" about having multiword register classes,
and then always split them post-reload (with the associated penalties of
not being able to allocate the pieces independently or optimize some of the
lower-level operations away).
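
As a rough illustration of the kind of lower-level operation that can be
optimized away once the pieces are visible, here is a minimal C sketch.
It is not GCC code: the function names are hypothetical, and it assumes
a 64-bit value represented as two independent 32-bit words.

```c
#include <stdint.h>

/* Whole-value form: OR a zero-extended 32-bit value into a 64-bit one.
   If this is only split late, both word-sized ORs tend to survive. */
static uint64_t or_zext_whole(uint32_t a, uint64_t b)
{
    return (uint64_t)a | b;
}

/* Split form (hypothetical): the high word of the zero-extended operand
   is known to be zero, so the high-word OR folds to a plain copy, and
   the two result words can live in unrelated registers. */
static void or_zext_split(uint32_t a, uint32_t b_lo, uint32_t b_hi,
                          uint32_t *r_lo, uint32_t *r_hi)
{
    *r_lo = a | b_lo;   /* the only real operation left */
    *r_hi = b_hi;       /* 0 | b_hi simplifies to b_hi */
}
```

Both forms compute the same value; the split form just exposes the
simplification before register allocation rather than after it.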

I suspect the current scheme where iWMMXt-like systems present two
register classes ("hard" and "multi") to reload which then decides where
to put pseudos, finally splitting those deemed best to be done in 32-bit
post-reload is the best compromise we have for now.  i.e. a circumstance
where the little white lie is justified.  However, in such a scheme
the quality of register allocation is even more critical  :-(


But to more directly answer your question, there are a number of open
PRs, including PR middle-end/22141, which highlight GCC's inability to
promote multiple operations in one mode to a single operation in a
wider mode.  Given combine's limited lookahead and our inability to
reason about concatenated registers in the RTL optimizers, I think
keeping DImode operations in one piece is clearly to be preferred if
there's some potential backend benefit.  However, on targets such as
x86, AVR and regular ARM where you know in advance that all operations
must eventually be decomposed, it's better to "bite the bullet" and do
that earlier during the transition from tree-ssa to RTL.
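
To make "decomposed" concrete, the following C sketch shows roughly what
a 64-bit addition looks like once it is expressed as word-sized
operations on a 32-bit target.  This is only an illustration under
stated assumptions, not the output of any pass: the u64_pair type and
the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for a DImode value held as two SImode words. */
typedef struct { uint32_t lo, hi; } u64_pair;

static u64_pair split64(uint64_t x)
{
    u64_pair p = { (uint32_t)x, (uint32_t)(x >> 32) };
    return p;
}

static uint64_t join64(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* A 64-bit add written as word-sized steps: add the low words, detect
   the carry from unsigned wrap-around, then add the high words plus
   that carry (add / add-with-carry on most 32-bit targets). */
static u64_pair add64_lowered(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo);   /* 1 if the low add wrapped */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

Once the operation is in this form, each word is an ordinary 32-bit
value that can be allocated and simplified on its own.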


Prior to tree-ssa, backends had to emulate wide modes if they were to
have any chance of being "globally" optimized at a high-level by the
RTL optimizers.  Now that that role is filled by tree-ssa, RTL can more
accurately reflect the target hardware and the role of RTL optimizers
is increasingly to capture machine specific idioms.  But as you correctly
point out, we still make critical high-level instruction selection and
code generation decisions very late during register allocation.


I'll give it some thought.  If anyone has some good suggestions for
handling operation lowering on WMMX-like targets I'd be interested in
hearing their opinions.

Thanks for pointing out this issue.  I hadn't previously given it
much consideration, and unfortunately my proposed solution isn't much
help for tackling this problem on targets like those you describe.

Roger
--

