This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [rfc] multi-word subreg lowering pass


Roger Sayle wrote:
>On Mon, 29 May 2006, Daniel Jacobowitz wrote:
>> If a value is worth loading into the coprocessor, then it should be
>> treated as 64-bit.  If it's not, though, it would be wonderful to split
>> it and handle the two halves separately.  Will splitting 64-bit values
>> at expand time make it hard to use 64-bit registers for them later?

>It sounds as though the iWMMXt ARM is a more pronounced example of x86_64
>where you have all of the original 32-bit operations, but additionally
>64-bit registers and operations on them.  In my current experiments, I've
>been completely disabling SUBREG/CONCAT lowering during RTL expansion on
>such TARGET_64BIT targets.  I've been thinking more about processors that
>currently "white lie to reload" about having multiword register classes,
>and then always split them post-reload (with the associated penalties of
>not being able to allocate the pieces independently or optimize some of the
>lower-level operations away).

>I suspect the current scheme where iWMMXt-like systems present two
>register classes ("hard" and "multi") to reload which then decides where
>to put pseudos, finally splitting those deemed best to be done in 32-bit
>post-reload is the best compromise we have for now.  i.e. a circumstance
>where the little white lie is justified.  However, in such a scheme
>the quality of register allocation is even more critical  :-(

In this case one might also emit DImode RTL and use late splitting
(splitting after the initial RTL passes but before register allocation),
with simple heuristics deciding whether or not to split. For example, one
might check whether the last setter of the DImode register was a
sign/zero-extend operation or the result of a previous DImode arithmetic
operation. This could probably be implemented without much difficulty using
the use-def chains that combine also uses.
The first splitting pass would already be at the right location. The only
thing one might want between splitting and register allocation is a CSE
pass over the affected basic blocks.
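
A minimal sketch of such a split decision, written as a toy model in C rather than against GCC's actual RTL data structures. The enum, its members and the function name are hypothetical. The text above names which setters to look at but not which way each one should tip the decision, so the policy here (split extended values, keep DImode arithmetic results whole, split everything else) is purely an assumption for illustration.

```c
#include <stdbool.h>

/* Hypothetical classification of the insn that last set a DImode pseudo.
   These names are illustrative only; they are not GCC RTL codes. */
enum last_setter {
    SET_BY_SIGN_EXTEND,  /* value came from a sign extension of a narrower value */
    SET_BY_ZERO_EXTEND,  /* value came from a zero extension of a narrower value */
    SET_BY_DI_ARITH,     /* value is the result of earlier DImode arithmetic */
    SET_BY_OTHER         /* load, copy, call result, unknown, ... */
};

/* Assumed policy (not taken from the discussion):
   - extended values are split, since their high word is trivial;
   - results of DImode arithmetic stay in one piece, so that a 64-bit
     register could still be chosen for them later;
   - anything else is split by default. */
bool should_split_dimode(enum last_setter setter)
{
    switch (setter) {
    case SET_BY_SIGN_EXTEND:
    case SET_BY_ZERO_EXTEND:
        return true;
    case SET_BY_DI_ARITH:
        return false;
    case SET_BY_OTHER:
    default:
        return true;
    }
}
```

In a real pass the classification would come from walking the use-def chain of each DImode pseudo; here it is simply passed in.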

>But to more directly answer your question, there are a number of open
>PRs, including PR middle-end/22141, which highlight GCC's inability to
>promote multiple operations in one mode to a single operation in a
>wider mode.  Given combine's limited lookahead and our inability to
>reason about concatenated registers in the RTL optimizers, I think
>keeping DImode operations in one piece is clearly to be preferred if
>there's some potential backend benefit.  However, on targets such as
>x86, AVR and regular ARM where you know in advance that all operations
>must eventually be decomposed, it's better to "bite the bullet" and do
>that earlier during the transition from tree-ssa to RTL.

Concerning AVR, I disagree. At the beginning I was a strong advocate of
lowering at expand. However, after closely analyzing gcc and a considerable
number of tests, I have come to the conclusion that one will definitely lose
more than one gains. Compared to lowering at expand, we are better off with
avr's present lowering at text output.
The question of *when* to do the lowering is, IMO, of considerable importance
for the smaller targets. I understand that one might be reluctant to let some
gcc targets do much of their work at the RTL level (so that one could never
get rid of some of the uglier aspects of RTL). However, the only other
solution that I see for obtaining comparable performance, e.g. for AVR or
HC12, would be to insert lots of target dependencies at the Tree level.
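
For readers less familiar with the term, "lowering" a multi-word operation means replacing it with word-sized operations on its pieces. The sketch below shows this for a 64-bit addition on 32-bit words, in plain C rather than RTL. The struct and function names are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* A 64-bit value represented as two independent 32-bit words. */
struct word_pair {
    uint32_t lo;
    uint32_t hi;
};

/* Split a 64-bit value into its low and high words. */
struct word_pair split64(uint64_t v)
{
    struct word_pair p;
    p.lo = (uint32_t)v;
    p.hi = (uint32_t)(v >> 32);
    return p;
}

/* Reassemble a 64-bit value from its two words. */
uint64_t join64(struct word_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit addition expressed only in 32-bit operations:
   add the low words, detect the carry, then add the high words plus carry. */
struct word_pair add64_lowered(struct word_pair a, struct word_pair b)
{
    struct word_pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;          /* wraps modulo 2^32 */
    carry = (r.lo < a.lo);       /* 1 if the low-word addition overflowed */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

Once the operation is in this form, each word can be allocated to a register on its own, and a word whose value is known (for example a zero high half) can be optimized away. That is the gain from early lowering that is weighed here against the loss of the high-level DImode view.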

>Prior to tree-ssa, backends had to emulate wide modes if they were to
>have any chance of being "globally" optimized at a high-level by the
>RTL optimizers.  Now that that role is filled by tree-ssa, RTL can more
>accurately reflect the target hardware and the role of RTL optimizers
>is increasingly to capture machine specific idioms. 

... but in order to capture these, you may still need a fairly high-level
representation ... This seems to be the case for these ARM targets and
definitely is the case for the two ports that I know better (AVR and HC12).

>But as you correctly 
>point out, we still make critical high-level instruction selection and
>code generation decisions very late during register allocation.

>I'll give it some thought.  If anyone has some good suggestions for
>handling operation lowering on WMMX-like targets I'd be interested in
>hearing their opinions.

>Thanks for pointing out this issue.  I hadn't previously given it
>much consideration, and unfortunately my proposed solution isn't much
>help for tackling this problem on targets like those you describe.

>Roger

