This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

combiner weaknesses (Was: Re: An issue for the SC: horrible documentation quality of GCC)


Richard Earnshaw:
> Which, in RTL is
> 
>       (set b (plus (mult d 4) b))
>       (set c (plus (mult d 4) c))
>     
>     into
>   
>       (set t1 (mult d 4))
>       (set b (plus t1 b))
>       (set c (plus t1 c))
>     
>     is a de-optimization (takes 3 cycles instead of 2).

Richard Kenner:
> It's long been on my list to try to see if can find some way to turn the
> latter back into the former.  The problem is that combine won't do it 
> because it doesn't know it can kill T1.  But I've never come up with
> any way that does it.  I think this sort of thing is very common,
> especially with addressing modes on CISC machines.

The problem is that combine is too rigid in the number and interrelatedness
if instructions that it might consider.  It is advertised in the
documentation as a machine-independent peephole optimizer, and while
it is fairly machine-independent, it can be hardly compared to a peephole
optimizer in its scope.
It knows only the patterns:
- i1 feeds i2 which in turn feeds i3
- i1 feeds i2 and i3
- i1 and i2 feed i3
while Richard Earnshaw's example can be described as:
- i1 feeds i2 and i3
and there are also often cases that look like:
- i0 feeds i1 feeds i2 feeds i3 ...
and there are also cases where we'd really like to split a combination
into an equal or higher number of instructions, but with a lower total cost.

I suppose we'd first have to make the code
more general, i.e. instead of i1, i2 and i3, just have one instruction
as a marker where we are in the function, and match this and its predecessors
against a set of rules.  We'd want a higher maximum count of instructions
to be considered, patterns that need not culminate in a single instruction
which is the ultimate consumer of the inputs, and also more data dependencies
considered - i.e. if a register is mentioned in an address, and the
target has auto-increment, than there is the potential to change the value
of the register.
Considering the amount of work subst does, making the caller rule-driven
rather than a heap of ad-hoc code shouldn't add too much overhead.
On the other hand, an indiscriminate expansion of the search space into
more varied instruction combinations of more instructions might cause a
noticable slowdown - that could be an awfully large number of non-deterministic
finite state automatons running simultanously, or lots of looping code to cover
all the possibilities.  So it might be best to have just a handful of target
independent rules - equivalent to what we currently have, and maybe a bit
more - and allow the target to define extra rules, a bit like a peephole
in that it can match for the presence of specific operations, use of
particular registers or addressing modes, and data dependencies, but
unlike the current peepholes, wildcard patterns / operations should be
allowed, the sequence is only loosely defined by a dependency graph and
unrelated instruction in-between are allowed as long as they don't interfere
with the data operated on, and the actual substitution need not be specified.

-- 
--------------------------
SuperH (UK) Ltd.
2410 Aztec West / Almondsbury / BRISTOL / BS32 4QX
T:+44 1454 465658


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]