[PATCH] Exploiting dual mode operation, implementation.

Leehod Baruch LEEHOD@il.ibm.com
Thu Aug 4 12:03:00 GMT 2005


> > I don't think that this is the point of this optimization.
> > The sign-extensions are currently generated right after-the-def,
> > so the producer *is* currently merged with the sign-extension
> > by the combiner.
>
> No, I don't think so.  When I've examined this problem in the
> past, a typical failure case was:
>
>    (set (reg:si 100) (mem:si))
>    (set (reg:di 101) (sign_extend:di (reg:si 100)))
>    ...
>    (use (reg:si 100))
>    (use (reg:di 101))
>

I think that now I understand why we disagreed on "the main point of
the algorithm".
ppc doesn't have such patterns.
In ppc the above pattern would look like this:

    (set (reg:si 100) (mem:si))
    (set (reg:di 101) (sign_extend:di (reg:si 100)))
    ...
    (use (subreg:si (reg:di 101) 4))
    (use (reg:di 101))

The SImode register that holds the lowpart always dies; it is only
temporary. All the uses are of the DImode register, either directly
or through a subreg.
This is why the combiner always succeeds in combining the
definition with the extension. You can also see that, even without
this optimization, the assembly code always contains a combined
load+extension rather than a load followed by a separate extension.
Since there are many webs in which all the uses are of subregs, the
extension is redundant and could be eliminated. The combiner cannot
eliminate these extensions since the uses are not in the same
bb as the def. This is the reason for the performance improvement on ppc.
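
A hypothetical C fragment that could give rise to such a web is sketched
below. The function name and the shape of the loop are made up for
illustration, and whether a given GCC version actually emits the
extension here depends on the target and the options used.

```c
/* Hypothetical illustration: 32-bit computation on a 64-bit target.
   'v' is defined by a 32-bit load in the entry block, while all of its
   uses sit in the loop body, i.e. in a different basic block.  */
int sum_if_positive(const int *p, int n)
{
    int v = *p;                 /* def: 32-bit load, possibly load+extend */
    int acc = 0;

    for (int i = 0; i < n; i++) {
        if (v > 0)              /* use: only the low 32 bits matter */
            acc += v;           /* use: 32-bit add */
    }
    return acc;
}
```

Every use of v reads only 32 bits, so extending v to 64 bits at the
def buys nothing for those uses.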

If the pattern you specified is typical (on which architecture?) then you
are right: most of the improvement from the algorithm will come from
the producer side.

> Perhaps the setting of PROMOTE_MODE on ppc belies that assumption.
> I wonder if the bulk of this problem can be solved simply by using
> the PROMOTE_MODE definition from ia64.

Do you still think that this is relevant? If so, it may be
interesting.


> Find all uses for a def, looking through sign-extensions.  Of those
> uses, find the largest mode.  Use that mode as the mode to which the
> def must be extended.  All other uses will use a subreg of that
> extended result.

I'm afraid that I'm not following you.
I would like to take the discussion a couple of steps back to the web
I've introduced before:


                         def1            def2
                          se1            se2
                            \           /
                             \         /
                              \       /
                               \     /
                                \   /
                                 \ /
                                 use

def1 + se1:
(set (reg:SI 10) (..def1rhs..))
(set (reg:DI 100) (sign_extend:DI (reg:SI 10)))

def2 + se2:
(set (reg:HI 20) (..def2rhs..))
(set (reg:DI 100) (sign_extend:DI (reg:HI 20)))

use:
(use (subreg:SI (reg:DI 100) 4))

As I see it, the redundancy/partial redundancy expression is the
(sign_extend:WIDEmode (reg:NARROWmode r)) expression.

I *can't* use the partial redundancy elimination on the
(sign_extend:DI (reg:HI 20)) expression on this web.

What I can do is to use partial redundancy elimination on the
(sign_extend:DI (reg:SI 10)) expression and treat def2+se2 as if
it already contains such an expression.

This will leave me with the optimal solution for this case - se1 will
be eliminated and se2 will remain the same.

But this is not the general case.
A different case could look like this:

                         def1            def2
                          se1            se2
                            \           / |
                             \         /  |
                              \       /   |
                               \     /    |
                                \   /     |
                                 \ /      |
                                 use     use2

where def1, def2, se1, se2 and use are the same.
use2:
(use (subreg:HI (reg:DI 100) 4))

In this case the optimal solution would be:

                         def1            def2
                            \           / |
                             \         /  |
                              \      se2  |
                               \     /    |
                                \   /     |
                                 \ /      |
                                 use     use2

And the optimization won't find it, since we can't use
redundancy elimination for this expression:
(sign_extend:DI (reg:HI 20)) in this web.
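
The reasoning behind the two webs can be put as a toy model in C. This
is only a sketch under my own assumptions, not the data structures or
the PRE-based algorithm of the patch: the names sign_ext, se_status and
classify_ext are hypothetical, and a use is assumed to need an
extension only when it reads more bits than the narrow source provides.

```c
#include <stddef.h>

/* Toy model only; not taken from the patch.  */
enum se_status { SE_REDUNDANT, SE_PARTIAL, SE_NEEDED };

struct sign_ext {
    int src_bits;         /* width of the narrow source, e.g. 32 for SI */
    const int *use_bits;  /* width read by each use this extension reaches */
    size_t n_uses;
};

/* Count the reached uses that read more bits than the source provides
   and classify the extension accordingly.  */
enum se_status classify_ext(const struct sign_ext *se)
{
    size_t needing = 0;

    for (size_t i = 0; i < se->n_uses; i++)
        if (se->use_bits[i] > se->src_bits)
            needing++;

    if (needing == 0)
        return SE_REDUNDANT;
    return needing == se->n_uses ? SE_NEEDED : SE_PARTIAL;
}
```

In the first web, se1 (SI source, one SI use) comes out as redundant
and se2 (HI source, one SI use) as needed. In the second web, se2
reaches an SI use and an HI use, so it comes out as partial, which
corresponds to keeping se2 only on the path to use.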

As I see it, this is what this optimization does.
1. Do you see something that I don't? If so, please, don't be laconic.
2. Do you think that this is not enough?
3. Isn't this (different types of extensions) a rare case? Don't
forget that we are trying to solve the common problem of 32-bit
computation on a 64-bit architecture; therefore most of the extensions
are from 32 bits to 64 bits.

Thanks,
Leehod.


