This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Exploiting dual mode operation


> 1. Trying to solve the sign extension removal problem using the live
> highpart information has some limitations. For instance, in the following
> case (which appears during computation of array addresses):
>
> i = sign extension i1;
> ....
> index = 64-bit shift of i  // the target and the source are 64 bits
>
> On some architectures we may get the same result without using an explicit
> sign extension. As we understand it, your algorithm will find that the
> highpart of "i" is live, and the sign extension will not be discarded.

It depends. If you have a static right shift such that the highpart of the
value being shifted does not actually influence the result, the highpart
attribute can be 'ignore', and the sign extension can be eliminated.
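
A minimal C sketch of the highpart argument, assuming a 32-bit value held in a 64-bit register that is modelled with `uint64_t`; the helper names, the shift count and the mask are hypothetical. When the consumer of the shift only looks at bits that come from the low 32 bits of its operand, the sign-extended and the unextended operand give the same result, so the extension can go. For the 64-bit left shift from the quoted array-address example they differ, so there the highpart stays live.

```c
#include <assert.h>
#include <stdint.h>

/* The 32-bit value after an explicit 32->64 bit sign extension. */
static uint64_t sext32(int32_t x) { return (uint64_t)(int64_t)x; }

/* The same value with the extension dropped; the upper 32 bits are
   assumed to be zero here, but for shr_mask any contents would do. */
static uint64_t zext32(int32_t x) { return (uint64_t)(uint32_t)x; }

/* Right shift whose result is masked to 8 bits: only operand bits 4..11
   reach the result, so the highpart of the operand is 'ignore'. */
static uint64_t shr_mask(uint64_t v) { return (v >> 4) & 0xff; }

/* 64-bit left shift as in the quoted address computation: result bits
   35..63 come from the operand's highpart, so the highpart is live. */
static uint64_t shl3(uint64_t v) { return v << 3; }

/* For shr_mask the sign extension is removable for every x. */
static void check_highpart_ignored(int32_t x) {
    assert(shr_mask(sext32(x)) == shr_mask(zext32(x)));
}
```
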

But the sign extension / shift combination can actually be handled more
generally, and much more simply, in the combiner. You only have to make sure
that your machine description contains the matching patterns, and it will
just work; no patches to the machine-independent code are required.
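
Once the combiner has merged the two insns, the result is a single insn with the semantics of both. The matching machine-description pattern must still describe the sign extension; it just no longer needs a separate instruction for it. The C sketch below models this, with `sext_shl3` standing in for a hypothetical fused instruction (the shift count 3 and all names are assumptions, not an existing SH pattern).

```c
#include <assert.h>
#include <stdint.h>

/* Two-insn sequence as it reaches the combiner:
     t     = (sign_extend:DI i)
     index = (ashift:DI t (const_int 3))  */
static uint64_t two_insns(int32_t i) {
    uint64_t t = (uint64_t)(int64_t)i;   /* insn 1: 32->64 sign extension */
    return t << 3;                        /* insn 2: 64-bit left shift */
}

/* Hypothetical single instruction described by a matching pattern:
   shift the 32-bit source left by 3 and fill the upper bits with
   copies of its sign bit. */
static uint64_t sext_shl3(int32_t i) {
    uint64_t lo = ((uint64_t)(uint32_t)i) << 3;          /* bits 3..34 */
    uint64_t fill = (i < 0) ? ~(uint64_t)0 << 35 : 0;    /* sign fill, bits 35..63 */
    return lo | fill;
}

/* The fused form must agree with the two-insn sequence for every input. */
static void check_combination(int32_t i) {
    assert(two_insns(i) == sext_shl3(i));
}
```
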

> Another example is:
>  
> int i, s;
> for (i = 0; i < N; i++)
> {
>    s1 = s + i;
>    s = sign extend s1;
> }
> return s;
>
> The sign extension is required for the return only, so the sign extension
> can be removed from the loop and placed before the return. The highpart of
> "s" is live, but this information alone will not help to improve the code.

Yes, this is not covered by the highpart liveness optimization.
The SHmedia instruction set has (among others) an addition instruction
that does a 32->64 bit sign extension of the result, so again this
can be handled by the combiner.
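
The sinking proposed in the quoted question is valid because the low 32 bits of a two's-complement sum depend only on the low 32 bits of its operands; that is also why one add-with-sign-extended-result instruction can implement each iteration of the original loop. A C sketch, assuming a 32-bit `int` and a 64-bit register modelled by `uint64_t`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of x to 64 bits without relying on
   implementation-defined signed conversions. */
static int64_t sext32(uint64_t x) {
    uint32_t lo = (uint32_t)x;
    return lo >= 0x80000000u ? (int64_t)lo - 0x100000000LL : (int64_t)lo;
}

/* The quoted loop: s is re-extended on every iteration, as an
   add-with-sign-extended-result instruction would do. */
static int64_t sum_extend_each(int32_t s0, int32_t n) {
    int64_t s = s0;
    for (int32_t i = 0; i < n; i++)
        s = sext32((uint64_t)s + (uint64_t)i);
    return s;
}

/* Same computation with the extension sunk out of the loop: the loop
   adds in 64 bits and only the returned value is sign-extended. */
static int64_t sum_extend_once(int32_t s0, int32_t n) {
    uint64_t acc = (uint64_t)(int64_t)s0;
    for (int32_t i = 0; i < n; i++)
        acc += (uint64_t)i;
    return sext32(acc);
}

static void check_sinking(int32_t s0, int32_t n) {
    assert(sum_extend_each(s0, n) == sum_extend_once(s0, n));
}
```
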
I have come across some code that uses short or unsigned short basic
induction variables, though.
I've written some patches for the loop optimizer that pre-condition the loop
so that it stops at the end or at the signed overflow, whichever comes first,
and then use an outer loop to handle the sign extension.
If vector addition is available, it is also used to do a zero-extending
increment of a value that has been pre-conditioned to be zero-extended.
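
A source-level C sketch of that pre-conditioning, assuming a `short` (16-bit) induction variable that wraps on overflow; the wrap-around behaviour modelled by `wrap16` is an assumption, and this is an illustration of the idea, not the loop optimizer patches themselves. All names are hypothetical. The inner loop runs on a wide counter without any extension until it reaches either the end or the point where the 16-bit value would overflow; the outer loop performs the single sign extension (the wrap) per trip.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v (the per-iteration extension). */
static int16_t wrap16(int32_t v) {
    uint16_t lo = (uint16_t)v;   /* low 16 bits, well-defined */
    return (int16_t)(lo >= 0x8000u ? (int32_t)lo - 0x10000 : (int32_t)lo);
}

/* Reference: a 16-bit induction variable extended on every increment. */
static int64_t sum_wrapping(int16_t start, uint32_t count) {
    int64_t sum = 0;
    int16_t i = start;
    for (uint32_t k = 0; k < count; k++) {
        sum += i;
        i = wrap16((int32_t)i + 1);
    }
    return sum;
}

/* Pre-conditioned: inner loop stops at the end or at the signed
   overflow, whichever comes first; outer loop handles the extension. */
static int64_t sum_preconditioned(int16_t start, uint32_t count) {
    int64_t sum = 0;
    int32_t i = start;                               /* wide counter */
    while (count > 0) {
        uint32_t room = (uint32_t)(INT16_MAX - i) + 1;  /* steps before overflow */
        uint32_t n = count < room ? count : room;
        for (uint32_t k = 0; k < n; k++)             /* no extension in here */
            sum += i++;
        count -= n;
        i = wrap16(i);                               /* one extension per trip */
    }
    return sum;
}

static void check_precondition(int16_t start, uint32_t count) {
    assert(sum_wrapping(start, count) == sum_preconditioned(start, count));
}
```
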

> 2. To exploit the dual mode operation, for instructions that use the
> result of explicit sign extensions we need to find out whether it is
> possible to get the same result via an instruction that doesn't require
> explicit sign extensions. Basically we need to find out whether:
>
> s1 = sign extend s
> t1 = sign extend t
> result = inst s1, t1
>
> can be replaced by an instruction inst1:
>
> result = inst1 s, t.
>
> But this seems similar to what combine does, so the information
> from the description file should suffice.
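
Whether such a replacement is valid depends on the operation, and that is what a matching pattern has to get right. A C sketch with hypothetical helpers, where `int32_t` and `int64_t` model the two register widths: for a signed less-than comparison, comparing the sign-extended operands in 64 bits gives the same answer as a 32-bit signed comparison, so `inst1` may be a 32-bit compare. For addition the analogous replacement is wrong, because the 64-bit sum of the extended operands cannot overflow while the 32-bit sum can.

```c
#include <assert.h>
#include <stdint.h>

/* inst: 64-bit signed compare on explicitly sign-extended operands. */
static int lt_extended(int32_t s, int32_t t) {
    int64_t s1 = s, t1 = t;   /* s1 = sign extend s; t1 = sign extend t */
    return s1 < t1;
}

/* inst1: 32-bit signed compare on the unextended operands. */
static int lt_narrow(int32_t s, int32_t t) {
    return s < t;
}

static void check_compare(int32_t s, int32_t t) {
    assert(lt_extended(s, t) == lt_narrow(s, t));
}

/* Counter-example: 64-bit add of the extended operands ... */
static int64_t add_extended(int32_t s, int32_t t) {
    return (int64_t)s + (int64_t)t;
}

/* ... versus a wrapping 32-bit add whose result is then sign-extended. */
static int64_t add_narrow_then_extend(int32_t s, int32_t t) {
    uint32_t sum = (uint32_t)s + (uint32_t)t;
    return sum >= 0x80000000u ? (int64_t)sum - 0x100000000LL : (int64_t)sum;
}
```
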

Yes. Just write a testcase, start gdb on cc1, set a breakpoint at
combine_instructions, start cc1 on your testcase, look for the insns
you want combined, then set a breakpoint at try_combine with a condition
that the UIDs of i2 and i3 are those of your two insns.
Then step through it to see whether there is any snag that prevents the
insns from being combined, or, if not, look at what pattern it generates.
Then add a matching pattern to your machine description.

> 3. One possible way of implementing this is to use reaching definitions
> to propagate the sign extensions forward to right before the uses. This
> will create opportunities for combine and gcse to do the rest of the work
> afterward.

Do you mean putting the sign extended values into new pseudo registers?
That seems to have about as much potential for harm as for good, since it
can leave you with extra register-register copies, and you might
lose strength reduction unless you change the loop optimizer to grok
these new copies too.

> Another possible way is to extend gcse (but there are some issues that 
> we still need to clarify).

gcse works by computing the values in separate pseudos, thus creating new
register-register copies as discussed above.
> 
> Maybe there is a way to use your code (or part of it)?

Would you like a unidiff of all our patches against
gcc 3.4.0 20040414 (prerelease) ?
It's 736615 bytes raw, or 216022 bytes gzipped & uuencoded.

