This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: [RFC] optabs and tree-codes for vector operations
- From: Dorit Naishlos <DORIT at il dot ibm dot com>
- To: Richard Henderson <rth at redhat dot com>
- Cc: Devang Patel <dpatel at apple dot com>, gcc at gcc dot gnu dot org, James E Wilson <wilson at specifixinc dot com>, Ayal Zaks <ZAKS at il dot ibm dot com>
- Date: Tue, 31 Aug 2004 18:36:47 +0300
- Subject: Re: [RFC] optabs and tree-codes for vector operations
> then everyone will assume that REALIGN_LOAD_EXPR really needs the
> initial value of addr, and not any random value containing the
> same low N bits. Which means that we *will* use two registers
> for this loop when only one is needed.
But will we be able to recognize later that the mask generation operation
eventually feeding the REALIGN_LOAD_EXPR is itself loop-invariant and can
be taken out of the loop (when there is an available register)? In the case
that the target has a builtin for creating a mask we have three options:
(1) Minimum register pressure, but leaving practically no chance for later
stages to figure out that the magic generation is loop invariant:
    loop {
      ...
      magic = CALL builtin (addr)
      v3 = REALIGN_LOAD_EXPR (v1, v2, magic);
      addr += N;
      ...
    }
(2) Minimizing the live ranges of variables, and exposing information that could
allow later stages to figure out that the magic generation is loop
invariant (though not without nontrivial analysis):
    loop {
      ...
      off = get_low_bits_of_address (addr)
      magic = CALL builtin (off)
      v3 = REALIGN_LOAD_EXPR (v1, v2, magic);
      addr += N;
      ...
    }
(3) Generating loop invariant code out of the loop. Relying on register
allocation to take it into the loop (rematerialize) if necessary:
    magic = CALL builtin (addr)
    loop {
      ...
      v3 = REALIGN_LOAD_EXPR (v1, v2, magic);
      addr += N;
      ...
    }
I vote for (3), because (1) kills any chance to remove the loop-invariant
code if register pressure turns out to be low, and (2) still requires
nontrivial analysis at later stages that may not be simpler than
rematerialization. (This issue - loop-invariant code motion vs. excessive
register pressure - also came up in
http://gcc.gnu.org/ml/gcc/2004-08/msg00659.html. I think most people agreed
that reload/rematerialization should do the work?)
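
As a rough illustration of why the magic value is loop invariant, here is a
scalar C model of options (1) and (3). It is only a sketch under stated
assumptions: the vector width is taken to be 16 bytes, N equals the vector
width, addresses are modelled as byte offsets into a buffer whose offset 0 is
aligned, and make_magic, realign_load, copy_in_loop and copy_hoisted are
hypothetical names, not GCC or target functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16  /* assumed vector width in bytes */

/* Hypothetical stand-in for the target builtin: the magic value depends
   only on the low bits of the address. */
unsigned make_magic(size_t addr) { return (unsigned)(addr % VEC_BYTES); }

/* Scalar model of REALIGN_LOAD_EXPR (v1, v2, magic): take VEC_BYTES bytes
   from the concatenation v1:v2, starting at byte offset magic. */
void realign_load(uint8_t *v3, const uint8_t *v1, const uint8_t *v2,
                  unsigned magic) {
    uint8_t both[2 * VEC_BYTES];
    assert(magic < VEC_BYTES);
    memcpy(both, v1, VEC_BYTES);
    memcpy(both + VEC_BYTES, v2, VEC_BYTES);
    memcpy(v3, both + magic, VEC_BYTES);
}

/* Option (1): magic is recomputed from addr in every iteration. */
void copy_in_loop(uint8_t *dst, const uint8_t *buf, size_t addr, size_t nvec) {
    for (size_t i = 0; i < nvec; i++) {
        size_t aligned = addr - addr % VEC_BYTES;  /* floor to alignment */
        unsigned magic = make_magic(addr);
        realign_load(dst + i * VEC_BYTES, buf + aligned,
                     buf + aligned + VEC_BYTES, magic);
        addr += VEC_BYTES;
    }
}

/* Option (3): magic is computed once, before the loop.  Because addr
   advances by a whole vector, its low bits never change. */
void copy_hoisted(uint8_t *dst, const uint8_t *buf, size_t addr, size_t nvec) {
    unsigned magic = make_magic(addr);  /* loop invariant */
    for (size_t i = 0; i < nvec; i++) {
        size_t aligned = addr - addr % VEC_BYTES;
        realign_load(dst + i * VEC_BYTES, buf + aligned,
                     buf + aligned + VEC_BYTES, magic);
        addr += VEC_BYTES;
    }
}
```

Both functions read one aligned vector past the last one they produce, so the
buffer must be large enough for that. In this model the two variants produce
identical output; the difference is only in where the magic computation lives
and therefore in how many values stay live across the loop.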
Maybe the following scheme is what we want:
    if (target_has_builtin)
      {
        generate:
          magic = CALL builtin (addr)
          loop {
            ...
            v3 = REALIGN_LOAD_EXPR (v1, v2, magic);
            addr += N;
            ...
          }
      }
    else
      {
        generate:
          loop {
            ...
            v3 = REALIGN_LOAD_EXPR (v1, v2, addr);
            addr += N;
            ...
          }
      }
> I will not allow a CALL_EXPR -- even to a builtin -- to be nested
> within REALIGN_LOAD_EXPR. None of the optimizers are set up to
> expect this sort of thing, and they shouldn't have to.
Sure. This is not what I meant. I didn't mean explicitly nesting the
CALL_EXPR in the GIMPLE representation of REALIGN_LOAD_EXPR, but that the
functionality of REALIGN_LOAD_EXPR could also encapsulate the mask
generation. Say a target has a builtin for generating a mask; we can
generate in GIMPLE:
1. mask = CALL builtin_create_mask (addr)
2. v3 = REALIGN_LOAD_EXPR (v1, v2, mask)
and then the RTL expander will expand it into (for altivec):
1. the RTL representation for 'vmsk = lvsr (-addr)'
2. the RTL representation for 'v3 = vperm (v1,v2,vmsk)'
Or, we can generate:
1. v3 = REALIGN_LOAD_EXPR (v1, v2, addr)
and then the RTL expander will expand it into the same sequence directly.
I think we want the first option.
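
To make the two shapes concrete, here is a small scalar C model of them. It
is a sketch only: the 16-byte vector width, the vec_t type, and the functions
builtin_create_mask, permute, realign_load_mask and realign_load_addr are
hypothetical, and the permute is a generic byte select over v1:v2 rather than
the exact semantics of altivec's lvsr/vperm.

```c
#include <assert.h>
#include <stdint.h>

#define VBYTES 16  /* assumed vector width in bytes */

typedef struct { uint8_t b[VBYTES]; } vec_t;

/* Hypothetical mask builtin: byte i of the mask selects element
   (addr mod VBYTES) + i of the 32-byte concatenation v1:v2. */
vec_t builtin_create_mask(uintptr_t addr) {
    vec_t m;
    for (int i = 0; i < VBYTES; i++)
        m.b[i] = (uint8_t)((addr % VBYTES) + i);
    return m;
}

/* Generic byte permute: selectors 0..15 pick from v1, 16..31 from v2. */
vec_t permute(vec_t v1, vec_t v2, vec_t mask) {
    vec_t r;
    for (int i = 0; i < VBYTES; i++) {
        uint8_t sel = mask.b[i];
        assert(sel < 2 * VBYTES);
        r.b[i] = sel < VBYTES ? v1.b[sel] : v2.b[sel - VBYTES];
    }
    return r;
}

/* First option: the mask is a separate, visible value, so the statement
   that creates it can be moved or reused independently of the load. */
vec_t realign_load_mask(vec_t v1, vec_t v2, vec_t mask) {
    return permute(v1, v2, mask);
}

/* Second option: the operation takes addr and the mask creation is
   hidden inside its expansion. */
vec_t realign_load_addr(vec_t v1, vec_t v2, uintptr_t addr) {
    return permute(v1, v2, builtin_create_mask(addr));
}
```

Both forms compute the same bytes. What the first form buys is that the mask
is an ordinary value in the IR, which is what lets it be placed outside the
loop.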
thanks,
dorit
Richard Henderson <rth@redhat.com>
30/08/2004 21:27
To: Dorit Naishlos/Haifa/IBM@IBMIL
cc: Devang Patel <dpatel@apple.com>, gcc@gcc.gnu.org, James E Wilson <wilson@specifixinc.com>, Ayal Zaks/Haifa/IBM@IBMIL
Subject: Re: [RFC] optabs and tree-codes for vector operations
On Mon, Aug 30, 2004 at 01:13:08PM +0300, Dorit Naishlos wrote:
> > It'll be much harder to reduce the register pressure by one after the
> > fact.
Meaning that if we write
    magic = addr
    loop {
      ...
      v2 = ALIGNED_EXPR (addr);
      v3 = REALIGN_LOAD_EXPR (v1, v2, magic);
      addr += N;
      ...
    }
then everyone will assume that REALIGN_LOAD_EXPR really needs the
initial value of addr, and not any random value containing the
same low N bits. Which means that we *will* use two registers
for this loop when only one is needed.
> > > Can the builtin be encapsulated within REALIGN_LOAD_EXPR
> > > ? i.e - have a REALIGN_LOAD_EXPR(v1,v2,addr) that on some targets will
> > > be expanded to {x=builtin(addr),smthing(v1,v2,x)} and on other targets
> > > will be expanded directly to {smthing(v1,v2,addr)}?
> >
> > No. Gimple doesn't work that way.
>
> Can you please explain the above two comments?
I will not allow a CALL_EXPR -- even to a builtin -- to be nested
within REALIGN_LOAD_EXPR. None of the optimizers are set up to
expect this sort of thing, and they shouldn't have to.
r~