This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] optabs and tree-codes for vector operations


> then everyone will assume that REALIGN_LOAD_EXPR really needs the
> initial value of addr, and not any random value containing the
> same low N bits.  Which means that we *will* use two registers
> for this loop when only one is needed.

But will we be able to recognize later that the mask generation operation
eventually feeding the REALIGN_LOAD_EXPR is itself loop-invariant and can
be taken out of the loop (when there is an available register)? In the case
that the target has a builtin for creating a mask we have three options:

(1) Minimum register pressure, but leaving practically no chance for later
stages to figure out that the magic generation is loop invariant:

             loop {
               ...
               magic = CALL builtin (addr)
               v3 = REALIGN_LOAD_EXPR (v1, v2, magic);
               addr += N;
               ...
             }

(2) Minimizing the live ranges of variables, and exposing information that
could allow later stages to figure out that the magic generation is loop
invariant (though only after a nontrivial analysis):

             loop {
               ...
               off = get_low_bits_of_address (addr)
               magic = CALL builtin (off)
               v3 = REALIGN_LOAD_EXPR (v1, v2, magic);
               addr += N;
               ...
             }

(3) Generating the loop-invariant code outside the loop, and relying on
register allocation to move it back into the loop (rematerialize) if necessary:

             magic = CALL builtin (addr)
             loop {
               ...
               v3 = REALIGN_LOAD_EXPR (v1, v2, magic);
               addr += N;
               ...
             }

I vote for (3), because (1) kills any chance of removing the loop-invariant
code if register pressure turns out to be low, and (2) still requires
nontrivial analysis at later stages that may be no simpler than
rematerialization. (This issue, loop-invariant code motion vs. excessive
register pressure, also came up in
http://gcc.gnu.org/ml/gcc/2004-08/msg00659.html. I think most people agreed
that reload/rematerialization should do the work?)
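
To illustrate why the magic generation is loop invariant, here is a minimal C
sketch. It assumes a 16-byte vector, a stride N equal to the vector size, and
a mask that depends only on the low bits of the address. The names VEC_BYTES,
make_magic and magic_is_loop_invariant are hypothetical and stand in for the
target builtin; this is not GCC code.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16 /* assumed vector width in bytes */

/* Hypothetical stand-in for the target's mask builtin: the result
   depends only on the low bits of the address. */
static unsigned make_magic(uintptr_t addr)
{
    return (unsigned)(addr & (VEC_BYTES - 1));
}

/* Returns 1 if recomputing the magic value in every iteration, as in
   option (1), always yields the value computed once before the loop,
   as in option (3). */
static int magic_is_loop_invariant(uintptr_t addr, size_t iterations)
{
    unsigned hoisted = make_magic(addr); /* option (3): computed once */

    for (size_t i = 0; i < iterations; i++) {
        if (make_magic(addr) != hoisted) /* option (1): per iteration */
            return 0;
        addr += VEC_BYTES; /* addr += N, with N assumed == VEC_BYTES */
    }
    return 1;
}
```

Under these assumptions the low bits of addr never change, so the hoisted
value is always valid; the open question in the thread is only which form
makes that fact visible to later passes.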

Maybe the following scheme is what we want:

if (target_has_builtin)
  {
    generate:
             magic = CALL builtin (addr)
             loop {
               ...
               v3 = REALIGN_LOAD_EXPR (v1, v2, magic);
               addr += N;
               ...
             }
  }
else
  {
    generate:
             loop {
               ...
               v3 = REALIGN_LOAD_EXPR (v1, v2, addr);
               addr += N;
               ...
             }
  }


> I will not allow a CALL_EXPR -- even to a builtin -- to be nested
> within REALIGN_LOAD_EXPR.  None of the optimizers are set up to
> expect this sort of thing, and they shouldn't have to.

Sure. This is not what I meant. I didn't mean explicitly nesting the
CALL_EXPR in the GIMPLE representation of REALIGN_LOAD_EXPR, but that the
functionality of REALIGN_LOAD_EXPR could also encapsulate the mask
generation. Say a target has a builtin for generating a mask; we can
generate in GIMPLE:
      1. mask = CALL builtin_create_mask (addr)
      2. v3 = REALIGN_LOAD_EXPR (v1, v2, mask)
and then the RTL expander will expand it into (for altivec):
      1. the RTL representation for 'vmsk = lvsr (-addr)'
      2. the RTL representation for 'v3 = vperm (v1,v2,vmsk)'

Or, we can generate:
      1. v3 = REALIGN_LOAD_EXPR (v1, v2, addr)
and then the RTL expander will expand it into the same sequence directly.

I think we want the first option.
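
As a rough scalar model of the first option, the C sketch below separates the
mask creation (step 1) from the realigning permute (step 2). It assumes
16-byte vectors and uses a simplified mask encoding (byte indices into the
concatenation of v1 and v2), which is not the exact AltiVec lvsr/vperm
encoding. The names vec_t, create_mask, realign_load and
realign_matches_memcpy are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16 /* assumed vector width in bytes */

typedef struct { uint8_t b[VEC_BYTES]; } vec_t;

/* Step 1 (model): build a permute mask from the low bits of addr.
   Each element is a byte index into the 32-byte concatenation v1:v2. */
static vec_t create_mask(uintptr_t addr)
{
    vec_t m;
    unsigned off = (unsigned)(addr & (VEC_BYTES - 1));
    for (unsigned i = 0; i < VEC_BYTES; i++)
        m.b[i] = (uint8_t)(off + i);
    return m;
}

/* Step 2 (model): select bytes from v1:v2 according to the mask. */
static vec_t realign_load(vec_t v1, vec_t v2, vec_t mask)
{
    vec_t r;
    for (unsigned i = 0; i < VEC_BYTES; i++) {
        unsigned idx = mask.b[i];
        r.b[i] = idx < VEC_BYTES ? v1.b[idx] : v2.b[idx - VEC_BYTES];
    }
    return r;
}

/* Checks the model against a plain unaligned copy for one offset
   (0 <= off < VEC_BYTES). Returns 1 when both agree. */
static int realign_matches_memcpy(unsigned off)
{
    uint8_t buf[2 * VEC_BYTES];
    vec_t v1, v2, got, want;

    for (unsigned i = 0; i < 2 * VEC_BYTES; i++)
        buf[i] = (uint8_t)(i * 7 + 1);

    memcpy(v1.b, buf, VEC_BYTES);             /* aligned load 1 */
    memcpy(v2.b, buf + VEC_BYTES, VEC_BYTES); /* aligned load 2 */
    got = realign_load(v1, v2, create_mask(off));

    memcpy(want.b, buf + off, VEC_BYTES);     /* reference result */
    return memcmp(got.b, want.b, VEC_BYTES) == 0;
}
```

Keeping the two steps separate means the mask is an ordinary value that can
be hoisted or rematerialized on its own, which is the motivation for
preferring the first option.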


thanks,
dorit



From:     Richard Henderson <rth@redhat.com>
Date:     30/08/2004 21:27
To:       Dorit Naishlos/Haifa/IBM@IBMIL
cc:       Devang Patel <dpatel@apple.com>, gcc@gcc.gnu.org,
          James E Wilson <wilson@specifixinc.com>, Ayal Zaks/Haifa/IBM@IBMIL
Subject:  Re: [RFC] optabs and tree-codes for vector operations




On Mon, Aug 30, 2004 at 01:13:08PM +0300, Dorit Naishlos wrote:
> > It'll be much harder to reduce the register pressure by one after the
> > fact.

Meaning that if we write

             magic = addr
             loop {
               ...
               v2 = ALIGNED_EXPR (addr);
               v3 = REALIGN_LOAD_EXPR (v1, v2, magic);
               addr += N;
               ...
             }

then everyone will assume that REALIGN_LOAD_EXPR really needs the
initial value of addr, and not any random value containing the
same low N bits.  Which means that we *will* use two registers
for this loop when only one is needed.


> > > Can the builtin be encapsulated within REALIGN_LOAD_EXPR
> > > ? i.e - have a REALIGN_LOAD_EXPR(v1,v2,addr) that on some targets will
> > > be  expanded to {x=builtin(addr),smthing(v1,v2,x)} and on other targets
> > > will be expanded directly to {smthing(v1,v2,addr)}?
> >
> > No.  Gimple doesn't work that way.
>
> Can you please explain the above two comments?

I will not allow a CALL_EXPR -- even to a builtin -- to be nested
within REALIGN_LOAD_EXPR.  None of the optimizers are set up to
expect this sort of thing, and they shouldn't have to.


r~



