This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[tree-ssa][Bug optimization/12747] Scalar replacement of aggregates (part 1)


This is a new pass that tries to replace structure references with
scalars so that they can be exposed to the scalar optimizers.  I believe
that Muchnick or Morgan explains it in more detail.  The pass should
address several PRs, including 12747, 12853, 12825, 6883, 6880 and
7061.  The patch still doesn't fix *all* of them because some other
cleanups are necessary that would've obfuscated this patch.

As an example, given:

bar ()
{
  struct Complex_i * factorpointer;
  struct Complex_i factor;

  factor.re = 123;;
  factor.im = 428;;
  Zi = factor;;
}

SRA converts the above into:

bar ()
{
  int SR.2;
  int SR.1;
  struct Complex_i factor;
  struct Complex_i * factorpointer;

  SR.1 = 123;
  SR.2 = 428;
  Zi.re = SR.1;
  Zi.im = SR.2;
}

So, now the scalar optimizers can propagate '123' and '428' into 'Zi'
and also completely remove references to 'factor'.
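
To make the transformation concrete, here is a hand-written C++ sketch of
the same before/after pair.  It is only an illustration, not output from
the pass: the layout of Complex_i (two int fields) is assumed from the
int SR.1/SR.2 declarations in the dump, Zi is assumed to be a global, and
the names bar_aggregate, bar_scalarized, sr1 and sr2 are made up for this
example.

```cpp
#include <cassert>

// Assumed layout: two int fields, matching the 'int SR.1; int SR.2;' temporaries.
struct Complex_i { int re; int im; };

// Assumed to be a global, mirroring 'Zi' in the dump.
Complex_i Zi;

// Before: the values travel through the local aggregate 'factor',
// so a scalar optimizer cannot see the constants directly.
void bar_aggregate() {
    Complex_i factor;
    factor.re = 123;
    factor.im = 428;
    Zi = factor;
}

// After (hand-written equivalent of the scalarized form): each field of
// 'factor' is its own scalar, and the aggregate copy becomes two
// field-wise stores.
void bar_scalarized() {
    int sr1 = 123;  // plays the role of SR.1
    int sr2 = 428;  // plays the role of SR.2
    Zi.re = sr1;
    Zi.im = sr2;
}
```

Both functions leave Zi in the same state; the second form simply exposes
the constants to constant propagation and makes 'factor' dead.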

For PR 12747, we are still missing a few scalarizations because there is
a backend quirk related to removing the TREE_ADDRESSABLE bit from
structures which blocks SRA from scalarizing some structures.  In this
case, we have this input code:

void
copy (BitVector & DEST, BitVector & SRC, unsigned I)
{
    DEST[I] = SRC[I];
}

which after scalarization is optimized into

void copy(BitVector&, BitVector&, unsigned int) (DEST, SRC, I)
{
  <D1565>.m_bv = SRC;
  <D1565>.m_idx = I;

  <D1561>.m_bv = <D1565>.m_bv;
  <D1561>.m_idx = I;
  <D1572>.m_bv = DEST;
  <D1572>.m_idx = I;

  <D1560>.m_bv = <D1572>.m_bv;
  <D1560>.m_idx = I;
  T.3 = getBit (<D1561>.m_bv, I);
  <D1587> = (int)T.3;

  setBit (<D1560>.m_bv, <D1560>.m_idx, (int)(bool)<D1587>);
}

After I get TREE_ADDRESSABLE fixed for the remaining structures, we will
get:

void copy(BitVector&, BitVector&, unsigned int) (DEST, SRC, I)
{
  T.3 = getBit (SRC, I);
  <D1587> = (int)T.3;
  setBit (DEST, I, (int)(bool)<D1587>);
}

which is pretty good.
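
For readers who don't have the PR testcase handy, the following is a
rough sketch of the kind of source that produces temporaries like the
ones above.  It is not the actual testcase: the BitRef proxy class, the
std::vector<bool> storage and the exact getBit/setBit signatures are
assumptions, inferred only from the m_bv/m_idx fields and the calls
visible in the dumps.  copy_scalarized is a hand-written equivalent of
the fully optimized form, for comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class BitVector;
// Signatures assumed from the calls in the dump.
bool getBit(const BitVector* bv, unsigned idx);
void setBit(BitVector* bv, unsigned idx, int value);

// Hypothetical proxy returned by operator[]; it is a small aggregate
// (pointer + index), which is what SRA wants to break into scalars.
class BitRef {
public:
    BitRef(BitVector* bv, unsigned idx) : m_bv(bv), m_idx(idx) {}
    BitRef(const BitRef&) = default;
    operator bool() const { return getBit(m_bv, m_idx); }
    BitRef& operator=(bool v) { setBit(m_bv, m_idx, v); return *this; }
    // Assigning one proxy to another copies the referenced bit.
    BitRef& operator=(const BitRef& other) {
        return *this = static_cast<bool>(other);
    }
private:
    BitVector* m_bv;
    unsigned m_idx;
};

class BitVector {
public:
    explicit BitVector(std::size_t n) : bits(n, false) {}
    BitRef operator[](unsigned i) { return BitRef(this, i); }
    std::vector<bool> bits;  // storage choice is an assumption
};

bool getBit(const BitVector* bv, unsigned idx) { return bv->bits[idx]; }
void setBit(BitVector* bv, unsigned idx, int value) {
    bv->bits[idx] = (value != 0);
}

// Same shape as the function in the PR: two proxy temporaries are built.
void copy(BitVector& DEST, BitVector& SRC, unsigned I) {
    DEST[I] = SRC[I];
}

// Hand-written equivalent of the fully scalarized result: no proxies left.
void copy_scalarized(BitVector& DEST, BitVector& SRC, unsigned I) {
    bool t = getBit(&SRC, I);
    setBit(&DEST, I, static_cast<int>(t));
}
```

With this sketch, copy and copy_scalarized have the same observable
effect; the difference is only whether the proxy objects survive.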

I've done some timings and SRA adds about 0.13 seconds to the
cc1/cc1plus components.  In SPEC it's either neutral or it improves
things a little:

                              No SRA    SRA    Change
   164.gzip                      676    690    +2%
   175.vpr                       415    414    -0.24%
   181.mcf                       412    412    0%
   186.crafty                    614    615    0%
   197.parser                    553    560    +1.2%
   253.perlbmk                   787    811    +3%
   254.gap                       726    726    0%
   255.vortex                    856    855    -0.1%
   256.bzip2                     529    531    0%
   300.twolf                     532    531    0%
   Est. SPECint_base2000         593
   Est. SPECint2000                     596

I expect SRA to reduce virtual operands somewhat, but it creates more
scalar assignments, so it should balance out.  Something tells me
register pressure may be a problem, but I haven't noticed anything too
nasty.

In any case, this is not the complete fix: there are structures that
could be scalarized that currently aren't.  I'd like folks interested in
this to check it out with their favourite programs and let me know how
things go.

Bootstrapped and tested on alpha, x86, amd64 and ia64.


Diego.

Attachment: 20031120-sra.diff.gz
Description: GNU Zip compressed data

