This is the mail archive of the mailing list for the GCC project.


Merging mem-ssa into mainline

I will be merging mem-ssa into mainline over the next few days. The experiments that I have been doing show marked improvements in compile times over the set of programs that I have in my sandbox. The compiler should now be between 5% and 25% faster.

Here's a summary of the speedups I obtained with respect to mainline
@119145.  I am only including timings over the affected modules.  Many
passes are now faster, but this is mostly due to speedups in the
operand scanner and SSA updating.  Also, phases that show less than
0.5% change are not included:

Phase                           Before  After   % change
tree PTA                        16.39   15.35     -6.3%
tree alias analysis             12.33   11.75     -4.7%
tree SSA rewrite                 6.24    5.13    -17.8%
tree SSA incremental            15.79   12.09    -23.4%
tree operand scan               85.90   52.28    -39.1%
TOTAL                          476.43  437.47     -8.2%

DLV
---
Phase                           Before  After   % change
tree PTA                         4.22    3.69    -12.6%
tree SSA incremental             4.09    2.89    -29.3%
tree operand scan               20.85   14.47    -30.6%
TOTAL                          101.39   91.69     -9.6%

MICO
----
Phase                           Before  After   % change
tree SSA rewrite                 4.56    3.13    -31.4%
tree SSA incremental            10.00    5.93    -40.7%
tree operand scan               90.19   50.24    -44.3%
TOTAL                          444.08  391.64    -11.8%

SPEC2000
--------
Phase                           Before  After   % change
tree alias analysis              6.01    5.31    -11.6%
tree SSA rewrite                 4.54    2.66    -41.4%
tree SSA incremental             9.47    5.08    -46.4%
tree operand scan               49.93   43.83    -12.2%
TOTAL                          287.97  271.64     -5.7%

TRAMP3D
-------
Phase                           Before  After   % change
tree SSA rewrite                 3.80    2.71    -28.7%
tree SSA incremental             3.29    1.92    -41.6%
tree operand scan               22.72   16.87    -25.7%
expand                           5.88    5.06    -13.9%
TOTAL                           88.78   77.68    -12.5%

VARIOUS (PR12850)
-----------------
Phase                           Before  After   % change
tree alias analysis              4.61    3.66    -20.6%
tree SSA rewrite                 4.77    3.10    -35.0%
tree SSA incremental             5.85    1.87    -68.0%
tree operand scan               54.29   26.98    -50.3%
TOTAL                          159.39  118.16    -25.9%

I am still doing performance testing, but a cursory look at tramp3d did not show any runtime regressions, and the changes should not really affect the generated code. I will be doing more testing in this area next week.

The branch has accumulated many changes over the last few months and
there are several other changes that are not even in the branch.  In
this message I will describe all the changes that are coming over.  I
will send the patches separately, but since all of them rely on
infrastructure modifications to virtual operands and aliasing
representation, it's best if they are committed in a single operation.

Removal of V_MUST_DEF
---------------------

This was a design mistake I made last year.  At the time I thought it
was useful to distinguish when a memory variable is being totally
written to.  Well, it isn't.  Supporting V_MUST_DEFs requires
additional support in the operand scanner and just about everywhere,
so this simplifies quite a few things.

We now have two virtual operators: VDEF and VUSE.

DSE for aggregates
------------------

Aldy added support to DSE for eliminating multiple stores that amount
to the clobbering of a whole aggregate.  A full description can be
found in his patch.
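As a toy illustration of the idea (not Aldy's actual patch, which operates on GIMPLE statements), the following Python sketch eliminates field stores that are fully covered by a later store to the whole aggregate; the `(symbol, field)` representation is a hypothetical stand-in for real memory references:

```python
# Toy sketch of aggregate DSE (illustrative only, not the GCC pass).
# Stores are (symbol, field) pairs; field None means the whole
# aggregate is written, which kills every earlier field store to it.

def dse_aggregates(stores):
    """Return the stores that survive: walk backwards and drop any
    store fully covered by a later whole-aggregate store."""
    live = []
    clobbered = set()  # aggregates wholly overwritten later
    for sym, field in reversed(stores):
        if field is None:           # full-aggregate store
            if sym not in clobbered:
                live.append((sym, field))
                clobbered.add(sym)
        elif sym not in clobbered:  # field store not yet covered
            live.append((sym, field))
    live.reverse()
    return live
```

For example, `dse_aggregates([("s", "a"), ("s", "b"), ("s", None)])` keeps only the final whole-aggregate store.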

Multi-operand support for VDEF and VUSE
---------------------------------------

Andrew added support for multiple operands in VDEFs and VUSEs.  With
this, it is possible to assign a single name to the clobbering of
several SSA names at the same time.  This is the basic idea behind
Memory SSA; I will be updating the design document soon.

Tagging a memory store with a single name is attractive, because it
allows the compiler to generate exactly one name per store.  For
instance, if pointer p_3 is pointing to { a, b }:

	# a_1 = VDEF <>
	a = 9;

	# b_9 = VDEF <>
	b = 2;

	# .MEM_10 = VDEF <a_1, b_9>
	*p_3 = ...

	# VUSE <.MEM_10>
	x_3 = b;

The store to *p_3 generates a single name, .MEM_10, which is then used
as the current definition for 'a', 'b' and '*p_3' itself.  This
provides the absolute minimum number of virtual operators needed to
represent aliasing.
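The naming discipline in the example can be modeled with a small Python sketch (illustrative only; the version numbers and the `VirtualSSA` class are hypothetical, not GCC code): a store through a pointer generates one fresh name that becomes the current definition of every symbol it may touch.

```python
# Toy model of factored virtual definitions.  A store that may touch
# several symbols generates one SSA name (.MEM_N) which becomes the
# current definition of all of them; a load is a VUSE of that name.

class VirtualSSA:
    def __init__(self):
        self.version = 0
        self.curdef = {}            # symbol -> current SSA name

    def fresh(self, base):
        self.version += 1
        return f"{base}_{self.version}"

    def store(self, symbols):
        """Factored VDEF: one new name covers every stored symbol."""
        uses = [self.curdef[s] for s in symbols if s in self.curdef]
        name = (self.fresh(".MEM") if len(symbols) > 1
                else self.fresh(symbols[0]))
        for s in symbols:
            self.curdef[s] = name
        return name, uses           # name = VDEF <uses>

    def load(self, symbol):
        return self.curdef[symbol]  # VUSE <current definition>

ssa = VirtualSSA()
ssa.store(["a"])                    # a = 9    ->  a_1 = VDEF <>
ssa.store(["b"])                    # b = 2    ->  b_2 = VDEF <>
mem, uses = ssa.store(["a", "b"])   # *p = ... ->  .MEM_3 = VDEF <a_1, b_2>
```

After the factored store, `ssa.load("b")` returns the single name ".MEM_3", mirroring the VUSE in the example above.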

However, in the initial merge, we will not be generating multi-operand
VDEFs.  It turns out that for some classes of memory optimizations
(including PRE), this complicates things.  The memory optimizer folks
weren't too keen on this, so I will figure out an efficient mechanism
to use factored stores only outside this class of optimizations.  The
other alternative would be to make the appropriate changes in these
optimizers to support the factored stores.  But I still do not have an
overly compelling argument to go in that direction, so for now the
first option seems easiest.

New push/pop interface for modifying statements
-----------------------------------------------

Whenever a pass needs to modify a statement, it calls
push_stmt_changes which saves the current state of the statement
operands.  After the pass has modified the statement, it calls
pop_stmt_changes, which marks the symbols that need to be renamed and
performs various other cleanup tasks.

By centralizing this, we can do things like minimizing the number of
symbols we put in the renaming queue, and we have better control over
what needs to be cleaned up, such as EH information.
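A minimal Python analogue of the push/pop discipline, assuming a toy statement representation (the `StmtChangeTracker` class is hypothetical; the real interface is push_stmt_changes/pop_stmt_changes on GIMPLE statements):

```python
# Toy analogue of the push/pop statement-modification interface.
# push() snapshots the operands; pop() diffs the snapshot against the
# modified statement and queues only the symbols that actually changed.

class StmtChangeTracker:
    def __init__(self):
        self.stack = []
        self.rename_queue = set()

    def push(self, stmt):
        """Save the statement's operand set before modification."""
        self.stack.append((stmt, set(stmt["operands"])))

    def pop(self):
        """Queue for renaming only the operands that changed."""
        stmt, before = self.stack.pop()
        after = set(stmt["operands"])
        self.rename_queue |= before ^ after  # symmetric difference
        return stmt

tracker = StmtChangeTracker()
stmt = {"operands": {"a", "b"}}
tracker.push(stmt)
stmt["operands"] = {"a", "c"}                # pass modifies the statement
tracker.pop()
```

Here only "b" (removed) and "c" (added) land in the renaming queue; the unchanged "a" is skipped, which is the point of centralizing the bookkeeping.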

Support for arbitrary memory partitions
---------------------------------------

One of the side-effects of the Memory SSA design is that memory
symbols and the operands used to represent them are not necessarily
tied to the names used in the virtual SSA form.  For instance, in
the example above, the name .MEM_10 is associated with the symbols 'a'
and 'b'.

In the extreme form of Memory SSA, this association between symbols
and names can be purely dynamic.  At every store, we can generate a
new association between the symbols stored by the statement and the
unique name generated on the LHS.  However, this can introduce
overlapping live ranges for the factoring SSA names, which leads to
the problem with some memory optimizers.

To avoid this, we now have the concept of Memory Partition Tags (MPT),
which are artificial memory tags that are created after alias
analysis.  An MPT is simply a symbol that represents a fixed set of
aliased or call-clobbered symbols.  There does not need to be an alias
or type relationship between an MPT and the symbols in its set.

The partitioning can be completely arbitrary, and it can even change
as long as this rule is followed: Given two partitions MPT.i and
MPT.j, they must not have any symbols in common.  From this, it
follows that a given symbol V belongs to at most one partition.

When the operand scanner finds symbol V in a memory expression, it
checks whether V belongs to a partition; if it does, an operand is
added for V's partition instead of V itself.
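Because partitions are pairwise disjoint, this lookup is a single map query; a hedged sketch in Python (the `operand_for` helper and the MPT names are illustrative, not the GCC implementation):

```python
# Since a symbol belongs to at most one partition, operand scanning
# reduces to one dictionary lookup: emit the partition tag if the
# symbol is partitioned, otherwise the symbol itself.

def operand_for(symbol, mpt_of):
    """Return the virtual operand to add for a memory symbol."""
    return mpt_of.get(symbol, symbol)

mpt_of = {"a": "MPT.1", "b": "MPT.1", "c": "MPT.2"}
operand_for("a", mpt_of)   # partitioned symbol -> its tag
operand_for("z", mpt_of)   # unpartitioned symbol -> itself
```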

The partitioning scheme can be altered at will.  Initially I
implemented one based on the number of members in the alias sets we
compute.  Since the number of aliased symbols in an alias set is
directly related to the number of virtual operators generated by a
statement, the idea for the default partition is very simple:

	1- Sort alias sets by decreasing set size.

	2- Create a partition for every set and add enough symbols
	   into the partition to reduce the set below a threshold.

	3- Rewrite the set to use the partitions.

So, now instead of dealing with statements producing hundreds of
virtual operators, we get up to 10-15.  It's not necessarily the
smartest way, but it does get rid of all the call-clobbering and all
the symbols that are always accessed via pointer dereferences.  One
tweak that can be added is to not allow partitioning for symbols that
are referenced directly in the code (in the hope that optimizers can
do more with those).
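The three-step heuristic above can be sketched in Python as follows; this is a simplified illustration under the assumption that the input alias sets are disjoint (the function name, the MPT labels, and the dict representation are all hypothetical):

```python
# Hedged sketch of the default partitioning heuristic: sort alias sets
# by decreasing size, then move members of each oversized set into a
# fresh memory partition tag (MPT) until the rewritten set fits within
# the threshold.  Assumes the alias sets are disjoint.

def partition_alias_sets(alias_sets, threshold):
    """alias_sets: dict mapping set name -> list of aliased symbols.
    Returns (rewritten_sets, mpt_of) where mpt_of maps symbol -> MPT."""
    mpt_of, rewritten, next_id = {}, {}, 1
    # 1- Sort alias sets by decreasing set size.
    for name, members in sorted(alias_sets.items(),
                                key=lambda kv: -len(kv[1])):
        if len(members) > threshold:
            # 2- Create a partition and move enough symbols into it
            #    so the rewritten set fits within the threshold.
            mpt = f"MPT.{next_id}"
            next_id += 1
            excess = len(members) - threshold + 1
            for sym in members[:excess]:
                mpt_of[sym] = mpt
            # 3- Rewrite the set to use the partition.
            members = [mpt] + members[excess:]
        rewritten[name] = members
    return rewritten, mpt_of
```

With a threshold of 3, an alias set of five symbols is rewritten to one MPT plus the two remaining symbols, which is how a statement that used to produce hundreds of virtual operators ends up producing only a handful.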

Cleanups and speedup in alias analysis
--------------------------------------

We were using some bitmaps during alias analysis that were better
converted to pointer sets.  That gave a bit of an improvement.  We
were also doing slow checks for duplicates in places like

That's about all the important bits that will be coming over from the
branch.  The patches that I will be posting have all been bootstrapped
and tested on x86, x86-64, ppc64 and ia64.

I expect to be committing these changes sometime next week.
