RF[CA]: Don't restrict stack slot sharing
Richard Guenther
richard.guenther@gmail.com
Wed Jun 6 15:16:00 GMT 2012
On Wed, Jun 6, 2012 at 2:43 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Tue, 29 May 2012, Richard Guenther wrote:
>
>> > [... patch about making conflict handling for stack slot sharing
>> > faster...]
>>
>> The volatile handling is very odd - the objects_must_conflict_p code
>> basically says that two volatile vars may always share stack slots?!
>> What's the reasoning for this, or handling volatiles special in any way?
>> I'd rather remove this fishy code (and if at all, ensure that volatile
>> automatics are never coalesced with any other stack slot).
>
> After some pondering and tries I propose something even more aggressive:
> disable the restriction on stack slot sharing altogether. The current
> code tries to avoid sharing slots for decls that don't conflict in their
> alias sets. The thought behind that is that such accesses could be
> reordered by any transformation (on the basis of non-conflicting alias
> sets), which is a problem if those accesses actually go to the same
> memory.
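As a concrete illustration of the hazard that restriction guarded against (a hypothetical sketch; names and values are invented, and real sharing decisions are made by the compiler, not visible in source): two locals with disjoint lifetimes may be coalesced into one stack slot, and if a TBAA-based disambiguator then treated their accesses as non-aliasing, stores to the shared slot could be swapped.

```c
#include <assert.h>

/* The two inner blocks have non-overlapping lifetimes, so the stack
   slot coalescer may place 's' and 'f' in the same slot.  Their types
   have non-conflicting alias sets, so a disambiguator that applied
   TBAA to write-write pairs would wrongly conclude the two stores
   cannot alias and would be free to reorder them, corrupting the
   shared slot.  */
static int
shared_slot_hazard (void)
{
  int r = 0;
  {
    short s = 1;     /* store to the slot through type 'short' */
    r += s;
  }
  {
    float f = 2.0f;  /* store to the same slot through type 'float' */
    r += (int) f;
  }
  return r;
}
```

The code itself is well-defined C; the danger only arises from the (former) TBAA use inside the RTL write-dependence testers.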
>
> Now, since this code was invented some TLC went into the RTL
> disambiguators. In particular write_dependence_1 (and hence anti_ and
> output_dependence) don't use TBAA anymore (see below for a caveat).
> true_dependence_1 still does, and it's correct to do so.
>
> As TBAA isn't used in the dependence testers that could create problems
> (namely swapping a write-write or a read-write pair) we also don't need to
> restrict the stack sharing based on TBAA. We still will happily reorder a
> write-read pair (i.e. read-after-write), but that's okay because for this
> to have a bad effect the initial code must have been invalid already (in
> our memory model a read must be done with a type matching that of the
> last write to that memory area).
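A minimal sketch of that memory-model rule (an invented example, not from the patch): code is valid when every read uses the same type as the last write to that memory, so a slot can be legitimately reused under a new type as long as the reads keep matching.

```c
#include <assert.h>

/* Valid under the middle-end memory model: each read goes through
   the same type as the most recent write to that storage.  Code
   that instead read 'u.i' right after the 'u.s' store would be the
   already-invalid case the mail refers to, and only such code could
   be hurt by write-read reordering.  */
static int
valid_reuse (void)
{
  union { short s; int i; } u;

  u.s = 7;         /* write as short */
  short r1 = u.s;  /* read as short: matches the last write */

  u.i = 100;       /* write as int: storage reused under a new type */
  int r2 = u.i;    /* read as int: again matches */

  return r1 + r2;
}
```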
>
> I verified this with some tweaks on the scheduler on x86(-64). I disabled
> the stack slot sharing restrictions, then I forced a first scheduling pass
> with -O2, and I implemented a shuffle mode that reorders everything it
> can:
>
> Index: common/config/i386/i386-common.c
> ===================================================================
> --- common/config/i386/i386-common.c (revision 187959)
> +++ common/config/i386/i386-common.c (working copy)
> @@ -618,7 +618,8 @@ static const struct default_options ix86
> { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> /* Turn off -fschedule-insns by default. It tends to make the
> problem with not enough registers even worse. */
> - { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> + /*{ OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },*/
> + { OPT_LEVELS_2_PLUS, OPT_fschedule_insns, NULL, 1 },
>
> #ifdef SUBTARGET_OPTIMIZATION_OPTIONS
> SUBTARGET_OPTIMIZATION_OPTIONS,
> Index: haifa-sched.c
> ===================================================================
> --- haifa-sched.c (revision 187959)
> +++ haifa-sched.c (working copy)
> @@ -2436,6 +2436,9 @@ rank_for_schedule (const void *x, const
> /* Make sure that priority of TMP and TMP2 are initialized. */
> gcc_assert (INSN_PRIORITY_KNOWN (tmp) && INSN_PRIORITY_KNOWN (tmp2));
>
> + if (!reload_completed)
> + return INSN_LUID (tmp2) - INSN_LUID (tmp);
> +
> if (sched_pressure != SCHED_PRESSURE_NONE)
> {
> int diff;
>
> The only regression with this change on a regstrap on x86_64-linux with
> and without -m32 is gcc.target/i386/pr49095.c, which already fails just
> with -fschedule-insns and no patches (unexpected input to the peephole
> patterns disables the optimization that is checked in this testcase via asm
> string matching).
>
> The caveat: there's a pre-existing problem in the RTL disambiguators where
> nonoverlapping_component_refs_p still uses a sort of TBAA (component_ref
> paths), not based on alias sets. This problem wasn't worked around with
> the stack sharing restrictions anyway, so it's untouched by this proposal.
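For concreteness, a hypothetical source-level pattern behind that caveat (invented names; the actual misjudgment happens inside the RTL disambiguator, not in this C code): two struct locals with disjoint lifetimes may share one slot, yet their accesses carry distinct COMPONENT_REF paths, which a path-based check like nonoverlapping_component_refs_p would take as proof of non-overlap.

```c
#include <assert.h>

struct A { int x; };
struct B { int y; };

/* 'a' and 'b' have disjoint lifetimes, so they may be given the same
   stack slot.  Then a.x and b.y touch the same memory even though
   their access paths (A.x vs. B.y) look unrelated, which is why
   disambiguating purely on component-ref paths is unsafe once slots
   are shared.  */
static int
component_ref_caveat (void)
{
  int r;
  {
    struct A a;
    a.x = 1;
    r = a.x;
  }
  {
    struct B b;
    b.y = 2;
    r += b.y;
  }
  return r;
}
```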
>
> What I propose is hence the below patch.
>
> (Btw, what about adding this shuffle schedule mode as a general testing
> means, perhaps under an uck-me flag?)
>
> Regstrapped this patch (all languages+Ada) on x86_64-linux, with and
> without the above scheduler hacks, no regressions (without the scheduler
> hacks).
The n_temp_slots_in_use part is ok.
The rest is also a good idea, and indeed the middle-end type-based
memory-model makes sharing slots always possible (modulo bugs,
like the nonoverlapping_component_refs_p code - which should simply
be removed).
Thus, ok for the rest, too, after waiting a while for others to chime in.
Thanks,
Richard.
>
> Ciao,
> Michael.
> --------------------------
> PR middle-end/38474
> * cfgexpand.c (add_alias_set_conflicts): Remove.
> (expand_used_vars): Don't call it.
> (fini_vars_expansion): Clear nonconflicting_type.
> * function.c (n_temp_slots_in_use): New static data.
> (make_slot_available, assign_stack_temp_for_type): Update it.
> (init_temp_slots): Zero it.
> (remove_unused_temp_slot_addresses): Use it for quicker removal.
> (remove_unused_temp_slot_addresses_1): Use htab_clear_slot.
>
> Index: cfgexpand.c
> ===================================================================
> --- cfgexpand.c.orig 2012-05-29 16:24:46.000000000 +0200
> +++ cfgexpand.c 2012-06-06 14:30:33.000000000 +0200
> @@ -353,47 +353,6 @@ aggregate_contains_union_type (tree type
> return false;
> }
>
> -/* A subroutine of expand_used_vars. If two variables X and Y have alias
> - sets that do not conflict, then do add a conflict for these variables
> - in the interference graph. We also need to make sure to add conflicts
> - for union containing structures. Else RTL alias analysis comes along
> - and due to type based aliasing rules decides that for two overlapping
> - union temporaries { short s; int i; } accesses to the same mem through
> - different types may not alias and happily reorders stores across
> - life-time boundaries of the temporaries (See PR25654). */
> -
> -static void
> -add_alias_set_conflicts (void)
> -{
> - size_t i, j, n = stack_vars_num;
> -
> - for (i = 0; i < n; ++i)
> - {
> - tree type_i = TREE_TYPE (stack_vars[i].decl);
> - bool aggr_i = AGGREGATE_TYPE_P (type_i);
> - bool contains_union;
> -
> - contains_union = aggregate_contains_union_type (type_i);
> - for (j = 0; j < i; ++j)
> - {
> - tree type_j = TREE_TYPE (stack_vars[j].decl);
> - bool aggr_j = AGGREGATE_TYPE_P (type_j);
> - if (aggr_i != aggr_j
> - /* Either the objects conflict by means of type based
> - aliasing rules, or we need to add a conflict. */
> - || !objects_must_conflict_p (type_i, type_j)
> - /* In case the types do not conflict ensure that access
> - to elements will conflict. In case of unions we have
> - to be careful as type based aliasing rules may say
> - access to the same memory does not conflict. So play
> - safe and add a conflict in this case when
> - -fstrict-aliasing is used. */
> - || (contains_union && flag_strict_aliasing))
> - add_stack_var_conflict (i, j);
> - }
> - }
> -}
> -
> /* Callback for walk_stmt_ops. If OP is a decl touched by add_stack_var
> enter its partition number into bitmap DATA. */
>
> @@ -1626,10 +1585,6 @@ expand_used_vars (void)
> if (stack_vars_num > 0)
> {
> add_scope_conflicts ();
> - /* Due to the way alias sets work, no variables with non-conflicting
> - alias sets may be assigned the same address. Add conflicts to
> - reflect this. */
> - add_alias_set_conflicts ();
>
> /* If stack protection is enabled, we don't share space between
> vulnerable data and non-vulnerable data. */
> Index: function.c
> ===================================================================
> --- function.c.orig 2012-05-29 16:42:31.000000000 +0200
> +++ function.c 2012-05-29 16:45:33.000000000 +0200
> @@ -572,6 +572,7 @@ struct GTY(()) temp_slot {
> /* A table of addresses that represent a stack slot. The table is a mapping
> from address RTXen to a temp slot. */
> static GTY((param_is(struct temp_slot_address_entry))) htab_t temp_slot_address_table;
> +static size_t n_temp_slots_in_use;
>
> /* Entry for the above hash table. */
> struct GTY(()) temp_slot_address_entry {
> @@ -648,6 +649,7 @@ make_slot_available (struct temp_slot *t
> insert_slot_to_list (temp, &avail_temp_slots);
> temp->in_use = 0;
> temp->level = -1;
> + n_temp_slots_in_use--;
> }
>
> /* Compute the hash value for an address -> temp slot mapping.
> @@ -700,7 +702,7 @@ remove_unused_temp_slot_addresses_1 (voi
> const struct temp_slot_address_entry *t;
> t = (const struct temp_slot_address_entry *) *slot;
> if (! t->temp_slot->in_use)
> - *slot = NULL;
> + htab_clear_slot (temp_slot_address_table, slot);
> return 1;
> }
>
> @@ -708,9 +710,13 @@ remove_unused_temp_slot_addresses_1 (voi
> static void
> remove_unused_temp_slot_addresses (void)
> {
> - htab_traverse (temp_slot_address_table,
> - remove_unused_temp_slot_addresses_1,
> - NULL);
> + /* Use quicker clearing if there aren't any active temp slots. */
> + if (n_temp_slots_in_use)
> + htab_traverse (temp_slot_address_table,
> + remove_unused_temp_slot_addresses_1,
> + NULL);
> + else
> + htab_empty (temp_slot_address_table);
> }
>
> /* Find the temp slot corresponding to the object at address X. */
> @@ -902,6 +908,7 @@ assign_stack_temp_for_type (enum machine
> p->in_use = 1;
> p->type = type;
> p->level = temp_slot_level;
> + n_temp_slots_in_use++;
>
> pp = temp_slots_at_level (p->level);
> insert_slot_to_list (p, pp);
> @@ -1213,6 +1220,7 @@ init_temp_slots (void)
> avail_temp_slots = 0;
> used_temp_slots = 0;
> temp_slot_level = 0;
> + n_temp_slots_in_use = 0;
>
> /* Set up the table to map addresses to temp slots. */
> if (! temp_slot_address_table)