RF[CA]: Don't restrict stack slot sharing
Richard Guenther
richard.guenther@gmail.com
Wed Jun 6 15:16:00 GMT 2012
On Wed, Jun 6, 2012 at 2:43 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Tue, 29 May 2012, Richard Guenther wrote:
>
>> > [... patch about making conflict handling for stack slot sharing
>> > faster...]
>>
>> The volatile handling is very odd - the objects_must_conflict_p code
>> basically says that two volatile vars may always share stack slots?!
>> What's the reasoning for this, or handling volatiles special in any way?
>> I'd rather remove this fishy code (and if at all, ensure that volatile
>> automatics are never coalesced with any other stack slot).
>
> After some pondering and tries I propose something even more aggressive:
> disable the restriction on stack slot sharing altogether. The current
> code tries to avoid sharing slots for decls that don't conflict in their
> alias sets. The thought behind that is that such accesses could be
> reordered by any transformation (on the basis of non-conflicting alias
> sets), which is a problem if those accesses actually go to the same
> memory.
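As a concrete illustration of the hazard that restriction guarded against (a hypothetical sketch; names and values are invented, and real sharing decisions are made by the compiler, not visible in source): two locals with disjoint lifetimes may be coalesced into one stack slot, and if a TBAA-based disambiguator then treated their accesses as non-aliasing, stores to the shared slot could be swapped.

```c
#include <assert.h>

/* The two inner blocks have non-overlapping lifetimes, so the stack
   slot coalescer may place 's' and 'f' in the same slot.  Their types
   have non-conflicting alias sets, so a disambiguator that applied
   TBAA to write-write pairs would wrongly conclude the two stores
   cannot alias and would be free to reorder them, corrupting the
   shared slot.  */
static int
shared_slot_hazard (void)
{
  int r = 0;
  {
    short s = 1;     /* store to the slot through type 'short' */
    r += s;
  }
  {
    float f = 2.0f;  /* store to the same slot through type 'float' */
    r += (int) f;
  }
  return r;
}
```

The code itself is well-defined C; the danger only arises from the (former) TBAA use inside the RTL write-dependence testers.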
>
> Now, since this code was invented some TLC went into the RTL
> disambiguators. In particular write_dependence_1 (and hence anti_ and
> output_dependence) don't use TBAA anymore (see below for a caveat).
> true_dependence_1 still does, and it's correct to do so.
>
> As TBAA isn't used in the dependence testers that could create problems
> (namely swapping a write-write or a read-write pair) we also don't need to
> restrict the stack sharing based on TBAA. We still will happily reorder a
> write-read pair (i.e. read-after-write), but that's okay because for this
> to have a bad effect the initial code must have been invalid already (in
> our memory model a read must be done with a type matching that of the
> last write to that memory area).
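A minimal sketch of that memory-model rule (an invented example, not from the patch): code is valid when every read uses the same type as the last write to that memory, so a slot can be legitimately reused under a new type as long as the reads keep matching.

```c
#include <assert.h>

/* Valid under the middle-end memory model: each read goes through
   the same type as the most recent write to that storage.  Code
   that instead read 'u.i' right after the 'u.s' store would be the
   already-invalid case the mail refers to, and only such code could
   be hurt by write-read reordering.  */
static int
valid_reuse (void)
{
  union { short s; int i; } u;

  u.s = 7;         /* write as short */
  short r1 = u.s;  /* read as short: matches the last write */

  u.i = 100;       /* write as int: storage reused under a new type */
  int r2 = u.i;    /* read as int: again matches */

  return r1 + r2;
}
```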
>
> I verified this with some tweaks on the scheduler on x86(-64). I disabled
> the stack slot sharing restrictions, then I forced a first scheduling pass
> with -O2, and I implemented a shuffle mode that reorders everything it
> can:
>
> Index: common/config/i386/i386-common.c
> ===================================================================
> --- common/config/i386/i386-common.c (revision 187959)
> +++ common/config/i386/i386-common.c (working copy)
> @@ -618,7 +618,8 @@ static const struct default_options ix86
> { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> /* Turn off -fschedule-insns by default. It tends to make the
> problem with not enough registers even worse. */
> - { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> + /*{ OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },*/
> + { OPT_LEVELS_2_PLUS, OPT_fschedule_insns, NULL, 1 },
>
> #ifdef SUBTARGET_OPTIMIZATION_OPTIONS
> SUBTARGET_OPTIMIZATION_OPTIONS,
> Index: haifa-sched.c
> ===================================================================
> --- haifa-sched.c (revision 187959)
> +++ haifa-sched.c (working copy)
> @@ -2436,6 +2436,9 @@ rank_for_schedule (const void *x, const
> /* Make sure that priority of TMP and TMP2 are initialized. */
> gcc_assert (INSN_PRIORITY_KNOWN (tmp) && INSN_PRIORITY_KNOWN (tmp2));
>
> + if (!reload_completed)
> + return INSN_LUID (tmp2) - INSN_LUID (tmp);
> +
> if (sched_pressure != SCHED_PRESSURE_NONE)
> {
> int diff;
>
> The only regression with this change on a regstrap on x86_64-linux with
> and without -m32 is gcc.target/i386/pr49095.c, which already fails just
> with -fschedule-insns and no patches (unexpected input to the peephole
> patterns disables the optimization that is checked in this testcase via asm
> string matching).
>
> The caveat: there's a pre-existing problem in the RTL disambiguators where
> nonoverlapping_component_refs_p still uses a sort of TBAA (component_ref
> paths), not based on alias sets. This problem wasn't worked around with
> the stack sharing restrictions anyway, so it's untouched by this proposal.
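For concreteness, a hypothetical source-level pattern behind that caveat (invented names; the actual misjudgment happens inside the RTL disambiguator, not in this C code): two struct locals with disjoint lifetimes may share one slot, yet their accesses carry distinct COMPONENT_REF paths, which a path-based check like nonoverlapping_component_refs_p would take as proof of non-overlap.

```c
#include <assert.h>

struct A { int x; };
struct B { int y; };

/* 'a' and 'b' have disjoint lifetimes, so they may be given the same
   stack slot.  Then a.x and b.y touch the same memory even though
   their access paths (A.x vs. B.y) look unrelated, which is why
   disambiguating purely on component-ref paths is unsafe once slots
   are shared.  */
static int
component_ref_caveat (void)
{
  int r;
  {
    struct A a;
    a.x = 1;
    r = a.x;
  }
  {
    struct B b;
    b.y = 2;
    r += b.y;
  }
  return r;
}
```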
>
> What I propose is hence the below patch.
>
> (Btw, what about adding this shuffle schedule mode as a general testing
> means, perhaps under an uck-me flag?)
>
> Regstrapped this patch (all languages+Ada) on x86_64-linux, with and
> without the above scheduler hacks, no regressions (without the scheduler
> hacks).
The n_temp_slots_in_use part is ok.
The rest is also a good idea, and indeed the middle-end type-based
memory-model makes sharing slots always possible (modulo bugs,
like the nonoverlapping_component_refs_p code - which should simply
be removed).
Thus, ok for the rest, too, after waiting a while for others to chime in.
Thanks,
Richard.
>
> Ciao,
> Michael.
> --------------------------
> PR middle-end/38474
> * cfgexpand.c (add_alias_set_conflicts): Remove.
> (expand_used_vars): Don't call it.
> (fini_vars_expansion): Clear nonconflicting_type.
> * function.c (n_temp_slots_in_use): New static data.
> (make_slot_available, assign_stack_temp_for_type): Update it.
> (init_temp_slots): Zero it.
> (remove_unused_temp_slot_addresses): Use it for quicker removal.
> (remove_unused_temp_slot_addresses_1): Use htab_clear_slot.
>
> Index: cfgexpand.c
> ===================================================================
> --- cfgexpand.c.orig 2012-05-29 16:24:46.000000000 +0200
> +++ cfgexpand.c 2012-06-06 14:30:33.000000000 +0200
> @@ -353,47 +353,6 @@ aggregate_contains_union_type (tree type
> return false;
> }
>
> -/* A subroutine of expand_used_vars. If two variables X and Y have alias
> - sets that do not conflict, then do add a conflict for these variables
> - in the interference graph. We also need to make sure to add conflicts
> - for union containing structures. Else RTL alias analysis comes along
> - and due to type based aliasing rules decides that for two overlapping
> - union temporaries { short s; int i; } accesses to the same mem through
> - different types may not alias and happily reorders stores across
> - life-time boundaries of the temporaries (See PR25654). */
> -
> -static void
> -add_alias_set_conflicts (void)
> -{
> - size_t i, j, n = stack_vars_num;
> -
> - for (i = 0; i < n; ++i)
> - {
> - tree type_i = TREE_TYPE (stack_vars[i].decl);
> - bool aggr_i = AGGREGATE_TYPE_P (type_i);
> - bool contains_union;
> -
> - contains_union = aggregate_contains_union_type (type_i);
> - for (j = 0; j < i; ++j)
> - {
> - tree type_j = TREE_TYPE (stack_vars[j].decl);
> - bool aggr_j = AGGREGATE_TYPE_P (type_j);
> - if (aggr_i != aggr_j
> - /* Either the objects conflict by means of type based
> - aliasing rules, or we need to add a conflict. */
> - || !objects_must_conflict_p (type_i, type_j)
> - /* In case the types do not conflict ensure that access
> - to elements will conflict. In case of unions we have
> - to be careful as type based aliasing rules may say
> - access to the same memory does not conflict. So play
> - safe and add a conflict in this case when
> - -fstrict-aliasing is used. */
> - || (contains_union && flag_strict_aliasing))
> - add_stack_var_conflict (i, j);
> - }
> - }
> -}
> -
> /* Callback for walk_stmt_ops. If OP is a decl touched by add_stack_var
> enter its partition number into bitmap DATA. */
>
> @@ -1626,10 +1585,6 @@ expand_used_vars (void)
> if (stack_vars_num > 0)
> {
> add_scope_conflicts ();
> - /* Due to the way alias sets work, no variables with non-conflicting
> - alias sets may be assigned the same address. Add conflicts to
> - reflect this. */
> - add_alias_set_conflicts ();
>
> /* If stack protection is enabled, we don't share space between
> vulnerable data and non-vulnerable data. */
> Index: function.c
> ===================================================================
> --- function.c.orig 2012-05-29 16:42:31.000000000 +0200
> +++ function.c 2012-05-29 16:45:33.000000000 +0200
> @@ -572,6 +572,7 @@ struct GTY(()) temp_slot {
> /* A table of addresses that represent a stack slot. The table is a mapping
> from address RTXen to a temp slot. */
> static GTY((param_is(struct temp_slot_address_entry))) htab_t temp_slot_address_table;
> +static size_t n_temp_slots_in_use;
>
> /* Entry for the above hash table. */
> struct GTY(()) temp_slot_address_entry {
> @@ -648,6 +649,7 @@ make_slot_available (struct temp_slot *t
> insert_slot_to_list (temp, &avail_temp_slots);
> temp->in_use = 0;
> temp->level = -1;
> + n_temp_slots_in_use--;
> }
>
> /* Compute the hash value for an address -> temp slot mapping.
> @@ -700,7 +702,7 @@ remove_unused_temp_slot_addresses_1 (voi
> const struct temp_slot_address_entry *t;
> t = (const struct temp_slot_address_entry *) *slot;
> if (! t->temp_slot->in_use)
> - *slot = NULL;
> + htab_clear_slot (temp_slot_address_table, slot);
> return 1;
> }
>
> @@ -708,9 +710,13 @@ remove_unused_temp_slot_addresses_1 (voi
> static void
> remove_unused_temp_slot_addresses (void)
> {
> - htab_traverse (temp_slot_address_table,
> - remove_unused_temp_slot_addresses_1,
> - NULL);
> + /* Use quicker clearing if there aren't any active temp slots. */
> + if (n_temp_slots_in_use)
> + htab_traverse (temp_slot_address_table,
> + remove_unused_temp_slot_addresses_1,
> + NULL);
> + else
> + htab_empty (temp_slot_address_table);
> }
>
> /* Find the temp slot corresponding to the object at address X. */
> @@ -902,6 +908,7 @@ assign_stack_temp_for_type (enum machine
> p->in_use = 1;
> p->type = type;
> p->level = temp_slot_level;
> + n_temp_slots_in_use++;
>
> pp = temp_slots_at_level (p->level);
> insert_slot_to_list (p, pp);
> @@ -1213,6 +1220,7 @@ init_temp_slots (void)
> avail_temp_slots = 0;
> used_temp_slots = 0;
> temp_slot_level = 0;
> + n_temp_slots_in_use = 0;
>
> /* Set up the table to map addresses to temp slots. */
> if (! temp_slot_address_table)