This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFA: Fix part of PR38474, stack var partitioning slowness


On Wed, Dec 2, 2009 at 8:00 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> the testcase in PR38474 has the property that it contains zillions of
> local struct variables of size == 32, and in addition that there are many
> temporary structs of the same type generated, in their individual blocks.
> Due to the size (>=32) they are added to the defer list even at -O0, hence
> run into the inherent quadraticness. ?We can't just turn on the threshold
> size because even at -O0 we don't want to make the stack explode, and in
> addition a similar testcase could then be constructed with just larger
> structs.
>
> In the testcase many of the stack slots can indeed be shared, another
> reason to not fiddle with the threshold.
>
> Profiling reveals that it's not the N*N/2 walk over stack vars in order to
> possibly union them that is slow, but instead union_stack_vars itself.
> Looking at it that's no wonder, because in order to transfer the conflicts
> from B to A it does:
>
> ?for x in all stack_vars
> ? ?if x conflicts with B
> ? ? ?add_conflict (A, x)
>
> So doing one union is O(N), not O(nr-of-conflicts). ?Ugh. ?The patch
> simply rewrites the whole conflict graph handling. ?Instead of a global
> lower triangular form each stack var now has a bitmap of all conflicting
> stack vars. ?As the original global array used a char per conflict, these
> bitmaps will use a quarter of the memory even in a completely filled
> conflict graph.
>
> With that we can simply walk over only the conflicts, and in addition even
> free them after transfering. ?We only ever ask about conflicts for
> representatives, hence we don't need to add the conflict numbers themself
> for transfer, but their representatives, making the bitmaps even smaller
> (that's also important for the testcase because there are many unions).
>
> As we never ask about a self-conflict we don't need to create them,
> despite the comment saying so (maybe it was the case once). ?And
> account_used_vars_for_block creates conflicts for no reason as we don't
> actually try to merge stack vars for estimating the stack size.
>
> With an -O0 compiler and the testcase from comment #23 (the reduced
> version of the original one), this is before the patch:
>
> ?expand : 24.62 (64%) usr ? 0.16 (48%) sys ?24.79 (64%) wall ?111892 kB ggc
> ?TOTAL ?: 38.56 ? ? ? ? ? ? 0.33 ? ? ? ? ? ?38.89 ? ? ? ? ? ? 206741 kB
>
> and this is after:
>
> ?expand : 14.70 (52%) usr ? 0.09 (45%) sys ?14.80 (51%) wall ?111892 kB ggc
> ?TOTAL ?: 28.48 ? ? ? ? ? ? 0.20 ? ? ? ? ? ?28.77 ? ? ? ? ? ? 206741 kB
>
> The gprof profile looks like so (also -O0, but adds -pg to cfgexpand.o
> compile, hence the time includes mcount overhead), before patch:
>
> ?17.26 ?6.39 ? ? 6.39 534371077 ? ? 0.00 ? ? 0.00 stack_var_conflict_p
> ?16.92 12.65 ? ? 6.26 ? ? ? ? ? ? ? ? ? ? ? ? ? ? htab_traverse_noresize
> ?11.46 16.89 ? ? 4.24 791452876 ? ? 0.00 ? ? 0.00 triangular_index
> ?5.97 19.10 ? ? 2.21 ? ? ? ?23 ? ?96.09 ? ?96.09 add_alias_set_conflicts
> ?3.92 20.55 ? ? 1.45 ? ? 24297 ? ? 0.06 ? ? 0.55 union_stack_vars
> ?3.49 21.84 ? ? 1.29 257079877 ? ? 0.00 ? ? 0.00 add_stack_var_conflict
> ...
> ? ? ? ? ? ? ? ?0.03 ? 13.36 ? ? ?23/23 ? ? ? ? ?expand_used_vars [3]
> [4] ? ? 36.2 ? ?0.03 ? 13.36 ? ? ?23 ? ? ? ? partition_stack_vars [4]
> ? ? ? ? ? ? ? ?1.45 ? 11.89 ? 24297/24297 ? ? ? union_stack_vars [5]
> ? ? ? ? ? ? ? ?0.01 ? ?0.00 ?756961/534371077 ? ? stack_var_conflict_p
> [6]
> -----------------------------------------------
> ? ? ? ? ? ? ? ?1.45 ? 11.89 ? 24297/24297 ? ? ? partition_stack_vars [4]
> [5] ? ? 36.1 ? ?1.45 ? 11.89 ? 24297 ? ? ? ? union_stack_vars [5]
> ? ? ? ? ? ? ? ?6.38 ? ?2.86 533614116/534371077 stack_var_conflict_p [6]
> ? ? ? ? ? ? ? ?1.29 ? ?1.37 256322391/257079877 add_stack_var_conflict [9]
>
> after patch:
> ?26.61 ?6.14 ? ? 6.14 ? ? ? ? ? ? ? ? ? ? ? ? ? ? htab_traverse_noresize
> ?9.80 ?8.40 ? ? 2.26 ? ? ? 23 ? ?98.26 ? ?98.26 ?add_alias_set_conflicts
> ...
> ? ? ? ? ? ? ? ?0.05 ? ?0.13 ? ? ?23/23 ? ? ? ? ?expand_used_vars [4]
> [19] ? ? 0.7 ? ?0.05 ? ?0.13 ? ? ?23 ? ? ? ? partition_stack_vars [19]
> ? ? ? ? ? ? ? ?0.07 ? ?0.06 ? 24297/24297 ? ? ? union_stack_vars [28]
> ? ? ? ? ? ? ? ?0.00 ? ?0.00 ?756961/756961 ? ? ?stack_var_conflict_p
> [446]
> -----------------------------------------------
> ? ? ? ? ? ? ? ?0.07 ? ?0.06 ? 24297/24297 ? ? ? partition_stack_vars [19]
> [28] ? ? 0.5 ? ?0.07 ? ?0.06 ? 24297 ? ? ? ? union_stack_vars [28]
> ? ? ? ? ? ? ? ?0.04 ? ?0.00 4392046/4392046 ? ? bmp_iter_set [83]
> ? ? ? ? ? ? ? ?0.02 ? ?0.00 4367757/5100421 ? ? add_stack_var_conflict
> [136]
> ? ? ? ? ? ? ? ?0.00 ? ?0.00 4367757/4367757 ? ? bmp_iter_next [444]
> ? ? ? ? ? ? ? ?0.00 ? ?0.00 ? 24289/24289 ? ? ? bmp_iter_set_init [488]
>
> Regstrapped on x86_64-linux, all langs+Ada, no regressions. ?Okay for
> trunk?

Ok.

Thanks,
Richard.

>
> Ciao,
> Michael.
> --
> ? ? ? ?PR middle-end/38474
> ? ? ? ?* cfgexpand.c (struct stack_var): Add conflicts member.
> ? ? ? ?(stack_vars_conflict, stack_vars_conflict_alloc,
> ? ? ? ?n_stack_vars_conflict): Remove.
> ? ? ? ?(add_stack_var): Initialize conflicts member.
> ? ? ? ?(triangular_index, resize_stack_vars_conflict): Remove.
> ? ? ? ?(add_stack_var_conflict, stack_var_conflict_p): Rewrite in
> ? ? ? ?terms of new member.
> ? ? ? ?(union_stack_vars): Only run over the conflicts.
> ? ? ? ?(partition_stack_vars): Remove special case.
> ? ? ? ?(expand_used_vars_for_block): Don't call resize_stack_vars_conflict,
> ? ? ? ?don't create self-conflicts.
> ? ? ? ?(account_used_vars_for_block): Don't create any conflicts.
> ? ? ? ?(fini_vars_expansion): Free bitmaps, don't free or clear removed
> ? ? ? ?globals.
>
> Index: cfgexpand.c
> ===================================================================
> --- cfgexpand.c (revision 154872)
> +++ cfgexpand.c (working copy)
> @@ -200,6 +200,9 @@ struct stack_var
>
> ? /* The next stack variable in the partition, or EOC. ?*/
> ? size_t next;
> +
> + ?/* The numbers of conflicting stack variables. ?*/
> + ?bitmap conflicts;
> ?};
>
> ?#define EOC ?((size_t)-1)
> @@ -213,12 +216,6 @@ static size_t stack_vars_num;
> ? ?is non-decreasing. ?*/
> ?static size_t *stack_vars_sorted;
>
> -/* We have an interference graph between such objects. ?This graph
> - ? is lower triangular. ?*/
> -static bool *stack_vars_conflict;
> -static size_t stack_vars_conflict_alloc;
> -static size_t n_stack_vars_conflict;
> -
> ?/* The phase of the stack frame. ?This is the known misalignment of
> ? ?virtual_stack_vars_rtx from PREFERRED_STACK_BOUNDARY. ?That is,
> ? ?(frame_offset+frame_phase) % PREFERRED_STACK_BOUNDARY == 0. ?*/
> @@ -320,58 +317,28 @@ add_stack_var (tree decl)
> ? stack_vars[stack_vars_num].representative = stack_vars_num;
> ? stack_vars[stack_vars_num].next = EOC;
>
> + ?/* All variables initially conflict with no other. ?*/
> + ?stack_vars[stack_vars_num].conflicts = NULL;
> +
> ? /* Ensure that this decl doesn't get put onto the list twice. ?*/
> ? set_rtl (decl, pc_rtx);
>
> ? stack_vars_num++;
> ?}
>
> -/* Compute the linear index of a lower-triangular coordinate (I, J). ?*/
> -
> -static size_t
> -triangular_index (size_t i, size_t j)
> -{
> - ?if (i < j)
> - ? ?{
> - ? ? ?size_t t;
> - ? ? ?t = i, i = j, j = t;
> - ? ?}
> -
> - ?if (i & 1)
> - ? ?return ((i + 1) / 2) * i + j;
> - ?else
> - ? ?return (i / 2) * (i + 1) + j;
> -}
> -
> -/* Ensure that STACK_VARS_CONFLICT is large enough for N objects. ?*/
> -
> -static void
> -resize_stack_vars_conflict (size_t n)
> -{
> - ?size_t size = triangular_index (n-1, n-1) + 1;
> -
> - ?if (size <= stack_vars_conflict_alloc)
> - ? ?{
> - ? ? ?if (n > n_stack_vars_conflict)
> - ? ? ? fatal_error ("program is too large to be compiled on this machine");
> - ? ? ?return;
> - ? ?}
> -
> - ?stack_vars_conflict = XRESIZEVEC (bool, stack_vars_conflict, size);
> - ?memset (stack_vars_conflict + stack_vars_conflict_alloc, 0,
> - ? ? ? ? (size - stack_vars_conflict_alloc) * sizeof (bool));
> - ?stack_vars_conflict_alloc = size;
> - ?n_stack_vars_conflict = n;
> -}
> -
> ?/* Make the decls associated with luid's X and Y conflict. ?*/
>
> ?static void
> ?add_stack_var_conflict (size_t x, size_t y)
> ?{
> - ?size_t index = triangular_index (x, y);
> - ?gcc_assert (index < stack_vars_conflict_alloc);
> - ?stack_vars_conflict[index] = true;
> + ?struct stack_var *a = &stack_vars[x];
> + ?struct stack_var *b = &stack_vars[y];
> + ?if (!a->conflicts)
> + ? ?a->conflicts = BITMAP_ALLOC (NULL);
> + ?if (!b->conflicts)
> + ? ?b->conflicts = BITMAP_ALLOC (NULL);
> + ?bitmap_set_bit (a->conflicts, y);
> + ?bitmap_set_bit (b->conflicts, x);
> ?}
>
> ?/* Check whether the decls associated with luid's X and Y conflict. ?*/
> @@ -379,9 +346,11 @@ add_stack_var_conflict (size_t x, size_t
> ?static bool
> ?stack_var_conflict_p (size_t x, size_t y)
> ?{
> - ?size_t index = triangular_index (x, y);
> - ?gcc_assert (index < stack_vars_conflict_alloc);
> - ?return stack_vars_conflict[index];
> + ?struct stack_var *a = &stack_vars[x];
> + ?struct stack_var *b = &stack_vars[y];
> + ?if (!a->conflicts || !b->conflicts)
> + ? ?return false;
> + ?return bitmap_bit_p (a->conflicts, y);
> ?}
>
> ?/* Returns true if TYPE is or contains a union type. ?*/
> @@ -626,6 +595,9 @@ static void
> ?union_stack_vars (size_t a, size_t b, HOST_WIDE_INT offset)
> ?{
> ? size_t i, last;
> + ?struct stack_var *vb = &stack_vars[b];
> + ?bitmap_iterator bi;
> + ?unsigned u;
>
> ? /* Update each element of partition B with the given offset,
> ? ? ?and merge them into partition A. ?*/
> @@ -642,9 +614,12 @@ union_stack_vars (size_t a, size_t b, HO
> ? ? stack_vars[a].alignb = stack_vars[b].alignb;
>
> ? /* Update the interference graph and merge the conflicts. ?*/
> - ?for (last = stack_vars_num, i = 0; i < last; ++i)
> - ? ?if (stack_var_conflict_p (b, i))
> - ? ? ?add_stack_var_conflict (a, i);
> + ?if (vb->conflicts)
> + ? ?{
> + ? ? ?EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
> + ? ? ? add_stack_var_conflict (a, stack_vars[u].representative);
> + ? ? ?BITMAP_FREE (vb->conflicts);
> + ? ?}
> ?}
>
> ?/* A subroutine of expand_used_vars. ?Binpack the variables into
> @@ -679,15 +654,6 @@ partition_stack_vars (void)
>
> ? qsort (stack_vars_sorted, n, sizeof (size_t), stack_var_size_cmp);
>
> - ?/* Special case: detect when all variables conflict, and thus we can't
> - ? ? do anything during the partitioning loop. ?It isn't uncommon (with
> - ? ? C code at least) to declare all variables at the top of the function,
> - ? ? and if we're not inlining, then all variables will be in the same scope.
> - ? ? Take advantage of very fast libc routines for this scan. ?*/
> - ?gcc_assert (sizeof(bool) == sizeof(char));
> - ?if (memchr (stack_vars_conflict, false, stack_vars_conflict_alloc) == NULL)
> - ? ?return;
> -
> ? for (si = 0; si < n; ++si)
> ? ? {
> ? ? ? size_t i = stack_vars_sorted[si];
> @@ -1084,15 +1050,13 @@ expand_used_vars_for_block (tree block,
> ? /* Since we do not track exact variable lifetimes (which is not even
> ? ? ?possible for variables whose address escapes), we mirror the block
> ? ? ?tree in the interference graph. ?Here we cause all variables at this
> - ? ? level, and all sublevels, to conflict. ?Do make certain that a
> - ? ? variable conflicts with itself. ?*/
> + ? ? level, and all sublevels, to conflict. ?*/
> ? if (old_sv_num < this_sv_num)
> ? ? {
> ? ? ? new_sv_num = stack_vars_num;
> - ? ? ?resize_stack_vars_conflict (new_sv_num);
>
> ? ? ? for (i = old_sv_num; i < new_sv_num; ++i)
> - ? ? ? for (j = i < this_sv_num ? i+1 : this_sv_num; j-- > old_sv_num ;)
> + ? ? ? for (j = i < this_sv_num ? i : this_sv_num; j-- > old_sv_num ;)
> ? ? ? ? ?add_stack_var_conflict (i, j);
> ? ? }
> ?}
> @@ -1260,37 +1224,18 @@ create_stack_guard (void)
> ?static HOST_WIDE_INT
> ?account_used_vars_for_block (tree block, bool toplevel)
> ?{
> - ?size_t i, j, old_sv_num, this_sv_num, new_sv_num;
> ? tree t;
> ? HOST_WIDE_INT size = 0;
>
> - ?old_sv_num = toplevel ? 0 : stack_vars_num;
> -
> ? /* Expand all variables at this level. ?*/
> ? for (t = BLOCK_VARS (block); t ; t = TREE_CHAIN (t))
> ? ? if (TREE_USED (t))
> ? ? ? size += expand_one_var (t, toplevel, false);
>
> - ?this_sv_num = stack_vars_num;
> -
> ? /* Expand all variables at containing levels. ?*/
> ? for (t = BLOCK_SUBBLOCKS (block); t ; t = BLOCK_CHAIN (t))
> ? ? size += account_used_vars_for_block (t, false);
>
> - ?/* Since we do not track exact variable lifetimes (which is not even
> - ? ? possible for variables whose address escapes), we mirror the block
> - ? ? tree in the interference graph. ?Here we cause all variables at this
> - ? ? level, and all sublevels, to conflict. ?Do make certain that a
> - ? ? variable conflicts with itself. ?*/
> - ?if (old_sv_num < this_sv_num)
> - ? ?{
> - ? ? ?new_sv_num = stack_vars_num;
> - ? ? ?resize_stack_vars_conflict (new_sv_num);
> -
> - ? ? ?for (i = old_sv_num; i < new_sv_num; ++i)
> - ? ? ? for (j = i < this_sv_num ? i+1 : this_sv_num; j-- > old_sv_num ;)
> - ? ? ? ? add_stack_var_conflict (i, j);
> - ? ?}
> ? return size;
> ?}
>
> @@ -1315,13 +1260,13 @@ init_vars_expansion (void)
> ?static void
> ?fini_vars_expansion (void)
> ?{
> + ?size_t i, n = stack_vars_num;
> + ?for (i = 0; i < n; i++)
> + ? ?BITMAP_FREE (stack_vars[i].conflicts);
> ? XDELETEVEC (stack_vars);
> ? XDELETEVEC (stack_vars_sorted);
> - ?XDELETEVEC (stack_vars_conflict);
> ? stack_vars = NULL;
> ? stack_vars_alloc = stack_vars_num = 0;
> - ?stack_vars_conflict = NULL;
> - ?stack_vars_conflict_alloc = 0;
> ?}
>
> ?/* Make a fair guess for the size of the stack frame of the current
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]