This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Implement smart multiple switch expansion algorithms.
- From: Martin Liška <mliska at suse dot cz>
- To: David Malcolm <dmalcolm at redhat dot com>, Jeff Law <law at redhat dot com>, gcc-patches at gcc dot gnu dot org
- Cc: Jan Hubicka <hubicka at ucw dot cz>
- Date: Fri, 29 Dec 2017 15:38:48 +0100
- Subject: Re: [PATCH] Implement smart multiple switch expansion algorithms.
- Authentication-results: sourceware.org; auth=none
- References: <4a6e1c28-6a18-0084-e602-57a0c259d676@suse.cz> <c0fc51df-4794-be1e-e139-578464623e57@redhat.com> <82167884-d00e-a87c-9f22-94f5cd6e834d@suse.cz> <1507310683.29092.13.camel@redhat.com>
On 10/06/2017 07:24 PM, David Malcolm wrote:
> On Fri, 2017-10-06 at 14:25 +0200, Martin Liška wrote:
>> Hello.
>>
>> As presented at this year's Cauldron, I rewrote current switch
>> expansion to support
>> multiple algorithms (jump table and bit-test) that can be used just
>> for a fraction
>> of cases. Balanced decision tree is built on top of that. I decided
>> to do a bigger
>> refactoring and put all there 3 mentioned algorithm to its own class.
>>
>> There's a bigger change in jump_table_cluster::can_be_handled where
>> the constant 10 (3 respectively)
>> is compared to number of handled values, and not number of cases.
>> Later one is wrong in my opinion.
>>
>> There are some numbers for cc1plus:
>>
>> $ bloaty ./objdir2/gcc/cc1plus -- ./objdir/gcc/cc1plus
>> VM SIZE FILE SIZE
>> ++++++++++++++ GROWING ++++++++++++++
>> +19% +1.32Mi .rodata +1.32Mi +19%
>> [ = ] 0 .symtab +7.38Ki +0.5%
>> [ = ] 0 .strtab +5.18Ki +0.2%
>> [ = ] 0 .debug_info +743 +0.0%
>> +0.0% +712 .eh_frame +712 +0.0%
>> +0.1% +440 .eh_frame_hdr +440 +0.1%
>> [ = ] 0 .debug_aranges +80 +0.1%
>> +0.0% +67 .dynstr +67 +0.0%
>> [ = ] 0 .debug_str +6 +0.0%
>>
>> -------------- SHRINKING --------------
>> -1.2% -214Ki .text -214Ki -1.2%
>> [ = ] 0 .debug_loc -66.7Ki -0.1%
>> [ = ] 0 .debug_line -14.0Ki -0.2%
>> [ = ] 0 .debug_ranges -9.56Ki -0.1%
>> -6.8% -3 [Unmapped] -375 -14.4%
>> [ = ] 0 .debug_abbrev -46 -0.0%
>>
>> +3.8% +1.11Mi TOTAL +1.03Mi +0.5%
>>
>> So it growth and can be easily explained:
>>
>> insn-attrtab.o:
>> VM SIZE FILE SIZE
>> ++++++++++++++ GROWING ++++++++++++++
>> [ = ] 0 .rela.rodata +2.00Mi +215%
>> +214% +682Ki .rodata +682Ki +214%
>> [ = ] 0 .rela.debug_loc +29.3Ki +36%
>> +32% +1.91Ki .eh_frame +1.91Ki +32%
>> [ = ] 0 .debug_loc +37 +5.6%
>> [ = ] 0 .debug_str +2 +0.0%
>>
>> -------------- SHRINKING --------------
>> -50.1% -63.3Ki .text -63.3Ki -50.1%
>> [ = ] 0 .debug_line -1.71Ki -10.4%
>> [ = ] 0 .rela.debug_info -768 -0.3%
>> [ = ] 0 .rela.text -624 -0.8%
>> [ = ] 0 .rela.debug_ranges -384 -2.7%
>> [ = ] 0 .debug_info -87 -0.2%
>> [ = ] 0 [Unmapped] -2 -8.7%
>>
>> +137% +621Ki TOTAL +2.63Mi +139%
>>
>> It's caused by:
>> ...
>> ;; GIMPLE switch case clusters: JT(2710):-1-5090
>> ;; GIMPLE switch case clusters: JT(2710):-1-5090
>> ;; GIMPLE switch case clusters: JT(2967):-1-5090
>> ;; GIMPLE switch case clusters: JT(1033):-1-5017
>> ...
>>
>> so there are many switch statements with very reasonable density.
>>
>> and
>> insn-dfatab.o: +13% +99.4Ki TOTAL +455Ki +14%
>> insn-latencytab.o: +19% +136Ki TOTAL +609Ki +20%
>>
>> There shouldn't be any fallout other from what I mentioned in
>> previous email that is
>> a prerequisite for this patch.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression
>> tests.
>>
>> I'm still planning to come with some numbers and stats. Will do that
>> next week.
>>
>> Martin
>
>
>> gcc/ChangeLog:
>>
>> 2017-09-29 Martin Liska <mliska@suse.cz>
>>
>> * tree-switch-conversion.c (MAX_CASE_BIT_TESTS): Remove and move
>> to header file.
>> (hoist_edge_and_branch_if_true): Move to ...
>> (bit_test_cluster::hoist_edge_and_branch_if_true): ... this.
>
> I'm not a reviewer, but there's a lot of "churn" in the patch due to
> things moving around; there seems to be a mixture of things simply
> moving around, functions becoming methods of classes, and some
> things changing.
>
> Would it be helpful to split the patch into two: a patch that moves
> the functions to where they'll be needed, and then a patch to do the
> "actual" (or functional) changes?
Hello.
Sorry for late answer. I fully agree what you wrote David. It would make
the patch more easily readable and reviewable. Let me postpone it to next
stage1, patch separation into 2 pieces will need some effort.
>
> [...snip...]
>
>> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
>> new file mode 100644
>> index 00000000000..45ae11f408d
>> --- /dev/null
>> +++ b/gcc/tree-switch-conversion.h
>> @@ -0,0 +1,806 @@
>> +/* Tree switch conversion for GNU compiler.
>> + Copyright (C) 2017 Free Software Foundation, Inc.
>
> Should this be "gimple switch conversion" and
> gimple-ssa-switch-conversion.h?
> (but presumably this is to mirror tree-switch-conversion.c, and
> I don't want to advocate for unnecessary churn)
>
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (at your option) any later
>> +version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3. If not see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +#ifndef TREE_SWITCH_CONVERSION_H
>> +#define TREE_SWITCH_CONVERSION_H
>> +
>> +namespace tree_switch_conversion {
>> +
>> +/* Type of cluster. */
>> +
>> +enum cluster_type
>> +{
>> + SIMPLE_CASE,
>> + JUMP_TABLE,
>> + BIT_TEST
>> +};
>> +
>> +#define PRINT_CASE(f,c) print_dec (c, f, TYPE_SIGN (TREE_TYPE (c)))
>> +
>> +/* Base class for switch clustering. */
>
> Maybe "Abstract base class for representing a cluster of cases", or
> somesuch?
I like it!
> Perhaps an ASCII art inheritance diagram; if I'm reading the code
> right, something like:
>
> /* Abstract base class for representing a cluster of cases.
>
> Here is the inheritance hierarachy, and the enum_cluster_type
> values for the concrete subclasses:
>
> cluster
> |-simple_cluster (SIMPLE_CASE)
> `-group_cluster
> |-jump_table_cluster (JUMP_TABLE)
> `-bit_test_cluster (BIT_TEST). */
>
> Am I right in thinking that the switch initialization conversion
> (struct switch_conversion) is currently distinct from the clustering
> in this patch? (I'm wondering if there's possible benefit from
> combining the two approaches, maybe if one cluster of cases
> predominantly assigns to some variables, and another cluster of
> cases predominantly assigns to some other variables).
Exactly, you understand the algorithm well. Yep switch conversion should
be also incorporated as one of strategies for a cluster. Let me work on
that in next stage1.
Thanks also for another nit comment, I will include them.
Martin
>
>> +
>> +struct cluster
>> +{
>> + /* Cosntructor. */
>
> ^^^ Typo
>
>> + cluster ()
>> + {}
>
> When is this default ctor used?
> Doesn't it leave the various fields uninitialized?
>
> I notice that of the descendant classes, at least jump_table_cluster
> doesn't call its base class ctor (if I'm reading the patch right), so I
> believe these fields are uninitialized for some of the subclasses.
>
>> +
>> + /* Constructor. */
>> + cluster (tree case_label_expr, basic_block case_bb, profile_probability prob,
>> + profile_probability subtree_prob);
>> +
>
> [...snip...]
>
>> +struct simple_cluster: public cluster
>> +{
>
> Could this struct have a descriptive comment? e.g.
>
> /* Subclass of cluster representing a simple contiguous range
> from [low..high]. */
>
> or somesuch (assuming I've read the intent correctly)
>
>> + /* Constructor. */
>> + simple_cluster (tree low, tree high, tree case_label_expr,
>> + basic_block case_bb, profile_probability prob);
>> +
>> + /* Destructor. */
>> + virtual ~simple_cluster ()
>> + {}
>> +
>> + virtual cluster_type
>> + get_type ()
>> + {
>> + return SIMPLE_CASE;
>> + }
>
> I believe that the "virtual" here is redundant, but that it ought to say
> get_type () OVERRIDE FINAL
> so tell human readers that it's a vfunc override, and so that C++11
> onwards can warn if it's not actually a vfunc override.
>
>> + virtual tree
>> + get_low ()
>> + {
>> + return m_low;
>> + }
>> +
>> + virtual tree
>> + get_high ()
>> + {
>> + return m_high;
>> + }
>> +
>> + virtual void
>> + debug ()
>> + {
>> + dump (stderr);
>> + }
>> +
>> + virtual void
>> + dump (FILE *f)
>> + {
>> + PRINT_CASE (f, get_low ());
>> + if (get_low () != get_high ())
>> + {
>> + fprintf (f, "-");
>> + PRINT_CASE (f, get_high ());
>> + }
>> + fprintf (f, " ");
>> + }
>
> Likewise for all of these, I believe.
>
>> + /* Low value of the case. */
>> + tree m_low;
>> +
>> + /* High value of the case. */
>> + tree m_high;
>> +};
>> +
>> +simple_cluster::simple_cluster (tree low, tree high, tree case_label_expr,
>> + basic_block case_bb, profile_probability prob):
>> + cluster (case_label_expr, case_bb, prob, prob),
>> + m_low (low), m_high (high)
>> +{
>> +}
>
>> +struct group_cluster: public cluster
>> +{
>
> Again, could this subclass have a descriptive comment, something like.
>
> /* Abstract subclass of cluster, handling a collection of simple_cluster
> instances. */
>
> or somesuch (I believe it's abstract because get_type isn't implemented).
>
>> + /* Destructor. */
>> + virtual ~group_cluster ();
>> +
>> + virtual tree
>> + get_low ()
>> + {
>> + return m_cases[0]->get_low ();
>> + }
>> +
>> + virtual tree
>> + get_high ()
>> + {
>> + return m_cases[m_cases.length () - 1]->get_high ();
>> + }
>> +
>> + virtual void
>> + debug ()
>> + {
>> + dump (stderr);
>> + }
>> +
>> + virtual void
>> + dump (FILE *f)
>> + {
>> + unsigned total_values = 0;
>> + for (unsigned i = 0; i < m_cases.length (); i++)
>> + total_values += m_cases[i]->get_range (m_cases[i]->get_low (),
>> + m_cases[i]->get_high ());
>> +
>> + fprintf (f, "%s(%d):", get_type () == JUMP_TABLE ? "JT" : "BT",
>> + total_values);
>> + PRINT_CASE (f, get_low ());
>> + fprintf (f, "-");
>> + PRINT_CASE (f, get_high ());
>> + fprintf (f, " ");
>> + }
>
> As before, lose the "virtual" and add OVERRIDE FINAL for all of these,
> I believe.
>
>> +
>> + /* List of simple clusters handled by the group. */
>> + vec<simple_cluster *> m_cases;
>> +};
>> +
>> +struct jump_table_cluster: public group_cluster
>
> Suggested descriptive comment:
>
> /* Concrete subclass of group_cluster representing a collection
> of cases to be implemented as a jump table.
> The "emit" vfunc gernerates a nested switch statement which
> is later lowered to a jump table. */
>
> or somesuch.
>
>> +{
>> + /* Constructor. */
>> + jump_table_cluster (vec<cluster *> &clusters, unsigned start, unsigned end);
>> +
>> + virtual cluster_type
>> + get_type ()
>> + {
>> + return JUMP_TABLE;
>> + }
>> +
>> + virtual void emit (tree index_expr, tree index_type,
>> + tree default_label_expr, basic_block default_bb);
>
> As before, lose the "virtual" and add OVERRIDE FINAL for both of these
> (I think it matters much for "emit" given the non-trivial signature and the
> empty default implementation).
>
>> + /* Find jump tables of given CLUSTERS, where all members of the vector
>> + are of type simple_cluster. New clusters are returned. */
>> + static vec<cluster *> find_jump_tables (vec<cluster *> &clusters);
>> +
>> + /* Return true when cluster starting at START and ending at END (inclusive)
>> + can build a jump-table. */
>> + static bool can_be_handled (const vec<cluster *> &clusters, unsigned start,
>> + unsigned end);
>> +
>> + /* Return true if cluster starting at START and ending at END (inclusive)
>> + is profitable transformation. */
>> + static bool is_beneficial (const vec<cluster *> &clusters, unsigned start,
>> + unsigned end);
>> +
>> + /* Return the smallest number of different values for which it is best
>> + to use a jump-table instead of a tree of conditional branches. */
>> + static inline unsigned int case_values_threshold (void);
>> +};
>> +
>> +/* A GIMPLE switch statement can be expanded to a short sequence of bit-wise
>> +comparisons. "switch(x)" is converted into "if ((1 << (x-MINVAL)) & CST)"
>> +where CST and MINVAL are integer constants. This is better than a series
>> +of compare-and-banch insns in some cases, e.g. we can implement:
>> +
>> + if ((x==4) || (x==6) || (x==9) || (x==11))
>> +
>> +as a single bit test:
>> +
>> + if ((1<<x) & ((1<<4)|(1<<6)|(1<<9)|(1<<11)))
>> +
>> +This transformation is only applied if the number of case targets is small,
>> +if CST constains at least 3 bits, and "1 << x" is cheap. The bit tests are
>> +performed in "word_mode".
>> +
>> +The following example shows the code the transformation generates:
>> +
>> + int bar(int x)
>> + {
>> + switch (x)
>> + {
>> + case '0': case '1': case '2': case '3': case '4':
>> + case '5': case '6': case '7': case '8': case '9':
>> + case 'A': case 'B': case 'C': case 'D': case 'E':
>> + case 'F':
>> + return 1;
>> + }
>> + return 0;
>> + }
>> +
>> +==>
>> +
>> + bar (int x)
>> + {
>> + tmp1 = x - 48;
>> + if (tmp1 > (70 - 48)) goto L2;
>> + tmp2 = 1 << tmp1;
>> + tmp3 = 0b11111100000001111111111;
>> + if ((tmp2 & tmp3) != 0) goto L1 ; else goto L2;
>> + L1:
>> + return 1;
>> + L2:
>> + return 0;
>> + }
>> +
>> +TODO: There are still some improvements to this transformation that could
>> +be implemented:
>> +
>> +* A narrower mode than word_mode could be used if that is cheaper, e.g.
>> + for x86_64 where a narrower-mode shift may result in smaller code.
>> +
>> +* The compounded constant could be shifted rather than the one. The
>> + test would be either on the sign bit or on the least significant bit,
>> + depending on the direction of the shift. On some machines, the test
>> + for the branch would be free if the bit to test is already set by the
>> + shift operation.
>> +
>> +This transformation was contributed by Roger Sayle, see this e-mail:
>> + http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01950.html
>> +*/
>> +
>> +struct bit_test_cluster: public group_cluster
>> +{
>> + /* Constructor. */
>> + bit_test_cluster (vec<cluster *> &clusters, unsigned start, unsigned end);
>> +
>> + virtual cluster_type
>> + get_type ()
>> + {
>> + return BIT_TEST;
>> + }
>> +
>> +/* Expand a switch statement by a short sequence of bit-wise
>> + comparisons. "switch(x)" is effectively converted into
>> + "if ((1 << (x-MINVAL)) & CST)" where CST and MINVAL are
>> + integer constants.
>> +
>> + INDEX_EXPR is the value being switched on.
>> +
>> + MINVAL is the lowest case value of in the case nodes,
>> + and RANGE is highest value minus MINVAL. MINVAL and RANGE
>> + are not guaranteed to be of the same type as INDEX_EXPR
>> + (the gimplifier doesn't change the type of case label values,
>> + and MINVAL and RANGE are derived from those values).
>> + MAXVAL is MINVAL + RANGE.
>> +
>> + There *MUST* be max_case_bit_tests or less unique case
>> + node targets. */
>> + virtual void emit (tree index_expr, tree index_type,
>> + tree default_label_expr, basic_block default_bb);
>
> As before, lose the "virtual" and add OVERRIDE FINAL for both of these.
>
>> + /* Find bit tests of given CLUSTERS, where all members of the vector
>> + are of type simple_cluster. New clusters are returned. */
>> + static vec<cluster *> find_bit_tests (vec<cluster *> &clusters);
>> +
>> + /* Return true when cluster starting at START and ending at END (inclusive)
>> + can build a bit test. */
>> + static bool can_be_handled (const vec<cluster *> &clusters, unsigned start,
>> + unsigned end);
>> +
>> + /* Return true if cluster starting at START and ending at END (inclusive)
>> + is profitable transformation. */
>> + static bool is_beneficial (const vec<cluster *> &clusters, unsigned start,
>> + unsigned end);
>> +
>> +/* Split the basic block at the statement pointed to by GSIP, and insert
>> + a branch to the target basic block of E_TRUE conditional on tree
>> + expression COND.
>> +
>> + It is assumed that there is already an edge from the to-be-split
>> + basic block to E_TRUE->dest block. This edge is removed, and the
>> + profile information on the edge is re-used for the new conditional
>> + jump.
>> +
>> + The CFG is updated. The dominator tree will not be valid after
>> + this transformation, but the immediate dominators are updated if
>> + UPDATE_DOMINATORS is true.
>> +
>> + Returns the newly created basic block. */
>> + static basic_block hoist_edge_and_branch_if_true (gimple_stmt_iterator *gsip,
>> + tree cond,
>> + basic_block case_bb);
>> +
>> + /* Maximum number of different basic blocks that can be handlel by
>
> Typo: handlel -> handled
>
>> + a bit test. */
>> + static const int m_max_case_bit_tests = 3;
>> +};
>> +
>
> [...snip...]
>
> Hope this is constructive
> Dave
>