This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Implement smart multiple switch expansion algorithms.


On 10/06/2017 07:24 PM, David Malcolm wrote:
> On Fri, 2017-10-06 at 14:25 +0200, Martin Liška wrote:
>> Hello.
>>
>> As presented at this year's Cauldron, I rewrote current switch
>> expansion to support
>> multiple algorithms (jump table and bit-test) that can be used just
>> for a fraction
>> of cases. Balanced decision tree is built on top of that. I decided
>> to do a bigger
>> refactoring and put all there 3 mentioned algorithm to its own class.
>>
>> There's a bigger change in jump_table_cluster::can_be_handled where
>> the constant 10 (3 respectively)
>> is compared to number of handled values, and not number of cases.
>> Later one is wrong in my opinion.
>>
>> There are some numbers for cc1plus:
>>
>> $ bloaty ./objdir2/gcc/cc1plus -- ./objdir/gcc/cc1plus
>>      VM SIZE                      FILE SIZE
>>  ++++++++++++++ GROWING        ++++++++++++++
>>    +19% +1.32Mi .rodata        +1.32Mi   +19%
>>   [ = ]       0 .symtab        +7.38Ki  +0.5%
>>   [ = ]       0 .strtab        +5.18Ki  +0.2%
>>   [ = ]       0 .debug_info       +743  +0.0%
>>   +0.0%    +712 .eh_frame         +712  +0.0%
>>   +0.1%    +440 .eh_frame_hdr     +440  +0.1%
>>   [ = ]       0 .debug_aranges     +80  +0.1%
>>   +0.0%     +67 .dynstr            +67  +0.0%
>>   [ = ]       0 .debug_str          +6  +0.0%
>>
>>  -------------- SHRINKING      --------------
>>   -1.2%  -214Ki .text           -214Ki  -1.2%
>>   [ = ]       0 .debug_loc     -66.7Ki  -0.1%
>>   [ = ]       0 .debug_line    -14.0Ki  -0.2%
>>   [ = ]       0 .debug_ranges  -9.56Ki  -0.1%
>>   -6.8%      -3 [Unmapped]        -375 -14.4%
>>   [ = ]       0 .debug_abbrev      -46  -0.0%
>>
>>   +3.8% +1.11Mi TOTAL          +1.03Mi  +0.5%
>>
>> So it growth and can be easily explained:
>>
>> insn-attrtab.o:
>>      VM SIZE                          FILE SIZE
>>  ++++++++++++++ GROWING            ++++++++++++++
>>   [ = ]       0 .rela.rodata       +2.00Mi  +215%
>>   +214%  +682Ki .rodata             +682Ki  +214%
>>   [ = ]       0 .rela.debug_loc    +29.3Ki   +36%
>>    +32% +1.91Ki .eh_frame          +1.91Ki   +32%
>>   [ = ]       0 .debug_loc             +37  +5.6%
>>   [ = ]       0 .debug_str              +2  +0.0%
>>
>>  -------------- SHRINKING          --------------
>>  -50.1% -63.3Ki .text              -63.3Ki -50.1%
>>   [ = ]       0 .debug_line        -1.71Ki -10.4%
>>   [ = ]       0 .rela.debug_info      -768  -0.3%
>>   [ = ]       0 .rela.text            -624  -0.8%
>>   [ = ]       0 .rela.debug_ranges    -384  -2.7%
>>   [ = ]       0 .debug_info            -87  -0.2%
>>   [ = ]       0 [Unmapped]              -2  -8.7%
>>
>>   +137%  +621Ki TOTAL              +2.63Mi  +139%
>>
>> It's caused by:
>> ...
>> ;; GIMPLE switch case clusters: JT(2710):-1-5090 
>> ;; GIMPLE switch case clusters: JT(2710):-1-5090 
>> ;; GIMPLE switch case clusters: JT(2967):-1-5090 
>> ;; GIMPLE switch case clusters: JT(1033):-1-5017 
>> ...
>>
>> so there are many switch statements with very reasonable density.
>>
>> and
>> insn-dfatab.o:    +13% +99.4Ki TOTAL               +455Ki   +14%
>> insn-latencytab.o:   +19%  +136Ki TOTAL               +609Ki   +20%
>>
>> There shouldn't be any fallout other from what I mentioned in
>> previous email that is
>> a prerequisite for this patch.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression
>> tests.
>>
>> I'm still planning to come with some numbers and stats. Will do that
>> next week.
>>
>> Martin
> 
> 
>> gcc/ChangeLog:
>>
>> 2017-09-29  Martin Liska  <mliska@suse.cz>
>>
>> 	* tree-switch-conversion.c (MAX_CASE_BIT_TESTS): Remove and move
>> 	to header file.
>> 	(hoist_edge_and_branch_if_true): Move to ...
>> 	(bit_test_cluster::hoist_edge_and_branch_if_true): ... this.
> 
> I'm not a reviewer, but there's a lot of "churn" in the patch due to
> things moving around; there seems to be a mixture of things simply
> moving around, functions becoming methods of classes, and some
> things changing.
> 
> Would it be helpful to split the patch into two: a patch that moves
> the functions to where they'll be needed, and then a patch to do the
> "actual" (or functional) changes?

Hello.

Sorry for late answer. I fully agree what you wrote David. It would make
the patch more easily readable and reviewable. Let me postpone it to next
stage1, patch separation into 2 pieces will need some effort.

> 
> [...snip...]
> 
>> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
>> new file mode 100644
>> index 00000000000..45ae11f408d
>> --- /dev/null
>> +++ b/gcc/tree-switch-conversion.h
>> @@ -0,0 +1,806 @@
>> +/* Tree switch conversion for GNU compiler.
>> +   Copyright (C) 2017 Free Software Foundation, Inc.
> 
> Should this be "gimple switch conversion" and
> gimple-ssa-switch-conversion.h?
> (but presumably this is to mirror tree-switch-conversion.c, and
> I don't want to advocate for unnecessary churn)
> 
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (at your option) any later
>> +version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>.  */
>> +
>> +#ifndef TREE_SWITCH_CONVERSION_H
>> +#define TREE_SWITCH_CONVERSION_H
>> +
>> +namespace tree_switch_conversion {
>> +
>> +/* Type of cluster.  */
>> +
>> +enum cluster_type
>> +{
>> +  SIMPLE_CASE,
>> +  JUMP_TABLE,
>> +  BIT_TEST
>> +};
>> +
>> +#define PRINT_CASE(f,c) print_dec (c, f, TYPE_SIGN (TREE_TYPE (c)))
>> +
>> +/* Base class for switch clustering.  */
> 
> Maybe "Abstract base class for representing a cluster of cases", or
> somesuch?

I like it!

> Perhaps an ASCII art inheritance diagram; if I'm reading the code
> right, something like:
> 
> /* Abstract base class for representing a cluster of cases.
> 
>    Here is the inheritance hierarachy, and the enum_cluster_type
>    values for the concrete subclasses:
> 
>    cluster
>    |-simple_cluster (SIMPLE_CASE)
>    `-group_cluster
>      |-jump_table_cluster (JUMP_TABLE)
>      `-bit_test_cluster   (BIT_TEST).  */
> 
> Am I right in thinking that the switch initialization conversion
> (struct switch_conversion) is currently distinct from the clustering
> in this patch?  (I'm wondering if there's possible benefit from
> combining the two approaches, maybe if one cluster of cases
> predominantly assigns to some variables, and another cluster of
> cases predominantly assigns to some other variables).

Exactly, you understand the algorithm well. Yep switch conversion should
be also incorporated as one of strategies for a cluster. Let me work on
that in next stage1.

Thanks also for another nit comment, I will include them.

Martin

> 
>> +
>> +struct cluster
>> +{
>> +  /* Cosntructor.  */
> 
>          ^^^ Typo
> 	 
>> +  cluster ()
>> +  {}
> 
> When is this default ctor used?
> Doesn't it leave the various fields uninitialized?
> 
> I notice that of the descendant classes, at least jump_table_cluster
> doesn't call its base class ctor (if I'm reading the patch right), so I
> believe these fields are uninitialized for some of the subclasses.
> 
>> +
>> +  /* Constructor.  */
>> +  cluster (tree case_label_expr, basic_block case_bb, profile_probability prob,
>> +	   profile_probability subtree_prob);
>> +
> 
> [...snip...]
> 
>> +struct simple_cluster: public cluster
>> +{
> 
> Could this struct have a descriptive comment? e.g.
> 
> /* Subclass of cluster representing a simple contiguous range
>    from [low..high].  */
> 
> or somesuch (assuming I've read the intent correctly)
> 
>> +  /* Constructor.  */
>> +  simple_cluster (tree low, tree high, tree case_label_expr,
>> +		  basic_block case_bb, profile_probability prob);
>> +
>> +  /* Destructor.  */
>> +  virtual ~simple_cluster ()
>> +  {}
>> +
>> +  virtual cluster_type
>> +  get_type ()
>> +  {
>> +    return SIMPLE_CASE;
>> +  }
> 
> I believe that the "virtual" here is redundant, but that it ought to say
>   get_type () OVERRIDE FINAL
> so tell human readers that it's a vfunc override, and so that C++11
> onwards can warn if it's not actually a vfunc override.
> 
>> +  virtual tree
>> +  get_low ()
>> +  {
>> +    return m_low;
>> +  }
>> +
>> +  virtual tree
>> +  get_high ()
>> +  {
>> +    return m_high;
>> +  }
>> +
>> +  virtual void
>> +  debug ()
>> +  {
>> +    dump (stderr);
>> +  }
>> +
>> +  virtual void
>> +  dump (FILE *f)
>> +  {
>> +    PRINT_CASE (f, get_low ());
>> +    if (get_low () != get_high ())
>> +      {
>> +	fprintf (f, "-");
>> +	PRINT_CASE (f, get_high ());
>> +      }
>> +    fprintf (f, " ");
>> +  }
> 
> Likewise for all of these, I believe.
> 
>> +  /* Low value of the case.  */
>> +  tree m_low;
>> +
>> +  /* High value of the case.  */
>> +  tree m_high;
>> +};
>> +
>> +simple_cluster::simple_cluster (tree low, tree high, tree case_label_expr,
>> +				basic_block case_bb, profile_probability prob):
>> +  cluster (case_label_expr, case_bb, prob, prob),
>> +  m_low (low), m_high (high)
>> +{
>> +}
> 
>> +struct group_cluster: public cluster
>> +{
> 
> Again, could this subclass have a descriptive comment, something like.
> 
> /* Abstract subclass of cluster, handling a collection of simple_cluster
>    instances.  */
> 
> or somesuch (I believe it's abstract because get_type isn't implemented).
> 
>> +  /* Destructor.  */
>> +  virtual ~group_cluster ();
>> +
>> +  virtual tree
>> +  get_low ()
>> +  {
>> +    return m_cases[0]->get_low ();
>> +  }
>> +
>> +  virtual tree
>> +  get_high ()
>> +  {
>> +    return m_cases[m_cases.length () - 1]->get_high ();
>> +  }
>> +
>> +  virtual void
>> +  debug ()
>> +  {
>> +    dump (stderr);
>> +  }
>> +
>> +  virtual void
>> +  dump (FILE *f)
>> +  {
>> +    unsigned total_values = 0;
>> +    for (unsigned i = 0; i < m_cases.length (); i++)
>> +      total_values += m_cases[i]->get_range (m_cases[i]->get_low (),
>> +					     m_cases[i]->get_high ());
>> +
>> +    fprintf (f, "%s(%d):", get_type () == JUMP_TABLE ? "JT" : "BT",
>> +	     total_values);
>> +    PRINT_CASE (f, get_low ());
>> +    fprintf (f, "-");
>> +    PRINT_CASE (f, get_high ());
>> +    fprintf (f, " ");
>> +  }
> 
> As before, lose the "virtual" and add OVERRIDE FINAL for all of these,
> I believe.
> 
>> +
>> +  /* List of simple clusters handled by the group.  */
>> +  vec<simple_cluster *> m_cases;
>> +};
>> +
>> +struct jump_table_cluster: public group_cluster
> 
> Suggested descriptive comment:
> 
> /* Concrete subclass of group_cluster representing a collection
>    of cases to be implemented as a jump table.
>    The "emit" vfunc gernerates a nested switch statement which
>    is later lowered to a jump table.  */
> 
> or somesuch.
> 
>> +{
>> +  /* Constructor.  */
>> +  jump_table_cluster (vec<cluster *> &clusters, unsigned start, unsigned end);
>> +
>> +  virtual cluster_type
>> +  get_type ()
>> +  {
>> +    return JUMP_TABLE;
>> +  }
>> +
>> +  virtual void emit (tree index_expr, tree index_type,
>> +		     tree default_label_expr, basic_block default_bb);
> 
> As before, lose the "virtual" and add OVERRIDE FINAL for both of these
> (I think it matters much for "emit" given the non-trivial signature and the
> empty default implementation).
> 
>> +  /* Find jump tables of given CLUSTERS, where all members of the vector
>> +     are of type simple_cluster.  New clusters are returned.  */
>> +  static vec<cluster *> find_jump_tables (vec<cluster *> &clusters);
>> +
>> +  /* Return true when cluster starting at START and ending at END (inclusive)
>> +     can build a jump-table.  */
>> +  static bool can_be_handled (const vec<cluster *> &clusters, unsigned start,
>> +			      unsigned end);
>> +
>> +  /* Return true if cluster starting at START and ending at END (inclusive)
>> +     is profitable transformation.  */
>> +  static bool is_beneficial (const vec<cluster *> &clusters, unsigned start,
>> +			     unsigned end);
>> +
>> +  /* Return the smallest number of different values for which it is best
>> +     to use a jump-table instead of a tree of conditional branches.  */
>> +  static inline unsigned int case_values_threshold (void);
>> +};
>> +
>> +/* A GIMPLE switch statement can be expanded to a short sequence of bit-wise
>> +comparisons.  "switch(x)" is converted into "if ((1 << (x-MINVAL)) & CST)"
>> +where CST and MINVAL are integer constants.  This is better than a series
>> +of compare-and-banch insns in some cases,  e.g. we can implement:
>> +
>> +	if ((x==4) || (x==6) || (x==9) || (x==11))
>> +
>> +as a single bit test:
>> +
>> +	if ((1<<x) & ((1<<4)|(1<<6)|(1<<9)|(1<<11)))
>> +
>> +This transformation is only applied if the number of case targets is small,
>> +if CST constains at least 3 bits, and "1 << x" is cheap.  The bit tests are
>> +performed in "word_mode".
>> +
>> +The following example shows the code the transformation generates:
>> +
>> +	int bar(int x)
>> +	{
>> +		switch (x)
>> +		{
>> +		case '0':  case '1':  case '2':  case '3':  case '4':
>> +		case '5':  case '6':  case '7':  case '8':  case '9':
>> +		case 'A':  case 'B':  case 'C':  case 'D':  case 'E':
>> +		case 'F':
>> +			return 1;
>> +		}
>> +		return 0;
>> +	}
>> +
>> +==>
>> +
>> +	bar (int x)
>> +	{
>> +		tmp1 = x - 48;
>> +		if (tmp1 > (70 - 48)) goto L2;
>> +		tmp2 = 1 << tmp1;
>> +		tmp3 = 0b11111100000001111111111;
>> +		if ((tmp2 & tmp3) != 0) goto L1 ; else goto L2;
>> +	L1:
>> +		return 1;
>> +	L2:
>> +		return 0;
>> +	}
>> +
>> +TODO: There are still some improvements to this transformation that could
>> +be implemented:
>> +
>> +* A narrower mode than word_mode could be used if that is cheaper, e.g.
>> +  for x86_64 where a narrower-mode shift may result in smaller code.
>> +
>> +* The compounded constant could be shifted rather than the one.  The
>> +  test would be either on the sign bit or on the least significant bit,
>> +  depending on the direction of the shift.  On some machines, the test
>> +  for the branch would be free if the bit to test is already set by the
>> +  shift operation.
>> +
>> +This transformation was contributed by Roger Sayle, see this e-mail:
>> +   http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01950.html
>> +*/
>> +
>> +struct bit_test_cluster: public group_cluster
>> +{
>> +  /* Constructor.  */
>> +  bit_test_cluster (vec<cluster *> &clusters, unsigned start, unsigned end);
>> +
>> +  virtual cluster_type
>> +  get_type ()
>> +  {
>> +    return BIT_TEST;
>> +  }
>> +
>> +/*  Expand a switch statement by a short sequence of bit-wise
>> +    comparisons.  "switch(x)" is effectively converted into
>> +    "if ((1 << (x-MINVAL)) & CST)" where CST and MINVAL are
>> +    integer constants.
>> +
>> +    INDEX_EXPR is the value being switched on.
>> +
>> +    MINVAL is the lowest case value of in the case nodes,
>> +    and RANGE is highest value minus MINVAL.  MINVAL and RANGE
>> +    are not guaranteed to be of the same type as INDEX_EXPR
>> +    (the gimplifier doesn't change the type of case label values,
>> +    and MINVAL and RANGE are derived from those values).
>> +    MAXVAL is MINVAL + RANGE.
>> +
>> +    There *MUST* be max_case_bit_tests or less unique case
>> +    node targets.  */
>> +  virtual void emit (tree index_expr, tree index_type,
>> +		     tree default_label_expr, basic_block default_bb);
> 
> As before, lose the "virtual" and add OVERRIDE FINAL for both of these.
> 
>> +  /* Find bit tests of given CLUSTERS, where all members of the vector
>> +     are of type simple_cluster.  New clusters are returned.  */
>> +  static vec<cluster *> find_bit_tests (vec<cluster *> &clusters);
>> +
>> +  /* Return true when cluster starting at START and ending at END (inclusive)
>> +     can build a bit test.  */
>> +  static bool can_be_handled (const vec<cluster *> &clusters, unsigned start,
>> +			      unsigned end);
>> +
>> +  /* Return true if cluster starting at START and ending at END (inclusive)
>> +     is profitable transformation.  */
>> +  static bool is_beneficial (const vec<cluster *> &clusters, unsigned start,
>> +			     unsigned end);
>> +
>> +/* Split the basic block at the statement pointed to by GSIP, and insert
>> +   a branch to the target basic block of E_TRUE conditional on tree
>> +   expression COND.
>> +
>> +   It is assumed that there is already an edge from the to-be-split
>> +   basic block to E_TRUE->dest block.  This edge is removed, and the
>> +   profile information on the edge is re-used for the new conditional
>> +   jump.
>> +
>> +   The CFG is updated.  The dominator tree will not be valid after
>> +   this transformation, but the immediate dominators are updated if
>> +   UPDATE_DOMINATORS is true.
>> +
>> +   Returns the newly created basic block.  */
>> +  static basic_block hoist_edge_and_branch_if_true (gimple_stmt_iterator *gsip,
>> +						    tree cond,
>> +						    basic_block case_bb);
>> +
>> +  /* Maximum number of different basic blocks that can be handlel by
> 
> Typo: handlel -> handled
> 
>> +     a bit test.  */
>> +  static const int m_max_case_bit_tests = 3;
>> +};
>> +
> 
> [...snip...]
> 
> Hope this is constructive
> Dave
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]