This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] Implement switch statements with bit tests
- From: Roger Sayle <roger at www dot eyesopen dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Andi Kleen <ak at suse dot de>, Jan Hubicka <jh at suse dot cz>
- Date: Thu, 23 Jan 2003 19:49:30 -0700 (MST)
- Subject: [PATCH] Implement switch statements with bit tests
The following patch allows GCC to implement switch/case statements
using a short sequence of bit comparison operations. The original
idea was suggested to to me by Andi Kleen and Jan Hubicka of SuSe.
The theory is that for small dense switch statements with few
unique targets, it is possible to implement "switch(x)" with
comparisons of the form "if ((1<<x) & CST)", where CST is a
suitable integer constant.
To be slightly more precise the generated RTL looks like:
t1 = x - MINVAL
if (t1 > RANGE)
goto DEFAULT
t2 = 1 << t1
if (t2 & CST1)
goto CASE1
if (t2 & CST2)
goto CASE2
if (t2 & CST3)
goto CASE3
goto DEFAULT
Here MINVAL is the lowest case value in the switch, RANGE is
the value MAXVAL - MINVAL, i.e. the highest case value - MINVAL,
DEFAULT is the default case target. To work RANGE must be less
than the number of bits in SImode, i.e. 31 or less on typical
32-bit architectures.
The implementation below allows atmost three unique non-default case
targets, as longer sequences are potentially better implemented
by tablejumps. Similarly, we require several bits to be set
in order to benefit from this approach. In the patch below,
we'll select this strategy if there is a single target with atleast
three bits set, two targets with atleast five bits set or three
targets with atleast 6 bits set.
In an anaylsis of the 5414 switch statements encountered by GCC
during stages 2 and 3 of a bootstrap on i686-pc-cygwin, 236 or
about 4.35% meet the required criteria. For example, 3786 have
a range < 32, 812 of these branched to a only single target or
the default, but only 60 of these had three of more cases values
that branched to that single target.
As a performance optimization, the case with the most bits set
is tested first, then if there is more than one bit-test, by
decreasing bit/case counts.
As another optimization, we also try to avoid the subtraction
by MINVAL, if MAXVAL is less than the number of bits in SImode.
In the same switch analysis above, 130 of the 236 eligible
switch statements could avoid the subtraction.
As an example,
int foo(int x)
{
switch (x)
{
case 4:
case 6:
case 9:
case 11:
return 30;
}
return 31;
}
generates the following code with -O2 on i686-pc-cygwin
_foo: pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %ecx
cmpl $8, $ecx
ja L2
movl $1, $eax
movl $30, $edx
sall $cl, $eax
testl $274, $eax
jne L1
L2:
movl $31, %edx
L1:
popl %ebp
movl %edx, %eax
ret
And as another example,
int bar(int x)
{
switch (x)
{
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
case 'A': case 'B': case 'C': case 'D': case 'E':
case 'F':
return 1;
}
return 0;
}
which produces
bar: pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %ecx
subl $48, %ecx
cmpl $22, %ecx
ja L7
movl $1, %eax
movl $1, %edx
sall %cl, %eax
testl $8258559, %eax
jne L6
L7:
xorl %edx, %edx
L6:
popl %ebp
movl %edx, %eax
ret
The one caveat to the above approach is that this is only a win
if the target provides an efficient instruction to perform 1<<x.
To account for this, I introduce a new target macro that allows
backends to disable this implementation strategy. The default
value of CASE_USE_BIT_TESTS checks the optabs to see if the
target .md defines an ashlsi3 pattern. This is a heuristic, as
I suspect there may be targets that define ashlsi3 but actually
implement it via an inlined loop or a library call. On these
targets, the patch below will not affect correctness, but may
negatively impact performance unless the target's .h file is
tweaked to contain the line:
#define CASE_USE_BIT_TESTS 0
[Is there a better way to test for an efficient left shift insn?]
The following patch has been tested by a complete bootstrap, all
languages except Ada and treelang, on i686-pc-linux-gnu with no
new regressions. It has also been bootstrapped on i686-pc-cygwin.
Ok for mainline?
2003-01-23 Roger Sayle <roger@eyesopen.com>
* stmt.c (emit_case_bit_tests): New routine to implement suitable
switch statements using the equivalent of "if ((1<<x) & cst) ... ".
(case_bit_test_cmp): New comparison function for "qsort" to order
case_bit_tests by decreasing number of destination nodes.
(expand_end_case_type): Use emit_case_bit_tests to implement
suitable switch statments.
(CASE_USE_BIT_TESTS): New target macro to disable the above.
* Makefile.in (stmt.o): Add dependency on optab.h.
* doc/tm.texi (CASE_USE_BIT_TESTS): Document new target macro.
Index: stmt.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/stmt.c,v
retrieving revision 1.286
diff -c -3 -p -r1.286 stmt.c
*** stmt.c 21 Jan 2003 20:52:44 -0000 1.286
--- stmt.c 22 Jan 2003 06:09:14 -0000
*************** Software Foundation, 59 Temple Place - S
*** 56,61 ****
--- 56,62 ----
#include "ggc.h"
#include "langhooks.h"
#include "predict.h"
+ #include "optabs.h"
/* Assume that case vectors are not pc-relative. */
#ifndef CASE_VECTOR_PC_RELATIVE
*************** static void do_jump_if_equal PARAMS ((r
*** 420,425 ****
--- 421,429 ----
static int estimate_case_costs PARAMS ((case_node_ptr));
static bool same_case_target_p PARAMS ((rtx, rtx));
static void strip_default_case_nodes PARAMS ((case_node_ptr *, rtx));
+ static int case_bit_test_cmp PARAMS ((const void *, const void *));
+ static void emit_case_bit_tests PARAMS ((tree, tree, tree, tree,
+ case_node_ptr, rtx));
static void group_case_nodes PARAMS ((case_node_ptr));
static void balance_case_nodes PARAMS ((case_node_ptr *,
case_node_ptr));
*************** check_for_full_enumeration_handling (typ
*** 5180,5185 ****
--- 5184,5319 ----
}
+ /* Maximum number of case bit tests. */
+ #define MAX_CASE_BIT_TESTS 3
+
+ /* By default, enable case bit tests on targets with ashlsi3. */
+ #ifndef CASE_USE_BIT_TESTS
+ #define CASE_USE_BIT_TESTS (ashl_optab->handlers[SImode].insn_code \
+ != CODE_FOR_nothing)
+ #endif
+
+
+ /* A case_bit_test represents a set of case nodes that may be
+ selected from using a bit-wise comparison. HI and LO hold
+ the integer to be tested against, LABEL contains the label
+ to jump to upon success and BITS counts the number of case
+ nodes handled by this test, typically the number of bits
+ set in HI:LO. */
+
+ struct case_bit_test
+ {
+ HOST_WIDE_INT hi;
+ HOST_WIDE_INT lo;
+ rtx label;
+ int bits;
+ };
+
+ /* Comparison function for qsort to order bit tests by decreasing
+ number of case nodes, i.e. the node with the most cases gets
+ tested first. */
+
+ static int case_bit_test_cmp (p1, p2)
+ const void *p1;
+ const void *p2;
+ {
+ const struct case_bit_test *d1 = p1;
+ const struct case_bit_test *d2 = p2;
+
+ return d2->bits - d1->bits;
+ }
+
+ /* Expand a switch statement by a short sequence of bit-wise
+ comparisons. "switch(x)" is effectively converted into
+ "if ((1 << (x-MINVAL)) & CST)" where CST and MINVAL are
+ integer constants.
+
+ INDEX_EXPR is the value being switched on, which is of
+ type INDEX_TYPE. MINVAL is the lowest case value of in
+ the case nodes, of INDEX_TYPE type, and RANGE is highest
+ value minus MINVAL, also of type INDEX_TYPE. NODES is
+ the set of case nodes, and DEFAULT_LABEL is the label to
+ branch to should none of the cases match.
+
+ There *MUST* be MAX_CASE_BIT_TESTS or less unique case
+ node targets. */
+
+ static void
+ emit_case_bit_tests (index_type, index_expr, minval, range,
+ nodes, default_label)
+ tree index_type, index_expr, minval, range;
+ case_node_ptr nodes;
+ rtx default_label;
+ {
+ struct case_bit_test test[MAX_CASE_BIT_TESTS];
+ enum machine_mode mode;
+ rtx expr, index, label;
+ unsigned int i,j,lo,hi;
+ struct case_node *n;
+ unsigned int count;
+
+ count = 0;
+ for (n = nodes; n; n = n->right)
+ {
+ label = label_rtx (n->code_label);
+ for (i = 0; i < count; i++)
+ if (same_case_target_p (label, test[i].label))
+ break;
+
+ if (i == count)
+ {
+ if (count >= MAX_CASE_BIT_TESTS)
+ abort ();
+ test[i].hi = 0;
+ test[i].lo = 0;
+ test[i].label = label;
+ test[i].bits = 1;
+ count++;
+ }
+ else
+ test[i].bits++;
+
+ lo = tree_low_cst (fold (build (MINUS_EXPR, index_type,
+ n->low, minval)), 1);
+ hi = tree_low_cst (fold (build (MINUS_EXPR, index_type,
+ n->high, minval)), 1);
+ for (j = lo; j <= hi; j++)
+ if (j >= HOST_BITS_PER_WIDE_INT)
+ test[i].hi |= (HOST_WIDE_INT) 1 << (j - HOST_BITS_PER_INT);
+ else
+ test[i].lo |= (HOST_WIDE_INT) 1 << j;
+ }
+
+ qsort (test, count, sizeof(*test), case_bit_test_cmp);
+
+ index_expr = fold (build (MINUS_EXPR, index_type,
+ convert (index_type, index_expr),
+ convert (index_type, minval)));
+ index = expand_expr (index_expr, NULL_RTX, VOIDmode, 0);
+ emit_queue ();
+ index = protect_from_queue (index, 0);
+ do_pending_stack_adjust ();
+
+ mode = TYPE_MODE (index_type);
+ expr = expand_expr (range, NULL_RTX, VOIDmode, 0);
+ emit_cmp_and_jump_insns (index, expr, GTU, NULL_RTX, mode, 1,
+ default_label);
+
+ index = convert_to_mode (SImode, index, 0);
+ index = expand_binop (SImode, ashl_optab, const1_rtx,
+ index, NULL_RTX, 1, OPTAB_WIDEN);
+
+ for (i = 0; i < count; i++)
+ {
+ expr = immed_double_const (test[i].lo, test[i].hi, SImode);
+ expr = expand_binop (SImode, and_optab, index, expr,
+ NULL_RTX, 1, OPTAB_WIDEN);
+ emit_cmp_and_jump_insns (expr, const0_rtx, NE, NULL_RTX,
+ SImode, 1, test[i].label);
+ }
+
+ emit_jump (default_label);
+ }
/* Terminate a case (Pascal) or switch (C) statement
in which ORIG_INDEX is the expression to be tested.
*************** expand_end_case_type (orig_index, orig_t
*** 5193,5206 ****
{
tree minval = NULL_TREE, maxval = NULL_TREE, range = NULL_TREE;
rtx default_label = 0;
! struct case_node *n;
! unsigned int count;
rtx index;
rtx table_label;
int ncases;
rtx *labelvec;
int i;
! rtx before_case, end;
struct nesting *thiscase = case_stack;
tree index_expr, index_type;
bool exit_done = false;
--- 5327,5340 ----
{
tree minval = NULL_TREE, maxval = NULL_TREE, range = NULL_TREE;
rtx default_label = 0;
! struct case_node *n, *m;
! unsigned int count, uniq;
rtx index;
rtx table_label;
int ncases;
rtx *labelvec;
int i;
! rtx before_case, end, lab;
struct nesting *thiscase = case_stack;
tree index_expr, index_type;
bool exit_done = false;
*************** expand_end_case_type (orig_index, orig_t
*** 5275,5280 ****
--- 5409,5415 ----
/* Get upper and lower bounds of case values.
Also convert all the case values to the index expr's data type. */
+ uniq = 0;
count = 0;
for (n = thiscase->data.case_stmt.case_list; n; n = n->right)
{
*************** expand_end_case_type (orig_index, orig_t
*** 5304,5309 ****
--- 5439,5454 ----
/* A range counts double, since it requires two compares. */
if (! tree_int_cst_equal (n->low, n->high))
count++;
+
+ /* Count the number of unique case node targets. */
+ uniq++;
+ lab = label_rtx (n->code_label);
+ for (m = thiscase->data.case_stmt.case_list; m != n; m = m->right)
+ if (same_case_target_p (label_rtx (m->code_label), lab))
+ {
+ uniq--;
+ break;
+ }
}
/* Compute span of values. */
*************** expand_end_case_type (orig_index, orig_t
*** 5317,5322 ****
--- 5462,5491 ----
expand_expr (index_expr, const0_rtx, VOIDmode, 0);
emit_queue ();
emit_jump (default_label);
+ }
+
+ /* Try implementing this switch statement by a short sequence of
+ bit-wise comparisons. However, we let the binary-tree case
+ below handle constant index expressions. */
+ else if (CASE_USE_BIT_TESTS
+ && ! TREE_CONSTANT (index_expr)
+ && compare_tree_int (range, GET_MODE_BITSIZE (SImode)) < 0
+ && ((uniq == 1 && count >= 3)
+ || (uniq == 2 && count >= 5)
+ || (uniq == 3 && count >= 6)))
+ {
+ /* Optimize the case where all the case values fit in a
+ word without having to subtract MINVAL. In this case,
+ we can optimize away the subtraction. */
+ if (compare_tree_int (minval, 0) > 0
+ && compare_tree_int (maxval, GET_MODE_BITSIZE (SImode)) < 0)
+ {
+ minval = integer_zero_node;
+ range = maxval;
+ }
+ emit_case_bit_tests (index_type, index_expr, minval, range,
+ thiscase->data.case_stmt.case_list,
+ default_label);
}
/* If range of values is much bigger than number of values,
Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.974
diff -c -3 -p -r1.974 Makefile.in
*** Makefile.in 21 Jan 2003 13:45:07 -0000 1.974
--- Makefile.in 22 Jan 2003 06:09:15 -0000
*************** function.o : function.c $(CONFIG_H) $(SY
*** 1460,1466 ****
stmt.o : stmt.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) flags.h \
function.h insn-config.h hard-reg-set.h $(EXPR_H) libfuncs.h except.h \
$(LOOP_H) $(RECOG_H) toplev.h output.h varray.h $(GGC_H) $(TM_P_H) \
! langhooks.h $(PREDICT_H) gt-stmt.h
except.o : except.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
flags.h except.h function.h $(EXPR_H) libfuncs.h integrate.h langhooks.h \
insn-config.h hard-reg-set.h $(BASIC_BLOCK_H) output.h \
--- 1460,1466 ----
stmt.o : stmt.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) flags.h \
function.h insn-config.h hard-reg-set.h $(EXPR_H) libfuncs.h except.h \
$(LOOP_H) $(RECOG_H) toplev.h output.h varray.h $(GGC_H) $(TM_P_H) \
! langhooks.h $(PREDICT_H) gt-stmt.h $(OPTABS_H)
except.o : except.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
flags.h except.h function.h $(EXPR_H) libfuncs.h integrate.h langhooks.h \
insn-config.h hard-reg-set.h $(BASIC_BLOCK_H) output.h \
Index: doc/tm.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/tm.texi,v
retrieving revision 1.191
diff -c -3 -p -r1.191 tm.texi
*** doc/tm.texi 19 Jan 2003 13:04:23 -0000 1.191
--- doc/tm.texi 22 Jan 2003 06:09:17 -0000
*************** is best to use a jump-table instead of a
*** 8618,8623 ****
--- 8618,8633 ----
The default is four for machines with a @code{casesi} instruction and
five otherwise. This is best for most machines.
+ @findex CASE_USE_BIT_TESTS
+ @item CASE_USE_BIT_TESTS
+ Define this macro to be a C expression to indicate whether C switch
+ statements may be implemented by a sequence of bit tests. This is
+ advantageous on processors that can efficiently implement left shift
+ of 1 by the number of bits held in a register, but inappropriate on
+ targets that would require a loop. By default, this macro returns
+ @code{true} if the target defines an @code{ashlsi3} pattern, and
+ @code{false} otherwise.
+
@findex WORD_REGISTER_OPERATIONS
@item WORD_REGISTER_OPERATIONS
Define this macro if operations between registers with integral mode
Roger
--
Roger Sayle, E-mail: roger@eyesopen.com
OpenEye Scientific Software, WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road, Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507. Fax: (+1) 505-473-0833