rfc: another switch optimization idea
Dinar Temirbulatov
dtemirbulatov@gmail.com
Mon Mar 25 21:24:00 GMT 2013
Hi,
We noticed some performance gains if we are not using jump over some
simple switch statements. Here is the idea: Check whether the switch
statement can be expanded with conditional instructions. In that case
jump tables should be avoided since some branch instructions can be
eliminated in further passes (replaced by conditional execution).
For example:
switch (i)
{
case 1: sum += 1;
case 2: sum += 3;
case 3: sum += 5;
case 4: sum += 10;
}
Using jump tables the following code will be generated (ARM assembly):
ldrcc pc, [pc, r0, lsl #2]
b .L5
.L0:
.word L1
.word L2
.word L3
.word L4
.L1:
add r3, #1
.L2:
add r3, #4
.L3:
add r3, #5
.L4:
add r3, #10
.L5
Although this code has a constant complexity it can be improved by the
conditional execution to avoid implicit branching:
cmp r0,1
addeq r3, #1
cmp r0,2
addeq r3, #4
cmp r0,3
addeq r3, #5
cmp r0,4
addeq r3, #10
Although the assembly below requires more assembly instructions to be
executed, it doesn't violate the CPU pipeline (since no branching is
performed).
The original version of patch for was developed by Alexey Kravets. I
measured some performance improvements/regressions using spec 2000 int
benchmark on Samsumg's exynos 5250. Here is the result:
before:
Base Base Base Peak
Peak Peak
Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio
------------ -------- -------- -------- --------
-------- --------
164.gzip 1400 287 487* 1400 288 485*
175.vpr 1400 376 373* 1400 374 374*
176.gcc 1100 121 912* 1100 118 933*
181.mcf 1800 242 743* 1800 251 718*
186.crafty 1000 159 628* 1000 165 608*
197.parser 1800 347 518* 1800 329 547*
252.eon 1300 960 135* 1300 960 135*
253.perlbmk 1800 214 842* 1800 212 848*
254.gap 1100 138 797* 1100 136 806*
255.vortex 1900 253 750* 1900 255 744*
256.bzip2 1500 237 632* 1500 230 653*
300.twolf X X
SPECint_base2000 561
SPECint2000 563
After:
164.gzip 1400 286 490 * 1400 288 486 *
175.vpr 1400 213 656 * 1400 215 650 *
176.gcc 1100 119 923 * 1100 118 933 *
181.mcf 1800 247 730 * 1800 251 717 *
186.crafty 1000 145 688 * 1000 150 664 *
197.parser 1800 296 608 * 1800 275 654 *
252.eon X X
253.perlbmk 1800 206 872 * 1800 211 853 *
254.gap 1100 133 825 * 1100 131 838 *
255.vortex 1900 241 789 * 1900 239 797 *
256.bzip2 1500 235 638 * 1500 226 663 *
300.twolf X X
The error in 252.eon was due to incorrect setup. Also "if (count >
3*PARAM_VALUE (PARAM_SWITCH_JUMP_TABLES_BB_OPS_LIMIT))" does not look
correct, and probably it is better to move this code in the earlier
stage just before the gimple expand and keep preference expand state
(jump-tables or not) for every switch statement to avoid dealing with
the RTL altogether.
thanks, Dinar.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: switch.patch
Type: application/octet-stream
Size: 8871 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc/attachments/20130325/ef6637c9/attachment.obj>
More information about the Gcc
mailing list