Why is GCC generating a jump table for a five-entry switch statement if retpolines are on? This has got to be a *huge* performance loss. The retpoline sequence is very, very slow, and branches aren't that slow. A five-entry switch is only three branches deep.
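For illustration, here is a minimal sketch of what "a few branches deep" means for a five-entry switch. It lowers the switch by hand to a tree of compares, using the return values from the jt.c testcase posted later in this report. The name foo3_tree is hypothetical, and this is not claimed to be the code GCC emits with -fno-jump-tables; it only shows that every input can be resolved with at most three conditional branches and no indirect jump.

```c
/* Hand-written decision tree for a five-entry switch (cases 0..4).
   Hypothetical illustration only; GCC's own lowering may differ. */
int global;

int foo3_tree(int x)
{
    if (x < 2) {                        /* branch 1 */
        if (x == 0)                     /* branch 2 */
            return 11;
        if (x == 1)                     /* branch 3 */
            return 123;
        return 0;                       /* negative input: default */
    }
    if (x < 4) {                        /* branch 2 */
        if (x == 2) {                   /* branch 3 */
            global += 1;
            return 3;
        }
        return 44;                      /* x == 3 */
    }
    if (x == 4)                         /* branch 3 */
        return 444;
    return 0;                           /* x > 4: default */
}
```

With a retpoline, the single `jmp *table(,%rdi,8)` is replaced by a thunk that deliberately defeats indirect branch prediction, which is why the comparison is against a short chain of well-predicted conditional branches like the one above.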
Author: willschm
Date: Mon Sep 24 15:47:22 2018
New Revision: 264538

URL: https://gcc.gnu.org/viewcvs?rev=264538&root=gcc&view=rev
Log:
[testsuite]

2018-09-24  Will Schmidt  <will_schmidt@vnet.ibm.com>

    PR testsuite/86952
    * gcc.target/powerpc/p8-vec-xl-xst-v2.c: Add and update expected codegen

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c
H.J. I can write a patch for it. Do you expect more expensive costs when retpolines are enabled?
(In reply to Martin Liška from comment #2)
> H.J. I can write a patch for it. Do you expect more expensive costs when
> retpolines are enabled?

retpoline is more expensive than 4 branches.
(In reply to H.J. Lu from comment #3)
> retpoline is more expensive than 4 branches.

Can you please write a microbenchmark that shows exactly how expensive it is? Based on that I can tune the current costs.
(In reply to Martin Liška from comment #4)
> Can you please make a microbenchmark that will expose how exactly is that
> expensive? Based on that I can tune current costs.

Is there a testcase where GCC generates a jump table for a five-entry switch statement?
(In reply to H.J. Lu from comment #5)
> Is there a testcase where GCC generates a jump table for a five-entry
> switch statement?

$ cat jt.c
int global;

int foo3 (int x)
{
  switch (x) {
    case 0:
      return 11;
    case 1:
      return 123;
    case 2:
      global += 1;
      return 3;
    case 3:
      return 44;
    case 4:
      return 444;
    default:
      return 0;
  }
}

$ gcc jt.c -O2 -S -o/dev/stdout
        .file   "jt.c"
        .text
        .p2align 4,,15
        .globl  foo3
        .type   foo3, @function
foo3:
.LFB0:
        .cfi_startproc
        cmpl    $4, %edi
        ja      .L2
        movl    %edi, %edi
        jmp     *.L4(,%rdi,8)
        .section        .rodata
        .align 8
        .align 4
.L4:
        .quad   .L9
        .quad   .L7
        .quad   .L6
        .quad   .L5
        .quad   .L3
        .text
        .p2align 4,,10
        .p2align 3
...
Please try retpoline-table branch at

https://github.com/hjl-tools/microbenchmark

I got

[hjl@gnu-cfl-1 microbenchmark]$ make
gcc -g -I. -O2 -mindirect-branch=thunk -c -o test.o test.c
gcc -g -I. -O2 -mindirect-branch=thunk -fno-jump-tables -c -o switch-no-table.o switch-no-table.c
gcc -g -I. -O2 -mindirect-branch=thunk -c -o switch.o switch.c
gcc -o test test.o switch-no-table.o switch.o
./test
no jump table: 189484
jump table   : 333016
[hjl@gnu-cfl-1 microbenchmark]$
Thanks for it, will work on that next week.
Ok, I've slightly updated the micro-benchmark:
https://github.com/marxin/microbenchmark/tree/retpoline-table

On my Haswell desktop I see the following difference:

./test
no jump table: 4265908653
jump table   : 5118680921 (119.99%)

which is quite small, I would say.
H.J. : Can you please run updated benchmark on a recent machine and provide slow down numbers for that?
(In reply to Martin Liška from comment #10)
> H.J. : Can you please run updated benchmark on a recent machine and provide
> slow down numbers for that?

The numbers aren't stable:

[hjl@gnu-cfl-1 microbenchmark]$ make
./test
30000 loops:
global: 21, total: 625
no jump table: 178424
global: 21, total: 625
jump table   : 266792 (149.53%)
[hjl@gnu-cfl-1 microbenchmark]$ make
./test
30000 loops:
global: 21, total: 625
no jump table: 185068
global: 21, total: 625
jump table   : 266678 (144.10%)
[hjl@gnu-cfl-1 microbenchmark]$ make
./test
30000 loops:
global: 21, total: 625
no jump table: 292810
global: 21, total: 625
jump table   : 214840 (73.37%)
[hjl@gnu-cfl-1 microbenchmark]$

Close it for now.
I've been looking into this issue quite recently and improved the benchmark tool a bit along the way. Several things need to be considered with respect to how the switch cases are traversed; the test here walks them round robin, but additional distributions / tests could be added. Pushed here just in case: https://github.com/borkmann/microbenchmark

Numbers I'm getting are stable:

* Xeon E3-1240, packet.net c1.small.x86 instance:

 # make prep
 [...]
 # make
 gcc -g -I. -O2 -c -o test.o test.c
 gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20 -c -o switch-no-table.o switch-no-table.c
 gcc -g -I. -O2 -mindirect-branch=thunk -c -o switch.o switch.c
 gcc -g -I. -O2 -c -o switch-no-retpol.o switch-no-retpol.c
 gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
 taskset 1 ./test
 no retpoline : 6098325270
 no jump table: 6298192058 (no retpoline: 103.28%)
 jump table   : 22081802856 (no retpoline: 362.10%, no jump table: 350.61%)
 # make
 taskset 1 ./test
 no retpoline : 6098439816
 no jump table: 6298242270 (no retpoline: 103.28%)
 jump table   : 22107872854 (no retpoline: 362.52%, no jump table: 351.02%)
 # make
 taskset 1 ./test
 no retpoline : 6098187038
 no jump table: 6298308128 (no retpoline: 103.28%)
 jump table   : 22071053524 (no retpoline: 361.93%, no jump table: 350.43%)

* Xeon Gold 5120, packet.net m2.xlarge.x86 instance:

 # make prep
 [...]
 # make
 gcc -g -I. -O2 -c -o test.o test.c
 gcc -g -I. -O2 -mindirect-branch=thunk --param=case-values-threshold=20 -c -o switch-no-table.o switch-no-table.c
 gcc -g -I. -O2 -mindirect-branch=thunk -c -o switch.o switch.c
 gcc -g -I. -O2 -c -o switch-no-retpol.o switch-no-retpol.c
 gcc -o test test.o switch-no-table.o switch.o switch-no-retpol.o
 taskset 1 ./test
 no retpoline : 5450356814
 no jump table: 5620673036 (no retpoline: 103.12%)
 jump table   : 21448285314 (no retpoline: 393.52%, no jump table: 381.60%)
 # make
 taskset 1 ./test
 no retpoline : 5450356100
 no jump table: 5620678302 (no retpoline: 103.12%)
 jump table   : 21448119720 (no retpoline: 393.52%, no jump table: 381.59%)
 # make
 taskset 1 ./test
 no retpoline : 5450331258
 no jump table: 5620839740 (no retpoline: 103.13%)
 jump table   : 21446922902 (no retpoline: 393.50%, no jump table: 381.56%)

I've also looked into clang for their -mretpoline flag, and they generally turn off jump table generation in this case. For gcc, the s390 folks implemented a target override for the default case-values-threshold to raise it to 20. For x86 something similar could be done. Anyway, H.J. Lu asked me to reopen this issue (but it seems I cannot make this change from my account).
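As a rough sketch of the round-robin walk described in this comment: the loop below visits cases 0..4 in order and accumulates the return values, and a small helper times one walk. This is an assumption-laden illustration, not the code from the linked repositories. The names walk_round_robin and time_walk_ns are hypothetical, and the timing uses clock_gettime in nanoseconds, whereas the unit of the numbers posted in this thread is not stated.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

int global;

/* Same shape as the jt.c testcase. noinline (a GCC extension) keeps the
   switch from being folded into the calling loop. */
__attribute__((noinline)) int foo(int x)
{
    switch (x) {
    case 0: return 11;
    case 1: return 123;
    case 2: global += 1; return 3;
    case 3: return 44;
    case 4: return 444;
    default: return 0;
    }
}

/* Round-robin walk: call foo with 0,1,2,3,4,0,1,... for iters calls
   and return the sum of the results so the calls cannot be dropped. */
long walk_round_robin(long iters)
{
    long total = 0;
    for (long i = 0; i < iters; i++)
        total += foo((int)(i % 5));
    return total;
}

/* Wall-clock nanoseconds spent in one walk; the walk's sum is stored
   through total_out. Hypothetical helper, not the thread's harness. */
int64_t time_walk_ns(long iters, long *total_out)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *total_out = walk_round_robin(iters);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
           + (int64_t)(t1.tv_nsec - t0.tv_nsec);
}
```

To compare variants along the lines of the commands above, the same file would be built three times (plain -O2, -O2 -mindirect-branch=thunk, and -O2 -mindirect-branch=thunk -fno-jump-tables) and run pinned to one CPU with taskset. A round-robin walk is very friendly to conditional branch predictors, so other input distributions may shift the ratios.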
Reopened with new info.
Reopened.
(In reply to Daniel Borkmann from comment #12)
> I've been looking into this issue quite recently and improved the benchmark
> tool a bit along the way. [...] Pushed here just in case:
> https://github.com/borkmann/microbenchmark

Thanks a lot for the benchmark.

> Numbers I'm getting are stable:
> [...]

I can confirm the numbers. I've got:

model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

taskset 1 ./test
no retpoline : 4311969467
no jump table: 5146081372 (no retpoline: 119.34%)
jump table   : 18845846887 (no retpoline: 437.06%, no jump table: 366.22%)

> I've also looked into clang for their -mretpoline flag, and they generally
> turn off jump table generation in this case. For gcc, the s390 folks
> implemented a target override for the default case-values-threshold to raise
> it to 20.

Note that GCC has similar parameter:

--param case-values-threshold
    The smallest number of different values for which it is best to use a jump-table instead of a tree of conditional branches. If the value is 0, use the default for the machine. The default is 0.

For 20 branches, I've got even worse numbers:
https://github.com/marxin/microbenchmark-1/tree/retpoline-table

taskset 1 ./test
no retpoline : 5096377521
no jump table: 5169400990 (no retpoline: 101.43%)
jump table   : 28830137876 (no retpoline: 565.70%, no jump table: 557.71%)

So are you suggesting to disable jump tables with retpolines at all?

> For x86 something similar could be done. Anyway, H.J. Lu asked me
> to reopen this issue (but seems like I cannot make this change from my
> account).

Yep, one would need an account ending with @gcc.gnu.org to change a bug.
(In reply to Martin Liška from comment #15)
> I can confirm the numbers. I've got:
> model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>
> taskset 1 ./test
> no retpoline : 4311969467
> no jump table: 5146081372 (no retpoline: 119.34%)
> jump table   : 18845846887 (no retpoline: 437.06%, no jump table: 366.22%)

Ok, great, thanks for testing on your side as well!

> Note that GCC has similar parameter:
>
> --param case-values-threshold
> [...]

Yeah, I know, I've used it above for the test case (see the gcc cmdline parts).

> For 20 branches, I've got even worse numbers:
> https://github.com/marxin/microbenchmark-1/tree/retpoline-table
> [...]
> So are you suggesting to disable jump tables with retpolines at all?

I leave that up to you guys, but at minimum I would probably implement something like what the s390 folks did for gcc in commit db7a90aa0de5 ("S/390: Disable prediction of indirect branches"); see s390_case_values_threshold(), which does:

+unsigned int
+s390_case_values_threshold (void)
+{
+  /* Disabling branch prediction for indirect jumps makes jump tables
+     much more expensive.  */
+  if (TARGET_INDIRECT_BRANCH_NOBP_JUMP)
+    return 20;
+
+  return default_case_values_threshold ();
+}
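To make the effect of such a threshold override concrete, here is a standalone model of the decision, written outside of GCC. Everything in it is a simplification: the names are hypothetical, the non-mitigated default of 4 is an assumption (the real default is target dependent), and GCC's actual switch lowering weighs more than the number of case values. It only follows the documented meaning of case-values-threshold, i.e. the smallest number of different values for which a jump table is preferred.

```c
#include <stdbool.h>

/* Assumed values for illustration: 20 is taken from the s390 override
   quoted above; 4 is a placeholder for the target's usual default. */
enum { ASSUMED_DEFAULT_THRESHOLD = 4, MITIGATED_THRESHOLD = 20 };

/* Model of a target hook that raises the threshold when indirect
   branches are mitigated (hypothetical, not GCC's API). */
unsigned case_values_threshold(bool indirect_branch_mitigated)
{
    return indirect_branch_mitigated ? MITIGATED_THRESHOLD
                                     : ASSUMED_DEFAULT_THRESHOLD;
}

/* A jump table is preferred once the switch has at least
   `threshold` different case values. */
bool prefer_jump_table(unsigned n_case_values, bool indirect_branch_mitigated)
{
    return n_case_values >= case_values_threshold(indirect_branch_mitigated);
}
```

Under this model the five-entry switch from the testcase gets a jump table by default but a compare tree once the mitigation is on, while a switch with 20 or more values would still use a (retpolined) jump table.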
> I leave that up to you guys, but I would at min probably implement something
> like s390 folks did for gcc, commit db7a90aa0de5 ("S/390: Disable prediction
> of indirect branches"), see s390_case_values_threshold() which does:

Sure, that's probably the right approach. I would appreciate help with the benchmark. Can you please come up with a --param case-values-threshold value at which a jump table (with retpolines) is as fast as a decision tree (-fno-jump-tables)?
I'm working on a more complex test-case generator. I'll post results tomorrow.
Ok, I updated the benchmark and push it here:
https://github.com/marxin/microbenchmark-1

And I see following on my Haswell machine:

$ ./test.py
               normal        retpoline     retpo+no-JT   retpo+JT=20   retpo+JT=40
cases:    8:   0.34 (100%)   1.80 (529%)   0.39 (114%)   0.39 (115%)   0.39 (115%)
cases:   16:   0.33 (100%)   1.77 (541%)   0.51 (156%)   0.51 (157%)   0.51 (157%)
cases:   32:   1.01 (100%)   1.82 (179%)   0.57 ( 56%)   1.82 (179%)   0.54 ( 54%)
cases:   64:   0.78 (100%)   1.76 (225%)   0.58 ( 74%)   1.76 (225%)   1.75 (224%)
cases:  128:   0.34 (100%)   1.94 (577%)   0.64 (191%)   1.93 (574%)   1.93 (573%)
cases:  256:   0.34 (100%)   1.94 (579%)   0.76 (225%)   1.95 (581%)   1.94 (580%)
cases: 1024:   1.21 (100%)   2.00 (166%)   0.97 ( 80%)   2.00 (165%)   2.00 (166%)
cases: 2048:   1.48 (100%)   2.03 (137%)   2.06 (139%)   2.01 (136%)   2.00 (135%)
cases: 4096:   1.67 (100%)   2.09 (125%)   3.78 (226%)   2.10 (126%)   2.20 (132%)

Based on these numbers, I would recommend disabling jump tables with -mindirect-branch=*. Thoughts?
(In reply to Martin Liška from comment #19)
> Ok, I updated the benchmark and push it here:
> https://github.com/marxin/microbenchmark-1
>
> And I see following on my Haswell machine:

Thanks for working on it! Bit strange why some of your numbers are quite fluctuating e.g. in your 'normal' column. What do you use to tune your setup for testing? I've been running the `make prep` part which I added back then, and the numbers I see are quite stable. I ran a quick test this morning with your repo, and here's what I got for the round-robin walk:

* Xeon E3-1240 (3.7GHz):

 # ./test.py
               normal        retpoline     retpo+no-JT   retpo+JT=20   retpo+JT=40
cases:    8:   0.49 (100%)   2.09 (426%)   0.53 (108%)   0.53 (108%)   0.53 (108%)
cases:   16:   0.49 (100%)   2.09 (426%)   0.58 (119%)   0.58 (119%)   0.58 (119%)
cases:   32:   0.49 (100%)   2.09 (426%)   0.61 (125%)   2.09 (426%)   0.61 (125%)
cases:   64:   0.49 (100%)   2.26 (458%)   0.69 (140%)   2.27 (459%)   2.27 (459%)
cases:  128:   0.50 (100%)   2.37 (476%)   0.76 (153%)   2.32 (466%)   2.41 (483%)
cases:  256:   0.52 (100%)   2.33 (451%)   0.91 (175%)   2.33 (450%)   2.36 (456%)
cases: 1024:   1.05 (100%)   2.54 (242%)   1.08 (103%)   2.59 (246%)   2.54 (242%)
cases: 2048:   1.63 (100%)   2.56 (157%)   1.94 (119%)   2.61 (160%)   2.59 (159%)
cases: 4096:   2.19 (100%)   3.12 (143%)   3.22 (147%)   3.09 (142%)   3.13 (143%)

* Xeon Gold 5120 (2.6GHz):

 # ./test.py
               normal        retpoline     retpo+no-JT   retpo+JT=20   retpo+JT=40
cases:    8:   0.70 (100%)   2.98 (425%)   0.75 (107%)   0.75 (107%)   0.75 (107%)
cases:   16:   0.70 (100%)   2.98 (425%)   0.82 (117%)   0.82 (117%)   0.82 (117%)
cases:   32:   0.70 (100%)   3.01 (430%)   0.87 (124%)   2.98 (426%)   0.87 (124%)
cases:   64:   0.70 (100%)   3.52 (501%)   0.94 (134%)   3.52 (501%)   3.52 (501%)
cases:  128:   0.71 (100%)   3.51 (495%)   1.07 (151%)   3.50 (495%)   3.50 (494%)
cases:  256:   0.76 (100%)   3.14 (414%)   1.27 (167%)   3.14 (414%)   3.14 (414%)
cases: 1024:   1.46 (100%)   3.36 (230%)   1.49 (102%)   3.36 (230%)   3.36 (230%)
cases: 2048:   2.25 (100%)   3.19 (142%)   2.70 (120%)   3.19 (142%)   3.19 (142%)
cases: 4096:   2.90 (100%)   3.74 (129%)   4.48 (155%)   3.73 (129%)   3.72 (129%)

Probably makes sense to also add other walk tests aka input distributions for foo{,_no_table,_no_retpol}(<x>) for further comparison if plan would be to disable jump tables entirely.
(In reply to Daniel Borkmann from comment #20)
> Thanks for working on it! Bit strange why some of your numbers are quite
> fluctuating e.g. in your 'normal' column. What do you use to tune your setup
> for testing? I've been running the `make prep` part which I added back then,
> and the numbers I see are quite stable. [...]

Yes, it's without taskset and tuned. I don't have any experience with tuned.

> Probably makes sense to also add other walk tests aka input distributions
> for foo{,_no_table,_no_retpol}(<x>) for further comparison if plan would be
> to disable jump tables entirely.

Here are the numbers for the following distribution:

+      int x = i % 57;
+      foo ((3 * x * x + 17 * x) / 100);

               normal        retpoline     retpo+no-JT   retpo+JT=20   retpo+JT=40
cases:    8:   1.55 (100%)   2.65 (171%)   0.59 ( 38%)   0.60 ( 39%)   0.60 ( 39%)
cases:   16:   1.53 (100%)   2.66 (174%)   0.67 ( 44%)   0.66 ( 43%)   0.66 ( 43%)
cases:   32:   1.76 (100%)   2.68 (152%)   0.70 ( 40%)   2.69 (153%)   0.70 ( 39%)
cases:   64:   1.31 (100%)   2.71 (206%)   0.75 ( 57%)   2.69 (205%)   2.66 (202%)
cases:  128:   0.53 (100%)   2.75 (515%)   0.78 (147%)   2.73 (513%)   2.75 (516%)
cases:  256:   0.55 (100%)   2.76 (504%)   0.85 (154%)   2.76 (504%)   2.76 (503%)
cases: 1024:   0.54 (100%)   2.73 (506%)   0.96 (178%)   2.76 (511%)   2.74 (507%)
cases: 2048:   0.54 (100%)   2.74 (507%)   1.23 (228%)   2.73 (505%)   2.71 (501%)
cases: 4096:   0.54 (100%)   2.73 (503%)   1.44 (266%)   2.73 (502%)   2.73 (503%)

The conclusion is the same for me; I'm going to prepare a patch that will disable JTs for retpolines. Thank you for testing.
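To see which switch arguments the distribution in this comment actually produces, the following sketch evaluates the same expression over one full period of i % 57 and builds a histogram. The helper names are hypothetical and this is not part of the benchmark; it only reuses the expression (3 * x * x + 17 * x) / 100 from the patch hunk above.

```c
/* Switch argument produced by the skewed walk for loop counter i
   (expression copied from the hunk quoted in this comment). */
int skewed_index(int i)
{
    int x = i % 57;
    return (3 * x * x + 17 * x) / 100;
}

/* Count how often each argument in [0, n) occurs over one period of
   57 iterations. Returns the largest argument seen, even when it is
   outside the histogram. Hypothetical helper for illustration. */
int histogram_one_period(int *hist, int n)
{
    int max = 0;
    for (int k = 0; k < n; k++)
        hist[k] = 0;
    for (int i = 0; i < 57; i++) {
        int idx = skewed_index(i);
        if (idx < n)
            hist[idx]++;
        if (idx > max)
            max = idx;
    }
    return max;
}
```

The arguments range from 0 (hit four times per period) up to 103 (hit once), so small values are revisited more often than large ones. For the rows with fewer than 104 cases some arguments exceed the case count; how the generated tests treat those is not shown in this thread.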
Author: marxin
Date: Fri Mar 8 12:55:40 2019
New Revision: 269492

URL: https://gcc.gnu.org/viewcvs?rev=269492&root=gcc&view=rev
Log:
x86: Disable jump tables when retpolines are used (PR target/86952).

2019-03-08  Martin Liska  <mliska@suse.cz>

    PR target/86952
    * config/i386/i386.c (ix86_option_override_internal): Disable
    jump tables when retpolines are used.

2019-03-08  Martin Liska  <mliska@suse.cz>

    PR target/86952
    * gcc.target/i386/pr86952.c: New test.
    * gcc.target/i386/indirect-thunk-7.c: Use jump tables to match
    scanned pattern.
    * gcc.target/i386/indirect-thunk-inline-7.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr86952.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
    trunk/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c
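The commit itself is not quoted in this report, so the following is only a standalone model of the behavior the ChangeLog describes, with hypothetical type and function names. It assumes that an explicit -fjump-tables from the user still wins, which is an inference from the ChangeLog note that the indirect-thunk tests now "use jump tables to match scanned pattern"; the real code in ix86_option_override_internal may be structured differently.

```c
#include <stdbool.h>

/* Values of -mindirect-branch= as documented for x86. */
enum indirect_branch {
    BRANCH_KEEP,
    BRANCH_THUNK,
    BRANCH_THUNK_INLINE,
    BRANCH_THUNK_EXTERN
};

/* Hypothetical, simplified view of the relevant options. */
struct options {
    enum indirect_branch indirect_branch; /* -mindirect-branch= */
    bool jump_tables;                     /* -fjump-tables state */
    bool jump_tables_set_by_user;         /* flag given explicitly? */
};

/* Model of the override: with any retpoline flavor enabled, jump
   tables are turned off unless the user asked for them explicitly
   (assumption, see lead-in). */
void override_options(struct options *o)
{
    if (o->indirect_branch != BRANCH_KEEP && !o->jump_tables_set_by_user)
        o->jump_tables = false;
}
```

Compared with raising case-values-threshold to 20 as on s390, this is the more aggressive choice and matches the benchmark tables above, where the retpo+no-JT column beats the retpolined jump table up to roughly a thousand cases.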
Fixed on trunk so far.
Author: marxin
Date: Mon Mar 11 09:38:06 2019
New Revision: 269572

URL: https://gcc.gnu.org/viewcvs?rev=269572&root=gcc&view=rev
Log:
Backport r269492

2019-03-11  Martin Liska  <mliska@suse.cz>

    Backport from mainline
    2019-03-08  Martin Liska  <mliska@suse.cz>

    PR target/86952
    * config/i386/i386.c (ix86_option_override_internal): Disable
    jump tables when retpolines are used.

2019-03-11  Martin Liska  <mliska@suse.cz>

    Backport from mainline
    2019-03-08  Martin Liska  <mliska@suse.cz>

    PR target/86952
    * gcc.target/i386/indirect-thunk-7.c: Use jump tables to match
    scanned pattern.
    * gcc.target/i386/indirect-thunk-inline-7.c: Likewise.

Modified:
    branches/gcc-8-branch/gcc/ChangeLog
    branches/gcc-8-branch/gcc/config/i386/i386.c
    branches/gcc-8-branch/gcc/testsuite/ChangeLog
    branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
    branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c
Fixed now.
Author: marxin
Date: Thu Apr 11 08:59:48 2019
New Revision: 270277

URL: https://gcc.gnu.org/viewcvs?rev=270277&root=gcc&view=rev
Log:
Backport r269492

2019-04-11  Martin Liska  <mliska@suse.cz>

    Backport from mainline
    2019-03-08  Martin Liska  <mliska@suse.cz>

    PR target/86952
    * config/i386/i386.c (ix86_option_override_internal): Disable
    jump tables when retpolines are used.

2019-04-11  Martin Liska  <mliska@suse.cz>

    Backport from mainline
    2019-03-08  Martin Liska  <mliska@suse.cz>

    PR target/86952
    * gcc.target/i386/pr86952.c: New test.
    * gcc.target/i386/indirect-thunk-7.c: Use jump tables to match
    scanned pattern.
    * gcc.target/i386/indirect-thunk-inline-7.c: Likewise.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr86952.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/i386/i386.c
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c