This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/53533] [4.8/4.9/5/6 regression] vectorization causes loop unrolling test slowdown as measured by Adobe's C++Benchmark
- From: "trippels at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sun, 03 May 2015 13:00:44 +0000
- Auto-submitted: auto-generated
- References: <bug-53533-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533
Markus Trippelsdorf <trippels at gcc dot gnu.org> changed:
            What|Removed              |Added
----------------------------------------------------------------------------
Last reconfirmed|2012-05-31 00:00:00  |2015-5-3
              CC|                     |trippels at gcc dot gnu.org
--- Comment #26 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
For gcc-5 and gcc-6 there is an additional slowdown of roughly 50% compared with gcc-4.9.2 (second run below):
% g++ -O3 loop_unroll.ii -o loop_unroll
% time ./loop_unroll 10000
./loop_unroll 10000
test    description                   absolute  operations  ratio with
number                                time      per second  test0
     0  "int32_t for loop unroll 1"   0.14 sec    552.30 M  1.00
     1  "int32_t for loop unroll 2"   0.11 sec    699.49 M  0.79
     2  "int32_t for loop unroll 3"   0.14 sec    566.56 M  0.97
     3  "int32_t for loop unroll 4"   0.15 sec    532.87 M  1.04
     4  "int32_t for loop unroll 5"   0.10 sec    784.70 M  0.70
     5  "int32_t for loop unroll 6"   0.09 sec    887.12 M  0.62
     6  "int32_t for loop unroll 7"   0.09 sec    913.50 M  0.60
     7  "int32_t for loop unroll 8"   0.08 sec    986.45 M  0.56
     8  "int32_t for loop unroll 9"   0.23 sec    346.06 M  1.60
     9  "int32_t for loop unroll 10"  0.08 sec   1040.06 M  0.53
    10  "int32_t for loop unroll 11"  0.23 sec    348.02 M  1.59
    11  "int32_t for loop unroll 12"  0.23 sec    353.38 M  1.56
    12  "int32_t for loop unroll 13"  0.24 sec    338.32 M  1.63
    13  "int32_t for loop unroll 14"  0.24 sec    332.32 M  1.66
    14  "int32_t for loop unroll 15"  0.25 sec    321.15 M  1.72
    15  "int32_t for loop unroll 16"  0.25 sec    318.23 M  1.74
    16  "int32_t for loop unroll 17"  0.24 sec    329.43 M  1.68
    17  "int32_t for loop unroll 18"  0.25 sec    321.34 M  1.72
    18  "int32_t for loop unroll 19"  0.25 sec    314.53 M  1.76
    19  "int32_t for loop unroll 20"  0.25 sec    325.33 M  1.70
    20  "int32_t for loop unroll 21"  0.25 sec    323.67 M  1.71
    21  "int32_t for loop unroll 22"  0.25 sec    316.85 M  1.74
    22  "int32_t for loop unroll 23"  0.25 sec    323.51 M  1.71
    23  "int32_t for loop unroll 24"  0.06 sec   1257.94 M  0.44
    24  "int32_t for loop unroll 25"  0.24 sec    327.77 M  1.69
    25  "int32_t for loop unroll 26"  0.06 sec   1310.44 M  0.42
    26  "int32_t for loop unroll 27"  0.07 sec   1072.85 M  0.51
    27  "int32_t for loop unroll 28"  0.28 sec    283.44 M  1.95
    28  "int32_t for loop unroll 29"  0.30 sec    267.96 M  2.06
    29  "int32_t for loop unroll 30"  0.31 sec    258.88 M  2.13
    30  "int32_t for loop unroll 31"  0.06 sec   1337.64 M  0.41
    31  "int32_t for loop unroll 32"  0.06 sec   1315.10 M  0.42
Total absolute time for int32_t for loop unrolling: 5.85 sec
...
./loop_unroll 10000 41.43s user 0.00s system 100% cpu 41.426 total
==============================================================================
% /usr/x86_64-pc-linux-gnu/gcc-bin/4.9.2/g++ -O3 loop_unroll.ii -o loop_unroll
% time ./loop_unroll 10000
./loop_unroll 10000
test    description                   absolute  operations  ratio with
number                                time      per second  test0
     0  "int32_t for loop unroll 1"   0.14 sec    582.13 M  1.00
     1  "int32_t for loop unroll 2"   0.13 sec    625.41 M  0.93
     2  "int32_t for loop unroll 3"   0.13 sec    635.76 M  0.92
     3  "int32_t for loop unroll 4"   0.13 sec    625.41 M  0.93
     4  "int32_t for loop unroll 5"   0.12 sec    640.96 M  0.91
     5  "int32_t for loop unroll 6"   0.09 sec    888.11 M  0.66
     6  "int32_t for loop unroll 7"   0.09 sec    900.10 M  0.65
     7  "int32_t for loop unroll 8"   0.10 sec    832.20 M  0.70
     8  "int32_t for loop unroll 9"   0.10 sec    834.22 M  0.70
     9  "int32_t for loop unroll 10"  0.09 sec    902.04 M  0.65
    10  "int32_t for loop unroll 11"  0.10 sec    805.15 M  0.72
    11  "int32_t for loop unroll 12"  0.10 sec    823.27 M  0.71
    12  "int32_t for loop unroll 13"  0.09 sec    860.51 M  0.68
    13  "int32_t for loop unroll 14"  0.11 sec    753.59 M  0.77
    14  "int32_t for loop unroll 15"  0.10 sec    781.96 M  0.74
    15  "int32_t for loop unroll 16"  0.09 sec    858.76 M  0.68
    16  "int32_t for loop unroll 17"  0.09 sec    846.91 M  0.69
    17  "int32_t for loop unroll 18"  0.10 sec    783.19 M  0.74
    18  "int32_t for loop unroll 19"  0.10 sec    794.81 M  0.73
    19  "int32_t for loop unroll 20"  0.10 sec    806.70 M  0.72
    20  "int32_t for loop unroll 21"  0.10 sec    823.82 M  0.71
    21  "int32_t for loop unroll 22"  0.09 sec    851.74 M  0.68
    22  "int32_t for loop unroll 23"  0.10 sec    792.87 M  0.73
    23  "int32_t for loop unroll 24"  0.10 sec    809.32 M  0.72
    24  "int32_t for loop unroll 25"  0.10 sec    832.18 M  0.70
    25  "int32_t for loop unroll 26"  0.10 sec    781.11 M  0.75
    26  "int32_t for loop unroll 27"  0.10 sec    792.40 M  0.73
    27  "int32_t for loop unroll 28"  0.10 sec    817.22 M  0.71
    28  "int32_t for loop unroll 29"  0.10 sec    826.40 M  0.70
    29  "int32_t for loop unroll 30"  0.10 sec    803.83 M  0.72
    30  "int32_t for loop unroll 31"  0.10 sec    803.48 M  0.72
    31  "int32_t for loop unroll 32"  0.10 sec    796.88 M  0.73
Total absolute time for int32_t for loop unrolling: 3.28 sec
...
./loop_unroll 10000 22.75s user 0.00s system 100% cpu 22.746 total
clang:
./loop_unroll 10000 12.93s user 0.00s system 100% cpu 12.933 total
icpc (about 5x faster than gcc-5):
./loop_unroll 10000 8.38s user 0.00s system 99% cpu 8.382 total
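
For reference, the ratios implied by the wall-clock totals above can be checked with a short script. This is only a sketch: the dictionary labels are mine, and the unversioned `g++` run is assumed to be gcc-5, as the icpc note suggests.

```python
# Wall-clock totals in seconds, copied from the `time` output above.
# Labels are illustrative; "gcc-5" is ASSUMED to be the first (unversioned g++) run.
totals = {
    "gcc-5": 41.426,
    "gcc-4.9.2": 22.746,
    "clang": 12.933,
    "icpc": 8.382,
}


def slowdown(slow: str, fast: str) -> float:
    """Return how many times longer `slow` ran than `fast`."""
    return totals[slow] / totals[fast]


for name in ("gcc-4.9.2", "clang", "icpc"):
    print(f"gcc-5 vs {name}: {slowdown('gcc-5', name):.2f}x")
```

By these totals gcc-5 takes roughly 1.8x as long as gcc-4.9.2 for the whole run, and icpc comes out at a little under 5x faster than gcc-5.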