This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/53533] [4.8/4.9/5/6 regression] vectorization causes loop unrolling test slowdown as measured by Adobe's C++Benchmark


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533

Markus Trippelsdorf <trippels at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2012-05-31 00:00:00         |2015-5-3
                 CC|                            |trippels at gcc dot gnu.org

--- Comment #26 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
For gcc-5 and gcc-6 there is an additional slowdown of roughly 50% compared with gcc-4.9.2 (41.43s vs. 22.75s total run time in the runs below):

 % g++ -O3 loop_unroll.ii -o loop_unroll
 % time ./loop_unroll 10000
./loop_unroll 10000 

test                description   absolute   operations   ratio with
number                            time       per second   test0

 0  "int32_t for loop unroll 1"   0.14 sec   552.30 M     1.00
 1  "int32_t for loop unroll 2"   0.11 sec   699.49 M     0.79
 2  "int32_t for loop unroll 3"   0.14 sec   566.56 M     0.97
 3  "int32_t for loop unroll 4"   0.15 sec   532.87 M     1.04
 4  "int32_t for loop unroll 5"   0.10 sec   784.70 M     0.70
 5  "int32_t for loop unroll 6"   0.09 sec   887.12 M     0.62
 6  "int32_t for loop unroll 7"   0.09 sec   913.50 M     0.60
 7  "int32_t for loop unroll 8"   0.08 sec   986.45 M     0.56
 8  "int32_t for loop unroll 9"   0.23 sec   346.06 M     1.60
 9 "int32_t for loop unroll 10"   0.08 sec   1040.06 M    0.53
10 "int32_t for loop unroll 11"   0.23 sec   348.02 M     1.59
11 "int32_t for loop unroll 12"   0.23 sec   353.38 M     1.56
12 "int32_t for loop unroll 13"   0.24 sec   338.32 M     1.63
13 "int32_t for loop unroll 14"   0.24 sec   332.32 M     1.66
14 "int32_t for loop unroll 15"   0.25 sec   321.15 M     1.72
15 "int32_t for loop unroll 16"   0.25 sec   318.23 M     1.74
16 "int32_t for loop unroll 17"   0.24 sec   329.43 M     1.68
17 "int32_t for loop unroll 18"   0.25 sec   321.34 M     1.72
18 "int32_t for loop unroll 19"   0.25 sec   314.53 M     1.76
19 "int32_t for loop unroll 20"   0.25 sec   325.33 M     1.70
20 "int32_t for loop unroll 21"   0.25 sec   323.67 M     1.71
21 "int32_t for loop unroll 22"   0.25 sec   316.85 M     1.74
22 "int32_t for loop unroll 23"   0.25 sec   323.51 M     1.71
23 "int32_t for loop unroll 24"   0.06 sec   1257.94 M    0.44
24 "int32_t for loop unroll 25"   0.24 sec   327.77 M     1.69
25 "int32_t for loop unroll 26"   0.06 sec   1310.44 M    0.42
26 "int32_t for loop unroll 27"   0.07 sec   1072.85 M    0.51
27 "int32_t for loop unroll 28"   0.28 sec   283.44 M     1.95
28 "int32_t for loop unroll 29"   0.30 sec   267.96 M     2.06
29 "int32_t for loop unroll 30"   0.31 sec   258.88 M     2.13
30 "int32_t for loop unroll 31"   0.06 sec   1337.64 M    0.41
31 "int32_t for loop unroll 32"   0.06 sec   1315.10 M    0.42

Total absolute time for int32_t for loop unrolling: 5.85 sec
...
./loop_unroll 10000  41.43s user 0.00s system 100% cpu 41.426 total

==============================================================================

 % /usr/x86_64-pc-linux-gnu/gcc-bin/4.9.2/g++ -O3 loop_unroll.ii -o loop_unroll
 % time ./loop_unroll 10000
./loop_unroll 10000 

test                description   absolute   operations   ratio with
number                            time       per second   test0

 0  "int32_t for loop unroll 1"   0.14 sec   582.13 M     1.00
 1  "int32_t for loop unroll 2"   0.13 sec   625.41 M     0.93
 2  "int32_t for loop unroll 3"   0.13 sec   635.76 M     0.92
 3  "int32_t for loop unroll 4"   0.13 sec   625.41 M     0.93
 4  "int32_t for loop unroll 5"   0.12 sec   640.96 M     0.91
 5  "int32_t for loop unroll 6"   0.09 sec   888.11 M     0.66
 6  "int32_t for loop unroll 7"   0.09 sec   900.10 M     0.65
 7  "int32_t for loop unroll 8"   0.10 sec   832.20 M     0.70
 8  "int32_t for loop unroll 9"   0.10 sec   834.22 M     0.70
 9 "int32_t for loop unroll 10"   0.09 sec   902.04 M     0.65
10 "int32_t for loop unroll 11"   0.10 sec   805.15 M     0.72
11 "int32_t for loop unroll 12"   0.10 sec   823.27 M     0.71
12 "int32_t for loop unroll 13"   0.09 sec   860.51 M     0.68
13 "int32_t for loop unroll 14"   0.11 sec   753.59 M     0.77
14 "int32_t for loop unroll 15"   0.10 sec   781.96 M     0.74
15 "int32_t for loop unroll 16"   0.09 sec   858.76 M     0.68
16 "int32_t for loop unroll 17"   0.09 sec   846.91 M     0.69
17 "int32_t for loop unroll 18"   0.10 sec   783.19 M     0.74
18 "int32_t for loop unroll 19"   0.10 sec   794.81 M     0.73
19 "int32_t for loop unroll 20"   0.10 sec   806.70 M     0.72
20 "int32_t for loop unroll 21"   0.10 sec   823.82 M     0.71
21 "int32_t for loop unroll 22"   0.09 sec   851.74 M     0.68
22 "int32_t for loop unroll 23"   0.10 sec   792.87 M     0.73
23 "int32_t for loop unroll 24"   0.10 sec   809.32 M     0.72
24 "int32_t for loop unroll 25"   0.10 sec   832.18 M     0.70
25 "int32_t for loop unroll 26"   0.10 sec   781.11 M     0.75
26 "int32_t for loop unroll 27"   0.10 sec   792.40 M     0.73
27 "int32_t for loop unroll 28"   0.10 sec   817.22 M     0.71
28 "int32_t for loop unroll 29"   0.10 sec   826.40 M     0.70
29 "int32_t for loop unroll 30"   0.10 sec   803.83 M     0.72
30 "int32_t for loop unroll 31"   0.10 sec   803.48 M     0.72
31 "int32_t for loop unroll 32"   0.10 sec   796.88 M     0.73

Total absolute time for int32_t for loop unrolling: 3.28 sec
...
./loop_unroll 10000  22.75s user 0.00s system 100% cpu 22.746 total

clang:
./loop_unroll 10000  12.93s user 0.00s system 100% cpu 12.933 total

icpc (about 5x faster than gcc-5):
./loop_unroll 10000  8.38s user 0.00s system 99% cpu 8.382 total

