This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
GCC FP code regressions on i386?
- From: Alexander Kabaev <ak03 at gte dot com>
- To: gcc at gcc dot gnu dot org
- Date: Tue, 10 Feb 2004 16:11:07 -0500
- Subject: GCC FP code regressions on i386?
- Organization: Verizon Data Services
Hi,
one of our users has reported that the simple flops.c benchmark compiled
by GCC 3.3.3 (gcc (GCC) 3.3.3 [FreeBSD] 20031106) runs significantly
slower than the same code compiled with GCC 2.95 on the same
machine/kernel. I repeated the tests myself and confirmed the numbers for
the GCC 3.3.3 snapshot that is currently in the FreeBSD source tree. I then
repeated the tests with recent GCC 3.3.3 snapshots, both stock from CVS and
from ports. Recent snapshots behave much better in this respect, but
there is at least one part of the benchmark where they lose to 2.95 by
almost 60%.
All tests were compiled with -O2. I used a fairly old dual-CPU
P3-500 machine, but the results should be reproducible on faster
machines too.
flops.c, compiled by GCC 2.95.4:
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module         Error   RunTime     MFLOPS
                        (usec)
     1   -7.6739e-13    0.0940   148.9374
     2   -5.7021e-13    0.0527   132.8005
     3   -2.4314e-14    0.0828   205.3539
     4    6.8501e-14    0.0858   174.8089
     5   -1.6320e-14    0.1945   149.1066
     6    1.3961e-13    0.1476   196.4863
     7   -3.6152e-11    0.2080    57.6823
     8    9.0483e-15    0.1495   200.6989
Iterations      = 256000000
NullTime (usec) = 0.0045
MFLOPS(1)       = 150.1427
MFLOPS(2)       = 105.7891
MFLOPS(3)       = 151.7372
MFLOPS(4)       = 195.4205
flops.c, compiled by gcc (GCC) 3.3.3 [FreeBSD] 20031106. Results are
nothing but horrible here:
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module         Error   RunTime     MFLOPS
                        (usec)
     1    2.8422e-14    0.2022    69.2361
     2    2.5047e-13    0.1812    38.6303
     3   -7.6605e-15    0.2503    67.9062
     4    2.2771e-13    0.1026   146.1286
     5    3.8858e-14    0.2818   102.9046
     6    7.5495e-15    0.2236   129.6791
     7   -1.1369e-13    0.2753    43.5859
     8    1.2612e-13    0.2238   134.0304
Iterations      = 128000000
NullTime (usec) = 0.0045
MFLOPS(1)       = 44.9683
MFLOPS(2)       = 70.3080
MFLOPS(3)       = 93.6022
MFLOPS(4)       = 113.6856
flops.c, compiled by today's snapshot from CVS:
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module         Error   RunTime     MFLOPS
                        (usec)
     1   -7.6739e-13    0.0862   162.3351
     2   -5.7021e-13    0.0781    89.5757
     3   -2.4314e-14    0.0875   194.2045
     4    6.8501e-14    0.0847   177.0627
     5   -1.6320e-14    0.1931   150.2127
     6    1.3961e-13    0.1347   215.3673
     7   -3.6152e-11    0.2142    56.0168
     8    9.0483e-15    0.1301   230.5877
Iterations      = 256000000
NullTime (usec) = 0.0045
MFLOPS(1)       = 108.7258
MFLOPS(2)       = 105.3293
MFLOPS(3)       = 156.8997
MFLOPS(4)       = 208.2340
flops.c, compiled by GCC from FreeBSD ports (snapshot date 2004/02/02):
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module         Error   RunTime     MFLOPS
                        (usec)
     1   -7.6739e-13    0.0861   162.5698
     2   -5.7021e-13    0.0771    90.7767
     3   -2.4314e-14    0.0878   193.5303
     4    6.8501e-14    0.0849   176.7623
     5   -1.6320e-14    0.1931   150.2190
     6    1.3961e-13    0.1354   214.1273
     7   -3.6152e-11    0.2080    57.7011
     8    9.0483e-15    0.1314   228.3718
Iterations      = 256000000
NullTime (usec) = 0.0045
MFLOPS(1)       = 109.8430
MFLOPS(2)       = 107.1044
MFLOPS(3)       = 157.5592
MFLOPS(4)       = 207.0537
Note that the last two runs are roughly equivalent and are on par with or
better than 2.95 in all modules except module #2.
And the last one, a GCC 3.4 snapshot from a week ago:
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module         Error   RunTime     MFLOPS
                        (usec)
     1   -7.6739e-13    0.0850   164.6364
     2   -5.7021e-13    0.0805    86.9318
     3   -2.4314e-14    0.0866   196.3759
     4    6.8501e-14    0.0856   175.1892
     5   -1.6320e-14    0.1767   164.1238
     6    1.3961e-13    0.1260   230.0883
     7   -3.6152e-11    0.2039    58.8544
     8    9.0483e-15    0.1351   222.0808
Iterations      = 256000000
NullTime (usec) = 0.0045
MFLOPS(1)       = 106.2996
MFLOPS(2)       = 110.5026
MFLOPS(3)       = 162.4135
MFLOPS(4)       = 210.0089
Module #2 performance is still very low compared to gcc 2.95.
--
Alexander Kabaev