This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



GCC FP code regressions on i386?


Hi,

one of our users has reported that a simple flops.c benchmark compiled
by GCC 3.3.3 (gcc (GCC) 3.3.3 [FreeBSD] 20031106) runs significantly
slower than the same code compiled with GCC 2.95 on the same
machine/kernel. I repeated the tests myself and confirmed the numbers
for the GCC 3.3.3 snapshot which is currently in the FreeBSD source
tree. I then repeated the tests with recent GCC 3.3.3 snapshots, both
stock from CVS and built from ports. Recent snapshots behave much
better in this respect, but there is at least one part of the benchmark
where they lose to 2.95 by almost 60%.

The -O2 optimization flag was used in all tests. A fairly old dual-CPU
P3-500 machine was used, but the results should be reproducible on
faster machines too.

flops.c, compiled by GCC 2.95.4:

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -7.6739e-13      0.0940    148.9374
     2     -5.7021e-13      0.0527    132.8005
     3     -2.4314e-14      0.0828    205.3539
     4      6.8501e-14      0.0858    174.8089
     5     -1.6320e-14      0.1945    149.1066
     6      1.3961e-13      0.1476    196.4863
     7     -3.6152e-11      0.2080     57.6823
     8      9.0483e-15      0.1495    200.6989

   Iterations      =  256000000
   NullTime (usec) =     0.0045
   MFLOPS(1)       =   150.1427
   MFLOPS(2)       =   105.7891
   MFLOPS(3)       =   151.7372
   MFLOPS(4)       =   195.4205


flops.c, compiled by gcc (GCC) 3.3.3 [FreeBSD] 20031106. Results are
nothing but horrible here:

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      2.8422e-14      0.2022     69.2361
     2      2.5047e-13      0.1812     38.6303
     3     -7.6605e-15      0.2503     67.9062
     4      2.2771e-13      0.1026    146.1286
     5      3.8858e-14      0.2818    102.9046
     6      7.5495e-15      0.2236    129.6791
     7     -1.1369e-13      0.2753     43.5859
     8      1.2612e-13      0.2238    134.0304

   Iterations      =  128000000
   NullTime (usec) =     0.0045
   MFLOPS(1)       =    44.9683
   MFLOPS(2)       =    70.3080
   MFLOPS(3)       =    93.6022
   MFLOPS(4)       =   113.6856

flops.c, compiled by today's snapshot from CVS:

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -7.6739e-13      0.0862    162.3351
     2     -5.7021e-13      0.0781     89.5757
     3     -2.4314e-14      0.0875    194.2045
     4      6.8501e-14      0.0847    177.0627
     5     -1.6320e-14      0.1931    150.2127
     6      1.3961e-13      0.1347    215.3673
     7     -3.6152e-11      0.2142     56.0168
     8      9.0483e-15      0.1301    230.5877

   Iterations      =  256000000
   NullTime (usec) =     0.0045
   MFLOPS(1)       =   108.7258
   MFLOPS(2)       =   105.3293
   MFLOPS(3)       =   156.8997
   MFLOPS(4)       =   208.2340

flops.c, compiled by GCC 3.3.3 from FreeBSD ports, snapshot date 2004/02/02:

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -7.6739e-13      0.0861    162.5698
     2     -5.7021e-13      0.0771     90.7767
     3     -2.4314e-14      0.0878    193.5303
     4      6.8501e-14      0.0849    176.7623
     5     -1.6320e-14      0.1931    150.2190
     6      1.3961e-13      0.1354    214.1273
     7     -3.6152e-11      0.2080     57.7011
     8      9.0483e-15      0.1314    228.3718

   Iterations      =  256000000
   NullTime (usec) =     0.0045
   MFLOPS(1)       =   109.8430
   MFLOPS(2)       =   107.1044
   MFLOPS(3)       =   157.5592
   MFLOPS(4)       =   207.0537

Note that the last two runs are roughly equivalent and are on par with
or better than 2.95 in all modules except module #2.

And the last one, a GCC 3.4 snapshot from a week ago:

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -7.6739e-13      0.0850    164.6364
     2     -5.7021e-13      0.0805     86.9318
     3     -2.4314e-14      0.0866    196.3759
     4      6.8501e-14      0.0856    175.1892
     5     -1.6320e-14      0.1767    164.1238
     6      1.3961e-13      0.1260    230.0883
     7     -3.6152e-11      0.2039     58.8544
     8      9.0483e-15      0.1351    222.0808

   Iterations      =  256000000
   NullTime (usec) =     0.0045
   MFLOPS(1)       =   106.2996
   MFLOPS(2)       =   110.5026
   MFLOPS(3)       =   162.4135
   MFLOPS(4)       =   210.0089

Module #2 performance is still very low compared to gcc 2.95.


-- 
Alexander Kabaev

