This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: 19980707 built on win95/i686-pc-cygwin32
- To: law at cygnus dot com
- Subject: Re: 19980707 built on win95/i686-pc-cygwin32
- From: N8TM at aol dot com
- Date: Sun, 12 Jul 1998 03:25:02 EDT
- Cc: amylaar at cygnus dot co dot uk, egcs at cygnus dot com
In a message dated 7/11/98 11:04:08 PM Pacific Daylight Time,
law@hurl.cygnus.com writes:
> Any chance you could analyze this code in more detail? I'm quite
> interested in cases where gcse makes code slower.
Thanks for the suggestion. The differences in performance turn out to be
confined to small parts of my benchmark codes. In the Livermore Kernels
double precision, -fno-gcse makes a significant difference in just 2 of the 24
kernel tests. By significant I mean a difference greater than the
"experimental timing error" assessed by the benchmark code. I ran these tests
on a 233 MHz Pentium II, with -funroll-loops -malign-double -march=pentiumpro
-O2, and with binutils-2.9.1 installed with the p2align hooks. I quote numbers
from win95/cygwin32 first, then mention the comparison with Linux. win95
timings were done with sys_clock() rewritten with QueryPerformance WinAPI
calls; Linux with cpu_time() from libU77.
Kernel 9 performance at vector length 101 drops from 58 to 52 Mflops with
gcse; at vector length 15 it is 57 Mflops either way. It is definitely
abnormal for performance to drop with increasing vector length, and to me
this suggests an increased cache miss rate.
Kernel 16 drops from 39 to 34 Mflops with gcse, at all vector lengths
(15, 40, 75). I can't tell whether this is a code-size or a jump-target
alignment effect; it could easily be the latter.
Kernels 9, 11, and 16 are the only ones where Linux (i686-pc-linux-gnulibc1)
performance is significantly better than win95/cygwin32. Linux does not
exhibit any reduced performance with increased vector length. Again, I think
this supports the supposition of an adverse cache effect under win95.
Maybe tomorrow I will get a chance to look at the .s code for these cases. I
would look for a correlation with code size or with the order of data access,
though I may not spot anything, and I am not sure how to check the alignment
actually produced by p2align.
I do see consistent increases in run time for number-crunching codes going
from Linux to win95. On real codes, some of that evidently is in the slowness
of disk file access under win95.
Let's see if I can paste in source code for Kernels 9 and 16:
C***********************************************************************
C*** KERNEL 9 INTEGRATE PREDICTORS
C***********************************************************************
C
C
      do k= 1,n
        px(1,k)= dm28*px(13,k)+dm27*px(12,k)+dm26*px(11,k)+dm25*px(1
     &0,k)+dm24*px(9,k)+dm23*px(8,k)+dm22*px(7,k)+c0*(px(5,k)+px(6,k))+p
     &x(3,k)
      enddo
C***********************************************************************
C*** KERNEL 16 MONTE CARLO SEARCH LOOP
C***********************************************************************
C
      do m= 1,zone(1)
        j2= (n+n)*(m-1)+1
        do k= 1,n
          k2= k2+1
          j4= j2+k+k
          j5= zone(j4)
          if(j5 >= n)then
            if(j5 == n)then
              exit
            endif
            k3= k3+1
            if(d(j5) < d(j5-1)*(t-d(j5-2))**2+(s-d(j5-3))**2+
     &         (r-d(j5-4))**2)then
              goto 200
            endif
            if(d(j5) == d(j5-1)*(t-d(j5-2))**2+(s-d(j5-3))**2+
     &         (r-d(j5-4))**2)then
              exit
            endif
          else
            if(j5-n+lb < 0)then
              if(plan(j5) < t)then
                goto 200
              endif
              if(plan(j5) == t)then
                exit
              endif
            else
              if(j5-n+ii < 0)then
                if(plan(j5) < s)then
                  goto 200
                endif
                if(plan(j5) == s)then
                  exit
                endif
              else
                if(plan(j5) < r)then
                  goto 200
                endif
                if(plan(j5) == r)then
                  exit
                endif
              endif
            endif
          endif
          if(zone(j4-1) <= 0)then
            goto 200
          endif
        enddo
        exit
  200   if(zone(j4-1) == 0)then
          exit
        endif
      enddo