[Patch, libgfortran] PR21468 Vectorizing matmul, other perf improvements

Mon Nov 14 21:19:00 GMT 2005

Janne Blomqvist wrote:
> On Sun, Nov 13, 2005 at 08:48:55PM +0100, Thomas Koenig wrote:
> 

> I updated the benchmark I posted yesterday slightly and did some
> measurements on a 1.8 GHz A64 (i686-pc-linux-gnu, sorry no 64-bit
> results). It seems that while -funroll-loops improves performance
> compared to the baseline, it doesn't really improve when combined with
> vectorizing. Perhaps it's different on x86-64, where there is twice
> the number of SSE2 registers. If that's the case, I propose that we
> enable -funroll-loops, as it increases performance for "bare" x86
> (which doesn't vectorize as sse2 isn't used by default). As for
> -frename-registers mentioned by Tim Prince, it is enabled by default
> if -funroll-loops is used.
ifort unrolls vectorized code as aggressively on 32-bit platforms as on
64-bit.  Extra unrolling is likely to be beneficial only for loops of
length greater than 100 or so, and only when there are cache misses to
resolve; the threshold may be slightly lower when the data are correctly
aligned and the loop length is an exact multiple of 8.  The benefit
doesn't depend on the larger number of programmable registers in 64-bit
mode, since unrolled code can rely on hardware register renaming.
Unrolling may be less useful on current mobile processors, and perhaps
on slower AMD processors, possibly because cache misses have less
effect there.
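
To make the "multiple of 8" remark concrete, here is a rough sketch of an unrolled inner loop with a scalar remainder loop.  This is not the libgfortran matmul code, and it is not what ifort or gcc actually emit; the function names and the unroll factor of 8 are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: y[i] += a * x[i] for i in [0, n). */
void axpy_plain(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Same update, hand-unrolled by 8 (illustrative factor, an assumption). */
void axpy_unroll8(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    size_t i = 0;
    size_t main_end = n - (n % 8);  /* largest multiple of 8 not above n */

    /* Main body: eight independent updates per iteration. */
    for (; i < main_end; i += 8) {
        y[i + 0] += a * x[i + 0];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
        y[i + 4] += a * x[i + 4];
        y[i + 5] += a * x[i + 5];
        y[i + 6] += a * x[i + 6];
        y[i + 7] += a * x[i + 7];
    }

    /* Remainder: runs 0..7 times; skipped when n is a multiple of 8. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

When n is an exact multiple of 8 the remainder loop never executes, and for short loops the remainder handling is a larger share of the total work, which is one reason unrolling tends to matter only for longer trip counts.  Something like -funroll-loops asks gcc to do a transformation of this kind itself; whether it helps would have to be measured on the target machine.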