[Patch, libgfortran] PR21468 Vectorizing matmul, other perf improvements
Tim Prince
tprince@myrealbox.com
Mon Nov 14 21:19:00 GMT 2005
Janne Blomqvist wrote:
> On Sun, Nov 13, 2005 at 08:48:55PM +0100, Thomas Koenig wrote:
>
> I updated the benchmark I posted yesterday slightly and did some
> measurements on an 1.8 GHz A64 (i686-pc-linux-gnu, sorry no 64-bit
> results). It seems that while -funroll-loops improves performance
> compared to the baseline, it doesn't really improve when combined with
> vectorizing. Perhaps it's different on x86-64, where there is twice
> the number of SSE2 registers. If that's the case, I propose that we
> enable -funroll-loops, as it increases performance for "bare" x86
> (which doesn't vectorize as sse2 isn't used by default). As for
> -frename-registers mentioned by Tim Prince, it is enabled by default
> if -funroll-loops is used.
ifort unrolls vectorized code as aggressively on 32-bit platforms as on
64-bit. Extra unrolling is likely to pay off only when there are cache
misses to resolve, and only for loops longer than about 100 iterations,
or perhaps slightly shorter ones that are correctly aligned and whose
length is exactly a multiple of 8. The benefit doesn't depend on the
larger number of programmable registers in 64-bit mode, since the
unrolled code can rely on hardware register renaming. Unrolling may be
less useful on current mobile processors, and perhaps on slower AMD
processors, possibly because cache misses have less effect there.
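
To make "extra unrolling" concrete, here is a minimal sketch in C. It
assumes the inner loop has the axpy form (dest[i] += scale * src[i])
typical of a column-oriented matmul. The function names and the unroll
factor of 4 are made up for illustration and are not taken from the
patch. -funroll-loops asks the compiler to do a comparable
transformation on its own.

```c
#include <stddef.h>

/* Baseline inner loop: dest[i] += scale * src[i] for i in [0, n). */
void axpy_plain(size_t n, double scale, const double *src, double *dest)
{
    for (size_t i = 0; i < n; i++)
        dest[i] += scale * src[i];
}

/* The same loop unrolled by hand. The factor 4 is an arbitrary,
   illustrative choice. The four statements in the body are independent
   of each other, so the hardware can overlap their loads and stores. */
void axpy_unrolled4(size_t n, double scale, const double *src, double *dest)
{
    size_t i = 0;
    size_t limit = n - (n % 4);   /* largest multiple of 4 that is <= n */

    for (; i < limit; i += 4) {
        dest[i]     += scale * src[i];
        dest[i + 1] += scale * src[i + 1];
        dest[i + 2] += scale * src[i + 2];
        dest[i + 3] += scale * src[i + 3];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        dest[i] += scale * src[i];
}
```

The remainder loop runs whenever the length isn't a multiple of the
unroll factor, so for short loops it eats into whatever the unrolled
body gains. Whether the unrolled version wins at all has to be measured
on the target processor.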