This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: 100% speed improvement 3.4 -> 3.5 for ia64!! (?)
On Wed, 14 Apr 2004, Kaveh R. Ghazi wrote:
> > Anyone has an idea what improved so much for ia64 compared to 3.4?
> > Only flags I'm using are -O2 -funroll-loops -ffast-math, numerics are
> > mostly FP multiplication/addition. Do I need to be concerned and
> > investigate results more closely?
>
> I'm curious, how does 3.4/3.5 compare when you remove -ffast-math?
>
> A lot of -ffast-math changes went in and I don't recall how many made
> it into 3.4. So I'd like to know what the difference is (if any)
> between those versions when using slow (?) math.
Part of the "problem" with 3.4 shows up in a profiled run. With 3.4 the
flat profile contains:
Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  9.95      3.74     3.74                             __divdf3
  4.28      5.34     1.61       10     0.16     0.16  void LoopApplyEvaluator::evaluate<ApplyMul
while 3.5 has:
Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  4.37      0.89     0.89       10     0.09     0.09  MultiArgKernel<MultiArg3<F
That is, __divdf3 seems to be inlined in 3.5 but not in 3.4. Does the
out-of-line call in 3.4 also inhibit constant folding or builtin
optimizations of divisions? Or is the call to __divdf3 only introduced
late, just before assembling? Inlining the divisions does not increase the
stripped binary size: 3.4 gives 4433152 bytes and 3.5 gives 4341032 bytes.
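
A reduced test case along the following lines might help confirm this. It
is only a sketch: the function name divide_arrays is made up for
illustration and is not taken from the benchmarked code, and I am assuming
that a plain double division in a loop is enough to trigger the __divdf3
call on ia64 with 3.4.

```c
#include <stddef.h>

/* Hypothetical reduced kernel (not from the benchmarked application):
 * element-wise double-precision division. The assumption is that each
 * '/' below is lowered either to a call to __divdf3 or to an inline
 * division sequence, depending on the compiler version. */
void divide_arrays(double *out, const double *num, const double *den,
                   size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        out[i] = num[i] / den[i];
}
```

Compiling this with -O2 -S under both 3.4 and 3.5 and searching the
generated assembly for __divdf3 should show whether the division is a
library call or an inline sequence.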
Anyway, with -ffast-math disabled the results are 8.0s/it for 3.4 and
4.2s/it for 3.5. So maybe we could backport this inlining of ia64 FP
division to 3.4.1.
Richard.
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/