This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: powerpc & unaligned block moves with fp registers



----- Original Message -----
From: <dewar@gnat.com>
To: <degger@fhm.edu>; <kenner@vlsi1.ultra.nyu.edu>
Cc: <gcc@gcc.gnu.org>
Sent: Saturday, November 10, 2001 6:44 AM
Subject: Re: powerpc & unaligned block moves with fp registers


> <<Slow in the case of misaligned accesses depends on the system; if the
> hardware handles it by splitting up the accesses then it's likely to be
> in the range of a few dozen up to a few hundred cycles. If the accesses
> emerge into the OS because the hardware cannot handle it then
> the overhead is more likely to be in the range from a few hundred up to
> several thousand cycles. It's really hard to give accurate numbers here
> since it depends very much on the CPU and in the latter case also on the
> >>
>
> This is too pessimistic. For example, on Power, the penalty for a
> misaligned access is far less than this.
>
> Yes, it very much depends on the architecture, but your generalization is
> not accurate (and far too pessimistic) for many cases. I don't have the
> figures for latest chips in the Pentium and Athlon series, but I would
> be very surprised if the penalty is as much as a few dozen cycles (on
> earlier chips it was about one clock).

I don't have the figures either, but typical memory-intensive benchmarks
using 64-bit data on P4 take 60% longer with the standard alignment
specified in coff-i386.c than they do when
DEFAULT_SECTION_ALIGNMENT_POWER is increased.  The penalty is incurred only
by a memory access which straddles a cache line boundary (a cache line
split), so it is huge when it occurs.  On P-III and early Athlons, with
smaller cache lines, the penalty is about 30% on the same tests; on Athlon
1800+, about 50%.  This looks like more than a few dozen cycles per split
access, unless you take an average over all accesses, in which case it is
only a "handful," to be even less precise.  On Itanium, a misaligned
access, if processed by a trap handler, takes thousands of cycles.
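
To make the "average" point concrete, here is a minimal C sketch that
counts how many accesses in a run of consecutive elements are cache line
splits.  It is only an illustration, not code from GCC or binutils: the
names straddles_line and count_splits are hypothetical, and the line size
is a parameter you would have to supply for the CPU in question.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if an access of `size` bytes starting at `addr` crosses a
   cache line boundary, 0 otherwise.  Assumes `line` (the line size in
   bytes) is a power of two and 1 <= size <= line. */
static int straddles_line(uintptr_t addr, size_t size, size_t line)
{
    /* Offset within the line, plus the access size, runs past the end. */
    return (addr & (line - 1)) + size > line;
}

/* Count the split accesses among `n` consecutive elements of `size`
   bytes each, the first of which starts at address `base`. */
static size_t count_splits(uintptr_t base, size_t size, size_t n, size_t line)
{
    size_t i, splits = 0;

    for (i = 0; i < n; i++)
        splits += straddles_line(base + i * size, size, line);
    return splits;
}
```

For example, for 8-byte elements starting at a base that is only 4-byte
aligned, count_splits(4, 8, 64, 64) finds that one access in eight is split
with an assumed 64-byte line, and count_splits(4, 8, 64, 32) finds one in
four with an assumed 32-byte line.  With an 8-byte aligned base there are
no splits at all, which is why raising the section alignment removes the
penalty.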

