This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: powerpc & unaligned block moves with fp registers

To: <dewar at gnat dot com>, <degger at fhm dot edu>, <kenner at vlsi1 dot ultra dot nyu dot edu>
Subject: Re: powerpc & unaligned block moves with fp registers
From: "Tim Prince" <tprince at computer dot org>
Date: Sat, 10 Nov 2001 07:28:34 -0800
Cc: <gcc at gcc dot gnu dot org>
References: <20011110144448.6B166F28C7@nile.gnat.com>


----- Original Message -----
From: <dewar@gnat.com>
To: <degger@fhm.edu>; <kenner@vlsi1.ultra.nyu.edu>
Cc: <gcc@gcc.gnu.org>
Sent: Saturday, November 10, 2001 6:44 AM
Subject: Re: powerpc & unaligned block moves with fp registers


> <<Slow in the case of misaligned accesses depends on the system; if the
> hardware handles it by splitting up the accesses then it's likely to be
> in the range of a few dozen up to a few hundred cycles. If the accesses
> emerge into the OS because the hardware cannot handle it then
> the overhead is more likely to be in the range from a few hundred up to
> several thousand cycles. It's really hard to give accurate numbers here
> since it depends very much on the CPU and in the latter case also on the
> >>
>
> This is too pessimistic. For example, on Power, the penalty for a
> misaligned access is far less than this.
>
> Yes, it very much depends on the architecture, but your generalization is
> not accurate (and far too pessimistic) for many cases. I don't have the
> figures for latest chips in the Pentium and Athlon series, but I would
> be very surprised if the penalty is as much as a few dozen cycles (on
> earlier chips it was about one clock).
I don't have the figures either, but typical memory-intensive benchmarks
using 64-bit data on P4 take 60% longer with the standard alignment
specified in coff-i386.c than they do when
DEFAULT_SECTION_ALIGNMENT_POWER is increased.  The penalty occurs only on
a memory access which straddles a cache line boundary (a cache line split),
so it is huge when it occurs.  On P-III and early Athlons, the penalty is
about 30% on the same tests, with smaller cache lines; on Athlon 1800+,
about 50%.  This looks like more than a few dozen cycles per occurrence,
unless you take an average over all accesses, in which case it is only a
"handful," to be even less precise.  On Itanium, a misaligned access, if
processed by a trap handler, takes thousands of cycles.
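
To make the "only on the access which straddles a cache boundary" point
concrete, here is a rough sketch in C, not taken from the benchmarks above.
The helper names straddles_line and count_splits are hypothetical, and the
line sizes used in the note below (64 and 32 bytes) are assumptions for
illustration only, not measured properties of any particular chip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an access of `size` bytes at address `addr` crosses a
   boundary between two cache lines of `line` bytes, else 0.
   `line` is assumed to be a power of two. */
static int straddles_line(uintptr_t addr, size_t size, size_t line)
{
    assert(size > 0 && line > 0 && (line & (line - 1)) == 0);
    /* First and last byte of the access fall in different lines. */
    return (addr / line) != ((addr + size - 1) / line);
}

/* Counts how many of `n` consecutive `size`-byte elements, starting at
   address `base`, are split across two cache lines. */
static size_t count_splits(uintptr_t base, size_t size, size_t n, size_t line)
{
    size_t splits = 0;
    for (size_t i = 0; i < n; i++)
        splits += (size_t)straddles_line(base + i * size, size, line);
    return splits;
}
```

Under those assumptions, 64 doubles starting at an address that is only
4-byte aligned (say, address 4) give count_splits(4, 8, 64, 64) == 8, that
is, one access in eight is split with 64-byte lines, and
count_splits(4, 8, 64, 32) == 16 with 32-byte lines.  With an 8-byte aligned
start, no access is split.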

References:
- Re: powerpc & unaligned block moves with fp registers
  - From: dewar
