This is the mail archive of the
mailing list for the GCC project.
Re: powerpc & unaligned block moves with fp registers
- To: dewar at gnat dot com
- Subject: Re: powerpc & unaligned block moves with fp registers
- From: Florian Weimer <fw at deneb dot enyo dot de>
- Date: Sat, 10 Nov 2001 17:21:28 +0100
- Cc: degger at fhm dot edu, kenner at vlsi1 dot ultra dot nyu dot edu, gcc at gcc dot gnu dot org
- References: <20011110144448.6B166F28C7@nile.gnat.com>
> Yes, it very much depends on the architecture, but your generalization is
> not accurate (and far too pessimistic) for many cases. I don't have the
> figures for latest chips in the Pentium and Athlon series, but I would
> be very surprised if the penalty is as much as a few dozen cycles (on
> earlier chips it was about one clock).
According to some (older) Intel documentation, a misaligned access
costs three cycles on the Pentium, and six to twelve cycles if it
crosses a cache line boundary on the Pentium Pro/II. On the Pentium
IV, misaligned access "can incur stalls that are on the order of the
depth of the pipeline".
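
To make the two cases above concrete (a plain misaligned access versus
one that also crosses a cache line boundary), here is a minimal sketch
in C. The helper names is_misaligned and crosses_cache_line are made up
for illustration, and the cache line size is left as a parameter; the
32-byte figure mentioned in the comment is my assumption, not something
taken from the documentation quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr is not a multiple of align.
   align must be a nonzero power of two. */
static int is_misaligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) != 0;
}

/* Hypothetical helper: returns 1 if an access of size bytes starting at
   addr touches two cache lines. line_size must be a nonzero power of
   two (32 is an assumed value for illustration only), and size must
   not exceed it. */
static int crosses_cache_line(uintptr_t addr, size_t size, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    assert(size != 0 && size <= line_size);
    /* Compare the line index of the first and the last byte accessed. */
    return (addr / line_size) != ((addr + size - 1) / line_size);
}
```

With a 32-byte line, a 4-byte access at 0x1002 is misaligned but stays
within one line, while the same access at 0x101e straddles two lines,
which is the more expensive case on the Pentium Pro/II.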
SSE/SSE2 instructions might even fault if the 128-bit stores are not