This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Modifying ARM code generator for elimination of 8bit writes - need help
- From: Wolfgang Mües <wolfgang at iksw-muees dot de>
- To: Richard Earnshaw <rearnsha at arm dot com>
- Cc: Rask Ingemann Lambertsen <rask at sygehus dot dk>, Paul Brook <paul at codesourcery dot com>, gcc at gcc dot gnu dot org
- Date: Mon, 5 Jun 2006 13:29:43 +0200
- Subject: Re: Modifying ARM code generator for elimination of 8bit writes - need help
- References: <200605282223.33002.wolfgang@iksw-muees.de> <20060604210104.GB10795@sygehus.dk> <1149502000.3857.5.camel@pc960.cambridge.arm.com>
Richard,
On Monday 05 June 2006 12:06, Richard Earnshaw wrote:
> I'm confident right now that these will be too invasive to include in
> mainline.
As I said before, this is OK with me.
> The changes that tend to get incorporated into the compiler are to
> work around bugs in the CPU, not bugs in some H/W developer's use of
> the CPU. The former affect all users of the processor, the latter
> only that one case.
>
> If we started putting in hacks for the latter the compiler back-ends
> would become unmaintainable in almost no time at all.
Agreed.
> PS. Using swp is a bad idea IMO, this instruction is *very* slow on
> some CPU implementations because of the way it interacts with caches.
Yes, swp forces a cache load. But in this particular case, forcing a
cache load is the ONLY way to circumvent the hardware problem.
If there is a block write, a cache load is forced only once for every
32 bytes accessed.
Other possible solutions:
a) code a 16bit read-modify-write. This will also cause a cache load,
and will need much more code, because it will have to look at the
LSB of the address to know where to insert the byte into the word.
b) use the protection unit and raise a data abort on a write to that
memory region. This has the advantage of affecting ONLY the critical
memory region (not all the other ones), but the disadvantages are big:
all memory writes to that region are affected (not just the 8-bit
ones), and a data abort handler is very slow.
This solution was implemented before; it was 100 times slower than
native access. Unusable.
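To illustrate the extra work in option a), here is a minimal C sketch of a byte store done as a 16-bit read-modify-write. The helper name store_byte_rmw16 is hypothetical, and the sketch assumes little-endian byte order and a target region that tolerates 16-bit accesses; it is not what the compiler would actually emit.

```c
#include <stdint.h>

/* Hypothetical helper: store one byte using only 16-bit accesses.
   Assumption: little-endian byte order, so the byte at an odd
   address is the upper half of the containing 16-bit unit. */
static void store_byte_rmw16(volatile void *addr, uint8_t value)
{
    uintptr_t a = (uintptr_t)addr;

    /* Round the address down to the containing 16-bit unit. */
    volatile uint16_t *half = (volatile uint16_t *)(a & ~(uintptr_t)1);

    /* The LSB of the address selects which byte lane to replace. */
    unsigned shift = (unsigned)(a & 1u) * 8u;
    uint16_t mask = (uint16_t)(0xFFu << shift);

    /* Read, merge the new byte into its lane, write back. */
    uint16_t old = *half;
    *half = (uint16_t)((old & (uint16_t)~mask) |
                       ((uint16_t)value << shift));
}
```

Note that the read and the write are two separate accesses, so the sequence is not atomic: an interrupt that modifies the neighbouring byte in between would have its update overwritten.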
regards
Wolfgang
--
We're back to the times when men were men
and wrote their own device drivers.
(Linus Torvalds)