This is the mail archive of the mailing list for the GCC project.
Re: Comment on closed PR target/9757: Gcc should use swp instruction in ARM targets
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Arpad Beszedes <beszedes at cc dot u-szeged dot hu>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>, Richard Earnshaw <rearnsha at arm dot com>
- Date: Thu, 27 Feb 2003 15:11:55 +0000
- Subject: Re: Comment on closed PR target/9757: Gcc should use swp instruction in ARM targets
- Organization: ARM Ltd.
- Reply-to: Richard dot Earnshaw at arm dot com
> Dear maintainers,
> I would like to contest the closing of PR target/9757: Gcc should use swp
> instruction in ARM targets.
> It was closed because of the following:
> 1) It's very slow on some processors, since it forces an external
> bus access even if the data is already in the cache.
> My argument: if we optimize for size, speed is an acceptable trade-off. We
> could enable swp with a dedicated switch such as -fuse-swp, or as part of -Os.
It still doesn't help if it's not safe. See below.
> 2) Its behaviour is undefined if an access is made to an MMU-managed
> page that is non-cacheable/bufferable.
> However, -mcpu and -mtune can be used to specify the ARM processor. If
> that processor has no MMU, there is no such problem, and these options
> could also be used to enable or disable the generation of swp.
Knowing the CPU type doesn't mean that you know enough about the memory
system to safely use the instruction at any arbitrary address.
Anyway, that's not what -mcpu and -mtune mean. -mcpu=<cpu> is purely a
synonym for -march=<arch-of-cpu> -mtune=<cpu>.
The architecture is purely a list of those instructions which may legally
be used; since we don't know enough about the memory system from the
architecture we can't include swp in the list. The tuning affects which
instructions we select from within the list for best performance (and
should probably be ignored when optimizing for space). Further, the
architecture information is considered by the compiler to be a set of
strict super-sets (so as you select higher architecture variants the
number of instructions available increases). If you select a suitably
low-numbered architecture then your code will run on any processor
supporting that architecture or later. As noted, this would not be
possible if SWP/SWPB were to be used.
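The superset model can be illustrated with nested feature masks. This is only a sketch under assumed names: FEAT_*, ARCH_* and runs_on() are hypothetical and are not GCC's internal flags. Because each architecture mask contains the one below it, code built for a lower architecture uses nothing that a later one lacks. A property such as "SWP is safe at this address" depends on the memory system rather than the architecture, so it has no place in such a mask.

```c
#include <stdbool.h>

/* Hypothetical feature bits for illustration; NOT GCC's real internal flags. */
enum {
    FEAT_BASE = 1 << 0, /* core instructions common to every variant */
    FEAT_V4   = 1 << 1, /* e.g. halfword loads/stores, added in ARMv4 */
    FEAT_V5   = 1 << 2  /* e.g. CLZ, added in ARMv5 */
};

/* Each architecture is a strict superset of the one below it. */
enum {
    ARCH_V3 = FEAT_BASE,
    ARCH_V4 = ARCH_V3 | FEAT_V4,
    ARCH_V5 = ARCH_V4 | FEAT_V5
};

/* Code built for `target` runs on a CPU implementing `cpu` if every
 * feature the target architecture permits is also present on the CPU. */
static bool runs_on(unsigned target, unsigned cpu)
{
    return (target & ~cpu) == 0;
}
```

With nested masks, code compiled for ARCH_V4 runs on ARCH_V4 and ARCH_V5 processors, but code compiled for ARCH_V5 is not guaranteed to run on an ARCH_V4 processor.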
The best way of supporting this, if it is really wanted, is to create two
new compiler builtins, __builtin_arm_swp() and __builtin_arm_swpb(), which
expand to the swp and swpb instructions. Then a user can use these when
SWP's semantics are really what is wanted.
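A minimal sketch of what the proposed builtins might expand to, written with GCC-style inline assembly. The names arm_swp_word() and arm_swp_byte() are hypothetical stand-ins for __builtin_arm_swp() and __builtin_arm_swpb(). The SWP path is assumed to be wanted only for ARM-state code on architectures below ARMv6 (SWP/SWPB are deprecated from ARMv6), detected here through the ACLE `__ARM_ARCH` macro; other targets fall back to GCC's `__atomic_exchange_n`. As with the proposed builtins, the caller remains responsible for using them only on memory where SWP is safe.

```c
#include <stdint.h>

/* Assumption: use SWP only in ARM state on pre-ARMv6 architectures. */
#if defined(__arm__) && !defined(__thumb__) && defined(__ARM_ARCH) && __ARM_ARCH < 6
#define USE_ARM_SWP 1
#else
#define USE_ARM_SWP 0
#endif

/* Hypothetical stand-in for __builtin_arm_swp(): atomically store newval
 * at *addr and return the word previously held there. */
static inline uint32_t arm_swp_word(uint32_t *addr, uint32_t newval)
{
#if USE_ARM_SWP
    uint32_t old;
    /* "=&r" (early clobber) keeps `old` out of the registers holding
     * newval and addr; SWP is UNPREDICTABLE if Rn equals Rd or Rm. */
    __asm__ __volatile__("swp %0, %1, [%2]"
                         : "=&r"(old)
                         : "r"(newval), "r"(addr)
                         : "memory");
    return old;
#else
    /* Fallback on other targets: GCC's generic atomic exchange. */
    return __atomic_exchange_n(addr, newval, __ATOMIC_SEQ_CST);
#endif
}

/* Hypothetical stand-in for __builtin_arm_swpb(): byte-sized swap. */
static inline uint8_t arm_swp_byte(uint8_t *addr, uint8_t newval)
{
#if USE_ARM_SWP
    uint32_t old;
    __asm__ __volatile__("swpb %0, %1, [%2]"
                         : "=&r"(old)
                         : "r"((uint32_t)newval), "r"(addr)
                         : "memory");
    return (uint8_t)old;
#else
    return __atomic_exchange_n(addr, newval, __ATOMIC_SEQ_CST);
#endif
}
```

For example, a simple spinlock acquire could loop on `arm_swp_word(&lock, 1) != 0`. Keeping the instruction behind an explicit call, rather than letting the compiler pick it, means only code whose author knows the memory is suitable ever issues SWP.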