This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH, ARM] Improve GCC pipeline description for Cortex-M4 FPU
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Terry Guo <Terry dot Guo at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>
- Date: Tue, 16 Apr 2013 10:55:24 +0100
- Subject: Re: [PATCH, ARM] Improve GCC pipeline description for Cortex-M4 FPU
- References: <000101ce3a87$5f69ac80$1e3d0580$ at arm dot com>
On 16/04/13 10:47, Terry Guo wrote:
This patch improves the Cortex-M4 FPU pipeline description based on the following:
1) The integer instructions can be pipelined with fused/chained MAC instructions.
2) The two-cycle 32-bit floating-point load instructions should be placed
together to save one cycle. The three-cycle 64-bit FP load instructions
have no such feature.
3) The 32-bit floating point store instructions need 1 cycle, not 2 cycles.
I used some f32 functions from CMSIS DSPLib to benchmark this patch. All of
them show a performance improvement, i.e. they need fewer cycles to run.
Is it OK for trunk?
2013-04-16 Terry Guo <firstname.lastname@example.org>
* config/arm/cortex-m4-fpu.md (cortex_m4_v): Delete cpu unit.
Replace with ...
(cortex_m4_v_a, cortex_m4_v_b): ... new cpu units.
(cortex_m4_v, cortex_m4_exa_va, cortex_m4_exb_vb): New reservations.
(cortex_m4_fmacs): Use new reservations.
(cortex_m4_f_load, cortex_m4_f_store): Likewise.
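
For readers unfamiliar with the pipeline description syntax, the ChangeLog entries above can be pictured roughly as follows. This is only a sketch and not the text of the patch: it assumes the integer pipeline description already declares units named cortex_m4_a and cortex_m4_b and that an automaton named cortex_m4_v exists, and the latency, "type" value and reservation string in the final form are illustrative placeholders.

```
;; Hypothetical sketch, NOT the actual patch.
;; Assumed to exist elsewhere: automaton "cortex_m4_v" and the integer
;; units cortex_m4_a and cortex_m4_b.

;; Two FPU units take the place of the single cortex_m4_v unit.
(define_cpu_unit "cortex_m4_v_a" "cortex_m4_v")
(define_cpu_unit "cortex_m4_v_b" "cortex_m4_v")

;; The old name becomes a reservation: "+" reserves both units in the
;; same cycle, so the whole FPU is busy.
(define_reservation "cortex_m4_v" "cortex_m4_v_a+cortex_m4_v_b")

;; Each reservation ties one integer unit to one FPU unit, leaving the
;; other pair free for a different instruction in that cycle.
(define_reservation "cortex_m4_exa_va" "cortex_m4_a+cortex_m4_v_a")
(define_reservation "cortex_m4_exb_vb" "cortex_m4_b+cortex_m4_v_b")

;; Placeholder values: the latency (4) and the reservation string are
;; for illustration only. "," advances to the next cycle.
(define_insn_reservation "cortex_m4_fmacs" 4
  (and (eq_attr "tune" "cortexm4")
       (eq_attr "type" "fmacs"))
  "cortex_m4_exa_va,cortex_m4_exb_vb")
```

Splitting the unit in two is what lets the scheduler model an integer instruction overlapping with part of a MAC, since an instruction that needs only one of the pairs no longer conflicts with the other.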