This is the mail archive of the
mailing list for the GCC project.
RE: Register Pressure guided Unroll and Jam in GCC !!
- From: Bingfeng Mei <bmei at broadcom dot com>
- To: Vladimir Makarov <vmakarov at redhat dot com>, Ajit Kumar Agarwal <ajit dot kumar dot agarwal at xilinx dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Cc: Michael Eager <eager at eagercon dot com>, Vinod Kathail <vinodk at xilinx dot com>, Shail Aditya Gupta <shailadi at xilinx dot com>, Vidhumouli Hunsigida <vidhum at xilinx dot com>, Nagaraju Mekala <nmekala at xilinx dot com>
- Date: Tue, 17 Jun 2014 09:07:35 +0000
- Subject: RE: Register Pressure guided Unroll and Jam in GCC !!
- Authentication-results: sourceware.org; auth=none
- References: <ba4e6d53-1ceb-481b-b420-bb8847ca0a1b at BL2FFO11FD007 dot protection dot gbl> <539F3955 dot 4000703 at redhat dot com>
That is true. Early estimation of register pressure should be improved. Right now I am looking at an example where IVOPTS produces too many induction variables and causes a lot of register spilling. Although the ivopts pass calls the estimate_reg_pressure_cost function, its results are not even close to the real situation.
From: email@example.com [mailto:firstname.lastname@example.org] On Behalf Of Vladimir Makarov
Sent: 16 June 2014 19:37
To: Ajit Kumar Agarwal; email@example.com
Cc: Michael Eager; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: Register Pressure guided Unroll and Jam in GCC !!
On 2014-06-16, 10:14 AM, Ajit Kumar Agarwal wrote:
> Hello All:
> I have worked on the Open64 compiler, where Register Pressure Guided Unroll and Jam gave a good performance improvement for the C and C++ SPEC benchmarks and also for the Fortran benchmarks.
> Unroll and Jam increases the register pressure in the unrolled loop, leading to more spills and fetches and degrading the performance of the unrolled loop. The cache-locality benefit achieved through Unroll and Jam is degraded by the spill instructions that result from the increased register pressure. It is better to decide the unroll factor of the loop based on a performance model of the register pressure.
> Most loop optimizations like Unroll and Jam are implemented in the high-level IR. Register pressure based Unroll and Jam requires calculating register pressure in the high-level IR, similar to the register pressure we calculate in register allocation. This makes the implementation complex.
> To overcome this, the Open64 compiler makes unrolling decisions both in the high-level IR and at the code generation level. Some of the decisions are made way at the end of code generation. The advantage of this approach is that it can use the register pressure information calculated by the register allocator. This makes the implementation much simpler and less complex.
> Can we have this approach in GCC, making Unroll and Jam decisions in the high-level IR and deferring some of the decisions to the code generation level, like Open64?
> Please let me know what you think.
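As an illustration of the idea in the quoted proposal, here is a minimal sketch of choosing an unroll factor from a register pressure model. It is not code from Open64 or GCC. The linear model, the struct, and both function names are hypothetical, and a real implementation would take its pressure figures from the register allocator rather than from fixed per-copy counts.

```c
#include <assert.h>

/* Hypothetical linear pressure model; names and fields are illustrative
   only and do not correspond to any GCC or Open64 data structure. */
struct loop_pressure_model {
    int invariant_regs;  /* registers live across the whole loop */
    int regs_per_copy;   /* extra registers needed per unrolled body copy */
};

/* Estimated number of live registers when the body is unrolled
   `factor` times, under the assumed linear model. */
static int estimated_pressure(const struct loop_pressure_model *m, int factor)
{
    return m->invariant_regs + m->regs_per_copy * factor;
}

/* Return the largest factor in [1, max_factor] whose estimated pressure
   fits in `available_regs`. Returns 1 (no unrolling) if even a factor
   of 2 would exceed the register budget. */
static int choose_unroll_factor(const struct loop_pressure_model *m,
                                int available_regs, int max_factor)
{
    assert(max_factor >= 1);
    int best = 1;
    for (int f = 2; f <= max_factor; f++) {
        if (estimated_pressure(m, f) > available_regs)
            break;  /* pressure only grows with f in this model */
        best = f;
    }
    return best;
}
```

For example, with 4 loop-invariant registers, 3 registers per copy, 16 available registers, and a maximum factor of 8, the function picks 4: a factor of 4 needs 4 + 3*4 = 16 registers, while a factor of 5 would need 19 and would be expected to spill.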
Most loop optimizations are a good target for register pressure
sensitive algorithms as loops are usually program hot spots and any
pressure increase there would be harmful as any RA can not undo such
transformations.
So I guess your proposal could work. Right now we have only
pressure-sensitive modulo scheduling (SMS) and loop-invariant motion (as
I remember switching from loop-invariant motion based on some very
inaccurate register-pressure evaluation to one based on RA pressure
evaluation gave a nice improvement about 1% for SPECFP2000 on some