[PATCH, rs6000] Improve power8 fusion peepholes
Fri Sep 19 15:42:00 GMT 2014
On Wed, Sep 17, 2014 at 12:47 PM, Michael Meissner wrote:
> This patch is an intermediate step of what I want to do to improve power8
> fusion.
> In the current trunk, the fusion support for gpr loads is done by a peephole2
> to find the addis followed by the load instruction where the only consumer of
> the addis instruction is the load, and it rewrites the addis to use the
> register that will be loaded, and emits the two separate instructions. There
> is then a normal peephole that recognizes the addis/load combination, and makes
> sure they are emitted together, along with a comment, to make tracking of the
> fusion attempts easier. The problem is that later passes, such as the second
> scheduler pass, move things around and often move the addis away from the load.
> This means the normal peephole pass won't see the two instructions.
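
As a rough illustration of the matching condition described above, here is a
toy model in Python. It is only a sketch: the Insn class, find_fusion_pairs,
and fuse are hypothetical names made up for this example, and none of it is
the actual rs6000 peephole2 code. It assumes liveness information is handed
in as one set of live registers per instruction.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Insn:
    """Hypothetical, simplified instruction record (not a GCC structure)."""
    op: str       # "addis", "load", or any other opcode name
    dest: str     # register written by the instruction
    base: str     # register read (address base for addis/load)
    imm: int = 0  # high part for addis, low offset for load


def find_fusion_pairs(insns: List[Insn], live_after: List[Set[str]]) -> List[int]:
    """Return each index i where insns[i] and insns[i + 1] are a fusable pair.

    live_after[i] is the set of registers still live after insns[i].
    """
    pairs = []
    for i in range(len(insns) - 1):
        hi, ld = insns[i], insns[i + 1]
        # The pair must be an addis immediately followed by a load.
        if hi.op != "addis" or ld.op != "load":
            continue
        # The load must use the addis result as its base register.
        if ld.base != hi.dest:
            continue
        # The load must be the only consumer of the addis result: either the
        # load overwrites that register, or the register is dead afterwards.
        if hi.dest != ld.dest and hi.dest in live_after[i + 1]:
            continue
        pairs.append(i)
    return pairs


def fuse(insns: List[Insn], i: int) -> List[Insn]:
    """Replace the pair at i with one combined pseudo-instruction."""
    hi, ld = insns[i], insns[i + 1]
    fused = Insn("fused_load", ld.dest, hi.base, (hi.imm << 16) + ld.imm)
    return insns[:i] + [fused] + insns[i + 2:]
```

In this model, any instruction that ends up between the addis and the load
makes find_fusion_pairs miss the pair, which mirrors the problem with a later
pass separating them. Once the pair is a single fused record, there is
nothing left to separate.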
> This patch creates a new insn that combines the two parts, so that the
> scheduler2 pass won't split up the two insns. In doing static analysis, a lot
> more fused pairs are generated. For instance, 400.perlbench generates more
> than 11,300 additional load fusions with these patches, 403.gcc generates
> 23,000 more load fusions, and 416.gamess generates 39,000 more load fusions.
> However, when SPEC 2006 is run on a power8, you don't actually see much of a
> performance difference with these patches. In digging into it, the main place
> where fusion occurs is in referencing static/global variables. The SPEC 2006
> suite does not tend to have that much static/global data, so the linker
> optimizes most of the addis instructions to be nops, and the load index
> register is adjusted to use r2. These optimizations should help much larger
> code bases that do have a lot more static/global data.
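
To make the linker point concrete, here is a small sketch of the offset
arithmetic, assuming the usual split of an r2-relative offset into a
high-adjusted half for the addis and a signed 16-bit low half for the load
(the @toc@ha / @toc@l style of relocation). The function names are
hypothetical and this is not linker code. When the high-adjusted half is
zero, the addis adds nothing and the load can address the data directly
from r2.

```python
def split_toc_offset(offset: int) -> tuple:
    """Split an r2-relative offset into (high_adjusted, low) halves.

    The low half is a signed 16-bit value, so the high half is rounded
    to compensate: (high_adjusted << 16) + low == offset.
    """
    high_adjusted = (offset + 0x8000) >> 16
    low = ((offset & 0xFFFF) ^ 0x8000) - 0x8000  # sign-extend the low 16 bits
    return high_adjusted, low


def addis_is_removable(offset: int) -> bool:
    """True when the high half is zero, i.e. the offset fits in 16 signed bits."""
    return split_toc_offset(offset)[0] == 0
```

Under that assumption, only data further than 32 KiB from the r2 base
pointer keeps a real addis, so a program with little static/global data has
few pairs left to fuse at run time.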
> I've done bootstraps on both a big endian power7 and a little endian power8
> with no regressions. Are these patches ok to install in the trunk, and the
> 4.8/4.9 branches?
> 2014-09-16 Michael Meissner <email@example.com>
> * config/rs6000/predicates.md (fusion_gpr_mem_load): Move testing
> for base_reg_operand to be common between LO_SUM and PLUS.
> (fusion_gpr_mem_combo): New predicate to match a fused address
> that combines the addis and memory offset address.
> * config/rs6000/rs6000-protos.h (fusion_gpr_load_p): Change
> calling signature.
> (emit_fusion_gpr_load): Likewise.
> * config/rs6000/rs6000.c (fusion_gpr_load_p): Change calling
> signature to pass each argument separately, rather than
> using an operands array. Rewrite the insns found by peephole2 to
> be a single insn, rather than hoping the insns will still be
> together when the peephole pass is done. Drop being called via a
> normal peephole.
> (emit_fusion_gpr_load): Change calling signature to be called from
> the fusion_gpr_load_<mode> insns with a combined memory address
> instead of the peephole pass passing the addis and offset
> * config/rs6000/rs6000.md (UNSPEC_FUSION_GPR): New unspec for GPR
> (power8 fusion peephole): Drop support for doing power8 via a
> normal peephole that was created by the peephole2 pass.
> (power8 fusion peephole2): Create a new insn with the fused
> address, so that the fused operation is kept together after
> register allocation is done.
> (fusion_gpr_load_<mode>): Likewise.