[PATCH 0/3] Power10 PCREL_OPT support

Segher Boessenkool segher@kernel.crashing.org
Thu Aug 20 23:33:29 GMT 2020


Hi!

On Tue, Aug 18, 2020 at 02:31:41AM -0400, Michael Meissner wrote:
> Currently on power10, the compiler compiles this as:
> 
> 	ret_var:
> 		pld 9,ext_variable@got@pcrel
> 		lwa 3,0(9)
> 		blr
> 
> 	store_var:
> 		pld 9,ext_variable@got@pcrel
> 		stw 3,0(9)
> 		blr
> 
> That is, it loads up the address of 'ext_variable' from the GOT table into
> register r9, and then uses r9 as a base register to reference the actual
> variable.
> 
> The linker does optimize the case where you are compiling the main program, and
> the variable is also defined in the main program to be:
> 
> 	ret_var:
> 		pla	9,ext_variable,1
> 		lwa	3,0(9)
> 		blr
> 
> 	store_var:
> 		pla	9,ext_variable,1
> 		stw	3,0(9)
> 		blr

Those "pla" insns are invalid; please correct them?  (You mixed "pla"
and "paddi" syntax I think.)
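
Something like one of these is presumably what was meant (a sketch going
by the ISA 3.1 mnemonics only, not checked against an assembler; any
relocation decoration on the symbol is left out):

		pla	9,ext_variable		# extended mnemonic, R=1 implied
		paddi	9,0,ext_variable,1	# base mnemonic, RA=0 and R=1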

> These patches generate:
> 
> 	ret_var:
> 		pld	9,ext_variable@got@pcrel
> 	.Lpcrel1:
> 		.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
> 		lwa	3,0(9)
> 		blr
> 
> 	store_var:
> 		pld	9,ext_variable@got@pcrel
> 	.Lpcrel2:
> 		.reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
> 		stw	3,0(9)
> 		blr
> 
> Note, the label for locating the PLD occurs after the PLD and not before it.
> This is so that if the assembler adds a NOP in front of the PLD to align it,
> the relocations will still work.
> 
> If the linker can, it will convert the code into:
> 
> 	ret_var:
> 		plwa	3,ext_variable,1
> 		nop
> 		blr
> 
> 	store_var:
> 		pstw	3,ext_variable,1
> 		nop
> 		blr

Those "plwa" and "pstw" are invalid syntax as well (should have "(0)"
after the symbol name).
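
Presumably something like this was intended (a sketch following the
"D(RA),R" operand form, not checked against an assembler):

		plwa	3,ext_variable(0),1	# RA=0, R=1: pc-relative load
		pstw	3,ext_variable(0),1	# RA=0, R=1: pc-relative store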

> These patches allow the load of the address to not be physically adjacent to
> the actual load or store, which should allow for better code.

Why is that?  That is not what it does anyway?  /confused

> In order to do this, the pass that converts the load address and load/store
> must occur late in the compilation cycle.

That does not follow afaics.

> In particular, the second scheduler
> pass will duplicate and optimize some of the references and it will produce an
> invalid program.  In the past, Segher has said that we should be able to move
> it earlier.

I said that you shouldn't require this to be the very last pass.  There
is no reason for that, and it will not scale (what if a second pass
shows up that also requires this?)

It also makes it impossible to do normal late optimisations on code
produced here (optimisations like peephole, cprop_hardreg, dce).

I also said that you should use the DF framework, not parse all RTL by
hand and get it all wrong, as *everyone* does: this stuff is hard.


Segher

