[PATCH 0/3] Power10 PCREL_OPT support
Segher Boessenkool
segher@kernel.crashing.org
Thu Aug 20 23:33:29 GMT 2020
Hi!
On Tue, Aug 18, 2020 at 02:31:41AM -0400, Michael Meissner wrote:
> Currently on power10, the compiler compiles this as:
>
> ret_var:
> pld 9,ext_variable@got@pcrel
> lwa 3,0(9)
> blr
>
> store_var:
> pld 9,ext_variable@got@pcrel
> stw 3,0(9)
> blr
>
> That is, it loads up the address of 'ext_variable' from the GOT table into
> register r9, and then uses r9 as a base register to reference the actual
> variable.
>
> The linker does optimize the case where you are compiling the main program, and
> the variable is also defined in the main program to be:
>
> ret_var:
> pla 9,ext_variable,1
> lwa 3,0(9)
> blr
>
> store_var:
> pla 9,ext_variable,1
> stw 3,0(9)
> blr
Those "pla" insns are invalid; please correct them? (You mixed "pla"
and "paddi" syntax I think.)
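For reference, the quoted listings are what one would expect from C source along the following lines. This is a sketch, not the original test case: the names are taken from the assembly labels, and the `int` types are assumed from the use of `lwa` and `stw`.

```c
/* Assumed source for the quoted listings; names come from the asm labels. */
extern int ext_variable;

/* Defined here only so the sketch links on its own.  In the case under
   discussion the definition lives in another object or a shared library. */
int ext_variable;

/* lwa is a sign-extending 32-bit load, hence the int return type. */
int ret_var(void)
{
  return ext_variable;
}

/* stw is a 32-bit store of the first argument (r3). */
void store_var(int i)
{
  ext_variable = i;
}
```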
> These patches generate:
>
> ret_var:
> pld 9,ext_variable@got@pcrel
> .Lpcrel1:
> .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
> lwa 3,0(9)
> blr
>
> store_var:
> pld 9,ext_variable@got@pcrel
> .Lpcrel2:
> .reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
> stw 3,0(9)
> blr
>
> Note, the label for locating the PLD occurs after the PLD and not before it.
> This is so that if the assembler adds a NOP in front of the PLD to align it,
> the relocations will still work.
>
> If the linker can, it will convert the code into:
>
> ret_var:
> plwa 3,ext_variable,1
> nop
> blr
>
> store_var:
> pstw 3,ext_variable,1
> nop
> blr
Those "plwa" and "pstw" are invalid syntax as well (should have "(0)"
after the symbol name).
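The label placement in the quoted `.reloc` lines can be checked with a little address arithmetic. The sketch below is a hypothetical model, not assembler code: `lay_out`, `reloc_site` and `reloc_addend` are made-up names, and it assumes that a label written on its own line before the pld would stay bound to the address in front of any padding nop.

```c
/* Byte sizes on Power10: a prefixed instruction such as pld is 8 bytes,
   an ordinary instruction (including nop) is 4. */
enum { PREFIXED_SIZE = 8, WORD_SIZE = 4 };

/* Hypothetical layout model: `start` is where the pld was requested,
   `pad_nops` is how many alignment nops the assembler put in front of it. */
struct layout {
  long pld;          /* address of the pld itself */
  long label_after;  /* label placed right after the pld (.Lpcrel1) */
  long label_before; /* label placed where the pld was requested (assumed) */
};

static struct layout lay_out(long start, int pad_nops)
{
  struct layout l;
  l.label_before = start;
  l.pld = start + (long)pad_nops * WORD_SIZE;
  l.label_after = l.pld + PREFIXED_SIZE;
  return l;
}

/* Address that `.reloc .Lpcrel1-8,...` is attached to. */
static long reloc_site(const struct layout *l)
{
  return l->label_after - PREFIXED_SIZE;
}

/* Value of `.-(.Lpcrel1-8)` when `.` is the address `use`. */
static long reloc_addend(const struct layout *l, long use)
{
  return use - reloc_site(l);
}
```

With `lay_out(60, 1)` the pld ends up at 64; `label_after - 8` still names it, while `label_before` names the nop. When the load or store directly follows the pld, as in the quoted listings, `.` equals the label and the addend is 8.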
> These patches allow the load of the address to not be physically adjacent to
> the actual load or store, which should allow for better code.
Why is that? That is not what it does anyway? /confused
> In order to do this, the pass that converts the load address and load/store
> must occur late in the compilation cycle.
That does not follow afaics.
> In particular, the second scheduler
> pass will duplicate and optimize some of the references and it will produce an
> invalid program. In the past, Segher has said that we should be able to move
> it earlier.
I said that you shouldn't require this to be the very last pass.  There
is no reason for that, and it will not scale (what if a second pass
shows up that also requires this?)

It also makes it impossible to do normal late optimisations on the code
produced here (optimisations like peephole, cprop_hardreg, dce).

I also said that you should use the DF framework rather than parsing all
RTL by hand and getting it all wrong, as *everyone* does: this stuff is
hard.
Segher