[PATCH 0/3] Power10 PCREL_OPT support

Bill Schmidt wschmidt@linux.ibm.com
Sun Aug 23 00:05:51 GMT 2020


On 8/20/20 6:33 PM, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Aug 18, 2020 at 02:31:41AM -0400, Michael Meissner wrote:
>
>> In order to do this, the pass that converts the load address and load/store
>> must occur late in the compilation cycle.
> That does not follow afaics.
>
Let me see if I can help explain this.

I think the issue is that this optimization creates a dependency that 
isn't directly represented in RTL.  We either have to figure out how to 
represent it, or we have to do this very late to avoid problems.

Suppose we are at a point where hard registers have been assigned, and 
the RTL looks like:

     addi  r5,r3,4
     sldi  r6,r5,2
     pld  r10,symbol@got@pcrel
     lwz  r5,0(r10)

The optimization is safe here: the pld and the lwz are adjacent, so 
r10 cannot be redefined between them and r5 cannot be used between 
them.  So we attach the relocation that tells the linker, if the 
symbol is resolved at static link time, to change this to:

     addi  r5,r3,4
     sldi  r6,r5,2
     plwz  r5,symbol@pcrel
     nop

Now, suppose after we insert the relocation we get a reordering of 
instructions such as

     addi  r5,r3,4
     pld  r10,symbol@got@pcrel
     sldi  r6,r5,2
     lwz  r5,0(r10)

When the linker performs the replacement, we will now end up with

     addi  r5,r3,4
     plwz  r5,symbol@pcrel
     sldi  r6,r5,2
     nop

which has altered the semantics of the program: the sldi now reads the 
value loaded by the plwz rather than the result of the addi.
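
To make the failure concrete, here is a toy model of the four 
instructions and of the linker substitution.  It is only a sketch: the 
instruction encoding, the run() and relax() helpers, and the value 
stored at "symbol" are all assumptions made up for illustration, not 
anything taken from the patch or from the linker.

```python
# Toy register-machine model. The tuple encoding (op, dst, src) and the
# simplified semantics below are illustrative assumptions only.
MEM_SYMBOL = 42  # hypothetical word stored at `symbol`


def run(program, r3=1):
    """Execute a list of (op, dst, src) tuples and return the registers."""
    regs = {"r3": r3, "r5": 0, "r6": 0, "r10": 0}
    for op, dst, src in program:
        if op == "addi4":        # addi dst,src,4
            regs[dst] = regs[src] + 4
        elif op == "sldi2":      # sldi dst,src,2
            regs[dst] = regs[src] << 2
        elif op == "pld_got":    # pld dst,symbol@got@pcrel (address of symbol)
            regs[dst] = "&symbol"
        elif op == "lwz":        # lwz dst,0(src)
            assert regs[src] == "&symbol"
            regs[dst] = MEM_SYMBOL
        elif op == "plwz":       # plwz dst,symbol@pcrel
            regs[dst] = MEM_SYMBOL
        elif op == "nop":
            pass
    return regs


def relax(program):
    """Model the linker substitution: pld becomes plwz, lwz becomes nop."""
    out = list(program)
    pld = next(i for i, ins in enumerate(out) if ins[0] == "pld_got")
    lwz = next(i for i, ins in enumerate(out) if ins[0] == "lwz")
    out[pld] = ("plwz", out[lwz][1], None)   # load straight into lwz's target
    out[lwz] = ("nop", None, None)
    return out


ADDI = ("addi4", "r5", "r3")
SLDI = ("sldi2", "r6", "r5")
PLD = ("pld_got", "r10", None)
LWZ = ("lwz", "r5", "r10")

adjacent = [ADDI, SLDI, PLD, LWZ]    # pld and lwz next to each other
reordered = [ADDI, PLD, SLDI, LWZ]   # sldi scheduled between them

for name, prog in (("adjacent", adjacent), ("reordered", reordered)):
    before = run(prog)["r6"]
    after = run(relax(prog))["r6"]
    print(name, "r6 before relax:", before, "after relax:", after)
```

In this model r6 is the same before and after the substitution for the 
adjacent sequence, but differs for the reordered one, because the sldi 
ends up reading the loaded value in r5.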

What is necessary in order to allow this optimization to occur earlier 
is to make this hidden dependency explicit.  When the relocation is 
inserted, we have to change the "pld" instruction to have a specific 
clobber of (in this case) r5, which represents what will happen if the 
linker makes the substitution.
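
As a rough sketch of why the clobber helps, consider the usual test a 
scheduler applies before swapping two instructions: they must have no 
read-after-write, write-after-read, or write-after-write conflict.  
The Insn class and independent() helper below are hypothetical names 
for illustration; this is not how GCC's scheduler is actually written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    # Hypothetical, simplified instruction record for illustration.
    name: str
    uses: frozenset = frozenset()   # registers read
    sets: frozenset = frozenset()   # registers written, including clobbers


def independent(a, b):
    """True if a and b may be swapped: no RAW, WAR, or WAW conflict."""
    return not (a.sets & b.uses or a.uses & b.sets or a.sets & b.sets)


sldi = Insn("sldi r6,r5,2", uses=frozenset({"r5"}), sets=frozenset({"r6"}))

# pld as written today: only r10 is visibly defined.
pld_plain = Insn("pld r10,symbol@got@pcrel", sets=frozenset({"r10"}))

# pld with the proposed clobber of r5 made explicit.
pld_clobber = Insn("pld r10,symbol@got@pcrel (clobber r5)",
                   sets=frozenset({"r10", "r5"}))

print("may swap without clobber:", independent(sldi, pld_plain))
print("may swap with clobber:   ", independent(sldi, pld_clobber))
```

Without the clobber nothing stops the sldi from moving below the pld; 
with it, the read of r5 by the sldi conflicts with the write of r5 on 
the pld, so the two stay in order.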

I agree that it's too fragile to force this to be the last pass, so I 
think if Mike can look into introducing a clobber of the hard register 
when performing the optimization, that would at least allow us to move 
this anywhere after reload.

I don't immediately see a solution that works prior to register 
allocation because we basically are representing two potential starting 
points of a live range, only one of which will survive in the final 
code.  That is too ugly a problem to hand to the register allocator.

Thanks,
Bill


