This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH/RFA] SH TLS support


Hi,

Joern Rennecke <joern.rennecke@superh.com> wrote:
> I'd like to see us move away from putting random tiny blobs of data into
> the text, rather than adding more.  Putting a few data bytes here and there
> pollutes both data and instruction cache.
> I think you should be able to represent the constants as V2SI data, which
> can stay there during reload, and then should be put into a constant pool
> by machine_dependent_reorg.  Or if the two values may be placed independently,
> you can just have them as two SImode values.

I should have explained why that implementation looks oddly inefficient;
sorry about that.
I once implemented TLS code generation much like what you describe. The
problem is that we need linker optimizations such as:

	mov.l   .Ln, r4         ->      mov.l   .Ln, r0
	mova    .Lf, r0         ->      stc     gbr, r4
	mov.l   .Lf, r1         ->      mov.l   @(r0,r12), r0
	add     r0, r1          ->      add     r4, r0
	jsr     @r1             ->      nop
	add     r12, r4         ->      nop
        ...
.Ln:    .long   x@TLSGD         ->      .long   x@GOTTPOFF
.Lf:    .long   __tls_get_addr@PLT

(an example of a GD -> IE transition). To do this, the linker has to
start from the constant pool entries that carry a TLS relocation or the
special function symbol __tls_get_addr and find the instructions that use
them.
That implementation used a heuristic algorithm in the assembler and many
new reloc types to mark the instructions it found as TLS code. But it was
very fragile in the face of compiler optimizations, and the many marker
relocation types were ugly and seemed to waste ELF relocation numbers. So
I changed to fixed instruction sequences for the global and local dynamic
cases, following Uli's suggestion. This makes the TLS handling in the
assembler and linker very simple and stable, and removes the marker
relocations.
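
As a rough illustration of why a fixed sequence keeps the linker side
simple, here is a toy Python model of the GD -> IE rewrite from the table
above. It is only a sketch under loud assumptions: instructions are plain
strings rather than real SH encodings, and the names relax_gd_to_ie, norm,
GD_SEQUENCE and IE_SEQUENCE are hypothetical, not taken from binutils or
from the patch.

```python
# Toy model only: instructions are strings, not SH machine code, and this
# is not how the real linker is written. All names here are hypothetical.

# The fixed global-dynamic sequence, copied from the left column of the table.
GD_SEQUENCE = [
    "mov.l .Ln, r4",
    "mova .Lf, r0",
    "mov.l .Lf, r1",
    "add r0, r1",
    "jsr @r1",
    "add r12, r4",
]

# The initial-exec replacement, copied from the right column. It has the
# same number of instructions, so nothing after it has to move.
IE_SEQUENCE = [
    "mov.l .Ln, r0",
    "stc gbr, r4",
    "mov.l @(r0,r12), r0",
    "add r4, r0",
    "nop",
    "nop",
]


def norm(insn):
    """Collapse runs of whitespace so tabs and spaces compare equal."""
    return " ".join(insn.split())


def relax_gd_to_ie(insns, start, reloc):
    """Return (new_insns, new_reloc) for a GD -> IE transition.

    `start` stands in for the place a TLSGD relocation would identify.
    Because the sequence is fixed, one exact comparison is enough; no
    heuristic search and no marker relocations are needed.
    """
    if reloc != "TLSGD":
        raise ValueError("only the TLSGD case is modelled here")
    n = len(GD_SEQUENCE)
    window = [norm(i) for i in insns[start:start + n]]
    if window != GD_SEQUENCE:
        raise ValueError("not the fixed GD sequence; refusing to rewrite")
    return insns[:start] + IE_SEQUENCE + insns[start + n:], "GOTTPOFF"
```

If the compiler were free to reorder or split these instructions, the exact
comparison would fail, which is the fragility the heuristic approach had to
work around.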

> Moreover, do you actually need the value in r0 for anything but calculating
> the jsr destination?  SH2 and later have the bsr instruction, so if we can
> just add the appropriate negative offset to the value at 1f, we get a much
> shorter sequence:
> 
> 	mov.l L2,r1
> 	mov.l L1,r4
> 	bsr @r1
> 	add r12,r4
> L0:
> ....
> (in constant pool)
> L1:	.long	%a1@TLSGD	
> L2:	.long	__tls_get_addr@PLT+(L2-L0)
> 
> This should actually be a define_insn_and_split, with the splitting
> happening after reload (but before sched2 & machine_dependent_reorg),
> to each of the load separately.  The bsr, the add and the label shown
> as L0 here best stay together as one nominal instruction, so that we
> don't get any problems with instruction movement & duplication.
> The label can be implicit in an unspec value we use for the
> __tls_get_addr@PLT+(L2-L0) expression, and machine_dependent_reorg
> will have to make it explicit when it creates the constant pool entry,
> or - simpler, but more fragile, but that should really not be disturbed
> after machine_dependent_reorg - represent that label by a reference to the
> bsr/add/label instruction.

I don't use the current PIC infrastructure for the global and local
dynamic cases, for the reason above.
There might be optimizations for the proposed implementation. Changing
jsr to bsrf would be one of them, though it would also require changes to
the linker optimizations. And some linker optimizations for TLS code will
not be easy if the original code sequence is too smart and short :-)
Anyway, I think the proposed implementation is not so bad as a starting
point.

Regards,
	kaz

