[Bug target/113686] New: [RISC-V] TLS (Local Exec) relaxation on structures (LE)
hpa at zytor dot com
gcc-bugzilla@gcc.gnu.org
Wed Jan 31 18:47:56 GMT 2024
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113686
Bug ID: 113686
Summary: [RISC-V] TLS (Local Exec) relaxation on structures (LE)
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hpa at zytor dot com
Target Milestone: ---
When the Local Exec TLS model is in use, gcc generates inefficient code when
accessing a member of a structure:
struct foobar {
    int alpha;
    int beta;
};

_Thread_local struct foobar foo;

void func(int bar)
{
    foo.beta = bar;
}
# Version 1
        lui a1,%tprel_hi(foo)
        add a1,a1,tp,%tprel_add(foo)
        addi a1,a1,%tprel_lo(foo)
        sw a0,4(a1)
However, in this case it could be generated as:
# Version 2
        lui a1,%tprel_hi(foo+4)
        add a1,a1,tp,%tprel_add(foo+4)
        sw a0,%tprel_lo(foo+4)(a1)
... which, if %tprel_hi(foo+4) == 0, as it often is for small embedded
software, the linker can relax to a simple (tp) reference:
# Version 2a (post-relaxation with small .tbss)
        sw a0,%tprel_lo(foo+4)(tp)
The linker will *not* relax version 1 all the way, leaving an unnecessary mv:
# Version 1a (post-relaxation with small .tbss)
        mv a1,tp
        sw a0,4(a1)
It is of course trickier when there are multiple subsequent references to the
structure and the structure is not aligned, as gcc can't know a priori where the
4K breaks are[*]. The version 1 code is more efficient in that case (3
instructions plus 1 instruction per field, as opposed to 3 instructions per field).
However, if the structure *is* aligned, gcc still does not turn version 1 into
version 2. I see at least a few options:
1. gcc option: gcc can generate version 2 code for a single field reference, or
if the alignment is such that all fields are guaranteed to fall inside the same
4K window.
2. gcc and optional ABI option: introduce a "TLS TE-tiny" model for deep
embedded use, where the combined size of the TLS area is limited to 4K,
equivalent to the way direct gp references [or references relative to the zero
register, if the global pointer is 0] work. Thus, direct (tp) references can be
used.
NOTE: With current binutils, this produces an error unless .option norelax is
in effect. It might be desirable to have a new relocation type instead, which
would require binutils support. Alternatively, ld should recognize that the TLS
offset is within +/- 2K and suppress the diagnostic in that case (since at that
point the address is available to the linker).
The linker could be further optimized by allowing an offset between tp and the
start of the TLS area, presumably analogous to the __global_pointer$ symbol.
3. binutils option: teach ld to relax these kinds of chained pointer
references.
[*] Rant: in my opinion, the lui/auipc instructions are fundamentally
misdesigned by not having an overlap bit to guarantee a sizable window.