This report is about GCC's handling of low-level pointer tricks such as the XOR linked list [1]. Uses of this sort of trick are present in legacy C code and in newly written code that push the boundaries of what is possible. If users of GCC are going to apply GCC to this sort of code, they need to know what is allowed. This relates to the discussion about pointer provenance, for which Kayvan Memarian and Peter Sewell wrote a discussion draft in 2016 [3]. Consider the three functions below: char f_ptradd(ptrdiff_t o) { g = 1; *(t + o) = 2; return g; } char f_intadd(ptrdiff_t o) { g = 1; *(char*)((uintptr_t)t + o) = 2; return g; } char f_intxor(ptrdiff_t o) { g = 1; *(char*)((uintptr_t)t ^ o) = 2; return g; } GCC 8.1 produces the following assembly for these functions [2]: f_ptradd: movb $1, g(%rip) movl $1, %eax movb $2, t(%rdi) ret f_intadd: movb $1, g(%rip) movl $1, %eax movb $2, t(%rdi) ret f_intxor: xorq $t, %rdi movb $1, g(%rip) movb $2, (%rdi) movzbl g(%rip), %eax ret The third function does exactly what a XOR linked list implementation does. Sufficiently smart inlining and constant propagation would turn a generic implementation into f_intxor. GCC 8.1 and earlier versions compile the first two functions f_ptradd and f_intadd as if they could never be invoked as in the following main: int main(void) { f_ptradd((uintptr_t) &g - (uintptr_t) t); f_intadd((uintptr_t) &g - (uintptr_t) t); f_intxor((uintptr_t) &g ^ (uintptr_t) t); } This is fair in the case of f_ptradd, which invokes undefined behavior at “t + o”. However, I would have expected f_intadd to be compiled conservatively, because it is possible to pass a value o that, added to the integer representation of (char*)t, produces the integer representation of g. Of course, it is for the GCC developers to decide exactly what is allowed and what isn't after a pointer is converted to an integer. But without an explicit explanation in GCC's documentation of what makes f_intadd and f_intxor different, we have to assume that the latter is no more supported than the former, and that programs that use XOR linked lists or similar data structures cannot be compiled by GCC. [1] https://en.wikipedia.org/wiki/XOR_linked_list [2] https://godbolt.org/g/DYCpjS [3] http://www.cl.cam.ac.uk/~pes20/cerberus/n2090.html
Please add full testcase source, the snippet is missing (at least) declarations of 'g' and 't'. The Godbolt link does not work correctly for me right now, and in general such links are not reliable long-term.
Created attachment 44223 [details] Complete source code for functions in the description
Tree optimizations already manage to avoid "optimizing" f_intadd, but unfortunately on RTL types and casts are not visible in IR and various passes make no distinction between (char*)((uintptr_t)t + o) and (t + o). Perhaps GCC should consider lowering pointer-to-integer casts to a non-transparent assignment, making the result alias all for the purposes of RTL alias analysis, akin to char __attribute__ ((noinline)) f_intadd1(ptrdiff_t o) { g = 1; uintptr_t t1 = (uintptr_t)t; asm("" : "+g"(t1)); *(char*)(t1 + o) = 2; return g; }
I am strongly in favor of Alexander Monakov's proposal.
We have a (very old) duplicate bug for the RTL issue. *** This bug has been marked as a duplicate of bug 49330 ***