Bug 86026 - Document and/or change allowed operations on integer representation of pointers
Summary: Document and/or change allowed operations on integer representation of pointers
Status: RESOLVED DUPLICATE of bug 49330
Alias: None
Product: gcc
Classification: Unclassified
Component: c (show other bugs)
Version: 8.1.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: documentation
Depends on:
Blocks:
 
Reported: 2018-06-01 12:34 UTC by Pascal Cuoq
Modified: 2024-01-25 23:15 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Complete source code for functions in the description (237 bytes, text/plain)
2018-06-01 14:13 UTC, Pascal Cuoq
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Pascal Cuoq 2018-06-01 12:34:10 UTC
This report is about GCC's handling of low-level pointer tricks such as the XOR linked list [1]. Uses of this sort of trick are present in legacy C code and in newly written code that push the boundaries of what is possible. If users of GCC are going to apply GCC to this sort of code, they need to know what is allowed. This relates to the discussion about pointer provenance, for which Kayvan Memarian and Peter Sewell wrote a discussion draft in 2016 [3].

Consider the three functions below:

char f_ptradd(ptrdiff_t o) {
  g = 1;
  *(t + o) = 2;
  return g;
}

char f_intadd(ptrdiff_t o) {
  g = 1;
  *(char*)((uintptr_t)t + o) = 2;
  return g;
}

char f_intxor(ptrdiff_t o) {
  g = 1;
  *(char*)((uintptr_t)t ^ o) = 2;
  return g;
}

GCC 8.1 produces the following assembly for these functions [2]:

f_ptradd:
  movb $1, g(%rip)
  movl $1, %eax
  movb $2, t(%rdi)
  ret
f_intadd:
  movb $1, g(%rip)
  movl $1, %eax
  movb $2, t(%rdi)
  ret
f_intxor:
  xorq $t, %rdi
  movb $1, g(%rip)
  movb $2, (%rdi)
  movzbl g(%rip), %eax
  ret

The third function does exactly what a XOR linked list implementation does. Sufficiently smart inlining and constant propagation would turn a generic implementation into f_intxor.

GCC 8.1 and earlier versions compile the first two functions f_ptradd and f_intadd as if they could never be invoked as in the following main:

int main(void) {
  f_ptradd((uintptr_t) &g - (uintptr_t) t);
  f_intadd((uintptr_t) &g - (uintptr_t) t);
  f_intxor((uintptr_t) &g ^ (uintptr_t) t);
}

This is fair in the case of f_ptradd, which invokes undefined behavior at “t + o”. However, I would have expected f_intadd to be compiled conservatively, because it is possible to pass a value o that, added to the integer representation of (char*)t, produces the integer representation of g.


Of course, it is for the GCC developers to decide exactly what is allowed and what isn't after a pointer is converted to an integer. But without an explicit explanation in GCC's documentation of what makes f_intadd and f_intxor different, we have to assume that the latter is no more supported than the former, and that programs that use XOR linked lists or similar data structures cannot be compiled by GCC.


[1] https://en.wikipedia.org/wiki/XOR_linked_list
[2] https://godbolt.org/g/DYCpjS
[3] http://www.cl.cam.ac.uk/~pes20/cerberus/n2090.html
Comment 1 Alexander Monakov 2018-06-01 13:06:27 UTC
Please add full testcase source, the snippet is missing (at least) declarations of 'g' and 't'. The Godbolt link does not work correctly for me right now, and in general such links are not reliable long-term.
Comment 2 Pascal Cuoq 2018-06-01 14:13:43 UTC
Created attachment 44223 [details]
Complete source code for functions in the description
Comment 3 Alexander Monakov 2018-06-01 15:43:37 UTC
Tree optimizations already manage to avoid "optimizing" f_intadd, but unfortunately on RTL types and casts are not visible in IR and various passes make no distinction between (char*)((uintptr_t)t + o) and (t + o).

Perhaps GCC should consider lowering pointer-to-integer casts to a non-transparent assignment, making the result alias all for the purposes of RTL alias analysis, akin to

char __attribute__ ((noinline)) f_intadd1(ptrdiff_t o) {
  g = 1;
  uintptr_t t1 = (uintptr_t)t;
  asm("" : "+g"(t1));
  *(char*)(t1 + o) = 2;
  return g;
}
Comment 4 Rich Felker 2018-06-02 02:43:31 UTC
I am strongly in favor of Alexander Monakov's proposal.
Comment 5 Richard Biener 2018-06-04 06:59:59 UTC
We have a (very old) duplicate bug for the RTL issue.

*** This bug has been marked as a duplicate of bug 49330 ***