Reproducible with gcc commit 1b5510a59163. I'm writing this up as a result of the following linux kernel discussion: https://lore.kernel.org/bpf/c9923c1d-971d-4022-8dc8-1364e929d34c@gmail.com/ https://lore.kernel.org/bpf/20240320015515.11883-1-iii@linux.ibm.com/ In the following code: extern const char bpf_plt[]; extern const char bpf_plt_ret[]; extern const char bpf_plt_target[]; static void bpf_jit_plt(void *plt, void *ret, void *target) { memcpy(plt, bpf_plt, BPF_PLT_SIZE); *(void **)((char *)plt + (bpf_plt_ret - bpf_plt)) = ret; *(void **)((char *)plt + (bpf_plt_target - bpf_plt)) = target ?: ret; } GCC 11's sched1 pass reorders memcpy() and assignments. In GCC 12 this behavior is gone after commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc Author: Aldy Hernandez <aldyh@redhat.com> Date: Tue Jun 15 12:32:51 2021 +0200 Backwards jump threader rewrite with ranger. but this seems to be accidental. Internally, output_dependence() for the respective mems returns false, because it believes that they are based on different SYMBOL_REFs. This may be because on the C level we are not allowed to subtract pointers to different objects. However, a possible solution to this should be casting pointers to longs, since C pointer subtraction rules would no longer apply, but in practice this does nothing. In the attached minimized preprocessed source with long casts we get: stg %r3,232(%r2,%r15) ltgr %r11,%r11 locgrne %r3,%r11 stg %r3,232(%r1,%r15) la %r2,0(%r1,%r9) la %r3,232(%r1,%r15) mvc 232(16,%r15),0(%r5) mvc 248(16,%r15),16(%r5) lghi %r4,8 brasl %r14,s390_kernel_write@PLT so the assignments are placed before the memcpy().
Created attachment 57744 [details] preprocessed source
Created attachment 57745 [details] cc1 invocation
Sounds similar to PR113255. Yes, the code is undefined and you should cast the pointers to uintptr_t but for the RTL problem that won't help. Note the increment of 'plt' is similarly invalid so that has to be uintptr_t as well. It might be that different (no) REG_POINTER marking will then avoid the latent PR113255 issue from triggering. It might be that the s390 cpymem expander does some bogus REG_POINTER marking as well (or that it lacks marking, causing the heuristics to go wrong). You can check whether backporting the PR113255 fix avoids the issue, but I do not consider backporting the revs suitable.
Thanks, cherry-picking https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a98d5130a6dcff2ed4db371e500550134777b8cf helped both with the minimized testcase and the actual kernel bug. What you describe there - reassociation causing a wrong base term to be selected - matches what I've seen during debugging as well. So let's close this as a duplicate.
Duplicate then. *** This bug has been marked as a duplicate of bug 49330 ***