Here's the original SO thread with more info and/or meandering pondering:
http://stackoverflow.com/questions/38235112/why-is-a-volatile-local-variable-optimised-differently-from-a-volatile-argument

g++ seems to break in a simple situation involving a function argument passed by value and declared volatile, wherein it acts differently than if such variable is declared in-body. In the former case, it elides volatile reads.

#include <cstddef>

void f(void *const p, std::size_t n)
{
    unsigned char *y = static_cast<unsigned char *>(p);
    volatile unsigned char const x = 42;

    while (n--) {
        *y++ = x;
    }
}

void g(void *const p, std::size_t n, volatile unsigned char const x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = x;
    }
}

void h(void *const p, std::size_t n, volatile unsigned char const &x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = x;
    }
}

int main(int, char **)
{
    int y[1000];
    f(&y, sizeof y);
    volatile unsigned char const x{99};
    g(&y, sizeof y, x);
    h(&y, sizeof y, x);
}

=> ASM

main:
.LFB3:
    .cfi_startproc

# f()
    movb    $42, -1(%rsp)
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L21:
    subq    $1, %rax
    movzbl  -1(%rsp), %edx
    jne     .L21

# x = 99
    movb    $99, -2(%rsp)
    movzbl  -2(%rsp), %eax

# g()
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L22:
    subq    $1, %rax
    jne     .L22

# h()
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L23:
    subq    $1, %rax
    movzbl  -2(%rsp), %edx
    jne     .L23

Is g() non-conforming here because it elides reads to a volatile variable? That might represent a hardware register whose polling has side-effects, etc.

And either way, how is it that the loop body can be totally elided, rightly or not - but the loop itself still executes? (It's like I'm back programming waits on a CPC 464! ;-)

thanks
As y is not used, stores to it are eliminated in main().

I can confirm that the RTL optimizers somehow remove the volatile load of x from the inline copy of g(). This happens because RTL expansion expands x as a register copy for some reason.

;; x.11_3 ={v} x;

(insn 19 18 0 (set (reg:QI 87 [ x.11_3 ])
        (mem/v/c:QI (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -2 [0xfffffffffffffffe])) [0 x+0 S1 A16])) t.C:36 -1
     (nil))

;; x ={v} x.11_3;

(insn 20 19 0 (set (reg/v:QI 94 [ x ])
        (reg:QI 87 [ x.11_3 ])) -1
     (nil))

;; Generating RTL for gimple basic block 5

;; x.7_10 ={v} x;

(insn 22 21 0 (set (reg:QI 90 [ x.7_10 ])
        (reg/v:QI 94 [ x ])) t.C:18 -1
     (nil))

;; ivtmp_36 = ivtmp_22 + 18446744073709551615;

(insn 23 22 0 (parallel [
            (set (reg:DI 92 [ ivtmp_22 ])
                (plus:DI (reg:DI 92 [ ivtmp_22 ])
                    (const_int -1 [0xffffffffffffffff])))
            (clobber (reg:CC 17 flags))
        ]) -1
     (nil))
Thanks Richard! About this -

> RTL expansion expands x as register copy for some reason

- is this person's explanation about this originating in the ABI accurate?
http://stackoverflow.com/a/38248847/2757035

If so - again not that I expect anyone really to use this pattern! - but from a Standard perspective, I'm interested whether it forbids such register allocation, and whether any workaround is feasible. In practical terms, a very academic exercise :-) but valuable for its implications wrt the Standard and ABI
(In reply to DB from comment #2)
> Thanks Richard! About this -
>
> > RTL expansion expands x as register copy for some reason
>
> - is this person's explanation about this originating in the ABI accurate?
> http://stackoverflow.com/a/38248847/2757035
>
> If so - again not that I expect anyone really to use this pattern! - but
> from a Standard perspective, I'm interested whether it forbids such register
> allocation, and whether any workaround is feasible. In practical terms, a
> very academic exercise :-) but valuable for its implications wrt the
> Standard and ABI

Well, if you look at the out-of-line copies of the function then he is correct. But the inline copy in main() does not have this constraint and is still mishandled. Note I didn't yet investigate closer what is going on.
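To look at the out-of-line copy separately from the inline copy in main(), one option might be to keep a copy of g() from being inlined and compare the assembly for both. The following is only a sketch: `g_noinline` is a hypothetical name not present in the report, `__attribute__((noinline))` is a GCC-specific attribute, and it is an assumption that this alone is enough to keep the compiler from cloning or otherwise specializing the function.

```cpp
#include <cstddef>

// Hypothetical copy of g() from the report, marked with the GCC-specific
// noinline attribute so that calls go to the out-of-line copy, where the
// by-value argument arrives as the calling convention dictates.
__attribute__((noinline))
void g_noinline(void *const p, std::size_t n, volatile unsigned char const x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = x;  // source-level volatile read of the parameter
    }
}
```

Compiling with `g++ -S` at the optimization level that reproduces the report and comparing the body of `g_noinline` with the code inlined into main() should show whether the two copies differ.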
(In reply to Richard Biener from comment #3)
> Well, if you look at the out-of-line copies of the function then he is
> correct.
> But the inline copy in main() does not have this constraint and is still
> mishandled. Note I didn't yet investigate closer what is going on.

For the out-of-line copies, surely they are not allowed to leave an argument declared volatile in a register and thereby break volatility? That seems contrary to the requirements of the volatile qualifier. I'd expect special handling to allocate a value on the stack prior to calling and refer to that in the function instead.
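As a possible source-level workaround, the by-value argument could be copied into a volatile local, which is the shape of f() in the original report, where the assembly does keep the per-iteration load. This is a minimal sketch and not a verified fix: `g_local_copy` and `arg` are hypothetical names, and it assumes the local is treated the same way as the one in f() on the compiler version in question.

```cpp
#include <cstddef>

// Hypothetical variant of g(): the parameter itself is not volatile; it is
// copied into a volatile local, mirroring how f() declares x in its body.
void g_local_copy(void *const p, std::size_t n, unsigned char const arg)
{
    unsigned char *y = static_cast<unsigned char *>(p);
    volatile unsigned char const x = arg;  // volatile local initialized from the argument

    while (n--) {
        *y++ = x;  // source-level volatile read of the local on each iteration
    }
}
```

A call such as `g_local_copy(&y, sizeof y, 99)` would replace `g(&y, sizeof y, x)`. Whether the reads survive in the generated code still has to be checked in the assembly.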
http://stackoverflow.com/questions/38235112