Although the code generation for increment (++, --) through a pointer to volatile has improved greatly over the past 15 years, there is a case in which the address calculation is needlessly done separately instead of by the x86 increment instruction itself. Here is some example code: struct task { int other; int rcu_count; }; struct task *current; void rcu_read_lock() { (*(volatile int*)¤t->rcu_count)++; } As can be seen in godbolt.org (https://godbolt.org/z/fGze8E), the address calculation is split by GCC. The shorter code sequence generated by clang/LLVM is preferable. Fixing this would allow the Linux kernel to use safer code sequences for certain fastpaths, in this example, rcu_read_lock() and rcu_read_unlock() for kernels built with CONFIG_PREEMPT=y.
This manifests on GCC trunk (see the godbolt.org URL), but was first noted in gcc version 7.5.0. This is specific to x86, but might apply to any other architecture that provides increment-memory instructions. This behavior does not seem to be affected by GCC options. This can be reproduced by placing the sample code in a file "rrl.c" and running: cc -o rrl rrl.c This completes successfully with no error or warnings. Running "cc -o rrl rrl.c --save-temps" generates the following file: # 1 "rrl.c" # 1 "<built-in>" # 1 "<command-line>" # 31 "<command-line>" # 1 "/usr/include/stdc-predef.h" 1 3 4 # 32 "<command-line>" 2 # 1 "rrl.c" struct task { int other; int rcu_count; }; struct task *current; void rcu_read_lock() { (*(volatile int*)¤t->rcu_count)++; }
This has been reported many times before. volatile does not mean it will be done atomically or even without load/stores.
Dup of bug 3506. *** This bug has been marked as a duplicate of bug 3506 ***
Bug 3506 has since been fixed, at least for the example shown in this bug report, as you can see if you look at the godbolt, which shows that both compilers generate a single addl instruction, which is exactly what the submitter of 3506 requested. This bug is different, instead asking that the calculation of the address of the volatile object not be split into multiple instructions.
I don't think bug 3506 has been fixed (its status seems wrong to me). But don't worry, there are several other duplicates that still have status NEW (bug 50677 for instance). This is a sensible enhancement request, I think some gcc backends already do a similar optimization, it simply isn't a priority, because volatile almost means "don't optimize this". At least the difference between the gcc and clang codes matches those other PRs. Not sure why you are talking of address computations.
(In reply to Marc Glisse from comment #5) > I don't think bug 3506 has been fixed (its status seems wrong to me). But > don't worry, there are several other duplicates that still have status NEW > (bug 50677 for instance). > This is a sensible enhancement request, I think some gcc backends already do > a similar optimization, it simply isn't a priority, because volatile almost > means "don't optimize this". > At least the difference between the gcc and clang codes matches those other > PRs. Not sure why you are talking of address computations. Probably because I was confused by the addressing mode and by wishful thinking, and yes, you are quite correct. Anyway, if you look at https://godbolt.org/z/fGze8E, you can see that Clang/LLVM is using a to-memory addl rather than loading, adding, and storing.