Source code:

--------------
volatile int i;
int j;

int testme(void)
{
        return i <= 1;
}

int testme2(void)
{
        return j <= 1;
}
--------------

Compiler command line: "cc -S -O torvalds.c"

--------------

Expected results: volatile accesses not moved past sequence points; optimization otherwise unaffected.

--------------

Observed results: a redundant move to a register is generated, unnecessarily increasing register pressure, increasing the size of the binary, and potentially consuming more power and decreasing performance.

--------------

Build date and platform: August 17, 2007; Ubuntu Feisty Fawn.

--------------

gcc -v output:

Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)

--------------

Code generated to compare the volatile variable to a constant:

        movl    i, %eax
        cmpl    $1, %eax

Code generated to compare the non-volatile variable to a constant:

        cmpl    $1, j

The latter code sequence should be generated for the volatile case as well as the non-volatile case. Similar inefficiencies are produced in response to other uses of volatile variables.
volatile != atomic. *** This bug has been marked as a duplicate of 3506 ***
Hmmm... I wasn't asking for volatile to be atomic, just for GCC to avoid generating unnecessary code.
(In reply to comment #1)
> volatile != atomic.

And that is relevant why? Paul is perfectly aware of this, btw.

There might be other reasons why GCC doesn't want to do this optimisation, but this isn't one of them.

Please reopen; Bugzilla won't allow me to do that myself.

Segher
It is still the same issue.

*** This bug has been marked as a duplicate of 3506 ***
> It is still the same issue.
>
> *** This bug has been marked as a duplicate of 3506 ***

It isn't the same issue. The submitter of #3506 claimed the code that GCC currently generates is incorrect, which obviously is not the case. _This_ PR on the other hand merely asks for GCC to be enhanced to generate _better_ code for this; a wholly different thing.

So do not close as a duplicate again, thank you.

Segher
(In reply to comment #4)
> It is still the same issue.

Perhaps I am missing something, but I don't know of any hardware that would react differently to this two-instruction sequence:

        movl    i, %eax
        cmpl    $1, %eax

than it would to the following single instruction:

        cmpl    $1, j

Either way, there is a single memory reference, and for all hardware I know of, it looks the same to both memory and external devices.
One should note this is actually hard to do without changing the code for 3506 also.
PS: you should have reported this first to debian, since you are using their modified version of GCC.
s/debian/Ubuntu/
Actually as I understand it, the expanded version is slightly faster under newer x86's anyways as they don't have an extra decode stage.
(In reply to comment #10)
> Actually as I understand it, the expanded version is slightly faster under
> newer x86's anyways as they don't have an extra decode stage.

The main concern on the recent LKML thread appeared to be code size rather than speed. That said, if you are correct, it certainly wouldn't be the first time that multiple instructions turned out to be cheaper than a single instruction.
(In reply to comment #9)
> s/debian/Ubuntu/

Please accept my apologies for skipping that step -- I wasn't aware of this. Should I replicate this bug at Ubuntu, or is this strictly advice for future bug submissions?
(In reply to comment #11)
> The main concern on the recent LKML thread appeared to be code size rather
> than speed.

One should note this only helps CISC-based processors; it will not help stuff like PowerPC anyway. It is better to remove volatile from the 95% of the places where the kernel uses it needlessly than to fix this bug.

(In reply to comment #12)
> Please accept my apologies for skipping that step -- I wasn't aware of this.
> Should I replicate this bug at Ubuntu, or is this strictly advice for future
> bug submissions?

It would be better next time, unless you can test it on an FSF GCC source release/SVN.

Thanks,
Andrew Pinski
(In reply to comment #7)
> One should note this is actually hard to do without changing the code for 3506
> also.

And of course, if the volatile variable in the 3506 example code were an MMIO register, there would not be any atomicity, at least not given the hardware I have come across. And I am not aware of any devices where it would be useful to blindly increment an MMIO register. So I believe that this is a non-issue. Or am I missing something?
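For contrast, the 3506 report concerns an increment of a volatile object, where the language itself provides no atomicity. A minimal sketch of that pattern follows; the name `mmio_reg` and the `bump` helper are made up for illustration, and an ordinary global stands in for what would be a memory-mapped register on real hardware.

```c
/* Semantically, incrementing a volatile object is three separate
 * steps: a volatile load, an add in a register, and a volatile
 * store.  Nothing in the language makes the sequence atomic. */
volatile int mmio_reg;  /* stand-in for a memory-mapped register */

void bump(void)
{
    mmio_reg = mmio_reg + 1;  /* same accesses as mmio_reg++ */
}
```

On x86 this typically compiles to a load, an add, and a store (or a non-locked `incl` with a memory operand); either way, a device watching the bus would see a separate read and write, never an atomic read-modify-write.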
(In reply to comment #13)
> (In reply to comment #11)
> > The main concern on the recent LKML thread appeared to be code size rather
> > than speed.
>
> One should note this only helps CISC based processors, it will not help stuff
> like PowerPC anyways. It is better to remove volatile in 95% of the places
> where the kernel uses it anyways than fix this bug.

I agree that this change won't help PowerPC. As you say, it is primarily helpful to CISC processors (x86, x86-64, mainframe, m68000, ...). Although there do appear to be places in the kernel where volatile is overused and abused, it would still be good to fix this bug.

> (In reply to comment #12)
> > Please accept my apologies for skipping that step -- I wasn't aware of this.
> > Should I replicate this bug at Ubuntu, or is this strictly advice for future
> > bug submissions?
>
> It would be better next time unless you can test it on a FSF GCC source
> release/SVN.

Thank you for the guidance!
This is still a dup of bug 3506; we don't optimize volatile accesses at all.

*** This bug has been marked as a duplicate of 3506 ***