33102 – volatile excessively suppresses optimizations in range checks

Bug 33102 - volatile excessively suppresses optimizations in range checks

Summary: volatile excessively suppresses optimizations in range checks

Status:	RESOLVED DUPLICATE of bug 3506

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end (show other bugs)
Version:	4.1.2

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-08-17 23:52 UTC by Paul E. McKenney
Modified:	2024-02-18 18:29 UTC (History)
CC List:	6 users (show)

See Also:	113980
Host:	i486-linux-gnu
Target:	i486-linux-gnu
Build:	i486-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Paul E. McKenney 2007-08-17 23:52:56 UTC

Source code:
--------------
volatile int i;
int j;

int testme(void)
{
        return i <= 1;
}

int testme2(void)
{
        return j <= 1;
}
--------------
Compiler command line: "cc -S -O torvalds.c"
--------------
Expected results: volatile accesses not moved past sequence points, optimization otherwise unaffected.
--------------
Observed results: redundant move to register generated, unecessarily increasing register pressure, increasing the size of the binary, and potentially consuming more power and decreasing performance.
--------------
Build date and platform: August 17 2007 Ubuntu Feisty Fawn.
--------------
gcc -v output:
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
--------------
Code generated to compare volatile variable to constant:
        movl    i, %eax
        cmpl    $1, %eax
Code generated to compare non-volatile variable to constant:
        cmpl    $1, j
The latter code sequence should be generated for the volatile case as well as the non-volatile case.  Similar inefficiencies are produced in response to other uses of volatile variables.

Comment 1 Andrew Pinski 2007-08-18 00:05:45 UTC

volatile != atomic.

*** This bug has been marked as a duplicate of 3506 ***

Comment 2 Paul E. McKenney 2007-08-18 00:11:25 UTC

Hmmm...  I wasn't asking for volatile to be atomic, just for it to avoid generating unnecessary code.

Comment 3 Segher Boessenkool 2007-08-18 00:12:36 UTC

(In reply to comment #1)
> volatile != atomic.

And that is relevant why?  Paul is perfectly aware of this, btw.

There might be other reasons why GCC doesn't want to do this
optimisation, but this isn't one of them.  Please reopen, bugzilla
won't allow me to do that myself.


Segher

Comment 4 Andrew Pinski 2007-08-18 00:12:50 UTC

It is still the same issue.

*** This bug has been marked as a duplicate of 3506 ***

*** This bug has been marked as a duplicate of 3506 ***

Comment 5 Segher Boessenkool 2007-08-18 00:31:49 UTC

> It is still the same issue.
> 
> *** This bug has been marked as a duplicate of 3506 ***

It isn't the same issue.  The submitter of #3506 claimed the code
that GCC currently generates is incorrect, which obviously is not
the case.  _This_ PR on the other hand merely asks for GCC to
be enhanced to generate _better_ code for this; a wholly different
thing.  So do not close as a duplicate again, thank you.


Segher

Comment 6 Paul E. McKenney 2007-08-18 01:04:25 UTC

(In reply to comment #4)
> It is still the same issue.

Perhaps I am missing something, but I don't know of any hardware that would react differently to this two-instruction sequence:

        movl    i, %eax
        cmpl    $1, %eax

than it would to the following single instruction:

        cmpl    $1, j

Either way, there is a single memory reference, and for all hardware I know of, it looks the same to both memory and external devices.

Comment 7 Andrew Pinski 2007-08-18 01:10:08 UTC

One should note this is actually hard to do without changing the code for 3506 also.

Comment 8 Andrew Pinski 2007-08-18 01:11:17 UTC

PS you should have reported this first to debian since you are using their modified version of GCC.

Comment 9 Andrew Pinski 2007-08-18 01:12:15 UTC

s/debian/Ubuntu/

Comment 10 Andrew Pinski 2007-08-18 01:13:21 UTC

Actually as I understand it, the expanded version is slightly faster under newer x86's anyways as they don't have an extra decode stage.

Comment 11 Paul E. McKenney 2007-08-18 01:21:25 UTC

(In reply to comment #10)
> Actually as I understand it, the expanded version is slightly faster under
> newer x86's anyways as they don't have an extra decode stage.

The main concern on the recent LKML thread appeared to be code size rather than speed.

That said, if you are correct, it certainly wouldn't be the first time that multiple instructions turned out to be cheaper than a single instruction.

Comment 12 Paul E. McKenney 2007-08-18 01:23:13 UTC

(In reply to comment #9)
> s/debian/Ubuntu/

Please accept my apologies for skipping that step -- I wasn't aware of this.  Should I replicate this bug at Ubuntu, or is this strictly advice for future bug submissions?

Comment 13 Andrew Pinski 2007-08-18 01:25:59 UTC

(In reply to comment #11)
> The main concern on the recent LKML thread appeared to be code size rather than
> speed.
One should note this only helps CISC based processors, it will not help stuff like PowerPC anyways.  It is better to remove volatile in 95% of the places where the kernel uses it anyways than fix this bug.

(In reply to comment #12)
> Please accept my apologies for skipping that step -- I wasn't aware of this. 
> Should I replicate this bug at Ubuntu, or is this strictly advice for future
> bug submissions?

It would be better next time unless you can test it on a FSF GCC source release/SVN.

Thanks,
Andrew Pinski

Comment 14 Paul E. McKenney 2007-08-18 22:08:52 UTC

(In reply to comment #7)
> One should note this is actually hard to do without changing the code for 3506
> also.

And of course if the volatile variable in the 3506 example code was an MMIO register, there would not be any atomicity, at least not given the hardware I have come across.  And I am not aware of any devices where it would be useful to blindly increment an MMIO register.

So I believe that this is a non-issue.  Or am I missing something?

Comment 15 Paul E. McKenney 2007-08-18 22:12:25 UTC

(In reply to comment #13)
> (In reply to comment #11)
> > The main concern on the recent LKML thread appeared to be code size rather than
> > speed.
> One should note this only helps CISC based processors, it will not help stuff
> like PowerPC anyways.  It is better to remove volatile in 95% of the places
> where the kernel uses it anyways than fix this bug.

I agree that this change won't help PowerPC.  As you say, it is primarily helpful to CISC processors (x86, x86-64, mainframe, m68000, ...).  Although there do appear to be places in the kernel where volatile is overused and abused, it would still be good to fix this bug.

> (In reply to comment #12)
> > Please accept my apologies for skipping that step -- I wasn't aware of this. 
> > Should I replicate this bug at Ubuntu, or is this strictly advice for future
> > bug submissions?
> 
> It would be better next time unless you can test it on a FSF GCC source
> release/SVN.

Thank you for the guidance!

Comment 16 Andrew Pinski 2008-12-28 02:48:11 UTC

This is still a dup of bug 3506, we don't optimize volatile at all.

*** This bug has been marked as a duplicate of 3506 ***