Bug 54888 - GCC with -Os is faster than -O3 on some AVR code
Summary: GCC with -Os is faster than -O3 on some AVR code
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.3.3
Importance: P4 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-10 14:50 UTC by mojo
Modified: 2013-02-02 14:21 UTC
0 users

See Also:
Host:
Target: avr
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-10-21 00:00:00


Attachments
Compiler output (16.79 KB, text/plain)
2012-10-10 14:50 UTC, mojo
Compiler output with -O3 (16.79 KB, text/plain)
2012-10-10 14:51 UTC, mojo

Description mojo 2012-10-10 14:50:26 UTC
Created attachment 28411
Compiler output

I am using AVR-GCC to write some very low power RTC related code. The interrupt "ISR(RTC_OVF_vect)" executes faster with -Os optimization than it does with -O1, -O2 or -O3. I have measured execution time on an oscilloscope to confirm.

V4.3.3 is the one that comes with Atmel Studio / WinAVR. Command line:

avr-gcc -funsigned-char -funsigned-bitfields -DF_CPU=8000000UL  -O3 -fpack-struct -fshort-enums -g2 -Wall -c -std=gnu99 -MD -MP -MF "rtc.d" -MT"rtc.d" -MT"rtc.o"  -mmcu=atxmega128d3   -o"rtc.o" ".././rtc.c"

I don't get any warnings etc. when compiling. Build machine is Windows 7 x64. Target is an XMEGA128D3, same issue confirmed with the 128A3U (unsurprisingly).

The problem appears to be with GCC, rather than avr-libc, but please correct me if I am wrong.
Comment 1 mojo 2012-10-10 14:51:26 UTC
Created attachment 28412
Compiler output with -O3
Comment 2 Georg-Johann Lay 2012-10-21 20:40:45 UTC
atxmega128d3 is not supported by avr-gcc, neither in 4.3 nor in 4.4, 4.5 or 4.6.  This is obviously some private toolchain port; please report the issue to the respective bug tracker.

Compiling the attached code with a current compiler that supports ATxmega128D3 (4.7 or newer) fails with errors.

And I actually don't understand the issue: Optimizing for size does not require producing slow code.  The code may run fast.

If your program relies on slow executable code, the program is incorrect.
Comment 3 mojo 2012-10-22 12:40:57 UTC
(In reply to comment #2)

> And I actually don't understand the issue: Optimizing for size does not require
> producing slow code.  The code may run fast.

-O3 is supposed to produce the fastest possible code, but it doesn't. -Os is faster. At the very least the two should be equal.

In other words -O3 is broken.
Comment 4 Richard Biener 2012-10-22 12:56:23 UTC
(In reply to comment #3)
> (In reply to comment #2)
> 
> > And I actually don't understand the issue: Optimizing for size does not require
> > producing slow code.  The code may run fast.
> 
> -O3 is supposed to produce the fastest possible code, but it doesn't. -Os is
> faster. At the very least the two should be equal.

Supposed to?  Where in the documentation is that specified?  I remember
a sentence that -O3 enables optimization that might not always be
profitable (but that sentence seems to be gone from latest docs).

> In other words -O3 is broken.

Its behavior is certainly undesirable, but broken?  For certain targets
-Os might be a win because that's what it is tuned for or icache behavior
is simply more important than anything else.
Comment 5 Georg-Johann Lay 2012-10-22 20:10:44 UTC
As a start, you could try to enable us to reproduce your problem.

First of all, it is clear that we don't have your hardware (oscilloscope) to measure things, and even if we did, it is very unlikely that someone would start researching to find out exactly where you lost the ticks.

Second, notice that it is unlikely anybody is inclined to pick up the bunch of code you dumped above. It's 3800 lines and around 30 functions.  And it fails to compile.  Maybe you can be more descriptive and point out what /exactly/ goes wrong, work out a small example, limit it to a critical spot or function, and throw away unneeded stuff.

Third, please notice that 4.3 has not been supported for several years now.  Please supply code that compiles with a supported version of the compiler, which implies at least 4.7 (because you use -mmcu=atxmega128d3).

Fourth, you use inline assembler that is not correct because of a missing memory barrier and might malfunction in corner cases.

Thus, you may want to fix at least 3. and 4. and rerun your benchmarks to see if the problem still exists.  Very likely, that is not the case.
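
For reference on the fourth point, here is a minimal sketch of what a memory barrier in GCC inline assembler can look like. It is not taken from the attached code: the names rtc_ticks, MEMORY_BARRIER and read_ticks are hypothetical, and the asm template is left empty so the snippet builds on any GCC target. On AVR the template would typically hold the actual instruction, such as "cli" or "sei".

```c
#include <stdint.h>

/* Hypothetical shared state, standing in for a variable that both an
   ISR and the main program touch. Not from the attached code. */
static uint8_t rtc_ticks;

/* Compiler-level memory barrier. The "memory" clobber tells GCC that
   the asm statement may read or write arbitrary memory, so GCC must
   not keep memory values cached in registers across it and must not
   move memory accesses across it. */
#define MEMORY_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Read the shared counter between two barriers. In real AVR code the
   first asm would typically disable interrupts and the second would
   restore them; the empty templates here are an assumption made so
   the sketch stays target independent. */
static uint8_t read_ticks(void)
{
    uint8_t copy;

    MEMORY_BARRIER();   /* accesses above stay above this point */
    copy = rtc_ticks;   /* the load happens between the barriers */
    MEMORY_BARRIER();   /* accesses below stay below this point */

    return copy;
}
```

Without the "memory" clobber, the compiler is free to reorder ordinary memory accesses around the asm statement, which is the kind of corner case that may only show up at some optimization levels.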
Comment 6 Georg-Johann Lay 2013-02-02 14:21:27 UTC
Closed as invalid. No answer and no valid test case for over 3 months now.