Bug 84172 - option "-O3" creates slower code
Summary: option "-O3" creates slower code
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 5.3.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-01 21:11 UTC by Andreas Otto
Modified: 2018-02-05 11:03 UTC

See Also:
Host:
Target: x86_64-suse-linux
Build:
Known to work: 7.2.0
Known to fail: 5.5.0, 6.4.0
Last reconfirmed:


Attachments
output of command you requested… (1.43 KB, text/plain)
2018-02-01 21:34 UTC, Andreas Otto
cat /proc/cpuinfo (798 bytes, text/plain)
2018-02-01 21:37 UTC, Andreas Otto

Description Andreas Otto 2018-02-01 21:11:29 UTC
Here is my test:

#:~/test> make test
gcc-5 -march=native -mtune=native -g -static -O0 -o test.0 main.c 
gcc-5 -march=native -mtune=native -g -static -O1 -o test.1 main.c 
gcc-5 -march=native -mtune=native -g -static -O2 -o test.2 main.c 
gcc-5 -march=native -mtune=native -g -static -O3 -o test.3 main.c 
for t in test.0 test.1 test.2 test.3; do ./$t; done
./test.0             → T1 = 673.300964 ms → HI = 0x1, LO = 0
./test.0             → T2 = 506.130981 ms → HI = 0x1, LO = 0
./test.1             → T1 = 136.671005 ms → HI = 0x1, LO = 0
./test.1             → T2 = 139.194000 ms → HI = 0x1, LO = 0
./test.2             → T1 = 139.225998 ms → HI = 0x1, LO = 0
./test.2             → T2 = 139.294998 ms → HI = 0x1, LO = 0
./test.3             → T1 = 217.908997 ms → HI = 0x1, LO = 0
./test.3             → T2 = 231.663010 ms → HI = 0x1, LO = 0

#:~/test> gcc-5 -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc-5
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/5/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,fortran,ada,go --enable-checking=release --with-gxx-include-dir=/usr/include/c++/5 --enable-ssp --disable-libssp --disable-libvtv --enable-libmpx --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --with-default-libstdcxx-abi=gcc4-compatible --enable-version-specific-runtime-libs --enable-linker-build-id --enable-linux-futex --program-suffix=-5 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
gcc version 5.3.1 20160301 [gcc-5-branch revision 233849] (SUSE Linux) 

#my code

==========================================================================
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define SIZE    100000000

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#   define  HI  1
#   define  LO  0
#else
#   define  HI  0
#   define  LO  1
#endif

int main(int argc, char *argv[])
{
    int                 i;
    clock_t             t1, t2;

    // T1
    if (1) {
        uint64_t            a = 0xffffffffffffffff;
        uint64_t            b = 0xfffffffffffffffe;
        unsigned __int128   r;
        t1 = clock();

        for (i=0; i<SIZE; i++) {
            r = ((unsigned __int128) a * (unsigned __int128) b);
            if (i%2==0) {
                a += (uint64_t) (r>>64);
                b -= (uint64_t) (r>>0);
            } else {
                a -= (uint64_t) (r>>64);
                b += (uint64_t) (r>>0);
            }
        }

        t2 = clock();
        float diff = ((float)(t2 - t1) / (float)CLOCKS_PER_SEC ) * 1000;
        printf("%-20s → T1 = %f ms → HI = %#x, LO = %#x\n", argv[0], diff, a, b);
    }

    // T2
    if (1) {
        typedef union valU {
            unsigned __int128   ui128    ;
            uint64_t            ui64[2]  ;
        } valU_t;

        uint64_t            a = 0xffffffffffffffff;
        uint64_t            b = 0xfffffffffffffffe;
        valU_t              r;

        t1 = clock();

        for (i=0; i<SIZE; i++) {
            r.ui128 = ((unsigned __int128) a * (unsigned __int128) b);
            if (i%2==0) {
                a += r.ui64[HI];
                b -= r.ui64[LO];
            } else {
                a -= r.ui64[HI];
                b += r.ui64[LO];
            }
        }

        t2 = clock();
        float diff = ((float)(t2 - t1) / (float)CLOCKS_PER_SEC ) * 1000;
        printf("%-20s → T2 = %f ms → HI = %#x, LO = %#x\n", argv[0], diff, a, b);
    }

    exit(0);
=======================================================================
Comment 1 Andreas Otto 2018-02-01 21:12:28 UTC
I forgot the last "}" in the code above.
Comment 2 Andrew Pinski 2018-02-01 21:30:51 UTC
Can you provide the output of the following command:
gcc-5 -march=native -mtune=native -g -static -O3 -o test.3 main.c -v

We need to know what -march=native expands to.
Comment 3 Andreas Otto 2018-02-01 21:32:01 UTC
send me the command that I should run…
Comment 4 Andrew Pinski 2018-02-01 21:34:06 UTC
(In reply to Andreas Otto from comment #3)
> send me the command that I should run…

I did:
(In reply to Andrew Pinski from comment #2)
> gcc-5 -march=native -mtune=native -g -static -O3 -o test.3 main.c -v
Comment 5 Andreas Otto 2018-02-01 21:34:48 UTC
Created attachment 43320 [details]
output of command you requested…

gcc-5 -march=native -mtune=native -g -static -O3 -o test.3 main.c -v
Comment 6 Andrew Pinski 2018-02-01 21:36:37 UTC
Can you also send the output of:
cat /proc/cpuinfo
?
Comment 7 Andreas Otto 2018-02-01 21:37:50 UTC
Created attachment 43321 [details]
cat /proc/cpuinfo
Comment 8 Marc Glisse 2018-02-01 22:21:30 UTC
As far as I can tell, this is already fixed in gcc-7.
Comment 9 Andreas Otto 2018-02-02 07:34:02 UTC
After this morning's reboot it seemed OK… BUT the bug came back.

→ start without the "-g" option

#:~/test> make test
for t in test.0 test.1 test.2 test.3; do ./$t; done
./test.0             → T1 = 663.640015 ms → HI = 0x1, LO = 0
./test.0             → T2 = 490.407990 ms → HI = 0x1, LO = 0
./test.1             → T1 = 137.326996 ms → HI = 0x1, LO = 0
./test.1             → T2 = 133.870010 ms → HI = 0x1, LO = 0
./test.2             → T1 = 138.035995 ms → HI = 0x1, LO = 0
./test.2             → T2 = 135.183014 ms → HI = 0x1, LO = 0
./test.3             → T1 = 137.481003 ms → HI = 0x1, LO = 0
./test.3             → T2 = 134.893997 ms → HI = 0x1, LO = 0
#:~/test> make test
for t in test.0 test.1 test.2 test.3; do ./$t; done
./test.0             → T1 = 656.669983 ms → HI = 0x1, LO = 0
./test.0             → T2 = 490.566986 ms → HI = 0x1, LO = 0
./test.1             → T1 = 134.537994 ms → HI = 0x1, LO = 0
./test.1             → T2 = 132.356003 ms → HI = 0x1, LO = 0
./test.2             → T1 = 144.015991 ms → HI = 0x1, LO = 0
./test.2             → T2 = 134.715012 ms → HI = 0x1, LO = 0
./test.3             → T1 = 137.255997 ms → HI = 0x1, LO = 0
./test.3             → T2 = 134.914001 ms → HI = 0x1, LO = 0

Add the "-g" option again:

#:~/test> make test
gcc-5 -march=native -mtune=native -g -static -O0 -o test.0 main.c
gcc-5 -march=native -mtune=native -g -static -O1 -o test.1 main.c
gcc-5 -march=native -mtune=native -g -static -O2 -o test.2 main.c
gcc-5 -march=native -mtune=native -g -static -O3 -o test.3 main.c
for t in test.0 test.1 test.2 test.3; do ./$t; done
./test.0             → T1 = 655.403992 ms → HI = 0x1, LO = 0
./test.0             → T2 = 490.989014 ms → HI = 0x1, LO = 0
./test.1             → T1 = 132.431000 ms → HI = 0x1, LO = 0
./test.1             → T2 = 133.049988 ms → HI = 0x1, LO = 0
./test.2             → T1 = 141.020004 ms → HI = 0x1, LO = 0
./test.2             → T2 = 135.460999 ms → HI = 0x1, LO = 0
./test.3             → T1 = 211.210999 ms → HI = 0x1, LO = 0
./test.3             → T2 = 225.455002 ms → HI = 0x1, LO = 0
#:~/test> make test
for t in test.0 test.1 test.2 test.3; do ./$t; done
./test.0             → T1 = 662.700989 ms → HI = 0x1, LO = 0
./test.0             → T2 = 490.704010 ms → HI = 0x1, LO = 0
./test.1             → T1 = 146.843994 ms → HI = 0x1, LO = 0
./test.1             → T2 = 133.729996 ms → HI = 0x1, LO = 0
./test.2             → T1 = 140.351990 ms → HI = 0x1, LO = 0
./test.2             → T2 = 134.825012 ms → HI = 0x1, LO = 0
./test.3             → T1 = 213.688995 ms → HI = 0x1, LO = 0
./test.3             → T2 = 225.066010 ms → HI = 0x1, LO = 0

Disable the "-g" option:

#:~/test> make test
gcc-5 -march=native -mtune=native -static -O0 -o test.0 main.c 
gcc-5 -march=native -mtune=native -static -O1 -o test.1 main.c 
gcc-5 -march=native -mtune=native -static -O2 -o test.2 main.c 
gcc-5 -march=native -mtune=native -static -O3 -o test.3 main.c 
for t in test.0 test.1 test.2 test.3; do ./$t; done
./test.0             → T1 = 652.817017 ms → HI = 0x1, LO = 0
./test.0             → T2 = 490.574005 ms → HI = 0x1, LO = 0
./test.1             → T1 = 139.962006 ms → HI = 0x1, LO = 0
./test.1             → T2 = 134.207001 ms → HI = 0x1, LO = 0
./test.2             → T1 = 134.936005 ms → HI = 0x1, LO = 0
./test.2             → T2 = 135.485001 ms → HI = 0x1, LO = 0
./test.3             → T1 = 217.895004 ms → HI = 0x1, LO = 0
./test.3             → T2 = 224.744003 ms → HI = 0x1, LO = 0
#:~/test> make test
for t in test.0 test.1 test.2 test.3; do ./$t; done
./test.0             → T1 = 660.490967 ms → HI = 0x1, LO = 0
./test.0             → T2 = 490.671997 ms → HI = 0x1, LO = 0
./test.1             → T1 = 141.137009 ms → HI = 0x1, LO = 0
./test.1             → T2 = 133.236008 ms → HI = 0x1, LO = 0
./test.2             → T1 = 136.444000 ms → HI = 0x1, LO = 0
./test.2             → T2 = 135.473999 ms → HI = 0x1, LO = 0
./test.3             → T1 = 256.563019 ms → HI = 0x1, LO = 0
./test.3             → T2 = 225.742996 ms → HI = 0x1, LO = 0

Do some additional testing:

#:~/test> rm test.
test.0  test.1  test.2  test.3  
#:~/test> rm test.*
#:~/test> make test
gcc-5 -march=native -mtune=native -static -O0 -o test.0 main.c 
gcc-5 -march=native -mtune=native -static -O1 -o test.1 main.c 
gcc-5 -march=native -mtune=native -static -O2 -o test.2 main.c 
gcc-5 -march=native -mtune=native -static -O3 -o test.3 main.c 
for t in test.0 test.1 test.2 test.3; do ./$t; done
./test.0             → T1 = 652.343994 ms → HI = 0x1, LO = 0
./test.0             → T2 = 490.720978 ms → HI = 0x1, LO = 0
./test.1             → T1 = 132.760010 ms → HI = 0x1, LO = 0
./test.1             → T2 = 132.718994 ms → HI = 0x1, LO = 0
./test.2             → T1 = 135.055008 ms → HI = 0x1, LO = 0
./test.2             → T2 = 135.212997 ms → HI = 0x1, LO = 0
./test.3             → T1 = 221.292007 ms → HI = 0x1, LO = 0
./test.3             → T2 = 225.391006 ms → HI = 0x1, LO = 0
Comment 10 Richard Biener 2018-02-05 11:03:15 UTC
Indeed fixed in GCC 7.