Bug 96415 - GCC produces incorrect code for loops with -O3 for skylake-avx512 and icelake-server
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 11.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks: yarpgen
Reported: 2020-08-02 00:30 UTC by Vsevolod Livinskii
Modified: 2021-11-01 23:07 UTC (History)
CC List: 7 users

See Also:
Host:
Target: x86_64
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Vsevolod Livinskii 2020-08-02 00:30:36 UTC
Error:
>$ g++ -O0 driver.cpp func.cpp && ./a.out
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 
>$ g++ -O3 driver.cpp func.cpp -march=skylake-avx512 && sde -skx -- ./a.out
1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 0 

Reproducer:
//driver.cpp 
#include <stdio.h>

unsigned short var_0 = 14;
unsigned int arr_10 [16];
void test(unsigned short var_0);

int main() {
    test(var_0);
    for (int i = 0; i < 16; ++i) 
        printf("%u ", arr_10[i]);
    printf("\n");
}

//func.cpp
extern int arr_10[16];
void test(unsigned short a) {
    for (unsigned e = 0; e < 16; e += 4)
        for (char f = 0; f < 6; f += 4)
            for (unsigned g = 0; g < a + 1; g++)
                arr_10[g] = 1;
}
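
The expected output of the reproducer above can also be checked programmatically instead of by eye. The following is a minimal single-file sketch, not part of the original report: the helper count_mismatches is a hypothetical name, the noinline attribute is an assumption to keep the loop nest from being folded into its caller, and `a + 1u` replaces `a + 1` only to avoid a signed/unsigned comparison. With a = 14, elements 0..14 should be 1 and element 15 should stay 0.

```cpp
#include <cassert>

// Hypothetical single-file variant of the reproducer, for illustration only.
unsigned int arr_10[16];

// Same loop nest as test() in func.cpp. noinline (an assumption, not in the
// original) keeps the optimizer from folding the loops into the caller.
__attribute__((noinline)) void test(unsigned short a) {
    for (unsigned e = 0; e < 16; e += 4)
        for (char f = 0; f < 6; f += 4)
            for (unsigned g = 0; g < a + 1u; g++)
                arr_10[g] = 1;
}

// Hypothetical helper: runs test(a) on a zeroed array and returns how many
// elements differ from the expected pattern (indices 0..a are 1, the rest 0).
int count_mismatches(unsigned short a) {
    assert(a < 16);  // test() writes arr_10[0..a], so a must stay in bounds
    for (unsigned i = 0; i < 16; ++i)
        arr_10[i] = 0;
    test(a);
    int bad = 0;
    for (unsigned i = 0; i < 16; ++i) {
        unsigned expected = (i <= static_cast<unsigned>(a)) ? 1u : 0u;
        if (arr_10[i] != expected)
            ++bad;
    }
    return bad;
}
```

Returning count_mismatches(14) from main as the exit status would give 0 for a correct run. A non-zero result that appears only under an emulator, and not on real hardware, would point at the emulator rather than at the compiler.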

GCC version:
11.0.0 (3a4a92598014d33ef2c8b8ec38d8ad917812921a)
I also applied the fix for bug #95396
Comment 1 Jakub Jelinek 2020-08-03 11:55:00 UTC
Can't reproduce:
for i in .c -2.c; do /opt/notnfs/gcc-bisect/obj/gcc/cc1plus.r11-2464 -quiet -O3 -march=skylake-avx512 pr96415$i; done; g++ -o pr96415{,.s,-2.s}; ./pr96415
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 
Though, without -mprefer-vector-width=512 it doesn't even generate any post-AVX2 insns, and with that can't reproduce either.
Maybe sde is buggy?
Comment 2 Vsevolod Livinskii 2020-08-03 20:02:48 UTC
You are right, it is an sde bug. It works on real hardware and with a new version of sde. Sorry to bother you, and thanks for looking into this!