91837 – Wrong code with -ftree-loop-vectorize and -march=skylake-avx512 on some Intel machines

Status:	RESOLVED INVALID

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	9.2.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2019-09-20 12:19 UTC by Daniel Cooke
Modified:	2019-09-23 10:51 UTC
CC List:	1 user

See Also:
Host:
Target:	x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:

Description Daniel Cooke 2019-09-20 12:19:57 UTC

See my question on Stack Overflow: https://stackoverflow.com/questions/58026153/why-does-march-native-corrupt-my-program?noredirect=1#comment102457501_58026153

Comment 1 Richard Biener 2019-09-20 12:30:27 UTC

For reference:

#include <iostream>
#include <vector>
#include <cstddef>
#include <algorithm>

struct Model
{
    int open, extend;
};

struct Cell
{
    int a, b;
};

typedef std::vector<std::vector<Cell>> DPMatrix;

void print(const DPMatrix& matrix)
{
    for (std::size_t i = 0; i < matrix.size(); ++i) {
        for (std::size_t j = 0; j < matrix[i].size(); ++j) {
            std::cout << '{' << matrix[i][j].a << ' ' << matrix[i][j].b << "} ";
        }
        std::cout << std::endl;
    }
}

DPMatrix init_dp_matrix(const std::size_t num_cols, const std::size_t num_rows, const Model& model)
{
    DPMatrix result(num_cols, DPMatrix::value_type(num_rows, Cell()));
    const int inf = model.open * std::max(num_cols, num_rows);
    for (int i = 1; i < num_cols; ++i) {
        result[i][0].b = model.open + (i - 1) * model.extend;
    }
    for (int j = 1; j < num_rows; ++j) {
        result[0][j].a = model.open + (j - 1) * model.extend;
    }
    return result;
}

int main()
{
    const Model model = {-8, -1};
    const DPMatrix matrix = init_dp_matrix(10, 2, model);
    print(matrix);
}
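
The testcase above only prints the matrix, so a miscompile has to be spotted by eye. A minimal self-checking sketch follows, assuming the intended result is exactly what the two loops in init_dp_matrix describe. The helpers expected_cell and count_mismatches are hypothetical names introduced here for illustration and are not part of the original report; init_dp_matrix is copied unchanged.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Model
{
    int open, extend;
};

struct Cell
{
    int a, b;
};

typedef std::vector<std::vector<Cell>> DPMatrix;

// Copied unchanged from the testcase (including the unused `inf`) so that
// the loops under test stay the same.
DPMatrix init_dp_matrix(const std::size_t num_cols, const std::size_t num_rows, const Model& model)
{
    DPMatrix result(num_cols, DPMatrix::value_type(num_rows, Cell()));
    const int inf = model.open * std::max(num_cols, num_rows);
    for (int i = 1; i < num_cols; ++i) {
        result[i][0].b = model.open + (i - 1) * model.extend;
    }
    for (int j = 1; j < num_rows; ++j) {
        result[0][j].a = model.open + (j - 1) * model.extend;
    }
    return result;
}

// Hypothetical helper: the value one cell should hold, derived directly
// from the two loops above. Cells not written by either loop stay zero.
Cell expected_cell(const std::size_t i, const std::size_t j, const Model& model)
{
    Cell c = Cell();
    if (j == 0 && i >= 1) {
        c.b = model.open + (static_cast<int>(i) - 1) * model.extend;
    }
    if (i == 0 && j >= 1) {
        c.a = model.open + (static_cast<int>(j) - 1) * model.extend;
    }
    return c;
}

// Hypothetical helper: number of cells that differ from the expected value.
std::size_t count_mismatches(const DPMatrix& matrix, const Model& model)
{
    std::size_t bad = 0;
    for (std::size_t i = 0; i < matrix.size(); ++i) {
        for (std::size_t j = 0; j < matrix[i].size(); ++j) {
            const Cell want = expected_cell(i, j, model);
            if (matrix[i][j].a != want.a || matrix[i][j].b != want.b) {
                ++bad;
            }
        }
    }
    return bad;
}
```

Calling count_mismatches(init_dp_matrix(10, 2, model), model) from main with model = {-8, -1}, and returning non-zero when the count is non-zero, would let affected and unaffected machines be compared by exit status instead of by reading the printed matrix.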

Comment 2 Richard Biener 2019-09-20 12:40:56 UTC

gcc-9-branch revision 275330 works fine for me, likewise the GCC 9.2 release.
Note that I have to use -mprefer-vector-width=512 to get 64-byte (AVX512) vectors used.

> g++-9 t.C -O3 -march=native -mprefer-vector-width=512 -fopt-info-vec
t.C:35:23: optimized: loop vectorized using 64 byte vectors
t.C:32:23: optimized: loop vectorized using 64 byte vectors
/usr/include/c++/9/bits/stl_algobase.h:759:13: optimized: loop vectorized using 64 byte vectors

vs.

> g++-9 t.C -O3 -march=native -fopt-info-vec
t.C:35:23: optimized: loop vectorized using 32 byte vectors
t.C:32:23: optimized: loop vectorized using 32 byte vectors
/usr/include/c++/9/bits/stl_algobase.h:759:13: optimized: loop vectorized using 32 byte vectors

so, not confirmed.

Comment 3 Daniel Cooke 2019-09-20 20:14:53 UTC

I tried replicating the issue on a CentOS machine (AWS EC2) with the exact same CPU and got the correct output. However, I just fired up a fresh AWS EC2 instance running Ubuntu and I can replicate the bug again. Is it possible that the bug is CPU *and* OS specific?

Comment 4 H.J. Lu 2019-09-20 21:41:10 UTC

(In reply to Daniel Cooke from comment #3)
> I tried replicating the issue on a CentOS machine (AWS EC2) with the exact
> same CPU and got the correct output. However, I just fired up a fresh AWS
> EC2 instance running Ubuntu and I can replicate the bug again. Is it
> possible that the bug is CPU *and* OS specific?

You may run into

https://sourceware.org/bugzilla/show_bug.cgi?id=23465

Comment 5 Daniel Cooke 2019-09-21 11:28:30 UTC

(In reply to H.J. Lu from comment #4)
> (In reply to Daniel Cooke from comment #3)
> > I tried replicating the issue on a CentOS machine (AWS EC2) with the exact
> > same CPU and got the correct output. However, I just fired up a fresh AWS
> > EC2 instance running Ubuntu and I can replicate the bug again. Is it
> > possible that the bug is CPU *and* OS specific?
> 
> You may run into
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=23465

It does seem like this could be the issue. I upgraded my binutils from 2.30 to 2.32 and the problem disappeared. I'll do some more testing and report back.

Comment 6 Daniel Cooke 2019-09-23 10:49:39 UTC

Can confirm I'm no longer having any issues after upgrading binutils to 2.32.

Comment 7 Richard Biener 2019-09-23 10:51:46 UTC

Not a gcc bug.