Reproducer: //func.cpp extern long long var_3; extern short var_8; extern int var_17; extern short arr_165[]; long c(long e, long f) { return f ? e : f; } void test() { for (int b = 0; b < 19; b = var_17) for (int d = int(~c(-2147483647 - 1, var_3)) - 2147483647; d < 22; d++) arr_165[d] = var_8; } //driver.cpp #include <stdio.h> long long int var_3 = -166416893043554447LL; short var_8 = (short)27092; unsigned int var_17 = 75036300U; short arr_165[23]; void test(); int main() { for (size_t i_3 = 0; i_3 < 23; ++i_3) arr_165[i_3] = (short)-8885; test(); printf("%d\n", arr_165[0]); } >$ g++ -O3 func.cpp driver.cpp && ./a.out Segmentation fault (core dumped) >$ g++ -O2 func.cpp driver.cpp && ./a.out 27092 gcc version 11.0.0 20201126 (beb9afcaf1466996a301c778596c5df209e7913c)
Started with r8-2380-g2d7744d4ef93bfff.
The following fails with -O2 -ftree-vectorize long long int var_3 = -166416893043554447LL; short var_8 = (short)27092; unsigned int var_17 = 75036300U; short arr_165[23]; static long c(long e, long f) { return f ? e : f; } void __attribute((noipa)) test() { for (int b = 0; b < 19; b = var_17) for (int d = (int)(~c(-2147483647 - 1, var_3)) - 2147483647; d < 22; d++) arr_165[d] = var_8; } int main() { for (unsigned i_3 = 0; i_3 < 23; ++i_3) arr_165[i_3] = (short)-8885; test(); if (arr_165[0] != 27092) __builtin_abort (); return 0; }
So before vectorization we have var_3.0_1 = var_3; iftmp.3_10 = var_3.0_1 != 0 ? -2147483648 : 0; _2 = (int) iftmp.3_10; d_11 = -2147483648 - _2; which should compute to 0. Now somewhere (split_constant_offset I'd bet) things go wrong and we produce in dataref analysis base_address: &arr_165 offset from base address: (ssizetype) ((sizetype) -_2 * 2) constant offset from base address: -4294967296 step: 2 base alignment: 32 base misalignment: 0 offset alignment: 2 step alignment: 2 base_object: arr_165 Access function 0: {d_11, +, 1}_2 from { ((sizetype) d_21) * 2, +, 2 }_2 as offset IV which split_constant_offset makes into { (sizetype) -_2 * 2, +, 2 }_2 + -4294967296. The logic is that we can dive into the (sizetype) extension from int since the operation on int cannot overflow (undefined behavior). We then strip the -2147483648 constant, thinking we can associate this as (sizetype)(0 - _2) + -2147483648 but of course 0 - _2 now overflows and we happily fold it to -_2. So here: static bool split_constant_offset_1 (tree type, tree op0, enum tree_code code, tree op1, tree *var, tree *off, hash_map<tree, std::pair<tree, tree> > &cache, unsigned *limit) { ... case PLUS_EXPR: case MINUS_EXPR: if (TREE_CODE (op1) == INTEGER_CST) { split_constant_offset (op0, &var0, &off0, cache, limit); *var = var0; *off = size_binop (ocode, off0, fold_convert (ssizetype, op1)); return true; } split_constant_offset (op0, &var0, &off0, cache, limit); split_constant_offset (op1, &var1, &off1, cache, limit); *var = fold_build2 (code, type, var0, var1); we have to make sure that the expression we fold cannot overflow or use unsigned arithmetic. In fact, using unsigned arithmetic looks like the only reasonable thing since splitting out any constant is not trivially valid. We're doing (var0 + off0) +- (var1 + off1) -> (var0 +- var1) + (off0 + off1) where we know the left side doesn't overflow. This makes associating across the multiplication invalid where we do ((sizetype) (A +- B)) * 2 -> ((sizetype) A) * 2 +- B * 2 and thus from 0 * 2 we go to -2147483648 * 2 + -4294967296. Using unsigned arithmetic to make the inner op no longer undefined doesn't help here. So it looks like we cannot use TYPE_OVERFLOW_UNDEFINED to validate looking across widening conversions. Which points to the same area as PR97960.
The following helps: diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index 3bf460cccfd..a7606b3db99 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -754,9 +759,10 @@ split_constant_offset_1 (tree type, tree op0, enum tree_code code, tree op1, && TYPE_PRECISION (type) >= TYPE_PRECISION (itype) && (POINTER_TYPE_P (type) || INTEGRAL_TYPE_P (type))) { - if (INTEGRAL_TYPE_P (itype) && TYPE_OVERFLOW_WRAPS (itype) + if (INTEGRAL_TYPE_P (itype) && (TYPE_PRECISION (type) > TYPE_PRECISION (itype) - || TYPE_UNSIGNED (itype) != TYPE_UNSIGNED (type))) + || (TYPE_UNSIGNED (itype) != TYPE_UNSIGNED (type) + && TYPE_OVERFLOW_WRAPS (itype)))) { /* Split the unconverted operand and try to prove that wrapping isn't a problem. */ but I fear this will have quite some negative impact on dependence analysis. I'm still not sure I have the issue nailed down properly :/
The proposed fix for PR98137 might also expose more cases like this (which maybe is a good thing for coverage).
This is now also fixed, since g:4cf70c20cb10acd6fb10166 presumably. Assigning to Richard in case he's planning backports.
As discussed on irc, the fix was quite invasive, so it seems a bit dangerous to backport further than GCC 10. Will backport to GCC 10 in the GCC 11.2 timeframe, once we've had more chance to see if there's any fallout.
*** Bug 97960 has been marked as a duplicate of this bug. ***
*** Bug 95396 has been marked as a duplicate of this bug. ***
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Fixed in GCC 11.