Hi, I was suspecting that vect_prune_runtime_alias_check_list is broken, now I can create a test case for it: #include <stdio.h> int arr[2048]; int foo (int *a, int *b) { int i; int *a1 = a; int *a0 = a1 - 512; for (i = 0; i < 500; i++) { *b = *a0 + *a1; b++; a0--; a1--; } return 0; } int main (void) { int *a = &arr[1027]; int *b = &arr[1024]; int i; for (i = 0; i < 2048; i++) arr[i] = i; foo (a, b); for (i = 0; i < 2048; i++) fprintf (stdout, "%d: %d\n", i, arr[i]); return 0; } Compiled on x86_64 with below two commands: Binary 1) $ ../gcc -O2 -fno-inline ../x.c -o x.exe $ ./x.exe >/tmp/x Binary 2) ../gcc -Ofast -fno-inline ../x.c -o y.exe $ ./y.exe >/tmp/y We have: $ diff /tmp/x /tmp/y 1027,1028c1027,1028 < 1026: 2053 < 1027: 2054 --- > 1026: 1538 > 1027: 1536 I think the root cause is in below code: /* Generally the new segment length is the maximum of the left segment size and the right segment size plus the distance. ??? We can also build tree MAX_EXPR here but it's not clear this is profitable. */ else if (tree_fits_uhwi_p (dr_a1->seg_len) && tree_fits_uhwi_p (dr_a2->seg_len)) { unsigned HOST_WIDE_INT seg_len_a1 = tree_to_uhwi (dr_a1->seg_len); unsigned HOST_WIDE_INT seg_len_a2 = tree_to_uhwi (dr_a2->seg_len); dr_a1->seg_len = size_int (MAX (seg_len_a1, diff + seg_len_a2)); do_remove = true; } Here dr_a1 is *a0, dr_a2 is *a1 and dr_b1/dr_b2 is *b, and their ranges is like --------------------------------------------------------------> a0_en a0 a1_end b a1 b_end It should merge range_a0/range_a1 and result in range of a1, but because: 1) seg_len_a1 wraps to a large unsigned integer; 2) seg_len_a2 wraps to a large unsigned integer; 3) diff + seg_len_a2 wraps again to a small unsigned integer; The program falsely merges ranges to range_a0. As a result, loop is vectorized because range_a0 and range_b are considered not alias to each other. This is wrong because *a1 and *b overlaps at the first several iterations and have circle in dependence.
GCC uses vect_factor as minimal segment length for dr_b when merging alias pairs, I think it could be relaxed to vect_factor * abs (DR_STEP (dr_b)). Below test shows this change can merge additional alias checks: #include <stdio.h> int arr[2048]; int foo (int *a, int *b, int len) { int i; int *a1 = a; int *a0 = a1 - 4; for (i = 0; i < len; i++) { *b = *a0 + *a1; b--; a0++; a1++; } return 0; } int main (void) { int *a = &arr[1027]; int *b = &arr[1024]; int i; for (i = 0; i < 2048; i++) arr[i] = i; foo (a, b, 500); for (i = 0; i < 2048; i++) fprintf (stdout, "%d: %d\n", i, arr[i]); return 0; } $ ./gcc -Ofast -fno-inline ./z.c -o z.exe -fdump-tree-vect-details Of course, alias check should fail and vectorized loop in foo should never be executed. Thanks,
On Fri, 19 May 2017, amker at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80815 > > --- Comment #1 from amker at gcc dot gnu.org --- > GCC uses vect_factor as minimal segment length for dr_b when merging alias > pairs, I think it could be relaxed to vect_factor * abs (DR_STEP (dr_b)). > Below test shows this change can merge additional alias checks: But it would make cases aliasing that are not (that's always a trade off between more precise result and less runtime checking).
(In reply to rguenther@suse.de from comment #2) > On Fri, 19 May 2017, amker at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80815 > > > > --- Comment #1 from amker at gcc dot gnu.org --- > > GCC uses vect_factor as minimal segment length for dr_b when merging alias > > pairs, I think it could be relaxed to vect_factor * abs (DR_STEP (dr_b)). > > Below test shows this change can merge additional alias checks: > > But it would make cases aliasing that are not (that's always a trade off > between more precise result and less runtime checking). Hmm, it only does compilation time (and still conservative) check if segment dr_b resides in the gap between dr_a1 and dr_a2. the overall effect would be equal to checking <dr_a1, dr_b>, <gap_segment, dr_b> and <dr_a2, dr_b>?
Also the three cases: /* If the left segment does not extend beyond the start of the right segment the new segment length is that of the right plus the segment distance. */ if (tree_fits_uhwi_p (dr_a1->seg_len) && compare_tree_int (dr_a1->seg_len, diff) <= 0) { dr_a1->seg_len = size_binop (PLUS_EXPR, dr_a2->seg_len, size_int (diff)); do_remove = true; } /* Generally the new segment length is the maximum of the left segment size and the right segment size plus the distance. ??? We can also build tree MAX_EXPR here but it's not clear this is profitable. */ else if (tree_fits_uhwi_p (dr_a1->seg_len) && tree_fits_uhwi_p (dr_a2->seg_len)) { unsigned HOST_WIDE_INT seg_len_a1 = tree_to_uhwi (dr_a1->seg_len); unsigned HOST_WIDE_INT seg_len_a2 = tree_to_uhwi (dr_a2->seg_len); dr_a1->seg_len = size_int (MAX (seg_len_a1, diff + seg_len_a2)); do_remove = true; } /* Now we check if the following condition is satisfied: DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B where DIFF = DR_A2_INIT - DR_A1_INIT. However, SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we have to make a best estimation. We can get the minimum value of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B, then either of the following two conditions can guarantee the one above: 1: DIFF <= MIN_SEG_LEN_B 2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B */ else { unsigned HOST_WIDE_INT min_seg_len_b = (tree_fits_uhwi_p (dr_b1->seg_len) ? tree_to_uhwi (dr_b1->seg_len) : factor); if (diff <= min_seg_len_b || (tree_fits_uhwi_p (dr_a1->seg_len) && diff - tree_to_uhwi (dr_a1->seg_len) < min_seg_len_b)) { dr_a1->seg_len = size_binop (PLUS_EXPR, dr_a2->seg_len, size_int (diff)); do_remove = true; } } Can be merged together. the merged dr_a1->seg_len is dr_a2->seg_len + diff, with an exception checking MAX (seg_len_a1, diff + seg_len_a2). Of course, negative step still needs to be handled.
Author: amker Date: Fri May 26 14:18:26 2017 New Revision: 248512 URL: https://gcc.gnu.org/viewcvs?rev=248512&root=gcc&view=rev Log: PR tree-optimization/80815 * tree-data-ref.c (prune_runtime_alias_test_list): Simplify condition for merging runtime alias checks. Handle negative DR_STEPs. gcc/testsuite * gcc.dg/vect/pr80815-1.c: New test. * gcc.dg/vect/pr80815-2.c: New test. Added: trunk/gcc/testsuite/gcc.dg/vect/pr80815-1.c trunk/gcc/testsuite/gcc.dg/vect/pr80815-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-data-ref.c
Created attachment 41456 [details] sparc-sun-solaris2.12 pr80815-3.c.156t.vect
(In reply to Rainer Orth from comment #6) > Created attachment 41456 [details] > sparc-sun-solaris2.12 pr80815-3.c.156t.vect Thanks for reporting, I will investigate it and disable on targets if it's not a general case. Thanks.
vect_perm added for the the test case. It should be bypassed now on sparc-sun-solaris2.12?
Fixed.