Summary: | [10 Regression] x86-64 gcc 10.1 using -O3 leads to wrong calculation since r10-1882-g831e688af50c5f77 | ||
---|---|---|---|
Product: | gcc | Reporter: | Ingo Weyrich <heckflosse67> |
Component: | tree-optimization | Assignee: | Richard Biener <rguenth> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | marxin, rguenth |
Priority: | P2 | Keywords: | wrong-code |
Version: | 10.1.0 | ||
Target Milestone: | 10.2 | ||
Host: | Target: | ||
Build: | Known to work: | 10.1.1, 11.0 | |
Known to fail: | 10.1.0 | Last reconfirmed: | 2020-07-09 00:00:00 |
Description
Ingo Weyrich
2020-07-09 11:38:23 UTC
Confirmed, started with r10-1882-g831e688af50c5f77. Confirmed. The i == 1 lane is different. We're using standard interleaving vectorization here, the innermost two loops are unrolled and rgb_cam is elided. Note eventually we optimize the whole loop at compile-time to <bb 2> [local count: 89478486]: MEM <vector(2) double> [(double *)&xyz_cam] = { 2.97789709999999985257090884260833263397216796875e+0, 3.94211709999999992959374139900319278240203857421875e+0 }; MEM <vector(2) double> [(double *)&xyz_cam + 16B] = { 4.9063371000000000066165739553980529308319091796875e+0, 3.291832700000000055950977184693329036235809326171875e+0 }; MEM <vector(2) double> [(double *)&xyz_cam + 32B] = { 4.06932820000000017301999832852743566036224365234375e+0, 4.8468236999999998459998096222989261150360107421875e+0 }; MEM <vector(2) double> [(double *)&xyz_cam + 48B] = { 5.40156330000000028945805752300657331943511962890625e+0, 6.2267732999999996224005371914245188236236572265625e+0 }; xyz_cam[2][2] = 7.051983299999999843521436559967696666717529296875e+0; Value numbering stmt = vect__68.15_89 = MEM <vector(2) double> [(double *)vectp_xyz_sRGB.12_85]; RHS MEM <vector(2) double> [(double *)vectp_xyz_sRGB.12_85] simplified to { 1.4308039999999999647428694515838287770748138427734375e-1, 0.0 } Setting value number of vect__68.15_89 to { 1.4308039999999999647428694515838287770748138427734375e-1, 0.0 } (changed) I guess we might be confused by the read crossing CTORs. Interestingly it works for int[4] but fails for int[2]. typedef int T; static const T a[2][3] = { { 1, 2, 3 }, { 4, 5, 6 } }; typedef T v2 __attribute__((vector_size(2*sizeof(T)))); int main() { const T *p = &a[0][2]; v2 x = *(const v2 *)p; T z = x[1]; if (z != 4) __builtin_abort (); return 0; } The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:9ddea9306251b7d4e4fd1d67a5941ef7448b2e66 commit r11-1972-g9ddea9306251b7d4e4fd1d67a5941ef7448b2e66 Author: Richard Biener <rguenther@suse.de> Date: Thu Jul 9 16:06:04 2020 +0200 fixup BIT_FIELD_REF detection in SLP discovery This fixes a thinko where we end up combining a BIT_FIELD_REF and a memory access, fixed by checking all stmts are a load or none. 2020-07-09 Richard Biener <rguenther@suse.de> PR tree-optimization/96133 * tree-vect-slp.c (vect_build_slp_tree_1): Compare load_p status between stmts. (In reply to CVS Commits from comment #5) > The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: > > https://gcc.gnu.org/g:9ddea9306251b7d4e4fd1d67a5941ef7448b2e66 > > commit r11-1972-g9ddea9306251b7d4e4fd1d67a5941ef7448b2e66 > Author: Richard Biener <rguenther@suse.de> > Date: Thu Jul 9 16:06:04 2020 +0200 > > fixup BIT_FIELD_REF detection in SLP discovery > > This fixes a thinko where we end up combining a BIT_FIELD_REF > and a memory access, fixed by checking all stmts are a load or > none. > > 2020-07-09 Richard Biener <rguenther@suse.de> > > PR tree-optimization/96133 > * tree-vect-slp.c (vect_build_slp_tree_1): Compare load_p > status between stmts. Whoops - this was for PR96134. Richard, does you last comment mean, you already fixed it?. How can I test? Maybe using godbolt? Ingo (In reply to Ingo Weyrich from comment #7) > Richard, does you last comment mean, you already fixed it?. How can I test? > Maybe using godbolt? > > Ingo Sorry, no - a test is still in development, the last comment meant I mistyped the bug number in an unrelated commit ;) The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:6e41c27bf549d957eb399d39d7d0c213f8733351 commit r11-1981-g6e41c27bf549d957eb399d39d7d0c213f8733351 Author: Richard Biener <rguenther@suse.de> Date: Thu Jul 9 16:03:45 2020 +0200 fix constant folding from array CTORs This fixes the case where we try to fold a read from an array initalizer and happen to cross the boundary of multiple CTORs which isn't really supported. For the interesting cases like the testcase we actually handle the folding by encoding the whole initializer. 2020-07-10 Richard Biener <rguenther@suse.de> PR tree-optimization/96133 * gimple-fold.c (fold_array_ctor_reference): Do not recurse to folding a CTOR that does not fully cover the asked for object. * gcc.dg/torture/pr96133.c: New testcase. The releases/gcc-10 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:8614106f121db118f9db260b9949883485d0bbf6 commit r10-8455-g8614106f121db118f9db260b9949883485d0bbf6 Author: Richard Biener <rguenther@suse.de> Date: Thu Jul 9 16:03:45 2020 +0200 fix constant folding from array CTORs This fixes the case where we try to fold a read from an array initalizer and happen to cross the boundary of multiple CTORs which isn't really supported. For the interesting cases like the testcase we actually handle the folding by encoding the whole initializer. 2020-07-10 Richard Biener <rguenther@suse.de> PR tree-optimization/96133 * gimple-fold.c (fold_array_ctor_reference): Do not recurse to folding a CTOR that does not fully cover the asked for object. * gcc.dg/torture/pr96133.c: New testcase. (cherry picked from commit 6e41c27bf549d957eb399d39d7d0c213f8733351) Fixed. Thanks for fixing and kudos for your very fast response time! |