Created attachment 51875 [details] dct_test.cc Originally noticed a problem as failed tests on libjxl-0.5. I extracted ~10KB self-contained single-file example. It still could be reduced, but it's quite tangled. Could you see what is obviously wrong with it? Attached the reproducer as dct_test.cc: $ g++-12.0.0 -std=c++11 -O2 -fno-tree-vectorize dct_test.cc -o dct_test $ g++-12.0.0 -std=c++11 -O2 -fno-tree-vectorize -fno-ipa-modref dct_test.cc -o dct_test1 # good: $ ./dct_test1 OK: Good accuracy: exp=0.000001 act=0.000000 OK: Good accuracy: exp=0.000001 act=0.000000 # bad: $ ./dct_test OK: Good accuracy: exp=0.000001 act=0.000000 ERROR: Too low accuracy: exp=0.000001 act=1.000000 $ g++-12.0.0 -v Using built-in specs. COLLECT_GCC=/nix/store/2lxwqh3k88x4jwyfwlsfnwrp78yq2ah2-gcc-12.0.0/bin/g++ COLLECT_LTO_WRAPPER=/nix/store/2lxwqh3k88x4jwyfwlsfnwrp78yq2ah2-gcc-12.0.0/libexec/gcc/x86_64-unknown-linux-gnu/12.0.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: Thread model: posix Supported LTO compression algorithms: zlib gcc version 12.0.0 20211121 (experimental) (GCC)
It fails with ./xgcc -B ./ -O2 d.ii -fdbg-cnt=ipa_mod_ref_pta:189 -fdump-tree-all-details -fdump-ipa-all-details and works ./xgcc -B ./ -O2 d.ii -fdbg-cnt=ipa_mod_ref_pta:188 -fdump-tree-all-details -fdump-ipa-all-details The difference in optimized dump is: int main () { struct DCTFrom D.11418; @@ -2805,12 +2810,7 @@ float x[4]; struct DCTTo D.11356; struct DCTFrom D.11355; - float _3; - float _4; - double _6; struct FILE * stderr.3_8; - double _9; - struct FILE * stderr.4_11; float _12; float _13; double _15; @@ -2996,30 +2996,10 @@ {anonymous}::IDCT1DWrapper.constprop<4, 1, {anonymous}::DCTFrom, {anonymous}::DCTTo> (&D.11400, &D.11356); D.11400 ={v} {CLOBBER}; D.11356 ={v} {CLOBBER}; - _3 = out[2]; - _4 = _3 - 1.0e+0; - actual_accuracy_5 = ABS_EXPR <_4>; - if (actual_accuracy_5 > 9.999999974752427078783512115478515625e-7) - goto <bb 11>; [0.04%] - else - goto <bb 12>; [99.96%] - - <bb 11> [local count: 429325]: - _6 = (double) actual_accuracy_5; stderr.3_8 = stderr; - fprintf (stderr.3_8, "ERROR: Too low accuracy: exp=%f act=%f\n", 9.999999974752427078783512115478515625e-7, _6); + fprintf (stderr.3_8, "ERROR: Too low accuracy: exp=%f act=%f\n", 9.999999974752427078783512115478515625e-7, 1.0e+0); exit (1); - <bb 12> [local count: 1072883004]: - _9 = (double) actual_accuracy_5; - stderr.4_11 = stderr; - fprintf (stderr.4_11, "OK: Good accuracy: exp=%f act=%f\n", 9.999999974752427078783512115478515625e-7, _9); - x ={v} {CLOBBER}; - out ={v} {CLOBBER}; - coeffs ={v} {CLOBBER}; - scratch_space ={v} {CLOBBER}; - return 0; - } And I suppose we are not expected to optimize out the "Good accuracy" message :) So it looks out is modified by {anonymous}::IDCT1DWrapper.constprop<4, 1, {anonymous}::DCTFrom, {anonymous}::DCTTo> (&D.11400, &D.11356); but for some reason ipa propagation gets no_indirect_clobber for param1. This seems wrong since to->data is written to, so it is an indirect clobber. I may be able to look more into this only tomorrow - it is bit late.
Confirmed, I have not reduced it but here is what is happening. outD.25694 = {}; ... MEM[(struct DCTToD.21174 *)&D.25700 clique 3 base 1].data_D.21196 = &outD.25694; ... _ZN12_GLOBAL__N_121GenericTransposeBlockILm1ELm4ENS_7DCTFromENS_5DCTToEEEvRKT1_RKT2_.constprop.0D.25466 (&D.25767, &D.25766); ... _ZN12_GLOBAL__N_113IDCT1DWrapperILm4ELm1ENS_7DCTFromENS_5DCTToEEEvRKT1_RKT2_.constprop.0D.25467 (&D.25768, &D.25700); ... _3 = outD.25694[2]; FRE thinks _ZN12_GLOBAL__N_113IDCT1DWrapperILm4ELm1ENS_7DCTFromENS_5DCTToEEEvRKT1_RKT2_.constprop.0 does not touch out even though D.25700 is passed to it ....
Caused by stupid thinko (also present in gcc11). I compute right min_flags but then use wrong value (without dereference applied). I am testing the following. diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c index c2edc0d28a6..9e537b04196 100644 --- a/gcc/ipa-modref.c +++ b/gcc/ipa-modref.c @@ -4201,7 +4201,7 @@ update_escape_summary_1 (cgraph_edge *e, if (ee->direct && !em->direct) min_flags = deref_flags (min_flags, ignore_stores); struct escape_entry entry = {em->parm_index, ee->arg, - ee->min_flags, + min_flags, ee->direct & em->direct}; sum->esc.safe_push (entry); }
Fixed on trunk by g:a70faf6e4df7481c2c9a08a06657c20beb3043de (sorry for cut&pasting wrong PR number). GCC 11 is also affected, so I will backport it there are tonight testing.
> GCC 11 is also affected, so I will backport it there are tonight testing. Please fix the PR number in the backports.
(In reply to Jan Hubicka from comment #4) > Fixed on trunk by g:a70faf6e4df7481c2c9a08a06657c20beb3043de (sorry for > cut&pasting wrong PR number). Tested on full libjxl-0.5 testsuite. All works now. Thank you!
The releases/gcc-11 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:f0c601ed0080ab9ebe2bc0c23d41471531e2af02 commit r11-9791-gf0c601ed0080ab9ebe2bc0c23d41471531e2af02 Author: Jan Hubicka <jh@suse.cz> Date: Fri Nov 26 13:36:35 2021 +0100 Fix handling of in_flags in update_escape_summary_1 update_escape_summary_1 has thinko where it compues proper min_flags but then stores original value (ignoring the fact whether there was a dereference in the escape point). PR ipa/103432 * ipa-modref.c (update_escape_summary_1): Fix handling of min_flags. (cherry picked from commit a70faf6e4df7481c2c9a08a06657c20beb3043de)
Should be fixed now.