Bug 107344

Summary: GCC/nvptx SESE region optimization
Product: gcc
Component: target
Version: 13.0
Status: NEW ---
Severity: normal
Priority: P3
Target Milestone: ---
Reporter: Thomas Schwinge <tschwinge>
Assignee: Not yet assigned to anyone <unassigned>
CC: vries
Keywords: openacc
Host:
Target: nvptx
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-10-21 00:00:00

Description Thomas Schwinge 2022-10-21 09:24:39 UTC
GCC/nvptx has a "SESE region optimization", <https://inbox.sourceware.org/gcc-patches/564CC75D.3020309@acm.org>.

A "regression" has been introduced by recent commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 "[PR107195] Set range to zero when nonzero mask is 0":

    UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
    [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"

Same for C++.

During investigation of that issue (which I suppose is just one random example), I found that earlier code transformations/optimizations may inhibit this "Neuter whole SESE regions" optimization.

It's unclear to me if this is an actual "problem", which optimization is "more important".
Comment 1 GCC Commits 2022-10-21 09:29:11 UTC
The master branch has been updated by Thomas Schwinge <tschwinge@gcc.gnu.org>:

https://gcc.gnu.org/g:a9de836c2b22f878cff592b96e11c1b95d4d36ee

commit r13-3434-ga9de836c2b22f878cff592b96e11c1b95d4d36ee
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Sun Oct 16 00:07:20 2022 +0200

    Restore 'libgomp.oacc-c-c++-common/nvptx-sese-1.c' SESE regions checking [PR107195, PR107344]
    
    That is, adjust for optimization introduced with recent
    commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
    "[PR107195] Set range to zero when nonzero mask is 0", where GCC now
    understands that after 'r *= 2;', 'r & 1' will never hold here, and thus
    transforms/optimizes/"disturbs" the original code such that GCC/nvptx's later
    "Neuter whole SESE regions" optimization no longer is applicable to it:
    
        UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
        PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
        PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
        [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"
    
    Same for C++.
    
    It's unclear to me if this is an actual "problem", which optimization is "more
    important", so I've filed PR107344 "GCC/nvptx SESE region optimization" to
    capture this question, and here restore what we intend to be testing (to my
    understanding) in 'libgomp.oacc-c-c++-common/nvptx-sese-1.c'.
    
            PR tree-optimization/107195
            PR target/107344
            libgomp/
            * testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c: Restore SESE
            regions checking.
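
The arithmetic fact the commit message relies on (after 'r *= 2;', 'r & 1' never holds) can be sketched in isolation. This is only an illustration: the function name is hypothetical, it is not taken from 'libgomp.oacc-c-c++-common/nvptx-sese-1.c', and it uses 'unsigned' so that the doubling is well defined even when it wraps.

```c
/* Hypothetical helper, not from nvptx-sese-1.c: doubles R and then
   tests the low bit, mirroring the 'r *= 2;' / 'r & 1' shape described
   in the commit message.  */
static unsigned
low_bit_after_doubling (unsigned r)
{
  r *= 2;        /* Doubling clears bit 0, also when the product wraps.  */
  return r & 1;  /* Therefore always 0; a compiler that tracks nonzero
                    bits may fold a branch on this value away.  */
}
```

Once a compiler knows the result is always zero, any code guarded by such a test is unreachable and the control flow around it can be simplified, which changes the shape of the code that later passes get to see.
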
Comment 2 Thomas Schwinge 2022-10-21 09:34:05 UTC
Regression 'libgomp.oacc-c-c++-common/nvptx-sese-1.c' addressed, but this remains to be looked into:

(In reply to Thomas Schwinge from comment #0)
> GCC/nvptx has a "SESE region optimization", <https://inbox.sourceware.org/gcc-patches/564CC75D.3020309@acm.org>.
> 
> [...]
> 
> During investigation of that issue (which I suppose is just one random example), I found that earlier code transformations/optimizations may inhibit this "Neuter whole SESE regions" optimization.
> 
> It's unclear to me if this is an actual "problem", which optimization is "more important".
Comment 3 Richard Biener 2022-10-21 11:17:14 UTC
I guess the reorg pass simply has to be enhanced to deal with "distorted" regions, or alternatively those regions need to be protected somehow.

I'm just guessing that jump threading messes things up here?
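
For readers unfamiliar with jump threading, here is a hand-written sketch of the general effect. It is not derived from the failing test, both function names are hypothetical, and whether jump threading is really what disturbs the SESE regions here is only the guess stated above. The two functions compute the same result; the second shows the shape a compiler might produce once it knows the outcome of the second 'flag' test on each incoming path.

```c
/* Before: both paths meet in a shared join block ('r *= 3') and then
   re-test 'flag'.  The join block has one entry and one exit.  */
static int
before_threading (int flag, int x)
{
  int r = x;
  if (flag)
    r += 1;
  r *= 3;               /* Shared join block.  */
  if (flag)
    r -= 2;
  return r;
}

/* After (threaded by hand): the second test is resolved per path, so
   the join block is duplicated into each arm and no longer exists as
   a single shared region.  */
static int
after_threading (int flag, int x)
{
  int r = x;
  if (flag)
    {
      r += 1;
      r *= 3;           /* Copy of the join block on the 'flag' path.  */
      r -= 2;
    }
  else
    r *= 3;             /* Copy of the join block on the other path.  */
  return r;
}
```

The point of the sketch is only that block duplication changes which blocks form single-entry, single-exit regions, so a later pass looking for such regions may find different ones, or none.
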