Bug 87871 - [9/10 Regression] testcases fail after r265398 on arm
Summary: [9/10 Regression] testcases fail after r265398 on arm
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 9.0
Importance: P2 normal
Target Milestone: 9.3
Assignee: Not yet assigned to anyone
URL: https://gcc.gnu.org/ml/gcc-patches/20...
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2018-11-02 20:50 UTC by Christophe Lyon
Modified: 2019-08-12 08:57 UTC
CC List: 11 users

See Also:
Host:
Target: arm
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-12-14 00:00:00


Attachments
Proposed patch (610 bytes, patch)
2019-04-17 23:54 UTC, Peter Bergner
Updated patch (613 bytes, patch)
2019-04-18 01:28 UTC, Peter Bergner

Description Christophe Lyon 2018-11-02 20:50:37 UTC
The following tests fail on arm after r265398 (combine: Do not combine moves from hard registers).

    gcc.c-torture/execute/920428-2.c   -O2  execution test
    gcc.c-torture/execute/920428-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.c-torture/execute/920428-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.c-torture/execute/920428-2.c   -O3 -g  execution test
    gcc.c-torture/execute/920501-7.c   -O2  execution test
    gcc.c-torture/execute/920501-7.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.c-torture/execute/920501-7.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.c-torture/execute/920501-7.c   -O3 -g  execution test
    gcc.c-torture/execute/built-in-setjmp.c   -O2  execution test
    gcc.c-torture/execute/built-in-setjmp.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.c-torture/execute/built-in-setjmp.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.c-torture/execute/built-in-setjmp.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    gcc.c-torture/execute/built-in-setjmp.c   -O3 -g  execution test
    gcc.c-torture/execute/builtins/memcpy-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/memcpy-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/memcpy-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/memcpy-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/memmove-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/memmove-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/memmove-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/memmove-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/mempcpy-chk.c execution,  -O3 -g 
    gcc.c-torture/execute/builtins/memset-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/memset-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/memset-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/memset-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/memset-chk.c execution,  -O3 -g 
    gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O3 -g 
    gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O3 -g 
    gcc.c-torture/execute/builtins/stpcpy-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/stpcpy-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/stpcpy-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/stpcpy-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/stpcpy-chk.c execution,  -O3 -g 
    gcc.c-torture/execute/builtins/stpncpy-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/stpncpy-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/stpncpy-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/stpncpy-chk.c execution,  -O3 -g 
    gcc.c-torture/execute/builtins/strcat-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/strcat-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/strcat-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/strcat-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/strcat-chk.c execution,  -O3 -g 
    gcc.c-torture/execute/builtins/strcpy-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/strcpy-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/strcpy-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/strcpy-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/strcpy-chk.c execution,  -O3 -g 
    gcc.c-torture/execute/builtins/strncat-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/strncat-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/strncat-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/strncat-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/strncat-chk.c execution,  -O3 -g 
    gcc.c-torture/execute/builtins/strncpy-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/strncpy-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/strncpy-chk.c execution,  -O3 -g 
    gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O3 -g 
    gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O2 
    gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none 
    gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 
    gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
    gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O3 -g 

    gcc.c-torture/execute/comp-goto-2.c   -O2  execution test
    gcc.c-torture/execute/comp-goto-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.c-torture/execute/comp-goto-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.c-torture/execute/comp-goto-2.c   -O3 -g  execution test
    gcc.c-torture/execute/nestfunc-5.c   -O2  execution test
    gcc.c-torture/execute/nestfunc-5.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.c-torture/execute/nestfunc-5.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.c-torture/execute/nestfunc-5.c   -O3 -g  execution test
    gcc.c-torture/execute/pr24135.c   -O2  execution test
    gcc.c-torture/execute/pr24135.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.c-torture/execute/pr24135.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.c-torture/execute/pr24135.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    gcc.c-torture/execute/pr24135.c   -O3 -g  execution test
    gcc.c-torture/execute/pr51447.c   -O2  execution test
    gcc.c-torture/execute/pr51447.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.c-torture/execute/pr51447.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.c-torture/execute/pr51447.c   -O3 -g  execution test
    gcc.c-torture/execute/pr60003.c   -O2  execution test
    gcc.c-torture/execute/pr60003.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.c-torture/execute/pr60003.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    gcc.c-torture/execute/pr60003.c   -O3 -g  execution test

    gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split live-range of register"
    gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
    gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump ira "Split live-range of register"
    gcc.dg/non-local-goto-1.c execution test
    gcc.dg/non-local-goto-2.c execution test



    gcc.dg/torture/stackalign/comp-goto-1.c   -O2  execution test
    gcc.dg/torture/stackalign/comp-goto-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.dg/torture/stackalign/comp-goto-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test
    gcc.dg/torture/stackalign/comp-goto-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.dg/torture/stackalign/comp-goto-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -fpic execution test
    gcc.dg/torture/stackalign/comp-goto-1.c   -O2 -fpic execution test
    gcc.dg/torture/stackalign/comp-goto-1.c   -O3 -g  execution test
    gcc.dg/torture/stackalign/comp-goto-1.c   -O3 -g -fpic execution test
    gcc.dg/torture/stackalign/nested-5.c   -O2  execution test
    gcc.dg/torture/stackalign/nested-5.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.dg/torture/stackalign/nested-5.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test
    gcc.dg/torture/stackalign/nested-5.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.dg/torture/stackalign/nested-5.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -fpic execution test
    gcc.dg/torture/stackalign/nested-5.c   -O2 -fpic execution test
    gcc.dg/torture/stackalign/nested-5.c   -O3 -g  execution test
    gcc.dg/torture/stackalign/nested-5.c   -O3 -g -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-1.c   -O2  execution test
    gcc.dg/torture/stackalign/non-local-goto-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.dg/torture/stackalign/non-local-goto-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.dg/torture/stackalign/non-local-goto-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-1.c   -O2 -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    gcc.dg/torture/stackalign/non-local-goto-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-1.c   -O3 -g  execution test
    gcc.dg/torture/stackalign/non-local-goto-1.c   -O3 -g -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-2.c   -O2  execution test
    gcc.dg/torture/stackalign/non-local-goto-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.dg/torture/stackalign/non-local-goto-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.dg/torture/stackalign/non-local-goto-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-2.c   -O2 -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    gcc.dg/torture/stackalign/non-local-goto-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-2.c   -O3 -g  execution test
    gcc.dg/torture/stackalign/non-local-goto-2.c   -O3 -g -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-3.c   -O2  execution test
    gcc.dg/torture/stackalign/non-local-goto-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.dg/torture/stackalign/non-local-goto-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.dg/torture/stackalign/non-local-goto-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-3.c   -O2 -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-3.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    gcc.dg/torture/stackalign/non-local-goto-3.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-3.c   -O3 -g  execution test
    gcc.dg/torture/stackalign/non-local-goto-3.c   -O3 -g -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-4.c   -O2  execution test
    gcc.dg/torture/stackalign/non-local-goto-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.dg/torture/stackalign/non-local-goto-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.dg/torture/stackalign/non-local-goto-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-4.c   -O2 -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-4.c   -O3 -g  execution test
    gcc.dg/torture/stackalign/non-local-goto-4.c   -O3 -g -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-5.c   -O2  execution test
    gcc.dg/torture/stackalign/non-local-goto-5.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.dg/torture/stackalign/non-local-goto-5.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-5.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.dg/torture/stackalign/non-local-goto-5.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-5.c   -O2 -fpic execution test
    gcc.dg/torture/stackalign/non-local-goto-5.c   -O3 -g  execution test
    gcc.dg/torture/stackalign/non-local-goto-5.c   -O3 -g -fpic execution test
    gcc.dg/torture/stackalign/setjmp-1.c   -O2  execution test
    gcc.dg/torture/stackalign/setjmp-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.dg/torture/stackalign/setjmp-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test
    gcc.dg/torture/stackalign/setjmp-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
    gcc.dg/torture/stackalign/setjmp-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -fpic execution test
    gcc.dg/torture/stackalign/setjmp-1.c   -O2 -fpic execution test
    gcc.dg/torture/stackalign/setjmp-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    gcc.dg/torture/stackalign/setjmp-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions -fpic execution test
    gcc.dg/torture/stackalign/setjmp-1.c   -O3 -g  execution test
    gcc.dg/torture/stackalign/setjmp-1.c   -O3 -g -fpic execution test
    gcc.dg/torture/stackalign/setjmp-3.c   -O2  execution test
    gcc.dg/torture/stackalign/setjmp-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.dg/torture/stackalign/setjmp-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test
    gcc.dg/torture/stackalign/setjmp-3.c   -O2 -fpic execution test
    gcc.dg/torture/stackalign/setjmp-3.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    gcc.dg/torture/stackalign/setjmp-3.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions -fpic execution test
    gcc.dg/torture/stackalign/setjmp-3.c   -O3 -g  execution test
    gcc.dg/torture/stackalign/setjmp-3.c   -O3 -g -fpic execution test
    gcc.dg/torture/stackalign/setjmp-4.c   -O2  execution test
    gcc.dg/torture/stackalign/setjmp-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
    gcc.dg/torture/stackalign/setjmp-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test
    gcc.dg/torture/stackalign/setjmp-4.c   -O2 -fpic execution test
    gcc.dg/torture/stackalign/setjmp-4.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    gcc.dg/torture/stackalign/setjmp-4.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions -fpic execution test
    gcc.dg/torture/stackalign/setjmp-4.c   -O3 -g  execution test
    gcc.dg/torture/stackalign/setjmp-4.c   -O3 -g -fpic execution test


    gcc.target/arm/addr-modes-float.c scan-assembler vst3.8\t{d[02468], d[02468], d[02468]}, \\[r[0-9]+\\]!
    gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times strh\\tr[0-9]+ 2
    gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vst1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
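
Editorial illustration: many of the failing execution tests listed above (built-in-setjmp.c, the non-local-goto and nestfunc tests, the setjmp tests) exercise non-local control transfers. The following is a minimal sketch of that kind of code, not the source of any testsuite file. The names jump_buf, jump_back, and roundtrip are invented for this example, and it assumes a compiler that provides the GCC builtins __builtin_setjmp and __builtin_longjmp.

```c
/* Minimal sketch of a non-local jump through the GCC builtins.
   Not taken from the testsuite; all names here are invented. */

static void *jump_buf[5];   /* __builtin_setjmp expects a five-word buffer */

__attribute__((noinline))
static void jump_back(void)
{
    /* The second argument of __builtin_longjmp must be the constant 1.
       The call lives in a separate function because it must not appear
       in the same function as the matching __builtin_setjmp. */
    __builtin_longjmp(jump_buf, 1);
}

/* Returns 1 if control came back through the jump, 0 otherwise. */
int roundtrip(void)
{
    volatile int reached = 0;

    if (__builtin_setjmp(jump_buf) == 0) {
        reached = 1;
        jump_back();        /* does not return normally */
        return 0;
    }
    return reached;
}
```

If the register state is not restored correctly across the jump, a program of this shape is the kind that would be expected to misbehave at run time.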
Comment 1 Segher Boessenkool 2018-11-05 21:18:54 UTC
Author: segher
Date: Mon Nov  5 21:18:22 2018
New Revision: 265821

URL: https://gcc.gnu.org/viewcvs?rev=265821&root=gcc&view=rev
Log:
combine: Don't make an intermediate reg for assigning to sfp (PR87871)

The code with an intermediate register is perfectly fine, but LRA
apparently cannot handle the resulting code, or perhaps something else
is wrong.  In either case, making an extra temporary will not likely
help here, so let's just skip it.


	PR rtl-optimization/87871
	* combine.c (make_more_copies): Skip if dest is frame_pointer_rtx.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
Comment 2 Martin Liška 2018-11-20 08:50:07 UTC
Segher: Can the bug be marked as resolved?
Comment 3 Segher Boessenkool 2018-11-20 12:46:22 UTC
I don't know, this is up to the arm people.  I don't know if all problems
reported here are fixed now.
Comment 4 Christophe Lyon 2018-11-20 15:11:01 UTC
As of r266293, the following regressions reported here are still failing:
FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
FAIL: gcc.target/arm/addr-modes-float.c scan-assembler vst3.8\t{d[02468], d[02468], d[02468]}, \\[r[0-9]+\\]!
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times strh\\tr[0-9]+ 2
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vst1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
Comment 5 Segher Boessenkool 2018-11-20 15:30:56 UTC
The first one just needs an xfail.  I don't know if it should be *-*-* there
or only arm*-*-* should be added.

The other two need some debugging by someone who knows the target and/or
these tests.
Comment 6 Ramana Radhakrishnan 2018-12-14 09:43:37 UTC
(In reply to Segher Boessenkool from comment #5)
> The first one just needs an xfail.  I don't know if it should be *-*-* there
> or only arm*-*-* should be added.
> 
> The other two need some debugging by someone who knows the target and/or
> these tests.

For the addr-modes-float.c case, additional vmovs are being generated, so this is certainly a regression.

--- 8.s	2018-12-14 09:41:04.367843079 +0000
+++ addr-modes-float.s	2018-12-14 09:40:39.907980812 +0000
@@ -139,10 +139,13 @@
 	@ args = 0, pretend = 0, frame = 0
 	@ frame_needed = 0, uses_anonymous_args = 0
 	@ link register save eliminated.
+	vmov	q8, q0  @ ti
 	mov	r3, r0
+	vmov	q9, q1  @ ti
 	add	r0, r0, #48
-	vst3.8	{d0, d2, d4}, [r3]!
-	vst3.8	{d1, d3, d5}, [r3]
+	vmov	q10, q2  @ ti
+	vst3.8	{d16, d18, d20}, [r3]!
+	vst3.8	{d17, d19, d21}, [r3]
Comment 7 Ramana Radhakrishnan 2018-12-14 09:44:04 UTC
Confirmed.
Comment 8 Wilco 2019-02-06 17:14:48 UTC
(In reply to Segher Boessenkool from comment #5)
> The first one just needs an xfail.  I don't know if it should be *-*-* there
> or only arm*-*-* should be added.
> 
> The other two need some debugging by someone who knows the target and/or
> these tests.

The previous code for Arm was:

	cbz	r0, .L5
	push	{r4, lr}
	mov	r4, r0
	bl	foo
	movw	r2, #:lower16:.LANCHOR0
	movt	r2, #:upper16:.LANCHOR0
	add	r4, r4, r0
	str	r4, [r2]
	pop	{r4, pc}
.L5:
	movs	r0, #1
	bx	lr

Now it fails to shrinkwrap:

	push	{r4, lr}
	mov	r4, r0
	cmp	r4, #0
	moveq	r0, #1
	beq	.L3
	bl	foo
	ldr	r2, .L7
	add	r3, r4, r0
	str	r3, [r2]
.L3:
	pop	{r4, lr}
	bx	lr

It seems shrinkwrapping is more random: sometimes it's done as expected, sometimes it is not. It was more consistent on older GCC versions.
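
Editorial illustration: the two assembly listings above are consistent with a C function of roughly the following shape. This is a reconstruction from the assembly, not the actual test source. The names bar and g and the body of foo are assumptions; only the call to foo and the store to a global are visible in the listings.

```c
/* Hypothetical reconstruction from the assembly above. */

long g;   /* stands in for the object addressed through .LANCHOR0 */

__attribute__((noinline))
long foo(long a)
{
    return a + 5;   /* arbitrary body; only the call itself matters */
}

long bar(long a)
{
    if (a == 0)
        return 1;       /* early exit: needs no saved registers */

    long r = foo(a);    /* a must survive the call, hence r4 */
    g = a + r;          /* add, then store to the global */
    return r;           /* result of foo is returned unchanged */
}
```

In the older code the test of a is done directly on r0, so the early-exit path runs without the push. In the newer code the test reads r4, which pulls the push in front of the branch.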
Comment 9 Richard Earnshaw 2019-04-05 10:32:33 UTC
(In reply to Wilco from comment #8)
> (In reply to Segher Boessenkool from comment #5)
> > The first one just needs an xfail.  I don't know if it should be *-*-* there
> > or only arm*-*-* should be added.
> > 
> > The other two need some debugging by someone who knows the target and/or
> > these tests.
> 
> The previous code for Arm was:
> 
> 	cbz	r0, .L5
> 	push	{r4, lr}
> 	mov	r4, r0
> 	bl	foo
> 	movw	r2, #:lower16:.LANCHOR0
> 	movt	r2, #:upper16:.LANCHOR0
> 	add	r4, r4, r0
> 	str	r4, [r2]
> 	pop	{r4, pc}
> .L5:
> 	movs	r0, #1
> 	bx	lr
> 
> Now it fails to shrinkwrap:
> 
> 	push	{r4, lr}
> 	mov	r4, r0
> 	cmp	r4, #0
> 	moveq	r0, #1
> 	beq	.L3
> 	bl	foo
> 	ldr	r2, .L7
> 	add	r3, r4, r0
> 	str	r3, [r2]
> .L3:
> 	pop	{r4, lr}
> 	bx	lr
> 
> It seems shrinkwrapping is more random: sometimes it's done as expected,
> sometimes it is not. It was more consistent on older GCC versions.

This looks like another fallout of not allowing combine to merge with hard regs.  Previously the CBZ could be moved outside of the prologue because it operated directly on the incoming hard reg.  Now it only sees the value after the copy into the pseudo, which is a call-saved reg because it's live over the call.
Comment 10 Richard Earnshaw 2019-04-05 10:34:48 UTC
I wonder if this could be picked up in the post-reload CSE pass?  (i.e., rewriting the CBZ to use the incoming hard reg?)
Comment 11 Segher Boessenkool 2019-04-06 04:58:00 UTC
(In reply to Wilco from comment #8)
> 	push	{r4, lr}
> 	mov	r4, r0
> 	cmp	r4, #0

Why does it copy r0 to r4 and then compare r4?  On more modern machines it
is faster to compare r0 itself, and it would allow shrink-wrapping to work
fine here (well, need to move the assignment to r4 down to the block where
it is used, but something will certainly do that, and it is one of the
shrink-wrapping improvements I want to do for GCC 10).

> It seems shrinkwrapping is more random: sometimes it's done as expected,
> sometimes it is not. It was more consistent on older GCC versions.

Shrink-wrapping is very predictable.  But no block where a non-volatile
register is used or set will get shrink-wrapped.  This limitation has
existed since forever.
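
Editorial illustration: the limitation described above can be modelled with a toy predicate. This is a sketch of the rule as stated in the comment, not GCC's shrink-wrapping code. The function names and the bitmask encoding are invented, and the example assumes r4 to r11 are the call-saved core registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit n stands for core register rn. */
#define REG(n) (UINT32_C(1) << (n))

/* Assumption: r4-r11 are the call-saved ("non-volatile") registers. */
#define CALL_SAVED_MASK UINT32_C(0x0FF0)

/* A block that uses or sets any call-saved register must sit behind
   the prologue, so it cannot be shrink-wrapped. */
bool block_needs_prologue(uint32_t regs_used_or_set)
{
    return (regs_used_or_set & CALL_SAVED_MASK) != 0;
}

/* Old entry block: "cbz r0, .L5" touches only r0. */
bool old_entry_needs_prologue(void)
{
    return block_needs_prologue(REG(0));
}

/* New entry block: "mov r4, r0; cmp r4, #0" sets and uses r4. */
bool new_entry_needs_prologue(void)
{
    return block_needs_prologue(REG(0) | REG(4));
}
```

Under this model the old entry block can run before the prologue and the new one cannot, which matches the two listings quoted from comment #8.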
Comment 12 Segher Boessenkool 2019-04-06 05:31:11 UTC
(In reply to Segher Boessenkool from comment #11)
> (In reply to Wilco from comment #8)
> > 	mov	r4, r0
> > 	cmp	r4, #0
> 
> Why does it copy r0 to r4 and then compare r4?  On more modern machines it
> is faster to compare r0 itself, and it would allow shrink-wrapping to work
> fine here

We get this in combine:

Trying 2 -> 7:
    2: r112:SI=r116:SI
      REG_DEAD r116:SI
    7: cc:CC=cmp(r112:SI,0)
Successfully matched this instruction:
(parallel [
        (set (reg:CC 100 cc)
            (compare:CC (reg:SI 116)
                (const_int 0 [0])))
        (set (reg/v:SI 112 [ a ])
            (reg:SI 116))
    ])

(that's *movsi_compare0).


This is preceded by

(insn 50 3 7 2 (set (reg:SI 116)
        (reg:SI 0 r0 [ a ])) "ira-shrinkwrap-prep-1.c":14:1 179 {*arm_movsi_insn}
     (nil))


And it stays that way until IRA, which does

Disposition:
    0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
    5:r116 l0     4    4:r117 l0     0

If r116 had been allocated hard reg 0 all would be fine (and we know r116
dies in insn 7 already, there is a REG_DEAD note on it).
Comment 13 Richard Biener 2019-04-11 09:38:41 UTC
Can we xfail/defer the bug?
Comment 14 Peter Bergner 2019-04-11 20:33:46 UTC
(In reply to Segher Boessenkool from comment #12)
> Disposition:
>     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
>     5:r116 l0     4    4:r117 l0     0
> 
> If r116 had been allocated hard reg 0 all would be fine (and we know r116
> dies in insn 7 already, there is a REG_DEAD note on it).

What was the order of assignment?  If r116 conflicts with r111 or r117 and they were assigned first, then that's just bad luck.
Comment 15 Segher Boessenkool 2019-04-12 01:09:36 UTC
      Forming thread by copy 0:a0r111-a4r117 (freq=500):
        Result (freq=3500): a0r111(2500) a4r117(1000)
      Forming thread by copy 2:a3r112-a5r116 (freq=125):
        Result (freq=4500): a3r112(1500) a5r116(3000)
      Forming thread by copy 1:a2r114-a3r112 (freq=62):
        Result (freq=5500): a2r114(1000) a3r112(1500) a5r116(3000)
      Pushing a1(r113,l0)(cost 0)
      Pushing a4(r117,l0)(cost 0)
      Pushing a0(r111,l0)(cost 0)
      Pushing a2(r114,l0)(cost 0)
      Pushing a3(r112,l0)(cost 0)
      Pushing a5(r116,l0)(cost 0)
      Popping a5(r116,l0)  -- assign reg 3
      Popping a3(r112,l0)  -- assign reg 4
      Popping a2(r114,l0)  -- assign reg 3
      Popping a0(r111,l0)  -- assign reg 0
      Popping a4(r117,l0)  -- assign reg 0
      Popping a1(r113,l0)  -- assign reg 2
Assigning 4 to a5r116
Disposition:
    0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
    5:r116 l0     4    4:r117 l0     0


r116 does not conflict with *any* other pseudo.  It is alive in the first
two insns of the function, which are

(insn 50 3 7 2 (set (reg:SI 116)
        (reg:SI 0 r0 [ a ])) "ira-shrinkwrap-prep-1.c":14:1 181 {*arm_movsi_insn}
     (nil))
(insn 7 50 8 2 (parallel [
            (set (reg:CC 100 cc)
                (compare:CC (reg:SI 116)
                    (const_int 0 [0])))
            (set (reg/v:SI 112 [ a ])
                (reg:SI 116))
        ]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
     (expr_list:REG_DEAD (reg:SI 116)
        (nil)))

r0 _is_ used by a successor (as the argument for the call to foo), but we
could use r0 for r116 anyway, since what we assign to it is r0 :-)
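
Editorial illustration: the argument above is that overlapping lifetimes need not prevent sharing a register when one value is merely a copy of the other. A toy model of that idea follows. It is not IRA's conflict computation. The struct, the function names, and the instruction positions used in the usage note are invented, and the copy exception only holds while neither value is modified during the overlap.

```c
#include <stdbool.h>

/* A live range as a closed interval of instruction positions. */
struct live_range {
    int first;
    int last;
};

/* Two closed intervals overlap if each starts before the other ends. */
bool overlaps(struct live_range a, struct live_range b)
{
    return a.first <= b.last && b.first <= a.last;
}

/* Two values can share one register if their lifetimes are disjoint,
   or if b is an unmodified copy of a, since both then hold the same
   bits for as long as they are live together. */
bool can_share_reg(struct live_range a, struct live_range b,
                   bool b_is_copy_of_a)
{
    return b_is_copy_of_a || !overlaps(a, b);
}
```

With made-up positions, r0 live over 0..3 and r116 live over 1..2 overlap, yet can_share_reg still reports that they may share because r116 is a copy of r0.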
Comment 16 Segher Boessenkool 2019-04-12 01:10:44 UTC
(Which would make insn 50 go away, if you prefer to look at it that way).
Comment 17 Jakub Jelinek 2019-04-12 14:39:31 UTC
(In reply to Segher Boessenkool from comment #15)
>       Forming thread by copy 0:a0r111-a4r117 (freq=500):
>         Result (freq=3500): a0r111(2500) a4r117(1000)
>       Forming thread by copy 2:a3r112-a5r116 (freq=125):
>         Result (freq=4500): a3r112(1500) a5r116(3000)
>       Forming thread by copy 1:a2r114-a3r112 (freq=62):
>         Result (freq=5500): a2r114(1000) a3r112(1500) a5r116(3000)
>       Pushing a1(r113,l0)(cost 0)
>       Pushing a4(r117,l0)(cost 0)
>       Pushing a0(r111,l0)(cost 0)
>       Pushing a2(r114,l0)(cost 0)
>       Pushing a3(r112,l0)(cost 0)
>       Pushing a5(r116,l0)(cost 0)
>       Popping a5(r116,l0)  -- assign reg 3
>       Popping a3(r112,l0)  -- assign reg 4
>       Popping a2(r114,l0)  -- assign reg 3
>       Popping a0(r111,l0)  -- assign reg 0
>       Popping a4(r117,l0)  -- assign reg 0
>       Popping a1(r113,l0)  -- assign reg 2
> Assigning 4 to a5r116
> Disposition:
>     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
>     5:r116 l0     4    4:r117 l0     0
> 
> 
> r116 does not conflict with *any* other pseudo.  It is alive in the first
> two insns of the function, which are
> 
> (insn 50 3 7 2 (set (reg:SI 116)
>         (reg:SI 0 r0 [ a ])) "ira-shrinkwrap-prep-1.c":14:1 181
> {*arm_movsi_insn}
>      (nil))
> (insn 7 50 8 2 (parallel [
>             (set (reg:CC 100 cc)
>                 (compare:CC (reg:SI 116)
>                     (const_int 0 [0])))
>             (set (reg/v:SI 112 [ a ])
>                 (reg:SI 116))
>         ]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
>      (expr_list:REG_DEAD (reg:SI 116)
>         (nil)))
> 
> r0 _is_ used by a successor (as the argument for the call to foo), but we
> could use r0 for r116 anyway, since what we assign to it is r0 :-)

CCing Vlad on this.  I don't see that *movsi_compare0 would in any way prefer the =r,0 alternative over =r,r, and using the =r,r alternative would allow removing one instruction.
Comment 18 Peter Bergner 2019-04-12 15:26:28 UTC
(In reply to Segher Boessenkool from comment #15)
>       Popping a5(r116,l0)  -- assign reg 3
>       Popping a3(r112,l0)  -- assign reg 4
>       Popping a2(r114,l0)  -- assign reg 3
>       Popping a0(r111,l0)  -- assign reg 0
>       Popping a4(r117,l0)  -- assign reg 0
>       Popping a1(r113,l0)  -- assign reg 2
> Assigning 4 to a5r116
> Disposition:
>     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
>     5:r116 l0     4    4:r117 l0     0
> 
> 
> r116 does not conflict with *any* other pseudo.  It is alive in the first
> two insns of the function, which are

So we initially assign r3 to r116 presumably because it has the same cost as the other gprs and it occurs first in REG_ALLOC_ORDER.  Then improve_allocation() decides that r4 is a better hard reg and switches the assignment to that.  I'm not sure why it wouldn't choose r0 there instead.
Comment 19 Wilco 2019-04-12 15:34:29 UTC
(In reply to Peter Bergner from comment #18)
> (In reply to Segher Boessenkool from comment #15)
> >       Popping a5(r116,l0)  -- assign reg 3
> >       Popping a3(r112,l0)  -- assign reg 4
> >       Popping a2(r114,l0)  -- assign reg 3
> >       Popping a0(r111,l0)  -- assign reg 0
> >       Popping a4(r117,l0)  -- assign reg 0
> >       Popping a1(r113,l0)  -- assign reg 2
> > Assigning 4 to a5r116
> > Disposition:
> >     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
> >     5:r116 l0     4    4:r117 l0     0
> > 
> > 
> > r116 does not conflict with *any* other pseudo.  It is alive in the first
> > two insns of the function, which are
> 
> So we initially assign r3 to r116 presumably because it has the same cost as
> the other gprs and it occurs first in REG_ALLOC_ORDER.  Then
> improve_allocation() decides that r4 is a better hard reg and switches the
> assignment to that.  I'm not sure why it wouldn't choose r0 there instead.

I would expect that r116 has a strong preference for r0 given the r116 = mov r0 and thus allocating r116 to r0 should have the lowest cost by a large margin.
Comment 20 Vladimir Makarov 2019-04-12 20:06:58 UTC
(In reply to Wilco from comment #19)
> (In reply to Peter Bergner from comment #18)
> > (In reply to Segher Boessenkool from comment #15)
> > >       Popping a5(r116,l0)  -- assign reg 3
> > >       Popping a3(r112,l0)  -- assign reg 4
> > >       Popping a2(r114,l0)  -- assign reg 3
> > >       Popping a0(r111,l0)  -- assign reg 0
> > >       Popping a4(r117,l0)  -- assign reg 0
> > >       Popping a1(r113,l0)  -- assign reg 2
> > > Assigning 4 to a5r116
> > > Disposition:
> > >     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
> > >     5:r116 l0     4    4:r117 l0     0
> > > 
> > > 
> > > r116 does not conflict with *any* other pseudo.  It is alive in the first
> > > two insns of the function, which are
> > 
> > So we initially assign r3 to r116 presumably because it has the same cost as
> > the other gprs and it occurs first in REG_ALLOC_ORDER.  Then
> > improve_allocation() decides that r4 is a better hard reg and switches the
> > assignment to that.  I'm not sure why it wouldn't choose r0 there instead.
> 
> I would expect that r116 has a strong preference for r0 given the r116 = mov
> r0 and thus allocating r116 to r0 should have the lowest cost by a large
> margin.

p116 conflicts with hr0.  Therefore it can not get hr0.  p112 is connected with p116.  p112 got hr4 and p116 got 3.  Assigning 4 to 116 is profitable.  Therefore assignment of p116 is changed to 4.

The question is why p116 conflicts with hr0.  Before RA we have

(insn 50 3 7 2 (set (reg:SI 116)
        (reg:SI 0 r0 [ a ])) "/home/cygnus/vmakarov/build1/trunk/gcc/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c":11:1 181 {*arm_movsi_insn}
     (nil))

---> No reg-dead r0!

because later we have

call_insn 11 9 51 3 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:SI (symbol_ref:SI ("foo") [flags 0x41]  <function_decl 0x7f7cc85ac000 foo>) [0 foo S4 A32])
                    (const_int 0 [0])))
            (use (const_int 0 [0]))
            (clobber (reg:SI 14 lr))
        ]) "/home/cygnus/vmakarov/build1/trunk/gcc/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c":16:11 219 {*call_value_symbol}
     (expr_list:REG_CALL_DECL (symbol_ref:SI ("foo") [flags 0x41]  <function_decl 0x7f7cc85ac000 foo>)
        (nil))
    (expr_list (clobber (reg:SI 12 ip))
        (expr_list:SI (use (reg:SI 0 r0))
            (nil))))

---> use r0!
Comment 21 Wilco 2019-04-12 21:01:53 UTC
(In reply to Vladimir Makarov from comment #20)
> (In reply to Wilco from comment #19)
> > (In reply to Peter Bergner from comment #18)
> > > (In reply to Segher Boessenkool from comment #15)
> > > >       Popping a5(r116,l0)  -- assign reg 3
> > > >       Popping a3(r112,l0)  -- assign reg 4
> > > >       Popping a2(r114,l0)  -- assign reg 3
> > > >       Popping a0(r111,l0)  -- assign reg 0
> > > >       Popping a4(r117,l0)  -- assign reg 0
> > > >       Popping a1(r113,l0)  -- assign reg 2
> > > > Assigning 4 to a5r116
> > > > Disposition:
> > > >     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
> > > >     5:r116 l0     4    4:r117 l0     0
> > > > 
> > > > 
> > > > r116 does not conflict with *any* other pseudo.  It is alive in the first
> > > > two insns of the function, which are
> > > 
> > > So we initially assign r3 to r116 presumably because it has the same cost as
> > > the other gprs and it occurs first in REG_ALLOC_ORDER.  Then
> > > improve_allocation() decides that r4 is a better hard reg and switches the
> > > assignment to that.  I'm not sure why it wouldn't choose r0 there instead.
> > 
> > I would expect that r116 has a strong preference for r0 given the r116 = mov
> > r0 and thus allocating r116 to r0 should have the lowest cost by a large
> > margin.
> 
> p116 conflicts with hr0.  Therefore it can not get hr0.  p112 is connected
> with p116.  p112 got hr4 and p116 got 3.  Assigning 4 to 116 is profitable. 
> Therefore assignment of p116 is changed to 4.
> 
> The question is why p116 conflicts with hr0.  Before RA we have

That's a bug, since register copies should not create a conflict. It's one of the most basic optimizations of a register allocator.

And there is also the question why we do move r0 into a virtual register but not assign the virtual register to an argument register.
Comment 22 Peter Bergner 2019-04-12 23:02:12 UTC
(In reply to Wilco from comment #21)
> (In reply to Vladimir Makarov from comment #20)
>> The question is why p116 conflicts with hr0.  Before RA we have
> 
> That's a bug since register copies should not create a conflict. It's one of
> the most basic optimization of register allocator.
> 
> And there is also the question why we do move r0 into a virtual register but
> not assign the virtual register to an argument register.

We don't since my patch adding that support in current trunk.  That said, if
non_conflicting_reg_copy_p() returns NULL_RTX for that r116=r0 copy insn, then they will conflict.  So what does non_conflicting_reg_copy_p() return?  ...and if it says they conflict, why?  The insn has side effects or SImode is a register pair on arm or ???
Comment 23 Segher Boessenkool 2019-04-13 00:24:27 UTC
It says (I added some debug)

   Insn 50(l0): point = 27
ignoring for conflicts:
(reg:SI 0 r0 [ a ])

but non_conflicting_reg_copy_p isn't called at all where it is improving
the allocation
Comment 24 Peter Bergner 2019-04-13 01:16:41 UTC
So improve_allocation() initially looks at using r0, but disregards it because check_hard_reg_p() returns false for r0, and that is because we fail this test:

  /* Checking only profitable hard regs.  */
  if (! TEST_HARD_REG_BIT (profitable_regs, hard_regno))
    return false;

I don't know why r0 isn't in profitable_regs for pseudo 116.
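
For readers following along, the check quoted above can be modelled with a minimal C sketch. This is not GCC code: `hard_reg_set`, `profitable_regs_for` and `check_hard_reg` are hypothetical names, and the sketch assumes that the profitable set is the allocno's class registers minus its total conflict set, which is an assumption about how IRA builds it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for GCC's HARD_REG_SET: one bit per hard register. */
typedef uint32_t hard_reg_set;

/* Return true if REGNO is a member of SET. */
static bool test_hard_reg_bit(hard_reg_set set, int regno)
{
    return (set >> regno) & 1u;
}

/* Assumption: a register is "profitable" only if it belongs to the
   allocno's class and is not in its total conflict set. */
hard_reg_set profitable_regs_for(hard_reg_set class_regs,
                                 hard_reg_set total_conflicts)
{
    return class_regs & ~total_conflicts;
}

/* Same shape as the quoted test: reject any unprofitable register. */
bool check_hard_reg(hard_reg_set profitable, int hard_regno)
{
    if (!test_hard_reg_bit(profitable, hard_regno))
        return false;
    return true;
}
```

Under that assumption, a single stray bit for r0 in the total conflict set is enough to make `check_hard_reg` reject r0, while r4 remains acceptable.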
Comment 25 Vladimir Makarov 2019-04-14 21:43:42 UTC
(In reply to Peter Bergner from comment #24)
> So improve_allocation() initially looks at using r0, but disregards it
> because check_hard_reg_p() returns false for r0, and that is because we fail
> this test:
> 
>   /* Checking only profitable hard regs.  */
>   if (! TEST_HARD_REG_BIT (profitable_regs, hard_regno))
>     return false;
> 
> I don't know why r0 isn't in profitable_regs for pseudo 116.
 
Profitable regs there also take conflict regs into account.  R0 is conflicting with p116. If the R0 usage (in the call insn) were in the same BB, your new conflict calculation would have found that there is no actual conflict.  But IRA uses the df-infrastructure, which tells IRA that R0 lives at the end of the BB where p116 occurs.

So the right solution for the PR would be fixing the df-infrastructure live analysis, or maybe somehow ignoring the usage of r0 in the call insn. That is how I see the situation.
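
The point about block-level liveness can be illustrated with a small backward dataflow sketch in C. It is a toy model, not the df-infrastructure: the `struct block` layout, the bit assignments and `compute_liveness` are all hypothetical. It only shows why a hard register that is used by a call in a successor block is reported as live at the end of the block that contains the copy.

```c
#include <stdint.h>

typedef uint32_t reg_set;   /* one bit per register; toy model */

struct block {
    reg_set use;            /* registers read before any write in the block */
    reg_set def;            /* registers written in the block */
    int succ[2];            /* successor block indices, -1 if absent */
    reg_set live_in;
    reg_set live_out;
};

/* Iterate the classic backward liveness equations to a fixed point:
     live_out(B) = union of live_in(S) over all successors S
     live_in(B)  = use(B) | (live_out(B) & ~def(B))  */
void compute_liveness(struct block *b, int n)
{
    int changed = 1;

    for (int i = 0; i < n; i++)
        b[i].live_in = b[i].live_out = 0;

    while (changed) {
        changed = 0;
        for (int i = n - 1; i >= 0; i--) {
            reg_set out = 0;
            for (int s = 0; s < 2; s++)
                if (b[i].succ[s] >= 0)
                    out |= b[b[i].succ[s]].live_in;
            reg_set in = b[i].use | (out & ~b[i].def);
            if (in != b[i].live_in || out != b[i].live_out) {
                b[i].live_in = in;
                b[i].live_out = out;
                changed = 1;
            }
        }
    }
}
```

If block 0 holds the copy from r0, and block 1 (a successor) uses r0 as a call argument, then `live_out` of block 0 contains r0 even though the copy is the last use of r0 inside block 0 itself.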
Comment 26 Peter Bergner 2019-04-16 15:23:15 UTC
(In reply to Vladimir Makarov from comment #25)
> (In reply to Peter Bergner from comment #24)
>> I don't know why r0 isn't in profitable_regs for pseudo 116.
>  
> Profitable regs there contain also conflict regs.  R0 is conflicting with
> p106. If R0 usage (in call insn) were in the same BB, your new conflict
> calculation found that there is no actual conflict.  But IRA uses
> df-infrastructure which tells IRA that R0 lives at the BB end where p106
> occurs.

I'm sorry, but I don't see where p116 conflicts with r0.  Can you show me where/how?  Looking at my IRA dump, I see:


+++Allocating 40 bytes for conflict table (uncompressed size 48)
;; a0(r111,l0) conflicts: a2(r114,l0) a1(r113,l0) a3(r112,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a1(r113,l0) conflicts: a0(r111,l0) a2(r114,l0) a3(r112,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a2(r114,l0) conflicts: a0(r111,l0) a1(r113,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a3(r112,l0) conflicts: a0(r111,l0) a1(r113,l0) a4(r117,l0)
;;     total conflict hard regs: 0 12 14
;;     conflict hard regs: 0 12 14

;; a4(r117,l0) conflicts: a3(r112,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a5(r116,l0) conflicts:  cp0:a0(r111)<->a4(r117)@330:move
  cp1:a2(r114)<->a3(r112)@41:shuffle
  cp2:a3(r112)<->a5(r116)@125:shuffle
  pref0:a0(r111)<-hr0@2000
  pref1:a4(r117)<-hr0@660
  pref2:a5(r116)<-hr0@1000
  regions=1, blocks=6, points=10
    allocnos=6 (big 0), copies=3, conflicts=0, ranges=6

Note: I'm assuming we're missing a \n after p116's empty conflicts above?

So I don't see p116 conflicting with r0, but I do see that we register a shuffle between p112 and p116, and p112 does (correctly) conflict with r0.  Is it really the shuffle between p112 and p116 that is preventing us from putting r0 into p116's profitable regs, in the hope that p112 and p116 may get assigned the same reg, allowing the removal of the copy?  If so, that shuffle, since it's attached to the setting of the CC reg, cannot actually be removed even if p112 and p116 are assigned the same register.  Should we just ignore those types of shuffles/copies that have other side effects?
Comment 27 Segher Boessenkool 2019-04-16 16:51:41 UTC
(In reply to Peter Bergner from comment #26)
> ;; a4(r117,l0) conflicts: a3(r112,l0)
> ;;     total conflict hard regs:
> ;;     conflict hard regs:
> 
> ;; a5(r116,l0) conflicts:  cp0:a0(r111)<->a4(r117)@330:move
>   cp1:a2(r114)<->a3(r112)@41:shuffle
>   cp2:a3(r112)<->a5(r116)@125:shuffle
>   pref0:a0(r111)<-hr0@2000
>   pref1:a4(r117)<-hr0@660
>   pref2:a5(r116)<-hr0@1000
>   regions=1, blocks=6, points=10
>     allocnos=6 (big 0), copies=3, conflicts=0, ranges=6
> 
> Note: I'm assuming we're missing a \n after p116's empty conflicts above?

The code is

  fputs (" conflicts:", file);
  n = ALLOCNO_NUM_OBJECTS (a);
  for (i = 0; i < n; i++)
    {
      ira_object_t obj = ALLOCNO_OBJECT (a, i);
      ira_object_t conflict_obj;
      ira_object_conflict_iterator oci;

      if (OBJECT_CONFLICT_ARRAY (obj) == NULL)
        continue;
      [...]
    }

and the

;;     total conflict hard regs:

etc. prints are in that [...].
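
The control flow quoted above, where an early `continue` skips everything that follows in the loop body, can be reproduced in a toy C sketch. The names `toy_allocno`, `print_conflicts_buggy` and `print_conflicts_fixed` are hypothetical, and the "fixed" variant is only one possible shape of a fix, not necessarily what was submitted.

```c
#include <stdio.h>

/* Hypothetical, simplified allocno: a count of conflicting allocnos
   plus a bitmask of conflicting hard registers. */
struct toy_allocno {
    int num_conflicts;
    unsigned hard_conflicts;   /* bit N set: conflicts with hard reg N */
};

/* Print the hard-register line; return how many registers were listed. */
static int print_hard_regs(FILE *f, const struct toy_allocno *a)
{
    int printed = 0;

    fputs(";;     conflict hard regs:", f);
    for (int r = 0; r < 32; r++)
        if (a->hard_conflicts & (1u << r)) {
            fprintf(f, " %d", r);
            printed++;
        }
    fputc('\n', f);
    return printed;
}

/* Buggy shape: bailing out early when there are no allocno conflicts
   also skips the hard-register line (and the trailing newline). */
int print_conflicts_buggy(FILE *f, const struct toy_allocno *a)
{
    fputs(" conflicts:", f);
    if (a->num_conflicts == 0)
        return 0;
    fprintf(f, " (%d allocnos)\n", a->num_conflicts);
    return print_hard_regs(f, a);
}

/* One possible fix: only the allocno list is conditional. */
int print_conflicts_fixed(FILE *f, const struct toy_allocno *a)
{
    fputs(" conflicts:", f);
    if (a->num_conflicts != 0)
        fprintf(f, " (%d allocnos)", a->num_conflicts);
    fputc('\n', f);
    return print_hard_regs(f, a);
}
```

For an allocno with no allocno conflicts but a hard-register conflict with r0, the buggy variant lists no hard registers at all, which matches the symptom of the missing newline and missing "conflict hard regs" lines in the dump.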
Comment 28 Peter Bergner 2019-04-16 17:08:46 UTC
Vlad, in looking at add_insn_allocno_copies(), it looks like it relies on seeing REG_DEAD notes to decide whether to record a copy/shuffle that should be handled.  Shouldn't we instead be looking at whether the source and destination regs conflict or not?  I.e., there might not be a REG_DEAD note, but that doesn't mean the two regs/pseudos conflict.  And conversely, if there is a REG_DEAD note on the copy/shuffle, the two regs/pseudos could still conflict.
Comment 29 Peter Bergner 2019-04-16 17:10:51 UTC
(In reply to Segher Boessenkool from comment #27)
> > Note: I'm assuming we're missing a \n after p116's empty conflicts above?
> 
> The code is

Right.  I already whipped up a patch that gives me:

;; a5(r116,l0) conflicts:
;;     total conflict hard regs:
;;     conflict hard regs:


  cp0:a0(r111)<->a4(r117)@330:move
  ...
Comment 30 Jakub Jelinek 2019-04-17 13:49:48 UTC
Is the *movsi_compare0 pattern actually ever a benefit before RA?  At least in this case it clearly results in worse generated code rather than better, and I bet in other cases too; it just ties the hands of the RA too much.
I wonder if it shouldn't rather be a pattern that is only matched when reload_completed and recognized, say, by a peephole2 or something similar.
Comment 31 Segher Boessenkool 2019-04-17 15:44:08 UTC
It's how you do a parallel of a mov and a flags set, which of course you
can have before RA, and you want created by combine, typically.  Or do I
misunderstand the question?

(I thought Arm has a "movs" op for this, btw?)
Comment 32 Peter Bergner 2019-04-17 23:25:09 UTC
(In reply to Peter Bergner from comment #26)
> (In reply to Vladimir Makarov from comment #25)
> > (In reply to Peter Bergner from comment #24)
> >> I don't know why r0 isn't in profitable_regs for pseudo 116.
> >  
> > Profitable regs there contain also conflict regs.  R0 is conflicting with
> > p106. If R0 usage (in call insn) were in the same BB, your new conflict
> > calculation found that there is no actual conflict.  But IRA uses
> > df-infrastructure which tells IRA that R0 lives at the BB end where p106
> > occurs.
> 
> I'm sorry, but I don't see where p116 conflicts with r0.  Can you show me
> where/how?  Looking at my IRA dump, I see:

Ok, so there is a bug in print_allocno_conflicts() that causes us to skip printing the hard reg conflicts if the allocno doesn't have any conflicts with other allocnos.  I submitted a patch to fix that.  With the fix, I now see the following conflict info for p116:

;; a5(r116,l0) conflicts:
;;     total conflict hard regs: 0
;;     conflict hard regs:

So this explains why p116 isn't assigned r0.  That doesn't explain why p116 conflicts with r0 though, because looking at the rtl below, it shouldn't:

<r0 is live here>
(insn 50 3 7 2 (set (reg:SI 116)
        (reg:SI 0 r0 [ aD.4197 ])) "bug.i":7:1 181 {*arm_movsi_insn}
     (nil))
(insn 7 50 8 2 (parallel [
            (set (reg:CC 100 cc)
                (compare:CC (reg:SI 116)
                    (const_int 0 [0])))
            (set (reg/v:SI 112 [ aD.4197 ])
                (reg:SI 116))
        ]) "bug.i":10:6 188 {*movsi_compare0}
     (expr_list:REG_DEAD (reg:SI 116)
        (nil)))
<r0 is live here>

So yes, r0 is live at the definition of p116, but we know they have the same value.  My ira-conflicts.c changes adding non_conflicting_reg_copy_p() should have handled that, but they don't.  Now non_conflicting_reg_copy_p() does correctly notice that insn 50 is a simple copy that we can ignore for conflict purposes, but somehow a conflict is still being added.

I tracked the problem down to ira-lives.c:make_object_dead() not handling ignore_reg_for_conflicts correctly.  The bug is that we correctly remove the ignored reg (r0) from OBJECT_CONFLICT_HARD_REGS, but we miss removing it from OBJECT_TOTAL_CONFLICT_HARD_REGS too.  I'm working on a patch.
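
The kind of mistake described above, filtering a register out of one conflict set but not the other, can be shown with a small C sketch. This is a toy model rather than the IRA code: `toy_object`, `add_conflicts_buggy` and `add_conflicts_fixed` are hypothetical names, and plain bitmasks stand in for HARD_REG_SET.

```c
#include <stdint.h>

typedef uint32_t hard_reg_set;   /* one bit per hard register; toy model */

struct toy_object {
    hard_reg_set conflict_hard_regs;
    hard_reg_set total_conflict_hard_regs;
};

/* Buggy shape: the ignored register is masked out of the first set
   only, so it still leaks into the total set. */
void add_conflicts_buggy(struct toy_object *obj, hard_reg_set live,
                         int ignore_regno)
{
    hard_reg_set filtered = live;

    if (ignore_regno >= 0)
        filtered &= ~((hard_reg_set)1 << ignore_regno);
    obj->conflict_hard_regs |= filtered;
    obj->total_conflict_hard_regs |= live;       /* ignored reg leaks in */
}

/* Fixed shape: both sets receive the same filtered value. */
void add_conflicts_fixed(struct toy_object *obj, hard_reg_set live,
                         int ignore_regno)
{
    hard_reg_set filtered = live;

    if (ignore_regno >= 0)
        filtered &= ~((hard_reg_set)1 << ignore_regno);
    obj->conflict_hard_regs |= filtered;
    obj->total_conflict_hard_regs |= filtered;
}
```

With r0 live and r0 also being the ignored register, the buggy variant leaves the first set empty but records r0 in the total set, which is the same asymmetry as in the p116 dump above.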
Comment 33 Peter Bergner 2019-04-17 23:54:54 UTC
Created attachment 46189
Proposed patch

Here is a patch that fixes make_object_dead(), which was causing r0 to be incorrectly added to p116's total_conflict_regs, making it impossible to assign r0 to p116.  With this patch, we now assign r0 to p116 as we want:

;; a5(r116,l0) conflicts:
;;     total conflict hard regs:
;;     conflict hard regs:

...

      Popping a5(r116,l0)  -- assign reg 0
      Popping a3(r112,l0)  -- assign reg 4
      Popping a2(r114,l0)  -- assign reg 4
      Popping a0(r111,l0)  -- assign reg 0
      Popping a4(r117,l0)  -- assign reg 0
      Popping a1(r113,l0)  -- assign reg 3
Disposition:
    0:r111 l0     0    3:r112 l0     4    1:r113 l0     3    2:r114 l0     4
    5:r116 l0     0    4:r117 l0     0


Can someone on the ARM side please bootstrap and regtest the patch to see if it fixes the testsuite fallout?  I'll bootstrap and regtest it on power.
Comment 34 Peter Bergner 2019-04-18 01:28:31 UTC
Created attachment 46190
Updated patch

Updated patch that is functionally the same, but I like this one better.
Comment 35 Segher Boessenkool 2019-04-18 12:17:03 UTC
Peter's patch solves this particular problem, but not the PR unfortunately.

I finally understand Jakub's comment 30.  This patch solves the PR (also
without Peter's patch):

===
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0aecd03..67dddb2 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6340,7 +6340,7 @@ (define_insn "*movsi_compare0"
                    (const_int 0)))
    (set (match_operand:SI 0 "s_register_operand" "=r,r")
        (match_dup 1))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && reload_completed"
   "@
    cmp%?\\t%0, #0
    subs%?\\t%0, %1, #0"
===
Comment 36 Richard Earnshaw 2019-04-18 12:25:43 UTC
(In reply to Segher Boessenkool from comment #35)
> Peter's patch solves this particular problem, but not the PR unfortunately.
> 
> I finally understand Jakub's comment 30.  This patch solves the PR (also
> without Peter's patch):
> 
> ===
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 0aecd03..67dddb2 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -6340,7 +6340,7 @@ (define_insn "*movsi_compare0"
>                     (const_int 0)))
>     (set (match_operand:SI 0 "s_register_operand" "=r,r")
>         (match_dup 1))]
> -  "TARGET_32BIT"
> +  "TARGET_32BIT && reload_completed"
>    "@
>     cmp%?\\t%0, #0
>     subs%?\\t%0, %1, #0"
> ===

And what about all the cases where the move and compare are not adjacent in the instruction stream so don't get matched by peepholing?
Comment 37 Segher Boessenkool 2019-04-18 12:35:42 UTC
Yes, it is a balancing act.  Which option works better?
Comment 38 Wilco 2019-04-18 13:08:11 UTC
(In reply to Segher Boessenkool from comment #37)
> Yes, it is a balancing act.  Which option works better?

Well the question really is what is bad about movsi_compare0 that could be easily fixed?

The move is free, so there is no need for the "r,0" variant in principle; if that helps reduce constraints on register allocation, then we could remove or reorder that alternative.
Comment 39 Segher Boessenkool 2019-04-18 13:45:54 UTC
On a linux kernel defconfig build it increases code size by 0.567%.
That seems a bit much :-(

The peephole only recognises

  mov rA,rB
  cmp rB,#0

and not

  mov rA,rB
  cmp rA,#0

or

  cmp rB,#0
  mov rA,rB

and we see a lot of the latter, after my patch anyway.
Comment 40 Jakub Jelinek 2019-04-18 13:51:00 UTC
(In reply to Segher Boessenkool from comment #39)
> On a linux kernel defconfig build it increases code size by 0.567%.
> That seems a bit much :-(
> 
> The peephole only recognises
> 
>   mov rA,rB
>   cmp rB,#0
> 
> and not
> 
>   mov rA,rB
>   cmp rA,#0

Well, changing the peephole2 so that it handles both of the above at the same time should be quite easy.
> 
> or
> 
>   cmp rB,#0
>   mov rA,rB

And adding a peephole for this case too.

The question is what the code size differences would be with those changes (i.e. how often does it help not to have *movsi_compare0 make RA decisions worse vs. how often we actually have those two instructions separated by other insns).
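
The three orderings discussed above can be modelled with a toy matcher in C, to make clear which pairs the narrow form accepts and which only a widened form would. This is a sketch, not a machine-description peephole: `toy_insn`, `match_mov_cmp_src` and `match_mov_cmp_any` are hypothetical, and registers are plain integers.

```c
#include <stdbool.h>

enum toy_op { TOY_MOV, TOY_CMP0 };

struct toy_insn {
    enum toy_op op;
    int dst;   /* MOV destination; unused for CMP0 */
    int src;   /* MOV source, or the register compared with #0 */
};

/* Narrow form: only "mov rA,rB ; cmp rB,#0". */
bool match_mov_cmp_src(const struct toy_insn *a, const struct toy_insn *b)
{
    return a->op == TOY_MOV && b->op == TOY_CMP0 && b->src == a->src;
}

/* Widened form: after the mov, accept a compare of either rA or rB;
   also accept "cmp rB,#0 ; mov rA,rB". */
bool match_mov_cmp_any(const struct toy_insn *a, const struct toy_insn *b)
{
    if (a->op == TOY_MOV && b->op == TOY_CMP0)
        return b->src == a->src || b->src == a->dst;
    if (a->op == TOY_CMP0 && b->op == TOY_MOV)
        return a->src == b->src;
    return false;
}
```

Both functions only look at two adjacent insns, so neither helps when the move and the compare are separated by other instructions, which is the other half of the question raised above.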
Comment 41 Segher Boessenkool 2019-04-18 13:54:34 UTC
(In reply to Wilco from comment #38)
> Well the question really is what is bad about movsi_compare0 that could be
> easily fixed?

"Easily fixed"...  There is no such thing here.

Because it is a parallel everything has to work on the compare and the move
together.  Various things do not handle that, things that only handle simple
moves for example.  Like prepare_shrink_wrap in this testcase.  And for many
other things you have to split the parallel before you can do the transform
you want.
Comment 42 Segher Boessenkool 2019-04-18 13:56:21 UTC
(In reply to Jakub Jelinek from comment #40)
> The question is what the code size differences would be with those changes
> (i.e. how often does it help not to have *movsi_compare0 make RA decisions
> worse vs. how often we actually have those two instructions separated by
> other insns).

Yeah.  If someone writes patches adding the peepholes, I can test it, but I'm
no hero at writing peepholes, esp. for an arch I do not fully understand :-/
Comment 43 Peter Bergner 2019-04-18 14:06:52 UTC
(In reply to Jakub Jelinek from comment #40)
> The question is what the code size differences would be with those changes
> (i.e. how often does it help not to have *movsi_compare0 make RA decisions
> worse vs. how often we actually have those two instructions separated by
> other insns).

How does *movsi_compare0 make RA decisions worse other than the issue of p116 not being assigned r0 above, which my patch attached above fixes?
Comment 44 Jakub Jelinek 2019-04-18 14:24:38 UTC
Well, it requires that the RA looks specially for this kind of pattern and if it ends up being a noop move, nothing simplifies the pattern again back to normal comparison, and as Segher noted, it can negatively affect other optimization passes.

Completely untested peephole2 patch:
--- gcc/config/arm/arm.md.jj	2019-03-19 11:04:49.283170205 +0100
+++ gcc/config/arm/arm.md	2019-04-18 16:21:18.974543408 +0200
@@ -10928,12 +10928,22 @@
   [(set (match_operand:SI 0 "arm_general_register_operand" "")
 	(match_operand:SI 1 "arm_general_register_operand" ""))
    (set (reg:CC CC_REGNUM)
-	(compare:CC (match_dup 1) (const_int 0)))]
+	(compare:CC (match_operand:SI 2 "arm_general_register_operand" "")
+		    (const_int 0)))]
+  "TARGET_ARM
+   && (rtx_equal_p (operands[2], operands[0])
+       || rtx_equal_p (operands[2], operands[1]))"
+  [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (const_int 0)))
+	      (set (match_dup 0) (match_dup 1))])])
+
+(define_peephole2
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC (match_operand:SI 1 "arm_general_register_operand" "")
+		    (const_int 0)))]
+   (set (match_operand:SI 0 "arm_general_register_operand" "") (match_dup 1))]
   "TARGET_ARM"
   [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (const_int 0)))
-	      (set (match_dup 0) (match_dup 1))])]
-  ""
-)
+	      (set (match_dup 0) (match_dup 1))])])
 
 (define_split
   [(set (match_operand:SI 0 "s_register_operand" "")
Comment 45 Peter Bergner 2019-04-18 15:26:27 UTC
I submitted a patch to fix the IRA conflict issue.
Comment 46 Segher Boessenkool 2019-04-18 15:33:56 UTC
With all three patches together (Peter's, mine, Jakub's), I get a code size
increase of only 0.047%, much more acceptable.  Now looking what that diff
really *is* :-)
Comment 47 Wilco 2019-04-18 15:53:43 UTC
(In reply to Segher Boessenkool from comment #46)
> With all three patches together (Peter's, mine, Jakub's), I get a code size
> increase of only 0.047%, much more acceptable.  Now looking what that diff
> really *is* :-)

I think with Jakub's change you don't need to disable the movsi_compare0 pattern in combine. If regalloc works as expected, it will get split into a compare so shrinkwrap can handle it.
Comment 48 Segher Boessenkool 2019-04-18 16:09:32 UTC
With just Peter's and Jakub's patch, it *improves* code size by 0.090%.
That does not fix this PR though :-/
Comment 49 Segher Boessenkool 2019-04-18 16:10:55 UTC
(In reply to Wilco from comment #47)
> (In reply to Segher Boessenkool from comment #46)
> > With all three patches together (Peter's, mine, Jakub's), I get a code size
> > increase of only 0.047%, much more acceptable.  Now looking what that diff
> > really *is* :-)
> 
> I think with Jakub's change you don't need to disable the movsi_compare0
> pattern in combine. If regalloc works as expected, it will get split into a
> compare so shrinkwrap can handle it.

prepare_shrink_wrap can not handle that.  prepare_shrink_wrap needs to be
improved for other reasons, of course.
Comment 50 Segher Boessenkool 2019-04-18 16:15:24 UTC
The insn is

(insn 7 3 8 2 (parallel [
            (set (reg:CC 100 cc)
                (compare:CC (reg:SI 0 r0 [116])
                    (const_int 0 [0])))
            (set (reg/v:SI 4 r4 [orig:112 a ] [112])
                (reg:SI 0 r0 [116]))
        ]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
     (nil))

and that isn't split, and then prepare_shrink_wrap gives up on it.
Comment 51 Richard Earnshaw 2019-04-18 16:19:58 UTC
(In reply to Segher Boessenkool from comment #50)
> The insn is
> 
> (insn 7 3 8 2 (parallel [
>             (set (reg:CC 100 cc)
>                 (compare:CC (reg:SI 0 r0 [116])
>                     (const_int 0 [0])))
>             (set (reg/v:SI 4 r4 [orig:112 a ] [112])
>                 (reg:SI 0 r0 [116]))
>         ]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
>      (nil))
> 
> and that isn't split, and then prepare_shrink_wrap gives up on it.

In the more general case splitting this would produce worse code, not better, since then we'd end up with two instructions rather than one.
Comment 52 Wilco 2019-04-18 17:04:22 UTC
(In reply to Segher Boessenkool from comment #48)
> With just Peter's and Jakub's patch, it *improves* code size by 0.090%.
> That does not fix this PR though :-/

But it does fix most of the codesize regression. The shrinkwrapping testcase seems a preexisting problem that was exposed by the combine changes, so it doesn't need to hold up the release. The regalloc change might fix addr-modes-float.c too.
Comment 53 Segher Boessenkool 2019-04-18 19:13:53 UTC
(In reply to Richard Earnshaw from comment #51)
> In the more general case splitting this would produce worse code, not
> better, since then we'd end up with two instructions rather than one.

Sure, it _often_ is good to have it merged.  Quite clearly more often than
not it's good, so if you have to pick only one way, this is the way to go.

Hopefully we can do better though.  But not for stage 4, sure.
Comment 54 Segher Boessenkool 2019-04-18 19:16:29 UTC
(In reply to Wilco from comment #52)
> (In reply to Segher Boessenkool from comment #48)
> > With just Peter's and Jakub's patch, it *improves* code size by 0.090%.
> > That does not fix this PR though :-/
> 
> But it does fix most of the codesize regression.

Yes, and it often creates *better* code, as far as I can see.

> The shrinkwrapping testcase
> seems a preexisting problem that was exposed by the combine changes,

It is.

> so it
> doesn't need to hold up the release. The regalloc change might fix
> addr-modes-float.c too.

I'd like to see the RA fix in GCC 9.
Comment 55 Peter Bergner 2019-04-18 22:14:48 UTC
Author: bergner
Date: Thu Apr 18 22:14:17 2019
New Revision: 270448

URL: https://gcc.gnu.org/viewcvs?rev=270448&root=gcc&view=rev
Log:
	PR rtl-optimization/87871
	* ira-lives.c (make_object_dead): Don't add conflicts to
	TOTAL_CONFLICT_HARD_REGS for register ignore_reg_for_conflicts.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-lives.c
Comment 56 Peter Bergner 2019-04-18 22:16:36 UTC
I committed the RA fix.  Unassigning myself now.
Comment 57 Jeffrey A. Law 2019-04-23 16:32:12 UTC
So what's actually left to do with this BZ?  ie, what tests are still regressing?
Comment 58 Jakub Jelinek 2019-04-23 16:38:29 UTC
If we don't want to go with #c35 at least for GCC 9, would the #c44 patch be still useful without it (does it ever trigger say on the kernel where it didn't trigger before)?
Comment 59 Segher Boessenkool 2019-04-23 18:43:49 UTC
(In reply to Jakub Jelinek from comment #58)
> If we don't want to go with #c35 at least for GCC 9, would the #c44 patch be
> still useful without it (does it ever trigger say on the kernel where it
> didn't trigger before)?

The patch in comment 44 is obviously good, it improves the size by 0.090%
as noted (this is a kernel build, multi_v5_defconfig iirc).

I'd say it is perfectly safe for GCC 9, but I'm not an Arm maintainer :-)
Comment 60 Jakub Jelinek 2019-04-25 08:35:06 UTC
This PR had various fixes applied already and the remaining issues don't warrant a release blocker, so downgrading this to P2.
Comment 61 Jakub Jelinek 2019-05-03 09:18:15 UTC
GCC 9.1 has been released.
Comment 62 Jakub Jelinek 2019-08-12 08:57:50 UTC
GCC 9.2 has been released.