Created attachment 40215: test-case to reproduce

We noticed a huge performance regression on x86 for one important benchmark (the reproducer for which is attached). It is caused by an additional if-conversion, which can be seen in the ce2 dump:

IF-THEN-ELSE-JOIN block found, pass 1, test 12, then 13, else 14, join 15
scanning new insn with uid = 163.
scanning new insn with uid = 164.
scanning new insn with uid = 165.
scanning new insn with uid = 166.
scanning new insn with uid = 167.
scanning new insn with uid = 168.
scanning new insn with uid = 169.
if-conversion succeeded through noce_try_cmove_arith
deleting insn with uid = 85.
deleting block 14
Removing jump 78.
deleting insn with uid = 78.
deleting insn with uid = 80.
deleting block 13
Merging block 15 into block 12...
changing bb of uid 87
changing bb of uid 88 from 15 to 12
changing bb of uid 89 from 15 to 12
Merged blocks 12 and 15.
Conversion succeeded on pass 1.

On an AVX2 machine we see:

time ./test1.1124.exe   // built by the compiler before r242832
real 0m0.577s
user 0m0.575s
sys 0m0.002s

time ./test1.1125.exe   // built by the compiler after r242832
real 0m0.888s
user 0m0.886s
sys 0m0.001s

Compiling the test-case with -Ofast is sufficient to reproduce this on x86.
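For readers without the attachment, the shape of code that noce_try_cmove_arith handles can be sketched roughly as below. This is not the attached test-case; the function names, the loop, and the arithmetic are assumptions made up for illustration. The first function has an if-then-else whose arms both do arithmetic before a join; the second is a hand-written approximation of the if-converted form, where both arms are computed unconditionally and a select (a conditional move on x86) picks the result.

```c
#include <stddef.h>

/* Branchy form: test, then-block, else-block, join-block.
   Hypothetical example, not the code from attachment 40215. */
long sum_branchy(const int *a, const int *b, size_t n, int threshold)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int x;
        if (a[i] > threshold)
            x = a[i] + b[i];    /* "then" block */
        else
            x = a[i] - b[i];    /* "else" block */
        sum += x;               /* "join" block */
    }
    return sum;
}

/* Approximation of the if-converted form: both arms are evaluated
   on every iteration and the condition only selects the result. */
long sum_converted(const int *a, const int *b, size_t n, int threshold)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int then_val = a[i] + b[i];
        int else_val = a[i] - b[i];
        int x = (a[i] > threshold) ? then_val : else_val;
        sum += x;
    }
    return sum;
}
```

Both functions return the same value; they differ only in how much work is done per iteration. If the branch is well predicted, the converted form may be slower because it always executes both arms and the select adds a data dependency, which is one plausible reading of the timings above.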
Patch and discussion here. https://gcc.gnu.org/ml/gcc-patches/2016-12/msg00212.html
Author: bernds
Date: Mon Jan 23 16:17:33 2017
New Revision: 244816

URL: https://gcc.gnu.org/viewcvs?rev=244816&root=gcc&view=rev
Log:
PR rtl-optimization/78634
* config/i386/i386.c (ix86_max_noce_ifcvt_seq_cost): New function.
(TARGET_MAX_NOCE_IFCVT_SEQ_COST): Define.
* ifcvt.c (noce_try_cmove): Add missing cost check.

testsuite/
PR rtl-optimization/78634
* gcc.target/i386/funcspec-11.c: Also pass -mtune=i686.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/ifcvt.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/funcspec-11.c
Fixed.
This commit has broken a test case on s390x:

FAIL: gcc.target/s390/loc-1.c scan-assembler \tlocgrne\t%r2,%r4

The load-on-condition instruction is no longer used because the branch cost is very low on s390x (1). Using -mbranch-cost=2 fixes the test failure.
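The interaction between branch cost and the new cost check can be pictured with a toy model. This is only a sketch: the function names, the scale factor of 2, and the three-instruction sequence are assumptions chosen for illustration, not the formula used in ifcvt.c or in any backend. Only the N * 4 scaling is modelled on GCC's COSTS_N_INSNS macro.

```c
#include <stdbool.h>

/* Same scaling as GCC's COSTS_N_INSNS (N * 4), under a local name. */
#define TOY_COSTS_N_INSNS(n) ((n) * 4u)

/* Hypothetical limit on the cost of the replacement sequence.
   The factor 2 is an assumption, not taken from GCC. */
unsigned toy_max_seq_cost(unsigned branch_cost)
{
    return branch_cost * TOY_COSTS_N_INSNS(2);
}

/* Convert only when the branchless sequence is no more expensive
   than the limit derived from the branch cost. */
bool toy_should_if_convert(unsigned seq_cost, unsigned branch_cost)
{
    return seq_cost <= toy_max_seq_cost(branch_cost);
}
```

In this model, a hypothetical three-instruction sequence costs 12: it exceeds the limit of 8 at branch cost 1 and is rejected, but fits under the limit of 16 at branch cost 2. That mirrors, qualitatively, why -mbranch-cost=2 brings the load-on-condition instruction back.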
I don't know the machine, but with a branch cost of 1 this seems like it might be expected. Do you think this is a testcase problem or something else?
It fails with -march=zEC12 but not with -march=z900. It seems to be a tuning issue with the branch cost in the backend; a colleague is working on that and will have a patch at some point. So I think you can ignore this; it is something to be dealt with in the backend.