cat minmax-10.c --- #define max(a,b) (((a) > (b))) ? (a) : (b) #define min(a,b) (((a) < (b))) ? (a) : (b) int smax1(int x) { return max(x,1); } --- gcc -O2 -march=skylake-avx512 -S --- smax1(int): vmovdqa xmm1, XMMWORD PTR .LC0[rip] vmovd xmm0, edi vpmaxsd xmm0, xmm0, xmm1 vmovd eax, xmm0 ret --- gcc -O2 -march=x86-64 -S --- smax1(int): test edi, edi mov eax, 1 cmovg eax, edi ret --- It seems cost model need to be adjusted for skylake_cost.
Collected chain #1... insns: 6 defs to convert: r84, r85 Computing gain for chain #1... Instruction gain 8 for 6: {r84:SI=smax(r85:SI,0x1);clobber flags:CC;} REG_DEAD r85:SI REG_UNUSED flags:CC Instruction conversion gain: 8 Registers conversion cost: 4 Total gain: 4 Converting chain #1... else if (GET_CODE (src) == SMAX || GET_CODE (src) == SMIN || GET_CODE (src) == UMAX || GET_CODE (src) == UMIN) { /* We do not have any conditional move cost, estimate it as a reg-reg move. Comparisons are costed as adds. */ igain += m * (COSTS_N_INSNS (2) + ix86_cost->add); /* Integer SSE ops are all costed the same. */ igain -= ix86_cost->sse_op; } is a bit of hand-waving, esp. since costs in general have not been very consistent between the GPR and vector parts of the cores (that also hurts vector cost estimates btw which also compare those apples vs. oranges)
.
(In reply to Hongtao.liu from comment #2) > . Difference comes from /* Cost the integer to sse and sse to integer moves. */ cost += n_sse_to_integer * ix86_cost->sse_to_integer; /* ??? integer_to_sse but we only have that in the RA cost table. Assume sse_to_integer/integer_to_sse are the same which they are at the moment. */ cost += n_integer_to_sse * ix86_cost->sse_to_integer; Maybe need to increase ix86->sse_to_integer, i'm test the following. --- a/gcc/config/i386/x86-tune-costs.h +++ b/gcc/config/i386/x86-tune-costs.h @@ -1769,7 +1769,7 @@ struct processor_costs skylake_cost = { {6, 6, 6, 10, 20}, /* cost of unaligned loads. */ {8, 8, 8, 8, 16}, /* cost of unaligned stores. */ 2, 2, 4, /* cost of moving XMM,YMM,ZMM register */ - 2, /* cost of moving SSE register to integer. */ + 6, /* cost of moving SSE register to integer. */ 20, 8, /* Gather load static, per_elt. */ 22, 10, /* Gather store static, per_elt. */ 64, /* size of l1 cache. */
Closed by mistake.
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>: https://gcc.gnu.org/g:83858ba1db31cd83667592a41d71315090459da4 commit r11-3294-g83858ba1db31cd83667592a41d71315090459da4 Author: liuhongt <hongtao.liu@intel.com> Date: Wed Sep 16 10:53:52 2020 +0800 Increase rtx cost of sse_to_integer in skylake_cost. As https://gcc.gnu.org/pipermail/gcc-patches/2019-August/528839.html indicates, movement between SSE and gpr should be much expensive than movement inside gpr(which is 2 as default). gcc/ChangeLog PR target/96861 * config/i386/x86-tune-costs.h (skylake_cost): increase rtx cost of sse_to_integer from 2 to 6. gcc/testsuite * gcc.target/i386/pr95021-3.c: Add -mtune=generic.
Fixed in GCC11.
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:c66c004ad610f4f02d5f065fe29c4c10f05ae13f commit r11-3297-gc66c004ad610f4f02d5f065fe29c4c10f05ae13f Author: H.J. Lu <hjl.tools@gmail.com> Date: Sat Sep 19 09:57:16 2020 -0700 x86: Add a testcase for PR target/96861 Add a testcase to verify that -march=skylake-avx512 -mtune=skylake-avx512 generates desired code sequence. PR target/96861 * gcc.target/i386/pr96861.c: New test.