Created attachment 53580 [details] Assembly from gcc -O3 -S bug.c The following is an implementation of a ray/axis-aligned box intersection test: struct ray { float origin[3]; float dir_inv[3]; }; struct box { float min[3]; float max[3]; }; static inline float min(float x, float y) { return x < y ? x : y; } static inline float max(float x, float y) { return x < y ? x : y; } _Bool intersection(const struct ray *ray, const struct box *box) { float tmin = 0.0, tmax = 1.0 / 0.0; for (int i = 0; i < 3; ++i) { float t1 = (box->min[i] - ray->origin[i]) * ray->dir_inv[i]; float t2 = (box->max[i] - ray->origin[i]) * ray->dir_inv[i]; tmin = min(max(t1, tmin), max(t2, tmin)); tmax = max(min(t1, tmax), min(t2, tmax)); } return tmin < tmax; } However, gcc -O3 doesn't use minss/maxss for every min()/max(). Instead, some of them are lowered to conditional jumps which regresses performance significantly since the branches are unpredictable. Simpler variants like tmin = max(tmin, min(t1, t2)); tmax = min(tmax, max(t1, t2)); get the desired codegen, but that behaves differently if t1 or t2 is NaN. "Bisecting" with godbolt.org, it seems this is an old regression: 4.8.5 was good, but 4.9.0 was bad.
Note, your 'max' function is the same as 'min' (the issue remains with that corrected).
(In reply to Alexander Monakov from comment #1) > Note, your 'max' function is the same as 'min' (the issue remains with that > corrected). Whoops, thanks. Also I just noticed that GCC 12.2 does better (but not perfect) with #define min(x, y) ((x) < (y) ? (x) : (y)) #define max(x, y) ((x) > (y) ? (x) : (y)) instead of the inline functions. Doesn't seem to help GCC trunk though.
The main issue is that combine is confused with the more complicated setup and the x86 ieee min/max patterns and on the GIMPLE level we are not forming IEEE compatible .FMIN and .FMAX plus the x86 backend doesn't announce any.
With the proposed patches for PR88540 and PR105715 I get with -O3 -msse4.1 intersection: .LFB2: .cfi_startproc movss .LC0(%rip), %xmm5 pxor %xmm2, %xmm2 movss (%rdi), %xmm4 movss 12(%rsi), %xmm1 movss 12(%rdi), %xmm0 divss %xmm2, %xmm5 movss (%rsi), %xmm3 subss %xmm4, %xmm1 subss %xmm4, %xmm3 pxor %xmm4, %xmm4 mulss %xmm0, %xmm1 mulss %xmm0, %xmm3 movaps %xmm1, %xmm0 cmpnltss %xmm2, %xmm0 blendvps %xmm0, %xmm1, %xmm4 movaps %xmm3, %xmm0 cmpnltss %xmm2, %xmm0 pxor %xmm2, %xmm2 blendvps %xmm0, %xmm3, %xmm2 movss 16(%rsi), %xmm0 minss %xmm5, %xmm3 minss %xmm5, %xmm1 movss 4(%rdi), %xmm5 minss %xmm4, %xmm2 movss 16(%rdi), %xmm4 subss %xmm5, %xmm0 maxss %xmm3, %xmm1 movss 4(%rsi), %xmm3 subss %xmm5, %xmm3 mulss %xmm4, %xmm0 movss 8(%rdi), %xmm5 mulss %xmm4, %xmm3 movaps %xmm2, %xmm4 maxss %xmm0, %xmm4 minss %xmm1, %xmm0 maxss %xmm3, %xmm2 minss %xmm1, %xmm3 movss 8(%rsi), %xmm1 subss %xmm5, %xmm1 maxss %xmm3, %xmm0 movss 20(%rsi), %xmm3 minss %xmm4, %xmm2 movss 20(%rdi), %xmm4 subss %xmm5, %xmm3 mulss %xmm4, %xmm1 movaps %xmm2, %xmm5 mulss %xmm4, %xmm3 movaps %xmm2, %xmm4 maxss %xmm1, %xmm4 minss %xmm0, %xmm1 movaps %xmm3, %xmm2 maxss %xmm3, %xmm5 minss %xmm0, %xmm2 minss %xmm5, %xmm4 maxss %xmm1, %xmm2 comiss %xmm4, %xmm2 seta %al ret there's the existing issue that RTL conditional move expansion doesn't preserve the equality of constants for _33 = t2_34 < 0.0; _12 = _33 ? 0.0 : t2_34; but it emits two loads from the constant pool for 0.0 here which in the x86 backend fail to be recognized as min/max.
After the latest fixes we still fail to recognize min/max early for float < 0.0 ? float : 0.0 because prepare_cmp_insn doesn't push the FP 0.0 constant to a reg since RTX cost for this seems to be zero. We then call insn_operand_matches which ultimatively fails in ix86_fp_comparison_operator as ix86_fp_comparison_strategy is IX86_FPCMP_COMI here and ix86_trivial_fp_comparison_operator for (lt (reg/v:SF 110 [ t2 ]) (const_double:SF 0.0 [0x0.0p+0])) returns false. If I fix things so we try (gt (const_double:SF 0.0 [0x0.0p+0]) (reg:SF ..)) then maybe_legitimize_operands "breaks" things here since it forces the cond operand to a register but not the comparison operand so ix86_expand_fp_movcc again FAILs. I'm not sure why the x86 backend allows any CONST_DOUBLE as part of comparisons (during expansion only?). This and maybe special-handling of rtx_cost with this special constant and LT/GT code makes the first compares not recognized as MIN/MAX. The rest is fixed now. Patch for trying (gt ..): diff --git a/gcc/optabs.cc b/gcc/optabs.cc index 32ff379ffc3..3ff8ba88bbb 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -4607,6 +4607,14 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size, break; } + if (FLOAT_MODE_P (mode)) + { + prepare_cmp_insn (y, x, swap_condition (comparison), + size, unsignedp, methods, ptest, pmode); + if (*ptest) + return; + } + if (methods != OPTAB_LIB_WIDEN) goto fail;