GCC cannot simplify a static inline function for xor/xnor, but it can if a macro is used instead. Testcase:

#include <inttypes.h>

#define TYPE uint8_t
#define M_XOR(a,b)  ((!!(a))^(!!(b)))
#define M_NXOR(a,b) (!((!!(a))^(!!(b))))

__attribute__((__always_inline__, const))
static inline TYPE m_xor (const TYPE a, const TYPE b)
{
  return M_XOR(a,b);
}

__attribute__((__always_inline__, const))
static inline TYPE m_xnor (const TYPE a, const TYPE b)
{
  return M_NXOR(a,b);
}

// bad assembly output
int test1b(const TYPE a, const TYPE b) { return m_xor(a,b) == !m_xnor(a,b); }
int test2b(const TYPE a, const TYPE b) { return !m_xor(a,b) == m_xnor(a,b); }
int test3b(const TYPE a, const TYPE b) { return M_XOR(a,b) == !m_xnor(a,b); }

// good assembly output
int test1g(const TYPE a, const TYPE b) { return m_xor(a,b) == M_XOR(a,b); }
int test2g(const TYPE a, const TYPE b) { return M_XOR(a,b) == !M_NXOR(a,b); }
int test3g(const TYPE a, const TYPE b) { return M_XOR(a,b) != !M_NXOR(a,b); }

Thanks. Let me copy what I had in the other PR: we have an old optimization in fold_unary (like the other "do ... if ... simplifies" folds it is not straightforward to move it to match.pd), and thus it only applies when everything is part of the same expression in the source code:

  /* Convert ~(X ^ Y) to ~X ^ Y or X ^ ~Y if ~X or ~Y simplify.  */

If you use even fewer macros and more lines, things optimize a bit better:

  m_xnor:
    TYPE tmp = !!a ^ !!b;
    return !tmp;

but we still have some discrepancy where we optimize (a^b)==0 but not ~(a^b) (for _Bool type).

For me everything works except test2b. When I remove this pattern from match.pd:

/* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
   when profitable.
   For bitwise binary operations apply operand conversions to the
   binary operation result instead of to the operands.  This allows
   to combine successive conversions and bitwise binary operations.
   We combine the above two cases by using a conditional convert.  */
(for bitop (bit_and bit_ior bit_xor)
 (simplify
  (bitop (convert @0) (convert? @1))
  (if (((TREE_CODE (@1) == INTEGER_CST
         && INTEGRAL_TYPE_P (TREE_TYPE (@0))
         && int_fits_type_p (@1, TREE_TYPE (@0)))
        || types_match (@0, @1))
       /* ??? This transform conflicts with fold-const.c doing
          Convert (T)(x & c) into (T)x & (T)c, if c is an integer
          constants (if x has signed type, the sign bit cannot be set
          in c).  This folds extension into the BIT_AND_EXPR.
          Restrict it to GIMPLE to avoid endless recursions.  */
       && (bitop != BIT_AND_EXPR || GIMPLE)
       && (/* That's a good idea if the conversion widens the operand, thus
              after hoisting the conversion the operation will be narrower.  */
           TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type)
           /* It's also a good idea if the conversion is to a non-integer
              mode.  */
           || GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT
           /* Or if the precision of TO is not the same as the precision
              of its mode.  */
           || !type_has_mode_precision_p (type)))
   (convert (bitop @0 (convert @1))))))

then test2b optimizes correctly. The question is why? And what does convert? mean in match.pd?

I think the biggest issue is that _8 == _9, for truth values, is not simply converted to _8 ^ _9.

On the trunk the only one missing is test2b:

  _8 = a_4(D) != 0;
  _9 = b_3(D) != 0;
  _10 = _8 ^ _9;
  _1 = ~_10;
  _7 = _8 == _9;
  _2 = _1 == _7;
  _5 = (int) _2;

If _7 is converted to _8 ^ _9, we would see that _7 and _10 are the same:

  _10 = _8 ^ _9;
  _1 = ~_10;
  _7 = _10;
  _2 = _1 ^ _7;

And then you have (~_10 ^ _10), which is always 1. And done. So:

(simplify
 (eq truth_value@0 truth_value@1)
 (bit_xor @0 @1))
(simplify
 (ne truth_value@0 truth_value@1)
 (bit_xor (bit_xor @0 @1) { build_one_cst (type); }))

(In reply to Andrew Pinski from comment #4)
> (simplify
>  (eq truth_value@0 truth_value@1)
>  (bit_xor @0 @1))
> (simplify
>  (ne truth_value@0 truth_value@1)
>  (bit_xor (bit_xor @0 @1) { build_one_cst (type); }))

Also I had this wrong: ~(a ^ b) is the same as a == b. This was fixed by r13-1779-g375668e0508fbe173a (aka PR 106379).