This is an odd one, and it seems different from the other bswap bugs that I could find in bugzilla. This is on x64. Compiler Explorer link is here: https://godbolt.org/z/arTf5T Full source code: ============================================================== constexpr long long bswap64(long long in) // unsigned long long behaves the same { union { long long v; char c[8]; } u{in}; union { char c[8]; long long v; } v{ u.c[7], u.c[6], u.c[5], u.c[4], u.c[3], u.c[2], u.c[1], u.c[0]}; return v.v; } long long f(long long i) { return bswap64(i); } constexpr long long bswapD(double x) { return bswap64(*(long long*)&x); } long long g(double x) { return bswapD(x); } =============================================================== There are three observations / bugs: 1) bswapD is never recognized as byte-swapping 2) bswap64 is optimized to bswap at -O2 but not at -O3 3) 131t.bswap never shows bswap, apparently the pass doesn't detect this way of writing bswap, leaving it to the RTL optimizers. Hence I classified this as tree-optimization bug. Verified at -O2 with 9.3, 10.1 and trunk on the compiler explorer. I'm flagging this as a regression because at -O2 gcc 8.3 detects bswap in both cases, but I'm guessing that this is by some accident. In 7.5 neither function is compiled as bswap.
The bswap pass doesn't handle these patterns at all (it doesn't look at stores). What does handle this case is store-merging which - on trunk - figures bswap in f() but not in g() likely because of the BIT_FIELD_REF -> cast folding which makes the stores appear inhomogenous: _3 = VIEW_CONVERT_EXPR<long long int>(x_2(D)); _4 = BIT_FIELD_REF <x_2(D), 8, 56>; v.c[0] = _4; ... _11 = (char) _3; v.c[7] = _11; store-merging already handles the cast vs. BIT_FIELD_REF case for f() but it appearantly doesn't consider to look through a VIEW_CONVERT. With -O3 we vectorize this in an inconvenient way and fully elide the store so store-merging isn't the correct pass to handle this: _3 = BIT_FIELD_REF <i_2(D), 8, 56>; _4 = BIT_FIELD_REF <i_2(D), 8, 48>; _5 = BIT_FIELD_REF <i_2(D), 8, 40>; _6 = BIT_FIELD_REF <i_2(D), 8, 32>; _7 = BIT_FIELD_REF <i_2(D), 8, 24>; _8 = BIT_FIELD_REF <i_2(D), 8, 16>; _9 = BIT_FIELD_REF <i_2(D), 8, 8>; _10 = (char) i_2(D); _21 = {_3, _4, _5, _6, _7, _8, _9, _10}; _18 = VIEW_CONVERT_EXPR<long long int>(_21); v ={v} {CLOBBER}; return _18; the vectorizer is also confused about BIT_FIELD_REF vs. cast here (I repeatedly thought of removing that simplification ... but the user could have written it as well :/). And it would look for a vector function argument but that's something that could be fixed. The above is all for GCC 10. GCC 8 possibly was lucky and did not have that BIT_FIELD_REF -> cast simplification.
Created attachment 50361 [details] WIP On current trunk at -O3 f() again works via store-merging / vectorizing: - _21 = {_3, _4, _5, _6, _7, _8, _9, _10}; + bswapsrc_22 = (long unsigned int) i_2(D); + bswapdst_19 = __builtin_bswap64 (bswapsrc_22); + _21 = VIEW_CONVERT_EXPR<vector(8) char>(bswapdst_19); but g() does not, because init_symbolic_number doesn't like non-integral types. Fixing that generates _Z1gd: .LFB2: .cfi_startproc movq %xmm0, %rax bswap %rax ret but with -m32 it has the issue that we bswap only the lower part since vectorizing produced two vector CTORs. So we'd need to use a BIT_FIELD_REF to extract the integer representation. WIP patch attached.
See also PR96573
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
Note on the trunk for f and g at -O3 -msse4 (and -O3 on aarch64), GCC produces: _21 = VIEW_CONVERT_EXPR<vector(8) char>(i_2(D)); _22 = VEC_PERM_EXPR <_21, _21, { 7, 6, 5, 4, 3, 2, 1, 0 }>; _18 = VIEW_CONVERT_EXPR<long long int>(_22); But that VEC_PERM_EXPR is a bswap :). So to fix this at -O3 -msse4, maybe we could just do: (simplify (view_convert (vec_perm @0 @0 vector_cst_byteswap_p @1)) (if (INTERGAL_TYPE_P (type))) (convert (bswap (view_convert @1)))) Note I don't think we want to do the byteswap in the integer registers if we are going back to the floating point registers.
(In reply to Andrew Pinski from comment #5) > Note on the trunk for f and g at -O3 -msse4 (and -O3 on aarch64), GCC > produces: > _21 = VIEW_CONVERT_EXPR<vector(8) char>(i_2(D)); > _22 = VEC_PERM_EXPR <_21, _21, { 7, 6, 5, 4, 3, 2, 1, 0 }>; > _18 = VIEW_CONVERT_EXPR<long long int>(_22); > > But that VEC_PERM_EXPR is a bswap :). > > So to fix this at -O3 -msse4, maybe we could just do: > (simplify > (view_convert (vec_perm @0 @0 vector_cst_byteswap_p @1)) > (if (INTERGAL_TYPE_P (type))) > (convert (bswap (view_convert @1)))) > > Note I don't think we want to do the byteswap in the integer registers if we > are going back to the floating point registers. Or we could just support that in vect-lowering.
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.
GCC 11 branch is being closed.