I expected the fixed-size loop to be unrolled and the uint32_t load pattern to be recognized, but that does not happen. Clang has no problem with this:
https://godbolt.org/z/8ES09V

#include <cstdint>
#include <cstring>

// recognized
std::uint32_t good(const unsigned char *p)
{
    std::uint32_t result = 0;
    result |= (static_cast<std::uint32_t>(p[0]) << 0);
    result |= (static_cast<std::uint32_t>(p[1]) << 8);
    result |= (static_cast<std::uint32_t>(p[2]) << 16);
    result |= (static_cast<std::uint32_t>(p[3]) << 24);
    return result;
}

// not recognized if done in a loop
std::uint32_t loop(const unsigned char *p)
{
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i)
        result |= (static_cast<std::uint32_t>(p[i]) << (i * 8));
    return result;
}

// other variations are not recognized either
std::uint32_t bad(const unsigned char *p)
{
    std::uint32_t result = 0;
    //result <<= 8;
    result |= static_cast<std::uint32_t>(p[3]);
    result <<= 8;
    result |= static_cast<std::uint32_t>(p[2]);
    result <<= 8;
    result |= static_cast<std::uint32_t>(p[1]);
    result <<= 8;
    result |= static_cast<std::uint32_t>(p[0]);
    return result;
}

std::uint32_t loop2(const unsigned char *p)
{
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        result <<= 8;
        result |= static_cast<std::uint32_t>(p[3 - i]);
    }
    return result;
}
Unrolling happens too late for the bswap pass, so it never sees the unrolled form of the loop.
On PowerPC, for "bad" we get

        addi 9,3,2
        lbz 0,1(3)
        lbz 3,0(3)
        lhbrx 10,0,9
        rlwimi 0,10,8,0,31-8
        rlwimi 3,0,8,0,31-8
        rldicl 3,3,0,32
        blr

(BE -m64); it managed to recognise the top two bytes as a byte-reverse load, but not the lower two. (And yup, "loop" uses no byte-reverse at all.)
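For reference, here is a minimal sketch of what a fully recognized little-endian load amounts to: one native 32-bit load, plus a byte reverse on big-endian targets. The name load_le32 is hypothetical and not part of the testcase. The sketch assumes a GCC-compatible compiler (it uses the __BYTE_ORDER__ predefined macro and __builtin_bswap32) and treats the target as little-endian if that macro is not defined.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not from the testcase: reads 4 bytes as a
// little-endian 32-bit value, i.e. the same result as good().
std::uint32_t load_le32(const unsigned char *p)
{
    std::uint32_t v;
    // Native-endian load; memcpy avoids alignment and aliasing problems.
    std::memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    // On a big-endian target the bytes come in reversed, so swap them.
    // Assumption: __builtin_bswap32 is available (GCC/Clang builtin).
    v = __builtin_bswap32(v);
#endif
    return v;
}
```

On big-endian PowerPC one would hope for this to become a single byte-reverse word load, which is what the shift-and-or variants should ideally be turned into as well.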
*** Bug 94834 has been marked as a duplicate of this bug. ***