Bug 89811 - uint32_t load is not recognized if shifts are done in a fixed-size loop
Summary: uint32_t load is not recognized if shifts are done in a fixed-size loop
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 9.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 94834 (view as bug list)
Depends on:
Blocks: 94094
  Show dependency treegraph
 
Reported: 2019-03-24 18:31 UTC by Nikita Kniazev
Modified: 2024-03-12 18:08 UTC (History)
3 users (show)

See Also:
Host: x86_64
Target: x86_64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-09-04 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Nikita Kniazev 2019-03-24 18:31:15 UTC
I was expecting that fixed-size loop will be unrolled and the uint32_t load pattern is recognized, but it does not happen. Clang has no problems with this https://godbolt.org/z/8ES09V

#include <cstdint>
#include <cstring>

// recognized
std::uint32_t good(const unsigned char *p)
{
    std::uint32_t result = 0;
    result |= (static_cast<std::uint32_t>(p[0]) << 0);
    result |= (static_cast<std::uint32_t>(p[1]) << 8);
    result |= (static_cast<std::uint32_t>(p[2]) << 16);
    result |= (static_cast<std::uint32_t>(p[3]) << 24);
    return result;
}

// not recognized if done in a loop
std::uint32_t loop(const unsigned char *p)
{
  std::uint32_t result = 0;
  for (int i = 0; i < 4; ++i)
      result |= (static_cast<std::uint32_t>(p[i]) << (i * 8));

  return result;
}

// other variations are not recognized too
std::uint32_t bad(const unsigned char *p)
{
  std::uint32_t result = 0;
  //result <<= 8;
  result |= static_cast<std::uint32_t>(p[3]);
  result <<= 8;
  result |= static_cast<std::uint32_t>(p[2]);
  result <<= 8;
  result |= static_cast<std::uint32_t>(p[1]);
  result <<= 8;
  result |= static_cast<std::uint32_t>(p[0]);

  return result;
}

std::uint32_t loop2(const unsigned char *p)
{
  std::uint32_t result = 0;
  for (int i = 0; i < 4; ++i) {
      result <<= 8;
      result |= static_cast<std::uint32_t>(p[3 - i]);
  }

  return result;
}
Comment 1 Richard Biener 2019-03-25 09:30:38 UTC
Unrolling happens too late for bswap.
Comment 2 Segher Boessenkool 2019-04-22 12:37:55 UTC
On PowerPC, for "bad" we get

        addi 9,3,2
        lbz 0,1(3)
        lbz 3,0(3)
        lhbrx 10,0,9
        rlwimi 0,10,8,0,31-8
        rlwimi 3,0,8,0,31-8
        rldicl 3,3,0,32
        blr

(BE -m64); it managed to recognise the top two bytes as a byte-reverse load,
but not the lower two.

(And yup, "loop" uses no byte-reverse at all.)
Comment 3 Andrew Pinski 2021-09-04 21:33:06 UTC
*** Bug 94834 has been marked as a duplicate of this bug. ***