Bug 122240 - LIM missed opportunity in loop
Summary: LIM missed opportunity in loop
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 16.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2025-10-10 15:05 UTC by Matt Godbolt
Modified: 2025-10-13 07:52 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2025-10-13 00:00:00


Description Matt Godbolt 2025-10-10 15:05:28 UTC
In code similar to that in bug 122226, I have this "strlen" type thing for ints:

```
#include <cstddef>

// [[gnu::noinline]]
// [[gnu::const]]
// [[gnu::noinline, gnu::const]]
static std::size_t count_ints(const int *ints) {
  std::size_t num{};
  while (*ints) {
    num++;
    ints++;
  }
  return num;
}

std::size_t num_compares{};

// Returns whether a zero-terminated list of ints contains 1234
bool has_1234(const int *ints) {
  for (std::size_t index = 0; index < count_ints(ints); ++index) {
    ++num_compares;
    if (ints[index] == 1234) {
      return true;
    }
  }
  return false;
}
```

That is, we have a "strlen" type function (count_ints), and another that foolishly calls it for each loop iteration, looking for "1234".
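
To make the cost of the source as written concrete, here is a minimal instrumented sketch. The `element_reads` counter and the `_counted` suffixes are hypothetical names added for illustration and are not part of the report. The counter also gives the counting function a side effect, so this only models what the source asks for, not what an optimiser may emit.

```cpp
#include <cstddef>

// Hypothetical instrumentation: number of array elements inspected while counting.
std::size_t element_reads{};

static std::size_t count_ints_counted(const int *ints) {
  std::size_t num{};
  while (true) {
    ++element_reads;  // one read per element inspected, including the terminator
    if (*ints == 0) break;
    num++;
    ints++;
  }
  return num;
}

// Same shape as has_1234: the length is recomputed in every loop condition.
bool has_1234_counted(const int *ints) {
  for (std::size_t index = 0; index < count_ints_counted(ints); ++index) {
    if (ints[index] == 1234) {
      return true;
    }
  }
  return false;
}
```

For a list of n non-zero values that does not contain 1234, the loop condition is evaluated n + 1 times and each evaluation inspects n + 1 elements, so the counter ends at (n + 1) * (n + 1).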

CE link: https://godbolt.org/z/a8o8doob9

Even though gcc can see the body of the "count_ints" function, and the `int` does not alias with the `size_t`, and there's no aggregate being returned, gcc generates the code for `count_ints` inside the loop:

```
.L3:
        xor     eax, eax ; length = 0
.L6:
        add     rax, 1
        mov     r8d, DWORD PTR [rdi+rax*4]
        test    r8d, r8d
        jne     .L6   ; loop looking for a zero
        cmp     rdx, rax ; is current "1234 index" same as length
        jnb     .L15   ; exit loop
        add     rcx, 1 ; inc count
        cmp     DWORD PTR [rdi+rdx*4], 1234  ; check this entry for 1234
        je      .L16  ; exit if found
        add     rdx, 1 ; ++loop
        mov     esi, 1
        jmp     .L3   ; back to main loop at L3 (which re-scans the array from the beginning again)
```

If we mark it noinline (comment in and out the various attributes), we see it call the function each time.

Manually marking it `gnu::const` does not help here. Marking it _both_ noinline _and_ const gives the result I'd expect, with the code counting the length once at the top of the loop.
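
A note on the attribute choice, with a sketch. GCC's documentation for `const` cautions against using it on functions that examine data through pointer arguments, and describes `pure` for functions that read memory but have no other side effects. The variant below swaps in `gnu::pure` alongside `gnu::noinline`. This combination is an assumption and was not tested in this report, so whether GCC hoists the call in this form would need checking on Compiler Explorer.

```cpp
#include <cstddef>

// Assumption: gnu::pure instead of the report's gnu::const, because the
// function reads the memory its argument points to.
[[gnu::noinline, gnu::pure]]
static std::size_t count_ints(const int *ints) {
  std::size_t num{};
  while (*ints) {
    num++;
    ints++;
  }
  return num;
}

std::size_t num_compares{};

// Unchanged from the report: the call stays in the loop condition.
bool has_1234(const int *ints) {
  for (std::size_t index = 0; index < count_ints(ints); ++index) {
    ++num_compares;
    if (ints[index] == 1234) {
      return true;
    }
  }
  return false;
}
```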

Initially I thought gcc was being clever when inlining and walking along the array looking for _either_ zero or 1234, but it does seem like it's unnecessarily looping each time, having not recognised it can LIM out the count_ints.
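
For comparison, this is roughly the form the loop would take if the call were hoisted by hand. It is a sketch only: `has_1234_hoisted` and `length` are hypothetical names, not taken from the report or from GCC's output. The rewrite relies on the loop body storing only to `num_compares`, a `std::size_t` that cannot alias the `int` array.

```cpp
#include <cstddef>

static std::size_t count_ints(const int *ints) {
  std::size_t num{};
  while (*ints) {
    num++;
    ints++;
  }
  return num;
}

std::size_t num_compares{};

// Hypothetical hand-hoisted variant: the length is computed once, before the loop.
bool has_1234_hoisted(const int *ints) {
  const std::size_t length = count_ints(ints);
  for (std::size_t index = 0; index < length; ++index) {
    ++num_compares;
    if (ints[index] == 1234) {
      return true;
    }
  }
  return false;
}
```

This returns the same result and leaves `num_compares` with the same value as the original loop, while scanning for the terminator only once.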

I may be missing something, but I didn't see anything in the optimisation report. But it seems:

  - GCC with visible body, no attributes: Re-counts on every iteration (conservative but slow)
  - GCC with visible body + [[gnu::const]]: Hoists correctly, though should be unnecessary, unless I misunderstand `const` here
  - GCC with just declaration + [[gnu::const]]: Doesn't hoist (a bug? - doesn't respect the attribute)


For reference, clang 21.1 also fails to hoist when the body is visible without attributes (though it rewrites count_ints to `wcslen`). However, clang does respect [[gnu::const]] on just the declaration (without visible body), and successfully hoists in that case - see https://godbolt.org/z/57b4Y3KPG. Maybe the bug here is that GCC doesn't respect [[gnu::const]] on declarations when it can't see the function body, even if I've missed a reason why the body of the function isn't already treated as `const` when it can be seen.

Thanks in advance and apologies if I've missed an obvious reason why this can't be LIMed.
Comment 1 Matt Godbolt 2025-10-10 17:52:55 UTC
Correction: the "clang hoists if it sees const declaration" CE link is https://godbolt.org/z/K8Ee6P3fT
Comment 2 Richard Biener 2025-10-13 07:52:32 UTC
I think there's an existing bug that discusses inlining versus the often better
optimization we get when a 'const' (or 'pure') call is still visible as a call.  The
function looks comparatively large, so it is likely only inlined during IPA (but
it's called-once static, so it might be inlined even early), so this would
be another issue where we miss an early invariant motion pass.