Bug 114660 - Exponentiating by squaring not performed for x * y * y * y * y
Summary: Exponentiating by squaring not performed for x * y * y * y * y
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 14.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2024-04-09 15:45 UTC by Antony Polukhin
Modified: 2024-04-28 21:47 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-04-09 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Antony Polukhin 2024-04-09 15:45:37 UTC
For the following code:

int mul(int x, int y) {
    return x * y * y * y * y;
}


with -O2 GCC produces the frollowing assembly:

mul(int, int):
  mov eax, edi
  imul eax, esi
  imul eax, esi
  imul eax, esi
  imul eax, esi
  ret


However, a more optimal code could be generated with less multiplications:

mul(int, int):
        mov     eax, edi
        imul    esi, esi
        imul    eax, esi
        imul    eax, esi
        ret

Godbolt playground: https://godbolt.org/z/6dP11jPfx
Comment 1 Antony Polukhin 2024-04-09 15:48:57 UTC
The above godbolt link for an old version of GCC, here's for 14.0 https://godbolt.org/z/dTPYY1T9W
Comment 2 Andrew Pinski 2024-04-09 18:01:41 UTC
We don't do as much reassociation as we should with signed integers due to overflow. If you use -fwrapv, you get the reassociation; I am 99% sure there is a dup for this bug too.
Comment 3 Andrew Pinski 2024-04-09 18:07:56 UTC
(In reply to Andrew Pinski from comment #2)
> We don't do as much reassociation as we should with signed integers due to
> overflow. If you use -fwrapv, you get the reassociation; I am 99% sure there
> is a dup for this bug too.

I should say we also do it for unsigned already (see PR 95867), -fwrapv case we just treat signed similar to unsigned here. Anyways what needs to happen is we need 2 levels of gimple, one with signed integer overflow behavior and then one with wrapping behavior. RTL does not distinguish between signed and unsigned behaviors for many operations (plus and multiple) so we get some optimizations there but not all.
Comment 4 Jakub Jelinek 2024-04-09 18:22:57 UTC
I think we've been discussing an idea of turning on flag_wrapv very late among the GIMPLE passes and reassociate again.  Because RTL also kind of assumes flag_wrapv, there is no difference between signed/unsigned addition/subtraction/non-widening multiplication.
Though, a question is if it wouldn't screw up range info for use during expansion too much.  Other option is to rewrite into unsigned operations only what we've successfully reassociated in the late reassoc pass.
Comment 5 Richard Biener 2024-04-10 06:28:25 UTC
For the case in question when x is zero then y can be INT_MAX and thus y * y already overflow. But x * y * y * y * y * y could be associated as
(x * y) * (y^4).