Bug 89847 - Simplify subexpressions of % constant
Summary: Simplify subexpressions of % constant
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 9.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 89845
Blocks:
  Show dependency treegraph
 
Reported: 2019-03-27 10:12 UTC by Jakub Jelinek
Modified: 2019-04-22 20:15 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-04-22 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2019-03-27 10:12:51 UTC
+++ This bug was initially created as a clone of Bug #89845 +++

The
https://arxiv.org/pdf/1902.01961.pdf
paper also mentions clang is able to optimize:
int f1 (int x) { return (31 * x + 27961) & 15; }
unsigned f2 (unsigned x) { return (31 * x + 27961) & 15; }
into:
return (9 - x) & 15;
while gcc can't.  Of course it should be done only if the operand of %/& (or narrowing cast) isn't used multiple times (to be precise, isn't used outside of the &/&/cast operand) and after giving say SCCVN a chance to merge the same computations from multiple different spots, but then we can really simplify the constants there (modulo the outer constant) and multiplications by constants etc.
Comment 1 Segher Boessenkool 2019-04-22 20:15:58 UTC
They didn't test the right targets ;-)

While for x86_64 you get

        movl    %edi, %eax
        sall    $5, %eax
        subl    %edi, %eax
        addl    $27961, %eax
        andl    $15, %eax
        ret

and for aarch64 you get

        lsl     w1, w0, 5
        sub     w0, w1, w0
        mov     w1, 27961
        add     w0, w0, w1
        and     w0, w0, 15
        ret

for sparc{,64} you get

        sethi   %hi(27648), %g1
        or      %g1, 313, %g1
        sub     %g1, %o0, %o0
        jmp     %o7+8
         and    %o0, 15, %o0

(the mul-by-31 was optimised away by combine).

While for 32-bit powerpc you get

        mulli 3,3,31
        addi 3,3,27961
        rlwinm 3,3,0,28,31
        blr

(if you don't set a modern -mcpu=, anyway), for powerpc64 you get

        subfic 3,3,9
        rldicl 3,3,0,60
        blr

This again is done by combine:

Trying 10, 11 -> 12:
   10: r129:SI=r128:SI-r132:DI#4
      REG_DEAD r132:DI
      REG_DEAD r128:SI
   11: r130:SI=r129:SI+0x6d39
      REG_DEAD r129:SI
   12: r125:SI=r130:SI&0xf
      REG_DEAD r130:SI
Failed to match this instruction:
(set (reg:SI 125)
    (and:SI (minus:SI (const_int 9 [0x9])
            (subreg:SI (reg:DI 132) 4))
        (const_int 15 [0xf])))
Successfully matched this instruction:
(set (reg:SI 130)
    (minus:SI (const_int 9 [0x9])
        (subreg:SI (reg:DI 132) 4)))
Successfully matched this instruction:
(set (reg:SI 125)
    (and:SI (reg:SI 130)
        (const_int 15 [0xf])))


Ideally this would be done in gimple already, of course.  Combine cannot
handle this in general.