Bug 51708 - [SH] CSE constants after combine/split
Summary: [SH] CSE constants after combine/split
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.7.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-30 01:24 UTC by Oleg Endo
Modified: 2023-10-20 10:34 UTC (History)
1 user (show)

See Also:
Host:
Target: sh*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Oleg Endo 2011-12-30 01:24:14 UTC
If the shift amount is a constant for SHAD / SHLD insns it should be CSE-ed.
This could be done in a similar way as in the movnegt insn.
Example function:

int foo (int* a, int x, int n)
{
  int i;
  int count;

  for (i = 0; i < n; i ++)
    count += (*(a + i) << 12);

  return count;
}

compiled with -m4 -O3:
        cmp/pl  r6
        bf      .L11
        shll2   r6
        add     #-4,r6
        shlr2   r6
        add     #1,r6
.L9:
        mov.l   @r4+,r1
        mov     #12,r2    ! load constant in loop
        dt      r6
        shld    r2,r1     ! use constant in loop
        bf/s    .L9
        add     r1,r0
.L11:
        rts
        nop


better:
        cmp/pl  r6
        bf      .L11
        shll2   r6
        add     #-4,r6
        shlr2   r6
        add     #1,r6
        mov     #12,r2    ! load constant before loop
.L9:
        mov.l   @r4+,r1
        dt      r6
        shld    r2,r1     ! use constant in loop
        bf/s    .L9
        add     r1,r0
.L11:
        rts
        nop
Comment 1 Oleg Endo 2012-09-23 17:10:44 UTC
Another test case, which is probably easier to write a testsuite case for:

int test00 (int tab[], int i, int j, int tab2[])
{
  return tab[i * 8192 + 4] + tab2[j * 8192 + 8];
}

compiles to (OK):
        mov     #15,r1
        shld    r1,r5
        shld    r1,r6
        add     r5,r4
        add     r6,r7
        mov.l   @(16,r4),r0
        mov.l   @(32,r7),r1
        rts
        add     r1,r0


int test01 (int tab[], int i, int j, int tab2[])
{
  return (tab[i * 8192 + 4] + tab2[j * 8192 + 8]) * 15;
}

complies to (NG):
        mov     #15,r1
        mov     #15,r2
        shld    r1,r5
        shld    r2,r6
        add     r5,r4
        add     r6,r7
        mov.l   @(16,r4),r1
        mov.l   @(32,r7),r2
        add     r2,r1
        mov     r1,r0
        shll2   r0
        shll2   r0
        rts
        sub     r1,r0
Comment 2 Oleg Endo 2012-09-30 23:11:09 UTC
In order to 'force' the constant load to be CSE-ed the constant load and dynamic shift patterns have to be emitted in the respective expanders, so that the CSE pass can see the constant load.
It would be even better to check whether the shift insn is in a loop when deciding whether to use a dynamic shift insn or not.  It is better to convert all non 1/2/8/16 shifts that are buried inside loops to dynamic shifts.  If there are enough registers available every shift would become a 1 insn operation.
Comment 3 Oleg Endo 2013-03-09 14:43:07 UTC
(In reply to comment #2)
> In order to 'force' the constant load to be CSE-ed the constant load and
> dynamic shift patterns have to be emitted in the respective expanders, so that
> the CSE pass can see the constant load.

While implementing other patterns I've noticed that on SH there are quite a few cases, where certain constants can be formed during the combine pass when certain special case code patterns are discovered.
One such frequent constant is the '-1' which is used together with the 'negc' insn to store the negated T bit into a register (SH2A 'movrt' insn).
Another recent case came up when I was working on PR 55303 to add SH2A clip instructions.  There, offset constants can be formed during the combine pass which also would not be CSE'd afterwards.

I think it would be better to introduce a lightweight CSE pass for SH that is done after the first insn split pass after the combine pass.  This would eliminate the need for complicated 'constant reservation' patterns.
Comment 4 Oleg Endo 2014-07-27 19:03:37 UTC
There are also constants generated during reload.  This is a snippet from bzip blocksort.c:

.L6:
        mov.b   @r5+,r1
        dt      r3
        mov.w   .L175,r7  // r7 = 1868
        extu.b  r1,r1
        add     r15,r7    // r7 = r15 + 1868
        shll2   r1
        add     r7,r1
        mov.l   @r1,r7
        add     #1,r7
        bf/s    .L6
        mov.l   r7,@r1
        ...

.L175:
        .short  1868

where the constant load can be hoisted, assuming that r0 is not live throughout the loop:

        mov.w   .L175,r0   // r0 = 1868
        add     r15,r0     // r0 = r15 + 1868
.L6:
        mov.b   @r5+,r1
        dt      r3
        extu.b  r1,r1
        shll2   r1
        add     r0,r1
        mov.l   @r1,r7
        add     #1,r7
        bf/s    .L6
        mov.l   r7,@r1
        ...
.L175:
        .short  1868


Applying addressing mode optimization afterwards can eliminate another insn from the loop:
        mov.w   .L175,r0   // r0 = 1868
        add     r15,r0     // r0 = r15 + 1868
.L6:
        mov.b   @r5+,r1
        dt      r3
        extu.b  r1,r1
        shll2   r1
        mov.l   @(r0,r1),r7
        add     #1,r7
        bf/s    .L6
        mov.l   r7,@(r0,r1)
        ...
.L175:
        .short  1868

If there is no free register available it might be beneficial to insert pushs/pops around the loop to free up registers for the loop.
Comment 5 Oleg Endo 2023-10-20 10:34:59 UTC
The latest patch https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543 by Alex in PR 54089 inserts some additional passes in order to hoist constant loads for shift operations out of loops.  Perhaps that would fix this issue, too.