Bug 55303 - [SH] Add support for clips / clipu instructions
Summary: [SH] Add support for clips / clipu instructions
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.8.0
: P3 enhancement
Target Milestone: ---
Assignee: Oleg Endo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-11-13 00:30 UTC by Oleg Endo
Modified: 2013-05-06 05:48 UTC (History)
0 users

See Also:
Host:
Target: sh2a*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-03-02 00:00:00


Attachments
Working patch with a thinko. (3.68 KB, patch)
2013-03-02 16:16 UTC, Oleg Endo
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Oleg Endo 2012-11-13 00:30:46 UTC
Support for the following SH2A specific instructions should be added:

clips.b  Rn
clips.w  Rn
clipu.b  Rn
clipu.w  Rn

These can be used to implement saturating arithmetic, for example.
Comment 1 Oleg Endo 2013-03-02 16:16:41 UTC
Created attachment 29567 [details]
Working patch with a thinko.

This patch, albeit working, has a thinko.
The idea was to reduce the constraints of the clips/clipu insn comparison constants by adding/subtracting a constant offset value before/after the actual clipping insn.  For example:

int
test_02 (int a)
{
  return max (0, min (255, a));
}

becomes:

_test_02:
        movi20  #128,r1
        sub     r1,r4
        mov     r4,r0
        clips.b r0
        rts
        add     r1,r0

The problem with this is that it won't work for values that will wrap-around before/after the offset subtraction/addition.

E.g. plugging the value 0x80000000 (−2147483648) into the above case:

        movi20  #128,r1
        sub     r1,r4     // r4 =  0x80000000 - 128 = 0x7FFFFF80
        mov     r4,r0
        clips.b r0        // !(r0 < -128) && (r0 > 127) -> r0 = 127
        rts
        add     r1,r0     // r0 = 127 + 128 = 255
                          // expected result: 0

Maybe this case could be handled by using subv/addv insns to catch over/underflows somehow, but probably the resulting code would be more complex (and thus slower) than two straight forward compare-and-branch sequences.

On the other hand, if it is known that the input value is in a certain range (e.g. a sign/zero extended HImode or QImode), the offset approach should work fine.

I will modify the attached patch so that it will allow only the HW clip constants for now.
Comment 2 Oleg Endo 2013-03-02 16:23:57 UTC
For non-SH2A targets there is an opportunity to generate better insn sequences for the special case

unsigned int test (unsigned int a)
{
  return a > 1 ? 1 : a;
}

on SH2A:
   tst     r4,r4
   movrt   r0

if zero-displacement branches are not good:
   tst     r4,r4
   mov     #-1,r0
   negc    r0,r0

if zero-displacement branches are good:
   tst     r4,r4
   bt      0f
   mov     #1,r1
0f:

This can be done by implementing a pattern
  (umin:SI (match_operand:SI 1 "arith_reg_operand")
           (const_int 1))

as it is already done for SH2A in attachment 29567 [details].
Comment 3 Oleg Endo 2013-05-06 05:48:18 UTC
(In reply to comment #1)
> I will modify the attached patch so that it will allow only the HW clip
> constants for now.

This has been committed as rev 198617:
http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=198617

    PR target/55303
    * config/sh/sh.c (sh_rtx_costs): Handle SMIN and SMAX cases.
    * config/sh/sh.md (*clips, uminsi3, *clipu, clipu_one): New insns and
    related expanders.
    * config/sh/iterators.md (SMIN_SMAX): New code iterator.
    * config/sh/predicates.md (arith_reg_or_0_or_1_operand,
    clips_min_const_int, clips_max_const_int, clipu_max_const_int):
    New predicates.

    PR target/55303
    * gcc.target/sh/pr55303-1.c: New.
    * gcc.target/sh/pr55303-2.c: New.
    * gcc.target/sh/pr55303-3.c: New.