Bug 60538

Summary: [SH] improve support for cmp/str insn
Product: gcc Reporter: Oleg Endo <olegendo>
Component: targetAssignee: Not yet assigned to anyone <unassigned>
Status: UNCONFIRMED ---    
Severity: enhancement    
Priority: P3    
Version: 4.9.0   
Target Milestone: ---   
Host: Target: sh*-*-*
Build: Known to work:
Known to fail: Last reconfirmed:

Description Oleg Endo 2014-03-15 13:37:59 UTC
Currently the cmp/str insn is only utilized in some builtin string functions.  There are some known higher level operations that could be implemented via the cmp/str insn, most notably a 'contains zero byte' function, which can be done in various ways:

bool haszero (unsigned int x)
{
  return (x - 0x01010101) & ~x & 0x80808080;
}

bool haszero2 (unsigned int x)
{
  return ~((((x & 0x7F7F7F7F) + 0x7F7F7F7F) | x) | 0x7F7F7F7F);
}

bool haszero3 (unsigned int x)
{
  return ((x & 0xFF) && (x & 0xFF00) && (x & 0xFF0000) && (x & 0xFF000000));
}

bool haszero4 (unsigned int x)
{
  return ((x & 0xFF) & ((x >> 8) & 0xFF)
         & ((x >> 16) & 0xFF) & ((x >> 24) & 0xFF)) == 0;
}

bool haszero5 (unsigned int x)
{
  // this one looks quite difficult for combine.
  const unsigned char* p = (const unsigned char*)&x;
  return (*p && *(p + 1) && *(p + 2) && *(p + 3)) == 0;
}

bool haszero6 (unsigned int x)
{
  // this one is probably impossible to do with combine patterns
  // due to the conditional branch involved.

  bool r = ((x + 0x7EFEFEFF) ^ ~x) & 0x81010100;
  if (r)
    r = ~((((x & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);

  return r;
}

bool haszero7 (unsigned int a)
{
  return ((a & 0xFF) == 0
	  | (a & 0xFF00) == 0
	  | (a & 0xFF0000) == 0
	  | (a & 0xFF000000) == 0);
}

The expected minimal SH code for the functions above those would be:

    mov       #0,r1
    cmp/str   r1,r4
    rts
    movt      r0


It would also be nice if the actual cmp/str operation was recognized, although that's a bit more difficult to match.

A while ago I've already tried out adding combine patterns for those kinds of things, with limited success.  The problem is that combine would try out only 3 or 4 insns at once and matching such complex operations require lots of combine bridge patterns.
The annoying thing with combine bridge patterns are that they have to be split again manually if they don't make it into the final complex pattern.

Moreover, if a combine bridge pattern is matched but doesn't make it into the final pattern, there will be missed combine opportunities which would have been taken if the bridge pattern wasn't matched.

One possible solution could be to have explicit combine bridge patterns.  If a bridge pattern doesn't get combined with another (bridge) pattern, combine should discard the match and try out other options.
Combine bridge patterns could be identified by e.g. special insn attributes or by adding a 'define_bridge_insn' or something like that.

Another rather brute force option would be to run the combine pass (and split pass) again after split1.