This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] combine: Convert subreg-of-lshiftrt to zero_extract properly (PR78390)


On Wed, Nov 23, 2016 at 02:22:07PM +0000, Segher Boessenkool wrote:
> r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477).
> It all has the same root cause: that patch makes combine convert every
> lowpart subreg of a logical shift right to a zero_extract.  This cannot
> work at all if it is not a constant shift, and it has to be a bit more
> careful exactly which bits it extracts.
> 
> Tested on powerpc64-linux {-m32,-m64} (where it fixes the regression
> c-c++-common/torture/vector-compare-1.c fails at -O1, and where it also
> has a bootstrap failure with some other patches).  Also tested that the
> x86_64 compiler still generates the wanted code for the PR77881 testcase.

On s390 (31-Bit) we get two (easily fixable) test regression
supposedly because of the original path (+ this fix), and I don't
know what to do about it.  Perhaps these point to two situations,
where the change from lshiftrt to zero_extract sould not be done?

1) (subtests f29 and f31 in s390/risbg-ll-1.c)

An unpatched Gcc would generate this:

--
(insn 19 17 20 2 (set (reg:DI 2 %r2 [+-4 ])
        (lshiftrt:DI (reg:DI 66 [ v_shl ])
            (const_int 32 [0x20])))

(insn 20 19 21 2 (set (reg:SI 3 %r3)
        (subreg:SI (reg:DI 66 [ v_shl ]) 4))
--

A patched Gcc generates

--
(insn 19 17 20 2 (parallel [
            (set (reg:DI 2 %r2 [+-4 ])
                (zero_extract:DI (reg:DI 66 [ v_shl ])
                    (const_int 32 [0x20])
                    (const_int 0 [0])))
            (clobber (reg:CC 33 %cc))
--

which results in a more complicated "risbg" instruction instead of
a simpler shift right logical instruction "srlg".  (The former is
harder to read and got a CC-less variant only recently).

==> Should the change from lshiftrt to zero_extract be skipped if
it's just a 32 bit shift right operation in DImode, or should this
be "fixed" in the backend by adding a special case to the
"extzv<mode>" pattern?

--------

2) There's not committed test case for the second finding yet.  In
a future patch, this will be added:

--
i32 f44 (i64 v_x)
{
  /* { dg-final { scan-assembler "f44:\n\(\t.*\n\)*\tnilf\t" { target { ! lp64 } } } } */
  i64 v_shr4 = ((ui64)v_x) >> 12;
  i32 v_conv = (ui32)v_shr4;
  i32 v_and = v_conv & 10;
  return v_and;
}
--

Rtl before combine is:

--
(insn 4 3 5 2 (set (reg:DI 65 [ v_x ])
        (subreg:DI (reg:SI 67 [ v_x+4 ]) 0))

(insn 5 4 7 2 (parallel [
            (set (zero_extract:DI (reg:DI 65 [ v_x ])
                    (const_int 32 [0x20])
                    (const_int 0 [0]))
                (subreg:DI (reg:SI 66 [ v_x ]) 0))
            (clobber (reg:CC 33 %cc))

(insn 10 7 11 2 (set (reg:DI 69)
        (lshiftrt:DI (reg:DI 65 [ v_x ])
            (const_int 12 [0xc])))

(insn 11 10 16 2 (parallel [
            (set (reg:SI 68 [ v_and ])
                (and:SI (subreg:SI (reg:DI 69) 4)
                    (const_int 10 [0xa])))
            (clobber (reg:CC 33 %cc))
--

Before the patch, combine generated

--
Trying 5, 10 -> 11:
Failed to match this instruction:
(parallel [
...
Failed to match this instruction:
(set (reg:SI 68 [ v_and ])
    (and:SI (subreg:SI (lshiftrt:DI (reg:DI 65 [ v_x ])
                (const_int 12 [0xc])) 4)
        (const_int 10 [0xa])))
Successfully matched this instruction:
(set (reg:DI 69)
    (lshiftrt:DI (reg:DI 65 [ v_x ])  <---------
        (const_int 12 [0xc])))
Successfully matched this instruction:
(set (reg:SI 68 [ v_and ])
    (and:SI (subreg:SI (reg:DI 69) 4)
        (const_int 10 [0xa])))
allowing combination of insns 5, 10 and 11
...
--

With the patch it uses zero_extract instead of lshiftrt (which is
the whole point of the patch?):

--
Trying 5, 10 -> 11:
Failed to match this instruction:
...
Successfully matched this instruction:
(set (reg:DI 69)
    (zero_extract:DI (reg:DI 65 [ v_x ])   <---------
        (const_int 32 [0x20])
        (const_int 20 [0x14])))
Successfully matched this instruction:
(set (reg:SI 68 [ v_and ])
    (and:SI (subreg:SI (reg:DI 69) 4)
        (const_int 10 [0xa])))
allowing combination of insns 5, 10 and 11
--

Regardless of whether zeroes are shifted into the word or
zero_extract explicitly extends the extracted bits, none of the
bits set to zero are used at all because of the following "(and
... (const_int 10))".

In this specific case zero_extract has more (unnecessary) "side
effects" than lshiftrt, and s390 needs a more complicated
instruction to do the extraction (risbg) than it needs for the
shift (slrg).  However, on s390 both instructions have the same
size and cost, so it doesn't really matter.  On other targets a
shift may actually be cheaper than the zero_extract.

While this seems to be all right on s390, it may still indicate a
case that should be handled differently?


Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]