This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: new sign/zero extension elimination pass


Richard Guenther wrote:
> On Mon, Oct 18, 2010 at 5:42 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries <tom@codesourcery.com> wrote:
>>> I created a new sign/zero extension elimination pass.
>>>
>>> The motivating example for this pass is:
>>>
>>>  void f(unsigned char *p, short s, int c, int *z)
>>>    {
>>>      if (c)
>>>        *z = 0;
>>>      *p ^= (unsigned char)s;
>>>    }
>>>
>>> For MIPS, compilation results in the following insns.
>>>
>>>  (set (reg/v:SI 199)
>>>       (sign_extend:SI (subreg:HI (reg:SI 200) 2)))
>>>
>>>  ...
>>>
>>>  (set (reg:QI 203)
>>>       (subreg:QI (reg/v:SI 199) 3))
>>>
>>> These insns are the only def and the only use of reg 199, each located in a
>>> different bb.
>>
>> This sounds like a job for GCSE to do.
> 
> The question is why it is expanded the way it is.  On x86 I just get
> 
> ;; *p_3(D) = D.2690_7;
> 
> (insn 16 15 0 (parallel [
>             (set (mem:QI (reg/v/f:DI 62 [ p ]) [0 *p_3(D)+0 S1 A8])
>                 (xor:QI (mem:QI (reg/v/f:DI 62 [ p ]) [0 *p_3(D)+0 S1 A8])
>                     (subreg:QI (reg/v:HI 63 [ s ]) 0)))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:5 -1
>      (nil))
> 
> note that the tree level already has
> 
>   D.2688_4 = *p_3(D);
>   D.2689_6 = (unsigned char) s_5(D);
>   D.2690_7 = D.2689_6 ^ D.2688_4;
>   *p_3(D) = D.2690_7;
> 
> so no extension.
> 
> Richard.
> 
>> Thanks,
>> Andrew Pinski
>>

The extension is generated for the 'short int s' function parameter by
assign_parm_setup_reg due to the MIPS settings of PROMOTE_MODE,
TARGET_PROMOTE_FUNCTION_MODE and TARGET_PROMOTE_PROTOTYPES. However, this
behavior is not reproducible on the other targets that I tried (ARM, PPC,
x86_64), so this is probably not the best example for comparing targets.

Maybe a better example is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40893#c0.

The optimized tree code for MIPS is this:
...
dct2x2dc_dconly (short int[2] * d)
{
  int d1;
  int d0;
  short int D.1992;
  short unsigned int D.1991;
  short int D.1990;
  short unsigned int D.1989;
  short unsigned int D.1988;
  short unsigned int D.1987;
  int D.1986;
  short int D.1985;
  int D.1984;
  short int D.1983;
  short int[2] * D.1982;
  int D.1981;
  short int D.1980;
  int D.1979;
  short int D.1978;

<bb 2>:
  D.1978_2 = (*d_1(D))[0];
  D.1979_3 = (int) D.1978_2;
  D.1980_4 = (*d_1(D))[1];
  D.1981_5 = (int) D.1980_4;
  d0_6 = D.1981_5 + D.1979_3;
  D.1982_7 = d_1(D) + 4;
  D.1983_8 = (*D.1982_7)[0];
  D.1984_9 = (int) D.1983_8;
  D.1985_11 = (*D.1982_7)[1];
  D.1986_12 = (int) D.1985_11;
  d1_13 = D.1986_12 + D.1984_9;
  D.1987_14 = (short unsigned int) d0_6;
  D.1988_15 = (short unsigned int) d1_13;
  D.1989_16 = D.1988_15 + D.1987_14;
  D.1990_17 = (short int) D.1989_16;
  (*d_1(D))[0] = D.1990_17;
  D.1991_20 = D.1987_14 - D.1988_15;
  D.1992_21 = (short int) D.1991_20;
  (*d_1(D))[1] = D.1992_21;
  return;

}
...
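
For readers without the bug report at hand, the dump above corresponds to C
source of roughly the following shape. This is a sketch reconstructed from the
GIMPLE dump only; the parameter spelling and layout are assumptions, and the
exact text in the bug report may differ.

```c
#include <assert.h>

/* Reconstructed from the optimized tree dump, not copied from the bug
   report. The sums are computed in int and then truncated to short on
   the stores, which is where the (short unsigned int) casts in the
   dump come from. */
void dct2x2dc_dconly(short d[2][2])
{
    int d0 = d[0][0] + d[0][1];
    int d1 = d[1][0] + d[1][1];

    d[0][0] = d0 + d1;   /* only the low 16 bits are stored */
    d[0][1] = d0 - d1;   /* likewise */
}
```

Calling it on {{1, 2}, {3, 4}} leaves d[0] as {10, -4} and d[1] unchanged.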

The assignments:
...
  D.1987_14 = (short unsigned int) d0_6;
  D.1988_15 = (short unsigned int) d1_13;
...

are expanded into zero_extensions:
...
(insn 10 9 11 3 ext13.c:5 (set (reg:SI 204 [ D.1987+-2 ])
        (zero_extend:SI (subreg:HI (reg:SI 213) 2))) -1 (nil))

(insn 14 13 15 3 ext13.c:5 (set (reg:SI 205 [ D.1988+-2 ])
        (zero_extend:SI (subreg:HI (reg:SI 216) 2))) -1 (nil))
...
The same holds for ARM and PPC.

But for x86_64, the assignments are expanded into subreg copies:
...
(insn 9 8 10 3 (set (reg:HI 69 [ D.2798 ])
        (subreg:HI (reg:SI 78) 0)) test4.c:7 -1
     (nil))

(insn 13 12 14 3 (set (reg:HI 70 [ D.2799 ])
        (subreg:HI (reg:SI 81) 0)) test4.c:7 -1
     (nil))
...

AFAIU, this difference in behaviour is caused by the difference in PROMOTE_MODE,
which causes the short unsigned int D.1987 to live in an SI reg for ARM, MIPS and
PPC, and in an HI reg for x86_64.

For x86_64, the subreg copies are combined by combine with other operations, and
don't result in extra operations.

For ARM, MIPS and PPC, the zero_extensions are not optimized away by any current
pass. This is visible in the MIPS assembly:
...
	lh	$6,2($2)
	lh	$4,0($2)
	lh	$5,4($2)
	lh	$3,6($2)
	addu	$4,$6,$4
	addu	$3,$3,$5
	andi	$4,$4,0xffff    <---
	andi	$3,$3,0xffff    <---
	addu	$5,$3,$4
	subu	$3,$4,$3
	sh	$5,0($2)
	j	$31
	sh	$3,2($2)
...

The new pass removes the superfluous zero_extensions for ARM, MIPS and PPC.
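
The reason the two andi instructions are superfluous is that the results are
only consumed by sh, which stores the low 16 bits, and the low 16 bits of a sum
or difference depend only on the low 16 bits of the operands. A minimal sketch
of that argument in C follows; the function names are hypothetical and this
only models the arithmetic, not the pass itself.

```c
#include <assert.h>
#include <stdint.h>

/* Models the emitted sequence: mask both inputs (andi ...,0xffff),
   add or subtract, then keep the low 16 bits (sh). */
static uint16_t sum_with_masks(uint32_t d0, uint32_t d1)
{
    return (uint16_t)((d1 & 0xffff) + (d0 & 0xffff));
}

static uint16_t diff_with_masks(uint32_t d0, uint32_t d1)
{
    return (uint16_t)((d0 & 0xffff) - (d1 & 0xffff));
}

/* Models the sequence with the two masks removed. */
static uint16_t sum_without_masks(uint32_t d0, uint32_t d1)
{
    return (uint16_t)(d1 + d0);
}

static uint16_t diff_without_masks(uint32_t d0, uint32_t d1)
{
    return (uint16_t)(d0 - d1);
}

/* Compares both variants over a sample of inputs, including values
   with bits set above bit 15, and returns the number of mismatches. */
unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t d0 = 0; d0 < 0x40000; d0 += 251) {
        for (uint32_t d1 = 0; d1 < 0x40000; d1 += 257) {
            if (sum_with_masks(d0, d1) != sum_without_masks(d0, d1))
                bad++;
            if (diff_with_masks(d0, d1) != diff_without_masks(d0, d1))
                bad++;
        }
    }
    return bad;
}
```

count_mismatches() returns 0, since unsigned arithmetic wraps modulo 2^32 and
the truncation to 16 bits discards everything the masks would have cleared.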

Another example for which the new pass removes the superfluous extensions is
https://bugs.launchpad.net/gcc-linaro/+bug/634682.

- Tom

