This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: new sign/zero extension elimination pass
Richard Guenther wrote:
> On Mon, Oct 18, 2010 at 5:42 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries <tom@codesourcery.com> wrote:
>>> I created a new sign/zero extension elimination pass.
>>>
>>> The motivating example for this pass is:
>>>
>>> void f(unsigned char *p, short s, int c, int *z)
>>> {
>>> if (c)
>>> *z = 0;
>>> *p ^= (unsigned char)s;
>>> }
>>>
>>> For MIPS, compilation results in the following insns.
>>>
>>> (set (reg/v:SI 199)
>>> (sign_extend:SI (subreg:HI (reg:SI 200) 2)))
>>>
>>> ...
>>>
>>> (set (reg:QI 203)
>>> (subreg:QI (reg/v:SI 199) 3))
>>>
>>> These insns are the only def and the only use of reg 199, each located in a
>>> different bb.
>>
>> This sounds like a job for GCSE to do.
>
> The question is why it is expanded the way it is. On x86 I just get
>
> ;; *p_3(D) = D.2690_7;
>
> (insn 16 15 0 (parallel [
> (set (mem:QI (reg/v/f:DI 62 [ p ]) [0 *p_3(D)+0 S1 A8])
> (xor:QI (mem:QI (reg/v/f:DI 62 [ p ]) [0 *p_3(D)+0 S1 A8])
> (subreg:QI (reg/v:HI 63 [ s ]) 0)))
> (clobber (reg:CC 17 flags))
> ]) t.c:5 -1
> (nil))
>
> note that the tree level already has
>
> D.2688_4 = *p_3(D);
> D.2689_6 = (unsigned char) s_5(D);
> D.2690_7 = D.2689_6 ^ D.2688_4;
> *p_3(D) = D.2690_7;
>
> so no extension.
>
> Richard.
>
>> Thanks,
>> Andrew Pinski
>>
The extension is generated for the 'short int s' function parameter by
assign_parm_setup_reg due to the MIPS setting of PROMOTE_MODE,
TARGET_PROMOTE_FUNCTION_MODE and TARGET_PROMOTE_PROTOTYPES. However, this
behavior is not reproducible on the other targets that I tried (ARM, PPC, x86_64),
so this is probably not the best example for comparing targets.
Maybe a better example is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40893#c0.
The optimized tree code for MIPS is this:
...
dct2x2dc_dconly (short int[2] * d)
{
int d1;
int d0;
short int D.1992;
short unsigned int D.1991;
short int D.1990;
short unsigned int D.1989;
short unsigned int D.1988;
short unsigned int D.1987;
int D.1986;
short int D.1985;
int D.1984;
short int D.1983;
short int[2] * D.1982;
int D.1981;
short int D.1980;
int D.1979;
short int D.1978;
<bb 2>:
D.1978_2 = (*d_1(D))[0];
D.1979_3 = (int) D.1978_2;
D.1980_4 = (*d_1(D))[1];
D.1981_5 = (int) D.1980_4;
d0_6 = D.1981_5 + D.1979_3;
D.1982_7 = d_1(D) + 4;
D.1983_8 = (*D.1982_7)[0];
D.1984_9 = (int) D.1983_8;
D.1985_11 = (*D.1982_7)[1];
D.1986_12 = (int) D.1985_11;
d1_13 = D.1986_12 + D.1984_9;
D.1987_14 = (short unsigned int) d0_6;
D.1988_15 = (short unsigned int) d1_13;
D.1989_16 = D.1988_15 + D.1987_14;
D.1990_17 = (short int) D.1989_16;
(*d_1(D))[0] = D.1990_17;
D.1991_20 = D.1987_14 - D.1988_15;
D.1992_21 = (short int) D.1991_20;
(*d_1(D))[1] = D.1992_21;
return;
}
...
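For readers who prefer source over the tree dump, here is a C function that
would lower to the statements above. It is a reconstruction from the dump, not
a copy of the testcase in the bug report, so the parameter spelling and the
exact expression order are assumptions.

```c
/* Reconstructed from the optimized tree dump; not taken verbatim from PR 40893.
   d is a pointer to rows of two shorts, so d[1] is 4 bytes past d[0],
   matching "D.1982_7 = d_1(D) + 4" in the dump. */
void dct2x2dc_dconly(short d[2][2])
{
    /* The short operands are promoted to int before the additions. */
    int d0 = d[0][0] + d[0][1];
    int d1 = d[1][0] + d[1][1];

    /* The results are truncated back to short on store, so only the
       low 16 bits of d0 and d1 matter. */
    d[0][0] = (short)(d0 + d1);
    d[0][1] = (short)(d0 - d1);
}
```

Only d[0][0] and d[0][1] are written; the second row is read but left alone,
as in the dump.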
The assignments:
...
D.1987_14 = (short unsigned int) d0_6;
D.1988_15 = (short unsigned int) d1_13;
...
are expanded into zero_extensions:
...
(insn 10 9 11 3 ext13.c:5 (set (reg:SI 204 [ D.1987+-2 ])
(zero_extend:SI (subreg:HI (reg:SI 213) 2))) -1 (nil))
(insn 14 13 15 3 ext13.c:5 (set (reg:SI 205 [ D.1988+-2 ])
(zero_extend:SI (subreg:HI (reg:SI 216) 2))) -1 (nil))
...
The same holds for ARM and PPC.
But for x86_64, the assignments are expanded into subreg copies:
...
(insn 9 8 10 3 (set (reg:HI 69 [ D.2798 ])
(subreg:HI (reg:SI 78) 0)) test4.c:7 -1
(nil))
(insn 13 12 14 3 (set (reg:HI 70 [ D.2799 ])
(subreg:HI (reg:SI 81) 0)) test4.c:7 -1
(nil))
...
AFAIU, this difference in behaviour is caused by the difference in PROMOTE_MODE,
which causes the short unsigned int D.1987 to live in an SI reg for ARM, MIPS and
PPC, and in an HI reg for x86_64.
For x86_64, the subreg copies are combined by combine with other operations, and
don't result in extra operations.
For ARM, MIPS and PPC, the zero_extensions are not optimized away by any current
pass. This is visible in the MIPS assembly, where the marked andi instructions
implement them:
...
lh $6,2($2)
lh $4,0($2)
lh $5,4($2)
lh $3,6($2)
addu $4,$6,$4
addu $3,$3,$5
andi $4,$4,0xffff <---
andi $3,$3,0xffff <---
addu $5,$3,$4
subu $3,$4,$3
sh $5,0($2)
j $31
sh $3,2($2)
...
The new pass removes the superfluous zero_extensions for ARM, MIPS and PPC.
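To see why the two andi instructions are superfluous in this example, here is a
small C sketch that models the MIPS sequence with and without the masking. The
function names are made up for illustration, and this is not how the pass
itself reasons. It only shows that the halfwords stored by sh do not depend on
the upper 16 bits of the addu/subu operands.

```c
#include <stdint.h>

/* Models the emitted code: both operands are zero-extended from 16 bits
   (the two andi instructions) before the addu/subu, and sh stores the
   low halfword of each result. */
void store_with_extension(int32_t d0, int32_t d1, uint16_t out[2])
{
    uint32_t a = (uint32_t)d0 & 0xffffu;
    uint32_t b = (uint32_t)d1 & 0xffffu;
    out[0] = (uint16_t)(b + a);
    out[1] = (uint16_t)(a - b);
}

/* Same sequence with the two andi instructions dropped. */
void store_without_extension(int32_t d0, int32_t d1, uint16_t out[2])
{
    uint32_t a = (uint32_t)d0;
    uint32_t b = (uint32_t)d1;
    out[0] = (uint16_t)(b + a);
    out[1] = (uint16_t)(a - b);
}

/* Returns 1 if both variants store identical halfwords, 0 otherwise. */
int same_stored_halfwords(int32_t d0, int32_t d1)
{
    uint16_t with[2], without[2];
    store_with_extension(d0, d1, with);
    store_without_extension(d0, d1, without);
    return with[0] == without[0] && with[1] == without[1];
}
```

Addition and subtraction are carried out modulo 2^32 here, and the low 16 bits
of such a result depend only on the low 16 bits of the inputs, so
same_stored_halfwords returns 1 for any pair of inputs.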
Another example for which the new pass removes the superfluous extensions is
https://bugs.launchpad.net/gcc-linaro/+bug/634682.
- Tom