This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Add a TARGET_GEN_MEMSET_VALUE hook
On Mon, Aug 22, 2016 at 5:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Aug 22, 2016 at 4:31 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Aug 19, 2016 at 4:45 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Aug 19, 2016 at 2:21 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Thu, Aug 18, 2016 at 5:16 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Aug 18, 2016 at 1:18 AM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Wed, Aug 17, 2016 at 10:11 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>>>>> builtin_memset_gen_str returns a register used for memset, which only
>>>>>>> supports integer registers. But a target may use vector registers in
>>>>>>> memmset. This patch adds a TARGET_GEN_MEMSET_VALUE hook to duplicate
>>>>>>> QImode value to mode derived from STORE_MAX_PIECES, which can be used
>>>>>>> with vector instructions. The default hook is the same as the original
>>>>>>> builtin_memset_gen_str. A target can override it to support vector
>>>>>>> instructions for STORE_MAX_PIECES.
>>>>>>>
>>>>>>> Tested on x86-64 and i686. Any comments?
>>>>>>>
>>>>>>> H.J.
>>>>>>> ---
>>>>>>> gcc/
>>>>>>>
>>>>>>> * builtins.c (builtin_memset_gen_str): Call targetm.gen_memset_value.
>>>>>>> (default_gen_memset_value): New function.
>>>>>>> * target.def (gen_memset_value): New hook.
>>>>>>> * targhooks.c: Inclue "expmed.h" and "builtins.h".
>>>>>>> (default_gen_memset_value): New function.
>>>>>>
>>>>>> I see default_gen_memset_value in builtins.c but it belongs here.
>>>>>>
>>>>>>> * targhooks.h (default_gen_memset_value): New prototype.
>>>>>>> * config/i386/i386.c (ix86_gen_memset_value): New function.
>>>>>>> (TARGET_GEN_MEMSET_VALUE): New.
>>>>>>> * config/i386/i386.h (STORE_MAX_PIECES): Likewise.
>>>>>>> * doc/tm.texi.in: Add TARGET_GEN_MEMSET_VALUE hook.
>>>>>>> * doc/tm.texi: Updated.
>>>>>>>
>>>>>
>>>>> Like this?
>>>>
>>>> Aww, ok - I see builtins.c is a better place - sorry for the extra work.
>>>>
>>>> Richard.
>>>
>>> Here is the original patch with the updated ChangeLog. OK for trunk?
>>
>> So now actually looking at what you are doing. Why do we need a new
>> hook? Is it that even if we make builtin_memset_gen_str handle vector-mode
>> MODE properly you don't get the desired optimized code? Your patch doesn't
>
> It is hard to say what codes builtin_memset_gen_str will generate
> if vector mode is supported. I suspect since broadcast instructions are
> target specific, a target hook may still be needed to broadcast a QImode
> input to a full vector register.
Well, we mainly seem to miss an optab that matches vec_duplicate (we have
vec_init only which might be sufficient as well).
>> seem to change the mode to vector but only STORE_MAX_PIECES and
>> op_by_pieces_d::run will only consider integer modes.
>
> My patch doesn't change mode of STORE_MAX_PIECES. It changes
> how TImode register from STORE_MAX_PIECES is generated:
>
> (insn 8 7 9 (set (reg:SI 90)
> (zero_extend:SI (reg:QI 89))) c1.i:4 -1
> (nil))
>
> (insn 9 8 10 (parallel [
> (set (reg:SI 91)
> (mult:SI (reg:SI 90)
> (const_int 16843009 [0x1010101])))
> (clobber (reg:CC 17 flags))
> ]) c1.i:4 -1
> (nil))
>
> (insn 10 9 11 (set (reg:V4SI 92)
> (vec_duplicate:V4SI (reg:SI 91))) c1.i:4 4197 {*vec_dupv4si}
> (nil))
>
> (insn 11 10 12 (set (subreg:V4SI (reg:TI 93) 0)
> (reg:V4SI 92)) c1.i:4 -1
> (nil))
>
> (insn 12 11 13 (set (mem:TI (reg/v/f:DI 87 [ dst ]) [0 MEM[(void
> *)dst_2(D)]+0 S16
> A8])
> (reg:TI 93)) c1.i:4 -1
> (nil))
Yes, but if you change STORE_MAX_PIECES to also include OI/XImode
then this exposes OI/XImode scalars by doing OI/XImode stores. Thus
I am suggesting to instead allow to use vector integer modes as well
for the by_pieces stuff (and maybe guard that fact with a target hook).
Richard.
> vs.
>
> (insn 8 7 9 (set (reg:DI 91)
> (zero_extend:DI (reg:QI 89))) c1.i:4 -1
> (nil))
>
> (insn 9 8 10 (set (subreg:DI (reg:TI 90) 0)
> (reg:DI 91)) c1.i:4 -1
> (nil))
>
> (insn 10 9 11 (set (subreg:DI (reg:TI 90) 8)
> (const_int 0 [0])) c1.i:4 -1
> (nil))
>
> (insn 11 10 12 (set (reg:DI 93)
> (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1
> (nil))
>
> (insn 12 11 13 (parallel [
> (set (reg:DI 92)
> (mult:DI (subreg:DI (reg:TI 90) 8)
> (reg:DI 93)))
> (clobber (reg:CC 17 flags))
> ]) c1.i:4 -1
> (nil))
>
> (insn 13 12 14 (set (reg:DI 95)
> (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1
> (nil))
>
> (insn 14 13 15 (parallel [
> (set (reg:DI 94)
> (mult:DI (subreg:DI (reg:TI 90) 0)
> (reg:DI 95)))
> (clobber (reg:CC 17 flags))
> ]) c1.i:4 -1
> (nil))
>
> (insn 15 14 16 (parallel [
> (set (reg:DI 96)
> (plus:DI (reg:DI 92)
> (reg:DI 94)))
> (clobber (reg:CC 17 flags))
> ]) c1.i:4 -1
> (nil))
>
> (insn 16 15 17 (set (reg:DI 98)
> (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1
> (nil))
>
> (insn 17 16 18 (parallel [
> (set (reg:TI 97)
> (mult:TI (zero_extend:TI (subreg:DI (reg:TI 90) 0))
> (zero_extend:TI (reg:DI 98))))
> (clobber (reg:CC 17 flags))
> ]) c1.i:4 -1
> (nil))
>
> (insn 18 17 19 (parallel [
> (set (reg:DI 99)
> (plus:DI (reg:DI 96)
> (subreg:DI (reg:TI 97) 8)))
> (clobber (reg:CC 17 flags))
> ]) c1.i:4 -1
> (nil))
>
> (insn 19 18 20 (set (subreg:DI (reg:TI 97) 8)
> (reg:DI 99)) c1.i:4 -1
> (nil))
>
> (insn 20 19 21 (set (reg:TI 97)
> (reg:TI 97)) c1.i:4 -1
> (expr_list:REG_EQUAL (mult:TI (reg:TI 90)
> (const_wide_int 0x1010101010101010101010101010101))
> (nil)))
>
> (insn 21 20 22 (set (mem:TI (reg/v/f:DI 87 [ dst ]) [0 MEM[(void
> *)dst_2(D)]+0 S16
> A8])
> (reg:TI 97)) c1.i:4 -1
> (nil))
>
>
>> So isn't this a general missed optimization that TImode 0x01010101...01 * x
>> is not optimized to a pxor or broadcast?
>
> In general, vector instructions aren't supported in integer computation.
> TImode 0x01010101...01 * x will use integer registers not vector registers.
> My target hook simply allows vector registers to be used in TImode for
> STORE_MAX_PIECES.
>
> --
> H.J.