This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix -O0 -ffloat-store -mavx ICE (PR target/79807)


On Fri, Mar 3, 2017 at 11:35 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Mar 03, 2017 at 09:10:03AM +0100, Uros Bizjak wrote:
>> > Or are there any define_insn/define_expand where it is desirable to have
>> > both input and output operand a MEM (and does it have to be matching)?
>> > For various scalar binary and unary expanders the backend already uses a helper
>> > that will force something into memory if dest and src are both memory and
>> > not rtx_equal_p, but do we have anything like that in anything these two
>> > builtin expanders emit?
>>
>> Any insn with matched memory (movlps, movhps and similar) can also
>> operate with matched register. To my knowledge, all insn patterns are
>> written in this way, since in the past we had plenty of problems with
>> matched memory operands.
>>
>> Also, I really don't remember what benefit brings us to *not* force
>> input operand to a register at -O0 at expand time and leave to RA to
>> find matched memory. OTOH, with -O0 we already have so many
>> unnecessary moves that you get tears in the eyes...
>
> So, I've tried:
> make mddump
> for i in `sed -n -e 's/^.*CODE_FOR_\([^,]*\),.*$/\1/p' ../../gcc/config/i386/i386-builtin.def`; do sed -n '/^(define_\(insn\|expand\) ("'$i'"/,/^$/p' tmp-mddump.md; done
> and looked for "=.*[mo] in there.
> Some insns might want that && !(MEM_P (operands[0]) && MEM_P (operands[1])),
> e.g.
> (define_insn_and_split "avx512f_<castmode><avxsizesuffix>_<castmode>"
>   [(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m")
>         (unspec:AVX512MODE2P
>           [(match_operand:<ssequartermode> 1 "nonimmediate_operand" "xm,x")]
>           UNSPEC_CAST))]
>   "TARGET_AVX512F"
>   "#"
>   "&& reload_completed"
>   [(set (match_dup 0) (match_dup 1))]
> as the constraint require that both operands aren't memory, shall I create a
> patch for that?  This is the first category below.

Yes. Although expander takes care not to generate two memory
references, combine can propagate memory to the other operand,
creating semi-invalid RTX that is later resolved by RA.

> The second category is where matching operand is ok, so my patch can
> pessimize stuff.  I wonder if we couldn't handle this like:
>           /* If we aren't optimizing, only allow one memory operand to be
>              generated.  */
>           if (memory_operand (op, mode))
>             {
>               const char *const constraint
>                 = insn_data[icode].operand[i + adjust + 1].constraint;
>               if (optimize
>                   || num_memory != 1
>                   || !rtx_equal_p (real_target, op))
>                 num_memory++;
>               /* sse2_movsd allows matching operand.  */
>               else if (icode == CODE_FOR_sse2_movsd)
>                 ;
>               /* Various masked insns allow matching operand.  */
>               else if (insn_data[icode].operand[i + adjust + 1].predicate
>                        == vector_move_operand
>                        && (strcmp (constraint, "0C") == 0
>                            || strcmp (constraint, "0C,0") == 0))
>                 ;
>               else
>                 num_memory++;
>             }
> (though perhaps sse2_movsd is still too simplistic, because it has just
> =m v 0 and =o 0 v alternatives, so if i + adjust + 1 is 2, then it is
> fine as is, if it is 1, then only if the memory is offsettable; though
> perhaps LRA can just turn non-offsettable memory into offsettable one
> through secondary reload).

There is similarity with arithmetic operations. Please note that arith
ops fixup their arguments using ix86_expand_binary_operator, and their
insn patterns have to satisfy ix86_binary_operator_ok insn predicate.
However, even using this infrastructure, I have noticed that the
compiler is quite sloppy with memory operands, and we mostly get
matched memory after combine pass (if we are lucky).

Regarding the implementation, I'd rather see another flag for
i386-builtin.def entries, where we can mark if matching memory operand
is allowed. The expander can then look at the flag, and do some
special magic.

> And the last category is with "=m" destination (not allowing anything else)
> and (match_dup 0) somewhere in the pattern, I believe those have to be
> expanded specially, because otherwise one e.g. wouldn't get MEM_P target
> if optimize at all (I think the builtins are supposed to pass pointer to the
> result in that case) and the match_dup just doesn't appear as another operand.

It looks to me, that the expander already takes care for matching,
where strictly required. OTOH, matching memory requirement would also
fit in the proposed implementation with flag in builtin entries, we
can mark if matching memory operand is required, not only allowed.

However, as said above, expanding with memory operands is somehow
neglected area, desperately looking for some love.

Uros.

> missing !(MEM && MEM) ?
> =======================
> avx512f_pd512_256pd
> avx512f_pd512_pd
> avx512f_ps512_256ps
> avx512f_ps512_ps
> avx512f_si512_256si
> avx512f_si512_si
> avx_pd256_pd
> avx_ps256_ps
> avx_si256_si
> sse_storehps
> sse_storelps
>
> valid matching mem
> ==================
> avx512f_ss_truncatev16siv16hi2_mask
> avx512f_ss_truncatev16siv16qi2_mask
> avx512f_ss_truncatev8div8hi2_mask
> avx512f_ss_truncatev8div8si2_mask
> avx512f_truncatev16siv16hi2_mask
> avx512f_truncatev16siv16qi2_mask
> avx512f_truncatev8div8hi2_mask
> avx512f_truncatev8div8si2_mask
> avx512f_us_truncatev16siv16hi2_mask
> avx512f_us_truncatev16siv16qi2_mask
> avx512f_us_truncatev8div8hi2_mask
> avx512f_us_truncatev8div8si2_mask
> avx512f_vcvtps2ph512_mask
> avx512vl_ss_truncatev16hiv16qi2_mask
> avx512vl_ss_truncatev4div4si2_mask
> avx512vl_ss_truncatev8siv8hi2_mask
> avx512vl_truncatev16hiv16qi2_mask
> avx512vl_truncatev4div4si2_mask
> avx512vl_truncatev8siv8hi2_mask
> avx512vl_us_truncatev16hiv16qi2_mask
> avx512vl_us_truncatev4div4si2_mask
> avx512vl_us_truncatev8siv8hi2_mask
> sse2_movsd
> vcvtps2ph256_mask
>
> match_dup on mem target
> =======================
> avx512f_ss_truncatev8div16qi2_mask_store
> avx512f_storev16sf_mask
> avx512f_storev16si_mask
> avx512f_storev8df_mask
> avx512f_storev8di_mask
> avx512f_truncatev8div16qi2_mask_store
> avx512f_us_truncatev8div16qi2_mask_store
> avx512vl_compressstorev2df_mask
> avx512vl_compressstorev2di_mask
> avx512vl_compressstorev4df_mask
> avx512vl_compressstorev4di_mask
> avx512vl_compressstorev4sf_mask
> avx512vl_compressstorev4si_mask
> avx512vl_compressstorev8sf_mask
> avx512vl_compressstorev8si_mask
> avx512vl_ss_truncatev2div2hi2_mask_store
> avx512vl_ss_truncatev2div2qi2_mask_store
> avx512vl_ss_truncatev2div2si2_mask_store
> avx512vl_ss_truncatev4div4hi2_mask_store
> avx512vl_ss_truncatev4div4qi2_mask_store
> avx512vl_ss_truncatev4siv4hi2_mask_store
> avx512vl_ss_truncatev4siv4qi2_mask_store
> avx512vl_ss_truncatev8siv8qi2_mask_store
> avx512vl_storev16hi_mask
> avx512vl_storev16qi_mask
> avx512vl_storev2df_mask
> avx512vl_storev2di_mask
> avx512vl_storev32qi_mask
> avx512vl_storev4df_mask
> avx512vl_storev4di_mask
> avx512vl_storev4sf_mask
> avx512vl_storev4si_mask
> avx512vl_storev8hi_mask
> avx512vl_storev8sf_mask
> avx512vl_storev8si_mask
> avx512vl_truncatev2div2hi2_mask_store
> avx512vl_truncatev2div2qi2_mask_store
> avx512vl_truncatev2div2si2_mask_store
> avx512vl_truncatev4div4hi2_mask_store
> avx512vl_truncatev4div4qi2_mask_store
> avx512vl_truncatev4siv4hi2_mask_store
> avx512vl_truncatev4siv4qi2_mask_store
> avx512vl_truncatev8siv8qi2_mask_store
> avx512vl_us_truncatev2div2hi2_mask_store
> avx512vl_us_truncatev2div2qi2_mask_store
> avx512vl_us_truncatev2div2si2_mask_store
> avx512vl_us_truncatev4div4hi2_mask_store
> avx512vl_us_truncatev4div4qi2_mask_store
> avx512vl_us_truncatev4siv4hi2_mask_store
> avx512vl_us_truncatev4siv4qi2_mask_store
> avx512vl_us_truncatev8siv8qi2_mask_store
>
>         Jakub


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]