This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH i386 1/8] [AVX512] Adjust register classes.
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: Kirill Yukhin <kirill dot yukhin at gmail dot com>
- Cc: Richard Henderson <rth at redhat dot com>, Jakub Jelinek <jakub at redhat dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Areg Melik-Adamyan <areg dot melikadamyan at gmail dot com>
- Date: Thu, 25 Jul 2013 13:17:59 +0200
- Subject: Re: [PATCH i386 1/8] [AVX512] Adjust register classes.
- References: <51F01C87 dot 2070504 at gmail dot com>
On Wed, Jul 24, 2013 at 8:27 PM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello,
> With this patch I am starting a series of patches to enable the Intel (R) AVX-512
> and SHA extensions (see [1]) in GCC.
> I've already submitted the corresponding patches to binutils (see [2],[3]).
>
> This patch adds command-line options for avx512* use and detection of the
> relevant cpuid bits. Vector registers are now 512 bits wide, so support for new
> modes (e.g. V16SF) is added. AVX512F introduces 16 new registers, zmm16-zmm31.
> Some instructions now have an EVEX encoding and can use those new registers,
> while old instructions can't, so we introduce a new register class for them.
> We also add a new constraint "v" which allows zmm0-zmm31. We can't extend the
> "x" constraint because it is exposed in inline asm, and doing so could break
> existing inline asm if we assigned e.g. xmm21 to a non-EVEX-encodable
> instruction. The idea is to replace all uses of "x" with "v" for
> EVEX-encodable instructions, and to allow only scalar and 512-bit modes for
> registers 16+ in ix86_hard_regno_mode_ok. We update move instructions to use
> the EVEX-encodable versions for AVX512F, to allow use of the new registers.
> The main problem is with vector mov<mode>_internal in sse.md. In AVX512F we
> have some instructions reading/writing e.g. ymm16+ (for example
> vinsertf64x4/vextractf64x4), but there is no ymm mov instruction with an EVEX
> encoding, so we have to use insert/extract instead.
Some comments while going through the mega-patch:
+(define_insn "*movxi_internal_avx512f"
+ [(set (match_operand:XI 0 "nonimmediate_operand" "=x,x ,m")
+ (match_operand:XI 1 "vector_move_operand" "C ,xm,x"))]
+ "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+{
+ switch (which_alternative)
+ {
+ case 0:
+ return standard_sse_constant_opcode (insn, operands[1]);
+ case 1:
+ case 2:
+ return "vmovdqu32\t{%1, %0|%0, %1}";
+ default:
+ gcc_unreachable ();
+ }
+}
+ [(set_attr "type" "sselog1,ssemov,ssemov")
+ (set_attr "prefix" "evex")
+ (set_attr "mode" "XI")])
Even though the cost of an unaligned move is equal to that of an aligned
one, I would rather see these moves split into aligned and unaligned
cases, in the same way as movoi_internal_avx.
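A hedged sketch of what that split could look like for the register/memory alternatives, modeled on movoi_internal_avx (the vmovdqa32 mnemonic choice is an assumption here, not part of the patch):

```
    /* Sketch only: choose the aligned mnemonic unless an operand is
       known misaligned, as movoi_internal_avx does.  */
    case 1:
    case 2:
      if (misaligned_operand (operands[0], XImode)
	  || misaligned_operand (operands[1], XImode))
	return "vmovdqu32\t{%1, %0|%0, %1}";
      else
	return "vmovdqa32\t{%1, %0|%0, %1}";
```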
(define_insn "*mov<mode>_internal"
- [(set (match_operand:V16 0 "nonimmediate_operand" "=x,x ,m")
- (match_operand:V16 1 "nonimmediate_or_sse_const_operand" "C ,xm,x"))]
+ [(set (match_operand:V16 0 "nonimmediate_operand" "=v,v ,m")
+ (match_operand:V16 1 "nonimmediate_or_sse_const_operand" "C ,vm,v"))]
...
+ /* Reg -> reg move is always aligned. Just use wider move. */
+ switch (mode)
+ {
+ case MODE_V8SF:
+ case MODE_V4SF:
+ return "vmovaps\t{%g1, %g0|%g0, %g1}";
+ case MODE_V4DF:
+ case MODE_V2DF:
+ return "%vmovapd\t{%g1, %g0|%g0, %g1}";
No need for the % prefix; this is an AVX-only insn.
+ case MODE_OI:
+ case MODE_TI:
+ return "vmovdqu64\t{%g1, %g0|%g0, %g1}";
Use vmovdqa64 here; we are operating on registers only.
+ case MODE_XI:
+ if (<MODE>mode == V8DImode)
+ return "vmovdqu64\t{%1, %0|%0, %1}";
+ else
+ return "vmovdqu32\t{%1, %0|%0, %1}";
Please put the mode calculation into the mode attribute. It looks like
sseinsnmode is not working correctly here and needs to be updated to
correctly select OI and XI modes.
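One possible shape for that update, as a sketch (the entries below are illustrative, not the full list, and follow the style of the existing sseinsnmode attribute in sse.md):

```
;; Sketch: map 64-byte vector modes to XI, 32-byte modes to OI,
;; 16-byte modes to TI (entries assumed for illustration).
(define_mode_attr sseinsnmode
  [(XI "XI") (V16SF "XI") (V8DF "XI") (V16SI "XI") (V8DI "XI")
   (OI "OI") (V8SF "OI") (V4DF "OI") (V8SI "OI") (V4DI "OI")
   (TI "TI") (V4SF "TI") (V2DF "TI") (V4SI "TI") (V2DI "TI")])
```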
Uros.