This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/70873] [7 Regression] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.
- From: "hjl.tools at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 04 May 2016 15:45:52 +0000
- Subject: [Bug target/70873] [7 Regression] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.
- Auto-submitted: auto-generated
- References: <bug-70873-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873
--- Comment #23 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Uroš Bizjak from comment #22)
> Created attachment 38412 [details]
> Proposed patch
>
> This patch moves all TARGET_SSE_PARTIAL_REG_DEPENDENCY FP conversion
> splitters to a later split pass. Plus, the patch substantially cleans these
> and related patterns.
>
> The functionality of post-reload conversion splitters goes this way:
>
> - process FP conversions for TARGET_USE_VECTOR_FP_CONVERTS in an early
> post-reload splitter. This pass will rewrite FP conversions to vector insns
> and is thus incompatible with the next two passes. AMDFAM10 processors
> depend on this transformation.
>
> - process FP conversions for TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS in a
> peephole2 pass. This will transform mem->reg insns to reg->reg insns, and
> these insns can then be processed by the next pass. Some Intel processors depend
> on this transformation.
>
> - process FP conversions for TARGET_SSE_PARTIAL_REG_DEPENDENCY in a late
> post-reload splitter, when allocated registers are stable. AMD and Intel
> processors depend on this pass, so it is part of generic tuning.
We need to move those special SSE SF->DF splitters before:
(define_split
  [(set (match_operand 0 "any_fp_register_operand")
        (float_extend (match_operand 1 "memory_operand")))]
  "reload_completed
   && (GET_MODE (operands[0]) == TFmode
       || GET_MODE (operands[0]) == XFmode
       || GET_MODE (operands[0]) == DFmode)"
  [(set (match_dup 0) (match_dup 2))]
{
  operands[2] = find_constant_src (curr_insn);
  if (operands[2] == NULL_RTX
      || (SSE_REGNO_P (REGNO (operands[0]))
          && standard_sse_constant_p (operands[2],
                                      GET_MODE (operands[0])) != 1)
      || (STACK_REGNO_P (REGNO (operands[0]))
          && standard_80387_constant_p (operands[2]) < 1))
    FAIL;
})
Otherwise, they may not be used on memory operand since the general
SSE float_extend splitter on memory operand will be used.
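
The ordering concern can be illustrated with a toy model. This is only a sketch, not GCC code: it assumes that when several splitters match the same insn, the one listed first is the one applied, which is what the comment above implies. The names Insn, Splitter, first_matching_splitter, general_float_extend_mem and sse_sf_to_df are all hypothetical and exist only for this illustration.

```python
from typing import Callable, List, NamedTuple, Optional


class Insn(NamedTuple):
    """Toy stand-in for an RTL insn (hypothetical, not a GCC type)."""
    op: str        # operation, e.g. "float_extend"
    src_kind: str  # "mem" or "reg"
    src_mode: str  # source mode, e.g. "SF"
    dst_mode: str  # destination mode, e.g. "DF"


class Splitter(NamedTuple):
    """Toy stand-in for a define_split: a name plus a match predicate."""
    name: str
    matches: Callable[[Insn], bool]


def first_matching_splitter(insn: Insn, splitters: List[Splitter]) -> Optional[str]:
    """Return the name of the first splitter, in list order, that matches."""
    for splitter in splitters:
        if splitter.matches(insn):
            return splitter.name
    return None


# General splitter: any float_extend whose source is a memory operand.
general = Splitter(
    "general_float_extend_mem",
    lambda i: i.op == "float_extend" and i.src_kind == "mem",
)

# Special splitter: SF->DF float_extend, regardless of where the source lives.
special = Splitter(
    "sse_sf_to_df",
    lambda i: i.op == "float_extend" and i.src_mode == "SF" and i.dst_mode == "DF",
)

# An SF->DF extension from memory matches both splitters.
insn = Insn("float_extend", "mem", "SF", "DF")

if __name__ == "__main__":
    # General splitter listed first: it shadows the special one.
    print(first_matching_splitter(insn, [general, special]))
    # Special splitter listed first: it gets the chance to fire.
    print(first_matching_splitter(insn, [special, general]))
```

In this model, listing the special SF->DF splitter after the general memory-operand splitter means it is never chosen for memory operands, which is the situation the reordering is meant to avoid.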