This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/70873] [7 Regression] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873

--- Comment #23 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Uroš Bizjak from comment #22)
> Created attachment 38412 [details]
> Proposed patch
> 
> This patch moves all TARGET_SSE_PARTIAL_REG_DEPENDENCY FP conversion
> splitters to a later split pass. Plus, the patch substantially cleans these
> and related patterns.
> 
> The functionality of post-reload conversion splitters goes this way:
> 
> - process FP conversions for TARGET_USE_VECTOR_FP_CONVERTS in an early
> post-reload splitter. This pass will rewrite FP conversions to vector insns
> and is thus incompatible with the next two passes. AMDFAM10 processors
> depend on this transformation.
> 
> - process FP conversions for TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS in a
> peephole2 pass. This will transform mem->reg insns to reg->reg insns, and
> these insns can then be processed by the next pass. Some Intel processors depend
> on this transformation.
> 
> - process FP conversions for TARGET_SSE_PARTIAL_REG_DEPENDENCY in a late
> post-reload splitter, when allocated registers are stable. AMD and Intel
> processors depend on this pass, so it is part of generic tuning.

We need to move those special SSE SF->DF splitters before this one:

(define_split
  [(set (match_operand 0 "any_fp_register_operand")
        (float_extend (match_operand 1 "memory_operand")))]
  "reload_completed
   && (GET_MODE (operands[0]) == TFmode
       || GET_MODE (operands[0]) == XFmode
       || GET_MODE (operands[0]) == DFmode)"
  [(set (match_dup 0) (match_dup 2))]
{
  operands[2] = find_constant_src (curr_insn);

  if (operands[2] == NULL_RTX
      || (SSE_REGNO_P (REGNO (operands[0]))
          && standard_sse_constant_p (operands[2],
                                      GET_MODE (operands[0])) != 1)
      || (STACK_REGNO_P (REGNO (operands[0]))
           && standard_80387_constant_p (operands[2]) < 1))
    FAIL;
})

Otherwise, they may not be used on a memory operand, since the general
SSE float_extend splitter on memory operand will be used.
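
To illustrate the ordering concern, here is a toy C model. It is not GCC
code: it assumes splitters are tried in the order they are listed and that
the first one whose condition holds is used, and it ignores that a real
splitter's preparation statement can FAIL. All names in it (struct insn,
pick_splitter, "general-mem", "sse-sf-df") are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only; none of these names exist in GCC.  */
enum src_kind { SRC_REG, SRC_MEM };

struct insn {
  enum src_kind src;  /* where the SFmode source operand lives */
  int dest_is_sse;    /* nonzero if the DFmode destination is an SSE reg */
};

struct splitter {
  const char *name;
  int (*matches) (const struct insn *);
};

/* Stand-in for the general float_extend splitter: any memory source.  */
static int
general_mem_matches (const struct insn *i)
{
  return i->src == SRC_MEM;
}

/* Stand-in for a special SSE SF->DF splitter: any SSE destination.  */
static int
sse_sf_df_matches (const struct insn *i)
{
  return i->dest_is_sse;
}

/* Return the name of the first splitter whose condition holds, or NULL
   if none of them matches.  */
static const char *
pick_splitter (const struct splitter *list, size_t n, const struct insn *i)
{
  for (size_t k = 0; k < n; k++)
    if (list[k].matches (i))
      return list[k].name;
  return NULL;
}

/* The same two splitters in both possible orders.  */
static const struct splitter general_first[] = {
  { "general-mem", general_mem_matches },
  { "sse-sf-df", sse_sf_df_matches },
};

static const struct splitter sse_first[] = {
  { "sse-sf-df", sse_sf_df_matches },
  { "general-mem", general_mem_matches },
};
```

Under these assumptions, an insn with a memory source and an SSE
destination is claimed by "general-mem" when general_first is used, and by
"sse-sf-df" only when sse_first is used.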
