This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: LEA-splitting improvement patch.


Hello!

> It is known that LEA splitting is one of the most critical problems
> for Atom processors and changes try to improve it through:
> 1.       More aggressive Lea splitting – do not perform splitting if
> only split cost exceeds AGU stall .
> 2.       Reordering splitting instructions to get better scheduling –
> use the farthest defined register for SET instruction, then add
> constant offset if any and finally generate add instruction.This gives
> +0.5% speedup in geomean for eembc2.0 suite on Atom.
> All required testing was done – bootstraps for Atom & Core2, make check.
> Note that this fix affects only on Atom processors.

IMO, you should test LEA handling changes on x32 atom, too. With
recent changes, you will get lots of zero_extended addresses through
these functions, so I think it is worth to benchmark on x32 target.

> ChangeLog:
> 2012-08-08  Yuri Rumyantsev  Yuri.S.Rumyantsev@intel.com
>
> * config/i386/i386-protos.h (ix86_split_lea_for_addr) : Add additional argument.
> * config/i386/i386.md (ix86_splitt_lea_for_addr) : Add additional
> argument curr_insn.
> * config/i386/i386.c (find_nearest_reg-def): New function. Find
> nearest register definition used in address.

(find_nearest_reg_def)

> (ix86_split_lea_for_addr) : Do more aggressive lea splitting and
> instructions reodering to get opportunities for better scheduling.

Please merge entries for ix86_split_lea_for_addr.

@@ -16954,7 +16954,7 @@ ix86_lea_outperforms (rtx insn, unsigned int
regno0, unsigned int regno1,
   /* If there is no use in memory addess then we just check
      that split cost exceeds AGU stall.  */
   if (dist_use < 0)
-    return dist_define >= LEA_MAX_STALL;
+    return dist_define > LEA_MAX_STALL;

You didn't described this change in ChangeLog. Does this affect also
affect benchmark speedup?

+/* Return 0 if regno1 def is nearest to insn and 1 otherwise. */

Watch comment formatting and vertical spaces!

+static int
+find_nearest_reg_def (rtx insn, int regno1, int regno2)

IMO, you can return 0, 1 and 2; with 0 when no definitions are found
in the BB, 1 and 2 when either of two regnos are found. Otherwise,
please use bool for function type.

+       if (insn_defines_reg (regno1, regno1, prev))
+         return 0;
+       else if (insn_defines_reg (regno2, regno2, prev))

Please use INVALID_REGNUM as the second argument in the call to
insn_defines_reg when looking for only one regno definition.

 	    {
-	      emit_insn (gen_rtx_SET (VOIDmode, target, parts.base));
-	      tmp = parts.index;
+              rtx tmp1;
+              /* Try to give more opportunities to scheduler -
+                 choose operand for move instruction with longer
+                 distance from its definition to insn. */

(Hm, I don't think you mean gcc insn scheduler here.)

+              if (find_nearest_reg_def (insn, regno1, regno2) == 0)
+                {
+                  tmp = parts.index;  /* choose index for move. */
+                  tmp1 = parts.base;
+                }
+              else
+               {
+                 tmp = parts.base;
+                 tmp1 = parts.index;
+               }
+	      emit_insn (gen_rtx_SET (VOIDmode, target, tmp));
+              if (parts.disp && parts.disp != const0_rtx)
+                ix86_emit_binop (PLUS, mode, target, parts.disp);
+	      ix86_emit_binop (PLUS, mode, target, tmp1);
+              return;
 	    }

(Please use tabs instead of spaces in the added code.)

However, this whole new part can be written simply as following (untested):

	    {
	      rtx tmp1;

	      if (find_nearest_reg_def (insn, regno1, regno2) == 0)
		tmp1 = parts.base, tmp = parts.index;
	      else
		tmp1 = parts.index, tmp = parts.base;

	      emit_insn (gen_rtx_SET (VOIDmode, target, tmp1);
	    }

Please see how tmp is handled further down in the function.

-	  ix86_emit_binop (PLUS, mode, target, tmp);
+         ix86_emit_binop (PLUS, mode, target, tmp);

Please watch accidental whitespace changes.

Uros.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]