This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] lra: make reload_pseudo_compare_func a proper comparator


Hello,

I'd like to apply the following LRA patch to make qsort comparator
reload_pseudo_compare_func proper (right now it lacks transitivity
due to incorrect use of non_reload_pseudos bitmap, PR 68988).

This function was originally a proper comparator, and the problematic
code was added as a fix for PR 57878.  However, some time later the fix
for PR 60650 really fixed this LRA spill failure, making the original
fix unneeded.  So now GCC can revert to the original, simpler comparator.

The only question is what comparison order would be better for performance.
The check in question only matters for multi-reg pseudos, so it matters
mostly for 64-bit modes on 32-bit architectures.

To investigate that, I've bootstrapped GCC on 32-bit x86 in 4 configurations:

1. Current trunk.

[2-4 are with original PR 57878 fix reverted]
2. Original code, with ira_reg_class_max_nregs below regno_assign_info.freq
   check.
3. Hybrid code, with i_r_c_max_nregs preferred over r_a_i.freq during the
   second assignment pass, but not first.
4. With i_r_c_max_nregs above r_a_i.freq check, i.e. always do fragmentation
   avoidance check before the frequency check. This is the original PR 57878
   fix proposed by Wei Mi.

I have found that cc1 size is largest with variant 1, variants 2 and 3 result
in ~500 bytes size reduction, and variant 4 is further ~500 bytes smaller than
that (considering only the .text section, debug info variance is larger).

I have also tested variants 2 and 4 on SPEC CPU 2000: there's no significant 
difference in performance (in fact generated code is the same on almost all
tests).

Therefore I suggest we go with variant 4, implemented by the following patch.

Bootstrapped and regtested on 32-bit x86, OK to apply?

Thanks.
Alexander

	PR rtl-optimization/57878
	PR rtl-optimization/68988
	* lra-assigns.c (reload_pseudo_compare_func): Remove fragmentation
	avoidance test involving non_reload_pseudos.  Move frequency test
	below the general fragmentation avoidance test.

diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index 9208fccfd59..5a65c7c8c5f 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -211,24 +211,15 @@ reload_pseudo_compare_func (const void *v1p, const void *v2p)
   if ((diff = (ira_class_hard_regs_num[cl1]
 	       - ira_class_hard_regs_num[cl2])) != 0)
     return diff;
-  if ((diff
-       = (ira_reg_class_max_nregs[cl2][lra_reg_info[r2].biggest_mode]
-	  - ira_reg_class_max_nregs[cl1][lra_reg_info[r1].biggest_mode])) != 0
-      /* The code below executes rarely as nregs == 1 in most cases.
-	 So we should not worry about using faster data structures to
-	 check reload pseudos.  */
-      && ! bitmap_bit_p (&non_reload_pseudos, r1)
-      && ! bitmap_bit_p (&non_reload_pseudos, r2))
-    return diff;
-  if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq
-	       - regno_assign_info[regno_assign_info[r1].first].freq)) != 0)
-    return diff;
   /* Allocate bigger pseudos first to avoid register file
      fragmentation.  */
   if ((diff
        = (ira_reg_class_max_nregs[cl2][lra_reg_info[r2].biggest_mode]
 	  - ira_reg_class_max_nregs[cl1][lra_reg_info[r1].biggest_mode])) != 0)
     return diff;
+  if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq
+	       - regno_assign_info[regno_assign_info[r1].first].freq)) != 0)
+    return diff;
   /* Put pseudos from the thread nearby.  */
   if ((diff = regno_assign_info[r1].first - regno_assign_info[r2].first) != 0)
     return diff;


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]