This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] lra: make reload_pseudo_compare_func a proper comparator
- From: Alexander Monakov <amonakov at ispras dot ru>
- To: gcc-patches at gcc dot gnu dot org
- Cc: "Vladimir N. Makarov" <vmakarov at redhat dot com>
- Date: Fri, 15 Sep 2017 20:38:07 +0300 (MSK)
- Subject: [PATCH] lra: make reload_pseudo_compare_func a proper comparator
- Authentication-results: sourceware.org; auth=none
Hello,
I'd like to apply the following LRA patch to make qsort comparator
reload_pseudo_compare_func proper (right now it lacks transitivity
due to incorrect use of non_reload_pseudos bitmap, PR 68988).
This function was originally a proper comparator, and the problematic
code was added as a fix for PR 57878. However, some time later the fix
for PR 60650 really fixed this LRA spill failure, making the original
fix unneeded. So now GCC can revert to the original, simpler comparator.
The only question is what comparison order would be better for performance.
The check in question only matters for multi-reg pseudos, so it matters
mostly for 64-bit modes on 32-bit architectures.
To investigate that, I've bootstrapped GCC on 32-bit x86 in 4 configurations:
1. Current trunk.
[2-4 are with original PR 57878 fix reverted]
2. Original code, with ira_reg_class_max_nregs below regno_assign_info.freq
check.
3. Hybrid code, with i_r_c_max_nregs preferred over r_a_i.freq during the
second assignment pass, but not first.
4. With i_r_c_max_nregs above r_a_i.freq check, i.e. always do fragmentation
avoidance check before the frequency check. This is the original PR 57878
fix proposed by Wei Mi.
I have found that cc1 size is largest with variant 1, variants 2 and 3 result
in ~500 bytes size reduction, and variant 4 is further ~500 bytes smaller than
that (considering only the .text section, debug info variance is larger).
I have also tested variants 2 and 4 on SPEC CPU 2000: there's no significant
difference in performance (in fact generated code is the same on almost all
tests).
Therefore I suggest we go with variant 4, implemented by the following patch.
Bootstrapped and regtested on 32-bit x86, OK to apply?
Thanks.
Alexander
PR rtl-optimization/57878
PR rtl-optimization/68988
* lra-assigns.c (reload_pseudo_compare_func): Remove fragmentation
avoidance test involving non_reload_pseudos. Move frequency test
below the general fragmentation avoidance test.
diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index 9208fccfd59..5a65c7c8c5f 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -211,24 +211,15 @@ reload_pseudo_compare_func (const void *v1p, const void *v2p)
if ((diff = (ira_class_hard_regs_num[cl1]
- ira_class_hard_regs_num[cl2])) != 0)
return diff;
- if ((diff
- = (ira_reg_class_max_nregs[cl2][lra_reg_info[r2].biggest_mode]
- - ira_reg_class_max_nregs[cl1][lra_reg_info[r1].biggest_mode])) != 0
- /* The code below executes rarely as nregs == 1 in most cases.
- So we should not worry about using faster data structures to
- check reload pseudos. */
- && ! bitmap_bit_p (&non_reload_pseudos, r1)
- && ! bitmap_bit_p (&non_reload_pseudos, r2))
- return diff;
- if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq
- - regno_assign_info[regno_assign_info[r1].first].freq)) != 0)
- return diff;
/* Allocate bigger pseudos first to avoid register file
fragmentation. */
if ((diff
= (ira_reg_class_max_nregs[cl2][lra_reg_info[r2].biggest_mode]
- ira_reg_class_max_nregs[cl1][lra_reg_info[r1].biggest_mode])) != 0)
return diff;
+ if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq
+ - regno_assign_info[regno_assign_info[r1].first].freq)) != 0)
+ return diff;
/* Put pseudos from the thread nearby. */
if ((diff = regno_assign_info[r1].first - regno_assign_info[r2].first) != 0)
return diff;