RFA: patch to fix IRA degradation on SPEC2006 calculix
H.J. Lu
hjl.tools@gmail.com
Wed Nov 19 18:00:00 GMT 2008
On Wed, Nov 19, 2008 at 6:56 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Nov 18, 2008 at 10:35:27PM -0800, H.J. Lu wrote:
>> On Tue, Nov 18, 2008 at 10:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> > On Tue, Nov 18, 2008 at 7:45 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> >> H.J. reported huge degradation of IRA on calculix on x86_64
>> >>
>> >> http://gcc.gnu.org/ml/gcc-patches/2008-11/msg00812.html
>> >>
>> >> Calculix has one hot function (about 80% of all program time) e_c3d
>> >> with small most frequently executed loop. One accidental spill in the
>> >> loop results in 30% degradation. The degradation occurred after
>> >> submitting patch introducing bad spills processing. I missed another
>> >> checks of bad spills in allocno comparison when all previous high
>> >> priority criteria are the same (see
>> >> ira-color.c::push_allocnos_to_stack).
>> >>
>> >> I also found that the hot function in calculix has more 50 loops,
>> >> therefore IRA decides to use one region allocation. Instead of such
>> >> approach, I change the code to make no more ira-max-loops-num regions
>> >> (most frequenly used ones) instead of one region. I also changed
>> >> ira-max-loops-num value to 100 (there are still functions in calculix
>> >> which contains more 100 loops) because it looks like fortran functions
>> >> have many implicit loops (from vector operations). It permits to
>> >> improve calculix score a bit more.
>> >>
>> >> The rest of the patch is a code printing more info which I used for
>> >> debugging.
>> >>
>> >> The patch was successfully bootstrapped on x86_64.
>> >>
>> >> Is it ok to commit?
>> >
>> > It doesn't bootstrap Linux/x86-64:
>> >
>> > [hjl@gnu-27 src]$ /export/build/gnu/gcc/build-x86_64-linux/./gcc/xgcc
>> > -shared-libgcc -B/export/build/gnu/gcc/build-x86_64-linux/./gcc
>> > -nostdinc++ -L/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/src
>> > -L/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
>> > -B/usr/gcc-4.4/x86_64-unknown-linux-gnu/bin/
>> > -B/usr/gcc-4.4/x86_64-unknown-linux-gnu/lib/ -isystem
>> > /usr/gcc-4.4/x86_64-unknown-linux-gnu/include -isystem
>> > /usr/gcc-4.4/x86_64-unknown-linux-gnu/sys-include
>> > -I/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu
>> > -I/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/include
>> > -I/net/gnu-13/export/gnu/src/gcc/gcc/libstdc++-v3/libsupc++
>> > -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual
>> > -fdiagnostics-show-location=once -ffunction-sections -fdata-sections
>> > -g -O2 -D_GNU_SOURCE -c
>> > /net/gnu-13/export/gnu/src/gcc/gcc/libstdc++-v3/src/bitmap_allocator.cc
>> > -fPIC -DPIC -o .libs/bitmap_allocator.o
>> > /net/gnu-13/export/gnu/src/gcc/gcc/libstdc++-v3/src/bitmap_allocator.cc:
>> > In member function âsize_t* __gnu_cxx::free_list::_M_get(size_t)â:
>> > /net/gnu-13/export/gnu/src/gcc/gcc/libstdc++-v3/src/bitmap_allocator.cc:103:
>> > internal compiler error: Segmentation fault
>> > Please submit a full bug report,
>> > with preprocessed source if appropriate.
>> > See <http://gcc.gnu.org/bugs.html> for instructions.
>> > [hjl@gnu-27 src]$
>> >
>> > I used "--enable-clocale=gnu --with-system-zlib
>> > --enable-checking=assert --with-demangler-in-ld --enable-shared
>> > --enable-threads=posix --enable-haifa --prefix=/usr/gcc-4.4
>> > --with-local-prefix=/usr/local" to configure gcc.
>> >
>>
>> Valgrind reports:
>>
>> ==666== Conditional jump or move depends on uninitialised value(s)
>> ==666== at 0xA260CF: ira_build (ira-build.c:1697)
>> ==666== by 0xA21673: rest_of_handle_ira (ira.c:1788)
>> ==666== by 0x6387D8: execute_one_pass (passes.c:1279)
>> ==666== by 0x638A04: execute_pass_list (passes.c:1328)
>> ==666== by 0x638A1C: execute_pass_list (passes.c:1329)
>> ==666== by 0x708166: tree_rest_of_compilation (tree-optimize.c:418)
>> ==666== by 0x8260C3: cgraph_expand_function (cgraphunit.c:1047)
>> ==666== by 0x8277A4: cgraph_optimize (cgraphunit.c:1106)
>> ==666== by 0x450B1C: cp_write_global_declarations (decl2.c:3615)
>> ==666== by 0x6CE146: toplev_main (toplev.c:979)
>> ==666== by 0x50448B3: (below main) (in /lib64/libc-2.5.so)
>> ==666==
>> ==666== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 1)
>> ==666== malloc/free: in use at exit: 993,009 bytes in 7,944 blocks.
>> ==666== malloc/free: 373,081 allocs, 365,137 frees, 225,318,650 bytes allocated.
>> ==666== For counts of detected errors, rerun with: -v
>> ==666== searching for pointers to 7,944 not-freed blocks.
>> ==666== checked 21,004,136 bytes.
>> ==666==
>> ==666== LEAK SUMMARY:
>> ==666== definitely lost: 144 bytes in 14 blocks.
>> ==666== possibly lost: 0 bytes in 0 blocks.
>> ==666== still reachable: 992,865 bytes in 7,930 blocks.
>> ==666== suppressed: 0 bytes in 0 blocks.
>>
>>
>
> The patch has
>
> for (n = i = 0; VEC_iterate (loop_p, ira_loops.larray, i, loop); i++)
> {
> if (ira_loop_nodes[i].parent == NULL)
> {
> /* Don't remove the root. */
> ira_loop_nodes[i].to_remove_p = false;
> continue;
> }
> sorted_loops[n++] = &ira_loop_nodes[i];
> ira_loop_nodes[i].to_remove_p
> = (low_pressure_loop_node_p (ira_loop_nodes[i].parent)
> && low_pressure_loop_node_p (&ira_loop_nodes[i]));
> }
>
> But the "parent" field may not be initialized. This patch works
> for me.
>
> H.J.
> ----
> Index: ChangeLog.ira
> ===================================================================
> --- ChangeLog.ira (revision 142002)
> +++ ChangeLog.ira (working copy)
> @@ -1,3 +1,8 @@
> +2008-11-19 H.J. Lu <hongjiu.lu@intel.com>
> +
> + * ira-build.c (create_loop_tree_nodes): Always initialize the
> + parent field.
> +
> 2008-11-18 H.J. Lu <hongjiu.lu@intel.com>
>
> * ira-color.c (pseudos_have_intersected_live_ranges_p): Define
> Index: ira-build.c
> ===================================================================
> --- ira-build.c (revision 142002)
> +++ ira-build.c (working copy)
> @@ -137,18 +137,22 @@ create_loop_tree_nodes (bool loops_p)
> skip_p = true;
> break;
> }
> + if (!skip_p)
> + {
> + edges = get_loop_exit_edges (loop);
> + for (j = 0; VEC_iterate (edge, edges, j, e); j++)
> + if ((e->flags & EDGE_ABNORMAL) && EDGE_CRITICAL_P (e))
> + {
> + skip_p = true;
> + break;
> + }
> + VEC_free (edge, heap, edges);
> + }
> if (skip_p)
> - continue;
> - edges = get_loop_exit_edges (loop);
> - for (j = 0; VEC_iterate (edge, edges, j, e); j++)
> - if ((e->flags & EDGE_ABNORMAL) && EDGE_CRITICAL_P (e))
> - {
> - skip_p = true;
> - break;
> - }
> - VEC_free (edge, heap, edges);
> - if (skip_p)
> - continue;
> + {
> + ira_loop_nodes[i].parent = NULL;
> + continue;
> + }
> }
> ira_loop_nodes[i].regno_allocno_map
> = (ira_allocno_t *) ira_allocate (sizeof (ira_allocno_t) * max_regno);
>
This patch fixes the bootstrap. But there are many regressions in
testsuite. My patch is either incomplete or incorrect.
--
H.J.
More information about the Gcc-patches
mailing list