RFA: patch to fix IRA degradation on SPEC2006 calculix

H.J. Lu hjl.tools@gmail.com
Wed Nov 19 18:00:00 GMT 2008


On Wed, Nov 19, 2008 at 6:56 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Nov 18, 2008 at 10:35:27PM -0800, H.J. Lu wrote:
>> On Tue, Nov 18, 2008 at 10:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> > On Tue, Nov 18, 2008 at 7:45 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> >> H.J. reported huge degradation of IRA on calculix on x86_64
>> >>
>> >> http://gcc.gnu.org/ml/gcc-patches/2008-11/msg00812.html
>> >>
>> >> Calculix has one hot function (about 80% of all program time) e_c3d
>> >> with small most frequently executed loop.  One accidental spill in the
>> >> loop results in 30% degradation.  The degradation occurred after
>> >> submitting patch introducing bad spills processing.  I missed another
>> >> checks of bad spills in allocno comparison when all previous high
>> >> priority criteria are the same (see
>> >> ira-color.c::push_allocnos_to_stack).
>> >>
>> >> I also found that the hot function in calculix has more 50 loops,
>> >> therefore IRA decides to use one region allocation.  Instead of such
>> >> approach, I change the code to make no more ira-max-loops-num regions
>> >> (most frequenly used ones) instead of one region.  I also changed
>> >> ira-max-loops-num value to 100 (there are still functions in calculix
>> >> which contains more 100 loops) because it looks like fortran functions
>> >> have many implicit loops (from vector operations).  It permits to
>> >> improve calculix score a bit more.
>> >>
>> >> The rest of the patch is a code printing more info which I used for
>> >> debugging.
>> >>
>> >> The patch was successfully bootstrapped on x86_64.
>> >>
>> >> Is it ok to commit?
>> >
>> > It doesn't bootstrap Linux/x86-64:
>> >
>> > [hjl@gnu-27 src]$ /export/build/gnu/gcc/build-x86_64-linux/./gcc/xgcc
>> > -shared-libgcc -B/export/build/gnu/gcc/build-x86_64-linux/./gcc
>> > -nostdinc++ -L/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/src
>> > -L/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
>> > -B/usr/gcc-4.4/x86_64-unknown-linux-gnu/bin/
>> > -B/usr/gcc-4.4/x86_64-unknown-linux-gnu/lib/ -isystem
>> > /usr/gcc-4.4/x86_64-unknown-linux-gnu/include -isystem
>> > /usr/gcc-4.4/x86_64-unknown-linux-gnu/sys-include
>> > -I/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu
>> > -I/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/include
>> > -I/net/gnu-13/export/gnu/src/gcc/gcc/libstdc++-v3/libsupc++
>> > -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual
>> > -fdiagnostics-show-location=once -ffunction-sections -fdata-sections
>> > -g -O2 -D_GNU_SOURCE -c
>> > /net/gnu-13/export/gnu/src/gcc/gcc/libstdc++-v3/src/bitmap_allocator.cc
>> >  -fPIC -DPIC -o .libs/bitmap_allocator.o
>> > /net/gnu-13/export/gnu/src/gcc/gcc/libstdc++-v3/src/bitmap_allocator.cc:
>> > In member function âsize_t* __gnu_cxx::free_list::_M_get(size_t)â:
>> > /net/gnu-13/export/gnu/src/gcc/gcc/libstdc++-v3/src/bitmap_allocator.cc:103:
>> > internal compiler error: Segmentation fault
>> > Please submit a full bug report,
>> > with preprocessed source if appropriate.
>> > See <http://gcc.gnu.org/bugs.html> for instructions.
>> > [hjl@gnu-27 src]$
>> >
>> > I used "--enable-clocale=gnu --with-system-zlib
>> > --enable-checking=assert --with-demangler-in-ld --enable-shared
>> > --enable-threads=posix --enable-haifa --prefix=/usr/gcc-4.4
>> > --with-local-prefix=/usr/local" to configure gcc.
>> >
>>
>> Valgrind reports:
>>
>> ==666== Conditional jump or move depends on uninitialised value(s)
>> ==666==    at 0xA260CF: ira_build (ira-build.c:1697)
>> ==666==    by 0xA21673: rest_of_handle_ira (ira.c:1788)
>> ==666==    by 0x6387D8: execute_one_pass (passes.c:1279)
>> ==666==    by 0x638A04: execute_pass_list (passes.c:1328)
>> ==666==    by 0x638A1C: execute_pass_list (passes.c:1329)
>> ==666==    by 0x708166: tree_rest_of_compilation (tree-optimize.c:418)
>> ==666==    by 0x8260C3: cgraph_expand_function (cgraphunit.c:1047)
>> ==666==    by 0x8277A4: cgraph_optimize (cgraphunit.c:1106)
>> ==666==    by 0x450B1C: cp_write_global_declarations (decl2.c:3615)
>> ==666==    by 0x6CE146: toplev_main (toplev.c:979)
>> ==666==    by 0x50448B3: (below main) (in /lib64/libc-2.5.so)
>> ==666==
>> ==666== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 1)
>> ==666== malloc/free: in use at exit: 993,009 bytes in 7,944 blocks.
>> ==666== malloc/free: 373,081 allocs, 365,137 frees, 225,318,650 bytes allocated.
>> ==666== For counts of detected errors, rerun with: -v
>> ==666== searching for pointers to 7,944 not-freed blocks.
>> ==666== checked 21,004,136 bytes.
>> ==666==
>> ==666== LEAK SUMMARY:
>> ==666==    definitely lost: 144 bytes in 14 blocks.
>> ==666==      possibly lost: 0 bytes in 0 blocks.
>> ==666==    still reachable: 992,865 bytes in 7,930 blocks.
>> ==666==         suppressed: 0 bytes in 0 blocks.
>>
>>
>
> The patch has
>
>  for (n = i = 0; VEC_iterate (loop_p, ira_loops.larray, i, loop); i++)
>    {
>      if (ira_loop_nodes[i].parent == NULL)
>        {
>          /* Don't remove the root.  */
>          ira_loop_nodes[i].to_remove_p = false;
>          continue;
>        }
>      sorted_loops[n++] = &ira_loop_nodes[i];
>      ira_loop_nodes[i].to_remove_p
>        = (low_pressure_loop_node_p (ira_loop_nodes[i].parent)
>           && low_pressure_loop_node_p (&ira_loop_nodes[i]));
>    }
>
> But the "parent" field may not be initialized. This patch works
> for me.
>
> H.J.
> ----
> Index: ChangeLog.ira
> ===================================================================
> --- ChangeLog.ira       (revision 142002)
> +++ ChangeLog.ira       (working copy)
> @@ -1,3 +1,8 @@
> +2008-11-19  H.J. Lu  <hongjiu.lu@intel.com>
> +
> +       * ira-build.c (create_loop_tree_nodes): Always initialize the
> +       parent field.
> +
>  2008-11-18  H.J. Lu  <hongjiu.lu@intel.com>
>
>        * ira-color.c (pseudos_have_intersected_live_ranges_p): Define
> Index: ira-build.c
> ===================================================================
> --- ira-build.c (revision 142002)
> +++ ira-build.c (working copy)
> @@ -137,18 +137,22 @@ create_loop_tree_nodes (bool loops_p)
>                skip_p = true;
>                break;
>              }
> +         if (!skip_p)
> +           {
> +             edges = get_loop_exit_edges (loop);
> +             for (j = 0; VEC_iterate (edge, edges, j, e); j++)
> +               if ((e->flags & EDGE_ABNORMAL) && EDGE_CRITICAL_P (e))
> +                 {
> +                   skip_p = true;
> +                   break;
> +                 }
> +             VEC_free (edge, heap, edges);
> +           }
>          if (skip_p)
> -           continue;
> -         edges = get_loop_exit_edges (loop);
> -         for (j = 0; VEC_iterate (edge, edges, j, e); j++)
> -           if ((e->flags & EDGE_ABNORMAL) && EDGE_CRITICAL_P (e))
> -             {
> -               skip_p = true;
> -               break;
> -             }
> -         VEC_free (edge, heap, edges);
> -         if (skip_p)
> -           continue;
> +           {
> +             ira_loop_nodes[i].parent = NULL;
> +             continue;
> +           }
>        }
>       ira_loop_nodes[i].regno_allocno_map
>        = (ira_allocno_t *) ira_allocate (sizeof (ira_allocno_t) * max_regno);
>

This patch fixes the bootstrap. But there are many regressions in
testsuite. My patch is either incomplete or incorrect.


-- 
H.J.



More information about the Gcc-patches mailing list