Linux kernel block sha1 code on powerpc64 has many redundant clrldi instructions, which significantly slow execution. Current gcc seems to generate more of these than 3.4.5, which is in turn worse than 3.3.

Breakdown of clrldi insns:
- 140 redundant clrldi on rotate insn output
- 79 other redundant clrldi
- 11 useful
Created attachment 18372
block sha1 source

blk_SHA1Block takes all its input from unsigned ints and only writes to unsigned ints, so all zero_extends in the body of this function are redundant.
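For readers without the attachment, the property described above is that of any straightforward SHA-1 compression function: every input, temporary and output is a 32-bit unsigned word. The compact textbook version below is a sketch for illustration only, not the attached blk_SHA1Block; the names rol32 and sha1_block are our own. Since only the low 32 bits of each value are ever stored or consumed, a zero_extend to 64 bits has nothing to do here, and the rol32 calls (which GCC typically recognises as rotates) are where the rotate insns come from.

```c
#include <stdint.h>

/* 32-bit rotate left; n is always 1, 5 or 30 below, so both shifts are valid. */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Textbook SHA-1 compression of one 64-byte block into state h[5].
   Illustrative sketch: every value is a 32-bit unsigned word. */
static void sha1_block(uint32_t h[5], const unsigned char blk[64])
{
    uint32_t w[80];

    /* Load the block as 16 big-endian words, then expand the schedule. */
    for (int t = 0; t < 16; t++)
        w[t] = (uint32_t)blk[4 * t] << 24 | (uint32_t)blk[4 * t + 1] << 16
             | (uint32_t)blk[4 * t + 2] << 8 | blk[4 * t + 3];
    for (int t = 16; t < 80; t++)
        w[t] = rol32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);

    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int t = 0; t < 80; t++) {
        uint32_t f, k;
        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5a827999; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ed9eba1; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8f1bbcdc; }
        else             { f = b ^ c ^ d;                   k = 0xca62c1d6; }
        uint32_t tmp = rol32(a, 5) + f + e + k + w[t];
        e = d;
        d = c;
        c = rol32(b, 30);
        b = a;
        a = tmp;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}
```

Calling sha1_block with the standard initial state on the single padded block for "abc" (bytes 'a', 'b', 'c', 0x80, zeros, and a final length byte of 24) yields the well-known digest a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d.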
Created attachment 18373
Cure the zero_extends on rotate output

This patch teaches gcc that the powerpc rotate/shift unit zero- or sign-extends its result to the full register width as appropriate, at least for the most common case of SImode operations.
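To see why a clrldi after these insns is redundant, here is a small behavioural model of what a 64-bit GPR holds after the SImode rotate/shift instructions. It is a sketch under stated assumptions, not the patch's md patterns: the emu_* helpers are hypothetical, shift counts are restricted to 0..31 (out-of-range slw/srw/sraw counts are not modelled), and emu_sraw relies on GCC's arithmetic right shift of negative ints. clrldi rA,rS,32 stands in for zero_extend:DI and extsw for sign_extend:DI; applying either to a result that is already extended changes nothing.

```c
#include <stdint.h>

/* rotlw: rotate the low word; the 32-bit mask leaves the high word zero. */
static uint64_t emu_rotlw(uint64_t rs, unsigned n)
{
    uint32_t lo = (uint32_t)rs;
    n &= 31;
    return n ? (uint32_t)((lo << n) | (lo >> (32 - n))) : lo;
}

/* slw / srw: logical shifts of the low word; result zero-extended. */
static uint64_t emu_slw(uint64_t rs, unsigned n)
{
    return (uint32_t)((uint32_t)rs << (n & 31));
}

static uint64_t emu_srw(uint64_t rs, unsigned n)
{
    return (uint32_t)rs >> (n & 31);
}

/* sraw: arithmetic shift of the low word; result sign-extended to 64 bits. */
static uint64_t emu_sraw(uint64_t rs, unsigned n)
{
    int32_t lo = (int32_t)(uint32_t)rs;
    return (uint64_t)(int64_t)(lo >> (n & 31));
}

/* clrldi rA,rS,32: clear the high 32 bits, i.e. zero_extend:DI of the low word. */
static uint64_t emu_clrldi32(uint64_t rs)
{
    return rs & 0xffffffffu;
}

/* extsw rA,rS: sign_extend:DI of the low word. */
static uint64_t emu_extsw(uint64_t rs)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)rs;
}
```

For example, emu_rotlw(0xdeadbeefcafef00d, 8) gives 0xfef00dca with the high word already clear, so a following emu_clrldi32 returns the same value.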
Created attachment 18374
aims to teach gcc that rotate/shift insn input register's high bits are ignored

This patch is aimed at the "79 other redundant clrldi", removing 59 cases on rotate/shift input. I'm not particularly happy with it because of the hack for LOAD_EXTEND_OP zero_extends. Before I discovered that particular problem, fwprop seemed the natural place to teach gcc about insn inputs. The problem is that if we don't leave those zero_extends alone, some rotate insns will take their input directly from the load while other insns still need the zero_extend, and that prevents combine from removing the zero_extend on loads.
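The input-side claim can be checked with a similar model: on a 64-bit implementation, rotlw and srw read only the low word of the source register, so a clrldi (zero_extend) feeding them is a no-op. This is a sketch with hypothetical emu_* helpers, assuming shift counts 0..31; it does not model the LOAD_EXTEND_OP complication described above.

```c
#include <stdint.h>

/* rotlw: only the low word of RS is read; the high word is ignored. */
static uint64_t emu_rotlw(uint64_t rs, unsigned n)
{
    uint32_t lo = (uint32_t)rs;
    n &= 31;
    return n ? (uint32_t)((lo << n) | (lo >> (32 - n))) : lo;
}

/* srw: likewise reads only the low word (counts 32..63 not modelled). */
static uint64_t emu_srw(uint64_t rs, unsigned n)
{
    return (uint32_t)rs >> (n & 31);
}

/* clrldi rA,rS,32: the kind of input zero_extend the patch aims to remove. */
static uint64_t emu_clrldi32(uint64_t rs)
{
    return rs & 0xffffffffu;
}
```

Feeding emu_clrldi32(x) instead of x into either helper gives identical results for any x, which is exactly why the extension on the input is redundant.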
Please ignore the RS6000_ALT_REG_ALLOC_ORDER hunk in rs6000-2.diff. I forgot to edit that out.
If you are going to submit these patches, can you please make EXTEND_INPUT_REG_OP a target hook instead of a macro?
Subject: Bug 41081

Author: amodra
Date: Sun Aug 23 02:57:26 2009
New Revision: 151022

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=151022
Log:
    PR target/41081
    * fwprop.c (try_fwprop_subst): Allow multiple sets.
    (get_reg_use_in): New function.
    (forward_propagate_subreg): Propagate through subreg of zero_extend
    or sign_extend.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fwprop.c
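As a rough illustration of the idea behind the forward_propagate_subreg change: the low SImode part of a register that was set to (zero_extend:DI X) or (sign_extend:DI X) is X itself, so a use of that subreg can read X directly and the extension may become dead. The toy below uses a made-up miniature IR; struct expr and toy_propagate_subreg are hypothetical and are not GCC's rtx structures or functions. It assumes the subreg is always the low part and ignores the multiple-sets change to try_fwprop_subst.

```c
#include <stddef.h>

/* Hypothetical toy IR loosely modelled on RTL (not GCC's rtx). */
enum code { REG, ZERO_EXTEND, SIGN_EXTEND, SUBREG };
enum mode { SImode, DImode };

struct expr {
    enum code code;
    enum mode mode;
    const struct expr *op;   /* operand of an extend or SUBREG, else NULL */
};

/* USE is an insn operand; DEF_SRC is the source of the single definition
   of the register USE reads (NULL if unknown).  If USE is a low-part
   (subreg:SI (reg:DI R)) and R = (zero_extend:DI X) or (sign_extend:DI X)
   with X in SImode, return X; otherwise return USE unchanged. */
static const struct expr *
toy_propagate_subreg(const struct expr *use, const struct expr *def_src)
{
    if (use->code != SUBREG || use->mode != SImode || use->op == NULL
        || use->op->code != REG || use->op->mode != DImode)
        return use;
    if (def_src == NULL
        || (def_src->code != ZERO_EXTEND && def_src->code != SIGN_EXTEND))
        return use;
    if (def_src->op == NULL || def_src->op->mode != SImode)
        return use;
    return def_src->op;   /* the low part of the extension is X itself */
}
```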
Subject: Bug 41081

Author: amodra
Date: Sun Aug 23 03:53:02 2009
New Revision: 151025

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=151025
Log:
    PR target/41081
    * config/rs6000/rs6000.md (rotlsi3_64, ashlsi3_64, lshrsi3_64,
    ashrsi3_64): New.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.md
Is it possible to extend this to address another zero extend bug, PR 17387?
No, that looks like a different problem. It affects powerpc64 too.
Subject: Bug 41081

Author: amodra
Date: Sun Aug 30 06:09:42 2009
New Revision: 151221

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=151221
Log:
    PR target/41081
    * fwprop.c (get_reg_use_in): Delete.
    (free_load_extend): New function.
    (forward_propagate_subreg): Use it.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fwprop.c
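Judging from its name and from the LOAD_EXTEND_OP discussion on attachment 18374, free_load_extend appears to identify extensions that a load already performs for free, so that fwprop leaves them alone and combine can still fold them into the load. The toy below sketches only that decision. All names are hypothetical (not GCC's API), and it assumes a target whose SImode loads zero-extend, as lwz does on powerpc64.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR (not GCC's rtx). */
enum code { REG, MEM, ZERO_EXTEND, SIGN_EXTEND };

struct expr {
    enum code code;
    const struct expr *op;   /* operand of an extend, else NULL */
};

/* Assumption: SImode loads on this toy target zero-extend for free. */
static const enum code load_extend_op = ZERO_EXTEND;

/* True when DEF_SRC extends a memory load in the same way the load
   instruction itself already extends. */
static bool toy_free_load_extend(const struct expr *def_src)
{
    return def_src->code == load_extend_op
           && def_src->op != NULL
           && def_src->op->code == MEM;
}

/* Propagate through an extension only when it is not a free load
   extension; otherwise keep it so combine can merge it into the load. */
static bool toy_should_propagate(const struct expr *def_src)
{
    return (def_src->code == ZERO_EXTEND || def_src->code == SIGN_EXTEND)
           && !toy_free_load_extend(def_src);
}
```

Under this model a zero_extend of a register is still propagated, while a zero_extend of a load is kept intact for combine.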
Subject: Bug 41081

Author: bergner
Date: Fri Oct 2 17:12:31 2009
New Revision: 152411

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=152411
Log:
    Backport from mainline:
    2009-08-23 Alan Modra <amodra@bigpond.net.au>

        PR target/41081
        * config/rs6000/rs6000.md (rotlsi3_64, ashlsi3_64, lshrsi3_64,
        ashrsi3_64): New.

    Backport from 4.3 branch:
    2009-09-25 Alan Modra <amodra@bigpond.net.au>

        * config/rs6000/rs6000.md (load_toc_v4_PIC_3c): Correct POWER form
        of instruction.

    2009-09-23 Alan Modra <amodra@bigpond.net.au>

        PR target/40473
        * config/rs6000/rs6000.c (rs6000_output_function_prologue): Don't
        call final to emit non-scheduled prologue, instead insert at entry.

Modified:
    branches/ibm/gcc-4_3-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_3-branch/gcc/config/rs6000/rs6000.c
    branches/ibm/gcc-4_3-branch/gcc/config/rs6000/rs6000.md
Subject: Bug 41081

Author: bergner
Date: Sat Oct 3 01:39:14 2009
New Revision: 152430

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=152430
Log:
    Backport from mainline.
    2009-08-30 Alan Modra <amodra@bigpond.net.au>

        PR target/41081
        * fwprop.c (get_reg_use_in): Delete.
        (free_load_extend): New function.
        (forward_propagate_subreg): Use it.

    2009-08-23 Alan Modra <amodra@bigpond.net.au>

        PR target/41081
        * fwprop.c (try_fwprop_subst): Allow multiple sets.
        (get_reg_use_in): New function.
        (forward_propagate_subreg): Propagate through subreg of zero_extend
        or sign_extend.

    2009-05-08 Paolo Bonzini <bonzini@gnu.org>

        PR rtl-optimization/33928
        PR 26854
        * fwprop.c (use_def_ref, get_def_for_use, bitmap_only_bit_bitween,
        process_uses, build_single_def_use_links): New.
        (update_df): Update use_def_ref.
        (forward_propagate_into): Use get_def_for_use instead of use-def
        chains.
        (fwprop_init): Call build_single_def_use_links and let it initialize
        dataflow.
        (fwprop_done): Free use_def_ref.
        (fwprop_addr): Eliminate duplicate call to df_set_flags.
        * df-problems.c (df_rd_simulate_artificial_defs_at_top,
        df_rd_simulate_one_insn): New.
        (df_rd_bb_local_compute_process_def): Update head comment.
        (df_chain_create_bb): Use the new RD simulation functions.
        * df.h (df_rd_simulate_artificial_defs_at_top,
        df_rd_simulate_one_insn): New.
        * opts.c (decode_options): Enable fwprop at -O1.
        * doc/invoke.texi (-fforward-propagate): Document this.

Modified:
    branches/ibm/gcc-4_3-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_3-branch/gcc/REVISION
    branches/ibm/gcc-4_3-branch/gcc/df-problems.c
    branches/ibm/gcc-4_3-branch/gcc/df.h
    branches/ibm/gcc-4_3-branch/gcc/doc/invoke.texi
    branches/ibm/gcc-4_3-branch/gcc/fwprop.c
    branches/ibm/gcc-4_3-branch/gcc/opts.c
Subject: Bug 41081

Author: bergner
Date: Wed Apr 28 22:52:57 2010
New Revision: 158846

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158846
Log:
    Backport from mainline:
    2009-08-23 Alan Modra <amodra@bigpond.net.au>

        PR target/41081
        * config/rs6000/rs6000.md (rotlsi3_64, ashlsi3_64, lshrsi3_64,
        ashrsi3_64): New.

Modified:
    branches/ibm/gcc-4_4-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_4-branch/gcc/config/rs6000/rs6000.md
Subject: Bug 41081

Author: bergner
Date: Thu Apr 29 14:34:35 2010
New Revision: 158902

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158902
Log:
    Backport from mainline.
    2009-08-30 Alan Modra <amodra@bigpond.net.au>

        PR target/41081
        * fwprop.c (get_reg_use_in): Delete.
        (free_load_extend): New function.
        (forward_propagate_subreg): Use it.

    2009-08-23 Alan Modra <amodra@bigpond.net.au>

        PR target/41081
        * fwprop.c (try_fwprop_subst): Allow multiple sets.
        (get_reg_use_in): New function.
        (forward_propagate_subreg): Propagate through subreg of zero_extend
        or sign_extend.

    2009-05-08 Paolo Bonzini <bonzini@gnu.org>

        PR rtl-optimization/33928
        PR 26854
        * fwprop.c (use_def_ref, get_def_for_use, bitmap_only_bit_bitween,
        process_uses, build_single_def_use_links): New.
        (update_df): Update use_def_ref.
        (forward_propagate_into): Use get_def_for_use instead of use-def
        chains.
        (fwprop_init): Call build_single_def_use_links and let it initialize
        dataflow.
        (fwprop_done): Free use_def_ref.
        (fwprop_addr): Eliminate duplicate call to df_set_flags.
        * df-problems.c (df_rd_simulate_artificial_defs_at_top,
        df_rd_simulate_one_insn): New.
        (df_rd_bb_local_compute_process_def): Update head comment.
        (df_chain_create_bb): Use the new RD simulation functions.
        * df.h (df_rd_simulate_artificial_defs_at_top,
        df_rd_simulate_one_insn): New.
        * opts.c (decode_options): Enable fwprop at -O1.
        * doc/invoke.texi (-fforward-propagate): Document this.

Modified:
    branches/ibm/gcc-4_4-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_4-branch/gcc/df-problems.c
    branches/ibm/gcc-4_4-branch/gcc/df.h
    branches/ibm/gcc-4_4-branch/gcc/doc/invoke.texi
    branches/ibm/gcc-4_4-branch/gcc/fwprop.c
    branches/ibm/gcc-4_4-branch/gcc/opts.c