Bug 41081 - redundant ZERO_EXTENDs
Summary: redundant ZERO_EXTENDs
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.5.0
Importance: P3 normal
Target Milestone: 4.5.0
Assignee: Alan Modra
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-16 02:37 UTC by Alan Modra
Modified: 2009-08-24 02:38 UTC (History)

See Also:
Host:
Target: powerpc64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
block sha1 source (2.45 KB, application/x-gzip)
2009-08-16 02:41 UTC, Alan Modra
Cure the zero_extends on rotate output (691 bytes, patch)
2009-08-16 02:45 UTC, Alan Modra
aims to teach gcc that rotate/shift insn input register's high bits are ignored (2.11 KB, patch)
2009-08-16 03:05 UTC, Alan Modra

Description Alan Modra 2009-08-16 02:37:18 UTC
The Linux kernel block sha1 code compiled for powerpc64 contains many redundant clrldi instructions, which significantly slow execution.  Current gcc seems to generate more of these than 3.4.5, which in turn is worse than 3.3.

Breakdown of clrldi insns
- 140 redundant clrldi on rotate insn output
- 79 other redundant clrldi
- 11 useful
Comment 1 Alan Modra 2009-08-16 02:41:11 UTC
Created attachment 18372
block sha1 source

blk_SHA1Block takes all its input from unsigned ints and only writes to unsigned ints, so every zero_extend in the body of this function is redundant.
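
The following is a minimal sketch of why that holds, not code taken from the attachment. It compares one SHA-1-style mixing step computed purely in 32 bits with the same step computed in 64-bit "registers" whose high halves hold garbage, truncating only at the final 32-bit store. The names step_ref, step_wide and check_step are hypothetical, and the sketch assumes the body uses only adds, logical operations and 32-bit rotates, which depend solely on the low word of their inputs.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit rotate left; n is reduced modulo 32 to avoid an undefined shift. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Reference: one mixing step computed entirely in 32-bit arithmetic. */
static uint32_t step_ref(uint32_t a, uint32_t b, uint32_t c,
                         uint32_t d, uint32_t e, uint32_t w)
{
    return e + rol32(a, 5) + ((b & c) | (~b & d)) + w + 0x5a827999u;
}

/* Same step on 64-bit values with no intermediate zero-extension.
   The high 32 bits of every input and temporary may be garbage. */
static uint32_t step_wide(uint64_t a, uint64_t b, uint64_t c,
                          uint64_t d, uint64_t e, uint64_t w)
{
    uint64_t rot = rol32((uint32_t)a, 5);   /* rotate reads only the low word */
    uint64_t f = (b & c) | (~b & d);        /* high bits are don't-care */
    uint64_t t = e + rot + f + w + 0x5a827999u;
    return (uint32_t)t;                     /* the 32-bit store truncates */
}

/* Hypothetical self-check: junk in the high halves must not matter. */
static void check_step(uint32_t a, uint32_t b, uint32_t c,
                       uint32_t d, uint32_t e, uint32_t w, uint64_t junk)
{
    uint64_t hi = junk << 32;
    assert(step_wide(hi | a, hi | b, hi | c, hi | d, hi | e, hi | w)
           == step_ref(a, b, c, d, e, w));
}
```

Calling check_step with arbitrary values and any junk pattern should pass. The argument would not carry over to operations that read the high bits, such as 64-bit right shifts, comparisons or division.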
Comment 2 Alan Modra 2009-08-16 02:45:06 UTC
Created attachment 18373
Cure the zero_extends on rotate output

This patch teaches gcc that the powerpc rotate/shift unit zero- or sign-extends its result to the full register width as appropriate, at least for the most common case of SImode operations.
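
As an illustration only, the behaviour described here can be modelled in C. The sketch below is a simplified model of a plain 32-bit rotate left (the rotlw-style case with a full low-word mask) and of clrldi with a clear count of 32; model_rotlw, model_clrldi32 and check_output_extend are hypothetical names, and the model does not cover rotate-and-mask forms whose mask wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit rotate left on a 64-bit register: only the low word
   of the source is rotated, and the result is written zero-extended. */
static uint64_t model_rotlw(uint64_t rs, unsigned n)
{
    uint32_t lo = (uint32_t)rs;
    n &= 31;
    uint32_t r = n ? (lo << n) | (lo >> (32 - n)) : lo;
    return r;   /* implicit zero-extension to 64 bits */
}

/* Model of clrldi rD,rS,32: clear the high 32 bits of the register. */
static uint64_t model_clrldi32(uint64_t rs)
{
    return rs & 0xffffffffull;
}

/* Hypothetical self-check: a clrldi after the rotate changes nothing. */
static void check_output_extend(uint64_t x, unsigned n)
{
    uint64_t r = model_rotlw(x, n);
    assert(model_clrldi32(r) == r);
}
```

Under this model the clrldi following a rotate is a no-op for every input, which is the property the new SImode patterns are meant to expose to the compiler.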
Comment 3 Alan Modra 2009-08-16 03:05:08 UTC
Created attachment 18374
aims to teach gcc that rotate/shift insn input register's high bits are ignored

This patch is aimed at the "79 other redundant clrldi", removing 59 cases on rotate/shift input.  I'm not particularly happy with it due to the hack for LOAD_EXTEND_OP zero_extends.  Before I discovered that particular problem, fwprop seemed a natural place to teach gcc about insn inputs.  If we don't leave those zero_extends alone, some rotate insns will take their input directly from the load while other insns still need the zero_extend, which prevents combine from removing the zero_extend on loads.
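
The input side of the argument can be sketched the same way. The model below is a simplification for illustration, not code from the patch: model_srw stands for a 32-bit logical right shift that reads only the low word of its source, and model_srd for the 64-bit shift that reads the whole register. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit logical right shift: the high word of rs is ignored.
   Shift counts of 32..63 give zero. */
static uint64_t model_srw(uint64_t rs, unsigned n)
{
    uint32_t lo = (uint32_t)rs;
    n &= 63;
    return n > 31 ? 0 : (uint64_t)(lo >> n);
}

/* Model of a 64-bit logical right shift: every bit of rs matters.
   Shift counts of 64..127 give zero. */
static uint64_t model_srd(uint64_t rs, unsigned n)
{
    n &= 127;
    return n > 63 ? 0 : rs >> n;
}

/* Model of clrldi rD,rS,32: clear the high 32 bits of the register. */
static uint64_t model_clrldi32(uint64_t rs)
{
    return rs & 0xffffffffull;
}

/* Hypothetical self-check: zero-extending the input of a word shift
   first makes no difference to its result. */
static void check_input_extend(uint64_t x, unsigned n)
{
    assert(model_srw(model_clrldi32(x), n) == model_srw(x, n));
}
```

In this model a zero_extend feeding only word-sized rotates and shifts can be dropped, whereas the same zero_extend feeding model_srd cannot, which matches the observation that other insns may still need it.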
Comment 4 Alan Modra 2009-08-16 03:07:35 UTC
Please ignore the RS6000_ALT_REG_ALLOC_ORDER hunk in rs6000-2.diff.  I forgot to edit that out.
Comment 5 Steven Bosscher 2009-08-16 09:50:23 UTC
If you are going to submit these patches, can you please make EXTEND_INPUT_REG_OP a target hook instead of a macro?
Comment 6 Alan Modra 2009-08-23 02:57:41 UTC
Subject: Bug 41081

Author: amodra
Date: Sun Aug 23 02:57:26 2009
New Revision: 151022

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=151022
Log:
	PR target/41081
	* fwprop.c (try_fwprop_subst): Allow multiple sets.
	(get_reg_use_in): New function.
	(forward_propagate_subreg): Propagate through subreg of zero_extend
	or sign_extend.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fwprop.c

Comment 7 Alan Modra 2009-08-23 03:49:16 UTC
.
Comment 8 Alan Modra 2009-08-23 03:53:24 UTC
Subject: Bug 41081

Author: amodra
Date: Sun Aug 23 03:53:02 2009
New Revision: 151025

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=151025
Log:
	PR target/41081
	* config/rs6000/rs6000.md (rotlsi3_64, ashlsi3_64, lshrsi3_64,
	ashrsi3_64): New.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.md

Comment 9 H.J. Lu 2009-08-23 19:55:13 UTC
Is it possible to extend this to address another zero extend bug, PR 17387?
Comment 10 Alan Modra 2009-08-24 02:38:57 UTC
No, that looks like a different problem.  It affects powerpc64 too.
Comment 11 Alan Modra 2009-08-30 06:09:56 UTC
Subject: Bug 41081

Author: amodra
Date: Sun Aug 30 06:09:42 2009
New Revision: 151221

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=151221
Log:
	PR target/41081
	* fwprop.c (get_reg_use_in): Delete.
	(free_load_extend): New function.
	(forward_propagate_subreg): Use it.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fwprop.c

Comment 12 Peter Bergner 2009-10-02 17:12:43 UTC
Subject: Bug 41081

Author: bergner
Date: Fri Oct  2 17:12:31 2009
New Revision: 152411

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=152411
Log:
	Backport from mainline:

	2009-08-23  Alan Modra  <amodra@bigpond.net.au>
	PR target/41081
	* config/rs6000/rs6000.md (rotlsi3_64, ashlsi3_64, lshrsi3_64,
	ashrsi3_64): New.


	Backport from 4.3 branch:

	2009-09-25  Alan Modra  <amodra@bigpond.net.au>
	* config/rs6000/rs6000.md (load_toc_v4_PIC_3c): Correct POWER
	form of instruction.

	2009-09-23  Alan Modra  <amodra@bigpond.net.au>
	PR target/40473
	* config/rs6000/rs6000.c (rs6000_output_function_prologue): Don't
	call final to emit non-scheduled prologue, instead insert at entry.

Modified:
    branches/ibm/gcc-4_3-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_3-branch/gcc/config/rs6000/rs6000.c
    branches/ibm/gcc-4_3-branch/gcc/config/rs6000/rs6000.md

Comment 13 Peter Bergner 2009-10-03 01:39:35 UTC
Subject: Bug 41081

Author: bergner
Date: Sat Oct  3 01:39:14 2009
New Revision: 152430

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=152430
Log:
	Backport from mainline.

	2009-08-30  Alan Modra  <amodra@bigpond.net.au>

	PR target/41081
	* fwprop.c (get_reg_use_in): Delete.
	(free_load_extend): New function.
	(forward_propagate_subreg): Use it.

	2009-08-23  Alan Modra  <amodra@bigpond.net.au>

	PR target/41081
	* fwprop.c (try_fwprop_subst): Allow multiple sets.
	(get_reg_use_in): New function.
	(forward_propagate_subreg): Propagate through subreg of zero_extend
	or sign_extend.

	2009-05-08  Paolo Bonzini  <bonzini@gnu.org>

	PR rtl-optimization/33928
	PR 26854
	* fwprop.c (use_def_ref, get_def_for_use, bitmap_only_bit_bitween,
	process_uses, build_single_def_use_links): New.
	(update_df): Update use_def_ref.
	(forward_propagate_into): Use get_def_for_use instead of use-def
	chains.
	(fwprop_init): Call build_single_def_use_links and let it initialize
	dataflow.
	(fwprop_done): Free use_def_ref.
	(fwprop_addr): Eliminate duplicate call to df_set_flags.
	* df-problems.c (df_rd_simulate_artificial_defs_at_top,
	df_rd_simulate_one_insn): New.
	(df_rd_bb_local_compute_process_def): Update head comment.
	(df_chain_create_bb): Use the new RD simulation functions.
	* df.h (df_rd_simulate_artificial_defs_at_top,
	df_rd_simulate_one_insn): New.
	* opts.c (decode_options): Enable fwprop at -O1.
	* doc/invoke.texi (-fforward-propagate): Document this.

Modified:
    branches/ibm/gcc-4_3-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_3-branch/gcc/REVISION
    branches/ibm/gcc-4_3-branch/gcc/df-problems.c
    branches/ibm/gcc-4_3-branch/gcc/df.h
    branches/ibm/gcc-4_3-branch/gcc/doc/invoke.texi
    branches/ibm/gcc-4_3-branch/gcc/fwprop.c
    branches/ibm/gcc-4_3-branch/gcc/opts.c

Comment 14 Peter Bergner 2010-04-28 22:53:13 UTC
Subject: Bug 41081

Author: bergner
Date: Wed Apr 28 22:52:57 2010
New Revision: 158846

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158846
Log:
	Backport from mainline:
	2009-08-23  Alan Modra  <amodra@bigpond.net.au>

	PR target/41081
	* config/rs6000/rs6000.md (rotlsi3_64, ashlsi3_64, lshrsi3_64,
	ashrsi3_64): New.

Modified:
    branches/ibm/gcc-4_4-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_4-branch/gcc/config/rs6000/rs6000.md

Comment 15 Peter Bergner 2010-04-29 14:34:58 UTC
Subject: Bug 41081

Author: bergner
Date: Thu Apr 29 14:34:35 2010
New Revision: 158902

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158902
Log:
	Backport from mainline.

	2009-08-30  Alan Modra  <amodra@bigpond.net.au>

	PR target/41081
	* fwprop.c (get_reg_use_in): Delete.
	(free_load_extend): New function.
	(forward_propagate_subreg): Use it.

	2009-08-23  Alan Modra  <amodra@bigpond.net.au>

	PR target/41081
	* fwprop.c (try_fwprop_subst): Allow multiple sets.
	(get_reg_use_in): New function.
	(forward_propagate_subreg): Propagate through subreg of zero_extend
	or sign_extend.

	2009-05-08  Paolo Bonzini  <bonzini@gnu.org>

	PR rtl-optimization/33928
	PR 26854
	* fwprop.c (use_def_ref, get_def_for_use, bitmap_only_bit_bitween,
	process_uses, build_single_def_use_links): New.
	(update_df): Update use_def_ref.
	(forward_propagate_into): Use get_def_for_use instead of use-def
	chains.
	(fwprop_init): Call build_single_def_use_links and let it initialize
	dataflow.
	(fwprop_done): Free use_def_ref.
	(fwprop_addr): Eliminate duplicate call to df_set_flags.
	* df-problems.c (df_rd_simulate_artificial_defs_at_top,
	df_rd_simulate_one_insn): New.
	(df_rd_bb_local_compute_process_def): Update head comment.
	(df_chain_create_bb): Use the new RD simulation functions.
	* df.h (df_rd_simulate_artificial_defs_at_top,
	df_rd_simulate_one_insn): New.
	* opts.c (decode_options): Enable fwprop at -O1.
	* doc/invoke.texi (-fforward-propagate): Document this.

Modified:
    branches/ibm/gcc-4_4-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_4-branch/gcc/df-problems.c
    branches/ibm/gcc-4_4-branch/gcc/df.h
    branches/ibm/gcc-4_4-branch/gcc/doc/invoke.texi
    branches/ibm/gcc-4_4-branch/gcc/fwprop.c
    branches/ibm/gcc-4_4-branch/gcc/opts.c