This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
- From: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- To: Yufeng Zhang <Yufeng dot Zhang at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Richard Biener <richard dot guenther at gmail dot com>
- Date: Mon, 11 Nov 2013 11:09:46 -0600
- Subject: Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
- Authentication-results: sourceware.org; auth=none
- References: <5277EA58 dot 5020303 at arm dot com>
The idea is a good one but I don't like your implementation of adding an
extra expression parameter to look at on the find_basis_for_candidate
lookup. This goes against the design of the pass and may not be
sufficiently general (might there be situations where a third possible
basis could exist?).
The overall design is set up to have alternate interpretations of
candidates in the candidate table to handle this sort of ambiguity. The
goal for your example is create a second candidate (chained to the first
one by way of the next_interp field) so that the candidate table looks
8  *_10[j_7(D)] = 2;
REF : _10 + ((sizetype) j_7(D) * 4) + 0 : int *
basis: 0 dependent: 0 sibling: 0
next-interp: 9 dead-savings: 0
9  *_10[j_7(D)] = 2;
REF : _5 + ((sizetype) j_7(D) * 4) + 800 : int *
basis: 5 dependent: 0 sibling: 0
next-interp: 0 dead-savings: 0
This will in turn allow subsequent candidates to be seen in terms of
either _5 or _10, which may be necessary to avoid missed opportunities.
There may be a subsequent REF _15 +... that can be an affine expression
of either of these, for example.
If you fail to find a basis for a candidate with its first
interpretation, you can then follow the next-interp chain to look for a
basis for the next one, without the messy passing of extra possibilities
to the find-basis routine.
I haven't read the patch in detail, but I think this should give you
enough to work with to re-design the idea to fit better with the
existing framework. Please let me know if you need more information, or
if you feel I've misunderstood something.
On Mon, 2013-11-04 at 18:41 +0000, Yufeng Zhang wrote:
> This patch extends the slsr pass to optionally use an alternative base
> expression in finding basis for CAND_REFs. Currently the pass uses
> hash-based algorithm to match the base_expr in a candidate. Given a
> test case like the following, slsr will not be able to recognize the two
> CAND_REFs have the same basis, as their base_expr are of different
> typedef int arr_2;
> void foo (arr_2 a2, int i, int j)
> a2[i][j] = 1;
> a2[i + 10][j] = 2;
> The gimple dump before slsr is like the following (using an
> arm-none-eabi gcc):
> i.0_2 = (unsigned int) i_1(D);
> _3 = i.0_2 * 80;
> _5 = a2_4(D) + _3;
> *_5[j_7(D)] = 1; <----
> _9 = _3 + 800;
> _10 = a2_4(D) + _9;
> *_10[j_7(D)] = 2; <----
> Here are the dumps for the two CAND_REFs generated for the two
> statements pointed by the arrows:
> 4  _5 = a2_4(D) + _3;
> ADD : a2_4(D) + (80 * i_1(D)) : int *
> basis: 0 dependent: 0 sibling: 0
> next-interp: 0 dead-savings: 0
> 8  *_10[j_7(D)] = 2;
> REF : _10 + ((sizetype) j_7(D) * 4) + 0 : int *
> basis: 5 dependent: 0 sibling: 0
> next-interp: 0 dead-savings: 0
> As mentioned previously, slsr cannot establish that candidate 4 is the
> basis for the candidate 8, as they have different base_exprs: a2_4(D)
> and _10, respectively. However, the two references actually only differ
> by an immediate offset (800).
> This patch uses the tree affine combination facilities to create an
> optional alternative base expression to be used in finding (as well as
> recording) the basis. It calls tree_to_aff_combination_expand on
> base_expr, reset the offset field of the generated aff_tree to 0 and
> generate a tree from it by calling aff_combination_to_tree.
> The new tree is recorded as a potential basis, and when
> find_basis_for_candidate fails to find a basis for a CAND_REF in its
> normal approach, it searches again using a tree expanded in such way.
> Such an expanded tree usually discloses the expression behind an
> SSA_NAME. In the example above, instead of seeing the strength
> reduction candidate chains like this:
> _5 -> 5
> _10 -> 8
> we are now having:
> _5 -> 5
> _10 -> 8
> a2_4(D) + (sizetype) i_1(D) * 80 -> 5 -> 8
> With the candidates 5 and 8 linked to the same tree expression (a2_4(D)
> + (sizetype) i_1(D) * 80), slsr is now able to establish that 5 is the
> basis of 8.
> The patch doesn't attempt to change the content of any CAND_REF though.
> It only enables CAND_REFs which (1) have the same stride and (2) have
> the underlying expressions of their base_expr only differ in immediate
> offsets, to be recognized to have the same basis. The statements with
> such CAND_REFs will be lowered to MEM_REFs, and later on the RTL
> expander shall be able to fold and re-associate the immediate offsets to
> the rightmost side of the addressing expression, and therefore exposes
> the common sub-expression successfully.
> The code-gen difference of the example code on arm with -O2
> -mcpu=cortex-15 is:
> mov r3, r1, asl #6
> - add ip, r0, r2, asl #2
> str lr, [sp, #-4]!
> + mov ip, #1
> + mov lr, #2
> add r1, r3, r1, asl #4
> - mov lr, #1
> - mov r3, #2
> add r0, r0, r1
> - add r0, r0, #800
> - str lr, [ip, r1]
> - str r3, [r0, r2, asl #2]
> + add r3, r0, r2, asl #2
> + str ip, [r0, r2, asl #2]
> + str lr, [r3, #800]
> ldr pc, [sp], #4
> One fewer instruction in this simple case.
> The example used in illustration is too simple to show code-gen
> difference on x86_64, but the included test case will show the benefit
> of the patch quite obviously.
> The patch has passed
> * bootstrapping on arm and x86_64
> * regtest on arm-none-eabi, aarch64-none-elf and x86_64
> There is no regression in SPEC2K on arm or x86_64.
> OK to commit to the trunk?
> Any comment is welcomed!
> * gimple-ssa-strength-reduction.c: Include tree-affine.h.
> (find_basis_for_base_expr): Update comment.
> (find_basis_for_candidate): Add new parameter 'alt_base_expr' of
> type 'tree'. Optionally call find_basis_for_base_expr with
> (record_potential_basis): Add new parameter 'alt_base_expr' of
> type 'tree'; set node->base_expr with 'alt_base_expr' if it is
> not NULL.
> (name_expansions): New static variable.
> (get_alternative_base): New function.
> (alloc_cand_and_find_basis): Call get_alternative_base for
> Update calls to find_basis_for_candidate and
> (execute_strength_reduction): Call free_affine_expand_cache with
> * gcc.dg/tree-ssa/slsr-41.c: New test.