This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify


On 03/09/15 12:11, Andre Vieira wrote:
On 01/09/15 15:01, Richard Biener wrote:
On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira
<Andre.SimoesDiasVieira@arm.com> wrote:
Hi Marc,

On 28/08/15 19:07, Marc Glisse wrote:

(not a review, I haven't even read the whole patch)

On Fri, 28 Aug 2015, Andre Vieira wrote:

2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>

    * match.pd: Added new patterns:
      ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
      (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)


+(for op0 (rshift rshift lshift lshift bit_and bit_and)
+ op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
+ op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)

You can nest for-loops, it seems clearer as:
(for op0 (rshift lshift bit_and)
     (for op1 (bit_ior bit_xor)
          op2 (bit_xor bit_ior)


Will do, thank you for pointing it out.


+(simplify
+ (op2:c
+  (op1:c
+   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)

I suspect you will want more :s (single_use) and less :c (canonicalization
should put constants in second position).

I can't find the definition of :s (single_use).

Sorry for that - didn't get along updating it yet :/  It restricts matching to
sub-expressions that have a single-use.  So

+  a &= 0xd123;
+  unsigned short tem = a ^ 0x6040;
+  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
... use of tem ...

we shouldn't do the simplifcation here because the expression
(a & 0x123) ^ 0x6040 is kept live by 'tem'.

GCC internals do point out
that canonicalization does put constants in the second position, didnt see
that first. Thank you for pointing it out.

+       C1 = wi::bit_and_not (C1,C2);

Space after ','.

Will do.

Having wide_int_storage in many places is surprising, I can't find similar
code anywhere else in gcc.



I tried looking for examples of something similar, I think I ended up using
wide_int because I was able to convert easily to and from it and it has the
"mask" and "wide_int_to_tree" functions. I welcome any suggestions on what I
should be using here for integer constant transformations and comparisons.

Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
constructors - those
are indeed odd.  Is it just for the initializers of wide-ints?

+    wide_int zero_mask = wi::zero (prec);
+    wide_int C0 = wide_int_storage (@1);
+    wide_int C1 = wide_int_storage (@2);
+    wide_int C2 = wide_int_storage (@3);
...
+       zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false, prec));

tree_to_uhwi (@1) should do the trick as well

+       C1 = wi::bit_and_not (C1,C2);
+       cst_emit = wi::bit_or (C1, C2);

the ops should be replacable with @2 and @3, the store to C1 obviously not
(but you can use a tree temporary and use wide_int_to_tree here to avoid
the back-and-forth for the case where C1 is not assigned to).

Note that transforms only doing association are prone to endless recursion
in case some other pattern does the reverse op...

Richard.


BR,
Andre


Thank you for all the comments, see reworked version:

Two new algorithmic optimisations:
    1.((X op0 C0) op1 C1) op2 C2)
      with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
      zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
      and 0's otherwise.
      if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
      if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
      if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
    2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)


This patch does two algorithmic optimisations that target patterns like:
(((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and ((x ^ 0x40000002) >> 1) |
0x80000000.

The transformation uses the knowledge of which bits are zero after
operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing
run-time operations.
The two examples above would be transformed into (X << 24) ^ 0xFFFFFFFF
and (X >> 1) ^ 0xa0000001 respectively.

The second transformation enables more applications of the first. Also
some targets may benefit from delaying shift operations. I am aware that
such an optimization, in combination with one or more optimizations that
cause the reverse transformation, may lead to an infinite loop. Though
such behavior has not been detected during regression testing and
bootstrapping on aarch64.

gcc/ChangeLog:

2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>

    * match.pd: Added new patterns:
      ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
      (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)

gcc/testsuite/ChangeLog:

2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
              Hale Wang  <hale.wang@arm.com>

    * gcc.dg/tree-ssa/forwprop-33.c: New test.


Ping.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]