This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Questions about match.pd


Hi All,

I am trying to understand how match.pd works as I'm writing a simple matching rule but have run into some issues
and there's very little documented on match.pd.

short version:

1) Why is there a difference in expressiveness between the GIMPLE and the GENERIC
   versions of match.pd. Particularly why does GENERIC has the limitation that it must
   be an internal function while GIMPLE is happy to just do the substitution.

2) Why doesn't the GIMPLE pass match on the GIMPLE code produced by the Fortran version?
   I have created an example where the GIMPLE trees of the two match exactly % some attributes
   on the expressions.

3) It seems to be either or, you can't have both rules? e.g. I either match in GENERIC or GIMPLE
   but not both? Is this correct? Right now if some other optimization makes it produce x*cos(x) then
   this will never be matched again? Since the GIMPLE passes don't seem to like the internal builtin?

4) There seems to be a bug with matching in GENERIC pass and when type conversions are being done. I
   am not sure who is wrong though. In the TREE the type has been correctly casted but it results in a segfault.

Thanks,
Tamar

----

For the long version and to see my current understanding of match.pd and what I have done:

Let’s say I want to transform occurrences of `x*cos(x)` with a super-dooper optimized builtin `foo`.
So `x*cos(x) => foo(x)`.

I start off at the obvious place, `match.pd`.

Since I need a new builtin I have to first define it in `builtins.def`. Since I’m replacing a math function I want optimized versions for `float`, `double`, etc.

DEF_GCC_BUILTIN        (BUILT_IN_FOO, "foo", BT_FN_DOUBLE_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_GCC_BUILTIN        (BUILT_IN_FOOF, "foof", BT_FN_FLOAT_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_GCC_BUILTIN        (BUILT_IN_FOOL, "fool", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
#define FOO_TYPE(F) BT_FN_##F##_##F##_##F
DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_FOO, "foo", FOO_TYPE, ATTR_CONST_NOTHROW_LEAF_LIST)
#undef FOO_TYPE

So now I have the builtins defined (I need to also define the proper handling code in builtins.c but that’s unrelated to the question) and can define the match rule:

/* x*cos(x) -> foo(x).  */
(for coss (COS)
 (simplify
  (mult:c (coss:s @0) @1)
  ...
 )
)

And the question is what goes into the … block.

I have builtins defined for each of the functions, so one implementation could be:

/* x*cos(x) -> foo(x).  */
(for coss (COS)
 (simplify
  (mult:c (coss:s @0) @1)
  (switch
    (if (types_match (type, float_type_node))
     (BUILT_IN_FOOF @1 @0))
    (if (types_match (type, double_type_node))
     (BUILT_IN_FOO @1 @0))
    (if (types_match (type, long_double_type_node))
     (BUILT_IN_FOOL @1 @0))
  )
 )
)

But one of the artifacts of defining the builtin in `builtins.def` is that an iterator is created for you for use in `match.pd` by `gencfn-macros.c`, in this case I have something like:

     (define_operator_list FOO
	 BUILT_IN_FOOF
	 BUILT_IN_FOO
	 BUILT_IN_FOOL
	 null)

The last `null` is important but I'll get to that later. These are generated and placed in `cfn-operators.pd` which is
included by `match.pd`.

The other artifact is a macro, for use in case statements in e.g. builtin.c, e.g. `CASE_CFN_FOO`
Which expands to:

      case CFN_BUILT_IN_FOOF:
      case CFN_BUILT_IN_FOOT:
      case CFN_BUILT_IN_FOOL:
      case CFN_FOO:

These are handy, because it means we can easily write match statements to the correct type. The iterator items are generated in increasing order of width of the type. E.g. float, double, long double.

So by iterating over both `COS` and the `FOO` iterator at the same time, we can simplify the the match rule:

/* x*cos(x) -> foo(x).  */
(for coss (COS)
     foos (FOO)
 (simplify
  (mult:c (coss:s @0) @1)
  (foos @0)
 )
)

This rule gets compiled into two files `generic-match.c` and `gimple-match.c`.
The first working on GENERIC tree and the second ofcourse on GIMPLE trees.

Given that GIMPLE and GENERIC have slightly different semantics the code for these
two are fairly different. The placement of the rule in `match.pd` seems to be important,
the rules are applied in order of their specification in `match.pd`.

The GIMPLE matcher will generate statements such as

	    switch (gimple_call_combined_fn (def))
	      {
	      case CFN_BUILT_IN_COSF:
	        {
		  tree o20 = gimple_call_arg (def, 0);
		  if ((o20 = do_valueize (valueize, o20)))
		    {
		      {
		       ...
			*res_code = CFN_BUILT_IN_FOOF;
			res_ops[0] = captures[1];
			gimple_resimplify1 (lseq, res_code, type, res_ops, valueize);
			return true;
		      }
		    }
	          break;
	        }
	      case CFN_BUILT_IN_COSS:
	        {
		  tree o20 = gimple_call_arg (def, 0);
		  if ((o20 = do_valueize (valueize, o20)))
		    {
		      {
		       ...
			*res_code = CFN_BUILT_IN_FOO;
			res_ops[0] = captures[1];
			gimple_resimplify1 (lseq, res_code, type, res_ops, valueize);
			return true;
		      }
		    }
	          break;
	        }
	      ...
	      default:;
	      }
        }

Because of the order of the iterations, the correct function is matched. e.g. COSF -> FOOF,
COS -> FOO, etc.

The GENERIC version has the same global structure but differs vastly in how replacements are done:

   case CALL_EXPR:
      switch (get_call_combined_fn (op0))
        {
	case CFN_BUILT_IN_COSF:
	  {
	    tree o20 = CALL_EXPR_ARG (op0, 0);
	    {
	     ...
	      res = maybe_build_call_expr_loc (loc, CFN_BUILT_IN_FOOF, type, 1, res_op0);
	      if (!res)
	        return NULL_TREE;
	      if (TREE_SIDE_EFFECTS (captures[2]))
		res = build2_loc (loc, COMPOUND_EXPR, type, fold_ignored_result (captures[2]), res);
	      return res;
	    }
	    break;
	  }
	case CFN_BUILT_IN_COSS:
	...

The important function here is `maybe_build_call_expr_loc`, again, I'll get to that later.

Testing it out, given a C input file:

#include <math.h>

float do_cool_stuff(float x)
{
   return x*cos(x);
}

we get:

do_cool_stuff (float x)
{
  float D.4154;

  _1 = (double) x;
  _2 = __builtin_foo (_1);
  D.4154 = (float) _2;
  return D.4154;
}

The rule seems to be matching early in the GIMPLE phase and not the generic, fair enough as long as it works.
It also correctly adds type casting and calling the double version of `foo`.

Fortran however disagrees:

For my Fortran function

 function func()
    real :: j ! output
    j = j * cos(j)
 end function func

I get

func ()
{
  real(kind=4) j;

  _1 = __builtin_cosf (j);
  j = j * _1;
}

So fortran is not applying the rule on neither GENERIC nor GIMPLE.

I am not sure why the GIMPLE one does not apply, but the GENERIC one doesn't apply
for neither Fortran nor C because of an implicit constrains on `generic-match.c` which comes
from `maybe_build_call_expr_loc`.

This function will only allow the substitution if the function to be substituted to is an internal builtin.
Presumably because it wants to prevent introducing a function that it cannot fold or lower later. As this might
introduce a link error later if it can't remove it.

Fair enough, the way to tell it that it can do get rid of the function is by adding it to `internal-fn.def`.

A comment in the file states that:

   DEF_INTERNAL_FLT_FN is like DEF_INTERNAL_OPTAB_FN, but in addition,
   the function implements the computational part of a built-in math
   function BUILT_IN_<NAME>{F,,L}.

Which is exactly what I need.

So to `internal-fn.def` I add a new entry

DEF_INTERNAL_FLT_FN (FOO, ECF_CONST, foo, unary)

Which now in order for it to work requires the optab `foo` to be defined. So in `optabs.def`
I add:

OPTAB_D (foo_optab, "foo$F$a2")

As the comment suggests I expected a mapping to be done between the typed functions.
e.g. I expected there to be 3 new internal functions, one for float, double and long double.

However only one new builtin is generated `IFN_FOO`. And this is added in the iterator for FOO
in place of the `(null)` we saw earlier:

     (define_operator_list FOO
	 BUILT_IN_FOOF
	 BUILT_IN_FOO
	 BUILT_IN_FOOL
	 IFN_FOO)

Re-running the fortran example it still does not match. This is because the `cos` is
expanded into `cosf` and then matches against `BUILT_IN_FOOF` and not the internal function.
So generic-match.c still refuses to match it.

The only way I have found to make it match is to change the matching rule to:

/* x*cos(x) -> foo(x).  */
(for coss (COS)
 (simplify
  (mult:c (coss:s @0) @1)
  (IFN_FOO @0)
 )
)

e.g. Map all the rules to the type generic IFN_FOO.

Running the example now still produces nothing, this is because if I understood correctly the rewrite
is only done *if* an optab for the required type actually exists. So I create one in the backend.

I create an obtab for both double and float types (doing so in the aarch64 backend):

(define_insn "foo<mode>2"
  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
	(abs:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
  "TARGET_FLOAT"
  "fabs\\t%<s>0, %<s>1"
  [(set_attr "type" "ffarith<stype>")]
)

The operation is non-sense but that doesn't matter, we just care about the gimple:

func ()
{
  real(kind=4) j;

  j = FOO (j);
}


So the fortan code has correctly replaced `x*cos(x)` with `FOO`.

C however now crashes with a segfault:

cos.c: In function ‘do_cool_stuff’:
cos.c:5:4: internal compiler error: Segmentation fault
    return x*cos(x);
    ^~~~~~
0xb55ddf crash_signal
	/work/tamchr01/gnu-work/src/gcc/gcc/toplev.c:333
0x6f227c builtin_mathfn_code(tree_node const*)
	/work/tamchr01/gnu-work/src/gcc/gcc/builtins.c:7532
0x76070e convert_to_real_1

This happens

  if (TREE_CODE (t) != CALL_EXPR
      || TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR)

here because `CALL_EXPR_FN (t)` returns NULL. This is because one of the operators
in the operators list is null. TREE_CODE just blindly dereferences it. This seems to
be a bug, however I am not sure where. If the bug is in `builtin_mathfn_code` or in the
code that generated the tree in the first place. Note that this does not happen if
type casts are not being done.

If I change

float do_cool_stuff(float x)
{
   return x*cos(x);
}

to

float do_cool_stuff(float x)
{
   return x*cosf(x);
}

The type error can be fixed by adding an extra check:

  if (TREE_CODE (t) != CALL_EXPR
      || (CALL_EXPR_FN (t) && TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR))

to `builtin_mathfn_code` and this seems to work fine and I think is correct, as this function is allowed to say NO.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]