'walk_stmt_load_store_addr_ops' for non-'gimple_assign_single_p (stmt)'

Thomas Schwinge thomas@codesourcery.com
Tue Mar 16 23:24:55 GMT 2021


Hi!

Thanks, Michael, and again Richard for your quick responses.

On 2021-03-16T15:25:10+0000, Michael Matz <matz@suse.de> wrote:
> On Tue, 16 Mar 2021, Thomas Schwinge wrote:
>
>> >>Indeed, given (Fortran) 'zzz = 1', we produce GIMPLE:
>> >>
>> >>    gimple_assign <real_cst, zzz, 1.0e+0, NULL, NULL>
>> >>
>> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see, as
>> >>expected, the 'visit_store' callback invoked, with 'rhs' and 'arg':
>> >>'<var_decl zzz>'.
>> >>
>> >>However, given (Fortran) 'zzz = r + r2', we produce GIMPLE:
>> >>
>> >>    gimple_assign <plus_expr, zzz, r, r2, NULL>
>
> But that's pre-ssa form.  After writing into SSA 'zzz' will be replaced by
> an SSA name, and the actual store into 'zzz' will happen in a store
> instruction.

Oh, I see, and...

>> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see,
>> >>unexpectedly, no callback at all invoked: neither 'visit_load', nor
>> >>'visit_store' (nor 'visit_address', obviously).
>> >
>> > The variables involved are registers. You only get called on memory operands.
>>
>> How would I have told that from the 'walk_stmt_load_store_addr_ops'
>> function description?  (How to improve that one "to reflect reality"?)
>>
>> But 'zzz' surely is the same in 'zzz = 1' vs. 'zzz = r + r2' -- for the
>> former I *do* see the 'visit_store' callback invoked, for the latter I
>> don't?
>
> The walk_gimple functions are intended to be used on the SSA form of
> gimple (i.e. the one that it is in most of the time).

Yes, "most of the time", but actually not in my case: I'm doing stuff
right after gimplification (before OMP lowering)...  So that's the
"detail" I was missing -- sorry for not mentioning that right away.  :-|

As the 'walk_gimple_[...]' functions are used during gimplification and OMP
lowering, they're presumably fine to use on non-SSA-form GIMPLE -- but
evidently some of the helper functions are not.  (Might there be a way to
add 'gcc_checking_assert (gimple_in_ssa_p (cfun))' or similar for that?
Putting that into 'walk_stmt_load_store_addr_ops' triggers a lot, as it's
called from 'gcc/cgraphbuild.c:cgraph_node::record_stmt_references', for
example.)
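
Concretely, that experiment was roughly the following -- just a sketch of
the placement, not a proposed patch -- at the top of
'walk_stmt_load_store_addr_ops' in 'gcc/gimple-walk.c':

    bool
    walk_stmt_load_store_addr_ops (gimple *stmt, void *data,
                                   walk_stmt_load_store_addr_fn visit_load,
                                   walk_stmt_load_store_addr_fn visit_store,
                                   walk_stmt_load_store_addr_fn visit_addr)
    {
      /* Experiment discussed above: insist on SSA form.  Too strict as-is;
         it fires for legitimate non-SSA callers such as
         'cgraph_node::record_stmt_references'.  */
      gcc_checking_assert (gimple_in_ssa_p (cfun));

      /* ... existing body unchanged ...  */
    }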

> And in that it's
> not the case that 'zzz = 1' and 'zzz = r + r2' are similar.  The former
> can have memory as the lhs (that includes static variables, or indirection
> through pointers), the latter can not.  The lhs of a binary statement is
> always an SSA name.  A write to an SSA name is not a store, which is why
> it's not walked for walk_stmt_load_store_addr_ops.
>
> Maybe it helps to look at simple C examples: [...]

I see, many thanks for reminding me about these items!

> If you want
> to capture writes into SSA names as well ([...])
> you need the per-operand callback indeed.

What I actually need are the loads from / uses of actual variables (and I
didn't see 'walk_stmt_load_store_addr_ops' report these).

> But that depends on
> what you actually want to do.

This is a prototype/"initial hack" for a very simple implementation of
<https://gcc.gnu.org/PR90591> "Avoid unnecessary data transfer out of OMP
construct".  (... to be completed and posted later.)  So simple that it
will easily fail (gracefully, of course), yet it is effective for a lot of
real-world code:

    subroutine [...]
    [...]
    real([...])   xx        !temporary variable, for distance calculation
    [...]
    !$acc kernels pcopyin(x, zii) reduction(+:eva) ! implicit 'copy(xx)' for scalar used inside region; established during gimplification
          do 100 i=0,n-1
            evx=0.0d0
            do 90 j=0,n-1
              xx=abs(x(1,i)-x(1,j))
    [...]
    !$acc end kernels
    [...]
    ['xx' never used here]
    end subroutine [...]

Inside 'kernels', we'd like to automatically parallelize loops (we've
basically got that working; analysis by Graphite etc.), but the problem
is that given "implicit 'copy(xx)' for scalar used inside region", when
Graphite later looks at the outlined 'kernels' region's function, it must
assume that 'xx' is still live after the OpenACC 'kernels' construct --
and thus cannot treat it as a thread-private temporary and cannot
parallelize.

Now, walking each function backwards (!), I'm taking note of any
variables' uses, and if I reach a 'kernels' construct but have not seen
a use of 'xx', I may then optimize 'copy(xx)' -> 'firstprivate(xx)',
enabling Graphite to do its thing.  (Other such optimizations may be
added later.)  (This is inspired by Jakub's commit
1a80d6b87d81c3f336ab199a901cf80ae349c335 "re PR tree-optimization/68128
(A huge regression in Parboil v2.5 OpenMP CUTCP test (2.5 times lower
performance))".)
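
In outline, the backward walk over a 'gimple_seq' looks roughly like the
following.  This is only a sketch of the idea, not the actual patch;
'note_uses_in_stmt' and 'optimize_oacc_kernels_clauses' are hypothetical
helper names, and the per-operand callback used by the former is sketched
further below.

    /* Walk SEQ backwards; CANDIDATES is the set of variables still
       eligible for the 'copy' -> 'firstprivate' optimization.
       Initialization of CANDIDATES per containing 'bind', recursion into
       nested sequences, and other "details" are omitted here.  */

    static void
    scan_seq_backwards (gimple_seq seq, hash_set<tree> *candidates)
    {
      for (gimple_stmt_iterator gsi = gsi_last (seq);
           !gsi_end_p (gsi); gsi_prev (&gsi))
        {
          gimple *stmt = gsi_stmt (gsi);

          if (gimple_code (stmt) == GIMPLE_OMP_TARGET
              && (gimple_omp_target_kind (stmt)
                  == GF_OMP_TARGET_KIND_OACC_KERNELS))
            /* No use of the remaining candidates has been seen after this
               construct, so their implicit 'copy' mappings may be demoted
               to 'firstprivate' (hypothetical helper).  */
            optimize_oacc_kernels_clauses (stmt, candidates);

          /* Record uses of variables in this statement, removing them
             from CANDIDATES (hypothetical helper; see the 'callback_op'
             sketch below).  */
          note_uses_in_stmt (stmt, candidates);
        }
    }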

I've now got a simple 'callback_op' which, for '!is_lhs', looks at
'get_base_address ([op])', and if that 'var' is contained in the set of
current candidates (initialized per containing 'bind', which we enter
first, even if walking a 'gimple_seq' backwards), removes that 'var' as a
candidate for such optimization.  (Plus some "details", of course.)  This
seems to work fine, as far as I can tell.  :-)
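
A minimal sketch of such a 'callback_op' (again not the actual patch; the
candidate set is assumed to be a 'hash_set<tree>' passed via 'wi->info',
and the "details" are omitted) might look like this:

    /* Per-operand callback, invoked per statement via
       'walk_gimple_op (stmt, scan_op, &wi)' with 'wi.info' pointing to the
       candidate set.  */

    static tree
    scan_op (tree *tp, int *, void *data)
    {
      struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
      hash_set<tree> *candidates = (hash_set<tree> *) wi->info;

      if (!wi->is_lhs)
        {
          /* A use (read) of a candidate variable disqualifies it from the
             'copy' -> 'firstprivate' optimization.  */
          tree base = get_base_address (*tp);
          if (base && DECL_P (base))
            candidates->remove (base);
        }

      return NULL_TREE;
    }

(With 'struct walk_stmt_info wi' zero-initialized per statement, e.g. via
'memset (&wi, 0, sizeof (wi))', before setting 'wi.info = candidates'.)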


Of course, the eventual IPA-based solution (see PR90591, PR94693, etc.)
will be much better -- but we need something now.


Regards
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf

