[PATCH, 8/8] Do simple omp lowering for no address taken var
Tom de Vries
Tom_deVries@mentor.com
Mon Nov 24 11:55:00 GMT 2014
On 24-11-14 12:28, Tom de Vries wrote:
> On 17-11-14 11:13, Richard Biener wrote:
>> On Sat, 15 Nov 2014, Tom de Vries wrote:
>>
>>> >On 15-11-14 13:14, Tom de Vries wrote:
>>>> > >Hi,
>>>> > >
>>>> > >I'm submitting a patch series with initial support for the oacc kernels
>>>> > >directive.
>>>> > >
>>>> > >The patch series uses pass_parallelize_loops to implement parallelization of
>>>> > >loops in the oacc kernels region.
>>>> > >
>>>> > >The patch series consists of these 8 patches:
>>>> > >...
>>>> > > 1 Expand oacc kernels after pass_build_ealias
>>>> > > 2 Add pass_oacc_kernels
>>>> > > 3 Add pass_ch_oacc_kernels to pass_oacc_kernels
>>>> > > 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>>> > > 5 Add pass_loop_im to pass_oacc_kernels
>>>> > > 6 Add pass_ccp to pass_oacc_kernels
>>>> > > 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>>> > > 8 Do simple omp lowering for no address taken var
>>>> > >...
>>> >
>>> >This patch lowers integer variables that do not have their address taken as
>>> >local variable. We use a copy at region entry and exit to copy the value in
>>> >and out.
>>> >
>>> >In the context of reduction handling in a kernels region, this allows the
>>> >parloops reduction analysis to recognize the reduction, even after oacc
>>> >lowering has been done in pass_lower_omp.
>>> >
>>> >In more detail, without this patch, the omp_data_i load and stores are
>>> >generated in place (in this case, in the loop):
>>> >...
>>> > {
>>> > .omp_data_iD.2201 = &.omp_data_arr.15D.2220;
>>> > {
>>> > unsigned intD.9 iD.2146;
>>> >
>>> > iD.2146 = 0;
>>> > goto <D.2207>;
>>> > <D.2208>:
>>> > D.2216 = .omp_data_iD.2201->cD.2203;
>>> > c.9D.2176 = *D.2216;
>>> > D.2177 = (long unsigned intD.10) iD.2146;
>>> > D.2178 = D.2177 * 4;
>>> > D.2179 = c.9D.2176 + D.2178;
>>> > D.2180 = *D.2179;
>>> > D.2217 = .omp_data_iD.2201->sumD.2205;
>>> > D.2218 = *D.2217;
>>> > D.2217 = .omp_data_iD.2201->sumD.2205;
>>> > D.2219 = D.2180 + D.2218;
>>> > *D.2217 = D.2219;
>>> > iD.2146 = iD.2146 + 1;
>>> > <D.2207>:
>>> > if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>>> > <D.2209>:
>>> > }
>>> >...
>>> >
>>> >With this patch, the omp_data_i load and stores for sum are generated at entry
>>> >and exit:
>>> >...
>>> > {
>>> > .omp_data_iD.2201 = &.omp_data_arr.15D.2218;
>>> > D.2216 = .omp_data_iD.2201->sumD.2205;
>>> > sumD.2206 = *D.2216;
>>> > {
>>> > unsigned intD.9 iD.2146;
>>> >
>>> > iD.2146 = 0;
>>> > goto <D.2207>;
>>> > <D.2208>:
>>> > D.2217 = .omp_data_iD.2201->cD.2203;
>>> > c.9D.2176 = *D.2217;
>>> > D.2177 = (long unsigned intD.10) iD.2146;
>>> > D.2178 = D.2177 * 4;
>>> > D.2179 = c.9D.2176 + D.2178;
>>> > D.2180 = *D.2179;
>>> > sumD.2206 = D.2180 + sumD.2206;
>>> > iD.2146 = iD.2146 + 1;
>>> > <D.2207>:
>>> > if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>>> > <D.2209>:
>>> > }
>>> > *D.2216 = sumD.2206;
>>> > #pragma omp return
>>> > }
>>> >...
>>> >
>>> >
>>> >So, without the patch the reduction operation looks like this:
>>> >...
>>> > *(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x
>>> >...
>>> >
>>> >And with this patch the reduction operation is simply:
>>> >...
>>> > sumD.2206 = sumD.2206 + x:
>>> >...
>>> >
>>> >OK for trunk?
>> I presume the reason you are trying to do that here is that otherwise
>> it happens too late? What you do is what loop store motion would
>> do.
>
> Richard,
>
> Thanks for the hint. I've built a reduction example:
> ...
> void __attribute__((noinline))
> f (unsigned int *__restrict__ a, unsigned int *__restrict__ sum, unsigned int n)
> {
> unsigned int i;
> for (i = 0; i < n; ++i)
> *sum += a[i];
> }...
> and observed that store motion of the *sum store is done by pass_loop_im,
> provided the *sum load is taken out of the the loop by pass_pre first.
>
> So alternatively, we could use pass_pre and pass_loop_im to achieve the same
> effect.
>
> When trying out adding pass_pre as a part of the pass group pass_oacc_kernels, I
> found that also pass_copyprop was required to get parloops to recognize the
> reduction.
>
Attached patch adds pass_copyprop to pass group pass_oacc_kernels.
Bootstrapped and reg-tested in the same way as before.
OK for trunk?
Thanks,
- Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0008-Add-pass_copy_prop-in-pass_oacc_kernels.patch
Type: text/x-patch
Size: 1441 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20141124/bc816f3b/attachment.bin>
More information about the Gcc-patches
mailing list