This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On Tue, Jul 26, 2016 at 5:49 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: > Hi Richard, > > It turned out that the patch proposed by you does not work properly > for nested loop. If we slightly change pr70729.cc to > (non-essential part is omitted > void inline Ss::foo (float w) > { > #pragma omp simd > for (int i=0; i<S_n; i++) > { > float w1 = C2[S_n + i] * w; > v1.v_i[i] += (int)w1; > C1[S_n + i] += w1; > } > } > void Ss::boo (int n) > { > for (int i = 0; i<n; i++) > foo(ww[i]); > } > loop in foo won't be vectorized since REF_LOOP is outer loop which is > not marked with omp simd pragma: > t1.cc:73:25: note: not vectorized: not suitable for scatter store *_31 = _33; > t1.cc:73:25: note: bad data references. > > Note that the check I proposed before works fine. The attached works for me (current trunk doesn't work because of caching we do - I suppose we should try again to avoid the quadraticness in other ways when ref_indep_loop_p is called from outermost_indep_loop). Richard. > 2016-07-20 12:35 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>: >> Richard, >> >> Jakub has already fixed it. >> Sorry for troubles. >> Yuri. >> >> 2016-07-19 18:29 GMT+03:00 Renlin Li <renlin.li@foss.arm.com>: >>> Hi Yuri, >>> >>> I saw this test case runs on arm platforms, and maybe other platforms as >>> well. >>> >>> testsuite/g++.dg/vect/pr70729.cc:7:10: fatal error: xmmintrin.h: No such >>> file or directory >>> >>> Before the change here, it's gated by vect_simd_clones target selector, >>> which limit it to i?86/x86_64 platform only. >>> >>> Regards, >>> Renlin Li >>> >>> >>> >>> >>> On 08/07/16 15:07, Yuri Rumyantsev wrote: >>>> >>>> Hi Richard, >>>> >>>> Thanks for your help - your patch looks much better. >>>> Here is new patch in which additional argument was added to determine >>>> source loop of reference. >>>> >>>> Bootstrap and regression testing did not show any new failures. >>>> >>>> Is it OK for trunk? >>>> ChangeLog: >>>> 2016-07-08 Yuri Rumyantsev <ysrumyan@gmail.com> >>>> >>>> PR tree-optimization/71734 >>>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Add REF_LOOP argument which >>>> contains REF, use it to check safelen, assume that safelen value >>>> must be greater 1, fix style. >>>> (ref_indep_loop_p_2): Add REF_LOOP argument. >>>> (ref_indep_loop_p): Pass LOOP as additional argument to >>>> ref_indep_loop_p_2. >>>> gcc/testsuite/ChangeLog: >>>> * g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style. >>>> >>>> 2016-07-08 11:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>>>> >>>>> On Thu, Jul 7, 2016 at 5:38 PM, Yuri Rumyantsev <ysrumyan@gmail.com> >>>>> wrote: >>>>>> >>>>>> I checked simd3.f90 and found out that my additional check reject >>>>>> independence of references >>>>>> >>>>>> REF is independent in loop#3 >>>>>> .istart0.19, .iend0.20 >>>>>> which are defined in loop#1 which is outer for loop#3. >>>>>> Note that these references are defined by >>>>>> _103 = __builtin_GOMP_loop_dynamic_next (&.istart0.19, &.iend0.20); >>>>>> which is in loop#1. >>>>>> It is clear that both these references can not be independent for >>>>>> loop#3. >>>>> >>>>> >>>>> Ok, so we end up calling ref_indep_loop for ref in LOOP also for inner >>>>> loops >>>>> of LOOP to catch memory references in those as well. So the issue is >>>>> really >>>>> that we look at the wrong loop for safelen and we _do_ want to apply >>>>> safelen >>>>> to inner loops as well. >>>>> >>>>> So better track the loop we are ultimately asking the question for, like >>>>> in the >>>>> attached patch (fixes the testcase for me). >>>>> >>>>> Richard. >>>>> >>>>> >>>>> >>>>>> 2016-07-07 17:11 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>>>>>> >>>>>>> On Thu, Jul 7, 2016 at 4:04 PM, Yuri Rumyantsev <ysrumyan@gmail.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> I Added this check because of new failures in libgomp.fortran suite. >>>>>>>> Here is copy of Jakub message: >>>>>>>> --- Comment #29 from Jakub Jelinek <jakub at gcc dot gnu.org> --- >>>>>>>> The #c27 r237844 change looks bogus to me. >>>>>>>> First of all, IMNSHO you can argue this way only if ref is a reference >>>>>>>> seen in >>>>>>>> loop LOOP, >>>>>>> >>>>>>> >>>>>>> or inner loops of LOOP I guess. I _think_ we never call >>>>>>> ref_indep_loop_p_1 with >>>>>>> a REF whose loop is not a sub-loop of LOOP or LOOP itself (as it would >>>>>>> not make >>>>>>> sense to do that, it would be a waste of time). >>>>>>> >>>>>>> So only if "or inner loops of LOOP" is not correct the check would be >>>>>>> needed >>>>>>> but then my issue with unrolling an inner loop and turning a ref that >>>>>>> safelen >>>>>>> does not apply to into a ref that it now applies to arises. >>>>>>> >>>>>>> I don't fully get what Jakub is hinting at. >>>>>>> >>>>>>> Can you install the safelen > 0 -> safelen > 1 fix please? Jakub, can >>>>>>> you >>>>>>> explain that bitmap check with a simple testcase? >>>>>>> >>>>>>> Thanks, >>>>>>> Richard. >>>>>>> >>>>>>>> which is the case of e.g. *.omp_data_i_23(D).a ref in simd3.f90 -O2 >>>>>>>> -fopenmp -msse2, but not the D.3815[0] case tested during can_sm_ref_p >>>>>>>> - the >>>>>>>> D.3815[0] = 0; as well as something = D.3815[0]; stmt found in the >>>>>>>> outer loop >>>>>>>> obviously can be dependent on many of the loads and/or stores in the >>>>>>>> loop, be >>>>>>>> it "omp simd array" or not. >>>>>>>> Say for >>>>>>>> void >>>>>>>> foo (int *p, int *q) >>>>>>>> { >>>>>>>> #pragma omp simd >>>>>>>> for (int i = 0; i < 1024; i++) >>>>>>>> p[i] += q[0]; >>>>>>>> } >>>>>>>> sure, q[0] can't alias p[0] ... p[1022], the earlier iterations could >>>>>>>> write >>>>>>>> something that changes its value, and then it would behave differently >>>>>>>> from >>>>>>>> using VF = 1024, where everything is performed in parallel. >>>>>>>> Though, actually, it can alias, just it would have to write the same >>>>>>>> value as >>>>>>>> was there. So, if this is used to determine if it is safe to hoist >>>>>>>> the load >>>>>>>> before the loop, it is fine, if it is used to determine if &q[0] >= >>>>>>>> &p[0] && >>>>>>>> &q[0] <= &p[1023], then it is not fine. >>>>>>>> >>>>>>>> For aliasing of q[0] and p[1023], I don't see why they couldn't alias >>>>>>>> in a >>>>>>>> valid program. #pragma omp simd I think guarantees that the last >>>>>>>> iteration is >>>>>>>> executed last, it isn't necessarily executed last alone, it could be, >>>>>>>> or >>>>>>>> together with one before last iteration, or (for simdlen INT_MAX) even >>>>>>>> all >>>>>>>> iterations can be done concurrently, in hw or sw, so it is fine if it >>>>>>>> is >>>>>>>> transformed into: >>>>>>>> int temp[1024], temp2[1024], temp3[1024]; >>>>>>>> for (int i = 0; i < 1024; i++) >>>>>>>> temp[i] = p[i]; >>>>>>>> for (int i = 0; i < 1024; i++) >>>>>>>> temp2[i] = q[0]; >>>>>>>> /* The above two loops can be also swapped, or intermixed. */ >>>>>>>> for (int i = 0; i < 1024; i++) >>>>>>>> temp3[i] = temp[i] + temp2[i]; >>>>>>>> for (int i = 0; i < 1024; i++) >>>>>>>> p[i] = temp3[i]; >>>>>>>> /* Or the above loop reversed etc. */ >>>>>>>> >>>>>>>> If you have: >>>>>>>> int >>>>>>>> bar (int *p, int *q) >>>>>>>> { >>>>>>>> q[0] = 0; >>>>>>>> #pragma omp simd >>>>>>>> for (int i = 0; i < 1024; i++) >>>>>>>> p[i]++; >>>>>>>> return q[0]; >>>>>>>> } >>>>>>>> i.e. something similar to what misbehaves in simd3.f90 with the >>>>>>>> change, then >>>>>>>> the answer is that q[0] isn't guaranteed to be independent of any >>>>>>>> references in >>>>>>>> the simd loop. >>>>>>>> >>>>>>>> 2016-07-07 16:57 GMT+03:00 Richard Biener >>>>>>>> <richard.guenther@gmail.com>: >>>>>>>>> >>>>>>>>> On Thu, Jul 7, 2016 at 3:24 PM, Yuri Rumyantsev <ysrumyan@gmail.com> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Richard, >>>>>>>>>> >>>>>>>>>> Did you meas the following check enhancement: >>>>>>>>>> >>>>>>>>>> outer = loop_outer (loop); >>>>>>>>>> /* We consider REF defined in LOOP as independent if at least 2 >>>>>>>>>> loop >>>>>>>>>> iterations are not dependent. */ >>>>>>>>>> if ((loop->safelen > 1 >>>>>>>>>> || (outer && outer->safelen > 1)) >>>>>>>>>> && bitmap_bit_p (&memory_accesses.refs_in_loop[loop->num], >>>>>>>>>> ref->id)) >>>>>>>>>> return true; >>>>>>>>>> which allows us to consider reference x[0] as invariant for the test >>>>>>>>>> #pragma omp simd >>>>>>>>>> for (i = 0; i< 100; i++) >>>>>>>>>> { >>>>>>>>>> y[i] = i * 2; >>>>>>>>>> for (j = 0; j < 100; j++) >>>>>>>>>> z[j] += x[0]; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> Moving statement >>>>>>>>>> _9 = *x_19(D); >>>>>>>>>> (cost 20) out of loop 1. >>>>>>>>> >>>>>>>>> >>>>>>>>> outer loops will be automatically considered by outermost_indep_loop >>>>>>>>> which eventually >>>>>>>>> calls the function you are patching. >>>>>>>>> >>>>>>>>> I was questioning the added test >>>>>>>>> >>>>>>>>> && bitmap_bit_p (&memory_accesses.refs_in_loop[loop->num], >>>>>>>>> ref->id)) >>>>>>>>> >>>>>>>>> and still do not understand why you need that. It makes us only >>>>>>>>> consider the >>>>>>>>> loop of ref itself when looking at safelen (but not refs that belong >>>>>>>>> to inner loops >>>>>>>>> of loop). >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>> >>>>>>>>>> 2016-07-07 11:46 GMT+03:00 Richard Biener >>>>>>>>>> <richard.guenther@gmail.com>: >>>>>>>>>>> >>>>>>>>>>> On Wed, Jul 6, 2016 at 4:42 PM, Yuri Rumyantsev >>>>>>>>>>> <ysrumyan@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Richard, >>>>>>>>>>>> >>>>>>>>>>>> I don't understand your question and how it is related to proposed >>>>>>>>>>>> patch. >>>>>>>>>>>> >>>>>>>>>>>> 2016-07-06 16:55 GMT+03:00 Richard Biener >>>>>>>>>>>> <richard.guenther@gmail.com>: >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jul 6, 2016 at 3:50 PM, Yuri Rumyantsev >>>>>>>>>>>>> <ysrumyan@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> See for example Jakub comments for 70729. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> But how can the check be valid given a >>>>>>>>>>>>> >>>>>>>>>>>>> #pragma omp simd >>>>>>>>>>>>> for (;;) >>>>>>>>>>>>> { >>>>>>>>>>>>> for (;;) >>>>>>>>>>>>> ...ref in question... >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> can be transformed to >>>>>>>>>>>>> >>>>>>>>>>>>> #pragma omp simd >>>>>>>>>>>>> for (;;) >>>>>>>>>>>>> { >>>>>>>>>>>>> ...ref in question... >>>>>>>>>>>>> ...ref in question.. >>>>>>>>>>>>> ... >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> by means of unrolling? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> The bitmap check would return false for the not unrolled inner loop >>>>>>>>>>> and true for the inner unrolled loop. >>>>>>>>>>> So it cannot be correct. >>>>>>>>>>> >>>>>>>>>>> (not sure why this is all off-list btw) >>>>>>>>>>> >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>>> Richard. >>>>>>>>>>>>> >>>>>>>>>>>>>> 2016-07-06 16:25 GMT+03:00 Richard Biener >>>>>>>>>>>>>> <richard.guenther@gmail.com>: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jul 6, 2016 at 1:43 PM, Yuri Rumyantsev >>>>>>>>>>>>>>> <ysrumyan@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I did not learn LIM detailed but I see on the following >>>>>>>>>>>>>>>> example >>>>>>>>>>>>>>>> #pragma omp simd >>>>>>>>>>>>>>>> for (i = 0; i< 100; i++) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> y[i] = i * 2; >>>>>>>>>>>>>>>> for (j = 0; j < 100; j++) >>>>>>>>>>>>>>>> z[j] += x[0]; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> that only inner,ost loop is handled: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Memory reference in loop#2 1: *_7 >>>>>>>>>>>>>>>> Memory reference in loop#2 2: *x_19(D) >>>>>>>>>>>>>>>> Memory reference in loop#1 3: *_3 >>>>>>>>>>>>>>>> Basic block 3 (loop 1 -- depth 1): >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Basic block 4 (loop 2 -- depth 2): >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Loop#2 is innermost >>>>>>>>>>>>>>>> Querying dependency of refs 2 and 1: dependent. >>>>>>>>>>>>>>>> Querying dependencies of ref 2 in loop 2: dependent >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> well, LIM doesn't bother to check against refs in an outer loop >>>>>>>>>>>>>>> if >>>>>>>>>>>>>>> refs in the inner loop are already dependent. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Loop unroll of inner loop does not invalidate 'safelen' of >>>>>>>>>>>>>>>> outer one >>>>>>>>>>>>>>>> accordingly with safelen definition. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So why do you need the bitmap_bit_p check then? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2016-07-06 14:23 GMT+03:00 Richard Biener >>>>>>>>>>>>>>>> <richard.guenther@gmail.com>: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Jul 6, 2016 at 1:17 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>>> <ysrumyan@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Richard, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> LIM phase considers only innermost loops and does not handle >>>>>>>>>>>>>>>>>> loop >>>>>>>>>>>>>>>>>> nests, so in your case the innermost loop (j) does not have >>>>>>>>>>>>>>>>>> non-zero >>>>>>>>>>>>>>>>>> safelen. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Huh? LIM considers loop nests just fine. And yes, the >>>>>>>>>>>>>>>>> innermost loop >>>>>>>>>>>>>>>>> does not have safelen but shouldn't it? Consider the inner >>>>>>>>>>>>>>>>> loop being unrolled >>>>>>>>>>>>>>>>> by GCC so it is removed - should that then invalidate the >>>>>>>>>>>>>>>>> outer loop safelen >>>>>>>>>>>>>>>>> because innermost loop refs now appear in the outer loop? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2016-07-06 13:34 GMT+03:00 Richard Biener >>>>>>>>>>>>>>>>>> <richard.guenther@gmail.com>: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Jul 6, 2016 at 11:50 AM, Yuri Rumyantsev >>>>>>>>>>>>>>>>>>> <ysrumyan@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Richard, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I pointed out in the commentary that REF is defined inside >>>>>>>>>>>>>>>>>>>> loop and >>>>>>>>>>>>>>>>>>>> this check is related to this statement. Should I clarify >>>>>>>>>>>>>>>>>>>> it? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> + /* We consider REF defined in LOOP as independent if at >>>>>>>>>>>>>>>>>>>> least 2 loop >>>>>>>>>>>>>>>>>>>> + iterations are not dependent. */ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yes, I fail to see why x[0] should not be disambiguated >>>>>>>>>>>>>>>>>>> against y[i] in >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> #pragma GCC loop ivdep >>>>>>>>>>>>>>>>>>> for (i...) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> y[i] = ...; >>>>>>>>>>>>>>>>>>> for (j...) >>>>>>>>>>>>>>>>>>> ... = x[0]; >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> REF is always inside the loop nest with LOOP being the >>>>>>>>>>>>>>>>>>> outermost loop. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 2016-07-06 12:38 GMT+03:00 Richard Biener >>>>>>>>>>>>>>>>>>>> <richard.guenther@gmail.com>: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Tue, Jul 5, 2016 at 4:56 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>>>>>>> <ysrumyan@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Here is a simple fix to cure regressions introduced by >>>>>>>>>>>>>>>>>>>>>> my fix for >>>>>>>>>>>>>>>>>>>>>> 70729. Patch also contains minor changes in test found >>>>>>>>>>>>>>>>>>>>>> by Jakub. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Bootstrapping and regression testing did not show any >>>>>>>>>>>>>>>>>>>>>> new failures. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Is it OK for trunk? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> + && bitmap_bit_p >>>>>>>>>>>>>>>>>>>>> (&memory_accesses.refs_in_loop[loop->num], ref->id)) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> So safelen does not apply to refs in nested loops? The >>>>>>>>>>>>>>>>>>>>> added comment only >>>>>>>>>>>>>>>>>>>>> explains the safelen check change but this also requires >>>>>>>>>>>>>>>>>>>>> explanation. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>>>>>>>>>> 2016-07-05 Yuri Rumyantsev <ysrumyan@gmail.com> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> PR tree-optimization/71734 >>>>>>>>>>>>>>>>>>>>>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Consider REF >>>>>>>>>>>>>>>>>>>>>> defined in >>>>>>>>>>>>>>>>>>>>>> LOOP as independent if at least two loop iterations are >>>>>>>>>>>>>>>>>>>>>> not dependent. >>>>>>>>>>>>>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>>>>>>>>>>>>> * g++.dg/vect/pr70729.cc: Delete redundant dg >>>>>>>>>>>>>>>>>>>>>> options, fix style.
Attachment:
p
Description: Binary data
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |