This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: fix PR46029: reimplement if conversion of loads and stores


On Tue, Jul 7, 2015 at 11:23 PM, Abe <abe_skolnik@yahoo.com> wrote:
>> (if-conversion could directly generate masked load/stores
>>  of course and not use a scratch-pad at all in that case).
>
>
> IMO that`s a great idea, but I don`t know how to do it.  Hints would be
> welcome.  In particular, how does one "generate masked load/stores" at the
> GIMPLE level?

It already does that, see predicate_mem_writes.  You should definitely
preserve that path (I think it does that only when versioning the loop
for if-conversion, non-vectorized loops will then not be if-converted).

>
>> But are we correctly handling these cases in the current if conversion
>> code?
>
>
> I`m uncertain to what that is intended to refer, but I believe Sebastian
> would agree that the new if converter is safer than the old one in terms of
> correctness at the time of running the code being compiled.
>
>
>> Abe's changes would seem like a step forward from a correctness standpoint
>
>
> Not to argue, but as a point of humility: Sebastian did by far the most work
> on this patch.  I just modernized it and helped move it along.  Even _that_
> was done with Sebastian`s help.
>
>
>> even if they take us a step backwards from a performance standpoint.
>
>
> For now, we have a few performance regressions, and so far we have found
> that it`s non-trivial to remove all of those regressions.

On what hardware did you test?

> We may be better off pushing the current patch to trunk and having the
> performance regressions currently introduced by the new if converter be
> fixed by later patches.
> Pushing to trunk gives us excellent "visibility" amongst GCC hackers, so the
> code will get more "eyeballs" than if it lingers in an uncommitted patch or
> in a branch.
> I, for one, would love some help in fixing these performance regressions.
> ;-)
>
> If fixing the performance regressions winds up taking too long, perhaps the
> current imperfect patch could be undone on trunk just before a release is
> tagged,
> and then we`ll push it in again when trunk goes back to being allowed to be
> unstable?  According to my analysis of the data near the end of the page at
> <https://gcc.gnu.org/develop.html>, we have until roughly April of 2016 to
> work on not-yet-perfect patches in trunk.
>
>
>>> So the question is whether we get more non-vectorized if-converted
>>> code out of this (and thus whether we want to use --param
>>> allow-store-data-races to get the old code back which is nicer to less
>>> capable CPUs and probably faster than using scatter/gather or masked
>>> loads/stores).
>
>
>> I do think conditionalizing some of this on the allow-store-data-races
>> makes sense.
>
>
> I think having both the old if-converter and the new one "live on" in GCC is
> nontrivial, but not impossible.  I also don`t think it`s the best long-term
> goal,
> but only a short-term workaround.  In the long run, IMO there should be only
> one if converter, albeit perhaps with tweaking flags [e.g.
> "-fallow-unsafe-if-conversion"].
>
>
>> I also wonder if we should really care about load data races (not sure
>> your patch does).
>
>
> According to a recent long discussion I had with Sebastian, our current
> patch does not have the flaw I was concerned it might have in terms of loads
> because:
>
>   [1] the scratchpad is only being used to if-convert assignments to
> thread-local scalars, never to globals/statics, and because...
>
>   [2] the gimplifier is supposed to detect "the address of this scalar has
> been taken" and when such is detected in the code being compiled,
>       it causes the scalar to no longer look like a scalar in GIMPLE so that
> we are also safe from stale-data problems that could come from
>       corner-case code that takes the address of a thread-local variable and
> gives that address to another thread [which then proceeds to
>       overwrite the value in the supposedly-thread-local scalar that belongs
> to a different thread from the one doing the writing]
>
>
> Regards,
>
> Abe
>
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]