This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][P1 tree-optimization/68541] Add heuristics to path splitting


Hi Jeff,

On 05/02/16 23:49, Jeff Law wrote:
This patch addresses multiple issues with how we determine when to split paths.  The primary motivation is addressing 68541 (P1).


As I've gotten testcodes from Ajit, I've been able to look closely at the path splitting opportunities and consistently what I've seen is that contrary to the original belief, CSE/DCE opportunities are rarely exposed by path splitting. It seems the benefit is more due to removing the unconditional jump inherent in a CFG diamond -- even more so on the microblaze where these jumps have delay slots.

While there are cases where splitting allows for CSE/DCE, they're the exception rather than the rule in the codes I've looked at.

With that in mind, we want to encourage path splitting when the amount of code we're duplicating is small.   We also want to balance that with not path splitting when the original code is something that may be if-convertable.

The vast majority of if-convertable codes are cases where the two predecessors of the join block are single statement blocks with their results feeding the same PHI in the join block. We now reject that case for path splitting so as not to spoil if-conversion.

The only wrinkle with that heuristic is that some codes (MIN, MAX, ABS, COND, calls, etc) are not good candidates for further if conversion. (ie, we have a MIN_EXPR in both predecessors that feed the PHI in the join). So we still allow those cases for splitting. This can be easily refined as we find more cases or as the if-converter is extended.

So with that in place as the first filter, we just need to put a limiter on the number of statements we allow to be copied. I punted a bit and re-used the PARAM for jump threading. They've got enough in common that I felt it was reasonable. If I were to create a new PARAM, I'd probably start with a smaller default value after doing further instrumentation.

While I was working through this I noticed we were path splitting in some cases where we shouldn't.  Particularly if we had a CFG like:

          a
         / \
        b   c
       / \ /
      e   d
         / \
        /   \
       /     \
     latch   exit


Duplicating d for the edges b->d and c->d isn't as helpful because the duplicate can't be merged into b.  We now detect this kind of case and reject it for path splitting.

The new tests are a combination of reductions of codes from Ajit and my own poking around.

Looking further out, I really wonder if the low level aspects of path splitting like we're trying to capture here really belong in the cross-jumping phase of the RTL optimizers. The higher level aspects (when we are able to expose CSE/DCE opportunities) should be driven by actually looking at the propagation/simplifications enabled by a potential split path. But those are gcc-7 things.

Bootstrapped and regression tested on x86 linux. Installed on the trunk. I'll obviously keep an eye out for how the tests are handled on other platforms -- I don't think the tests are real sensitive to branch costs and such, but I've been surprised before.

Onward to the jump threading code explosion wrap-up...

jeff


After this patch I also see:
FAIL: gcc.dg/tree-ssa/split-path-1.c scan-tree-dump split-paths "Duplicating join block"

on arm, but not on aarch64. My arm-none-eabi cross compiler is configured with:
--with-float=hard --with-cpu=cortex-a9 --with-fpu=neon --with-mode=thumb

Kyrill


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]