This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFC: Add of type-demotion pass


On Tue, 9 Jul 2013, Kai Tietz wrote:

I would be glad if you could give me a couple hints.
I try.

Thanks :-)

I wonder why you implemented this as a separate pass instead of adding it
to tree-ssa-forwprop. demote_cast is (or should be) a special case of
combine_conversions, so it would be nice to avoid the code duplication
(there is also a version in fold-const.c). demote_into could be called
from roughly the same place as simplify_conversion_from_bitmask. And you
could reuse get_prop_source_stmt, can_propagate_from,
remove_prop_source_from_use, etc.

Initial patch (from last year) actual implemented that in forwprop.

Ah, reading the conversation from last year helped me understand a bit better.

I was then kindly asked to put this into a separate pass. There are some good reason why forward-propagation isn't the right place for it. Eg, forwprop does type-raising by default.

Even in cases where that increases the size of the operation? I see a hoist_conversion_for_bitop_p that seems to try and prevent it.

So by implementing type-demotion there too, would lead to raise-condition. So there would be required additionally that within forwprop a straight line-depth conversion is done for statement-lists. All this doesn't fit pretty well into current concept of forward-propagation ... The cast demotion is of course something of interest for folding and might be fitting into forward-propagation-pass too. The main cause why it is implemented within demotion pass is, that this pass introduces such cast-demotion-folding opportunities due its "unsigned"-type expansion. So we want to fold that within pass and not waiting until a later pass optimizes such redundant sequences away.

I hope we can at least find a way to share code between the passes.

If I understand, the main reason is because you want to go through the
statements in reverse order, since this is the way the casts are being
propagated (would forwprop also work, just more slowly, or would it miss
opportunities across basic blocks?).
It would miss some opportunities,

Could you explain in what case? I guess my trouble understanding this is the same as in the next question, and I am missing a fundamental point...

and causes penalty on speed due concept of forwprop.

I have some trouble understanding why something as complicated as
build_and_add_sum (which isn't restricted to additions by the way) is
needed. Could you add a comment to the code explaining why you need to
insert the statements precisely there and not for instance just before
their use? Is that to try and help CSE?
Yes, this function is a bit more complex as actual required for now in general. Nevertheless required. And I expect that demotion-pass will improve and so even the right now "unlikely" cases might be required more frequent. I had in front lighter version of statement addition, but it failed in some cases.In some (rare) cases we would otherwise choose wrong basic-block to add the new "cast"-statements. Well, there is another place (reassoc) where you can find nearly the same function, and its needs are exactly the same as within demote-pass.

I had a look at the reassoc pass and failed to understand the logic there as well :-( I don't really understand how the basic block can be wrong. If we only handle the single use case with no side effects and no weird jump in between, nothing but the user should notice if you move the definition next to the use. So is this a way to handle weirder jumps than you could otherwise?

I have added an additional early pass "typedemote1" to this patch for
simple cases types can be easily sunken into statement without special
unsigned-cast for overflow-case.   Jakub asked for it. Tests have shown
that this pass does optimizations in pretty few cases.  As example in
testsuite see for example pr46867.c testcase.
The second pass (I put it behind first vrp pass to avoid
testcase-conflicts) uses 'unsigned'-type casts to avoid undefined overflow
behavior. This pass has much more hits in standard code,

My little understanding from the old conversation and Jeff's post in this thread is that there are 2 ways to think about this optimization:

1) the canonical representation is with the smallest type possible. Using forwprop could make sense there (even if it may need help from another pass for some complicated cases, and it isn't the most efficient way). There is possibly a late pass to change all the types too small for the hardware into int.

2) we compute for each operation what minimal size it needs. Then, we may do some optimizations (truncate constants), possibly convert the operations to some smaller type (many at once), but not necessarily. This requires a separate pass but doesn't look like what you are doing.



On Tue, 9 Jul 2013, Jeff Law wrote:

On 07/08/2013 02:52 PM, Marc Glisse wrote:

I wonder why you implemented this as a separate pass instead of adding
it to tree-ssa-forwprop. demote_cast is (or should be) a special case of
combine_conversions, so it would be nice to avoid the code duplication
(there is also a version in fold-const.c). demote_into could be called
from roughly the same place as simplify_conversion_from_bitmask. And you
could reuse get_prop_source_stmt, can_propagate_from,
remove_prop_source_from_use, etc.
That's a real good question; I find myself looking a lot at the bits in forwprop and I'm getting worried it's on its way to being an unmaintainable mess. Sadly, I'm making things worse rather than better with my recent changes. I'm still hoping more structure will become evident as I continue to work through various improvements.

It looks to me like a gimple version of fold, except that since it is gimple, basic operations are an order of magnitude more complicated. But I don't really see why that would make it an unmaintainable mess, giant switches are not that bad.

If I understand, the main reason is because you want to go through the
statements in reverse order, since this is the way the casts are being
propagated (would forwprop also work, just more slowly, or would it miss
opportunities across basic blocks?).
SSA_NAME_DEF_STMT can cross block boundaries.

I know, which is why it isn't obvious to me what opportunities are missed.

I assume that, when the pass is done, we can consider removing the
corresponding code from the front-ends? That would increase the hits ;-)
Kai and I have briefly touched on this, mostly in the context of removing bits from fold-const.c rather than the front-ends proper.

The code in the front-ends is a pain. Lately, it made implementing some parts of the undefined behavior sanitizer much more complicated than it should have been. I think fold-const hurts less than optimization code directly in the front-ends.

Thanks Kai and Jeff, your posts helped a lot,

--
Marc Glisse


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]