[RFH] split {generic,gimple}-match.c files

Tue Sep 4 15:07:00 GMT 2018

On 09/03/2018 04:43 PM, Richard Biener wrote:
> On Mon, 3 Sep 2018, Martin LiÅ¡ka wrote:
> 
>> On 09/03/2018 04:00 PM, Richard Biener wrote:
>>> On Mon, 3 Sep 2018, Martin LiÅ¡ka wrote:
>>>
>>>> On 09/03/2018 02:54 PM, Martin LiÅ¡ka wrote:
>>>>> On 09/03/2018 02:41 PM, Richard Biener wrote:
>>>>>> On Mon, 3 Sep 2018, Martin LiÅ¡ka wrote:
>>>>>>
>>>>>>> On 04/25/2018 01:42 PM, Richard Biener wrote:
>>>>>>>>
>>>>>>>> The following patch^Whack splits $subject files into three, one
>>>>>>>> for the predicates (due to an implementation detail) and two for
>>>>>>>> the rest - for now into similar LOC size files.
>>>>>>>>
>>>>>>>> I'd like to get help on the makefile changes to make them less
>>>>>>>> verbose, somehow globbing the -[12p] parts.
>>>>>>>>
>>>>>>>> Also you can see the split point is manually chosen which means
>>>>>>>> it will bitrot.  Timings for the stage2 compiles on a x86_64
>>>>>>>> box are
>>>>>>>>
>>>>>>>> gimple-match-p.c   5s
>>>>>>>> generic-match-p.c  3s
>>>>>>>> gimple-match-1.c  85s
>>>>>>>> generic-match-1.c 56s
>>>>>>>> gimple-match-2.c  82s
>>>>>>>> generic-match-2.c 31s
>>>>>>>>
>>>>>>>> the required header files are quite big (and of course everything
>>>>>>>> needs to be exported without the analysis work becoming too cumbersome),
>>>>>>>> it's 3342 LOC for gimple-match-head.h and 1556 LOC for 
>>>>>>>> generic-match-head.h
>>>>>>>>
>>>>>>>> The machine I tested is quite fast so the 80ish second timings are still
>>>>>>>> too slow I guess and thus splitting up into four files for gimple and
>>>>>>>> three files for generic looks better.
>>>>>>>>
>>>>>>>> Note we lose some inlining/cloning capability in the splitting process
>>>>>>>> (I see quite a bit of constprop/isra work being done on the generated 
>>>>>>>> files).  I didn't try to measure the runtime impact though.
>>>>>>>>
>>>>>>>> The patch still needs quite some TLC, it really is a bit hacky but I'd
>>>>>>>> like to get feedback on the approach and I didn't want to spend time
>>>>>>>> on programatically finding optimal split points (so everything is output
>>>>>>>> in the same semi-random order as before).
>>>>>>>>
>>>>>>>> Richard.
> ...
>>>>>>> I took a look at gimple-match.c and what about doing a split in following way:
>>>>>>> - all gimple_simplify_$number going into a separate header file (~12000 LOC)
>>>>>>> - all the function can be marked as static inline
>>>>>>> - all other gimple_simplify_$code can be split into arbitrary number of parts
>>>>>>> - we have 287 such functions where each function only calls gimple_simplify_$number and
>>>>>>>   on average there 10 of such calls
>>>>>>> - that would allow to remove most of gimple_simplify_$number functions from the header file
>>>>>>>
>>>>>>> Richi do you think it will be viable?
>>>>>>
>>>>>> That relies on the cgraph code DCEing all unused gimple_simplify_$number
>>>>>> functions from the header fast as they are now effectively duplicated
>>>>>> into all parts, correct?  Also I'm not sure if we actually want to inline
>>>>>> them...  they are split out to get both code size and compile-time
>>>>>> under control.  Unfortunately we have still high max-inline-insns-single
>>>>>> which is used for inline marked functions.
>>>>>>
>>>>>> Eventually doing a "proper" partitioning algorithm is viable, that is,
>>>>>> partition based on gimple_simplify_$code and put gimple_simplify_$number
>>>>>> where they are used.  If they are used across different codes then
>>>>>> merge those partitions.  I guess you'll see that that'll merge the 
>>>>>> biggest _$code parititions :/ (MINUS_EXPR, PLUS_EXPR).
>>>>>
>>>>> Yes, that should be much better. I'm attaching a 'callgraph' that was done by grepping.
>>>>> Function starting at the beginning of a line is function definition, with an indentation
>>>>> one can see calls.
>>>>>
>>>>> Yes, PLUS and MINUS call ~20 gimple_simplify_$number calls.
>>>>>
>>>>> Well, generating some simple call graph format for the source file and a source file
>>>>> annotation of nodes can be input for a partitioning tool that can do the split.
>>>>>
>>>>> Issue with the generated files is that one needs to fix the most offenders (*-match.c, insn-recog.c, insn-emit.c, ..)
>>>>> in order to see some improvement.
>>>>>
>>>>> Looking at insn-recog.c, maybe similar callgraph-based split can be done for recog_$number functions?
>>>>>
>>>>> Martin
>>>>>
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>
>>>>
>>>> I'm sending SCC components for gimple-match.c. So there are 3 quite big one and rest is small. It's questionable
>>>> whether partitioning based on that will provide desired speed up?
>>>
>>> When I experimented split based on # of functions wasn't working well,
>>> only split based on # of lines did.  I'd still expect that eventually
>>> basing the split on the SCC components makes sense if you use say,
>>> the biggest 4 (but measure size in # lines) and merge the rest evenly.
>>
>> I see! Note that shrinking gimple-match.o 4 times will be probably sufficient for general
>> speed up:
>> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
>>
>>>
>>> It would be nice if that all would be scriptable instead of coding it
>>> into genmatch.c but that's of course possible as well - just add
>>> some extra "passes" over code-gen as I did in the hac^Wpatch.  You
>>
>> That would be my plan, genmatch can mark in C comments function that can be partitioned
>> and callgraph of these functions.
>>
>>> could use graphds.c routines to compute SCCs for example.  Knowing
>>> # lines beforehand is a bit hard though - code-generating into
>>> a set of character buffers might be possible but I wired everything
>>> to use FILE ... (and no stringstreams in the C library).
>>> And no, please do not convert to C++ streams ;))
>>
>> ... and a C++ splitter can do the rest: read content, do SCC, split to N parts
>> and stream out.
>>
>> I can work on that. Questionable is still Makefile integration of such 
>> parallelism?
> 
> Well, just hard-code the number of pieces and thus the pices to
> compile in the end...
> 
> Of course we shouldn't add any additional build dependencies
> (a "C++ splitter").  Adding additional annotation to the genmatch
> generated sources may be OK but then eventually doing everything in
> genmatch isn't too complicated (putting aside that # of lines metric...).
> 
> Richard.
> 

Hi.

There's working solution for {gimple,generic}-match.c. It introduces splitter.cc which
parses annotated generated files and split it according to call graph provided by annotations.
Currently I copied a SCC implementation as graphds.c has quite some dependencies (xrealloc, bitmap,..).
Questionable how to reduce the dependencies?

For gimple-match.c the biggest SCC component contains about 1/4 LOC, which means maximal reasonable number
of parts is equal to 4. Doing that the biggest component is compiled with -O2 14.5s, which the original gimple-match.c
50.0s. That sounds promising:

 l gimple-match-part*.c
-rw-r--r-- 1 marxin users 791442 Sep  4 17:02 gimple-match-part-0.c
-rw-r--r-- 1 marxin users 748610 Sep  4 17:02 gimple-match-part-1.c
-rw-r--r-- 1 marxin users 783749 Sep  4 17:02 gimple-match-part-2.c
-rw-r--r-- 1 marxin users 828990 Sep  4 17:02 gimple-match-part-3.c
-rw-r--r-- 1 marxin users 103130 Sep  4 17:02 gimple-match-part-footer.c

for generic-match.c:

l generic-match-part*.c
-rw-r--r-- 1 marxin users 528712 Sep  4 17:02 generic-match-part-0.c
-rw-r--r-- 1 marxin users 517120 Sep  4 17:02 generic-match-part-1.c
-rw-r--r-- 1 marxin users 519777 Sep  4 17:02 generic-match-part-2.c
-rw-r--r-- 1 marxin users 221971 Sep  4 17:02 generic-match-part-3.c
-rw-r--r-- 1 marxin users  14596 Sep  4 17:02 generic-match-part-footer.c

There are some issues with functions that live in gimple-match-head.c, these
should live in a header file the parts can include. I also renamed
gimple_simplify function that live in gimple-match.c to gimple_simplify_generated
as they clash with these that are in gimple-match-head.c.

There are still various questions:
- how to make the split process optional, how to modify then Makefile?
- SCC implementation should be shared with graphds.c
- splitter.cc should be cleaned up
- in order to achieve real speed up we need to split also other generated (and also dwarf2out.c, i386.c, ..) files:
here I'm most concerned about insn-recog.c, which can't be split the same way without ending up with a single huge SCC component.
Ideas what to do with that file?

Thoughts?

Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Come-up-with-splitter-for-gimple-and-generic-match-f.patch
Type: text/x-patch
Size: 37256 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20180904/8af5a6f4/attachment.bin>