This is the mail archive of the mailing list for the GCC project.


Re: determining reassociation width

On Mon, May 2, 2016 at 8:49 PM, Aaron Sawdey wrote:
> So, my first cut at the function to select reassociation width for
> power was modeled after what I saw i386 and aarch64 doing, which is to
> return something based on the number of that kind of op we can do at
> the same time:
> static int
> rs6000_reassociation_width (unsigned int opc, enum machine_mode mode)
> {
>     switch (rs6000_cpu) {
>     case PROCESSOR_POWER8:
>     case PROCESSOR_POWER9:
>         if (VECTOR_MODE_P (mode))
>             return 2;
>         if (INTEGRAL_MODE_P (mode)) {
>             if (opc == MULT_EXPR)
>                 return 2;
>             return 6; /* correct for all integral modes? */
>         }
>         if (FLOAT_MODE_P (mode))
>             return 2;
>         /* decimal float gets default 1 */
>         break;
>     default:
>         break;
>     }
>     return 1;
> }
> However, the reality of the situation is a bit more complicated I
> think.
> * If we want maximum parallelism, we should really base this on the
> number of units times the latency. I.e. for float on p8 we have 2 units
> and 6 cycles latency so we would want to issue up to 12 fadd or fmul in
> parallel, then the result from the first one would be ready for the
> next series of dependent ops.
> * Of course this may cause massive register spills and so we can't
> really make things that wide. So, reassociation ought to be aware of
> how much register pressure it is creating and how much has been created
> by things that want to be live across this bb.
> * Ideally we would also be aware of whether we are reassociating a tree
> of fp additions whose terms are fp multiplies because now we have
> fused multiply-adds to consider. See PR 70912 for more on this.
> Suggestions?
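
The units-times-latency arithmetic from the first bullet above, and the register-pressure cap from the second, can be sketched roughly as follows. This is only an illustration under stated assumptions: the helpers `ideal_width` and `clamped_width` and the `free_regs` budget are hypothetical names, not GCC interfaces, and the model assumes each chain in flight keeps exactly one value live.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: the number of independent
   operations needed to keep UNITS pipelined execution units busy
   when each operation has LATENCY cycles of latency.  */
static int
ideal_width (int units, int latency)
{
  assert (units > 0 && latency > 0);
  return units * latency;
}

/* Hypothetical helper: cap the ideal width at a register budget.
   Assumes (for illustration only) that each parallel chain keeps
   one value live, so FREE_REGS bounds the usable width.  Never
   returns less than 1, the default width.  */
static int
clamped_width (int units, int latency, int free_regs)
{
  int width = ideal_width (units, latency);

  if (free_regs < 1)
    return 1;
  return width < free_regs ? width : free_regs;
}
```

With the p8 float figures quoted above (2 units, 6 cycles), `ideal_width (2, 6)` gives 12, while `clamped_width (2, 6, 8)` falls back to 8 when only eight registers are assumed to be free.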

reassoc-width cannot be more than a tunable discovered by some serious
benchmarking.  This is due to the limitations of the reassoc implementation
(which you already noticed).

Really filling the pipeline via reassoc opportunities would require performing
the association during instruction scheduling, for example (one could imagine
fully decomposing the chains and re-materializing the chain building at
issue time).  Of course scheduling then still needs to be aware of register


> Thanks,
>    Aaron
> --
> Aaron Sawdey, Ph.D.
> 050-2/C113  (507) 253-7520 home: 507/263-0782
> IBM Linux Technology Center - PPC Toolchain
