This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
On 04/04/16 11:13, Evandro Menezes wrote:
On 04/01/16 18:08, Wilco Dijkstra wrote:

Evandro Menezes wrote:

I hope that this gets in the ballpark of what's been discussed previously.

Yes that's very close to what I had in mind. A minor issue is that the vector modes cannot work as they start at MAX_MODE_FLOAT (which is > 32):

+/* Control approximate alternatives to certain FP operators. */
+#define AARCH64_APPROX_MODE(MODE) \
+ ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+ : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1)) \
+ : (0))

That should be:

+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \

It would be worth testing all the obvious cases to be sure they work.

Also I don't think it is a good idea to enable all modes on Exynos-M1 and XGene-1 - I haven't seen any evidence that shows it gives a speedup on real code for all modes (or at least on a good micro benchmark like the unit vector test I suggested - a simple throughput test does not count!).

This approximation does benefit M1 in general across several benchmarks. As for my choice for Xgene1, it preserves the original setting. I believe that, with this more granular option, developers can fine tune their targets.

The issue is it hides performance gains from an improved divider/sqrt on new revisions or microarchitectures. That means you should only enable cases where there is evidence of a major speedup that cannot be matched by a future improved divider/sqrt.

I did notice that in some benchmarks with heavy use of multiplication or multiply-accumulation, the series may be detrimental, since it increases the competition for the unit(s) that do(es) such operations.

But those micro-architectures that get a better unit for division or sqrt() are free to add their own tuning parameters. Granted, I assume that running legacy code is not much of an issue only in a few markets.
Ping^1

--
Evandro Menezes
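
The bit-index arithmetic discussed in the quoted review can be checked in isolation. The following is a minimal sketch, not the patch itself: the enum values are hypothetical stand-ins (GCC's real mode numbers come from generated headers and differ), and the helper names `approx_bit_index`, `original_vector_bit_index` and `approx_mask` are invented for illustration. It computes the bit position rather than shifting directly, so an out-of-range index can be inspected safely.

```c
#include <assert.h>

/* Hypothetical mode numbers, chosen only for illustration.  The real
   values are generated by GCC and are not reproduced here.  */
enum
{
  MIN_MODE_FLOAT = 40,
  MAX_MODE_FLOAT = 43,
  MIN_MODE_VECTOR_FLOAT = 70,
  MAX_MODE_VECTOR_FLOAT = 75
};

/* Bit index using the corrected offset from the review: vector modes
   start right after the (MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1) scalar
   float bits.  Returns -1 for modes outside both ranges.  */
static int
approx_bit_index (int mode)
{
  if (MIN_MODE_FLOAT <= mode && mode <= MAX_MODE_FLOAT)
    return mode - MIN_MODE_FLOAT;
  if (MIN_MODE_VECTOR_FLOAT <= mode && mode <= MAX_MODE_VECTOR_FLOAT)
    return mode - MIN_MODE_VECTOR_FLOAT
	   + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1;
  return -1;
}

/* Bit index as written in the first version of the macro: the offset
   is the absolute value MAX_MODE_FLOAT + 1, not a count of modes.  */
static int
original_vector_bit_index (int mode)
{
  return mode - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1;
}

/* Mask for a mode, or 0 if the mode has no approximation bit.  The
   assert guards against shifting past the width of a 32-bit mask.  */
static unsigned int
approx_mask (int mode)
{
  int bit = approx_bit_index (mode);
  if (bit < 0)
    return 0u;
  assert (bit < 32);
  return 1u << bit;
}
```

With these assumed values, the scalar float modes occupy bits 0 to 3 and the vector float modes bits 4 to 9, whereas the original offset would place the first vector mode at bit 44, beyond a 32-bit mask. Whether the real mode counts fit in 32 bits depends on the target's actual mode numbering, which this sketch does not model.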