This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions


Hi Joseph,

On Thu, May 26, 2016 at 09:02:02PM +0000, Joseph Myers wrote:
> On Thu, 26 May 2016, Jan Hubicka wrote:
> 
> > > > +ffp-int-builtin-inexact
> > > > +Common Report Var(flag_fp_int_builtin_inexact) Optimization
> > > > +Allow built-in functions ceil, floor, round, trunc to raise \"inexact\" exceptions.
> > 
> > When adding new codegen option which affects the correctness, it is also
> > necessary to update can_inline_edge_p and inline_call.
> 
> This patch version adds handling for the new option in those places.
> Other changes: the default for the option is corrected so that
> -ffp-int-builtin-inexact really is in effect by default as intended;
> md.texi documentation for the patterns in question is updated to
> describe how they are affected by this option.
> 
> 
> Add option for whether ceil etc. can raise "inexact", adjust x86 conditions.
> 
> In ISO C99/C11, the ceil, floor, round and trunc functions may or may
> not raise the "inexact" exception for noninteger arguments.  Under TS
> 18661-1:2014, the C bindings for IEEE 754-2008, these functions are
> prohibited from raising "inexact", in line with the general rule that
> "inexact" is only when the mathematical infinite precision result of a
> function differs from the result after rounding to the target type.
> 
> GCC has no option to select TS 18661 requirements for not raising
> "inexact" when expanding built-in versions of these functions inline.
> Furthermore, even given such requirements, the conditions on the x86
> insn patterns for these functions are unnecessarily restrictive.  I'd
> like to make the out-of-line glibc versions follow the TS 18661
> requirements; in the cases where this slows them down (the cases using
> x87 floating point), that makes it more important for inline versions
> to be used when the user does not care about "inexact".

Unfortunately, I have found out that this commit regresses run-time of
538.imagick_r by about 5% on an AMD Ryzen machine and by 9% on a
slightly older Intel machine when compiled with just -O2 (so with
generic tuning).

The problem is that ImageMagick spends a lot time calculating ceil and
floor and even with with generic tuning their library implementations
can use the ifunc mechanism to execute an efficient SSE 4.1
implementation on the processors that have it, whereas the inline
expansion cannot do so and is much bigger and much much slower.  To
give you an idea, this is the profile before and after the change:

  | Symbol                           |  237073 | % of runtime |  237074 | % of runtime | sample delta | % sample delta |
  |----------------------------------+---------+--------------+---------+--------------+--------------+----------------|
  | MorphologyApply                  | 1058932 |       52.88% | 1043194 |       46.65% |       -15738 |          98.51 |
  | MeanShiftImage                   |  508088 |       25.50% |  833378 |       37.43% |       325290 |         164.02 |
  | GetVirtualPixelsFromNexus        |  173354 |        8.70% |  168298 |        7.56% |        -5056 |          97.08 |
  | SetPixelCacheNexusPixels.isra.10 |  114101 |        5.72% |  112790 |        5.07% |        -1311 |          98.85 |
  | __ceil_sse41                     |   21404 |        1.07% |       0 |            0 |       -21404 |           0.00 |
  | __floor_sse41                    |   19179 |        0.96% |       0 |            0 |       -19179 |           0.00 |

And all of the sample count increases in MeanShiftImage can be tracked
down to the line in the cource calculating

  if ((x-floor(x)) < (ceil(x)-x))

I am not sure what to do about this, to me it seems that the
-ffp-int-builtin-inexact simply has a wrong default value, at least
for x86_64, as it was added in order not to slow code down but does
exactly that (all of the slowdown of course disappears when
-fno-fp-int-builtin-inexact is used).

Or is the situation somehow more complex?

Martin


> 
> This patch fixes these issues.  A new option
> -fno-fp-int-builtin-inexact is added to request TS 18661 rules for
> these functions; the default -ffp-int-builtin-inexact reflects that
> such exceptions are allowed by C99 and C11.  (The intention is that if
> C2x incorporates TS 18661-1, then the default would change in C2x
> mode.)
> 
> The x86 built-ins for rint (x87, SSE2 and SSE4.1) are made
> unconditionally available (no longer depending on
> -funsafe-math-optimizations or -fno-trapping-math); "inexact" is
> correct for noninteger arguments to rint.  For floor, ceil and trunc,
> the x87 and SSE2 built-ins are OK if -ffp-int-builtin-inexact or
> -fno-trapping-math (they may raise "inexact" for noninteger
> arguments); the SSE4.1 built-ins are made to use ROUND_NO_EXC so that
> they do not raise "inexact" and so are OK unconditionally.
> 
> Now, while there was no semantic reason for depending on
> -funsafe-math-optimizations, the insn patterns had such a dependence
> because of use of gen_truncxf<mode>2_i387_noop to truncate back to
> SFmode or DFmode after using frndint in XFmode.  In this case a no-op
> truncation is safe because rounding to integer always produces an
> exactly representable value (the same reason why IEEE semantics say it
> shouldn't produce "inexact") - but of course that insn pattern isn't
> safe because it would also match cases where the truncation is not in
> fact a no-op.  To allow frndint to be used for SFmode and DFmode
> without that unsafe pattern, the relevant frndint patterns are
> extended to SFmode and DFmode or new SFmode and DFmode patterns added,
> so that the frndint operation can be represented in RTL as an
> operation acting directly on SFmode or DFmode without the extension
> and the problematic truncation.
> 
> A generic test of the new option is added, as well as x86-specific
> tests, both execution tests including the generic test with different
> x86 options and scan-assembler tests verifying that functions that
> should be inlined with different options are indeed inlined.
> 
> I think other architectures are OK for TS 18661-1 semantics already.
> Considering those defining "ceil" patterns: aarch64, arm, rs6000, s390
> use instructions that do not raise "inexact"; nvptx does not support
> floating-point exceptions.  (This does mean the -f option in fact only
> affects one architecture, but I think it should still be a -f option;
> it's logically architecture-independent and is expected to be affected
> by future -std options, so is similar to e.g. -fexcess-precision=,
> which also does nothing on most architectures but is implied by -std
> options.)
> 
> Bootstrapped with no regressions on x86_64-pc-linux-gnu.  OK to
> commit?
> 
> gcc:
> 2016-05-26  Joseph Myers  <joseph@codesourcery.com>
> 
> 	PR target/71276
> 	PR target/71277
> 	* common.opt (ffp-int-builtin-inexact): New option.
> 	* doc/invoke.texi (-fno-fp-int-builtin-inexact): Document.
> 	* doc/md.texi (floor@var{m}2, btrunc@var{m}2, round@var{m}2)
> 	(ceil@var{m}2): Document dependence on this option.
> 	* ipa-inline-transform.c (inline_call): Handle
> 	flag_fp_int_builtin_inexact.
> 	* ipa-inline.c (can_inline_edge_p): Likewise.
> 	* config/i386/i386.md (rintxf2): Do not test
> 	flag_unsafe_math_optimizations.
> 	(rint<mode>2_frndint): New define_insn.
> 	(rint<mode>2): Do not test flag_unsafe_math_optimizations for 387
> 	or !flag_trapping_math for SSE.  Just use gen_rint<mode>2_frndint
> 	for 387 instead of extending and truncating.
> 	(frndintxf2_<rounding>): Test flag_fp_int_builtin_inexact ||
> 	!flag_trapping_math instead of flag_unsafe_math_optimizations.
> 	Change to frndint<mode>2_<rounding>.
> 	(frndintxf2_<rounding>_i387): Likewise.  Change to
> 	frndint<mode>2_<rounding>_i387.
> 	(<rounding_insn>xf2): Likewise.
> 	(<rounding_insn><mode>2): Test flag_fp_int_builtin_inexact ||
> 	!flag_trapping_math instead of flag_unsafe_math_optimizations for
> 	x87.  Test TARGET_ROUND || !flag_trapping_math ||
> 	flag_fp_int_builtin_inexact instead of !flag_trapping_math for
> 	SSE.  Use ROUND_NO_EXC in constant operand of
> 	gen_sse4_1_round<mode>2.  Just use gen_frndint<mode>2_<rounding>
> 	for 387 instead of extending and truncating.
> 
> gcc/testsuite:
> 2016-05-26  Joseph Myers  <joseph@codesourcery.com>
> 
> 	PR target/71276
> 	PR target/71277
> 	* gcc.dg/torture/builtin-fp-int-inexact.c,
> 	gcc.target/i386/387-builtin-fp-int-inexact.c,
> 	gcc.target/i386/387-rint-inline-1.c,
> 	gcc.target/i386/387-rint-inline-2.c,
> 	gcc.target/i386/sse2-builtin-fp-int-inexact.c,
> 	gcc.target/i386/sse2-rint-inline-1.c,
> 	gcc.target/i386/sse2-rint-inline-2.c,
> 	gcc.target/i386/sse4_1-builtin-fp-int-inexact.c,
> 	gcc.target/i386/sse4_1-rint-inline.c: New tests.
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]