This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [001/nnn] poly_int: add poly-int.h


On 11/09/2017 04:06 AM, Richard Sandiford wrote:
Martin Sebor <msebor@gmail.com> writes:
On 11/08/2017 11:28 AM, Richard Sandiford wrote:
Martin Sebor <msebor@gmail.com> writes:
On 11/08/2017 09:51 AM, Richard Sandiford wrote:
Martin Sebor <msebor@gmail.com> writes:
On 11/08/2017 02:32 AM, Richard Sandiford wrote:
Martin Sebor <msebor@gmail.com> writes:
I haven't done nearly a thorough review but the dtor followed by
the placement new in the POLY_SET_COEFF() macro caught my eye so
I thought I'd ask sooner rather than later.  Given the macro
definition:

+   The dummy comparison against a null C * is just a way of checking
+   that C gives the right type.  */
+#define POLY_SET_COEFF(C, RES, I, VALUE) \
+  ((void) (&(RES).coeffs[0] == (C *) 0), \
+   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
+   ? (void) ((RES).coeffs[I] = VALUE) \
+   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))

is the following use well-defined?

+template<unsigned int N, typename C>
+inline poly_int_pod<N, C>&
+poly_int_pod<N, C>::operator <<= (unsigned int a)
+{
+  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);

It looks to me as though the VALUE argument in the ctor invoked
by the placement new expression is evaluated after the dtor has
destroyed the very array element the VALUE argument expands to.

Good catch!  It should simply have been doing <<= on each coefficient --
I must have got carried away when converting to POLY_SET_COEFF.

I double-checked the other uses and think that's the only one.

Whether or not it is, in fact, a problem, it seems to me that using
a function template rather than a macro would be a clearer and
safer way to do the same thing.  (Safer in that the macro also
evaluates its arguments multiple times, which is often a source
of subtle bugs.)

That would slow down -O0 builds though, by introducing an extra
function call and set of temporaries even when the coefficients
are primitive integers.

Would decorating the function template with attribute always_inline
help?

It would remove the call itself, but we'd still have the extra temporary
objects that were the function argument and return value.

Sorry, I do not want to get into another long discussion about
trade-offs between safety and efficiency but I'm not sure I see
what extra temporaries it would create.  It seems to me that
an inline function template that took arguments of user-defined
types by reference and others by value should be just as efficient
as a macro.

From GCC's own manual:

   6.43 An Inline Function is As Fast As a Macro
   https://gcc.gnu.org/onlinedocs/gcc/Inline.html

You can see the difference with something like:

  inline
  void __attribute__((always_inline))
  f(int &dst, const int &src) { dst = src; }

  int g1(const int &y) { int x; f(x, y); return x; }
  int g2(const int &y) { int x; x = y; return x; }

Let me say at the outset that I struggle to comprehend that a few
instructions is even a consideration when not optimizing, especially
in light of the bug the macro caused that would have been prevented
by using a function instead.  But...

Many people still build at -O0 though.  One of the things I was asked
for was the time it takes to build stage 2 with an -O0 stage 1
(where stage 1 would usually be built by the host compiler).

Yes, of course.  I do all my development and basic testing at
-O0.  But I don't expect the performance to be comparable to
-O2.  I'd be surprised if anyone did.  What I do expect at -O0
(and what I'm grateful for) is GCC to make it easier to find
bugs in my code by enabling extra checks, even if they come at
the expense of slower execution.

That said, if your enhancement has such dramatic performance
implications at -O0 that the only way to avoid them is by using
macros then I would say it's not appropriate.

...I don't think your example above is representative of using
the POLY_SET_COEFF macro.  The function template I'm suggesting
might look something like this:

   template <unsigned N, class C>
   inline void __attribute__ ((always_inline))
   poly_set_coeff (poly_int_pod<N, C> *p, unsigned idx, C val)
   {
     ((void) (&(*p).coeffs[0] == (C *) 0),
      wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
      ? (void) ((*p).coeffs[0] = val)
      : (void) ((*p).coeffs[0].~C (), new (&(*p).coeffs[0]) C (val)));

     if (N >= 2)
       for (unsigned int i = 1; i < N; i++)
         ((void) (&(*p).coeffs[0] == (C *) 0),
          wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
          ? (void) ((*p).coeffs[i] = val)
          : (void) ((*p).coeffs[i].~C (), new (&(*p).coeffs[i]) C (val)));
   }

That ignores the idx parameter and sets all coefficients to val.  Did you
mean something like:

   template <unsigned N, typename C1, typename C2>
   inline void __attribute__ ((always_inline))
   poly_set_coeff (poly_int_pod<N, C1> *p, unsigned idx, C2 val)
   {
     wi::int_traits<C1>::precision_type == wi::FLEXIBLE_PRECISION
       ? (void) ((*p).coeffs[idx] = val)
       : (void) ((*p).coeffs[idx].~C1 (), new (&(*p).coeffs[idx]) C1 (val));
   }

?  If so...

Yes, I didn't have it quite right.  With the above there's
a difference of three x86_64 mov instructions at -O0.  The code
at -O2 is still identical.

To compare apples to apples I suggest instead comparing the shift
operator (or any other poly_int function that uses the macro) that
doesn't suffer from the bug vs one that makes use of the function
template.  I see a difference of 2 instructions on x86_64 (21 vs
23) for operator<<=.

Are two assembly instructions even worth talking about?

...the problem is that passing C by value defeats the point of the
optimisation:

  /* RES is a poly_int result that has coefficients of type C and that
     is being built up a coefficient at a time.  Set coefficient number I
     to VALUE in the most efficient way possible.

     For primitive C it is better to assign directly, since it avoids
     any further calls and so is more efficient when the compiler is
     built at -O0.  But for wide-int based C it is better to construct
     the value in-place.  This means that calls out to a wide-int.cc
     routine can take the address of RES rather than the address of
     a temporary.

With the inline function, the wide-int.cc routines will be taking
the address of the temporary "val" object, which will then be used
to initialise the target object via a copy.  The macro was there
to avoid the copy.

There are many ways to write code that is both safe and efficient
that don't involve resorting to convoluted, error-prone macros.
I don't quite see how passing a fundamental type by value to
an inline function can be a bottleneck.  If C can be either
a fundamental type or an aggregate, then the aggregate case
can be optimized by providing an overload or specialization.
But an answer that "the only way to write the code is by using
a macro" cannot possibly be acceptable (maybe 20 years ago but
not today).

But the use of this inline function in <<= would be just as incorrect as
using the macro.

[These are all sizes for normally-optimised release builds]

We seem to have very different priorities.  I make mistakes all
the time and so I need all the help I can get to find problems
in my code.  Microoptimizations that make it easier to get
things wrong make debugging harder.  In both of the cases we
have discussed -- the no-op ctor and the macro -- this has actually
happened, to both of us.  -O0 is meant for development and
should make it easy to avoid mistakes (as all the GCC checking
tries to do).  If the overhead of an always-inline function is
unacceptable when optimizing then use a macro if you must but
only then and please at least make the -O0 default safe.  And
open a bug to get the inefficiency fixed.  Otherwise we can't
really claim that An Inline Function is As Fast As a Macro.

Martin

