This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch 14/17] [libgcc, ARM] Generalise float-to-half conversion function.


Hi James,

On 11/11/16 15:42, James Greenhalgh wrote:
Hi,

I'm adapting this patch from work started by Matthew Wahab.

Conversions from double precision floats to the ARM __fp16 are required
to round only once. A conversion function for double to __fp16 to
support this on soft-fp targets. This and the following patch add this
conversion function by reusing the exising float to __fp16 function
config/arm/fp16.c:__gnu_f2h_internal.

This patch generalizes __gnu_f2h_internal by adding a specification of
the source format and reworking the code to make use of it. Initially,
only the binary32 format is supported.

A previous version of this patch had a bug handling rounding, the update
in this patch should be sufficient to fix the bug,

replacing:

   else
     mask = 0x00001fff;
With:

      mask = (point - 1) >> 10;

I've tested that fix throwing semi-random bit-patterns at the conversion
function to confirm that the software implementation now matches the
hardware behaviour for this routine.

Additionally, bootstrapped again, and cross-tested with no issues.

OK?

Thanks,
James

----

libgcc/

2016-11-09  James Greenhalgh  <james.greenhalgh@arm.com>
	    Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/fp16.c (struct format): New.
	(binary32): New.
	(__gnu_float2h_internal): New.  Body moved from
	__gnu_f2h_internal and generalize.
	(_gnu_f2h_internal): Move body to function __gnu_float2h_internal.
	Call it with binary32.


diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c
index 39c863c..ba89796 100644
--- a/libgcc/config/arm/fp16.c
+++ b/libgcc/config/arm/fp16.c
@@ -22,40 +22,74 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
+struct format
+{
+  /* Number of bits.  */
+  unsigned long long size;
+  /* Exponent bias.  */
+  unsigned long long bias;
+  /* Exponent width in bits.  */
+  unsigned long long exponent;
+  /* Significand precision in explicitly stored bits.  */
+  unsigned long long significand;
+};
+
+static const struct format
+binary32 =
+{
+  32,   /* size.  */
+  127,  /* bias.  */
+  8,    /* exponent.  */
+  23    /* significand.  */
+};
+
 static inline unsigned short
-__gnu_f2h_internal(unsigned int a, int ieee)
+__gnu_float2h_internal (const struct format* fmt,
+			unsigned long long a, int ieee)
 {
-  unsigned short sign = (a >> 16) & 0x8000;
-  int aexp = (a >> 23) & 0xff;
-  unsigned int mantissa = a & 0x007fffff;
-  unsigned int mask;
-  unsigned int increment;
+  unsigned long long point = 1ULL << fmt->significand;;

Trailing ';'.

<...>
@@ -93,7 +127,13 @@ __gnu_f2h_internal(unsigned int a, int ieee)
/* We leave the leading 1 in the mantissa, and subtract one
      from the exponent bias to compensate.  */
-  return sign | (((aexp + 14) << 10) + (mantissa >> 13));
+  return sign | (((aexp + 14) << 10) + (mantissa >> (fmt->significand - 10)));
+}

I suppose I'm not very familiar with the soft-fp code but I don't see at a glance how
the comment relates to the operation it's above of (where is the 'one' being subtracted
from the bias?). If you want to improve that comment or give me a quick explanation of why
the code does what it says it does it would be appreciated.

I've gone through the generalisation and it looks correct to me.
So given that you have put this through the testing you say you did this is ok with the nits
above addressed.

Thanks,
Kyrill



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]