[PATCH][RFC] Initial patch for better performance of 64-bit math instructions in 32-bit mode on x86-64

Yuri Rumyantsev ysrumyan@gmail.com
Tue May 31 15:59:00 GMT 2016


Hi Uros,

Here is an initial patch to improve the performance of 64-bit integer
arithmetic in 32-bit mode. We found that gcc is significantly
behind icc and clang on the rsa benchmark from the eembc2.0 suite.
The problematic function looks like this:
typedef unsigned long long ull;
typedef unsigned long ul;
ul mul_add(ul *rp, ul *ap, int num, ul w)
 {
 ul c1=0;
 ull t;
 for (;;)
  {
  { t=(ull)w * ap[0] + rp[0] + c1;
   rp[0]= ((ul)t)&0xffffffffL; c1= ((ul)((t)>>32))&(0xffffffffL); };
  if (--num == 0) break;
  { t=(ull)w * ap[1] + rp[1] + c1;
   rp[1]= ((ul)(t))&(0xffffffffL); c1= (((ul)((t)>>32))&(0xffffffffL)); };
  if (--num == 0) break;
  { t=(ull)w * ap[2] + rp[2] + c1;
   rp[2]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
  if (--num == 0) break;
  { t=(ull)w * ap[3] + rp[3] + c1;
   rp[3]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
  if (--num == 0) break;
  ap+=4;
  rp+=4;
  }
 return(c1);
 }

With the patch below applied, rsa runs 6% faster on Silvermont.

The patch looks like this (it is not complete, since there are other
64-bit instructions, e.g. subtraction, that it does not yet handle):

Index: i386.md
===================================================================
--- i386.md     (revision 236181)
+++ i386.md     (working copy)
@@ -5439,7 +5439,7 @@
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, <DWI>mode, operands)"
   "#"
-  "reload_completed"
+  "1"
   [(parallel [(set (reg:CCC FLAGS_REG)
                   (compare:CCC
                     (plus:DWIH (match_dup 1) (match_dup 2))

What is your opinion?


