[PATCH][RFC] Initial patch for better performance of 64-bit math instructions in 32-bit mode on x86-64
Yuri Rumyantsev
ysrumyan@gmail.com
Tue May 31 15:59:00 GMT 2016
Hi Uros,
Here is an initial patch to improve the performance of 64-bit integer
arithmetic in 32-bit mode. We discovered that gcc is significantly
behind icc and clang on the rsa benchmark from the eembc2.0 suite.
The problem function looks like this:
typedef unsigned long long ull;
typedef unsigned long ul;
ul mul_add(ul *rp, ul *ap, int num, ul w)
{
ul c1=0;
ull t;
for (;;)
{
{ t=(ull)w * ap[0] + rp[0] + c1;
rp[0]= ((ul)t)&0xffffffffL; c1= ((ul)((t)>>32))&(0xffffffffL); };
if (--num == 0) break;
{ t=(ull)w * ap[1] + rp[1] + c1;
rp[1]= ((ul)(t))&(0xffffffffL); c1= (((ul)((t)>>32))&(0xffffffffL)); };
if (--num == 0) break;
{ t=(ull)w * ap[2] + rp[2] + c1;
rp[2]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
if (--num == 0) break;
{ t=(ull)w * ap[3] + rp[3] + c1;
rp[3]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
if (--num == 0) break;
ap+=4;
rp+=4;
}
return(c1);
}
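For reference, each unrolled step of the loop above is a 32x32->64 multiply
followed by two 32-bit additions, with the high half carried into the next
step. A minimal standalone sketch of one such step follows. The name
mul_add_step and the use of fixed-width <stdint.h> types are assumptions made
for illustration (the benchmark itself uses unsigned long, which is 32 bits
wide in 32-bit mode):

```c
#include <stdint.h>

/* Hypothetical helper, not part of the benchmark: one step of the loop.
   Returns the low 32 bits of w * a + r + carry_in and stores the high
   32 bits in *carry_out.  The sum cannot overflow 64 bits, since
   (2^32 - 1)^2 + 2 * (2^32 - 1) == 2^64 - 1. */
static uint32_t mul_add_step(uint32_t w, uint32_t a, uint32_t r,
                             uint32_t carry_in, uint32_t *carry_out)
{
    uint64_t t = (uint64_t)w * a + r + carry_in;  /* 64-bit intermediate */
    *carry_out = (uint32_t)(t >> 32);             /* high word: next carry */
    return (uint32_t)t;                           /* low word: stored to rp[i] */
}
```

The two additions into t are the 64-bit adds that have to be done on 32-bit
register pairs in 32-bit mode.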
If we apply the patch below, we get a +6% speed-up for rsa on Silvermont.
The patch looks like this (it is not complete, since there are other 64-bit
instructions, e.g. subtraction):
Index: i386.md
===================================================================
--- i386.md (revision 236181)
+++ i386.md (working copy)
@@ -5439,7 +5439,7 @@
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (PLUS, <DWI>mode, operands)"
"#"
- "reload_completed"
+ "1"
[(parallel [(set (reg:CCC FLAGS_REG)
(compare:CCC
(plus:DWIH (match_dup 1) (match_dup 2))
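
To illustrate what the split form of this pattern expresses: a double-word
add becomes a low-word add that produces a carry, followed by a high-word
add that consumes it (add/adc on x86). A rough C sketch of that
decomposition follows; the type u64_pair and the function add64_split are
hypothetical names used only for this example and are not GCC code:

```c
#include <stdint.h>

/* Hypothetical representation of a 64-bit value as two 32-bit words. */
typedef struct { uint32_t lo, hi; } u64_pair;

/* Hypothetical sketch: double-word addition done word by word. */
static u64_pair add64_split(u64_pair a, u64_pair b)
{
    u64_pair s;
    s.lo = a.lo + b.lo;            /* low-word add, wraps modulo 2^32 */
    uint32_t carry = s.lo < a.lo;  /* carry out of the low word (CF) */
    s.hi = a.hi + b.hi + carry;    /* high-word add with carry (adc) */
    return s;
}
```

With the split condition changed from "reload_completed" to "1", this
two-instruction form can be exposed before register allocation rather than
only after it.
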
What is your opinion?