This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][RFC] Initial patch for better performance of 64-bit math instructions in 32-bit mode on x86-64


2016-05-31 19:15 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
> On Tue, May 31, 2016 at 5:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi Uros,
>>
>> Here is an initial patch to improve performance of 64-bit integer
>> arithmetic in 32-bit mode. We discovered that gcc is significantly
>> behind icc and clang on the rsa benchmark from the eembc2.0 suite.
>> The problem function looks like:
>> typedef unsigned long long ull;
>> typedef unsigned long ul;
>> ul mul_add(ul *rp, ul *ap, int num, ul w)
>>  {
>>  ul c1=0;
>>  ull t;
>>  for (;;)
>>   {
>>   { t=(ull)w * ap[0] + rp[0] + c1;
>>    rp[0]= ((ul)t)&0xffffffffL; c1= ((ul)((t)>>32))&(0xffffffffL); };
>>   if (--num == 0) break;
>>   { t=(ull)w * ap[1] + rp[1] + c1;
>>    rp[1]= ((ul)(t))&(0xffffffffL); c1= (((ul)((t)>>32))&(0xffffffffL)); };
>>   if (--num == 0) break;
>>   { t=(ull)w * ap[2] + rp[2] + c1;
>>    rp[2]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
>>   if (--num == 0) break;
>>   { t=(ull)w * ap[3] + rp[3] + c1;
>>    rp[3]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
>>   if (--num == 0) break;
>>   ap+=4;
>>   rp+=4;
>>   }
>>  return(c1);
>>  }
>>
>> If we apply the patch below, we get a 6% speed-up for rsa on Silvermont.
>>
>> The patch looks like this (it is not complete, since there are other
>> 64-bit instructions, e.g. subtraction):
>>
>> Index: i386.md
>> ===================================================================
>> --- i386.md     (revision 236181)
>> +++ i386.md     (working copy)
>> @@ -5439,7 +5439,7 @@
>>     (clobber (reg:CC FLAGS_REG))]
>>    "ix86_binary_operator_ok (PLUS, <DWI>mode, operands)"
>>    "#"
>> -  "reload_completed"
>> +  "1"
>>    [(parallel [(set (reg:CCC FLAGS_REG)
>>                    (compare:CCC
>>                      (plus:DWIH (match_dup 1) (match_dup 2))
>>
>> What is your opinion?
>
> This splitter doesn't depend on hard registers, so there is no
> technical obstacle for the proposed patch. OTOH, this is a very old
> splitter; it is possible that it was introduced to handle some
> reload deficiencies. Maybe Jeff knows something about this approach.
> We have LRA now, so perhaps we have to rethink the purpose of these
> DImode splitters.

The change doesn't break the splitter for the hard register case, so the
splitter should still be able to handle any reload deficiencies.  I think
we should try to split all instructions working on multiword registers
(not only the PLUS case) in earlier passes, to allow more optimizations on
the split code and to relax register allocation (currently we need to
allocate consecutive registers).  Perhaps we could add a separate split
pass right after STV?  This should help with PR70321.
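
To make the effect of the split concrete, here is a minimal source-level
sketch in C of what a doubleword PLUS becomes once it is split: an add on
the low words followed by an add-with-carry on the high words.  This is
only an analogy for the RTL, not the splitter itself; the names
u64_halves, split_u64, join_halves and add64_split are hypothetical and do
not exist in GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit value kept as two independent 32-bit
   halves, roughly how a split doubleword value looks on a 32-bit target. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_halves;

/* Break a 64-bit value into its low and high 32-bit words. */
static u64_halves split_u64(uint64_t v)
{
    u64_halves r = { (uint32_t)v, (uint32_t)(v >> 32) };
    return r;
}

/* Reassemble the two words into a 64-bit value. */
static uint64_t join_halves(u64_halves v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}

/* 64-bit addition written as two word-sized steps: an add on the low
   words, then an add on the high words plus the carry out of the low add. */
static u64_halves add64_split(u64_halves a, u64_halves b)
{
    u64_halves r;
    r.lo = a.lo + b.lo;              /* low-word add, wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo;    /* 1 if the low add wrapped around */
    r.hi = a.hi + b.hi + carry;      /* high-word add with carry        */
    return r;
}

/* Small self-check: the split form agrees with native 64-bit addition. */
static void add64_split_selfcheck(void)
{
    uint64_t a = 0x00000001ffffffffULL, b = 1;
    assert(join_halves(add64_split(split_u64(a), split_u64(b))) == a + b);
}
```

In this form the low and high halves are separate values, so nothing in
the code itself forces them into a consecutive register pair, which is the
relaxation of register allocation I mean above.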

Thanks,
Ilya

>
> A pragmatic approach would be - if the patch shows measurable benefit,
> and doesn't introduce regressions, then Stage 1 is the time to try it.
>
> BTW: Use "&& 1" in the split condition of the combined insn_and_split
> pattern to copy the enable condition from the insn part. If there is
> no condition, you should just use "".
>
> Uros.

