This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[PING**6] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
- From: Bernd Edlinger <bernd dot edlinger at hotmail dot de>
- To: Ramana Radhakrishnan <ramana dot gcc at googlemail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>, Richard Earnshaw <richard dot earnshaw at arm dot com>, Wilco Dijkstra <wilco dot dijkstra at arm dot com>
- Date: Wed, 5 Jul 2017 18:14:27 +0000
- Subject: [PING**6] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
Ping...
The latest version of this patch was here:
https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01567.html
Thanks
Bernd.
On 06/14/17 14:34, Bernd Edlinger wrote:
> Ping...
>
> On 06/01/17 18:01, Bernd Edlinger wrote:
>> Ping...
>>
>> On 05/12/17 18:49, Bernd Edlinger wrote:
>>> Ping...
>>>
>>> On 04/29/17 19:45, Bernd Edlinger wrote:
>>>> Ping...
>>>>
>>>> I attached a rebased version since there was a merge conflict in
>>>> the xordi3 pattern; otherwise the patch is still identical.
>>>> It splits adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2
>>>> early when the target has no neon or iwmmxt.
>>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>>>
>>>>
>>>>
>>>> On 11/28/16 20:42, Bernd Edlinger wrote:
>>>>> On 11/25/16 12:30, Ramana Radhakrishnan wrote:
>>>>>> On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlinger
>>>>>> <bernd.edlinger@hotmail.de> wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> This improves the stack usage on the sha512 test case for targets
>>>>>>> without a hardware FPU and without iwmmxt by splitting all DImode
>>>>>>> patterns right at expand time, which is similar to what the shift
>>>>>>> patterns do. It does nothing for iwmmxt, fpu=neon or vfp, or
>>>>>>> thumb1.
>>>>>>>
>>>>>>
>>>>>> I would go further and do this in the absence of Neon; the VFP unit
>>>>>> being there doesn't help with DImode operations, i.e. we do not have
>>>>>> 64-bit integer arithmetic instructions without Neon. The main reason
>>>>>> why we have the DImode patterns split so late is to give folks who
>>>>>> want to do 64-bit arithmetic in Neon a chance to make this work, as
>>>>>> well as to support some of the 64-bit Neon intrinsics which IIRC map
>>>>>> down to these instructions. Doing this just for soft-float doesn't
>>>>>> improve the default case. I don't usually test iwmmxt and I'm not
>>>>>> sure who has the ability to do so, so keeping this restriction for
>>>>>> iwmmxt is fine.
>>>>>>
>>>>>>
>>>>>
>>>>> Yes I understand, thanks for pointing that out.
>>>>>
>>>>> I was not aware that iwmmxt exists at all, but I noticed that most
>>>>> 64-bit expansions work completely differently there and would break
>>>>> if we split the pattern early.
>>>>>
>>>>> I can however only look at the assembler output for iwmmxt, and make
>>>>> sure that the stack usage does not get worse.
>>>>>
>>>>> Thus the new version of the patch leaves only thumb1, neon and iwmmxt
>>>>> as they are: around 1570 (thumb1), 2300 (neon) and 2200 (iwmmxt)
>>>>> bytes of stack for the test cases, and vfp and soft-float at around
>>>>> 270 bytes of stack usage.
>>>>>
>>>>>>> It reduces the stack usage from 2300 to a near-optimal 272 bytes (!).
>>>>>>>
>>>>>>> Note this also splits many ldrd/strd instructions, and therefore I
>>>>>>> will post a follow-up patch that mitigates this effect by enabling
>>>>>>> the ldrd/strd peephole optimization after the necessary reg-testing.
>>>>>>>
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>
>>>>>> What do you mean by arm-linux-gnueabihf - when folks say that I
>>>>>> interpret it as --with-arch=armv7-a --with-float=hard
>>>>>> --with-fpu=vfpv3-d16 or (--with-fpu=neon).
>>>>>>
>>>>>> If you've really bootstrapped and regtested it on armhf, doesn't this
>>>>>> patch as it stands have no effect there, i.e. no change?
>>>>>> arm-linux-gnueabihf usually means to me someone has configured with
>>>>>> --with-float=hard, so there are no regressions in the hard float ABI
>>>>>> case,
>>>>>>
>>>>>
>>>>> I know it proves little. When I say arm-linux-gnueabihf
>>>>> I do in fact mean --enable-languages=all,ada,go,obj-c++
>>>>> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
>>>>> --with-float=hard.
>>>>>
>>>>> My main interest in the stack usage is of course not Linux but eCos,
>>>>> where we have very small task stacks and in fact no FPU support by
>>>>> the O/S at all, so this patch is exactly what we need.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>> Is it OK for trunk?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.
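The patch discussed in this thread splits the DImode patterns adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2 into SImode operations at expand time when neither Neon nor iwmmxt is available. The C sketch below shows what such a split computes. It is a hypothetical illustration, not GCC code: the type di_pair and all function names are made up. A 64-bit value is held as two 32-bit halves. Addition and subtraction must carry or borrow from the low word into the high word, as the ARM instruction pairs adds/adc and subs/sbc do. The bitwise operations act on each half independently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: a 64-bit (DImode) value held as two 32-bit
   (SImode) halves, as a 32-bit ARM core without Neon or iwmmxt keeps it
   in a pair of core registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} di_pair;

static di_pair di_from_u64(uint64_t v)
{
    di_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t di_to_u64(di_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* adddi3 split: add the low words and note the carry out (like "adds"),
   then add the high words plus that carry (like "adc"). */
static di_pair adddi3_split(di_pair a, di_pair b)
{
    di_pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;   /* unsigned wrap-around means carry */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* subdi3 split: subtract the low words, then propagate the borrow into
   the high-word subtraction (like "subs" followed by "sbc"). */
static di_pair subdi3_split(di_pair a, di_pair b)
{
    di_pair r;
    r.lo = a.lo - b.lo;
    uint32_t borrow = a.lo < b.lo;  /* low word needed a borrow */
    r.hi = a.hi - b.hi - borrow;
    return r;
}

/* xordi3 split (anddi3, iordi3 and one_cmpldi2 work the same way): the
   halves are independent, so each becomes one SImode operation. */
static di_pair xordi3_split(di_pair a, di_pair b)
{
    di_pair r = { a.lo ^ b.lo, a.hi ^ b.hi };
    return r;
}

/* Cross-check the split forms against native 64-bit arithmetic. */
static void check_split_equivalence(uint64_t x, uint64_t y)
{
    di_pair a = di_from_u64(x), b = di_from_u64(y);
    assert(di_to_u64(adddi3_split(a, b)) == x + y);
    assert(di_to_u64(subdi3_split(a, b)) == x - y);
    assert(di_to_u64(xordi3_split(a, b)) == (x ^ y));
}
```

For example, adding 1 to 0xFFFFFFFF overflows the low word to 0 and carries 1 into the high word, giving 0x100000000. Once the operation is expressed this way, the register allocator sees independent 32-bit values instead of one 64-bit register pair.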