From sandra@codesourcery.com Tue May 1 04:12:00 2018 From: sandra@codesourcery.com (Sandra Loosemore) Date: Tue, 01 May 2018 04:12:00 -0000 Subject: "position independent" vs "position-independent" in documentation In-Reply-To: <20180430115654.GK20930@redhat.com> References: <20180430115654.GK20930@redhat.com> Message-ID: <646c8ca5-73bb-7566-abf3-a03c47a02140@codesourcery.com> On 04/30/2018 05:56 AM, Jonathan Wakely wrote: > Should we standardize on "position-independent" and add it to > https://gcc.gnu.org/codingconventions.html#Spelling ? The same generic English usage rules apply here as to other compound phrases; hyphenate when immediately before a noun, don't hyphenate in other contexts. So "the compiler generates position-independent code" and "the compiler generates code that is position independent" are both correct. However, I don't think it's common to use "position independent" (hyphenated or not) except as a modifier for "code", so you could add "position-independent code" (rather than just "position-independent") to the glossary. -Sandra From andrewm.roberts@sky.com Tue May 1 04:42:00 2018 From: andrewm.roberts@sky.com (Andrew Roberts) Date: Tue, 01 May 2018 04:42:00 -0000 Subject: gcc 8.0.1 RC documentation broken Message-ID: <4aba19fd-3e5d-8a81-e342-ec170befa672@sky.com> I filed a bug (85578) about the documentation in: gcc-8.0.1-RC-20180427/INSTALL being broken (links not working). I filed this under 'web' as I couldn't see any documentation component. It doesn't appear to have been looked at, so just wanted to flag it up before the release tomorrow. From jakub@redhat.com Tue May 1 07:27:00 2018 From: jakub@redhat.com (Jakub Jelinek) Date: Tue, 01 May 2018 07:27:00 -0000 Subject: Broken links in INSTALL/specific.html Message-ID: <20180501072701.GJ8577@tucnak> Hi! 
PR web/85578 complains about broken links in INSTALL/specific.html inside the rc tarballs. I've looked at past releases, and at least the ones I've checked (4.7.0, 6.1, 7.1, 7.3, 8.1rc2) all have the broken links, e.g. aarch64*-*-* and

aarch64*-*-*

Looking at online docs, they are ok.

I think this has been fixed for the online docs with:

Index: preprocess
===================================================================
RCS file: /cvs/gcc/wwwdocs/bin/preprocess,v
retrieving revision 1.38
retrieving revision 1.39
diff -u -p -r1.38 -r1.39
--- preprocess	28 Aug 2003 13:05:38 -0000	1.38
+++ preprocess	5 Sep 2004 21:50:02 -0000	1.39
@@ -144,7 +144,10 @@ process_file()
     cat $STYLE > $TMPDIR/input
     printf '\n' `pwd` >> $TMPDIR/input
     cat $f >> $TMPDIR/input
-    ${MHC} $TMPDIR/input > $TMPDIR/output
+    # Use sed to work around makeinfo 4.7 brokenness.
+    ${MHC} $TMPDIR/input \
+      | sed -e 's/_002d/-/g' -e 's/_002a/*/g' \
+      > $TMPDIR/output

     # Copy the page only if it's new or there has been a change, and,
     # first of all, if there was no problem when running MetaHTML.

revision 1.39
date: 2004/09/05 21:50:02;  author: gerald;  state: Exp;  lines: +4 -1
Use sed to work around makeinfo 4.7 brokenness.

Isn't this something we should be doing in gcc/doc/install.texi2html too (or somewhere else)?

Bugzilla is down, so can't discuss it there...

	Jakub
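For reference, the mangling that workaround targets is easy to reproduce: makeinfo 4.7 encoded `-` and `*` in anchor names as `_002d` and `_002a`, and the sed pair in the patch maps them back. A quick shell check (the sample anchor string below is illustrative, not taken from an actual tarball):

```shell
# makeinfo 4.7 wrote anchor names with '-' encoded as _002d and '*' as _002a;
# the sed pair from the preprocess patch undoes that encoding.
anchor='aarch64_002a_002d_002a_002d_002a'
printf '%s\n' "$anchor" | sed -e 's/_002d/-/g' -e 's/_002a/*/g'
# prints: aarch64*-*-*
```

The replacement order matters only in that both substitutions are global (`/g`); the two character classes don't overlap, so the same pair could equally be applied to the generated install HTML.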
From andrewm.roberts@sky.com Tue May 1 09:43:00 2018 From: andrewm.roberts@sky.com (Andrew Roberts) Date: Tue, 01 May 2018 09:43:00 -0000 Subject: Broken links in INSTALL/specific.html In-Reply-To: <20180501072701.GJ8577@tucnak> References: <20180501072701.GJ8577@tucnak> Message-ID: <2d2fc215-d59e-77a5-9b3e-793c0a23678d@sky.com> The reason I was looking at the versions in the RC tarball was that I have never been clear which release the website's install/prerequisite/target info actually applies to. It would be much better if this info were published on a per-release basis on the web site, like the changelog and manuals. That way, any target-specific material that was removed wouldn't affect the documentation of older releases. Speaking of manuals, it might be worth documenting the make targets for the documentation (make html, install-html, pdf, and install-pdf) on the build.html page. The documentation can only get checked before a release if people are aware of how to build it. Andrew On 01/05/18 08:27, Jakub Jelinek wrote: > Hi! > > PR web/85578 complains about broken links in INSTALL/specific.html inside of > the rc tarballs, I've looked at past releases and at least the releases I've > checked (4.7.0, 6.1, 7.1, 7.3, 8.1rc2) all have the broken links, > e.g. > aarch64*-*-* > and > >

aarch64*-*-*

> Looking at online docs, they are ok.
>
> I think this has been fixed for the online docs with:
> Index: preprocess
> ===================================================================
> RCS file: /cvs/gcc/wwwdocs/bin/preprocess,v
> retrieving revision 1.38
> retrieving revision 1.39
> diff -u -p -r1.38 -r1.39
> --- preprocess	28 Aug 2003 13:05:38 -0000	1.38
> +++ preprocess	5 Sep 2004 21:50:02 -0000	1.39
> @@ -144,7 +144,10 @@ process_file()
>     cat $STYLE > $TMPDIR/input
>     printf '\n' `pwd` >> $TMPDIR/input
>     cat $f >> $TMPDIR/input
> -    ${MHC} $TMPDIR/input > $TMPDIR/output
> +    # Use sed to work around makeinfo 4.7 brokenness.
> +    ${MHC} $TMPDIR/input \
> +      | sed -e 's/_002d/-/g' -e 's/_002a/*/g' \
> +      > $TMPDIR/output
>
>     # Copy the page only if it's new or there has been a change, and,
>     # first of all, if there was no problem when running MetaHTML.
>
> revision 1.39
> date: 2004/09/05 21:50:02;  author: gerald;  state: Exp;  lines: +4 -1
> Use sed to work around makeinfo 4.7 brokenness.
>
> Isn't this something we should be doing in gcc/doc/install.texi2html
> too (or somewhere else)?
>
> Bugzilla is down, so can't discuss it there...
>
> 	Jakub

From jwakely@redhat.com Tue May 1 10:17:00 2018 From: jwakely@redhat.com (Jonathan Wakely) Date: Tue, 01 May 2018 10:17:00 -0000 Subject: "position independent" vs "position-independent" in documentation In-Reply-To: <646c8ca5-73bb-7566-abf3-a03c47a02140@codesourcery.com> References: <20180430115654.GK20930@redhat.com> <646c8ca5-73bb-7566-abf3-a03c47a02140@codesourcery.com> Message-ID: <20180501101732.GM20930@redhat.com> On 30/04/18 22:12 -0600, Sandra Loosemore wrote:
>On 04/30/2018 05:56 AM, Jonathan Wakely wrote:
>>Should we standardize on "position-independent" and add it to
>>https://gcc.gnu.org/codingconventions.html#Spelling ?
>
>The same generic English usage rules apply here as to other compound
>phrases; hyphenate when immediately before a noun, don't hyphenate in
>other contexts.
>So "the compiler generates position-independent code"
>and "the compiler generates code that is position independent" are
>both correct.

Or we could decide that "position-independent" should always be hyphenated as it's a commonly established adjective. I'm not arguing in favour of that, but it's how it's used in some of the docs already.

>However, I don't think it's common to use "position
>independent" (hyphenated or not) except as a modifier for "code", so
>you could add "position-independent code" (rather than just
>"position-independent") to the glossary.

We have several uses of "position independent executable" and one "position independent data" which should be hyphenated. Ignoring all the uses of "position-independent code" that already have a hyphen, we're left with the following, and none of them is correct!

--
gcc/doc/invoke.texi-@opindex pie
gcc/doc/invoke.texi:Produce a dynamically linked position independent executable on targets
gcc/doc/invoke.texi-that support it. For predictable results, you must also specify the same
--
gcc/doc/invoke.texi-@opindex no-pie
gcc/doc/invoke.texi:Don't produce a dynamically linked position independent executable.
gcc/doc/invoke.texi-
--
gcc/doc/invoke.texi-@opindex static-pie
gcc/doc/invoke.texi:Produce a static position independent executable on targets that support
gcc/doc/invoke.texi:it. A static position independent executable is similar to a static
gcc/doc/invoke.texi-executable, but can be loaded at any address without a dynamic linker.
--
gcc/doc/invoke.texi-but not for the Sun 386i. Code generated for the IBM RS/6000 is always
gcc/doc/invoke.texi:position-independent.
gcc/doc/invoke.texi-
--
gcc/doc/invoke.texi-Generate code that does not use a global pointer register. The result
gcc/doc/invoke.texi:is not position independent code, and violates the IA-64 ABI@.
gcc/doc/invoke.texi-
--
gcc/doc/invoke.texi-@itemx -mno-shared
gcc/doc/invoke.texi:Generate (do not generate) code that is fully position-independent,
gcc/doc/invoke.texi-and that can therefore be linked into shared libraries. This option
--
gcc/doc/invoke.texi-
gcc/doc/invoke.texi:All @option{-mabicalls} code has traditionally been position-independent,
gcc/doc/invoke.texi-regardless of options like @option{-fPIC} and @option{-fpic}. However,
--
gcc/doc/invoke.texi-@opindex mno-pid
gcc/doc/invoke.texi:Enables the generation of position independent data. When enabled any
gcc/doc/invoke.texi-access to constant data is done via an offset from a base address
--
gcc/doc/md.texi-Constant for arithmetic/logical operations.
gcc/doc/md.texi:This is like @code{i}, except that for position independent code,
gcc/doc/md.texi-no symbols / expressions needing relocations are allowed.
--
gcc/doc/tm.texi-* Sections:: Dividing storage into text, data, and other sections.
gcc/doc/tm.texi:* PIC:: Macros for position independent code.
gcc/doc/tm.texi-* Assembler Format:: Defining how to write insns and pseudo-ops to output.
--
gcc/doc/tm.texi-@section Position Independent Code
gcc/doc/tm.texi:@cindex position independent code
gcc/doc/tm.texi-@cindex PIC
--
gcc/doc/tm.texi-A C expression that is nonzero if @var{x} is a legitimate immediate
gcc/doc/tm.texi:operand on the target machine when generating position independent code.
gcc/doc/tm.texi-You can assume that @var{x} satisfies @code{CONSTANT_P}, so you need not
--
gcc/doc/tm.texi-(including @code{SYMBOL_REF}) can be immediate operands when generating
gcc/doc/tm.texi:position independent code.
gcc/doc/tm.texi-@end defmac
--
gcc/doc/tm.texi.in-* Sections:: Dividing storage into text, data, and other sections.
gcc/doc/tm.texi.in:* PIC:: Macros for position independent code.
gcc/doc/tm.texi.in-* Assembler Format:: Defining how to write insns and pseudo-ops to output.
--
gcc/doc/tm.texi.in-@section Position Independent Code
gcc/doc/tm.texi.in:@cindex position independent code
gcc/doc/tm.texi.in-@cindex PIC
--
gcc/doc/tm.texi.in-A C expression that is nonzero if @var{x} is a legitimate immediate
gcc/doc/tm.texi.in:operand on the target machine when generating position independent code.
gcc/doc/tm.texi.in-You can assume that @var{x} satisfies @code{CONSTANT_P}, so you need not
--
gcc/doc/tm.texi.in-(including @code{SYMBOL_REF}) can be immediate operands when generating
gcc/doc/tm.texi.in:position independent code.
gcc/doc/tm.texi.in-@end defmac

From jwakely.gcc@gmail.com Tue May 1 10:29:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Tue, 01 May 2018 10:29:00 -0000 Subject: gcc 8.0.1 RC documentation broken In-Reply-To: <4aba19fd-3e5d-8a81-e342-ec170befa672@sky.com> References: <4aba19fd-3e5d-8a81-e342-ec170befa672@sky.com> Message-ID: On 1 May 2018 at 05:42, Andrew Roberts wrote: > I filed this under 'web' as I couldn't see any documentation component. It > doesn't appear to have been looked at, There's a "documentation" keyword instead, but I'm often not sure which component doc bugs should be filed under. From maxim.kuvyrkov@linaro.org Tue May 1 13:04:00 2018 From: maxim.kuvyrkov@linaro.org (Maxim Kuvyrkov) Date: Tue, 01 May 2018 13:04:00 -0000 Subject: Stack protector: leak of guard's address on stack In-Reply-To: <87muxm2rny.fsf@mid.deneb.enyo.de> References: <20180427121601.GT8577@tucnak> <20180427122204.GU8577@tucnak> <20180427133845.GV8577@tucnak> <87y3h76vig.fsf@mid.deneb.enyo.de> <94B2316C-48EA-41AC-AED6-C7ACBBD628FE@linaro.org> <87muxm2rny.fsf@mid.deneb.enyo.de> Message-ID: <7132E024-182D-4A0F-859C-299CC9B5DBA8@linaro.org> > On Apr 29, 2018, at 2:11 PM, Florian Weimer wrote: > > * Maxim Kuvyrkov: > >>> On Apr 28, 2018, at 9:22 PM, Florian Weimer wrote: >>> >>> * Thomas Preudhomme: >>> >>>> Yes absolutely, CSE needs to be avoided. I made memory access volatile >>>> because the change was easier to do.
Also on Arm Thumb-1 computing the >>>> guard's address itself takes several loads so had to modify some more >>>> patterns. Anyway, regardless of the proper fix, do you have any objection >>>> to raising a CVE for that issue? >>> >>> Please file a bug in Bugzilla first and use that in the submission to >>> MITRE. >> >> Thomas filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85434 a couple >> of weeks ago. > > Is there a generic way to find other affected targets? > > If we only plan to fix 32-bit Arm, we should make the CVE identifier > specific to that, to avoid confusion. The problem is fairly target-dependent, so architecture maintainers need to look at how stack-guard canaries and their addresses are handled and whether they can be spilled onto the stack. It appears we need to poll architecture maintainers before filing the CVE. -- Maxim Kuvyrkov www.linaro.org From fw@deneb.enyo.de Tue May 1 13:37:00 2018 From: fw@deneb.enyo.de (Florian Weimer) Date: Tue, 01 May 2018 13:37:00 -0000 Subject: Stack protector: leak of guard's address on stack In-Reply-To: <7132E024-182D-4A0F-859C-299CC9B5DBA8@linaro.org> (Maxim Kuvyrkov's message of "Tue, 1 May 2018 16:04:34 +0300") References: <20180427121601.GT8577@tucnak> <20180427122204.GU8577@tucnak> <20180427133845.GV8577@tucnak> <87y3h76vig.fsf@mid.deneb.enyo.de> <94B2316C-48EA-41AC-AED6-C7ACBBD628FE@linaro.org> <87muxm2rny.fsf@mid.deneb.enyo.de> <7132E024-182D-4A0F-859C-299CC9B5DBA8@linaro.org> Message-ID: <87604733ac.fsf@mid.deneb.enyo.de> * Maxim Kuvyrkov: > The problem is fairly target-dependent, so architecture maintainers > need to look at how stack-guard canaries and their addresses are > handled and whether they can be spilled onto the stack. > > It appears we need to poll architecture maintainers before filing the CVE. One CVE ID per identified affected architecture would work as well.
MITRE cares about affected software *versions* as well, and since the targets were added at different GCC versions (or stack protector support was added), the CVE IDs should be split in most cases anyway. From freddie_chopin@op.pl Tue May 1 18:38:00 2018 From: freddie_chopin@op.pl (Freddie Chopin) Date: Tue, 01 May 2018 18:38:00 -0000 Subject: Second GCC 8.1 Release Candidate available from gcc.gnu.org In-Reply-To: <20180427213950.GA8577@tucnak> References: <20180427213950.GA8577@tucnak> Message-ID: <8aa3c845c6eaff74cd7ebff731afe4d55bdc67de.camel@op.pl> On Fri, 2018-04-27 at 23:39 +0200, Jakub Jelinek wrote: > The second release candidate for GCC 8.1 is available from > > ftp://gcc.gnu.org/pub/gcc/snapshots/8.0.1-RC-20180427 > > and shortly its mirrors. It has been generated from SVN revision > 259731. > > I have so far bootstrapped and tested the release candidate on > x86_64-linux and i686-linux. Please test it and report any issues to > bugzilla. > > If all goes well, I'd like to release 8.1 on Wednesday, May 2nd. Hello! I would like to bring this strange issue to your attention too. https://sourceware.org/bugzilla/show_bug.cgi?id=23126#c3 This is not (at least not entirely) a GCC problem, but with previous versions (back to GCC 5 at least) it was working fine, because GCC did not explicitly set `.arch` directive in assembly files - just `.cpu`. Regards, FCh From chengniansun@gmail.com Tue May 1 19:53:00 2018 From: chengniansun@gmail.com (Chengnian Sun) Date: Tue, 01 May 2018 19:53:00 -0000 Subject: Should GCC emit the same code for compilation with '-g' and without '-g' Message-ID: Hi, Does gcc have a requirement about the impact of emitting debug info on the generated code? Should the code be the same no matter whether '-g' is specified? Thank you. -- Best Regards. 
Chengnian From jakub@redhat.com Tue May 1 19:58:00 2018 From: jakub@redhat.com (Jakub Jelinek) Date: Tue, 01 May 2018 19:58:00 -0000 Subject: Should GCC emit the same code for compilation with '-g' and without '-g' In-Reply-To: References: Message-ID: <20180501195759.GP8577@tucnak> On Tue, May 01, 2018 at 12:53:45PM -0700, Chengnian Sun wrote: > Does gcc have a requirement about the impact of emitting debug info on the > generated code? Should the code be the same no matter whether '-g' is > specified? Yes (except for selective scheduling, but that warns if you combine -fselective-scheduling{,2} together with -fvar-tracking-assignments and disables the latter by default). Jakub From Robert.Suchanek@mips.com Wed May 2 09:53:00 2018 From: Robert.Suchanek@mips.com (Robert Suchanek) Date: Wed, 02 May 2018 09:53:00 -0000 Subject: Introducing a nanoMIPS port for GCC Message-ID: <3f321d50ed794f5daa3f5ee4f5b62dea@mips.com> Yesterday, MIPS Tech announced the latest generation of the MIPS family of architectures called nanoMIPS [1]. As part of the development we have been designing all the open source tools necessary to support the architecture and, thanks to the speed with which we were able to prototype, we have also been using these tools to shape the architecture along the way. This has led to some really interesting improvements in the tools, which MIPS would like to contribute back to the community. While doing this work many of us have been unable to contribute to the community as actively as we would have liked; we are therefore very grateful for the community support given to the MIPS architecture over the last 18 months. This announcement has a general introduction at the start, so if you have already read it for one of the other tools, you can skip down to the information specific to GCC. For anyone who knows the MIPS architecture you may well wonder why we are introducing another major variant and the question is perfectly valid.
We do admittedly have quite a few: MIPS I through MIPS IV, MIPS32 and MIPS64 through to MIPS32R6 and MIPS64R6, MIPS16e, MIPS16e2, microMIPSR3 and microMIPSR6. Each of these serves (or served) a purpose and there is a high level of synergy between all of them. In general, they build upon the previous and there is a high level of compatibility, even when switching to a new encoding like moving from MIPS to microMIPS. The switch to MIPS32R6/MIPS64R6 was a major shift in the way the architecture innovated and drew more on the original theory of the architecture, where evolution was not expected to be limited by binary compatibility. MIPS Release 6 removed instructions and did create some very minor incompatibility but is also much cleaner to implement from a micro-architecture perspective. We have taken this idea much further with nanoMIPS and reimagined the instruction set, by drawing on all the experience gained from previous designs. Hopefully others will find it as interesting as we do. The major driving force behind the nanoMIPS architecture was to achieve outstanding code density, while also balancing out hardware and software design cost. As background MIPS has two compressed ISA variants: MIPS16e, which cannot exist without also implementing MIPS32, and microMIPS, which can exist on its own. Since MIPS16e has specific limits that cannot be engineered around, we chose to use an approach similar to the microMIPS design. nanoMIPS has a variable-length compressed instruction set that is completely standalone from the other MIPS ISAs. It is designed to compress the highest frequency instructions to 16-bits, and use 48-bit instructions to efficiently encode 32-bit constants into the instruction stream. 
There is also a wider range of 32-bit instructions, which merge carefully chosen high-frequency instruction sequences into single operations, creating more flexible addressing modes such as indexed and scaled indexed addressing, branch compare with immediate, and macro-style instructions. The macro-like instructions compress prologue and epilogue sequences, as well as a small number of high-frequency instruction pairs like two move instructions or a move and a function call. nanoMIPS also totally eliminates branch delay slots, following a precedent set by microMIPSR6. To get the best from a new ISA we also re-engineered the ABI and created a new symbiotic relationship between the ISA and ABI that pushes code density and performance further still. The ABI creates a fully link-time-relaxable model, which enables us to squeeze every last byte out of the code image even when deferring final addressing mode and layout decisions to link time. We have been mindful of the MIPS heritage and ensured that, while open to any possible change, we also have minimal impact when porting code from MIPS to nanoMIPS, and have plenty of support to achieve source compatibility between the two. The net effect of these changes is an average code size reduction of 20% relative to microMIPSR6. This compression could well be one of the best achieved by GNU tools for any RISC ISA. Comparing the ISA in terms of the number of instructions to issue vs microMIPS, we also see a reduction of between 8% and 11% in dynamic instruction count. Below we dig into some technical specifics for each of the GNU tools; we welcome any feedback and questions as we start to look at rebasing this work onto trunk/master and formally submitting it.
nanoMIPS pre-built toolchains and source code tarballs are available at: http://codescape.mips.com/components/toolchain/nanomips/2018.04-02/

GCC specific details
====================

The back-end
------------

Instead of creating a new back-end for nanoMIPS, we decided to reuse the existing MIPS back-end. Starting from scratch would have required copying the majority of the code, but most of the logic would have remained the same. Reusing allowed us to speed up porting. Maintenance might be more difficult, but a fix for nanoMIPS could automatically be a fix for MIPS and vice versa. Most of the back-end is contained within a small number of files. The shared part is mostly in the mips.{h,c,md,opt} files. The MIPS toolchains use mips-classic.md as the entry file (instead of mips.md), i.e. it includes the shareable mips.md, the processor configuration, and all other .md files as necessary. nanomips.md is used as the entry file for nanoMIPS toolchains and similarly includes its own processor list, mips.md and other machine description files as needed. Doing it this way makes it easier to enable features which can be shared between the two. Some chunks of the back-end code had to be enabled conditionally, as the compiler would otherwise fail to build (missing patterns etc.). Lastly, we needed to clean up nanoMIPS' target options by disabling them in mips.opt, and also create a number of nanoMIPS-specific files to keep the separation as clear as possible.

The p32 ABI [2]
---------------

1. Calling convention

To avoid major porting issues, the register conventions have been left mostly intact, and resemble the MIPS n32/n64 ABIs. The main difference is the removal of the dedicated return registers ($2/v0, $3/v1), with the argument registers used instead to return values from functions. The old return registers have been re-purposed and have now become temporaries. This allowed us to achieve better code density because of more efficient data passing between functions, e.g. foo(bar()).
This is particularly visible in soft-float mode, where more complex expressions require multiple library calls. The nanoMIPS ABI requires the use of either named registers, such as $a0..$a7 for arguments, $s0..$s7 for saved temporaries, etc., or of the $r0..$r31 format. Using $0..$31 is no longer supported by default, but can be re-enabled with -mlegacyregs.

2. Stack frame organization

The major change in comparison to the previous MIPS ABIs is the location of the frame pointer. The frame pointers now form a chain that allows efficient stack unwinding. Previously, in order to find the location of the frame pointer, the instructions had to be scanned backwards from the current program counter. With this change, finding the location is trivial; however, it's important to point out that the frame pointer is biased by 4096 bytes, i.e. logical_frame_pointer = $fp + 4096. The rationale was to enable full use of the unsigned 12-bit offsets in memory instructions when using the frame pointer as the base. Another notable difference is in the order of general-purpose registers on the stack, which now reflects the operation of the SAVE/RESTORE instructions from the nanoMIPS ISA.

3. Code and data models

The automatic model (-mcmodel=auto) produces the most compact code possible by relying on the linker to do further size optimizations on the compiler-generated code. The linker will also expand the code when symbols end up being out of range. This model has been designed to keep the size difference between the intermediate objects and the fully linked object as small as possible, although having the linker perform too many expansions will widen that gap. It can be used only with a linker which is capable of performing relaxations and expansions. The medium model (-mcmodel=medium) is somewhat similar to the automatic model in terms of the range and size of the generated code, but it does not rely on linker relaxations and expansions.
This lack of linker transformations makes the size of the fully linked object more predictable, even though it squanders some opportunities for further size optimization and it introduces inherent limitations in the fully linked code. The large model (-mcmodel=large) produces code which has an unlimited range by only using instructions which cover the entire address space. Because these instructions tend to be bigger, this model sacrifices code size in order to guarantee that code sequences will work regardless of where the symbol is placed in memory. The large model also does not rely on linker relaxations and expansions.

In addition to the models, there are 4 addressing modes:

- absolute: addresses are fixed at link-time. This mode is rarely necessary but has some potential for energy efficiency.
- PC-relative: addresses appear as offsets from the PC and are used in PC-relative instructions. This mode produces position-independent code.
- GP-relative: addresses appear as offsets from the GP and are used in GP-relative instructions. Symbols are placed in the small data section, also known as .sdata. This mode produces position-independent data for some or all symbols of an application.
- GOT-dependent: addresses are kept in the GOT and are loaded by using offsets between the GP and a given symbol's entry in the GOT. This mode produces dynamically linkable code.

4. Thread Local Storage

The nanoMIPS TLS ABI has support for both the traditional TLS models and TLS descriptors. All of the TLS models have been adapted to the nanoMIPS ISA following an approach similar to the one taken for the code and data models. The runtime TLS layout has also been redesigned to take advantage of the unsigned offset LW[U12] nanoMIPS instruction, thus extending the possible range of symbols inside the TLS block.
Target-independent optimizations
--------------------------------

In addition to these ABI improvements, we have also developed various target-independent and nanoMIPS-specific compiler optimizations in order to further improve code size and performance.

1. LRA: use equivalences to help with frame pointer elimination (currently enabled by -mlra-equiv)

The patch has already been posted [3] and has gone through some additional changes since posting. A case was found where LRA produced suboptimal code for a large frame growing downwards. Code size was affected particularly in cases where the offset was large and could not be used directly in an add operation, introducing more instructions for a single frame access. Using the equivalences, the frame pointer gets eliminated more often, resulting in smaller code. The reasons are twofold: register pressure drops, resulting in fewer spills, and the offset might be small enough to fit into a single add instruction for every frame access.

2. IRA register recoloring (-fadjust-costs)

The goal of the register cost adjustment optimization is to make better use of instructions that improve code density. This group of instructions includes 16-bit instructions and 32-bit nanoMIPS instructions which replace two other instructions (e.g. movep, move.balc, etc.). Most of these instructions can use only a subset of all available registers, and the purpose of this optimization is to increase the chances that pseudo registers used inside these instructions are assigned to the required hard registers. This is achieved by introducing a new target hook through which the cost of the corresponding hard registers is modified just before allocation of a pseudo register. The cost modification is based on the properties of all instructions in which a pseudo register is used. If assigning a pseudo to some hard registers would lead to denser code, e.g. by being able to generate 16-bit instructions, then the cost of these hard registers is decreased.
Otherwise, the cost of the hard registers is increased, thus improving the chances that these hard registers will be available for pseudos that are allocated later in the process.

3. Jump-table optimization (-fjump-table-clusters)

This optimization enables the splitting of a single switch statement into a combination of multiple jump tables and decision trees; GCC currently emits either a single jump table or a single decision tree. The optimization can be enabled with the command-line option -fjump-table-clusters and is target-independent. A MIPS-specific option has been added (-mjump-table-density=DENSITY) to change the default density. DENSITY is the minimum percentage of non-default case indices in a jump table. If not specified, GCC uses a default density of 40% when optimizing for size and 10% when optimizing for speed. The target option will later be replaced by an appropriate --param jump-table-density option or something similar.

4. Edge sorting for -Os during basic block reordering (-freorder-blocks-edge-sort=[one|two|all|default])

When reordering blocks using the `simple' algorithm, edges are sorted for speed-optimized functions and left unsorted for size-optimized functions. However, sorting the edges for size-optimized functions can significantly improve performance at some code size cost. Inner loops show the greatest benefit with `level' set to `one'. Further improvement is possible by sorting one level of nested loops (`level' set to `two'), with additional cost in size. Finally, all edges can be sorted (`level' set to `all'). This option overrides the normal sorting choice for both size- and speed-optimized functions.

Target-specific optimizations
-----------------------------

1. Optimized inline memcpy (-mmemcpy-interleave=NUM/-mmulti-memcpy)

These options have been introduced to control the inlined memcpy. -mmulti-memcpy attempts to exploit Load/Store Word Multiple instructions, and -mmemcpy-interleave=NUM controls how loads and stores are interleaved, i.e.
how many (NUM) words are loaded first before storing them.

2. MOVEP/MOVE.BALC/RESTORE.JRC

A machine-dependent hook will attempt to find opportunities in the instruction stream to combine instructions into MOVEP, MOVE.BALC or RESTORE.JRC to improve code density. MOVE.BALC can be controlled with the -m[no-]opt-movebalc switch.

3. Offset shrinking pass (-m[no-]shrink-offsets)

This pass processes the instruction stream, extracts offsets from memory accesses, and then tries to figure out the best offset adjustment to get the maximum potential code size savings. We take into account the cost of introducing a new add instruction, which could undo the code size savings. As the pass is run before register allocation, we can only speculate and be optimistic about the potential code size improvement. These guesses appear to be relatively good on average but might need to be considered on a case-by-case basis.

4. Jump-table optimization (-mjump-table-opt)

This switch enables jump tables which contain relative addresses.

5. BALC stubs (-m[no-]balc-stubs)

This code size optimization is performed not by the compiler but by the assembler. It controls out-of-range call optimization through trampoline stubs. It is enabled by default when optimizing for size.

Note that support for 64-bit and floating-point is not finalized and still unofficial.
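To make the DENSITY parameter of -mjump-table-density (section 3 of the target-independent optimizations above) concrete, the test it describes can be sketched as follows. This is a hypothetical helper for illustration only, not actual GCC code:

```python
# Sketch of the jump-table density test: a cluster of case values is
# dense enough for a jump table when the percentage of table slots
# holding a real (non-default) case meets the DENSITY threshold.

def dense_enough(case_values, density_percent):
    lo, hi = min(case_values), max(case_values)
    slots = hi - lo + 1                 # entries the table range would need
    # density (%) = 100 * cases / slots, compared without floating point
    return 100 * len(set(case_values)) >= density_percent * slots

# 5 cases spread over 10 slots -> 50% density
assert dense_enough([0, 2, 4, 6, 9], 40)   # passes the 40% -Os default
assert not dense_enough([0, 100], 40)      # 2 cases over 101 slots: ~2%
assert dense_enough([0, 100], 1)           # but acceptable at 1%
```

A sparse switch that fails the test for one big table could still be split by -fjump-table-clusters into several small, dense tables plus a decision tree.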
GCC contributors
================

- nanoMIPS port, ABI, code and data models, TLS, bugfixes:
  Robert Suchanek, Toma Tabacu, Matthew Fortune
- IRA register recosting, edge sorting:
  Zoran Jovanovic
- Jump-table optimization, scheduler, MOVE.BALC/MOVEP optimization:
  Prachi Godbole
- RESTORE.JRC optimization:
  Robert Suchanek
- Lightweight sync codes:
  Faraz Shahbazker
- Offset shrinking pass:
  Robert Suchanek, Steve Ellcey
- Exception handling:
  Jack Romo
- Dejagnu tests, bugfixes:
  Stefan Markovic, Sara Popadic

References:

[1] https://www.mips.com/press/new-mips-i7200-processor-core-delivers-unmatched-performance-and-efficiency-for-advanced-lte5g-communications-and-networking-ic-designs/
[2] Codescape GNU tools for nanoMIPS: ELF ABI Supplement, https://codescape.mips.com/components/toolchain/nanomips/2018.04-02/docs/MIPS_nanoMIPS_ABI_supplement_01_02_DN00179.pdf
[3] https://patchwork.ozlabs.org/patch/666637/

From Matthew.Fortune@mips.com Wed May 2 10:05:00 2018
From: Matthew.Fortune@mips.com (Matthew Fortune)
Date: Wed, 02 May 2018 10:05:00 -0000
Subject: Introducing a nanoMIPS port for GCC
Message-ID: <9253772adfd3429dbdaddf0dd622c709@mips.com>

Robert Suchanek writes:
> the last 18 months. This announcement has a general introduction at
> the start, so if you have already read it for one of the other tools,
> you can skip down to the information specific to GCC.

Thanks, Robert. Corresponding technical info for other toolchain components can be found in the following archived posts.
binutils/gdb/gold
=================
http://sourceware.org/ml/binutils/2018-05/msg00003.html

qemu
====
http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg00081.html

Thanks,
Matthew

From chenyiliangex@gmail.com Wed May 2 10:22:00 2018
From: chenyiliangex@gmail.com (yiliang chen)
Date: Wed, 02 May 2018 10:22:00 -0000
Subject: i don't known what is happened
Message-ID:

Hello, I am a Chinese developer. When compiling the GCC source code, I encountered a rather confusing problem, but I don't know whether it is a bug or not.

I compile GCC with MSYS2 on Windows. In order to output the Intel assembly file by default, I modified the gcc\config\i386\i386.opt file as follows:

masm=
Target RejectNegative Joined Enum(asm_dialect) Var(ix86_asm_dialect) Init(ASM_INTEL)
Use given assembler dialect.

However, an error occurred when compiling the libgcc/config/i386/sfp-exceptions.c file; it tells me:

C:\MSYS\MSYS32\tmp\cctIKXUx.s: Assembler messages:
C:\MSYS\MSYS32\tmp\cctIKXUx.s:69: Error: no such instruction: `fdivs DWORD PTR [esp]'
C:\MSYS\MSYS32\tmp\cctIKXUx.s:135: Error: no such instruction: `fdivs DWORD PTR [esp]'

I don't know why!!

From jwakely.gcc@gmail.com Wed May 2 10:24:00 2018
From: jwakely.gcc@gmail.com (Jonathan Wakely)
Date: Wed, 02 May 2018 10:24:00 -0000
Subject: i don't known what is happened
In-Reply-To: References: Message-ID:

Please don't cross post to all these mailing lists. If you want a bugzilla account then email the account request list, not the other lists. If you want help, email gcc-help, not the other lists. If you want to discuss development of GCC then email the development list, not the other lists. It is never appropriate to cross-post to all three.

On 2 May 2018 at 11:22, yiliang chen wrote:
> Hello, I am a Chinese developer. When compiling the GCC source code, I
> encountered a rather confusing problem, but I don't know whether it is a
> bug or not.
>
> I compile GCC with MSYS2 on Windows.
In order to output the Intel assembly
> file by default, I modified the gcc\config\i386\i386.opt file as follows:
>
> masm=
> Target RejectNegative Joined Enum(asm_dialect) Var(ix86_asm_dialect)
> Init(ASM_INTEL)
> Use given assembler dialect.
>
> However, an error occurred when compiling the
> libgcc/config/i386/sfp-exceptions.c file; it tells me:
>
> C:\MSYS\MSYS32\tmp\cctIKXUx.s: Assembler messages:
> C:\MSYS\MSYS32\tmp\cctIKXUx.s:69: Error: no such instruction: `fdivs DWORD
> PTR [esp]'
> C:\MSYS\MSYS32\tmp\cctIKXUx.s:135: Error: no such instruction: `fdivs DWORD
> PTR [esp]'
>
> I don't know why!!

From chris@groessler.org Wed May 2 10:49:00 2018
From: chris@groessler.org (Christian Groessler)
Date: Wed, 02 May 2018 10:49:00 -0000
Subject: wrong comment in gcc/testsuite/gcc.c-torture/compile/simd-5.c
Message-ID: <5fb48660-4ec7-fee8-213c-4d1b68ec4755@groessler.org>

Hi,

--- a/gcc/testsuite/gcc.c-torture/compile/simd-5.c
+++ b/gcc/testsuite/gcc.c-torture/compile/simd-5.c
@@ -6,7 +6,7 @@ main(){
   vector64 int a = {1, -1};
   vector64 int b = {2, -2};
   c = -a + b*b*(-1LL);
-/* c is now {5, 3} */
+/* c is now {-5, -3} */
   printf("result is %llx\n", (long long)c);
  }

regards,
chris

From jakub@redhat.com Wed May 2 12:16:00 2018
From: jakub@redhat.com (Jakub Jelinek)
Date: Wed, 02 May 2018 12:16:00 -0000
Subject: GCC 8.1 Released
Message-ID: <20180502121524.GT8577@tucnak>

We are proud to announce the next, major release of the GNU Compiler Collection. Are you tired of your existing compilers? Want fresh new language features and better optimizations? Make your day with the new GCC 8.1!

GCC 8.1 is a major release containing substantial new functionality not available in GCC 7.x or previous GCC releases. The C++ front-end now has experimental support for some parts of the upcoming C++2a draft, with the -std=c++2a and -std=gnu++2a options, and the libstdc++ library has some further C++17 and C++2a draft library features implemented too.
This release features significant improvements in the emitted diagnostics, including improved locations, location ranges and fix-it hints (especially in the C++ front-end), and various new warnings have been added.

Profile-driven optimizations have been significantly improved; on x86, functions are now split into hot and cold regions by default. The link-time optimizations now have a new way of emitting the DWARF debug information, which makes LTO-optimized code more debuggable. New loop optimizers have been added, existing ones improved, and some, like -ftree-loop-distribution, -floop-unroll-and-jam and -floop-interchange, have been enabled by default at -O3.

The AArch64 target now supports the Scalable Vector Extension, which features vectors with a runtime-determined number of elements.

Some code that compiled successfully with older GCC versions might require source changes, see http://gcc.gnu.org/gcc-8/porting_to.html for details.

See https://gcc.gnu.org/gcc-8/changes.html for more information about changes in GCC 8.1.

This release is available from the FTP servers listed here:

http://www.gnu.org/order/ftp.html

The release is in the gcc/gcc-8.1.0/ subdirectory.

If you encounter difficulties using GCC 8.1, please do not contact me directly. Instead, please visit http://gcc.gnu.org for information about getting help.

Driving a leading free software project such as the GNU Compiler Collection would not be possible without support from its many contributors. That means not only its developers but especially its regular testers and users who contribute to its high quality. The list of individuals is too large to thank individually!

Please consider a donation to the GNU Toolchain Fund to support the continued development of GCC!
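The "runtime determined number of elements" of SVE mentioned above is the notable part: loops are strip-mined by whatever vector length the hardware provides, so one binary runs on any SVE implementation. A rough scalar sketch of that idea (illustrative only, not SVE code; the helper is invented for this example):

```python
# Sketch of vector-length-agnostic strip mining, the idea behind SVE's
# runtime-determined vector length: the same loop works for any hardware
# vector length vl, handling min(vl, n - i) elements per iteration
# (real SVE handles the partial final iteration with predicate masks).

def vla_add(a, b, vl):
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        step = min(vl, n - i)         # last iteration may be partial
        for j in range(i, i + step):  # stands in for one masked vector op
            out[j] = a[j] + b[j]
        i += step
    return out

a, b = list(range(10)), [1] * 10
# identical results whatever the "hardware" vector length happens to be
assert vla_add(a, b, 4) == vla_add(a, b, 16) == [x + 1 for x in a]
```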
From joseph@codesourcery.com Wed May 2 15:25:00 2018
From: joseph@codesourcery.com (Joseph Myers)
Date: Wed, 02 May 2018 15:25:00 -0000
Subject: Introducing a nanoMIPS port for GCC
In-Reply-To: <9253772adfd3429dbdaddf0dd622c709@mips.com>
References: <9253772adfd3429dbdaddf0dd622c709@mips.com>
Message-ID:

On Wed, 2 May 2018, Matthew Fortune wrote:

> qemu
> ====
> http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg00081.html

That answers one thing I was wondering, by saying you're using the generic Linux kernel syscall interface rather than any of the existing MIPS syscall interfaces.

Is your Linux kernel port available somewhere (or a description of it corresponding to these descriptions of changes to toolchain components)?

--
Joseph S. Myers
joseph@codesourcery.com

From daniel.santos@pobox.com Wed May 2 16:37:00 2018
From: daniel.santos@pobox.com (Daniel Santos)
Date: Wed, 02 May 2018 16:37:00 -0000
Subject: GCC 8.1 Released
In-Reply-To: <20180502121545.GU8577@tucnak>
References: <20180502121545.GU8577@tucnak>
Message-ID: <28325bfe-e9fc-81be-e3f2-a3cb5919a529@pobox.com>

Woo hoo!

Looks like I forgot to add the details of -mcall-ms2sysv-xlogues to the changes (https://gcc.gnu.org/gcc-8/changes.html). Is it too late to change that? At least this only really affects one project (that I'm aware of). I've got some improvements to it for GCC 9 that I'll get together in the next few weeks. I'll email the Wine list later today.

Daniel

From jakub@redhat.com Wed May 2 16:43:00 2018
From: jakub@redhat.com (Jakub Jelinek)
Date: Wed, 02 May 2018 16:43:00 -0000
Subject: GCC 8.1 Released
In-Reply-To: <28325bfe-e9fc-81be-e3f2-a3cb5919a529@pobox.com>
References: <20180502121545.GU8577@tucnak> <28325bfe-e9fc-81be-e3f2-a3cb5919a529@pobox.com>
Message-ID: <20180502164306.GZ8577@tucnak>

On Wed, May 02, 2018 at 11:38:51AM -0500, Daniel Santos wrote:
> Looks like I forgot to add the details of -mcall-ms2sysv-xlogues to the
> changes (https://gcc.gnu.org/gcc-8/changes.html).
Is it too late to
> change that? At least this only really affects one project (that I'm
> aware of). I've got some improvements to it for GCC 9 that I'll get
> together in the next few weeks. I'll email the Wine list later today.

There is no deadline on gcc-*/changes.html changes, it can be changed whenever changes for it are acked. Of course it is always better to do it before the release if possible.

Jakub

From damian@sourceryinstitute.org Wed May 2 17:21:00 2018
From: damian@sourceryinstitute.org (Damian Rouson)
Date: Wed, 02 May 2018 17:21:00 -0000
Subject: GCC 8.1 Released
In-Reply-To: <20180502164306.GZ8577@tucnak>
References: <20180502121545.GU8577@tucnak> <28325bfe-e9fc-81be-e3f2-a3cb5919a529@pobox.com> <20180502164306.GZ8577@tucnak>
Message-ID:

On May 2, 2018 at 9:43:23 AM, Jakub Jelinek (jakub@redhat.com) wrote:
> There is no deadline on gcc-*/changes.html changes, it can be changed
> whenever changes for it are acked. Of course it is always better to
> do it before the release if possible.

Could someone please point me to instructions for how to submit a change to the gfortran changes list? I'd like to add the following bullet:

* Partial support is provided for the Fortran 2018 teams feature, which enables the formation of hierarchical subsets of images that execute independently of other image subsets.

Damian

From jwakely.gcc@gmail.com Wed May 2 17:35:00 2018
From: jwakely.gcc@gmail.com (Jonathan Wakely)
Date: Wed, 02 May 2018 17:35:00 -0000
Subject: GCC 8.1 Released
In-Reply-To: References: <20180502121545.GU8577@tucnak> <28325bfe-e9fc-81be-e3f2-a3cb5919a529@pobox.com> <20180502164306.GZ8577@tucnak>
Message-ID:

On 2 May 2018 at 18:21, Damian Rouson wrote:
> Could someone please point me to instructions for how to submit a change to the gfortran changes list?
The web pages are hosted in CVS and patches for them are handled like any other GCC patches:
https://gcc.gnu.org/about.html#cvs

From jimw@sifive.com Wed May 2 18:02:00 2018
From: jimw@sifive.com (Jim Wilson)
Date: Wed, 02 May 2018 18:02:00 -0000
Subject: GCC 8.1 Released
In-Reply-To: References: <20180502121545.GU8577@tucnak> <28325bfe-e9fc-81be-e3f2-a3cb5919a529@pobox.com> <20180502164306.GZ8577@tucnak>
Message-ID:

On 05/02/2018 10:21 AM, Damian Rouson wrote:
> Could someone please point me to instructions for how to submit a change
> to the gfortran changes list? I'd like to add the following bullet:

See also https://gcc.gnu.org/contribute.html#webchanges

Jim

From Matthew.Fortune@mips.com Wed May 2 20:52:00 2018
From: Matthew.Fortune@mips.com (Matthew Fortune)
Date: Wed, 02 May 2018 20:52:00 -0000
Subject: Introducing a nanoMIPS port for GCC
In-Reply-To: References: <9253772adfd3429dbdaddf0dd622c709@mips.com>
Message-ID: <1e1ee389e85547fdba538748d031bc22@mips.com>

Joseph Myers writes:
> On Wed, 2 May 2018, Matthew Fortune wrote:
>
> > qemu
> > ====
> > http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg00081.html
>
> That answers one thing I was wondering, by saying you're using the generic
> Linux kernel syscall interface rather than any of the existing MIPS
> syscall interfaces.
>
> Is your Linux kernel port available somewhere (or a description of it
> corresponding to these descriptions of changes to toolchain components)?

Hi Joseph,

The kernel is being prepared for a branch in the linux-mips.org repository, and a document is being added to the wiki there. All being well it will not take too long to get that available. To my knowledge the major areas of the nanoMIPS kernel that are not yet finalised are the debug interfaces, but our kernel engineers will be able to give a more detailed description.
Matthew

From gccadmin@gcc.gnu.org Wed May 2 22:43:00 2018
From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org)
Date: Wed, 02 May 2018 22:43:00 -0000
Subject: gcc-6-20180502 is now available
Message-ID: <20180502224306.65662.qmail@sourceware.org>

Snapshot gcc-6-20180502 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/6-20180502/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 6 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch revision 259867

You'll find:

 gcc-6-20180502.tar.xz  Complete GCC
   SHA256=12bc75f9ecc4e02c8f496446fbb6dc2340c02552f08a203e098d94914dbc197f
   SHA1=99ea089cbc190b12e2b39d003ebbb7fea8bb4ac4

Diffs from 6-20180425 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-6 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.

From David.Taylor@dell.com Thu May 3 17:09:00 2018
From: David.Taylor@dell.com (taylor, david)
Date: Thu, 03 May 2018 17:09:00 -0000
Subject: gcov and initialized data
Message-ID: <63F1AEE13FAE864586D589C671A6E18B10FE1091@MX203CL03.corp.emc.com>

I want to use gcov in an embedded type environment. It supports both cold boot and warm boot.
For warm boot it does not reload the program from media; instead it 'just' jumps to the start and begins again. Due to the support for warm boot, it does not support read-write initialized data. Writable data is initialized at run-time.

When you build your program for code coverage (-ftest-coverage -fprofile-arcs), GCC creates some initialized read-write GCOV-related data. Has anyone modified GCC, presumably either under control of a command-line option or possibly a configure-time option, to initialize such data at run-time instead of compile-time?

Thanks.
David

p.s. In case it matters / anyone cares -- we have copyright assignments on file for GCC, BINUTILS, and GDB, which the company lawyers assure me survived our acquisition by Dell.

From nathan@acm.org Thu May 3 17:57:00 2018
From: nathan@acm.org (Nathan Sidwell)
Date: Thu, 03 May 2018 17:57:00 -0000
Subject: gcov and initialized data
In-Reply-To: <63F1AEE13FAE864586D589C671A6E18B10FE1091@MX203CL03.corp.emc.com>
References: <63F1AEE13FAE864586D589C671A6E18B10FE1091@MX203CL03.corp.emc.com>
Message-ID: <05d38325-59f4-4b83-8658-49b6808647b8@acm.org>

On 05/03/2018 01:09 PM, taylor, david wrote:
> When you build your program for code coverage (-ftest-coverage -fprofile-arcs),
> GCC creates some initialized read-write GCOV-related data. Has anyone modified
> GCC, presumably either under control of a command-line option or possibly a
> configure-time option, to initialize such data at run-time instead of compile-time?

How is this distinct from having to support regular C code such as:

  int x = 5;

?
(I'm guessing the simplest solution would be to post-process the statically linked image to do some kind of run-length compression on its .data section and squirrel that away somewhere that a decompressor can find it early on.)

nathan

--
Nathan Sidwell

From toon@moene.org Thu May 3 18:43:00 2018
From: toon@moene.org (Toon Moene)
Date: Thu, 03 May 2018 18:43:00 -0000
Subject: Interesting statistics on vectorization for Skylake avx512 (i9-7900) - 8.1 vs. 7.3.
Message-ID: <0b14a1db-f90c-66f4-420c-954a354d2f82@moene.org>

Consider the attached Fortran code (the most expensive routine, computation-wise, in our weather forecasting model).

verint.s.7.3 is the result of:

gfortran -g -O3 -S -march=native -mtune=native verint.f

using release 7.3.

verint.s.8.1 is the result of:

gfortran -g -O3 -S -march=native -mtune=native verint.f

using the recently released GCC 8.1.

$ wc -l verint.s.7.3 verint.s.8.1
 7818 verint.s.7.3
 6087 verint.s.8.1

$ grep vfma verint.s.7.3 | wc -l
381
$ grep vfma verint.s.8.1 | wc -l
254

but:

$ grep vfma verint.s.7.3 | grep -v ss | wc -l
127
$ grep vfma verint.s.8.1 | grep -v ss | wc -l
127

and:

$ grep movaps verint.s.7.3 | wc -l
306
$ grep movaps verint.s.8.1 | wc -l
270

Finally:

$ grep zmm verint.s.7.3 | wc -l
1494
$ grep zmm verint.s.8.1 | wc -l
0
$ grep ymm verint.s.7.3 | wc -l
379
$ grep ymm verint.s.8.1 | wc -l
1464

I haven't had the opportunity to test this for speed (it is quite complicated, as I have to build several support libraries with 8.1, like openmpi, netcdf, hdf{4|5}, fftw ...)

--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news

-------------- next part --------------
A non-text attachment was scrubbed...
Name: verint.f Type: text/x-fortran Size: 12959 bytes Desc: not available URL: From David.Taylor@dell.com Thu May 3 18:47:00 2018 From: David.Taylor@dell.com (taylor, david) Date: Thu, 03 May 2018 18:47:00 -0000 Subject: gcov and initialized data In-Reply-To: <05d38325-59f4-4b83-8658-49b6808647b8@acm.org> References: <63F1AEE13FAE864586D589C671A6E18B10FE1091@MX203CL03.corp.emc.com> <05d38325-59f4-4b83-8658-49b6808647b8@acm.org> Message-ID: <63F1AEE13FAE864586D589C671A6E18B10FE110A@MX203CL03.corp.emc.com> > From: Nathan Sidwell [mailto:nathanmsidwell@gmail.com] On Behalf Of > Nathan Sidwell > Sent: Thursday, May 3, 2018 1:58 PM > To: taylor, david; gcc@gcc.gnu.org > Subject: Re: gcov and initialized data > > On 05/03/2018 01:09 PM, taylor, david wrote: > > > When you build your program for code coverage (-ftest-coverage > > -fprofile-arcs), GCC creates some initialized read-write GCOV related > > data. Has anyone modified GCC to, presumably either under control of > > a command line option or possibly a configure time option, to initialize such > data at run-time instead of compile-time? > > How is this distinct to having to support regular C code such as: > > int x = 5; > > ? (I'm guessing the simplest solution would be to post-process the statically > linked image to do some kind of run-length compression on its .data section > and squirrel that away somewhere that a decompressor can find it early on) There's a linker script. It sets the size of the .data section to 0. Any attempt to use initialized read-write data overflows the .data section and fails the build. 
> nathan
>
> --
> Nathan Sidwell

From gccadmin@gcc.gnu.org Thu May 3 22:40:00 2018
From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org)
Date: Thu, 03 May 2018 22:40:00 -0000
Subject: gcc-7-20180503 is now available
Message-ID: <20180503224009.54657.qmail@sourceware.org>

Snapshot gcc-7-20180503 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20180503/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch revision 259911

You'll find:

 gcc-7-20180503.tar.xz  Complete GCC
   SHA256=2c7c10ee96986e919c29ffa5475b305945d80ee8ee39b3ddf9232de06607ffd1
   SHA1=ebf856d64f84bc8efe2ee48225f5591ee1d98984

Diffs from 7-20180426 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.

From bert.wesarg@googlemail.com Fri May 4 06:23:00 2018
From: bert.wesarg@googlemail.com (Bert Wesarg)
Date: Fri, 04 May 2018 06:23:00 -0000
Subject: GCC 8.1 Released
In-Reply-To: <20180502121524.GT8577@tucnak>
References: <20180502121524.GT8577@tucnak>
Message-ID:

Hi,

On Wed, May 2, 2018 at 2:15 PM, Jakub Jelinek wrote:
> Some code that compiled successfully with older GCC versions might require
> source changes, see http://gcc.gnu.org/gcc-8/porting_to.html for
> details.

in "Fortran language issues" it reads: "Prior to GCC 7"; shouldn't that be "Prior to GCC 8" or "Up to GCC 7"?

And can somebody tell me whether this Fortran issue also affects Fortran code which calls C functions? Thanks.
Best,
Bert

From blomqvist.janne@gmail.com Fri May 4 06:31:00 2018
From: blomqvist.janne@gmail.com (Janne Blomqvist)
Date: Fri, 04 May 2018 06:31:00 -0000
Subject: GCC 8.1 Released
In-Reply-To: References: <20180502121524.GT8577@tucnak>
Message-ID:

On Fri, May 4, 2018 at 9:22 AM, Bert Wesarg wrote:
> Hi,
>
> On Wed, May 2, 2018 at 2:15 PM, Jakub Jelinek wrote:
> > Some code that compiled successfully with older GCC versions might require
> > source changes, see http://gcc.gnu.org/gcc-8/porting_to.html for
> > details.
>
> in "Fortran language issues" it reads: "Prior to GCC 7"; shouldn't
> that be "Prior to GCC 8" or "Up to GCC 7"?

Yes, indeed it should. Thanks for noticing.

> And can somebody tell me whether this Fortran issue also affects
> Fortran code which calls C functions?

If it's a "normal" C function with NULL-terminated strings, then no. If it's a C function which is designed to follow the Fortran procedure ABI (where strings are passed as a pointer + a hidden length argument), then yes.

--
Janne Blomqvist

From richard.guenther@gmail.com Fri May 4 07:14:00 2018
From: richard.guenther@gmail.com (Richard Biener)
Date: Fri, 04 May 2018 07:14:00 -0000
Subject: Interesting statistics on vectorization for Skylake avx512 (i9-7900) - 8.1 vs. 7.3.
In-Reply-To: <0b14a1db-f90c-66f4-420c-954a354d2f82@moene.org>
References: <0b14a1db-f90c-66f4-420c-954a354d2f82@moene.org>
Message-ID:

On Thu, May 3, 2018 at 8:43 PM, Toon Moene wrote:
> Consider the attached Fortran code (the most expensive routine,
> computation-wise, in our weather forecasting model).
>
> verint.s.7.3 is the result of:
>
> gfortran -g -O3 -S -march=native -mtune=native verint.f
>
> using release 7.3.
>
> verint.s.8.1 is the result of:
>
> gfortran -g -O3 -S -march=native -mtune=native verint.f
>
> using the recently released GCC 8.1.
>
> $ wc -l verint.s.7.3 verint.s.8.1
> 7818 verint.s.7.3
> 6087 verint.s.8.1
>
> $ grep vfma verint.s.7.3 | wc -l
> 381
> $ grep vfma verint.s.8.1 | wc -l
> 254
>
> but:
>
> $ grep vfma verint.s.7.3 | grep -v ss | wc -l
> 127
> $ grep vfma verint.s.8.1 | grep -v ss | wc -l
> 127
>
> and:
>
> $ grep movaps verint.s.7.3 | wc -l
> 306
> $ grep movaps verint.s.8.1 | wc -l
> 270
>
> Finally:
>
> $ grep zmm verint.s.7.3 | wc -l
> 1494
> $ grep zmm verint.s.8.1 | wc -l
> 0
> $ grep ymm verint.s.7.3 | wc -l
> 379
> $ grep ymm verint.s.8.1 | wc -l
> 1464
>
> I haven't had the opportunity to test this for speed (it is quite complicated,
> as I have to build several support libraries with 8.1, like openmpi, netcdf,
> hdf{4|5}, fftw ...)

GCC 8 has changes to prefer AVX256 by default for Skylake-avx512, even with AVX512 available. You can change that with -mprefer-vector-width=512 or by changing the avx256_optimal tune via -mtune-ctrl=^avx256_optimal

There are now also measures in place to avoid fma in certain situations where it doesn't help performance.

So - performance measurements would be nice to have ;)

Richard.

> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
> Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Thank You Sir.>>>> {?}Copyright2047ICC.cc3.* From umesh.kalappa0@gmail.com Fri May 4 12:50:00 2018 From: umesh.kalappa0@gmail.com (Umesh Kalappa) Date: Fri, 04 May 2018 12:50:00 -0000 Subject: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction In-Reply-To: References: Message-ID: Hi Alex, Agreed that float division doesn't touch memory, but the fdiv result (stack register) is stored back to memory, i.e. fResult. So the compiler barrier in the inline asm, i.e. the ::"memory" clobber, should prevent instructions like "fstps fResult(%rip)" from sinking below the fence? BTW, if we make fDivident and fResult = 0.0f globals, the code emitted looks OK, i.e. #gcc -S test.c -O3 -mmmx -mno-sse flds .LC0(%rip) fsts fDivident(%rip) fdivs .LC1(%rip) fstps fResult(%rip) #APP # 10 "test.c" 1 mfence # 0 "" 2 #NO_APP flds fResult(%rip) movl $.LC2, %edi xorl %eax, %eax fstpl (%rsp) call printf So I strongly believe that it's a compiler issue; please feel free to correct me in any case. Thank you, and waiting for your reply. ~Umesh On Fri, Apr 13, 2018 at 5:58 PM, Alexander Monakov wrote: > On Fri, 13 Apr 2018, Vivek Kinhekar wrote: >> The mfence instruction with memory clobber asm instruction should create a >> barrier between division and printf instructions. > > No, floating-point division does not touch memory, so the asm does not (and > need not) restrict its motion. > > Alexander From dmalcolm@redhat.com Fri May 4 20:47:00 2018 From: dmalcolm@redhat.com (David Malcolm) Date: Fri, 04 May 2018 20:47:00 -0000 Subject: ANN: gcc-python-plugin 0.16 Message-ID: <1525466854.2961.34.camel@redhat.com> gcc-python-plugin is a plugin for GCC 4.6 onwards which embeds the CPython interpreter within GCC, allowing you to write new compiler warnings in Python, generate code visualizations, etc. This release adds support for gcc 7 and gcc 8 (along with continued support for gcc 4.6, 4.7, 4.8, 4.9, 5 and 6).
The upstream location for the plugin has moved from fedorahosted.org to https://github.com/davidmalcolm/gcc-python-plugin Additionally, this release contains the following improvements: * add gcc.RichLocation for GCC 6 onwards * gcc.Location * add caret, start, finish attributes for GCC 7 onwards * add gcc.Location.offset_column() method Tarball releases are available at: https://github.com/davidmalcolm/gcc-python-plugin/releases Prebuilt-documentation can be seen at: http://gcc-python-plugin.readthedocs.org/en/latest/index.html The plugin and checker are Free Software, licensed under the GPLv3 or later. Enjoy! Dave Malcolm From gccadmin@gcc.gnu.org Fri May 4 22:41:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Fri, 04 May 2018 22:41:00 -0000 Subject: gcc-8-20180504 is now available Message-ID: <20180504224106.45651.qmail@sourceware.org> Snapshot gcc-8-20180504 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/8-20180504/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 8 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-8-branch revision 259960 You'll find: gcc-8-20180504.tar.xz Complete GCC SHA256=b49b674524449c999c0966271c2fc4488a2db8cec8d65e78ba6665408577f572 SHA1=9b4f388d4c8f58d0a4fcfe888a7bc8ca86679d39 Diffs from 8-20180427 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-8 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. From reply5@oddcoll.com Sat May 5 00:15:00 2018 From: reply5@oddcoll.com (Debt Recovery) Date: Sat, 05 May 2018 00:15:00 -0000 Subject: {Alerting You} Message-ID: Hello, I am very sorry I have to reach you through this medium. I am a member of the European Debt Recovery Unit and I am aware of your ordeal about your unpaid fund. 
It may also interest you to know that, not long after the Debt Management Office (DMO) completed the merger and acquisition process of all pending payments in response to the petition raised by the international community about their unpaid funds. I discovered that their boss connived with some top officials to divert funds approve to settle unpaid inheritances, email lottery winners, Internet scam victims, unclaimed consignments(concealed funds) and International Contractors. The DMO has already given approval for the payment of your fund but they deliberately withheld your payment file and continue to demand fees from you through their associates from different unassigned affiliates mostly from Africa, US and the Netherlands all in an attempt to frustrate you and enrich themselves. I wonder why you haven?t notice all these while. You may choose to disbelieve this email as inconceivable facts but my doctrine does not permit such act, reason I have to open up to you to seek the right channel. Your fund was authorized to be paid to you through the DMO asset management firm with a Claim Code Numbers, which was supposed to have been issued to you before now. Upon your response, I shall guide you through and provide you with details to contact the assigned affiliate who will expeditiously facilitate the release of your fund. Thanks and have a wonderful day. Yours Faithfully, Administrative Staff. European Debt Recovery Agent, UK.Ref:EDRA/290318/UK03. ************************************************************************************ This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. 
************************************************************************************ From mutazilah@gmail.com Sat May 5 09:01:00 2018 From: mutazilah@gmail.com (Paul Edwards) Date: Sat, 05 May 2018 09:01:00 -0000 Subject: i370 - negative indexes Message-ID: <0AF412897A1E401AAC2DF01FB1BA1CAE@DESKTOP57VDCOT> Hi. On the i370 port of GCC 3.2.3, I am getting the following issue. This code: C:\scratch\bug>type bug.c #include <stdio.h> const char *p; void foo(void) { printf("p-1 is %x\n", p[-1]); } generates: ... L 2,=F'-1' ... IC 4,0(2,3) i.e. it is using a value of x'FFFFFFFF' in R2 as an index. This works fine in AM24 and AM31 environments, but fails for AM64 where an address above 4 GiB is computed. Such code is very rare, so I would like to just have a rule that the index must always be a positive value, and for negative indexes like the above, different code is generated to do an actual subtract instead of trying to do everything via the index. Any idea how to achieve that? I can't see anywhere in i370.md where I can put some sort of constraint. Note that I am producing 32-bit modules, but setting them to AM64 so that they can use the full 4 GiB on certain environments (like MVS/380), instead of having a 2 GiB limit. Thanks. Paul. --- This email has been checked for viruses by AVG. http://www.avg.com From xiahan@tju.edu.cn Sat May 5 12:13:00 2018 From: xiahan@tju.edu.cn (=?UTF-8?B?5aSP5pmX?=) Date: Sat, 05 May 2018 12:13:00 -0000 Subject: =?UTF-8?B?44CQR0NDIHZlcnNpb24gY2FuIG5vdCBiZSBjaGFuZ2Vk44CR?= Message-ID: root@Xia-Ubuntu:/usr/bin# gcc -v Using built-in specs. COLLECT_GCC=gcc Target: x86_64-pc-linux-gnu Configured with: ../configure -enable-checking=release -enable-languages=c,c++ -disable-multilib Thread model: posix gcc version 6.2.0 (GCC) I have tried many methods like 'ln' and priority changing, but 'gcc -v' still reports '6.2.0'.......
From carlhansen1234@gmail.com Sat May 5 20:47:00 2018 From: carlhansen1234@gmail.com (carl hansen) Date: Sat, 05 May 2018 20:47:00 -0000 Subject: =?UTF-8?Q?Re=3A_=E3=80=90GCC_version_can_not_be_changed=E3=80=91?= In-Reply-To: References: Message-ID: On Sat, May 5, 2018 at 5:13 AM, Xia Han wrote: > root@Xia-Ubuntu:/usr/bin# gcc -v > Using built-in specs. > COLLECT_GCC=gcc > Target: x86_64-pc-linux-gnu > Configured with: ../configure -enable-checking=release -enable-languages=c,c++ > -disable-multilib > Thread model: posix > gcc version 6.2.0 (GCC) > I have tried many methods like 'ln' and priority changing, but 'gcc -v' > still reports '6.2.0'..... Perhaps `which -a gcc` will provide a clue? From euloanty@live.com Sun May 6 00:34:00 2018 From: euloanty@live.com (sotrdg sotrdg) Date: Sun, 06 May 2018 00:34:00 -0000 Subject: random_device implementation Message-ID: https://github.com/euloanty/mingw-std-random_device/blob/master/random_device_gcc_withcxx11abi/random.cc Sent from Mail for Windows 10 From lh_mouse@126.com Sun May 6 08:11:00 2018 From: lh_mouse@126.com (Liu Hao) Date: Sun, 06 May 2018 08:11:00 -0000 Subject: =?UTF-8?Q?Re:_=e3=80=90GCC_version_can_not_be_changed=e3=80=91?= In-Reply-To: References: Message-ID: <73990691-d312-f038-2578-cc9433b5bead@126.com> On 2018/5/5 20:13, Xia Han wrote: > root@Xia-Ubuntu:/usr/bin# gcc -v > Using built-in specs. > COLLECT_GCC=gcc > Target: x86_64-pc-linux-gnu > Configured with: ../configure -enable-checking=release -enable-languages=c,c++ -disable-multilib > Thread model: posix > gcc version 6.2.0 (GCC) > I have tried many methods like 'ln' and priority changing, but 'gcc -v' still reports '6.2.0'....... > If you are using Ubuntu, the command `gcc` is a symlink to whichever version is selected by your Ubuntu release and is the one used to build all system packages. Consequently, using a different target might result in binary incompatibility and is not recommended. If you would like to invoke a different version of GCC, append the version number to it.
This is true for all official releases and PPA packages. For example, to invoke GCC 7 explicitly, you have to ensure it is installed by running `sudo apt-get install gcc-7`. The command `gcc-7` will be available thereafter and can be invoked either directly or indirectly by setting the `CC` environment variable. -- Best regards, LH_Mouse From rainer@emrich-ebersheim.de Sun May 6 15:44:00 2018 From: rainer@emrich-ebersheim.de (Rainer Emrich) Date: Sun, 06 May 2018 15:44:00 -0000 Subject: Successful bootsrap of gcc 8.1.0 on x86_64-w64-mingw32 Message-ID: <4f2fea67-c753-d6f6-5747-52d21814871d@emrich-ebersheim.de> Bootstrap is done with msys2 on Windows 7. For the testsuite results see https://gcc.gnu.org/ml/gcc-testresults/2018-05/msg00583.html -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From gccadmin@gcc.gnu.org Sun May 6 22:41:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Sun, 06 May 2018 22:41:00 -0000 Subject: gcc-9-20180506 is now available Message-ID: <20180506224137.52405.qmail@sourceware.org> Snapshot gcc-9-20180506 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/9-20180506/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 9 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 259982 You'll find: gcc-9-20180506.tar.xz Complete GCC SHA256=dde70aaeb5569e422245051e4d3975e8dcc5a5ea8d0ee6f742dad4021908a7b6 SHA1=f4ce8a1c911af280366828e1dcf93112eac7664f Diffs from 9-20180429 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-9 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. 
From umesh.kalappa0@gmail.com Mon May 7 08:28:00 2018 From: umesh.kalappa0@gmail.com (Umesh Kalappa) Date: Mon, 07 May 2018 08:28:00 -0000 Subject: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction Message-ID: CCed Jakub, > Hi Alex, > Agreed that float division doesn't touch memory, but the fdiv result (stack > register) is stored back to memory, i.e. fResult. > > So the compiler barrier in the inline asm, i.e. the ::"memory" clobber, should prevent > instructions like "fstps fResult(%rip)" from sinking below the > fence? > > BTW, if we make fDivident and fResult = 0.0f globals, the code > emitted looks OK, i.e. > #gcc -S test.c -O3 -mmmx -mno-sse > > flds .LC0(%rip) > fsts fDivident(%rip) > fdivs .LC1(%rip) > fstps fResult(%rip) > #APP > # 10 "test.c" 1 > mfence > # 0 "" 2 > #NO_APP > flds fResult(%rip) > movl $.LC2, %edi > xorl %eax, %eax > fstpl (%rsp) > call printf > > So I strongly believe that it's a compiler issue; please feel free to > correct me in any case. > > Thank you, and waiting for your reply. > > ~Umesh > > > > > On Fri, Apr 13, 2018 at 5:58 PM, Alexander Monakov wrote: >> On Fri, 13 Apr 2018, Vivek Kinhekar wrote: >>> The mfence instruction with memory clobber asm instruction should create a >>> barrier between division and printf instructions. >> >> No, floating-point division does not touch memory, so the asm does not (and >> need not) restrict its motion. >> >> Alexander From jakub@redhat.com Mon May 7 08:38:00 2018 From: jakub@redhat.com (Jakub Jelinek) Date: Mon, 07 May 2018 08:38:00 -0000 Subject: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction In-Reply-To: References: Message-ID: <20180507083830.GF8577@tucnak> On Mon, May 07, 2018 at 01:58:48PM +0530, Umesh Kalappa wrote: > CCed Jakub, > > Agreed that float division doesn't touch memory, but the fdiv result (stack > > register) is stored back to memory, i.e. fResult. That doesn't really matter.
It is stored to a stack spill slot, something that doesn't have its address taken and that other code (e.g. in other threads) can't access in a valid program. That is not considered memory for the inline-asm; only objects that must live in memory count. Jakub From jwakely.gcc@gmail.com Mon May 7 12:05:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Mon, 07 May 2018 12:05:00 -0000 Subject: Successful bootstrap of gcc 8.1.0 on x86_64-w64-mingw32 In-Reply-To: <4f2fea67-c753-d6f6-5747-52d21814871d@emrich-ebersheim.de> References: <4f2fea67-c753-d6f6-5747-52d21814871d@emrich-ebersheim.de> Message-ID: On 6 May 2018 at 16:44, Rainer Emrich wrote: > Bootstrap is done with msys2 on Windows 7. For the testsuite results see > https://gcc.gnu.org/ml/gcc-testresults/2018-05/msg00583.html Thanks for this. Would you be able to send me the $objdir/$target/libstdc++-v3/testsuite/libstdc++.log file? From leslie.poulin@worldonlinetech.com Mon May 7 13:39:00 2018 From: leslie.poulin@worldonlinetech.com (Leslie Poulin) Date: Mon, 07 May 2018 13:39:00 -0000 Subject: Updated ATLAS Email List Message-ID: Hi, Hope you having a great day! I just wanted to be aware if you would be interested in acquiring ATLAS Users Contact List for marketing your product or service. These are the fields that we provide for each contacts: Names, Title, Email, Contact Number, Company Name, Company URL, and Company physical location, SIC Code, Industry and Company Size (Revenue and Employee). Kindly review and let me be aware of your interest so that I can get back to you with the exact counts and more info regarding the same. Do let me be aware if you have any questions for me. Regards, Leslie Database Executive If you do not wish to receive these emails. Please respond Exit.
From indu.bhagat@oracle.com Mon May 7 17:03:00 2018 From: indu.bhagat@oracle.com (Indu Bhagat) Date: Mon, 07 May 2018 17:03:00 -0000 Subject: fminnm/fmaxnm generation in aarch64 In-Reply-To: <4067c389-edc1-3858-e52c-8b9f167316a7@oracle.com> References: <4067c389-edc1-3858-e52c-8b9f167316a7@oracle.com> Message-ID: <7b6269f0-2daa-4335-e854-2685313da9c7@oracle.com> [Trying to get some feedback. I earlier posted on gcc-help a week ago] In tree.def - /* Minimum and maximum values. When used with floating point, if both operands are zeros, or if either operand is NaN, then it is unspecified which of the two operands is returned as the result. */ DEFTREECODE (MIN_EXPR, "min_expr", tcc_binary, 2) DEFTREECODE (MAX_EXPR, "max_expr", tcc_binary, 2) I see that the compiler cannot simplify an expression like ((a < b) ? a : b) into MIN_EXPR without relaxed floating point semantics (-ffinite-math-only -fno-signed-zeros flags). Q1: It is not clear to me what is the fundamental reason of the "unspecified behaviour" of MIN_EXPR/MAX_EXPR in case of floating point operands? (For the sake of discussing what I write hereafter, assume that fminnm/fmaxnm instructions offer better performance than fcsel/fcmp). So, two further questions: Q2. If one wants the compiler to generate fminnm/fmaxnm instructions, while conforming with IEEE standard, the way to do that will be to use math builtins fmin()/fmax(). Is this correct understanding? Q3. What will it take for the compiler to transform a high-level language floating point construct like ((a < b) ? a : b) into fminnm/fmaxnm on aarch64 targets? From aph@redhat.com Tue May 8 07:57:00 2018 From: aph@redhat.com (Andrew Haley) Date: Tue, 08 May 2018 07:57:00 -0000 Subject: fminnm/fmaxnm generation in aarch64 In-Reply-To: <7b6269f0-2daa-4335-e854-2685313da9c7@oracle.com> References: <4067c389-edc1-3858-e52c-8b9f167316a7@oracle.com> <7b6269f0-2daa-4335-e854-2685313da9c7@oracle.com> Message-ID: On 07/05/18 18:08, Indu Bhagat wrote: > [Trying to get some feedback. I earlier posted on gcc-help a week ago] > > In tree.def - > > /* Minimum and maximum values. When used with floating point, if both > operands are zeros, or if either operand is NaN, then it is unspecified > which of the two operands is returned as the result. */ > DEFTREECODE (MIN_EXPR, "min_expr", tcc_binary, 2) > DEFTREECODE (MAX_EXPR, "max_expr", tcc_binary, 2) > > I see that the compiler cannot simplify an expression like > ((a < b) ? a : b) into MIN_EXPR without relaxed floating point semantics > (-ffinite-math-only -fno-signed-zeros flags). > > Q1: It is not clear to me what is the fundamental reason of the > "unspecified behaviour" of MIN_EXPR/MAX_EXPR in case of floating point > operands ?
> > (For the sake of discussing what I write hereafter, assume that > fminnm/fmaxnm instructions offer better performance than fcsel/fcmp). So, two > further questions: > > Q2. If one wants the compiler to generate fminnm/fmaxnm instructions, while > conforming with IEEE standard, the way to do that will be to use math > builtins fmin()/fmax(). Is this correct understanding? Yes. > Q3. What will it take for the compiler to transform a high-level language > floating point construct like ((a < b) ? a : b) into fminnm/fmaxnm on > aarch64 targets? You'd have to use -ffast-math or -ffinite-math-only. The meaning of the expression ((a < b) ? a : b) is precisely defined even when an operand is a NaN, so it cannot be mapped to fminnm/fmaxnm, which have different NaN semantics, without those flags. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From fweimer@redhat.com Tue May 8 11:36:00 2018 From: fweimer@redhat.com (Florian Weimer) Date: Tue, 08 May 2018 11:36:00 -0000 Subject: [RFC] Deprecate "implicit int" for main() in C++ In-Reply-To: <20180425144008.GU20930@redhat.com> References: <20180425122305.GS20930@redhat.com> <7039f928-e50b-1f75-4f71-70fda5873ab0@redhat.com> <20180425144008.GU20930@redhat.com> Message-ID: <1eda6680-f574-7637-42dd-4309dacb012e@redhat.com> On 04/25/2018 04:40 PM, Jonathan Wakely wrote: > More concretely, deprecating it for a few releases would allow us to > apply the attached patch at some point in the future, so that instead > of: > > rt.c:1:6: warning: ISO C++ forbids declaration of 'main' with no type > [-Wreturn-type] > main() { return 0; } > ^ > > We'd get: > > rt.c:1:6: error: ISO C++ forbids declaration of 'main' with no type > [-fpermissive] > main() { return 0; } > ^ I wonder if it's currently a warning because the implicit int is used in configure checks. If this is the case, maybe we cannot make it an error without altering the result of configure tests?
Thanks, Florian From jwakely.gcc@gmail.com Tue May 8 11:39:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Tue, 08 May 2018 11:39:00 -0000 Subject: [RFC] Deprecate "implicit int" for main() in C++ In-Reply-To: <1eda6680-f574-7637-42dd-4309dacb012e@redhat.com> References: <20180425122305.GS20930@redhat.com> <7039f928-e50b-1f75-4f71-70fda5873ab0@redhat.com> <20180425144008.GU20930@redhat.com> <1eda6680-f574-7637-42dd-4309dacb012e@redhat.com> Message-ID: On 8 May 2018 at 12:35, Florian Weimer wrote: > On 04/25/2018 04:40 PM, Jonathan Wakely wrote: >> >> More concretely, deprecating it for a few releases would allow us to >> apply the attached patch at some point in the future, so that instead >> of: >> >> rt.c:1:6: warning: ISO C++ forbids declaration of 'main' with no type >> [-Wreturn-type] >> main() { return 0; } >> ^ >> >> We'd get: >> >> rt.c:1:6: error: ISO C++ forbids declaration of 'main' with no type >> [-fpermissive] >> main() { return 0; } >> ^ > > > I wonder if it's currently a warning because the implicit int is used in > configure checks. If this is the case, maybe we cannot make it an error > without altering the result of configure tests? Sigh, you're probably right. Since GCC 8.1 any such configure tests will get a warning (or an error with -Werror) so maybe they'll eventually get fixed. Jason already expressed a preference for not making the change anyway.
From schwab@suse.de Tue May 8 12:02:00 2018 From: schwab@suse.de (Andreas Schwab) Date: Tue, 08 May 2018 12:02:00 -0000 Subject: [RFC] Deprecate "implicit int" for main() in C++ In-Reply-To: <1eda6680-f574-7637-42dd-4309dacb012e@redhat.com> (Florian Weimer's message of "Tue, 8 May 2018 13:35:57 +0200") References: <20180425122305.GS20930@redhat.com> <7039f928-e50b-1f75-4f71-70fda5873ab0@redhat.com> <20180425144008.GU20930@redhat.com> <1eda6680-f574-7637-42dd-4309dacb012e@redhat.com> Message-ID: On May 08 2018, Florian Weimer wrote: > I wonder if it's currently a warning because the implicit int is used in > configure checks. Is it? At least in the GCC sources I couldn't find any occurrence of main without a preceding int. Andreas. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." From sbergman@redhat.com Tue May 8 12:44:00 2018 From: sbergman@redhat.com (Stephan Bergmann) Date: Tue, 08 May 2018 12:44:00 -0000 Subject: libstdc++: ODR violation when using std::regex with and without -D_GLIBCXX_DEBUG Message-ID: <0313d6bf-9a35-18d7-932a-3adc09c064d8@redhat.com> I was recently bitten by the following issue (Linux, libstdc++ 8.0.1): A process loads two dynamic libraries A and B both using std::regex, and A is compiled without -D_GLIBCXX_DEBUG while B is compiled with -D_GLIBCXX_DEBUG. B creates an instance of std::regex, which internally creates a std::shared_ptr>>, where _NFA has various members of std::__debug::vector type (but which isn't reflected in the mangled name of that _NFA instantiation itself). Now, when that instance of std::regex is destroyed again in library B, the std::shared_ptr>>::~shared_ptr destructor (and functions it in turn calls) that happens to get picked is the (inlined, and exported due to default visibility) instance from library A.
And that assumes that that _NFA instantiation has members of non-debug std::vector type, which causes a crash. Should it be considered a bug that such mixture of debug and non-debug std::regex usage causes ODR violations? From jwakely.gcc@gmail.com Tue May 8 13:00:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Tue, 08 May 2018 13:00:00 -0000 Subject: libstdc++: ODR violation when using std::regex with and without -D_GLIBCXX_DEBUG In-Reply-To: <0313d6bf-9a35-18d7-932a-3adc09c064d8@redhat.com> References: <0313d6bf-9a35-18d7-932a-3adc09c064d8@redhat.com> Message-ID: On 8 May 2018 at 13:44, Stephan Bergmann wrote: > I was recently bitten by the following issue (Linux, libstdc++ 8.0.1): A > process loads two dynamic libraries A and B both using std::regex, and A is > compiled without -D_GLIBCXX_DEBUG while B is compiled with -D_GLIBCXX_DEBUG. This is only supported in very restricted cases. > B creates an instance of std::regex, which internally creates a > std::shared_ptr>>, > where _NFA has various members of std::__debug::vector type (but which isn't > reflected in the mangled name of that _NFA instantiation itself). > > Now, when that instance of std::regex is destroyed again in library B, the > std::shared_ptr>>::~shared_ptr > destructor (and functions it in turn calls) that happens to get picked is > the (inlined, and exported due to default visibility) instance from library > A. And that assumes that that _NFA instantiation has members of non-debug > std::vector type, which causes a crash. > > Should it be considered a bug that such mixture of debug and non-debug > std::regex usage causes ODR violations? Yes, but my frank response is "don't do that". The right fix here might be to ensure that _NFA always uses the non-debug vector even in Debug Mode, but I'm fairly certain there are other similar problems lurking. 
From jwakely.gcc@gmail.com Tue May 8 13:00:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Tue, 08 May 2018 13:00:00 -0000 Subject: libstdc++: ODR violation when using std::regex with and without -D_GLIBCXX_DEBUG In-Reply-To: References: <0313d6bf-9a35-18d7-932a-3adc09c064d8@redhat.com> Message-ID: On 8 May 2018 at 14:00, Jonathan Wakely wrote: > On 8 May 2018 at 13:44, Stephan Bergmann wrote: >> I was recently bitten by the following issue (Linux, libstdc++ 8.0.1): A >> process loads two dynamic libraries A and B both using std::regex, and A is >> compiled without -D_GLIBCXX_DEBUG while B is compiled with -D_GLIBCXX_DEBUG. > > This is only supported in very restricted cases. > >> B creates an instance of std::regex, which internally creates a >> std::shared_ptr>>, >> where _NFA has various members of std::__debug::vector type (but which isn't >> reflected in the mangled name of that _NFA instantiation itself). >> >> Now, when that instance of std::regex is destroyed again in library B, the >> std::shared_ptr>>::~shared_ptr >> destructor (and functions it in turn calls) that happens to get picked is >> the (inlined, and exported due to default visibility) instance from library >> A. And that assumes that that _NFA instantiation has members of non-debug >> std::vector type, which causes a crash. >> >> Should it be considered a bug that such mixture of debug and non-debug >> std::regex usage causes ODR violations? > > Yes, but my frank response is "don't do that". > > The right fix here might be to ensure that _NFA always uses the > non-debug vector even in Debug Mode, but I'm fairly certain there are > other similar problems lurking. N.B. I think this discussion belongs on the libstdc++ list. 
From marc.glisse@inria.fr Tue May 8 14:46:00 2018 From: marc.glisse@inria.fr (Marc Glisse) Date: Tue, 08 May 2018 14:46:00 -0000 Subject: libstdc++: ODR violation when using std::regex with and without -D_GLIBCXX_DEBUG In-Reply-To: References: <0313d6bf-9a35-18d7-932a-3adc09c064d8@redhat.com> Message-ID: On Tue, 8 May 2018, Jonathan Wakely wrote: > On 8 May 2018 at 14:00, Jonathan Wakely wrote: >> On 8 May 2018 at 13:44, Stephan Bergmann wrote: >>> I was recently bitten by the following issue (Linux, libstdc++ 8.0.1): A >>> process loads two dynamic libraries A and B both using std::regex, and A is >>> compiled without -D_GLIBCXX_DEBUG while B is compiled with -D_GLIBCXX_DEBUG. >> >> This is only supported in very restricted cases. >> >>> B creates an instance of std::regex, which internally creates a >>> std::shared_ptr>>, >>> where _NFA has various members of std::__debug::vector type (but which isn't >>> reflected in the mangled name of that _NFA instantiation itself). >>> >>> Now, when that instance of std::regex is destroyed again in library B, the >>> std::shared_ptr>>::~shared_ptr >>> destructor (and functions it in turn calls) that happens to get picked is >>> the (inlined, and exported due to default visibility) instance from library >>> A. And that assumes that that _NFA instantiation has members of non-debug >>> std::vector type, which causes a crash. >>> >>> Should it be considered a bug that such mixture of debug and non-debug >>> std::regex usage causes ODR violations? >> >> Yes, but my frank response is "don't do that". >> >> The right fix here might be to ensure that _NFA always uses the >> non-debug vector even in Debug Mode, but I'm fairly certain there are >> other similar problems lurking. > > N.B. I think this discussion belongs on the libstdc++ list. Would it make sense to use the abi_tag attribute to help with that? (I didn't really think about it, maybe it doesn't) "don't do that" remains the most sensible answer. 
-- Marc Glisse From jwakely.gcc@gmail.com Tue May 8 15:18:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Tue, 08 May 2018 15:18:00 -0000 Subject: libstdc++: ODR violation when using std::regex with and without -D_GLIBCXX_DEBUG In-Reply-To: References: <0313d6bf-9a35-18d7-932a-3adc09c064d8@redhat.com> Message-ID: On 8 May 2018 at 15:45, Marc Glisse wrote: > On Tue, 8 May 2018, Jonathan Wakely wrote: > >> On 8 May 2018 at 14:00, Jonathan Wakely wrote: >>> >>> On 8 May 2018 at 13:44, Stephan Bergmann wrote: >>>> >>>> I was recently bitten by the following issue (Linux, libstdc++ 8.0.1): A >>>> process loads two dynamic libraries A and B both using std::regex, and A >>>> is >>>> compiled without -D_GLIBCXX_DEBUG while B is compiled with >>>> -D_GLIBCXX_DEBUG. >>> >>> >>> This is only supported in very restricted cases. >>> >>>> B creates an instance of std::regex, which internally creates a >>>> std::shared_ptr>>, >>>> where _NFA has various members of std::__debug::vector type (but which >>>> isn't >>>> reflected in the mangled name of that _NFA instantiation itself). >>>> >>>> Now, when that instance of std::regex is destroyed again in library B, >>>> the >>>> >>>> std::shared_ptr>>::~shared_ptr >>>> destructor (and functions it in turn calls) that happens to get picked >>>> is >>>> the (inlined, and exported due to default visibility) instance from >>>> library >>>> A. And that assumes that that _NFA instantiation has members of >>>> non-debug >>>> std::vector type, which causes a crash. >>>> >>>> Should it be considered a bug that such mixture of debug and non-debug >>>> std::regex usage causes ODR violations? >>> >>> >>> Yes, but my frank response is "don't do that". >>> >>> The right fix here might be to ensure that _NFA always uses the >>> non-debug vector even in Debug Mode, but I'm fairly certain there are >>> other similar problems lurking. >> >> >> N.B. I think this discussion belongs on the libstdc++ list. 
> > > Would it make sense to use the abi_tag attribute to help with that? (I > didn't really think about it, maybe it doesn't) Yes, I think we could add it conditionally in debug mode, so that types with members that are either std::xxx or __gnu_debug::xxx get a different mangled name in debug mode. For the regex _NFA type I don't think we want the debug mode checking, because users can't access it directly so any errors are in the libstdc++ implementation and we should have eliminated them ourselves, not be pushing detection of those logic errors into users' programs. For std::match_results (which derives from std::vector) it's possible for users to use invalid iterators obtained from a match_results, so Debug Mode can help. In that case we could decide whether to add the abi_tag, or always derive from _GLIBCXX_STD_C::vector (i.e. the non-debug mode one), or even provide an entire __gnu_debug::match_results type. > "don't do that" remains the most sensible answer. Yes, it's just asking for trouble. From info@gnusa.id Tue May 8 20:02:00 2018 From: info@gnusa.id (Mr. James Chong) Date: Tue, 08 May 2018 20:02:00 -0000 Subject: Partnership,gcc@gcc.gnu.org Message-ID: I Contacted you direct to your email, this your email,gcc@gcc.gnu.org? How are you and your family?I am MrJames Chong,I will like to discuss something very important with you because I 'm looking for a reliable and trustworthy businessman/individual to handle a lucrative investment in your country.If you are interested and can handle any project reply for more detail From joseph@codesourcery.com Tue May 8 21:22:00 2018 From: joseph@codesourcery.com (Joseph Myers) Date: Tue, 08 May 2018 21:22:00 -0000 Subject: fminnm/fmaxnm generation in aarch64 In-Reply-To: <7b6269f0-2daa-4335-e854-2685313da9c7@oracle.com> References: <4067c389-edc1-3858-e52c-8b9f167316a7@oracle.com> <7b6269f0-2daa-4335-e854-2685313da9c7@oracle.com> Message-ID: On Mon, 7 May 2018, Indu Bhagat wrote: > Q2. 
If one wants the compiler to generate fminnm/fmaxnm instructions, while > conforming with IEEE standard, the way to do that will be to use math > builtins fmin()/fmax(). Is this correct understanding? For IEEE 754-2008 minNum / maxNum operations, which those instructions correspond to and fmin and fmax bind to, yes. For IEEE 754-2018 (in progress), there are different minimum / maximum operations, which don't match those AArch64 instructions (but some do match RISC-V instructions), and there are new proposed corresponding C functions such as fmaximum and fminimum_num (I don't know of implementations of those functions). -- Joseph S. Myers joseph@codesourcery.com From msebor@gmail.com Tue May 8 22:24:00 2018 From: msebor@gmail.com (Martin Sebor) Date: Tue, 08 May 2018 22:24:00 -0000 Subject: style of code examples in changes.html In-Reply-To: <1524502548.5688.181.camel@redhat.com> References: <5b1c9631-4dc1-8d44-0863-f2ddedda33e1@gmail.com> <1524502548.5688.181.camel@redhat.com> Message-ID: On 04/23/2018 10:55 AM, David Malcolm wrote: > On Mon, 2018-04-16 at 20:34 -0600, Martin Sebor wrote: >> Hi David & Gerald, > > (sorry for the late response; I was offline on vacation last week) > >> I noticed that the coding examples in the updates I committed >> to changes.html use a different formatting style than David's. >> I just copied mine from GCC 7 changes.html, and those I copied >> from David's for that version :) > > There are at least two kinds of example in the website: > (a) source code examples, and > (b) "screenshots" of gcc output, which can themselves contain code > output as part of a diagnostic. > > I got sick of hand-converting (b) to our HTML tags, so I wrote a script > to do it, which I used for my gcc-8/changes.html. 
> > The script is in the website's CVS repository as: > bin/gcc-color-to-html.py > and can be run like this: > > LANG=C \ > gcc $@ \ > -fdiagnostics-color=always 2>&1 \ > | ./bin/gcc-color-to-html.py > > See > https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00186.html > > I also added a >
>   
> around the output, though this isn't done by the above script. > > I actually had a fair bit more scripting than this, based on the > scripting I did for my blogpost here: > https://github.com/davidmalcolm/gcc-8-blogpost/blob/master/blog.html.in > where lines like: > > INVOKE_GCC unclosed.c > > in a foo.html.in get turned into a "screenshot" of the pertinent gcc > invocation in the foo.html. But given that we don't want to require > running gcc itself to build the website (and indeed, specific gcc > versions), I just used this to generate the patch. > >> Should we make an effort to >> make them all look the same? > > Naturally, for (b), I favor the new style I used :) (using the black > background, which may be enough to get the same look). > > I'm not sure if we want to use it for (a). > >> FWIW, I didn't notice the difference until my changes published. >> I'm guessing that's because the style sheet the page uses isn't >> referenced from the original document and the reference is only >> added by Gerald's script. Is there a simple way to set things >> up so we can see our changes as they will appear when published? > > I've been adding these lines to the of the page: > > > while testing the content. Thanks. I've changed my coding examples to match yours. I did it quickly, by hand, and not by running your scripts. Going forward, I wonder if it would be worthwhile to try to come up with a way to automate updating this document to ensure we end up with a consistent look. The automation could also take care of validating the document. The last time I tried to fix the errors and warnings I got from https://validator.w3.org I ended up breaking things because my changes conflicted with those inserted by the post-processing done by Gerald's scripts on the server side. 
Martin From shihyente@hotmail.com Wed May 9 08:12:00 2018 From: shihyente@hotmail.com (SHIH YEN-TE) Date: Wed, 09 May 2018 08:12:00 -0000 Subject: About Bug 52485 Message-ID: Want to comment on "Bug 52485 - [c++11] add an option to disable c++11 user-defined literals" It's a pity GCC doesn't support this, which forces me to give up introducing newer C++ standard into my project. I know it is ridiculous, but we must know the real world is somehow ridiculous as well as nothing is perfect. From marc.glisse@inria.fr Wed May 9 08:41:00 2018 From: marc.glisse@inria.fr (Marc Glisse) Date: Wed, 09 May 2018 08:41:00 -0000 Subject: About Bug 52485 In-Reply-To: References: Message-ID: On Wed, 9 May 2018, SHIH YEN-TE wrote: > Want to comment on "Bug 52485 - [c++11] add an option to disable c++11 > user-defined literals" > > It's a pity GCC doesn't support this, which forces me to give up > introducing newer C++ standard into my project. I know it is ridiculous, > but we must know the real world is somehow ridiculous as well as nothing > is perfect. You have the wrong approach. Apparently, you are using an unmaintained library (if it was maintained, it would be compatible with C++11 by now), so there is no problem modifying it, especially just to add a few spaces. A single run of clang-tidy would likely fix all of them for you.
-- Marc Glisse From whh8b@virginia.edu Wed May 9 08:42:00 2018 From: whh8b@virginia.edu (Will Hawkins) Date: Wed, 09 May 2018 08:42:00 -0000 Subject: About Bug 52485 In-Reply-To: References: Message-ID: Thanks to your brand new Bugzilla account, you may now comment! :-) You will receive instructions on how to reset your default password and access your account. Please let me know if you have any questions or trouble gaining access. I'd be happy to help in any way that I can! Thanks for contributing to GCC! Will On Wed, May 9, 2018 at 4:08 AM, SHIH YEN-TE wrote: > Want to comment on "Bug 52485 - [c++11] add an option to disable c++11 user-defined literals" > > > It's a pity GCC doesn't support this, which forces me to give up introducing newer C++ standard into my project. I know it is ridiculous, but we must know the real world is somehow ridiculous as well as nothing is perfect. > > From jwakely.gcc@gmail.com Wed May 9 08:58:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Wed, 09 May 2018 08:58:00 -0000 Subject: About Bug 52485 In-Reply-To: References: Message-ID: On 9 May 2018 at 09:08, SHIH YEN-TE wrote: > Want to comment on "Bug 52485 - [c++11] add an option to disable c++11 user-defined literals" > > > It's a pity GCC doesn't support this, which forces me to give up introducing newer C++ standard into my project. Why do you have to give up? > I know it is ridiculous, but we must know the real world is somehow ridiculous as well as nothing is perfect. Which is why GCC will only warn and not error when it sees ill-formed uses of macros following string literals without whitespace. So you should still be able to compile code that isn't compatible with C++11 user-defined literals.
From kyrylo.tkachov@foss.arm.com Wed May 9 16:19:00 2018 From: kyrylo.tkachov@foss.arm.com (Kyrill Tkachov) Date: Wed, 09 May 2018 16:19:00 -0000 Subject: Semantics of SAD_EXPR and usad/ssad optabs Message-ID: <5AF31FA3.3040607@foss.arm.com> Hi all, I'm looking into implementing the usad/ssad optabs for aarch64 to catch code like in PR 85693 and I'm a bit lost with what the midend expects the optabs to produce. The documentation for them says that the addend operand (op 3) is of mode equal or wider than the mode of the product (and consequently of operands 1 and 2) with the result operand 0 being the same mode as operand 3. The x86 implementation for usadv16qi (for example) takes a V16QI vector and returns a V4SI vector. I'm confused as to what is the reduction logic expected by the midend? The PSADBW instruction that x86 uses in that case accumulates the two V8QI halves of the input into two 16-bit values (you don't need any more bits to represent a sum of 8 byte differences I believe): one placed at bit 0, and the other placed at bit 64. The bit ranges [16 - 63] and [80 - 127] are left as zeroes. So it produces a V2DI result in essence. If the input V16QI vectors look like: { a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 } { b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15 } then the result V4SI view (before being added into operand 3) is: { SUM (ABS (a[0-7] - b[0-7])), 0, SUM (ABS (a[8-15] - b[8-15])), 0 } (1) whereas a normal widening reduction of V16QI -> V4SI to me would look more like: { SUM (ABS (a[0-3] - b[0-3])), SUM (ABS (a[4-7] - b[4-7])), SUM (ABS (a[8-11] - b[8-11])), SUM (ABS (a[12-15] - b[12-15])) } (2) My question is, does the vectoriser depend on the semantics of [us]sad producing the result in (1)? If so, do you think it's worth clarifying in the documentation? 
Thanks, Kyrill From ams@codesourcery.com Wed May 9 16:36:00 2018 From: ams@codesourcery.com (Andrew Stubbs) Date: Wed, 09 May 2018 16:36:00 -0000 Subject: AMD GCN port Message-ID: <1997f00f-0390-b0e8-fe8f-ba4fc04dd1d3@codesourcery.com> Honza, Martin, Further to our conversation on IRC ... We have just completed work on a GCN3 & GCN5 port intended for running OpenMP and OpenACC offload kernels on AMD Fiji and Vega discrete GPUs. Unfortunately Carrizo is probably broken because we don't have one to test, and the APUs use shared memory and XNACK, which we've not paid any attention to. There will be a binary release available soon(ish). Apologies the development schedule has made it hard to push the work upstream, but now it is time. I've posted the code to Github for reference: https://github.com/ams-cs/gcc https://github.com/ams-cs/newlib We're using LLVM 6 for the assembler and linker; there's no binutils port. It should be possible to build a "standalone" amdgcn-none-amdhsa compiler that can run code via the included "gcn-run" loader tool (and the HSA runtime). This can be used to run the testsuite, with a little dejagnu config trickery. It should also be possible to build an x86_64-linux-gnu compiler with --enable-offload-target=gcn, and a matching amdgcn-none-amdhsa compiler with --enable-as-accelerator-for=x86_64-linux-gnu, and have them run code offloaded with OpenMP or OpenACC directives. The code is based on Honza's original port, rebased to GCC 7.3. I'd like to agree an upstreaming strategy that a) gets basic GCN support into trunk soonish. We'll need to get a few middle/front end patches approved, and probably update a few back-end hooks, but this ought to be easy enough. b) gets trunk OpenMP/OpenACC to work for GCN, eventually. I'm expecting some pain in libgomp here. c) gives me a stable base from which to make binary releases (i.e. not trunk). d) allows me to use openacc-gcc-8-branch without too much duplication of effort. 
How about the following strategy?

1. Create "gcn-gcc-7-branch" to archive the current work. This would be a source for merges (or cherry-picking), but I'd not expect much future development. Initially it would have the same content as the Github repository above.

2. Create "gcn-gcc-8-branch" with a merger of "gcc-8-branch" and "gcn-gcc-7-branch". This would be broken w.r.t. libgomp, initially, but get fixed up in time. It would receive occasional merges from the release branch. I expect to do GCN back-end development work here.

3. Create "gcn-openacc-gcc-8-branch" from the new "gcn-gcc-8-branch", and merge in "openacc-gcc-8-branch". This will hold offloading patches not compatible with trunk, and receive updated GCN changes via merge. I intend to deliver my next binary release from this branch.

4. Replace/update the existing "gcn" branch with a merger of "trunk" and "gcn-gcc-8-branch" (not the OpenACC branch). This would be merged to trunk, and possibly retired, as soon as possible. I imagine bits will have to be submitted as patches, and then the back-end merged as a whole.

   trunk
   |\
   | gcc-7-branch
   | |\
   | : gcn-gcc-7-branch
   |\      '--------.
   | gcc-8-branch    \
   | |\     '------------.
   | | openacc-gcc-8-branch   gcn-gcc-8-branch
   | |        \               /
   | |      gcn-openacc-8-branch
   |\      ,------------------'
   | gcn
   |/
   gcc-9

It's slightly complex to describe, but hopefully logical and workable. Comments? Better suggestions? -- Andrew Stubbs Mentor Graphics / CodeSourcery.
From richard.guenther@gmail.com Wed May 9 18:37:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Wed, 09 May 2018 18:37:00 -0000 Subject: Semantics of SAD_EXPR and usad/ssad optabs In-Reply-To: <5AF31FA3.3040607@foss.arm.com> References: <5AF31FA3.3040607@foss.arm.com> Message-ID: On May 9, 2018 6:19:47 PM GMT+02:00, Kyrill Tkachov wrote: >Hi all, > >I'm looking into implementing the usad/ssad optabs for aarch64 to catch >code like in PR 85693 >and I'm a bit lost with what the midend expects the optabs to produce. >The documentation for them says that the addend operand (op 3) is of >mode equal or wider than >the mode of the product (and consequently of operands 1 and 2) with the >result operand 0 being >the same mode as operand 3. > >The x86 implementation for usadv16qi (for example) takes a V16QI vector >and returns a V4SI vector. >I'm confused as to what is the reduction logic expected by the midend? >The PSADBW instruction that x86 uses in that case accumulates the two >V8QI halves of the input into >two 16-bit values (you don't need any more bits to represent a sum of 8 >byte differences I believe): >one placed at bit 0, and the other placed at bit 64. The bit ranges [16 >- 63] and [80 - 127] are left as zeroes. >So it produces a V2DI result in essence. > >If the input V16QI vectors look like: >{ a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 >} >{ b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15 >} > >then the result V4SI view (before being added into operand 3) is: >{ SUM (ABS (a[0-7] - b[0-7])), 0, SUM (ABS (a[8-15] - b[8-15])), 0 } >(1) > >whereas a normal widening reduction of V16QI -> V4SI to me would look >more like: > >{ SUM (ABS (a[0-3] - b[0-3])), SUM (ABS (a[4-7] - b[4-7])), SUM (ABS >(a[8-11] - b[8-11])), SUM (ABS (a[12-15] - b[12-15])) } (2) > >My question is, does the vectoriser depend on the semantics of [us]sad >producing the result in (1)? No, it doesn't. 
It is required that any association of the embedded reduction is correct and thus this requires appropriate - ffast-math flags. Note it's also the reason why we do not implement constant folding of SAD. >If so, do you think it's worth clarifying in the documentation? Probably yes - but I'm not sure the current state of affairs is best... Do other targets implement the same reduction order as x86? Other similar reduction ops have high /low or even /odd variants. But they also do not reduce the outputs. Note DOT_PROD has the very same issue. Richard. >Thanks, >Kyrill From gccadmin@gcc.gnu.org Wed May 9 22:43:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Wed, 09 May 2018 22:43:00 -0000 Subject: gcc-6-20180509 is now available Message-ID: <20180509224259.98193.qmail@sourceware.org> Snapshot gcc-6-20180509 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/6-20180509/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 6 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch revision 260099 You'll find: gcc-6-20180509.tar.xz Complete GCC SHA256=9bce0a94d7eeb0922ff4201ad51e45d30dd012b380f608822862f22fc48f289d SHA1=8e363c89969b8726b0185014b5362060fb5875a8 Diffs from 6-20180502 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-6 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. 
From kyrylo.tkachov@foss.arm.com Thu May 10 08:53:00 2018 From: kyrylo.tkachov@foss.arm.com (Kyrill Tkachov) Date: Thu, 10 May 2018 08:53:00 -0000 Subject: Semantics of SAD_EXPR and usad/ssad optabs In-Reply-To: References: <5AF31FA3.3040607@foss.arm.com> Message-ID: <5AF4087F.6090403@foss.arm.com> Hi Richard, On 09/05/18 19:37, Richard Biener wrote: > On May 9, 2018 6:19:47 PM GMT+02:00, Kyrill Tkachov wrote: >> Hi all, >> >> I'm looking into implementing the usad/ssad optabs for aarch64 to catch >> code like in PR 85693 >> and I'm a bit lost with what the midend expects the optabs to produce.
>> The documentation for them says that the addend operand (op 3) is of >> mode equal or wider than >> the mode of the product (and consequently of operands 1 and 2) with the >> result operand 0 being >> the same mode as operand 3. >> >> The x86 implementation for usadv16qi (for example) takes a V16QI vector >> and returns a V4SI vector. >> I'm confused as to what is the reduction logic expected by the midend? >> The PSADBW instruction that x86 uses in that case accumulates the two >> V8QI halves of the input into >> two 16-bit values (you don't need any more bits to represent a sum of 8 >> byte differences I believe): >> one placed at bit 0, and the other placed at bit 64. The bit ranges [16 >> - 63] and [80 - 127] are left as zeroes. >> So it produces a V2DI result in essence. >> >> If the input V16QI vectors look like: >> { a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 >> } >> { b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15 >> } >> >> then the result V4SI view (before being added into operand 3) is: >> { SUM (ABS (a[0-7] - b[0-7])), 0, SUM (ABS (a[8-15] - b[8-15])), 0 } >> (1) >> >> whereas a normal widening reduction of V16QI -> V4SI to me would look >> more like: >> >> { SUM (ABS (a[0-3] - b[0-3])), SUM (ABS (a[4-7] - b[4-7])), SUM (ABS >> (a[8-11] - b[8-11])), SUM (ABS (a[12-15] - b[12-15])) } (2) >> >> My question is, does the vectoriser depend on the semantics of [us]sad >> producing the result in (1)? > No, it doesn't. It is required that any association of the embedded reduction is correct and thus this requires appropriate - ffast-math flags. Note it's also the reason why we do not implement constant folding of SAD. At the moment I'm looking at the integer modes, so I guess reassociation and -ffast-math doesn't come into play, but I'll keep that in mind. >> If so, do you think it's worth clarifying in the documentation? > Probably yes - but I'm not sure the current state of affairs is best... 
Do other targets implement the same reduction order as x86? Other similar reduction ops have high /low or even /odd variants. But they also do not reduce the outputs. AFAICS only x86 and powerpc implement this so far. The powerpc implementation synthesises the V16QI -> V4SI reduction using multiple instructions. The result it produces is variant (2) in my original post. So the two ports differ. From a purely target implementation perspective it is convenient to not impose any particular reduction strategy. If we say that the only requirement from the [us]sad optabs is that the result vector should be suitable for a full V4SI -> SI reduction but not rely on any particular approach, then each target can provide its optimal sequence. For example, an aarch64 implementation I'm experimenting with now would compute the V16QI -> V16QI absolute differences vector, reduce that into a single HImode value (there is a full widening reduction instruction in aarch64 for that) and then do a widening add of that value into element zero of the result V4SI vector. Following the notation above, this would produce from: { a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 } { b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15 } the V4SI result: { SUM (ABS (a[0-15] - b[0-15])), 0, 0, 0 } Matching the x86 or powerpc strategy would require a more costly sequence on aarch64, but of course this would only be safe if we had some guarantees that the midend won't rely on any particular reduction strategy and just treat it as a vector on which to perform a full reduction at the end of a loop. Thanks, Kyrill > Note DOT_PROD has the very same issue. > > Richard. 
> >> Thanks, >> Kyrill From richard.guenther@gmail.com Thu May 10 10:20:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Thu, 10 May 2018 10:20:00 -0000 Subject: Semantics of SAD_EXPR and usad/ssad optabs In-Reply-To: <5AF4087F.6090403@foss.arm.com> References: <5AF31FA3.3040607@foss.arm.com> <5AF4087F.6090403@foss.arm.com> Message-ID: <3033D658-C610-4B49-AE2B-312D45586855@gmail.com> On May 10, 2018 10:53:19 AM GMT+02:00, Kyrill Tkachov wrote: >Hi Richard, > >On 09/05/18 19:37, Richard Biener wrote: >> On May 9, 2018 6:19:47 PM GMT+02:00, Kyrill Tkachov > wrote: >>> Hi all, >>> >>> I'm looking into implementing the usad/ssad optabs for aarch64 to >catch >>> code like in PR 85693 >>> and I'm a bit lost with what the midend expects the optabs to >produce. >>> The documentation for them says that the addend operand (op 3) is of >>> mode equal or wider than >>> the mode of the product (and consequently of operands 1 and 2) with >the >>> result operand 0 being >>> the same mode as operand 3. >>> >>> The x86 implementation for usadv16qi (for example) takes a V16QI >vector >>> and returns a V4SI vector. >>> I'm confused as to what is the reduction logic expected by the >midend? >>> The PSADBW instruction that x86 uses in that case accumulates the >two >>> V8QI halves of the input into >>> two 16-bit values (you don't need any more bits to represent a sum >of 8 >>> byte differences I believe): >>> one placed at bit 0, and the other placed at bit 64. The bit ranges >[16 >>> - 63] and [80 - 127] are left as zeroes. >>> So it produces a V2DI result in essence. 
>>> >>> If the input V16QI vectors look like: >>> { a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, >a15 >>> } >>> { b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, >b15 >>> } >>> >>> then the result V4SI view (before being added into operand 3) is: >>> { SUM (ABS (a[0-7] - b[0-7])), 0, SUM (ABS (a[8-15] - b[8-15])), 0 } >>> (1) >>> >>> whereas a normal widening reduction of V16QI -> V4SI to me would >look >>> more like: >>> >>> { SUM (ABS (a[0-3] - b[0-3])), SUM (ABS (a[4-7] - b[4-7])), SUM (ABS >>> (a[8-11] - b[8-11])), SUM (ABS (a[12-15] - b[12-15])) } (2) >>> >>> My question is, does the vectoriser depend on the semantics of >[us]sad >>> producing the result in (1)? >> No, it doesn't. It is required that any association of the embedded >reduction is correct and thus this requires appropriate - ffast-math >flags. Note it's also the reason why we do not implement constant >folding of SAD. > >At the moment I'm looking at the integer modes, so I guess >reassociation and -ffast-math doesn't come into play, but I'll keep >that in mind. > >>> If so, do you think it's worth clarifying in the documentation? >> Probably yes - but I'm not sure the current state of affairs is >best... Do other targets implement the same reduction order as x86? >Other similar reduction ops have high /low or even /odd variants. But >they also do not reduce the outputs. > >AFAICS only x86 and powerpc implement this so far. The powerpc >implementation synthesises the V16QI -> V4SI reduction using multiple >instructions. >The result it produces is variant (2) in my original post. So the two >ports differ. > >From a purely target implementation perspective it is convenient to not >impose any particular reduction strategy. >If we say that the only requirement from the [us]sad optabs is that the >result vector should be suitable for a full V4SI -> SI reduction >but not rely on any particular approach, then each target can provide >its optimal sequence. 
> >For example, an aarch64 implementation I'm experimenting with now would >compute the V16QI -> V16QI absolute differences vector, >reduce that into a single HImode value (there is a full widening >reduction instruction in aarch64 for that) and then do a widening add >of >that value into element zero of the result V4SI vector. Following the >notation above, this would produce from: > >{ a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 >} >{ b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15 >} > >the V4SI result: > >{ SUM (ABS (a[0-15] - b[0-15])), 0, 0, 0 } > >Matching the x86 or powerpc strategy would require a more costly >sequence on aarch64, but of course this would only be >safe if we had some guarantees that the midend won't rely on any >particular reduction strategy and just treat it as a vector >on which to perform a full reduction at the end of a loop. OK, sounds reasonable. BTW, in other context I needed a very specific reduction order because the result was not used in a reduction. For that purpose we'd then need different optabs. Richard. >Thanks, >Kyrill > >> Note DOT_PROD has the very same issue. >> >> Richard. >> >>> Thanks, >>> Kyrill From jakub@redhat.com Thu May 10 18:35:00 2018 From: jakub@redhat.com (Jakub Jelinek) Date: Thu, 10 May 2018 18:35:00 -0000 Subject: Unused __builtin_ia32_* builtins Message-ID: <20180510183514.GL8577@tucnak> Hi! for i in `grep __builtin_ia32 config/i386/i386-builtin.def | sed 's/^.*__builtin_ia32_/__builtin_ia32_/;s/".*$//' | sort -u`; do grep -q -w $i config/i386/*.h || echo $i; done shows many builtins not used in any of the intrinsic headers. I believe for the __builtin_ia32_* builtins we only support the intrinsics and not the builtins directly. Can we remove some of these (not necessarily all of them), after checking when and why they were added and if they were added for the intrinsic headers which now e.g. uses generic vector arith instead? E.g. 
__builtin_ia32_add{pd,ps}{,256} were used in intrinsic headers in <= 4.9.x and unused afterwards. __builtin_ia32_ceilpd I can't find in any header of any version. Perhaps just start with the builtins that were used in <= 4.9.x headers and aren't anymore (that is the first list until empty line, rest are builtins not appearing in 4.9 intrinsic headers either). __builtin_ia32_addpd __builtin_ia32_addpd256 __builtin_ia32_addps __builtin_ia32_addps256 __builtin_ia32_andsi256 __builtin_ia32_divpd __builtin_ia32_divpd256 __builtin_ia32_divps __builtin_ia32_divps256 __builtin_ia32_loaddqu __builtin_ia32_loaddqu256 __builtin_ia32_loadupd __builtin_ia32_loadupd256 __builtin_ia32_loadups __builtin_ia32_loadups256 __builtin_ia32_mulpd __builtin_ia32_mulpd256 __builtin_ia32_mulps __builtin_ia32_mulps256 __builtin_ia32_paddb128 __builtin_ia32_paddb256 __builtin_ia32_paddd128 __builtin_ia32_paddd256 __builtin_ia32_paddq128 __builtin_ia32_paddq256 __builtin_ia32_paddw128 __builtin_ia32_paddw256 __builtin_ia32_pand128 __builtin_ia32_pcmpeqb128 __builtin_ia32_pcmpeqb256 __builtin_ia32_pcmpeqd128 __builtin_ia32_pcmpeqd256 __builtin_ia32_pcmpeqq __builtin_ia32_pcmpeqq256 __builtin_ia32_pcmpeqw128 __builtin_ia32_pcmpeqw256 __builtin_ia32_pcmpgtb128 __builtin_ia32_pcmpgtb256 __builtin_ia32_pcmpgtd128 __builtin_ia32_pcmpgtd256 __builtin_ia32_pcmpgtq __builtin_ia32_pcmpgtq256 __builtin_ia32_pcmpgtw128 __builtin_ia32_pcmpgtw256 __builtin_ia32_pmulld128 __builtin_ia32_pmulld256 __builtin_ia32_pmullw128 __builtin_ia32_pmullw256 __builtin_ia32_por128 __builtin_ia32_por256 __builtin_ia32_psubb128 __builtin_ia32_psubb256 __builtin_ia32_psubd128 __builtin_ia32_psubd256 __builtin_ia32_psubq128 __builtin_ia32_psubq256 __builtin_ia32_psubw128 __builtin_ia32_psubw256 __builtin_ia32_pxor128 __builtin_ia32_pxor256 __builtin_ia32_storedqu __builtin_ia32_storedqu256 __builtin_ia32_storeupd __builtin_ia32_storeupd256 __builtin_ia32_storeups __builtin_ia32_storeups256 __builtin_ia32_subpd 
__builtin_ia32_subpd256 __builtin_ia32_subps __builtin_ia32_subps256 __builtin_ia32_bndcl __builtin_ia32_bndcu __builtin_ia32_bndint __builtin_ia32_bndldx __builtin_ia32_bndlower __builtin_ia32_bndmk __builtin_ia32_bndret __builtin_ia32_bndstx __builtin_ia32_bndupper __builtin_ia32_ceilpd __builtin_ia32_ceilpd256 __builtin_ia32_ceilpd512 __builtin_ia32_ceilpd_vec_pack_sfix __builtin_ia32_ceilpd_vec_pack_sfix256 __builtin_ia32_ceilpd_vec_pack_sfix512 __builtin_ia32_ceilps __builtin_ia32_ceilps256 __builtin_ia32_ceilps512 __builtin_ia32_ceilps_sfix __builtin_ia32_ceilps_sfix256 __builtin_ia32_ceilps_sfix512 __builtin_ia32_copysignpd __builtin_ia32_copysignpd256 __builtin_ia32_copysignpd512 __builtin_ia32_copysignps __builtin_ia32_copysignps256 __builtin_ia32_copysignps512 __builtin_ia32_cvtps2dq512 __builtin_ia32_exp2ps __builtin_ia32_fldenv __builtin_ia32_floorpd __builtin_ia32_floorpd256 __builtin_ia32_floorpd512 __builtin_ia32_floorpd_vec_pack_sfix __builtin_ia32_floorpd_vec_pack_sfix256 __builtin_ia32_floorpd_vec_pack_sfix512 __builtin_ia32_floorps __builtin_ia32_floorps256 __builtin_ia32_floorps512 __builtin_ia32_floorps_sfix __builtin_ia32_floorps_sfix256 __builtin_ia32_floorps_sfix512 __builtin_ia32_fnclex __builtin_ia32_fnstenv __builtin_ia32_fnstsw __builtin_ia32_narrow_bounds __builtin_ia32_pswapdsi __builtin_ia32_rintpd __builtin_ia32_rintpd256 __builtin_ia32_rintps __builtin_ia32_rintps256 __builtin_ia32_roundpd_az __builtin_ia32_roundpd_az256 __builtin_ia32_roundpd_az_vec_pack_sfix __builtin_ia32_roundpd_az_vec_pack_sfix256 __builtin_ia32_roundpd_az_vec_pack_sfix512 __builtin_ia32_roundps_az __builtin_ia32_roundps_az256 __builtin_ia32_roundps_az_sfix __builtin_ia32_roundps_az_sfix256 __builtin_ia32_roundps_az_sfix512 __builtin_ia32_rsqrtf __builtin_ia32_rsqrtps_nr __builtin_ia32_rsqrtps_nr256 __builtin_ia32_sizeof __builtin_ia32_sqrtpd512 __builtin_ia32_sqrtps512 __builtin_ia32_sqrtps_nr __builtin_ia32_sqrtps_nr256 __builtin_ia32_truncpd 
__builtin_ia32_truncpd256 __builtin_ia32_truncpd512 __builtin_ia32_truncps __builtin_ia32_truncps256 __builtin_ia32_truncps512 __builtin_ia32_vec_pack_sfix __builtin_ia32_vec_pack_sfix256 __builtin_ia32_vec_pack_sfix512 __builtin_ia32_vpcmov256 __builtin_ia32_vpcmov_v16hi256 __builtin_ia32_vpcmov_v16qi __builtin_ia32_vpcmov_v2df __builtin_ia32_vpcmov_v2di __builtin_ia32_vpcmov_v32qi256 __builtin_ia32_vpcmov_v4df256 __builtin_ia32_vpcmov_v4di256 __builtin_ia32_vpcmov_v4sf __builtin_ia32_vpcmov_v4si __builtin_ia32_vpcmov_v8hi __builtin_ia32_vpcmov_v8sf256 __builtin_ia32_vpcmov_v8si256 Jakub From marc.glisse@inria.fr Thu May 10 19:08:00 2018 From: marc.glisse@inria.fr (Marc Glisse) Date: Thu, 10 May 2018 19:08:00 -0000 Subject: Unused __builtin_ia32_* builtins In-Reply-To: <20180510183514.GL8577@tucnak> References: <20180510183514.GL8577@tucnak> Message-ID: On Thu, 10 May 2018, Jakub Jelinek wrote: > for i in `grep __builtin_ia32 config/i386/i386-builtin.def | sed 's/^.*__builtin_ia32_/__builtin_ia32_/;s/".*$//' | sort -u`; do grep -q -w $i config/i386/*.h || echo $i; done > > shows many builtins not used in any of the intrinsic headers. > > I believe for the __builtin_ia32_* builtins we only support the intrinsics > and not the builtins directly. Can we remove some of these (not necessarily > all of them), after checking when and why they were added and if they were > added for the intrinsic headers which now e.g. uses generic vector arith > instead? 
When I removed their use in the intrinsic headers, I tried to remove them, but Ada people asked us to keep them https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00843.html -- Marc Glisse From joseph@codesourcery.com Thu May 10 19:39:00 2018 From: joseph@codesourcery.com (Joseph Myers) Date: Thu, 10 May 2018 19:39:00 -0000 Subject: Unused __builtin_ia32_* builtins In-Reply-To: <20180510183514.GL8577@tucnak> References: <20180510183514.GL8577@tucnak> Message-ID: On Thu, 10 May 2018, Jakub Jelinek wrote: > __builtin_ia32_fldenv > __builtin_ia32_fnclex > __builtin_ia32_fnstenv > __builtin_ia32_fnstsw Calls to these are generated by ix86_atomic_assign_expand_fenv. -- Joseph S. Myers joseph@codesourcery.com From tyomitch@gmail.com Thu May 10 19:44:00 2018 From: tyomitch@gmail.com (A. Skrobov) Date: Thu, 10 May 2018 19:44:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access Message-ID: Hello, While working on a port of GCC for a non-public architecture that has pre/post-modify memory access instructions, I discovered what looks like a bug which can cause incorrect code generation. My suggested fix is trivial: https://github.com/tyomitch/gcc/commit/7d9cc102adf11065358d4694109ce3e9f0b5c642 -- but I cannot submit this patch without a testcase, and my expertise in the standard GCC target architectures is insufficient for reproducing this bug in any one of them. So, perhaps a maintainer of any supported architecture having pre/post-modify memory access can take a look at this? 
Basically, it seems to me that if a BB has a sequence like (using C syntax for clarity) r1 = r2 + 42; r3 = *r1++; r4 = *(r2 + 42); --then the cse pass overlooks the modification of r1 by the second instruction, and changes the last instruction to "r4 = *r1" From richard.sandiford@linaro.org Thu May 10 20:41:00 2018 From: richard.sandiford@linaro.org (Richard Sandiford) Date: Thu, 10 May 2018 20:41:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: (A. Skrobov's message of "Thu, 10 May 2018 22:44:45 +0300") References: Message-ID: <87wowbp7kz.fsf@linaro.org> Hi, "A. Skrobov" writes: > Hello, > > While working on a port of GCC for a non-public architecture that has > pre/post-modify memory access instructions, I discovered what looks > like a bug which can cause incorrect code generation. > > My suggested fix is trivial: > https://github.com/tyomitch/gcc/commit/7d9cc102adf11065358d4694109ce3e9f0b5c642 > -- but I cannot submit this patch without a testcase, and my expertise > in the standard GCC target architectures is insufficient for > reproducing this bug in any one of them. So, perhaps a maintainer of > any supported architecture having pre/post-modify memory access can > take a look at this? > > Basically, it seems to me that if a BB has a sequence like (using C > syntax for clarity) > > r1 = r2 + 42; > r3 = *r1++; > r4 = *(r2 + 42); > > --then the cse pass overlooks the modification of r1 by the second > instruction, and changes the last instruction to "r4 = *r1" Yeah, I can't see off-hand how this would be handled correctly by current sources. I think the issue's probably latent on in-tree targets since cse runs before inc_dec. I don't think the hash function itself is the right place to invalidate the cache though. Any instruction with a pre/post modification needs to have a REG_INC note as well, so iterating over the REG_INC notes in invalidate_from_sets_and_clobbers should be enough. 
Does the attached (completely untested :-)) patch work for your test case? A more elaborate fix would be to model the inc or dec as a dummy set in find_sets_in_insn, so that we can still CSE the register after it has been modified, but that would be hard to test with the current pass order. Thanks, Richard

Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2018-05-10 21:29:33.320961107 +0100
+++ gcc/cse.c	2018-05-10 21:29:33.399958000 +0100
@@ -6195,6 +6195,9 @@ invalidate_from_sets_and_clobbers (rtx_i
 	    invalidate (SET_DEST (y), VOIDmode);
 	}
     }
+
+  for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
+    invalidate (XEXP (note, 0), VOIDmode);
 }

/* Process X, part of the REG_NOTES of an insn. Look at any REG_EQUAL notes From law@redhat.com Thu May 10 21:09:00 2018 From: law@redhat.com (Jeff Law) Date: Thu, 10 May 2018 21:09:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: References: Message-ID: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> On 05/10/2018 01:44 PM, A. Skrobov wrote: > Hello, > > While working on a port of GCC for a non-public architecture that has > pre/post-modify memory access instructions, I discovered what looks > like a bug which can cause incorrect code generation. > > My suggested fix is trivial: > https://github.com/tyomitch/gcc/commit/7d9cc102adf11065358d4694109ce3e9f0b5c642 > -- but I cannot submit this patch without a testcase, and my expertise > in the standard GCC target architectures is insufficient for > reproducing this bug in any one of them. So, perhaps a maintainer of > any supported architecture having pre/post-modify memory access can > take a look at this?
> > Basically, it seems to me that if a BB has a sequence like (using C > syntax for clarity) > > r1 = r2 + 42; > r3 = *r1++; > r4 = *(r2 + 42); > > --then the cse pass overlooks the modification of r1 by the second > instruction, and changes the last instruction to "r4 = *r1" My recollection is that auto-increment addressing modes should not appear in the RTL in the CSE pass. Where are the auto-increment addressing modes coming from? jeff From freddie_chopin@op.pl Thu May 10 21:32:00 2018 From: freddie_chopin@op.pl (Freddie Chopin) Date: Thu, 10 May 2018 21:32:00 -0000 Subject: LTO vs GCC 8 Message-ID: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> Hi! In one of my embedded projects I have an option to enable LTO. This was working more or less fine for GCC 6 and GCC 7, however for GCC 8.1.0 (and binutils 2.30) - with the same set of options - I see something like this

-- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 --

$ arm-none-eabi-g++ -Wall -Wextra -Wshadow -std=gnu++11 -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -g -ggdb3 -O2 -flto -ffat-lto-objects -fno-use-cxa-atexit -ffunction-sections -fdata-sections -fno-rtti -fno-exceptions ... [include paths] ... -MD -MP -c test/TestCase.cpp -o output/test/TestCase.o

$ arm-none-eabi-g++ -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -g -O2 -flto -fuse-linker-plugin -Wl,-Map=output/test/distortosTest.map,--cref,--gc-sections -Toutput/ST_STM32F4DISCOVERY.preprocessed.ld ... [a lot of objects] ... -Wl,--whole-archive -l:output/libdistortos.a -Wl,--no-whole-archive -o output/test/distortosTest.elf

$ arm-none-eabi-objdump --demangle -S output/test/distortosTest.elf > output/test/distortosTest.lss
arm-none-eabi-objdump: Dwarf Error: Could not find abbrev number 167.
arm-none-eabi-objdump: Dwarf Error: found dwarf version '37', this reader only handles version 2, 3, 4 and 5 information.
arm-none-eabi-objdump: Dwarf Error: found dwarf version '6144', this reader only handles version 2, 3, 4 and 5 information. arm-none-eabi-objdump: Dwarf Error: found dwarf version '4864', this reader only handles version 2, 3, 4 and 5 information. ... ... (a lot more) ... -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- As you see, the errors appear only when I try to generate an assembly dump. I'm not sure whether the problem is in GCC or in objdump, but when I have an .elf file produced (with the same options) by gcc 7.3.0, then this new version of objdump doesn't produce any errors. What is also interesting is that the errors are not fatal - the exit code of the process is 0. What is also interesting is that this problem doesn't appear in a trivial test case, so I suspect this is something more subtle. I did not try to narrow it down into a shareable test case, but if you have no hints then maybe I'll try to do that. Any ideas what may be the problem here? Especially do you know whether I should be asking this question here or maybe on the binutils mailing list? Thanks in advance! Regards, FCh From gccadmin@gcc.gnu.org Thu May 10 22:40:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Thu, 10 May 2018 22:40:00 -0000 Subject: gcc-7-20180510 is now available Message-ID: <20180510224018.52697.qmail@sourceware.org> Snapshot gcc-7-20180510 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/7-20180510/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 7 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch revision 260137 You'll find: gcc-7-20180510.tar.xz Complete GCC SHA256=560bce7d26725be46f3b940581d82490168521d2560f8befd8cecacdb2049b0e SHA1=f31bb1283d8a2a2a6d51a7ff49474ab5663ae4f1 Diffs from 7-20180503 are available in the diffs/ subdirectory.
When a particular snapshot is ready for public consumption the LATEST-7 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. From tyomitch@gmail.com Fri May 11 05:45:00 2018 From: tyomitch@gmail.com (A. Skrobov) Date: Fri, 11 May 2018 05:45:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> Message-ID: On Fri, May 11, 2018 at 12:09 AM, Jeff Law wrote: > > My recollection is that auto-increment addressing modes should not > appear in the RTL in the CSE pass. Fair enough; but they're explicitly listed in the big switch block in hash_rtx_cb (). Should my added line change from "invalidate_dest (XEXP (x, 0));" to "gcc_unreachable ();" ? Such a patch wouldn't need a testcase, I suppose. From richard.guenther@gmail.com Fri May 11 09:19:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Fri, 11 May 2018 09:19:00 -0000 Subject: LTO vs GCC 8 In-Reply-To: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> References: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> Message-ID: On Thu, May 10, 2018 at 11:32 PM, Freddie Chopin wrote: > Hi! > > In one of my embedded projects I have an option to enable LTO. This was > working more or less fine for GCC 6 and GCC 7, however for GCC 8.1.0 > (and binutils 2.30) - with the same set of options - I see something > like this > > -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- > > $ arm-none-eabi-g++ -Wall -Wextra -Wshadow -std=gnu++11 -mcpu=cortex-m4 > -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -g -ggdb3 -O2 -flto -ffat- > lto-objects -fno-use-cxa-atexit -ffunction-sections -fdata-sections > -fno-rtti -fno-exceptions ... [include paths] ... 
-MD -MP -c > test/TestCase.cpp -o output/test/TestCase.o > > $ arm-none-eabi-g++ -mcpu=cortex-m4 -mthumb -mfloat-abi=hard > -mfpu=fpv4-sp-d16 -g -O2 -flto -fuse-linker-plugin -Wl,- > Map=output/test/distortosTest.map,--cref,--gc-sections > -Toutput/ST_STM32F4DISCOVERY.preprocessed.ld ... [a lot of objects] ... > -Wl,--whole-archive -l:output/libdistortos.a -Wl,--no-whole-archive -o > output/test/distortosTest.elf > > $ arm-none-eabi-objdump --demangle -S output/test/distortosTest.elf > > output/test/distortosTest.lss > arm-none-eabi-objdump: Dwarf Error: Could not find abbrev number 167. > arm-none-eabi-objdump: Dwarf Error: found dwarf version '37', this > reader only handles version 2, 3, 4 and 5 information. > arm-none-eabi-objdump: Dwarf Error: found dwarf version '6144', this > reader only handles version 2, 3, 4 and 5 information. > arm-none-eabi-objdump: Dwarf Error: found dwarf version '4864', this > reader only handles version 2, 3, 4 and 5 information. > ... > ... (a lot more) > ... > > -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- > > As you see, the errors apear only when I try to generate an assembly > dump. I'm not sure whether the problem is in GCC or in objdump, but > when I have an .elf file produced (with the same options) by gcc 7.3.0, > then this new version of objdump doesn't produce any errors. What is > also interesting is that the errors are not fatal - the exit code of > the process is 0. > > What is also interesing is that this problem doesn't appear in a > trivial test case, so I suspect this is something more subtle. I did > not try to narrow it down into a shareable test case, but if you have > no hints then maybe I'll try to do that. > > Any ideas what may be the problem here? Especially do you know whether > I should be asking this question here or maybe on binutils mailing > list? Hmm, can you try without --gc-sections? "Old" GNU ld versions have a bug that wrecks debug info (sourceware PR20882). Richard. 
> Thanks in advance! > > Regards, > FCh From richard.guenther@gmail.com Fri May 11 09:26:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Fri, 11 May 2018 09:26:00 -0000 Subject: AMD GCN port In-Reply-To: <1997f00f-0390-b0e8-fe8f-ba4fc04dd1d3@codesourcery.com> References: <1997f00f-0390-b0e8-fe8f-ba4fc04dd1d3@codesourcery.com> Message-ID: On Wed, May 9, 2018 at 6:35 PM, Andrew Stubbs wrote: > Honza, Martin, > > Further to our conversation on IRC ... > > We have just completed work on a GCN3 & GCN5 port intended for running > OpenMP and OpenACC offload kernels on AMD Fiji and Vega discrete GPUs. > Unfortunately Carrizo is probably broken because we don't have one to test, > and the APUs use shared memory and XNACK, which we've not paid any attention > to. > > There will be a binary release available soon(ish). > > Apologies the development schedule has made it hard to push the work > upstream, but now it is time. > > I've posted the code to Github for reference: > https://github.com/ams-cs/gcc > https://github.com/ams-cs/newlib > > We're using LLVM 6 for the assembler and linker; there's no binutils port. > > It should be possible to build a "standalone" amdgcn-none-amdhsa compiler > that can run code via the included "gcn-run" loader tool (and the HSA > runtime). This can be used to run the testsuite, with a little dejagnu > config trickery. > > It should also be possible to build an x86_64-linux-gnu compiler with > --enable-offload-target=gcn, and a matching amdgcn-none-amdhsa compiler with > --enable-as-accelerator-for=x86_64-linux-gnu, and have them run code > offloaded with OpenMP or OpenACC directives. > > The code is based on Honza's original port, rebased to GCC 7.3. > > I'd like to agree an upstreaming strategy that > a) gets basic GCN support into trunk soonish. We'll need to get a few > middle/front end patches approved, and probably update a few back-end hooks, > but this ought to be easy enough. 
> b) gets trunk OpenMP/OpenACC to work for GCN, eventually. I'm expecting some > pain in libgomp here. > c) gives me a stable base from which to make binary releases (i.e. not > trunk). > d) allows me to use openacc-gcc-8-branch without too much duplication of > effort. > > How about the following strategy? > > 1. Create "gcn-gcc-7-branch" to archive the current work. This would be a > source for merges (or cherry-picking), but I'd not expect much future > development. Initially it would have the same content as the Github > repository above. > > 2. Create "gcn-gcc-8-branch" with a merger of "gcc-8-branch" and > "gcn-gcc-7-branch". This would be broken w.r.t. libgomp, initially, but get > fixed up in time. It would receive occasional merges from the release > branch. I expect to do GCN back-end development work here. > > 3. Create "gcn-openacc-gcc-8-branch" from the new "gcn-gcc-8-branch", and > merge in "openacc-gcc-8-branch". This will hold offloading patches not > compatible with trunk, and receive updated GCN changes via merge. I intend > to deliver my next binary release from this branch. > > 4. Replace/update the existing "gcn" branch with a merger of "trunk" and > "gcn-gcc-8-branch" (not the OpenACC branch). This would be merged to trunk, > and possibly retired, as soon as possible. I imagine bits will have to be > submitted as patches, and then the back-end merged as a whole. > > trunk > |\ > | gcc-7-branch > | |\ > | : gcn-gcc-7-branch > | \ > |\ '--------. > | gcc-8-branch | > | | \ '------------. | > | : openacc-gcc-8-branch gcn-gcc-8-branch > | \ / | > | gcn-openacc-8-branch | > |\ ,---------------------------------' > | gcn > |/ > gcc-9 > > It's slightly complex to describe, but hopefully logical and workable. > > Comments? Better suggestions? Sounds good but I'd not do 1. given the github repo can serve as archiving point, too. Having 2. doesn't sound too useful over 3. so in the end I'd do only 3. and 4. 
Of course 1 and 2 might help you in doing 3 and 4. Richard. > -- > Andrew Stubbs > Mentor Graphics / CodeSourcery. From david@westcontrol.com Fri May 11 11:06:00 2018 From: david@westcontrol.com (David Brown) Date: Fri, 11 May 2018 11:06:00 -0000 Subject: LTO vs GCC 8 In-Reply-To: References: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> Message-ID: <5AF5793A.1050900@westcontrol.com> On 11/05/18 11:19, Richard Biener wrote: > On Thu, May 10, 2018 at 11:32 PM, Freddie Chopin wrote: >> Hi! >> >> In one of my embedded projects I have an option to enable LTO. This was >> working more or less fine for GCC 6 and GCC 7, however for GCC 8.1.0 >> (and binutils 2.30) - with the same set of options - I see something >> like this >> >> -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >> >> $ arm-none-eabi-g++ -Wall -Wextra -Wshadow -std=gnu++11 -mcpu=cortex-m4 >> -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -g -ggdb3 -O2 -flto -ffat-lto-objects -fno-use-cxa-atexit -ffunction-sections -fdata-sections >> -fno-rtti -fno-exceptions ... [include paths] ... -MD -MP -c >> test/TestCase.cpp -o output/test/TestCase.o >> > Hmm, can you try without --gc-sections? "Old" GNU ld versions have > a bug that wrecks debug info (sourceware PR20882). > For the Cortex-M devices (and probably many other RISC targets), -fdata-sections comes at a big cost - it effectively blocks -fsection-anchors and makes access to file-static data a lot bigger. People often use -fdata-sections and -ffunction-sections along with -Wl,--gc-sections with the aim of removing unused code and data (and thus saving space, useful on small devices) - I would expect LTO would manage that anyway. The other purpose of these is to improve locality of reference - again LTO should do that for you. But even without LTO, I find the cost of -fdata-sections high compared to -fsection-anchors.
From ams@codesourcery.com Fri May 11 11:19:00 2018 From: ams@codesourcery.com (Andrew Stubbs) Date: Fri, 11 May 2018 11:19:00 -0000 Subject: AMD GCN port In-Reply-To: References: <1997f00f-0390-b0e8-fe8f-ba4fc04dd1d3@codesourcery.com> Message-ID: On 11/05/18 10:26, Richard Biener wrote: > Sounds good but I'd not do 1. given the github repo can serve as archiving > point, too. Having 2. doesn't sound too useful over 3. so in the end I'd > do only 3. and 4. Of course 1 and 2 might help you in doing 3 and 4. Indeed, I've been worried that I'm basically planning to expose internal steps. The problem I'm trying to solve with 2 is that what I need is 3, but that means code dependencies on things I don't own, which makes it harder to get to 4. The other thing that's occurred to me is that with og8 being new, maybe it's a good time to merge the GCN stuff into that, and work with the NVidia folks to share it. [Adding Cesar and Thomas to CC.] I'm aware of some incompatibilities with og7, but those are going to need fixing either way. Here's another proposal. trunk |\ | gcc-7-branch | |\ | : gcn-gcc-7-branch (1 - possibly notional) | \ |\ | | gcc-8-branch | | |\ / | | gcn-gcc-8-branch (2. trunk compatible) | | | '----------------------------. | |\ | | | : openacc-gcc-8-branch (3. share existing) | | | |\ ,------------------------------------------' | gcn (4. temporary) |/ gcc-9 Obviously, the description "trunk compatible" would become less true over time, but it will be less diverged than og8. I suppose this branch could also be notional, only named internally, though? I guess it makes no difference to me -- I'm going to have to go through all the steps anyway -- but it depends how transparent others would like me to be. 
Andrew From ams@codesourcery.com Fri May 11 13:54:00 2018 From: ams@codesourcery.com (Andrew Stubbs) Date: Fri, 11 May 2018 13:54:00 -0000 Subject: AMD GCN port In-Reply-To: References: <1997f00f-0390-b0e8-fe8f-ba4fc04dd1d3@codesourcery.com> Message-ID: <01d25d95-77b5-f15c-4366-dc40f22a8a4c@codesourcery.com> On 11/05/18 12:18, Andrew Stubbs wrote: > The other thing that's occurred to me is that with og8 being new, maybe > it's a good time to merge the GCN stuff into that, and work with the > NVidia folks to share it. [Adding Cesar and Thomas to CC.] I'm aware of > some incompatibilities with og7, but those are going to need fixing > either way. I've spoken with Thomas. He's happy to take GCN patches there, or GCN related patches anyway, so that's an option. I can use it as my upstream, or push my changes directly. > I guess it makes no difference to me -- I'm going to have to go through > all the steps anyway -- but it depends how transparent others would like > me to be. The more I think about this, the more I'm coming to the conclusion that nobody but me really cares about GCN for GCC 8, and nobody cares about the development history, so I'm over-complicating my upstreaming plans. I should just do what I need to do in my local repo, as before, and submit a somewhat flattened patch series for inclusion in trunk in the traditional manner, as soon as I have it.
Andrew From freddie_chopin@op.pl Fri May 11 15:33:00 2018 From: freddie_chopin@op.pl (Freddie Chopin) Date: Fri, 11 May 2018 15:33:00 -0000 Subject: LTO vs GCC 8 In-Reply-To: References: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> Message-ID: On Fri, 2018-05-11 at 11:19 +0200, Richard Biener wrote: > Hmm, can you try without --gc-sections? "Old" GNU ld versions have > a bug that wrecks debug info (sourceware PR20882). Yes - you are right. Without --gc-sections the errors are gone. The bug was marked as resolved and fixed a year ago, however from the comments I presume that it was only a partial fix, so possibly 2.31 will be working fine for arm-none-eabi, right? What is also interesting is that there was no problem for gcc 7.3 with binutils 2.29.1 and for gcc 6.3 with binutils 2.28 - only 8.1 + 2.30 behave like this. Is there a workaround for the problem? Maybe I could mark some sections as KEEP() in the linker script while waiting for binutils 2.31? Regards, FCh From law@redhat.com Fri May 11 15:44:00 2018 From: law@redhat.com (Jeff Law) Date: Fri, 11 May 2018 15:44:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> Message-ID: On 05/10/2018 11:45 PM, A. Skrobov wrote: > On Fri, May 11, 2018 at 12:09 AM, Jeff Law wrote: >> >> My recollection is that auto-increment addressing modes should not >> appear in the RTL in the CSE pass. > > Fair enough; but they're explicitly listed in the big switch block in > hash_rtx_cb (). > Should my added line change from "invalidate_dest (XEXP (x, 0));" to > "gcc_unreachable ();" ? > Such a patch wouldn't need a testcase, I suppose. To change into a gcc_unreachable we'd need to verify that hash_rtx_cb is never called from outside the CSE pass. If we look in sel-sched-ir.c we see that it calls into hash_rtx_cb (sigh, bad modularity).
I'm not at all familiar with how the hashing is used within the selective scheduler, so I can't really say what the selective scheduler *should* be doing here. Note that if you were generating these insns using your own port, you can get the same effect by using a PARALLEL rather than an auto-increment. PARALLELs are supported throughout the RTL IL. jeff From freddie_chopin@op.pl Fri May 11 15:50:00 2018 From: freddie_chopin@op.pl (Freddie Chopin) Date: Fri, 11 May 2018 15:50:00 -0000 Subject: LTO vs GCC 8 In-Reply-To: <5AF5793A.1050900@westcontrol.com> References: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> <5AF5793A.1050900@westcontrol.com> Message-ID: On Fri, 2018-05-11 at 13:06 +0200, David Brown wrote: > For the Cortex-M devices (and probably many other RISC targets), > -fdata-sections comes at a big cost - it effectively blocks > -fsection-anchors and makes access to file-static data a lot bigger. > People often use -fdata-sections and -ffunction-sections along with > -Wl,--gc-sections with the aim of removing unused code and data (and > thus saving space, useful on small devices) - I would expect LTO > would > manage that anyway. The other purpose of these is to improve > locality > of reference - again LTO should do that for you. But even without > LTO, > I find the cost of -fdata-sections high compared to -fsection-anchors. Unfortunately having LTO doesn't make -ffunction-sections + -fdata-sections + --gc-sections useless.

My test project compiled:
- without LTO and without these attributes - 150824 B ROM + 4240 B RAM
- with LTO and without these attributes - 133812 B ROM + 4208 B RAM
- without LTO and with these attributes - 124456 B ROM + 3484 B RAM
- with LTO and with these attributes - 120280 B ROM + 3680 B RAM

As you see, these attributes give much more than LTO in terms of size. As for the -fsection-anchors I guess this has no use for non-PIC code for arm-none-eabi. Whether I use it or not, the sizes are identical.
Regards, FCh From richard.guenther@gmail.com Fri May 11 16:51:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Fri, 11 May 2018 16:51:00 -0000 Subject: LTO vs GCC 8 In-Reply-To: References: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> <5AF5793A.1050900@westcontrol.com> Message-ID: <49DC31D9-92D9-48A1-935C-4B5B8A3BE47E@gmail.com> On May 11, 2018 5:49:44 PM GMT+02:00, Freddie Chopin wrote: >On Fri, 2018-05-11 at 13:06 +0200, David Brown wrote: >> For the Cortex-M devices (and probably many other RISC targets), >> -fdata-sections comes at a big cost - it effectively blocks >> -fsection-anchors and makes access to file-static data a lot bigger. >> People often use -fdata-sections and -ffunction-sections along with >> -Wl,--gc-sections with the aim of removing unused code and data (and >> thus saving space, useful on small devices) - I would expect LTO >> would >> manage that anyway. The other purpose of these is to improve >> locality >> of reference - again LTO should do that for you. But even without >> LTO, >> I find the cost of -fdata-sections high compared to -fsection- >> anchors. > >Unfortunatelly having LTO doesn't make -ffunction-sections + -fdata- >sections + --gc-sections useless. > >My test project compiled: >- without LTO and without these attributes - 150824 B ROM + 4240 B RAM >- with LTO and without these attributes - 133812 B ROM + 4208 B RAM >- without LTO and with these attributes - 124456 B ROM + 3484 B RAM >- with LTO and with these attributes - 120280 B ROM + 3680 B RAM > >As you see these attributes give much more than LTO in terms of size. > >As for the -fsection-anchors I guess this has no use for non-PIC code >for arm-none-eabi. Whether I use it or not, the sizes are identical. That's an interesting result. Do you have any non-LTO objects? Basically I'm curious what ld eliminates that gcc with LTO doesn't. As to a workaround for the ld bug you can try keeping all .debug_* sections. 
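For the archives, one way to express Freddie's KEEP() idea in a GNU ld script is to route the DWARF input sections into non-loaded output sections wrapped in KEEP(), which exempts them from --gc-sections. This is only a sketch (section list abbreviated; the names and layout must be adapted to the actual linker script, and whether it helps against this particular ld bug is untested):

```
/* Non-allocated DWARF sections; KEEP() shields them from --gc-sections. */
.debug_info    0 : { KEEP(*(.debug_info .gnu.linkonce.wi.*)) }
.debug_abbrev  0 : { KEEP(*(.debug_abbrev)) }
.debug_line    0 : { KEEP(*(.debug_line)) }
.debug_str     0 : { KEEP(*(.debug_str)) }
.debug_loc     0 : { KEEP(*(.debug_loc)) }
.debug_ranges  0 : { KEEP(*(.debug_ranges)) }
```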
IIRC 2.30 has the bug fixed (on the branch). Richard. >Regards, >FCh From gccadmin@gcc.gnu.org Fri May 11 22:40:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Fri, 11 May 2018 22:40:00 -0000 Subject: gcc-8-20180511 is now available Message-ID: <20180511224023.33700.qmail@sourceware.org> Snapshot gcc-8-20180511 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/8-20180511/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 8 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-8-branch revision 260189 You'll find: gcc-8-20180511.tar.xz Complete GCC SHA256=a6bbccb257d46460ed93ba9fb350a5ba92a0fd888578aa52e8956f08a8d8e274 SHA1=e43fc660b66753e292ec2f2d3bfd1c19f7ba647e Diffs from 8-20180504 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-8 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
From tyomitch@gmail.com Sat May 12 14:21:00 2018 From: tyomitch@gmail.com (A. Skrobov) Date: Sat, 12 May 2018 14:21:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> Message-ID: > If we look in sel-sched-ir.c we see that it calls into hash_rtx_cb > (sigh, bad modularity). I'm not at all familiar with how the hashing > is used within the selective scheduler, so I can't really say what the > selective scheduler *should* be doing here. OK, I see. Now what do you think would be the best course of action? Leave everything as it is? The selective scheduler may or may not want these memory accesses ignored. From richard.sandiford@linaro.org Sat May 12 16:02:00 2018 From: richard.sandiford@linaro.org (Richard Sandiford) Date: Sat, 12 May 2018 16:02:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: (A.
Skrobov's message of "Sat, 12 May 2018 17:21:16 +0300") References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> Message-ID: <87d0y0q2v5.fsf@linaro.org> "A. Skrobov" writes: >> If we look in sel-sched-ir.c we see that it calls into hash_rtx_cb >> (sigh, bad modularity). I'm not at all familiar with how the hashing >> is used within the selective scheduler, so I can't really say what the >> selective scheduler *should* be doing here. > > OK, I see. Now what do you think would be the best course of action? > Leave everything as it is? The selective scheduler may or may not want > these memory accesses ignored. I don't think we can assert even for cse, since AIUI these codes can still be used for stack pushes and pops. Thanks, Richard From tyomitch@gmail.com Sat May 12 16:50:00 2018 From: tyomitch@gmail.com (A. Skrobov) Date: Sat, 12 May 2018 16:50:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: <87d0y0q2v5.fsf@linaro.org> References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> <87d0y0q2v5.fsf@linaro.org> Message-ID: >>> If we look in sel-sched-ir.c we see that it calls into hash_rtx_cb >>> (sigh, bad modularity). I'm not at all familiar with how the hashing >>> is used within the selective scheduler, so I can't really say what the >>> selective scheduler *should* be doing here. >> >> OK, I see. Now what do you think would be the best course of action? >> Leave everything as it is? The selective scheduler may or may not want >> these memory accesses ignored. > > I don't think we can assert even for cse, since AIUI these codes can > still be used for stack pushes and pops. But then cse would generate incorrect code when these are used? 
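(For illustration only: the kind of guard being discussed, written as a hypothetical GCC-internals-style sketch, not actual cse.c or sel-sched code. A pass that does not model auto-inc side effects could simply refuse to enter such a MEM into its hash table:)

    /* Hypothetical sketch: before hashing/CSEing a MEM, bail out if
       its address expression has auto-inc/auto-dec side effects,
       since evaluating the address also modifies a register.  */
    rtx addr = XEXP (mem, 0);
    if (GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC
        || side_effects_p (addr))
      return;  /* do not record this MEM as a CSE candidate */

(Both GET_RTX_CLASS/RTX_AUTOINC and side_effects_p exist in the RTL API; whether and where such a check belongs is exactly what this thread is debating.)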
From law@redhat.com Sat May 12 17:01:00 2018 From: law@redhat.com (Jeff Law) Date: Sat, 12 May 2018 17:01:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: <87d0y0q2v5.fsf@linaro.org> References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> <87d0y0q2v5.fsf@linaro.org> Message-ID: On 05/12/2018 10:02 AM, Richard Sandiford wrote: > "A. Skrobov" writes: >>> If we look in sel-sched-ir.c we see that it calls into hash_rtx_cb >>> (sigh, bad modularity). I'm not at all familiar with how the hashing >>> is used within the selective scheduler, so I can't really say what the >>> selective scheduler *should* be doing here. >> >> OK, I see. Now what do you think would be the best course of action? >> Leave everything as it is? The selective scheduler may or may not want >> these memory accesses ignored. > > I don't think we can assert even for cse, since AIUI these codes can > still be used for stack pushes and pops. No. We're not supposed to have any auto-inc insns prior to the auto-inc pass. A stack push/pop early in the compiler would have to be represented by a PARALLEL. It's been this way forever. It's documented in the internals manual somewhere. jeff From richard.sandiford@linaro.org Sat May 12 19:35:00 2018 From: richard.sandiford@linaro.org (Richard Sandiford) Date: Sat, 12 May 2018 19:35:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: (Jeff Law's message of "Sat, 12 May 2018 11:01:37 -0600") References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> <87d0y0q2v5.fsf@linaro.org> Message-ID: <878t8opsyv.fsf@linaro.org> Jeff Law writes: > On 05/12/2018 10:02 AM, Richard Sandiford wrote: >> "A. Skrobov" writes: >>>> If we look in sel-sched-ir.c we see that it calls into hash_rtx_cb >>>> (sigh, bad modularity). I'm not at all familiar with how the hashing >>>> is used within the selective scheduler, so I can't really say what the >>>> selective scheduler *should* be doing here. 
>>> >>> OK, I see. Now what do you think would be the best course of action? >>> Leave everything as it is? The selective scheduler may or may not want >>> these memory accesses ignored. >> >> I don't think we can assert even for cse, since AIUI these codes can >> still be used for stack pushes and pops. > No. We're not supposed to have any auto-inc insns prior to the auto-inc > pass. A stack push/pop early in the compiler would have to be > represented by a PARALLEL. > > It's been this way forever. It's documented in the internals manual > somewhere. Maybe pops was a generalisation too far :-) but I was going off:

    if (MEM_P (dest))
      {
    #ifdef PUSH_ROUNDING
        /* Stack pushes invalidate the stack pointer.  */
        rtx addr = XEXP (dest, 0);
        if (GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC
            && XEXP (addr, 0) == stack_pointer_rtx)
          invalidate (stack_pointer_rtx, VOIDmode);
    #endif
        dest = fold_rtx (dest, insn);
      }

in cse_insn. These kinds of push are generated by emit_single_push_insn during expand, so if we asserted for RTX_AUTOINC rtxes then it would fire for this case. Richard From bernds_cb1@t-online.de Sat May 12 19:36:00 2018 From: bernds_cb1@t-online.de (Bernd Schmidt) Date: Sat, 12 May 2018 19:36:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> <87d0y0q2v5.fsf@linaro.org> Message-ID: On 05/12/2018 07:01 PM, Jeff Law wrote: > No. We're not supposed to have any auto-inc insns prior to the auto-inc > pass. A stack push/pop early in the compiler would have to be > represented by a PARALLEL. > > It's been this way forever. It's documented in the internals manual > somewhere. Sorry, but you're misremembering this. Stack pushes/pops were always represented with autoinc, these being the only exception to the rule you remember. You can easily verify this by looking at a .expand dump from a 32-bit i386 compiler - I just did so with 2.95 and 6.4.
It's all pre_dec for argument passing. Bernd From gccadmin@gcc.gnu.org Sun May 13 22:40:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Sun, 13 May 2018 22:40:00 -0000 Subject: gcc-9-20180513 is now available Message-ID: <20180513224021.84154.qmail@sourceware.org> Snapshot gcc-9-20180513 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/9-20180513/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 9 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 260217 You'll find: gcc-9-20180513.tar.xz Complete GCC SHA256=3b297fd353637ce8af796f5ff00e504dda01cb8971062c2d33c599a3fe8c9c45 SHA1=e44f02303d748c8ea0ae908fbe4dd011aec107f8 Diffs from 9-20180506 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-9 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. From david@westcontrol.com Mon May 14 14:34:00 2018 From: david@westcontrol.com (David Brown) Date: Mon, 14 May 2018 14:34:00 -0000 Subject: LTO vs GCC 8 In-Reply-To: References: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> <5AF5793A.1050900@westcontrol.com> Message-ID: <5AF99E67.4050004@westcontrol.com> On 11/05/18 17:49, Freddie Chopin wrote: > On Fri, 2018-05-11 at 13:06 +0200, David Brown wrote: >> For the Cortex-M devices (and probably many other RISC targets), >> -fdata-sections comes at a big cost - it effectively blocks >> -fsection-anchors and makes access to file-static data a lot bigger. >> People often use -fdata-sections and -ffunction-sections along with >> -Wl,--gc-sections with the aim of removing unused code and data (and >> thus saving space, useful on small devices) - I would expect LTO >> would >> manage that anyway. The other purpose of these is to improve >> locality >> of reference - again LTO should do that for you. 
>> But even without LTO, I find the cost of -fdata-sections high
>> compared to -fsection-anchors.
>
> Unfortunately having LTO doesn't make -ffunction-sections +
> -fdata-sections + --gc-sections useless.
>
> My test project compiled:
> - without LTO and without these attributes - 150824 B ROM + 4240 B RAM
> - with LTO and without these attributes - 133812 B ROM + 4208 B RAM
> - without LTO and with these attributes - 124456 B ROM + 3484 B RAM
> - with LTO and with these attributes - 120280 B ROM + 3680 B RAM
>
> As you see these attributes give much more than LTO in terms of size.

Interesting. Making these sections and then using gc-sections should only remove code that is not used - LTO should do that anyway. Have you tried with -ffunction-sections and not -fdata-sections? It is the -fdata-sections that ruins -fsection-anchors - the -ffunction-sections doesn't have the same kind of cost.

> As for the -fsection-anchors I guess this has no use for non-PIC code
> for arm-none-eabi. Whether I use it or not, the sizes are identical.

No, -fsection-anchors has plenty of use for fixed-position eabi code. Take this little example code:

    static int x;
    static int y;
    static int z;

    void foo(void)
    {
        int t = x;
        x = y;
        y = z;
        z = t;
    }

Compiled with gcc (4.8, as that's what I had convenient) with -O2 -mcpu=cortex-m4 -mthumb and -fsection-anchors (enabled automatically with -O2, I believe), this gives:

    21      foo:
    22      @ args = 0, pretend = 0, frame = 0
    23      @ frame_needed = 0, uses_anonymous_args = 0
    24      @ link register save eliminated.
    25 0000 034B      ldr r3, .L2
    26 0002 93E80500  ldmia r3, {r0, r2}
    27 0006 9968      ldr r1, [r3, #8]
    28 0008 1A60      str r2, [r3]
    29 000a 9860      str r0, [r3, #8]
    30 000c 5960      str r1, [r3, #4]
    31 000e 7047      bx lr
    32      .L3:
    33      .align 2
    34      .L2:
    35 0010 00000000  .word .LANCHOR0
    37      .bss
    38      .align 2
    39      .set .LANCHOR0,. + 0
    42      x:
    43 0000 00000000  .space 4
    46      y:
    47 0004 00000000  .space 4
    50      z:
    51 0008 00000000  .space 4

With -fdata-sections, I get:

    21      foo:
    22      @ args = 0, pretend = 0, frame = 0
    23      @ frame_needed = 0, uses_anonymous_args = 0
    24      @ link register save eliminated.
    25 0000 30B4      push {r4, r5}
    26 0002 0549      ldr r1, .L2
    27 0004 054B      ldr r3, .L2+4
    28 0006 064A      ldr r2, .L2+8
    29 0008 0D68      ldr r5, [r1]
    30 000a 1468      ldr r4, [r2]
    31 000c 1868      ldr r0, [r3]
    32 000e 1560      str r5, [r2]
    33 0010 1C60      str r4, [r3]
    34 0012 0860      str r0, [r1]
    35 0014 30BC      pop {r4, r5}
    36 0016 7047      bx lr
    37      .L3:
    38      .align 2
    39      .L2:
    40 0018 00000000  .word .LANCHOR0
    41 001c 00000000  .word .LANCHOR1
    42 0020 00000000  .word .LANCHOR2
    44      .section .bss.x,"aw",%nobits
    45      .align 2
    46      .set .LANCHOR0,. + 0
    49      x:
    50 0000 00000000  .space 4
    51      .section .bss.y,"aw",%nobits
    52      .align 2
    53      .set .LANCHOR1,. + 0
    56      y:
    57 0000 00000000  .space 4
    58      .section .bss.z,"aw",%nobits
    59      .align 2
    60      .set .LANCHOR2,. + 0
    63      z:
    64 0000 00000000  .space 4

The code is clearly bigger and slower, and uses more anchors in the code section. Note that to get similar improvements with non-static data, you need "-fno-common" - a flag that I believe should be the default for the compiler.

From law@redhat.com Mon May 14 20:55:00 2018 From: law@redhat.com (Jeff Law) Date: Mon, 14 May 2018 20:55:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> <87d0y0q2v5.fsf@linaro.org> Message-ID: <12889970-6000-3282-b6c8-531fcb826bda@redhat.com> On 05/12/2018 01:35 PM, Bernd Schmidt wrote: > On 05/12/2018 07:01 PM, Jeff Law wrote: > >> No. We're not supposed to have any auto-inc insns prior to the auto-inc >> pass. A stack push/pop early in the compiler would have to be >> represented by a PARALLEL. >> >> It's been this way forever. It's documented in the internals manual >> somewhere. > Sorry, but you're misremembering this.
Stack pushes/pops were always > represented with autoinc, these being the only exception to the rule you > remember. You can easily verify this by looking at a .expand dump from a > 32-bit i386 compiler - I just did so with 2.95 and 6.4. It's all pre_dec > for argument passing. That does sound vaguely familiar. Did we put autoinc notes on the stack pushes? That makes me wonder if there is a latent bug though. Consider pushing args to a pure function. Could we then try to CSE the memory reference and get it wrong because we haven't accounted for the autoinc? Jeff From rodrivg@gmail.com Mon May 14 21:32:00 2018 From: rodrivg@gmail.com (Rodrigo V. G.) Date: Mon, 14 May 2018 21:32:00 -0000 Subject: Please support the _Atomic keyword in C++ Message-ID: In addition to the bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60932 I wanted to add some comment: It would be very useful if the _Atomic keyword would be supported in C++. This way the <stdatomic.h> header could be included unconditionally in C++ code. Even if it is not compatible with the C++ <atomic> header, it would be useful. Supporting the _Atomic keyword in C++ would benefit at least two cases: - When mixing C and C++ code for interoperability (using, in C++, some variables declared as _Atomic in a C header). - When developing operating systems or kernels in C++, in a freestanding environment (cross compiler), <atomic> is not available, but <stdatomic.h> is. So to correctly use things like __atomic_fetch_add in C++ in freestanding mode, this is the only way. Otherwise one cannot use atomics at all in these conditions. From jwakely.gcc@gmail.com Mon May 14 22:27:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Mon, 14 May 2018 22:27:00 -0000 Subject: Please support the _Atomic keyword in C++ In-Reply-To: References: Message-ID: On 14 May 2018 at 22:32, Rodrigo V. G.
wrote: > In addition to the bug: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60932 > I wanted to add some comment: > > It would be very useful if the _Atomic keyword would be supported in C++. > This way the <stdatomic.h> header could be included unconditionally in C++ code. > Even if it is not compatible with the C++ <atomic> header, it would be useful. > > Supporting the _Atomic keyword in C++ would benefit at least two cases: > > - When mixing C and C++ code for interoperability (using, in C++, some > variables declared as _Atomic in a C header). > > - When developing operating systems or kernels in C++, in a > freestanding environment (cross compiler), <atomic> is not available, Why not? It's part of a freestanding C++ implementation. > but <stdatomic.h> is. How? It's not part of any C++ implementation at all, freestanding or not. > So to correctly use things like __atomic_fetch_add in C++ in > freestanding mode, this is the only way. Otherwise one cannot use > atomics at all in these conditions. Why can't you use __atomic_fetch_add directly? From jwerner@chromium.org Mon May 14 23:38:00 2018 From: jwerner@chromium.org (Julius Werner) Date: Mon, 14 May 2018 23:38:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) Message-ID: Hi, I'm a firmware/embedded engineer and frequently run into cases where certain parts of the code need to be placed in a special memory area (for example, because the area that contains the other code is not yet initialized or currently inaccessible). My go-to method to solve this is to mark all functions and globals used by this code with __attribute__((section)), and using a linker script to map those special sections to the desired area. This mostly works pretty well.
However, I just found an issue with this when the functions include local variables like this: const int some_array[] = { 1, 2, 3, 4, 5, 6 }; In this case (and with -Os optimization), GCC seems to automatically reserve some space in the .rodata section to place the array, and the generated code accesses it there. Of course this breaks my use case if the generic .rodata section is inaccessible while that function executes. I have not found any way to work around this without either rewriting the code to completely avoid those constructs, or manipulating sections manually at the linker level (in particular, you can't just mark the array itself with __attribute__((section)), since that attribute is not legal for locals). Is this intentional, and if so, does it make sense that it is? I can understand that it may technically be compliant with the description of __attribute__((section)) in the GCC manual -- but I think the use case I'm trying to solve is one of the most common uses of that attribute, and it seems to become completely impossible due to this. Wouldn't it make more sense and be more useful if __attribute__((section)) meant "place *everything* generated as part of this function source code into that section"? Or at least offer some sort of other extension to be able to control section placement for those special constants? (Note that GCC usually seems to place constants for individual variables in the text section, simply behind the epilogue of the function... so it's also quite unclear to me why arrays get treated differently at all.) Apart from this issue, this behavior also seems to "break" -ffunction-sections/-fdata-sections. Even with both of those set, these sorts of constants seem to get placed into the same big, common .rodata section (as opposed to either .text.functionname or .rodata.functionname as you'd expect). 
That means that they won't get collected when linking the binary with --gc-sections and will bloat the code size for projects that link a lot of code opportunistically and rely on --gc-sections to drop everything that's not needed for the current configuration. Is there some clever trick that I missed to work around this, or is this really not possible with the current GCC? And if so, would you agree that this is a valid problem that GCC should provide a solution for (in some form or another)? Thanks, Julius From bernds_cb1@t-online.de Mon May 14 23:58:00 2018 From: bernds_cb1@t-online.de (Bernd Schmidt) Date: Mon, 14 May 2018 23:58:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: <12889970-6000-3282-b6c8-531fcb826bda@redhat.com> References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> <87d0y0q2v5.fsf@linaro.org> <12889970-6000-3282-b6c8-531fcb826bda@redhat.com> Message-ID: <3ecad6c8-c4c8-5a66-a5e4-af8c9614af7c@t-online.de> On 05/14/2018 10:55 PM, Jeff Law wrote: > That does sound vaguely familiar. Did we put autoinc notes on the stack > pushes? Not as far as I recall. I only see REG_ARGS_SIZE notes in the dumps. > That makes me wonder if there is a latent bug though. Consider pushing > args to a pure function. Could we then try to CSE the memory reference > and get it wrong because we haven't accounted for the autoinc? Can't know for sure but one would hope something would test for side_effects_p. 
Bernd From richard.guenther@gmail.com Tue May 15 09:29:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Tue, 15 May 2018 09:29:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) In-Reply-To: References: Message-ID: On Tue, May 15, 2018 at 1:38 AM Julius Werner wrote: > Hi, > I'm a firmware/embedded engineer and frequently run into cases where > certain parts of the code need to be placed in a special memory area (for > example, because the area that contains the other code is not yet > initialized or currently inaccessible). My go-to method to solve this is to > mark all functions and globals used by this code with > __attribute__((section)), and using a linker script to map those special > sections to the desired area. This mostly works pretty well. > However, I just found an issue with this when the functions include local > variables like this: > const int some_array[] = { 1, 2, 3, 4, 5, 6 }; > In this case (and with -Os optimization), GCC seems to automatically > reserve some space in the .rodata section to place the array, and the > generated code accesses it there. Of course this breaks my use case if the > generic .rodata section is inaccessible while that function executes. I > have not found any way to work around this without either rewriting the > code to completely avoid those constructs, or manipulating sections > manually at the linker level (in particular, you can't just mark the array > itself with __attribute__((section)), since that attribute is not legal for > locals). > Is this intentional, and if so, does it make sense that it is? I can > understand that it may technically be compliant with the description of > __attribute__((section)) in the GCC manual -- but I think the use case I'm > trying to solve is one of the most common uses of that attribute, and it > seems to become completely impossible due to this. 
Wouldn't it make more > sense and be more useful if __attribute__((section)) meant "place > *everything* generated as part of this function source code into that > section"? Or at least offer some sort of other extension to be able to > control section placement for those special constants? (Note that GCC > usually seems to place constants for individual variables in the text > section, simply behind the epilogue of the function... so it's also quite > unclear to me why arrays get treated differently at all.) > Apart from this issue, this behavior also seems to "break" > -ffunction-sections/-fdata-sections. Even with both of those set, these > sorts of constants seem to get placed into the same big, common .rodata > section (as opposed to either .text.functionname or .rodata.functionname as > you'd expect). That means that they won't get collected when linking the > binary with --gc-sections and will bloat the code size for projects that > link a lot of code opportunistically and rely on --gc-sections to drop > everything that's not needed for the current configuration. > Is there some clever trick that I missed to work around this, or is this > really not possible with the current GCC? And if so, would you agree that > this is a valid problem that GCC should provide a solution for (in some > form or another)? I think you are asking for per-function constant pool sections. Because we generally cannot avoid the need of a constant pool and dependent on the target that is always global. Note with per-function constant pools you will not benefit from constant pool entry merging across functions. I'm also not aware of any non-target-specific (and thus not implemented on some targets) option to get these. Richard. > Thanks, > Julius From rodrivg@gmail.com Tue May 15 10:01:00 2018 From: rodrivg@gmail.com (Rodrigo V. G.) 
Date: Tue, 15 May 2018 10:01:00 -0000 Subject: Please support the _Atomic keyword in C++ In-Reply-To: References: Message-ID: On Tue, May 15, 2018 at 12:27 AM, Jonathan Wakely wrote: > On 14 May 2018 at 22:32, Rodrigo V. G. wrote: >> In addition to the bug: >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60932 >> I wanted to add some comment: >> >> It would be very useful if the _Atomic keyword would be supported in C++. >> This way the <stdatomic.h> header could be included unconditionally in C++ code. >> Even if it is not compatible with the C++ <atomic> header, it would be useful. >> >> Supporting the _Atomic keyword in C++ would benefit at least two cases: >> >> - When mixing C and C++ code for interoperability (using, in C++, some >> variables declared as _Atomic in a C header). >> >> - When developing operating systems or kernels in C++, in a >> freestanding environment (cross compiler), <atomic> is not available, > > Why not? It's part of a freestanding C++ implementation. When building a cross compiler as indicated in: https://wiki.osdev.org/GCC_Cross-Compiler#GCC it does not install the <atomic> header. >> but <stdatomic.h> is. > > How? It's not part of any C++ implementation at all, freestanding or not. As far as I can see the <stdatomic.h> header can be included from C++ code (also in the cross compiler). Only that it complains about _Atomic and _Bool so it does not work. So _Atomic and _Bool seem to be the missing pieces. >> So to correctly use things like __atomic_fetch_add in C++ in >> freestanding mode, this is the only way. Otherwise one cannot use >> atomics at all in these conditions. > > Why can't you use __atomic_fetch_add directly? I tried to use __atomic_fetch_add in C++ with a volatile (non _Atomic) variable, and it seems to generate the same assembler code. The only difference that I saw was that with _Atomic it generates a "mfence" instruction after initialization but with volatile it does not. So I think it might not provide the same guarantees.
(Sorry, I forgot to cc the list, now I do cc) From aph@redhat.com Tue May 15 10:44:00 2018 From: aph@redhat.com (Andrew Haley) Date: Tue, 15 May 2018 10:44:00 -0000 Subject: Please support the _Atomic keyword in C++ In-Reply-To: References: Message-ID: <76df5a25-9eeb-975a-869b-ba6a0259ea25@redhat.com> On 05/15/2018 11:01 AM, Rodrigo V. G. wrote: > I tried to use __atomic_fetch_add in C++ with a volatile (non _Atomic) variable, > and it seems to generate the same assembler code. > The only difference that I saw was that with _Atomic > it generates a "mfence" instruction after initialization but with > volatile it does not. That's what should happen on x86. > So I think it might not provide the same guarantees. I think it does. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jwakely.gcc@gmail.com Tue May 15 10:50:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Tue, 15 May 2018 10:50:00 -0000 Subject: Please support the _Atomic keyword in C++ In-Reply-To: References: Message-ID: On 15 May 2018 at 11:01, Rodrigo V. G. wrote: > On Tue, May 15, 2018 at 12:27 AM, Jonathan Wakely wrote: >> On 14 May 2018 at 22:32, Rodrigo V. G. wrote: >>> In addition to the bug: >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60932 >>> I wanted to add some comment: >>> >>> It would be very useful if the _Atomic keyword would be supported in C++. >>> This way the <stdatomic.h> header could be included unconditionally in C++ code. >>> Even if it is not compatible with the C++ <atomic> header, it would be useful. >>> >>> Supporting the _Atomic keyword in C++ would benefit at least two cases: >>> >>> - When mixing C and C++ code for interoperability (using, in C++, some >>> variables declared as _Atomic in a C header). >>> >>> - When developing operating systems or kernels in C++, in a >>> freestanding environment (cross compiler), <atomic> is not available, >> >> Why not? It's part of a freestanding C++ implementation.
> > When building a cross compiler as indicated in: > https://wiki.osdev.org/GCC_Cross-Compiler#GCC > it does not install the <atomic> header. That's not a problem with GCC, that's a problem with those instructions. If you want a freestanding C++ implementation then configure GCC with --disable-hosted-libstdcxx and build+install libstdc++. Those instructions do neither of those things, so it's unsurprising you don't get a proper freestanding C++ implementation. >>> but <stdatomic.h> is. >> >> How? It's not part of any C++ implementation at all, freestanding or not. > > As far as I can see the <stdatomic.h> header can be included from C++ code > (also in the cross compiler). > Only that it complains about _Atomic and _Bool so it does not work. > So _Atomic and _Bool seem to be the missing pieces. So you're saying it can be included, it just doesn't work. Great! That's true for this header too: #error "This header doesn't compile" >>> So to correctly use things like __atomic_fetch_add in C++ in >>> freestanding mode, this is the only way. Otherwise one cannot use >>> atomics at all in these conditions. >> >> Why can't you use __atomic_fetch_add directly? > > I tried to use __atomic_fetch_add in C++ with a volatile (non _Atomic) variable, > and it seems to generate the same assembler code. Why use volatile? http://isvolatileusefulwiththreads.com/ > The only difference that I saw was that with _Atomic > it generates a "mfence" instruction after initialization but with > volatile it does not. > So I think it might not provide the same guarantees. > > > (Sorry, I forgot to cc the list, now I do cc) From tyomitch@gmail.com Tue May 15 10:58:00 2018 From: tyomitch@gmail.com (A.
Skrobov) Date: Tue, 15 May 2018 10:58:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: <3ecad6c8-c4c8-5a66-a5e4-af8c9614af7c@t-online.de> References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> <87d0y0q2v5.fsf@linaro.org> <12889970-6000-3282-b6c8-531fcb826bda@redhat.com> <3ecad6c8-c4c8-5a66-a5e4-af8c9614af7c@t-online.de> Message-ID: >> That makes me wonder if there is a latent bug though. Consider pushing >> args to a pure function. Could we then try to CSE the memory reference >> and get it wrong because we haven't accounted for the autoinc? > > Can't know for sure but one would hope something would test for > side_effects_p. If side_effects_p were checked in all the right places, then our port (which is more liberal at generating auto-inc insns in early passes) wouldn't have cse generate incorrect code, right? Our observation is that it did. From bernds_cb1@t-online.de Tue May 15 11:16:00 2018 From: bernds_cb1@t-online.de (Bernd Schmidt) Date: Tue, 15 May 2018 11:16:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> <87d0y0q2v5.fsf@linaro.org> <12889970-6000-3282-b6c8-531fcb826bda@redhat.com> <3ecad6c8-c4c8-5a66-a5e4-af8c9614af7c@t-online.de> Message-ID: <5af487f8-b23b-1fce-eff1-6573728be04d@t-online.de> On 05/15/2018 12:58 PM, A. Skrobov wrote: >>> That makes me wonder if there is a latent bug though. Consider pushing >>> args to a pure function. Could we then try to CSE the memory reference >>> and get it wrong because we haven't accounted for the autoinc? >> >> Can't know for sure but one would hope something would test for >> side_effects_p. > > If side_effects_p were checked in all the right places, then our port > (which is more liberal at generating auto-inc insns in early passes) > wouldn't have cse generate incorrect code, right? 
No, I'd expect you'd also need to make sure cse and other passes understand the side effects. I think it's best not to emit these insns early, unless you are prepared to put a lot of effort in to fix up the early passes. My recommendation is to change the port. Bernd From macro@mips.com Tue May 15 14:45:00 2018 From: macro@mips.com (Maciej W. Rozycki) Date: Tue, 15 May 2018 14:45:00 -0000 Subject: [PATCH] gdb/x86: Fix `-Wstrict-overflow' build error in `i387_collect_xsave' Message-ID: Make `i' defined within `i387_collect_xsave' unsigned, removing a `-Werror=strict-overflow' compilation error:

    .../gdb/i387-tdep.c: In function 'void i387_collect_xsave(const regcache*, int, void*, int)':
    .../gdb/i387-tdep.c:1348:1: error: assuming signed overflow does not occur when assuming that (X + c) < X is always false [-Werror=strict-overflow]
     i387_collect_xsave (const struct regcache *regcache, int regnum,
     ^
    cc1plus: all warnings being treated as errors
    Makefile:1610: recipe for target 'i387-tdep.o' failed
    make: *** [i387-tdep.o] Error 1

seen with GCC 5.4.0, a commit 8ee22052f690 ("gdb/x86: Handle kernels using compact xsave format") regression. While `regnum' can be -1 on entry to the function, to mean all registers, `i' is only used with non-negative register numbers.

gdb/
	* i387-tdep.c (i387_collect_xsave): Make `i' unsigned.

--- Hi, I believe this comes from register numbers being retrieved from the `tdep' structure at run time and therefore making the compiler unable to statically determine in loop expression processing that the calculations made on `i' will not cause a signed overflow. NB the error message pointing at the function definition rather than the declaration of `i' makes it rather difficult to determine what the actual cause might be. I just hope it's only a peculiarity of the somewhat older version of the compiler and it has been fixed since. Could someone from the GCC mailing list please comment on that? OK to apply?
Maciej

---
 gdb/i387-tdep.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

gdb-i387-collect-xsave-signed-overflow.diff
Index: gdb/gdb/i387-tdep.c
===================================================================
--- gdb.orig/gdb/i387-tdep.c	2018-05-10 22:13:05.000000000 +0100
+++ gdb/gdb/i387-tdep.c	2018-05-13 23:27:46.194412211 +0100
@@ -1354,7 +1354,7 @@ i387_collect_xsave (const struct regcach
   gdb_byte *p, *regs = (gdb_byte *) xsave;
   gdb_byte raw[I386_MAX_REGISTER_SIZE];
   ULONGEST initial_xstate_bv, clear_bv, xstate_bv = 0;
-  int i;
+  unsigned int i;
   enum
     {
       x87_ctrl_or_mxcsr = 0x1,

From tyomitch@gmail.com Tue May 15 16:45:00 2018 From: tyomitch@gmail.com (A. Skrobov) Date: Tue, 15 May 2018 16:45:00 -0000 Subject: Possible bug in cse.c affecting pre/post-modify mem access In-Reply-To: <5af487f8-b23b-1fce-eff1-6573728be04d@t-online.de> References: <49cab185-65cc-5fe7-9fe8-d31758ab752c@redhat.com> <87d0y0q2v5.fsf@linaro.org> <12889970-6000-3282-b6c8-531fcb826bda@redhat.com> <3ecad6c8-c4c8-5a66-a5e4-af8c9614af7c@t-online.de> <5af487f8-b23b-1fce-eff1-6573728be04d@t-online.de> Message-ID: >>>> That makes me wonder if there is a latent bug though. Consider pushing >>>> args to a pure function. Could we then try to CSE the memory reference >>>> and get it wrong because we haven't accounted for the autoinc? >>> >>> Can't know for sure but one would hope something would test for >>> side_effects_p. >> >> If side_effects_p were checked in all the right places, then our port >> (which is more liberal at generating auto-inc insns in early passes) >> wouldn't have cse generate incorrect code, right? > > No, I'd expect you'd also need to make sure cse and other passes > understand the side effects. I think it's best not to emit these insns > early, unless you are prepared to put a lot of effort in to fix up the > early passes. My recommendation is to change the port.
I understand that we should change our port; my point is, if cse behaves incorrectly for _our_ auto-inc insns, wouldn't it also behave incorrectly for _others'_ auto-inc insns such as stack pushes? From segher@kernel.crashing.org Tue May 15 17:06:00 2018 From: segher@kernel.crashing.org (Segher Boessenkool) Date: Tue, 15 May 2018 17:06:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) In-Reply-To: References: Message-ID: <20180515170632.GK17342@gate.crashing.org> On Mon, May 14, 2018 at 04:38:09PM -0700, Julius Werner wrote: > However, I just found an issue with this when the functions include local > variables like this: > > const int some_array[] = { 1, 2, 3, 4, 5, 6 }; Does it work better if you make this "static const"? Segher From Francesco.Petrogalli@arm.com Tue May 15 18:29:00 2018 From: Francesco.Petrogalli@arm.com (Francesco Petrogalli) Date: Tue, 15 May 2018 18:29:00 -0000 Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP In-Reply-To: <1518212868.14236.47.camel@cavium.com> References: <1518212868.14236.47.camel@cavium.com> Message-ID: <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> > On Feb 9, 2018, at 3:47 PM, Steve Ellcey wrote: > > [...] > I was wondering if the function vector ABI has been published yet and > if so, where I could find it. > Hi Steve, I am happy to let you know that the Vector Function ABI for AArch64 is now public and available via the link at [1]. Don't hesitate to contact me in case you have any questions.
Kind regards, Francesco [1] https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi > Steve Ellcey > sellcey@cavium.com From freddie_chopin@op.pl Tue May 15 19:39:00 2018 From: freddie_chopin@op.pl (Freddie Chopin) Date: Tue, 15 May 2018 19:39:00 -0000 Subject: LTO vs GCC 8 In-Reply-To: <49DC31D9-92D9-48A1-935C-4B5B8A3BE47E@gmail.com> References: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> <5AF5793A.1050900@westcontrol.com> <49DC31D9-92D9-48A1-935C-4B5B8A3BE47E@gmail.com> Message-ID: On Fri, 2018-05-11 at 18:51 +0200, Richard Biener wrote: > That's an interesting result. Do you have any non-LTO objects? > Basically I'm curious what ld eliminates that gcc with LTO doesn't. Whole project is compiled with LTO, part of the project is provided as a library (which is archived with arm-none-eabi-gcc-ar). Only non-LTO stuff in the final executable are objects from standard toolchain libraries and I suppose they are the culprit here - the toolchain is compiled with -ffunction-sections -fdata-sections, but without -flto. Maybe I should actually compile the whole toolchain with -flto -ffat-lto-objects? Is this a sane idea? > As to a workaround for the ld bug you can try keeping all .debug_* > sections. IIRC 2.30 has the bug fixed (on the branch). Indeed - "keeping" all the debug sections is a viable alternative. I've found out that it is enough to "keep" just these: /* DWARF 2 */ .debug_info 0 : { KEEP(*(.debug_info .gnu.linkonce.wi.*)); } ... .debug_frame 0 : { KEEP(*(.debug_frame)); } I have to check whether debugging something like that is actually possible (; Thanks for the workaround! Regards, FCh From jwerner@chromium.org Tue May 15 19:56:00 2018 From: jwerner@chromium.org (Julius Werner) Date: Tue, 15 May 2018 19:56:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) In-Reply-To: References: Message-ID: > I think you are asking for per-function constant pool sections.
Because > we generally cannot avoid the need of a constant pool and dependent > on the target that is always global. Note with per-function constant > pools you will not benefit from constant pool entry merging across > functions. I'm also not aware of any non-target-specific (and thus not > implemented on some targets) option to get these. Thanks, yeah, that sounds like what I need. Is there any way to get that behavior today, even for a specific target? (I'm mostly interested in x86_64, armv7 and aarch64.) And are you saying that there are some targets for which it would be impossible to provide this behavior? Or just that it's not implemented for all targets today? Are constant pool entries merged at compile time or at link time? I would presume it should be done at link time because otherwise you're only merging entries within a single compilation unit (which doesn't sound that useful in a big project with hundreds of source files), right? So if they're merged at link time, shouldn't it be possible to do that merging after a linker script condensed all the per-function input sections that are left after --gc-sections back into a single big .rodata output section? (In my case, the linker script would instead condense all the constant pool sections for marked functions into .special_area.rodata and all the others into .rodata, and then it should be merging internally within those two output sections.) > Does it work better if you make this "static const"? Yes, then I can declare an __attribute__((section)) for that specific variable. However, that doesn't really seem like a safe and scalable approach, especially since it's hard to notice when you missed a variable. I'd like to have a way that I can annotate a function and that whole function (with everything it needs, except for globals) gets put into a special section (or a set of special sections with a common prefix or suffix), without having to rewrite the source to accommodate for this every time. 
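[The "static const" workaround discussed in this thread can be sketched as follows. Once the array is a named static object, rather than a function-local initializer that the compiler may materialize through the constant pool, a section attribute can be attached to it directly. The section name `.special_area.rodata` is made up for illustration, and where the linker places such an orphan section depends on the target and linker script:]

```c
/* As a named static object the array is emitted as data in the
   requested section instead of as an anonymous constant-pool entry.
   The section name below is hypothetical. */
static const int some_array[] __attribute__ ((section (".special_area.rodata")))
  = { 1, 2, 3, 4, 5, 6 };

int
lookup (unsigned int idx)
{
  return some_array[idx % 6];
}
```

[As noted elsewhere in the thread, this only covers objects you remember to annotate; compiler-generated constant-pool entries are unaffected, which is why a per-function option would still be useful.]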
From freddie_chopin@op.pl Tue May 15 20:04:00 2018 From: freddie_chopin@op.pl (Freddie Chopin) Date: Tue, 15 May 2018 20:04:00 -0000 Subject: LTO vs GCC 8 In-Reply-To: <5AF99E67.4050004@westcontrol.com> References: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> <5AF5793A.1050900@westcontrol.com> <5AF99E67.4050004@westcontrol.com> Message-ID: On Mon, 2018-05-14 at 16:34 +0200, David Brown wrote: > Interesting. Making these sections and then using gc-sections should > only remove code that is not used - LTO should do that anyway. My guess - expressed in the other e-mail to the list - is that the things LTO cannot remove but --gc-sections can are objects from toolchain library. > Have you tried with -ffunction-sections and not -fdata-sections? It > is > the -fdata-sections that ruins -fsection-anchors - the > -ffunction-sections doesn't have the same kind of cost. Results: - -ffunction-sections + -fdata-sections = 124396 ROM + 3484 RAM - -ffunction-sections = 125168 ROM + 3676 RAM - -ffunction-sections + -fsection-anchors = 125168 ROM + 3676 RAM - -ffunction-sections + -fsection-anchors + -fno-common = 125168 ROM + 3676 RAM Generated executables for the second, third and fourth case are identical - assembly listings for these three cases have no differences at all. I've also tried with -fno-section-anchors, and this makes a minor (negative) difference - 125352 ROM + 3676 RAM. > No, -fsection-anchors has plenty of use for fixed-position eabi code. > ... > The code is clearly bigger and slower, and uses more anchors in the > code > section. > > Note that to get similar improvements with non-static data, you need > "-fno-common" - a flag that I believe should be the default for the > compiler. 
I cannot reproduce this here ); Don't get me wrong - if there's a "free" way to improve code size/speed with some compiler flags which I did not use previously, then I'm very much interested, however in my particular case the best result (size-wise) I get is with just -ffunction-sections + -fdata-sections. The difference is not huge, but it's also not negligible. Maybe this has to do with different compiler versions we are comparing (4.8 vs 8.1)? I guess this is not LTO (which I did not enable for these measurements), as you did not mention it in your flags... Regards, FCh From freddie_chopin@op.pl Tue May 15 20:13:00 2018 From: freddie_chopin@op.pl (Freddie Chopin) Date: Tue, 15 May 2018 20:13:00 -0000 Subject: LTO vs GCC 8 In-Reply-To: References: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> <5AF5793A.1050900@westcontrol.com> <49DC31D9-92D9-48A1-935C-4B5B8A3BE47E@gmail.com> Message-ID: <31a46b0c6c7c0b0af496d98b8697587bcf25338c.camel@op.pl> On Tue, 2018-05-15 at 21:39 +0200, Freddie Chopin wrote: > On Fri, 2018-05-11 at 18:51 +0200, Richard Biener wrote: > > As to a workaround for the ld bug you can try keeping all .debug_* > > sections. IIRC 2.30 has the bug fixed (on the branch). > > Indeed - "keeping" all the debug sections is a viable alternative. > I've > found out that it is enough to "keep" just these: > > /* DWARF 2 */ > .debug_info 0 : { KEEP(*(.debug_info .gnu.linkonce.wi.*)); } > ... > .debug_frame 0 : { KEEP(*(.debug_frame)); } > > I have to check whether debugging something like that is actually > possible (; Thanks for the workaround! Nope, sent it too fast... 
With these two (three) sections "kept" --gc-sections stops working and the executable I get is almost identical to the case when I have no --gc-sections at all: - lto + --gc-sections, sections "kept" - 133504 ROM + 4196 RAM - lto + --gc-sections, sections not "kept" (causes previously mentioned errors) - 120288 ROM + 3676 RAM - lto, sections not "kept" - 133812 ROM + 4220 RAM So it seems I have to patiently wait for new binutils if I would like to use LTO (; Regards, FCh From joseph@codesourcery.com Tue May 15 20:15:00 2018 From: joseph@codesourcery.com (Joseph Myers) Date: Tue, 15 May 2018 20:15:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) In-Reply-To: References: Message-ID: This has been listed as a desirable feature for a long time: https://gcc.gnu.org/projects/optimize.html#putting_constants_in_special_sections -- Joseph S. Myers joseph@codesourcery.com From segher@kernel.crashing.org Tue May 15 20:51:00 2018 From: segher@kernel.crashing.org (Segher Boessenkool) Date: Tue, 15 May 2018 20:51:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) In-Reply-To: References: Message-ID: <20180515205054.GL17342@gate.crashing.org> On Tue, May 15, 2018 at 12:56:22PM -0700, Julius Werner wrote: > > I think you are asking for per-function constant pool sections. Because > > we generally cannot avoid the need of a constant pool and dependent > > on the target that is always global. Note with per-function constant > > pools you will not benefit from constant pool entry merging across > > functions. I'm also not aware of any non-target-specific (and thus not > > implemented on some targets) option to get these. > > Thanks, yeah, that sounds like what I need. Is there any way to get that > behavior today, even for a specific target? (I'm mostly interested in > x86_64, armv7 and aarch64.) And are you saying that there are some targets > for which it would be impossible to provide this behavior?
Or just that > it's not implemented for all targets today? For aarch64 there is -mpc-relative-literal-loads, I think that will do what you want. This option is implied by -mcmodel=tiny which you may want anyway, if your code is small enough. Segher From richard.guenther@gmail.com Wed May 16 05:26:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Wed, 16 May 2018 05:26:00 -0000 Subject: LTO vs GCC 8 In-Reply-To: <31a46b0c6c7c0b0af496d98b8697587bcf25338c.camel@op.pl> References: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> <5AF5793A.1050900@westcontrol.com> <49DC31D9-92D9-48A1-935C-4B5B8A3BE47E@gmail.com> <31a46b0c6c7c0b0af496d98b8697587bcf25338c.camel@op.pl> Message-ID: <9E2296F0-E95B-4142-A6D2-E97C307919DE@gmail.com> On May 15, 2018 10:12:45 PM GMT+02:00, Freddie Chopin wrote: >On Tue, 2018-05-15 at 21:39 +0200, Freddie Chopin wrote: >> On Fri, 2018-05-11 at 18:51 +0200, Richard Biener wrote: >> > As to a workaround for the ld bug you can try keeping all .debug_* >> > sections. IIRC 2.30 has the bug fixed (on the branch).
>> >> Indeed - "keeping" all the debug sections is a viable alternative. >> I've >> found out that it is enough to "keep" just these: >> >> /* DWARF 2 */ >> .debug_info 0 : { KEEP(*(.debug_info .gnu.linkonce.wi.*)); } >> ... >> .debug_frame 0 : { KEEP(*(.debug_frame)); } >> >> I have to check whether debugging something like that is actually >> possible (; Thanks for the workaround! > >Nope, sent it too fast... With these two (three) sections "kept" --gc- >sections stops working and the executable I get is almost identical to >the case when I have no --gc-sections at all: >- lto + --gc-sections, sections "kept" - 133504 ROM + 4196 RAM >- lto + --gc-sections, sections not "kept" (causes previously mentioned >errors) - 120288 ROM + 3676 RAM >- lto, sections not "kept" - 133812 ROM + 4220 RAM > >So it seems I have to patiently wait for new binutils if I would like >to use LTO (; Build your own (patched) binutils :) Richard. >Regards, >FCh From david@westcontrol.com Wed May 16 07:37:00 2018 From: david@westcontrol.com (David Brown) Date: Wed, 16 May 2018 07:37:00 -0000 Subject: LTO vs GCC 8 In-Reply-To: References: <7837e9b3a56c6eb8806ae42a0b2447d09b7e1078.camel@op.pl> <5AF5793A.1050900@westcontrol.com> <5AF99E67.4050004@westcontrol.com> Message-ID: <5AFBDF8F.7030205@westcontrol.com> On 15/05/18 22:03, Freddie Chopin wrote: > > I cannot reproduce this here ); Don't get me wrong - if there's a > "free" way to improve code size/speed with some compiler flags which I > did not use previously, then I'm very much interested, however in my > particular case the best result (size-wise) I get is with just > -ffunction-sections + -fdata-sections. The difference is not huge, but > it's also not negligible. Maybe this has to do with different compiler > versions we are comparing (4.8 vs 8.1)? I guess this is not LTO (which > I did not enable for these measurements), as you did not mention it in > your flags... 
> It is quite possible that the difference is from the gcc versions - there have been many improvements since 4.8, and it is entirely possible that gcc now gives the benefits of -fsection-anchors even with -fdata-sections. And I was looking here at the differences for short code sections, rather than the whole program. I will try a few more tests when I have the chance. This computer has such an old Linux installation that I can no longer use modern pre-built versions of the gnu arm embedded toolchain - I can't even use godbolt.org because the browser version is too old. I'll test from home, where I have newer arm gcc versions (including a "bleeding edge" toolchain or two, provided by some nice chap off the internet :-) ). From richard.guenther@gmail.com Wed May 16 09:33:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Wed, 16 May 2018 09:33:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) In-Reply-To: References: Message-ID: On Tue, May 15, 2018 at 9:56 PM Julius Werner wrote: > > I think you are asking for per-function constant pool sections. Because > > we generally cannot avoid the need of a constant pool and dependent > > on the target that is always global. Note with per-function constant > > pools you will not benefit from constant pool entry merging across > > functions. I'm also not aware of any non-target-specific (and thus not > > implemented on some targets) option to get these. > Thanks, yeah, that sounds like what I need. Is there any way to get that > behavior today, even for a specific target? (I'm mostly interested in > x86_64, armv7 and aarch64.) And are you saying that there are some targets > for which it would be impossible to provide this behavior? Or just that > it's not implemented for all targets today? It's not implemented for all targets and there may be no way to force it for all constants. > Are constant pool entries merged at compile time or at link time?
I would > presume it should be done at link time because otherwise you're only > merging entries within a single compilation unit (which doesn't sound that > useful in a big project with hundreds of source files), right? constant pool entries are merged at compile time. There's no such thing as mergeable constant pool sections so the closest thing would be to emit each entry into its own comdat or linkonce section and have the general linker script merge them pattern-based into .rodata from .rodata.HASHVALUEOFACTUALCONSTANT. And then of course you'll run into hash collisions so I guess it's not a too bright idea but instead some way of making a mergeable rodata section (which requires meta data for its entries) would be a solution. > So if > they're merged at link time, shouldn't it be possible to do that merging > after a linker script condensed all the per-function input sections that > are left after --gc-sections back into a single big .rodata output section? > (In my case, the linker script would instead condense all the constant pool > sections for marked functions into .special_area.rodata and all the others > into .rodata, and then it should be merging internally within those two > output sections.) > > Does it work better if you make this "static const"? > Yes, then I can declare an __attribute__((section)) for that specific > variable. However, that doesn't really seem like a safe and scalable > approach, especially since it's hard to notice when you missed a variable. > I'd like to have a way that I can annotate a function and that whole > function (with everything it needs, except for globals) gets put into a > special section (or a set of special sections with a common prefix or > suffix), without having to rewrite the source to accommodate for this every > time. From rodrivg@gmail.com Wed May 16 10:23:00 2018 From: rodrivg@gmail.com (Rodrigo V. G.) 
Date: Wed, 16 May 2018 10:23:00 -0000 Subject: Please support the _Atomic keyword in C++ In-Reply-To: References: Message-ID: On Tue, May 15, 2018 at 12:50 PM, Jonathan Wakely wrote: > On 15 May 2018 at 11:01, Rodrigo V. G. wrote: >> On Tue, May 15, 2018 at 12:27 AM, Jonathan Wakely wrote: >>> On 14 May 2018 at 22:32, Rodrigo V. G. wrote: >>>> In addition to the bug: >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60932 >>>> I wanted to add some comment: >>>> >>>> It would be very useful if the _Atomic keyword would be supported in C++. >>>> This way the header could be included inconditionally in C++ code. >>>> Even if it is not compatible with the C++ header, it would be useful. >>>> >>>> Supporting the _Atomic keyword in C++ would benefit at least two cases: >>>> >>>> - When mixing C and C++ code for interoperability (using, in C++, some >>>> variables declared as _Atomic in a C header). >>>> >>>> - When developing operating systems or kernels in C++, in a >>>> freestanding environment (cross compiler), is not available, >>> >>> Why not? It's part of a freestanding C++ implementation. >> >> When building a cross compiler as indicated in: >> https://wiki.osdev.org/GCC_Cross-Compiler#GCC >> it does not install the header. > > That's not a problem with GCC, that's a problem with those instructions. > > If you want a freestanding C++ implementation then configure GCC with > --disable-hosted-libstdcxx and build+install libstdc++. Those > instructions do neither of those things, so it's unsurprising you > don't get a proper freestanding C++ implementation. Ok. I give up for the moment. I will continue without libstdc++, as building libstdc++-v3, even with --disable-hosted-libstdcxx in my case seems to require some workarounds (enable_dlopen=no in the configure script) and some headers that I don't have at the moment. Maybe I return to it at a later time. >>>> but is. >>> >>> How? It's not part of any C++ implementation at all, freestanding or not. 
>> >> As far as I can see the can be included from C++ code >> (also in the cross compiler). >> Only that it complains about _Atomic and _Bool so it does not work. >> So _Atomic and _Bool seem to be the missing pieces. > > So you're saying it can be included, it just doesn't work. > > Great! > > That's true for this header too: > > #error "This header doesn't compile" > > > >>>> So to correctly use things like __atomic_fetch_add in C++ in >>>> freestanding mode, this is the only way. Otherwise one cannot use >>>> atomics at all in these conditions. >>> >>> Why can't you use __atomic_fetch_add directly? >> >> I tried to use __atomic_fetch_add in C++ with a volatile (non _Atomic) variable, >> and it seems to generate the same assembler code. > > Why use volatile? > http://isvolatileusefulwiththreads.com/ I thought that it would make a difference, but if it doesn't, maybe I can omit 'volatile'. >> The only difference that I saw was that with _Atomic >> it generates a "mfence" instruction after initialization but with >> volatile it does not. >> So I think it might not provide the same guarantees. >> >> >> (Sorry, I forgot to cc the list, now I do cc) From richard.sandiford@linaro.org Wed May 16 11:11:00 2018 From: richard.sandiford@linaro.org (Richard Sandiford) Date: Wed, 16 May 2018 11:11:00 -0000 Subject: Display priority in "Serious" bugs for gcc 8 from web page In-Reply-To: <54219409-7992-92c2-c314-089d3ebdad3b@netcologne.de> (Thomas Koenig's message of "Sat, 14 Apr 2018 16:09:30 +0200") References: <54219409-7992-92c2-c314-089d3ebdad3b@netcologne.de> Message-ID: <87in7nonx3.fsf@linaro.org> Thomas Koenig writes: > Hello world, > > whenever I look at the list of serious bugs, I find myself chaning the > columns to add the priority field. > > What do you think about adding the priority field when clicking on that > link? A patch is attached. I don't think anyone replied to this so far, but +1 FWIW. 
It seemed strange that the priority was one of the main search criteria but wasn't included in the results. The "Resolution" field seems a bit redundant when the search is only for open bugs, so if people are worried about having too many columns, maybe it would be OK to do a swap? Thanks, Richard From jwakely.gcc@gmail.com Wed May 16 13:01:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Wed, 16 May 2018 13:01:00 -0000 Subject: Please support the _Atomic keyword in C++ In-Reply-To: References: Message-ID: On 16 May 2018 at 11:23, Rodrigo V. G. wrote: > On Tue, May 15, 2018 at 12:50 PM, Jonathan Wakely wrote: >> If you want a freestanding C++ implementation then configure GCC with >> --disable-hosted-libstdcxx and build+install libstdc++. Those >> instructions do neither of those things, so it's unsurprising you >> don't get a proper freestanding C++ implementation. > > Ok. I give up for the moment. I will continue without libstdc++, > as building libstdc++-v3, even with --disable-hosted-libstdcxx > in my case seems to require some workarounds > (enable_dlopen=no in the configure script) and some headers > that I don't have at the moment. Maybe I return to it at a later time. If the freestanding build for libstdc++ doesn't work on your target that might be a bug in libstdc++, but it will be a lot easier to fix that than to support _Atomic in C++.
From ams@codesourcery.com Wed May 16 16:04:00 2018 From: ams@codesourcery.com (Andrew Stubbs) Date: Wed, 16 May 2018 16:04:00 -0000 Subject: Vector pointer modes Message-ID: <784b39d2-3e00-5936-1fd5-ee6c8b794efa@codesourcery.com> Hi all, I'm in the process of trying to update our AMD GCN port from GCC 7 to GCC 8+, but I've hit a problem ... It seems there's a new assumption that pointers and addresses will be scalar, but GCN load instructions require vectors of pointers. Basically, machine_mode has been replaced with scalar_int_mode in many places, and we were relying on vector modes being allowed. The changes are all coming from Richard Sandiford's SVE patches. Is there a new way of dealing with vectors of pointers? Thanks Andrew From sellcey@cavium.com Wed May 16 16:21:00 2018 From: sellcey@cavium.com (Steve Ellcey) Date: Wed, 16 May 2018 16:21:00 -0000 Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP In-Reply-To: <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> References: <1518212868.14236.47.camel@cavium.com> <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> Message-ID: <1526487700.29509.6.camel@cavium.com> On Tue, 2018-05-15 at 18:29 +0000, Francesco Petrogalli wrote: > Hi Steve, > > I am happy to let you know that the Vector Function ABI for AArch64 > is now public and available via the link at [1].
> > Don't hesitate to contact me in case you have any questions. > > Kind regards, > > Francesco > > [1] https://developer.arm.com/products/software-development-tools/hpc > /arm-compiler-for-hpc/vector-function-abi > > > > > Steve Ellcey > > sellcey@cavium.com Thanks for publishing this Francesco, it looks like the main issue for GCC is that the Vector Function ABI has different caller saved / callee saved register conventions than the standard ARM calling convention. If I understand things correctly, in the standard calling convention the callee will only save the bottom 64 bits of V8-V15 and so the caller needs to save those registers if it is using the top half. In the Vector calling convention the callee will save all 128 bits of these registers (and possibly more registers) so the caller does not have to save these registers at all, even if it is using all 128 bits of them. It doesn't look like GCC has any existing mechanism for having different sets of caller saved/callee saved registers depending on the function attributes of the calling or called function. Changing what registers a callee function saves and restores shouldn't be too difficult since that can be done when generating the prologue and epilogue code but changing what registers a caller saves/restores when doing the call seems trickier. The macro TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the function being called. It returns true/false depending on just the register number and mode.
Steve Ellcey sellcey@cavium.com From richard.guenther@gmail.com Wed May 16 16:24:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Wed, 16 May 2018 16:24:00 -0000 Subject: Vector pointer modes In-Reply-To: <784b39d2-3e00-5936-1fd5-ee6c8b794efa@codesourcery.com> References: <784b39d2-3e00-5936-1fd5-ee6c8b794efa@codesourcery.com> Message-ID: On May 16, 2018 6:03:35 PM GMT+02:00, Andrew Stubbs wrote: >Hi all, > >I'm in the process of trying to update our AMD GCN port from GCC 7 to >GCC 8+, but I've hit a problem ... > >It seems there's a new assumption that pointers and addresses will be >scalar, but GCN load instructions require vectors of pointers. >Basically, machine_mode has been replaced with scalar_int_mode >in many places, and we were relying on vector modes being allowed. > >The changes are all coming from Richard Sandiford's SVE patches. > >Is there a new way of dealing with vectors of pointers? Maybe you can masquerade it behind a large scalar integer mode?... Richard. >Thanks >Andrew From Richard.Earnshaw@arm.com Wed May 16 16:30:00 2018 From: Richard.Earnshaw@arm.com (Richard Earnshaw (lists)) Date: Wed, 16 May 2018 16:30:00 -0000 Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP In-Reply-To: <1526487700.29509.6.camel@cavium.com> References: <1518212868.14236.47.camel@cavium.com> <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> <1526487700.29509.6.camel@cavium.com> Message-ID: On 16/05/18 17:21, Steve Ellcey wrote: > On Tue, 2018-05-15 at 18:29 +0000, Francesco Petrogalli wrote: > >> Hi Steve, >> >> I am happy to let you know that the Vector Function ABI for AArch64 >> is now public and available via the link at [1]. >> >> Don't hesitate to contact me in case you have any questions.
>> >> Kind regards, >> >> Francesco >> >> [1] https://developer.arm.com/products/software-development-tools/hpc >> /arm-compiler-for-hpc/vector-function-abi >> >>> >>> Steve Ellcey >>> sellcey@cavium.com > > Thanks for publishing this Francesco, it looks like the main issue for > GCC is that the Vector Function ABI has different caller saved / callee > saved register conventions than the standard ARM calling convention. > > If I understand things correctly, in the standard calling convention > the callee will only save the bottom 64 bits of V8-V15 and so the > caller needs to save those registers if it is using the top half. In > the Vector calling convention the callee will save all 128 bits of > these registers (and possibly more registers) so the caller does not > have to save these registers at all, even if it is using all 128 bits > of them. > > It doesn't look like GCC has any existing mechanism for having different > sets of caller saved/callee saved registers depending on the function > attributes of the calling or called function. > > Changing what registers a callee function saves and restores shouldn't > be too difficult since that can be done when generating the prologue > and epilogue code but changing what registers a caller saves/restores > when doing the call seems trickier. The macro > TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the > function being called. It returns true/false depending on just the > register number and mode. > > Steve Ellcey > sellcey@cavium.com > Actually, we can. See, for example, the attribute((pcs)) for the ARM port. I think we could probably handle this automagically for the SVE vector calling convention in AArch64. R.
From ams@codesourcery.com Wed May 16 16:35:00 2018 From: ams@codesourcery.com (Andrew Stubbs) Date: Wed, 16 May 2018 16:35:00 -0000 Subject: Vector pointer modes In-Reply-To: References: <784b39d2-3e00-5936-1fd5-ee6c8b794efa@codesourcery.com> Message-ID: On 16/05/18 17:24, Richard Biener wrote: > On May 16, 2018 6:03:35 PM GMT+02:00, Andrew Stubbs wrote: >> Is there a new way of dealing with vectors of pointers? > > Maybe you can masquerade it behind a large scalar integer mode?... We're using V64DImode to represent a vector of 64 64-bit pointers. The architecture can hold this in a pair of V64SImode registers; it is not equivalent to 128 consecutive smaller registers, like NEON does. We could use plain DImode to get the same effect from print_operand, but that then chooses the wrong alternative, or whole wrong insn pattern and bad things would happen. Or, do you mean something else? Andrew
From richard.guenther@gmail.com Wed May 16 17:02:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Wed, 16 May 2018 17:02:00 -0000 Subject: Vector pointer modes In-Reply-To: References: <784b39d2-3e00-5936-1fd5-ee6c8b794efa@codesourcery.com> Message-ID: On May 16, 2018 6:35:05 PM GMT+02:00, Andrew Stubbs wrote: >On 16/05/18 17:24, Richard Biener wrote: >> On May 16, 2018 6:03:35 PM GMT+02:00, Andrew Stubbs > wrote: >>> Is there a new way of dealing with vectors of pointers? >> >> Maybe you can masquerade it behind a large scalar integer mode?... > >We're using V64DImode to represent a vector of 64 64-bit pointers. The >architecture can hold this in a pair of V64SImode registers; it is not >equivalent to 128 consecutive smaller registers, like NEON does. > >We could use plain DImode to get the same effect from print_operand, >but >that then chooses the wrong alternative, or whole wrong insn pattern >and >bad things would happen. > >Or, do you mean something else? I was thinking of using ZImode where hopefully ZI is large enough to hold V64DI... Richard.
>Andrew From sellcey@cavium.com Wed May 16 17:30:00 2018 From: sellcey@cavium.com (Steve Ellcey) Date: Wed, 16 May 2018 17:30:00 -0000 Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP In-Reply-To: References: <1518212868.14236.47.camel@cavium.com> <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> <1526487700.29509.6.camel@cavium.com> Message-ID: <1526491802.29509.19.camel@cavium.com> On Wed, 2018-05-16 at 17:30 +0100, Richard Earnshaw (lists) wrote: > On 16/05/18 17:21, Steve Ellcey wrote: > > > > It doesn't look like GCC has any existing mechanism for having different > > sets of caller saved/callee saved registers depending on the function > > attributes of the calling or called function. > > > > Changing what registers a callee function saves and restores shouldn't > > be too difficult since that can be done when generating the prologue > > and epilogue code but changing what registers a caller saves/restores > > when doing the call seems trickier. The macro > > TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the > > function being called. It returns true/false depending on just the > > register number and mode. > > > > Steve Ellcey > > sellcey@cavium.com > > > > Actually, we can. See, for example, the attribute((pcs)) for the ARM > > port. I think we could probably handle this automagically for the SVE > > vector calling convention in AArch64. > > > > R. Interesting, it looks like one could use aarch64_emit_call to emit extra use_reg / clobber_reg instructions but in this case we want to tell the caller that some registers are not being clobbered by the callee. The ARM port does not define TARGET_HARD_REGNO_CALL_PART_CLOBBERED and that seemed like one of the most problematic issues with Aarch64. Maybe we would have to undefine this for aarch64 and use explicit clobbers to say what floating point registers / vector registers are clobbered for each call?
I wonder how that would affect register allocation. Steve Ellcey sellcey@cavium.com From richard.sandiford@linaro.org Wed May 16 21:01:00 2018 From: richard.sandiford@linaro.org (Richard Sandiford) Date: Wed, 16 May 2018 21:01:00 -0000 Subject: Vector pointer modes In-Reply-To: <784b39d2-3e00-5936-1fd5-ee6c8b794efa@codesourcery.com> (Andrew Stubbs's message of "Wed, 16 May 2018 17:03:35 +0100") References: <784b39d2-3e00-5936-1fd5-ee6c8b794efa@codesourcery.com> Message-ID: <87efibnwli.fsf@linaro.org> Andrew Stubbs writes: > Hi all, > > I'm in the process of trying to update our AMD GCN port from GCC 7 to > GCC 8+, but I've hit a problem ... > > It seems there's a new assumption that pointers and addresses will be > scalar, but GCN load instructions require vectors of pointers. > Basically, machine_mode has been replaced with scalar_int_mode > in many places, and we were relying on vector modes being allowed. > > The changes are all coming from Richard Sandiford's SVE patches. FWIW, I think that assumption was always there. The scalar_int_mode patches just made it more explicit (as in, more code would fail to build if it wasn't honoured, rather than just potentially ICEing). Is this mostly about the RTL level concept of an address or pointer? If so, in what situations do you need the address or pointer itself to be a vector? SVE and AVX use unspecs for gathers and scatters, and I don't think in practice we lose anything by doing that.
Thanks, Richard From richard.sandiford@linaro.org Wed May 16 21:11:00 2018 From: richard.sandiford@linaro.org (Richard Sandiford) Date: Wed, 16 May 2018 21:11:00 -0000 Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP In-Reply-To: <1526491802.29509.19.camel@cavium.com> (Steve Ellcey's message of "Wed, 16 May 2018 10:30:02 -0700") References: <1518212868.14236.47.camel@cavium.com> <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> <1526487700.29509.6.camel@cavium.com> <1526491802.29509.19.camel@cavium.com> Message-ID: <87a7sznw5c.fsf@linaro.org> Steve Ellcey writes: > On Wed, 2018-05-16 at 17:30 +0100, Richard Earnshaw (lists) wrote: >> On 16/05/18 17:21, Steve Ellcey wrote: >> > >> > It doesn't look like GCC has any existing mechanism for having different >> > sets of caller saved/callee saved registers depending on the function >> > attributes of the calling or called function. >> > >> > Changing what registers a callee function saves and restores shouldn't >> > be too difficult since that can be done when generating the prologue >> > and epilogue code but changing what registers a caller saves/restores >> > when doing the call seems trickier. The macro >> > TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the >> > function being called. It returns true/false depending on just the >> > register number and mode. >> > >> > Steve Ellcey >> > sellcey@cavium.com >> > >> >> Actually, we can. See, for example, the attribute((pcs)) for the ARM >> port. I think we could probably handle this automagically for the SVE >> vector calling convention in AArch64. >> >> R. > > Interesting, it looks like one could use aarch64_emit_call to emit > extra use_reg / clobber_reg instructions but in this case we want to > tell the caller that some registers are not being clobbered by the > callee.
The ARM port does not > define TARGET_HARD_REGNO_CALL_PART_CLOBBERED and that seemed like one > of the most problematic issues with Aarch64. Maybe we would have to > undefine this for aarch64 and use explicit clobbers to say what > floating point registers / vector registers are clobbered for each > call? I wonder how that would affect register allocation. TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way of saying that an rtl instruction preserves the low part of a register but clobbers the high part. We would need something like Alan H's CLOBBER_HIGH patches to do it using explicit clobbers. Another approach would be to piggy-back on the -fipa-ra infrastructure and record that vector PCS functions only clobber Q0-Q7. If -fipa-ra knows that a function doesn't clobber Q8-Q15 then that should override TARGET_HARD_REGNO_CALL_PART_CLOBBERED. (I'm not sure whether it does in practice, but it should :-) And if it doesn't that's a bug that's worth fixing for its own sake.) Thanks, Richard From kalamatee@gmail.com Wed May 16 21:17:00 2018 From: kalamatee@gmail.com (Kalamatee) Date: Wed, 16 May 2018 21:17:00 -0000 Subject: Bug in m68k ASM softloat implementation? Message-ID: Hi, After hunting out a problem using the softloat code on m68k, with the assistance of the WinUAE author (Toni Wilen) we think we have noticed a bug dating back to 1994. Laddsf$nf returns values with the wrong sign, because it clears the sign bit before caching the wrong value and then attempting to use it. The movel d0,d7 is either in the wrong place (it should be before the sign bit is cleared), or it should be removed completely and the following andl IMM (0x80000000),d7 should be changed to use d2 instead (which should have the correct sign from the prior eorl). Yours, Nick Andrews. From kalamatee@gmail.com Wed May 16 21:25:00 2018 From: kalamatee@gmail.com (Kalamatee) Date: Wed, 16 May 2018 21:25:00 -0000 Subject: Bug in m68k ASM softloat implementation?
In-Reply-To: References: Message-ID: On 16 May 2018 at 22:17, Kalamatee wrote: > Hi, > > After hunting out a problem using the softloat code on m68k, with the > assistance of the WinUAE author (Toni Wilen) we think we have noticed a bug > dating back to 1994. > > Laddsf$nf returns values with the wrong sign, because it clears the sign > bit, before caching the wrong value and then attempting to use it. > > the movel d0,d7 is either in the wrong place (should be before the sign > bit is cleared), or it should be removed completely and the following > andl IMM (0x80000000),d7 should be changed to use d2 instead (which should > have the correct sign from the prior eorl) > I forgot to mention this is in libgcc/config/m68k/lb1sf68.S From gccadmin@gcc.gnu.org Wed May 16 22:42:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Wed, 16 May 2018 22:42:00 -0000 Subject: gcc-6-20180516 is now available Message-ID: <20180516224224.85843.qmail@sourceware.org> Snapshot gcc-6-20180516 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/6-20180516/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 6 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch revision 260300 You'll find: gcc-6-20180516.tar.xz Complete GCC SHA256=fd5ca24c672554ab0d8dafd2b9319099d170509e432121992290232c079f8244 SHA1=cef68d0a230a31384d4bea2594353e8285e4a807 Diffs from 6-20180509 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-6 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. From gcc@gcc.gnu.org Thu May 17 06:17:00 2018 From: gcc@gcc.gnu.org (冠人 王 via gcc) Date: Thu, 17 May 2018 06:17:00 -0000 Subject: I want to dump something when I compile the program. How should I do ? 
References: <1203265537.2309564.1526537840163.ref@mail.yahoo.com> Message-ID: <1203265537.2309564.1526537840163@mail.yahoo.com> My work is to modify the gcc source code so as to customize the warning message when the programmer writes a program that violates some rules. When the violation occurs, I want to reveal some message such as "guanjen375 warning: the rule ### is violated" on the window. How should I do ? Should I modify some file in the gcc source code so that I can print my own message? Besides, I want to understand how GCC code is executed, so I want to insert some code into GCC source code to check what happens when that code is executed. I tried to use "printf" but failed; the message I print appears when I run make (after configure), but not at compile time. If you do not really understand what I am saying, let me show an example: $gcc -Wunused test.c test.c:5:6: warning: unused variable 'a' [-Wunused-variable] int a; ^ becomes $gcc -Wunused test.c HI,DAVIDtest.c:5:6: warning: unused variable 'a' [-Wunused-variable] int a; ^ From andrew_stubbs@mentor.com Thu May 17 11:43:00 2018 From: andrew_stubbs@mentor.com (Andrew Stubbs) Date: Thu, 17 May 2018 11:43:00 -0000 Subject: Vector pointer modes In-Reply-To: <87efibnwli.fsf@linaro.org> References: <784b39d2-3e00-5936-1fd5-ee6c8b794efa@codesourcery.com> <87efibnwli.fsf@linaro.org> Message-ID: <1a76db37-ec8f-30c7-b496-0d955e23232e@mentor.com> On 16/05/18 22:01, Richard Sandiford wrote: > Andrew Stubbs writes: >> Hi all, >> >> I'm in the process of trying to update our AMD GCN port from GCC 7 to >> GCC 8+, but I've hit a problem ... >> >> It seems there's a new assumption that pointers and addresses will be >> scalar, but GCN load instructions require vectors of pointers. >> Basically, machine_mode has been replaced with scalar_int_mode >> in many places, and we were relying on vector modes being allowed. >> >> The changes are all coming from Richard Sandiford's SVE patches.
> > FWIW, I think that assumption was always there. The scalar_int_mode > patches just made it more explicit (as in, more code would fail to > build if it wasn't honoured, rather than just potentially ICEing). It was fine if done late enough, but now it's just blocked in TARGET_ADDR_SPACE_POINTER_MODE et al. However, having now finished a first rough forward-port (with the relevant bits of these hooks commented out and gcc_unreachable), I find that vector loads and stores are working perfectly, and there are no related ICEs in the testsuite (although, with vector widths less than 64 still on the to-do list, a lot of the testsuite doesn't do much vectorizing). > Is this mostly about the RTL level concept of an address or pointer? > If so, in what situations do you need the address or pointer itself to > be a vector? SVE and AVX use unspecs for gathers and scatters, and I > don't think in practice we lose anything by doing that. As far as the ISA is concerned, *all* vector loads and stores are scatter/gather. In our port we model a normal, contiguous vector load/store as a DImode base pointer until reload_completed, and then have a splitter expand that into a V64DImode with the appropriate set of lane addresses. Ideally this would happen earlier, so as to allow CSE to optimize the expansion, but we've not got there yet (and, as you say, would probably hit trouble). Andrew From mjambor@suse.cz Thu May 17 11:50:00 2018 From: mjambor@suse.cz (Martin Jambor) Date: Thu, 17 May 2018 11:50:00 -0000 Subject: I want to dump something when I compile the program. How should I do ? In-Reply-To: <1203265537.2309564.1526537840163@mail.yahoo.com> References: <1203265537.2309564.1526537840163.ref@mail.yahoo.com> <1203265537.2309564.1526537840163@mail.yahoo.com> Message-ID: Hi, On Thu, May 17 2018, 冠人 王 via gcc wrote: > My work is to modify the gcc source code so as to customize the warning message, when the programmer writing the program violating some rules.
> When the violation occurs, I want to reveal some message such as "guanjen375 warning: the rule ### is violated" on the window. > How should I do ? Should I modify some file in the gcc source code so that I can print my own message? > Besides, I want to realize how GCC code executed, so I want to insert some code into GCC source code to check what happens when those code > be executed. I try to use "printf" but failed, the message I print > reveals when I make(after configure), but not reveals at compile time For this purpose, use fprintf to write your debug message to stderr. Martin > If you do not really understand what do I say, let me show an example: > $gcc -Wunused test.c > test.c:5:6: warning: unused variable 'a' [-Wunused-variable] int a; ^ > becomes > $gcc -Wunused test.c > HI,DAVIDtest.c:5:6: warning: unused variable 'a' [-Wunused-variable] int a; ^ From Richard.Earnshaw@arm.com Thu May 17 12:25:00 2018 From: Richard.Earnshaw@arm.com (Richard Earnshaw (lists)) Date: Thu, 17 May 2018 12:25:00 -0000 Subject: So what's the status of the Git migration? Message-ID: Another year; another release; and still no sign of progress on the git migration. Any ideas on how much longer this is going to take? R. From joseph@codesourcery.com Thu May 17 15:23:00 2018 From: joseph@codesourcery.com (Joseph Myers) Date: Thu, 17 May 2018 15:23:00 -0000 Subject: So what's the status of the Git migration? In-Reply-To: References: Message-ID: On Thu, 17 May 2018, Richard Earnshaw (lists) wrote: > Another year; another release; and still no sign of progress on the git > migration. > > Any ideas on how much longer this is going to take? See git://thyrsus.com/repositories/gcc-conversion.git for the current version of the conversion machinery, including a TODO list (and see also http://esr.ibiblio.org/?p=7959 ).
Presumably required work on the GCC side (deciding appropriate policies on branch deletion / non-fast-forward pushes, developing hooks / repository configuration to implement those policies and send commit mails, writing updates to documentation and scripts) could be done in parallel with the conversion work, but Jason may already have some of that done anyway. -- Joseph S. Myers joseph@codesourcery.com From matz@suse.de Thu May 17 16:10:00 2018 From: matz@suse.de (Michael Matz) Date: Thu, 17 May 2018 16:10:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) In-Reply-To: References: Message-ID: Hi, On Wed, 16 May 2018, Richard Biener wrote: > > Are constant pool entries merged at compile time or at link time? I > > would presume it should be done at link time because otherwise you're > > only merging entries within a single compilation unit (which doesn't > > sound that useful in a big project with hundreds of source files), > > right? > > constant pool entries are merged at compile time. There's no such thing > as mergeable constant pool sections Actually there is in ELF. Mergeable sections can not only hold strings, but also fixed-size entities (e.g. 4 or 8 byte constants). Those are merged content-wise at link time and references properly rewritten. Of course, those still aren't per-function. Ciao, Michael. From esr@thyrsus.com Thu May 17 16:27:00 2018 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 17 May 2018 16:27:00 -0000 Subject: So what's the status of the Git migration? In-Reply-To: References: Message-ID: <20180517162730.GA9989@thyrsus.com> Richard Earnshaw (lists) : > Another year; another release; and still no sign of progress on the git > migration. > > Any ideas on how much longer this is going to take? > > R. I'm still working on it. It's a slow process because the repo is so huge that full conversions take around 9 hours each. That means that on a good day I can test maybe two point changes.
The current issue - and, I think, the last major one - is that there are over 150 mid-branch deletes to be resolved. -- Eric S. Raymond My work is funded by the Internet Civil Engineering Institute: https://icei.org Please visit their site and donate: the civilization you save might be your own. From ippolito.marco@gmail.com Thu May 17 18:06:00 2018 From: ippolito.marco@gmail.com (Marco Ippolito) Date: Thu, 17 May 2018 18:06:00 -0000 Subject: Are the extended algorithms in the <execution> header file going to be supported by gcc ? Message-ID: Hi, the good book "C++17 STL Cookbook" in chapter 9 "Parallelism and Concurrency" describes some of the 69 algorithms that were extended to accept execution policies in order to run parallel on multiple cores. And, according to the provided examples, they all greatly simplify some aspects of parallelization with the standard algorithms. I'm using Ubuntu 16.04 with upgraded gcc version 7.2.0 (Ubuntu 7.2.0-1ubuntu1~16.04) and the <execution> header file is not present. In the official gcc documentation: https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html the support to <execution> is flagged as "no" in the Table 1.5. C++ 2017 Implementation Status, and it seems that it is even not foreseen to be included in the upgraded gcc 8 version: https://gcc.gnu.org/projects/cxx-status.html So, the crucial question about these 69 extended algorithms in the <execution> header file is: are these extended algorithms going to be supported by the next releases of gcc, as they are already supported by Visual Studio, and soon by clang ? Looking forward to your kind feedback about this extremely important aspect of gcc. Marco From trodgers@redhat.com Thu May 17 19:05:00 2018 From: trodgers@redhat.com (Thomas Rodgers) Date: Thu, 17 May 2018 19:05:00 -0000 Subject: Are the extended algorithms in the <execution> header file going to be supported by gcc ?
In-Reply-To: References: Message-ID: There is work ongoing to complete C++17 support in libstdc++; this includes providing support for the C++17 parallel algorithms. Marco Ippolito writes: > Hi, > > the good book "C++17 STL Cookbook" in chapter 9 "Parallelism and > Concurrency" describes some of the 69 algorithms that were extended to > accept execution policies in order to run parallel on multiple cores. And, > according to the provided examples, they all greatly simplify some aspects > of parallelization with the standard algorithms. > > I'm using Ubuntu 16.04 with upgraded gcc version 7.2.0 (Ubuntu > 7.2.0-1ubuntu1~16.04) and the <execution> header file is not present. > > In the official gcc documentation: https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html the support to <execution> is flagged as "no" in > the Table 1.5. C++ 2017 Implementation Status, > and it seems that it is even not foreseen to be included in the upgraded > gcc 8 version: https://gcc.gnu.org/projects/cxx-status.html > > So, the crucial question about these 69 extended algorithms in the <execution> header > file is: > are these extended algorithms going to be supported by the next releases of > gcc, as they are already supported by Visual Studio, and soon by clang ? > > Looking forward to your kind feedback about this extremely important aspect > of gcc. > Marco From pamela.crawford@techusersinfo.com Thu May 17 20:34:00 2018 From: pamela.crawford@techusersinfo.com (Pamela Crawford) Date: Thu, 17 May 2018 20:34:00 -0000 Subject: Business intelligence of Accounting Software Message-ID: Hello, Would you be interested in acquiring newly released Accounting Software Contact Information? Our entire list comes with complete contact information including direct phone numbers and emails. We also have other specialists such as: - QuickBooks Software Users - Financial Analytical Applications Users - Ecommerce Users - ERP Users -
Configure Price Quote (CPQ) Users Please let me know the below and I shall get back to you with other list details accordingly. Target Specialist___? Target Geography___? Hope to hear from you soon. Regards, Pamela Crawford - Marketing Analyst This is an attempt to begin a conversation with you. If you would rather not hear from us, please respond mentioning UNSUBSCRIBE in the subject line. If you are not the right person please forward this email to the right person in your organization. From segher@kernel.crashing.org Thu May 17 21:10:00 2018 From: segher@kernel.crashing.org (Segher Boessenkool) Date: Thu, 17 May 2018 21:10:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) In-Reply-To: References: Message-ID: <20180517211036.GB17342@gate.crashing.org> On Thu, May 17, 2018 at 06:10:13PM +0200, Michael Matz wrote: > On Wed, 16 May 2018, Richard Biener wrote: > > > Are constant pool entries merged at compile time or at link time? I > > > would presume it should be done at link time because otherwise you're > > > only merging entries within a single compilation unit (which doesn't > > > sound that useful in a big project with hundreds of source files), > > > right? > > > > constant pool entries are merged at compile time. There's no such thing > > as mergeable constant pool sections > > Actually there is in ELF. Mergable sections can not only hold strings, > but also fixed-size entities (e.g. 4 or 8 byte constants). Those are > merged content-wise at link time and references properly rewritten. Of > course, those still aren't per-function. It also works correctly in combination with -ffunction-sections, -fdata-sections, -Wl,--gc-sections. And not with per-function constant pools like on arm-linux; I'm not sure how that could ever work. 
Segher From gccadmin@gcc.gnu.org Thu May 17 22:41:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Thu, 17 May 2018 22:41:00 -0000 Subject: gcc-7-20180517 is now available Message-ID: <20180517224057.124577.qmail@sourceware.org> Snapshot gcc-7-20180517 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/7-20180517/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 7 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch revision 260339 You'll find: gcc-7-20180517.tar.xz Complete GCC SHA256=78acc1dbad063ca5b42887b993de25bda208f1ebde20c1475d6c008de14bb067 SHA1=0455eca25828df63d5649420603694e131f117d1 Diffs from 7-20180510 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-7 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. From richard.guenther@gmail.com Fri May 18 07:33:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Fri, 18 May 2018 07:33:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) In-Reply-To: <20180517211036.GB17342@gate.crashing.org> References: <20180517211036.GB17342@gate.crashing.org> Message-ID: On Thu, May 17, 2018 at 11:10 PM Segher Boessenkool < segher@kernel.crashing.org> wrote: > On Thu, May 17, 2018 at 06:10:13PM +0200, Michael Matz wrote: > > On Wed, 16 May 2018, Richard Biener wrote: > > > > Are constant pool entries merged at compile time or at link time? I > > > > would presume it should be done at link time because otherwise you're > > > > only merging entries within a single compilation unit (which doesn't > > > > sound that useful in a big project with hundreds of source files), > > > > right? > > > > > > constant pool entries are merged at compile time. 
There's no such thing > > > as mergeable constant pool sections > > > > Actually there is in ELF. Mergable sections can not only hold strings, > > but also fixed-size entities (e.g. 4 or 8 byte constants). Those are > > merged content-wise at link time and references properly rewritten. Of > > course, those still aren't per-function. Interesting. Do they allow merging across such sections? Consider a 8 byte entity 0x12345678 and 4 byte entities 0x1234 0x5678, will the 4 byte entities share the rodata with the 8 byte one? I believe GCC pulls off such tricks in its internal constant pool merging code. It might be worth gathering statistics on the size of constant pool entries for this. Now the question is of course if BFD contains support for optimizing those sections. > It also works correctly in combination with -ffunction-sections, > -fdata-sections, -Wl,--gc-sections. And not with per-function constant > pools like on arm-linux; I'm not sure how that could ever work. > Segher From vendas@lojadearmas.pt Fri May 18 15:04:00 2018 From: vendas@lojadearmas.pt (=?utf-8?Q?Jos=C3=A9_Carmo?=) Date: Fri, 18 May 2018 15:04:00 -0000 Subject: Your newsletter subscription Message-ID: <201805181504.w4IF4Rm0095196@eweb06.namesco.net> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From paulkoning@comcast.net Fri May 18 18:03:00 2018 From: paulkoning@comcast.net (Paul Koning) Date: Fri, 18 May 2018 18:03:00 -0000 Subject: How do I stop gcc from loading data into registers when that's not needed? Message-ID: Gents, In some targets, like pdp11 and vax, most instructions can reference data in memory directly. So when I have "if (x < y) ..." I would expect something like this: cmpw x, y bgeq 1f ... What I actually see, with -O2 and/or -Os, is: movw x, r0 movw y, r1 cmpw r0, r1 bgeq 1f ... which is both longer and slower. I can't tell why this happens, or how to stop it. 
The machine description has "general_operand" so it doesn't seem to be the place that forces things into registers. paul From richard.guenther@gmail.com Fri May 18 18:07:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Fri, 18 May 2018 18:07:00 -0000 Subject: How do I stop gcc from loading data into registers when that's not needed? In-Reply-To: References: Message-ID: On May 18, 2018 8:03:05 PM GMT+02:00, Paul Koning wrote: >Gents, > >In some targets, like pdp11 and vax, most instructions can reference >data in memory directly. > >So when I have "if (x < y) ..." I would expect something like this: > > cmpw x, y > bgeq 1f > ... > >What I actually see, with -O2 and/or -Os, is: > > movw x, r0 > movw y, r1 > cmpw r0, r1 > bgeq 1f > ... > >which is both longer and slower. I can't tell why this happens, or how >to stop it. The machine description has "general_operand" so it >doesn't seem to be the place that forces things into registers. I would expect combine to merge the load and arithmetic and thus it is eventually the target costing that makes that not succeed. Richard. > paul From gccadmin@gcc.gnu.org Fri May 18 22:40:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Fri, 18 May 2018 22:40:00 -0000 Subject: gcc-8-20180518 is now available Message-ID: <20180518224032.22764.qmail@sourceware.org> Snapshot gcc-8-20180518 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/8-20180518/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 8 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-8-branch revision 260383 You'll find: gcc-8-20180518.tar.xz Complete GCC SHA256=f02e8f919de390afef0afffd5ea573b42e21175c5a9bd36727f3298f8e7ce430 SHA1=8d2b739997dec25ee13720589e19090ba44fc61f Diffs from 8-20180511 are available in the diffs/ subdirectory. 
When a particular snapshot is ready for public consumption the LATEST-8 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. From gcc@gcc.gnu.org Sat May 19 06:01:00 2018 From: gcc@gcc.gnu.org (sumit kasliwal via gcc) Date: Sat, 19 May 2018 06:01:00 -0000 Subject: How to disable multiple declarations References: <690331560.3047642.1526709674214.ref@mail.yahoo.com> Message-ID: <690331560.3047642.1526709674214@mail.yahoo.com> > Hi, > > As a visiting faculty member at a local college, I am preparing > documentation for some students that details bad practices to > avoid when programming with C. > > One of the practices I admonish against is multiple declarations > in a single statement. While my doc does whatever little is > possible at the documentation level, I was also looking for a way > for being able to disable multiple declarations in the Makefile > itself. > > Is there some way I can get the gcc compiler to throw an error > at compile time if it spots multiple declarations being made > in a single C statement ? The error mechanism must be specific > to multiple-declarations. > > Thanks for any help From jwakely.gcc@gmail.com Sat May 19 06:12:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Sat, 19 May 2018 06:12:00 -0000 Subject: How to disable multiple declarations In-Reply-To: <690331560.3047642.1526709674214@mail.yahoo.com> References: <690331560.3047642.1526709674214.ref@mail.yahoo.com> <690331560.3047642.1526709674214@mail.yahoo.com> Message-ID: On 19 May 2018 at 07:01, sumit kasliwal wrote: >> Hi, >> >> As a visiting faculty member at a local college, I am preparing >> documentation for some students that details bad practices to >> avoid when programming with C. >> >> One of the practices I admonish against is multiple declarations >> in a single statement.
While my doc does whatever little is >> possible at the documentation level, I was also looking for a way >> for being able to disable multiple declarations in the Makefile >> itself. >> >> Is there some way I can get the gcc compiler to throw an error >> at compile time if it spots multiple declarations being made >> in a single C statement ? The error mechanism must be specific >> to multiple-declarations. >> >> Thanks for any help There was a thread about this recently: https://gcc.gnu.org/ml/gcc/2018-04/msg00110.html But the short answer is no. From sgk@troutmask.apl.washington.edu Sun May 20 05:20:00 2018 From: sgk@troutmask.apl.washington.edu (Steve Kargl) Date: Sun, 20 May 2018 05:20:00 -0000 Subject: Policy for reverting someone else commit? Message-ID: <20180520052025.GA25869@troutmask.apl.washington.edu> So, there is a P1 blocking bootstrap failure on trunk. I've opened a PR and finally had time to locate the offending commit. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85843 As I cannot bootstrap gcc, I cannot test a set of patches for gfortran that I have in my tree nor identify which recent commit introduced a regression in the gfortran testsuite. I've scanned gcc.gnu.org and wiki, but have not been able to find a stated policy of reverting a patch committed by someone. The offending commit was done on a Friday. I have no idea if the committer responsible for the bootstrap failure works on the weekend. So, can I revert the commit (and don't in my local repository)? -- Steve From richard.guenther@gmail.com Sun May 20 05:27:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Sun, 20 May 2018 05:27:00 -0000 Subject: Policy for reverting someone else commit? 
In-Reply-To: <20180520052025.GA25869@troutmask.apl.washington.edu> References: <20180520052025.GA25869@troutmask.apl.washington.edu> Message-ID: <32A89A3F-7811-4122-A627-56CE6D32FD60@gmail.com> On May 20, 2018 7:20:25 AM GMT+02:00, Steve Kargl wrote: >So, there is a P1 blocking bootstrap failure on trunk. >I've opened a PR and finally had time to locate the >offending commit. > >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85843 > >As I cannot bootstrap gcc, I cannot test a set of >patches for gfortran that I have in my tree nor >identify which recent commit introduced a regression >in the gfortran testsuite. > >I've scanned gcc.gnu.org and wiki, but have not >been able to find a stated policy of reverting a >patch committed by someone. > >The offending commit was done on a Friday. I >have no idea if the committer responsible for >the bootstrap failure works on the weekend. > >So, can I revert the commit (and don't in my >local repository)? IIRC there is a 24h rule that global maintainers can invoke. Not sure if that is formally documented somewhere. Usually it's much easier to revert this in your local repo for the time being. Richard. From sgk@troutmask.apl.washington.edu Sun May 20 05:42:00 2018 From: sgk@troutmask.apl.washington.edu (Steve Kargl) Date: Sun, 20 May 2018 05:42:00 -0000 Subject: Policy for reverting someone else commit? In-Reply-To: <32A89A3F-7811-4122-A627-56CE6D32FD60@gmail.com> References: <20180520052025.GA25869@troutmask.apl.washington.edu> <32A89A3F-7811-4122-A627-56CE6D32FD60@gmail.com> Message-ID: <20180520054200.GA26015@troutmask.apl.washington.edu> On Sun, May 20, 2018 at 07:27:06AM +0200, Richard Biener wrote: > On May 20, 2018 7:20:25 AM GMT+02:00, Steve Kargl wrote: > >So, there is a P1 blocking bootstrap failure on trunk. > >I've opened a PR and finally had time to locate the > >offending commit. 
> > > >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85843 > > > >As I cannot bootstrap gcc, I cannot test a set of > >patches for gfortran that I have in my tree nor > >identify which recent commit introduced a regression > >in the gfortran testsuite. > > > >I've scanned gcc.gnu.org and wiki, but have not > >been able to find a stated policy of reverting a > >patch committed by someone. > > > >The offending commit was done on a Friday. I > >have no idea if the committer responsible for > >the bootstrap failure works on the weekend. > > > >So, can I revert the commit (and don't in my > >local repository)? > > IIRC there is a 24h rule that global maintainers can invoke. > Not sure if that is formally documented somewhere. > > Usually it's much easier to revert this in your local repo > for the time being. > Yes, I've reverted locally. I must be much more cautious than others. I bootstrap gcc in a clean directory prior to committing. Adding additional commits on top of a known bad commit, which causes a bootstrap failure, seems to be asking for trouble. I'll wait until tomorrow. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From jwakely.gcc@gmail.com Sun May 20 19:51:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Sun, 20 May 2018 19:51:00 -0000 Subject: Policy for reverting someone else commit? In-Reply-To: <32A89A3F-7811-4122-A627-56CE6D32FD60@gmail.com> References: <20180520052025.GA25869@troutmask.apl.washington.edu> <32A89A3F-7811-4122-A627-56CE6D32FD60@gmail.com> Message-ID: On 20 May 2018 at 06:27, Richard Biener wrote: > On May 20, 2018 7:20:25 AM GMT+02:00, Steve Kargl wrote: >>So, there is a P1 blocking bootstrap failure on trunk. >>I've opened a PR and finally had time to locate the >>offending commit. 
>> >>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85843 >> >>As I cannot bootstrap gcc, I cannot test a set of >>patches for gfortran that I have in my tree nor >>identify which recent commit introduced a regression >>in the gfortran testsuite. >> >>I've scanned gcc.gnu.org and wiki, but have not >>been able to find a stated policy of reverting a >>patch committed by someone. >> >>The offending commit was done on a Friday. I >>have no idea if the committer responsible for >>the bootstrap failure works on the weekend. >> >>So, can I revert the commit (and don't in my >>local repository)? > > IIRC there is a 24h rule that global maintainers can invoke. Not sure if that is formally documented somewhere. > > Usually it's much easier to revert this in your local repo for the time being. Or just stop using -Werror, so you can build. The code isn't invalid, it's just a style warning. From gerald@pfeifer.com Sun May 20 20:30:00 2018 From: gerald@pfeifer.com (Gerald Pfeifer) Date: Sun, 20 May 2018 20:30:00 -0000 Subject: [wwwdocs PATCH] for Re: Policy for reverting someone else commit? In-Reply-To: <32A89A3F-7811-4122-A627-56CE6D32FD60@gmail.com> References: <20180520052025.GA25869@troutmask.apl.washington.edu> <32A89A3F-7811-4122-A627-56CE6D32FD60@gmail.com> Message-ID: On Sun, 20 May 2018, Richard Biener wrote: > IIRC there is a 24h rule that global maintainers can invoke. Not > sure if that is formally documented somewhere. 
Yes, we have a reversion policy; it is documented at https://gcc.gnu.org/develop.html and, having just applied the patch below, it is now directly reachable at https://gcc.gnu.org/develop.html#reversion Gerald Index: develop.html =================================================================== RCS file: /cvs/gcc/wwwdocs/htdocs/develop.html,v retrieving revision 1.180 diff -u -r1.180 develop.html --- develop.html 25 Apr 2018 08:44:26 -0000 1.180 +++ develop.html 20 May 2018 20:20:35 -0000 @@ -154,7 +154,7 @@ so it is unlikely that many conflicts will occur.

-Patch Reversion
+Patch Reversion

[The heading's HTML markup was stripped by the archive; the change adds the "reversion" anchor that the #reversion link above points at.]

If a patch is committed which introduces a regression on any target which the Steering Committee considers to be important and if:

From gccadmin@gcc.gnu.org Sun May 20 22:42:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Sun, 20 May 2018 22:42:00 -0000 Subject: gcc-9-20180520 is now available Message-ID: <20180520224142.41199.qmail@sourceware.org> Snapshot gcc-9-20180520 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/9-20180520/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 9 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 260425 You'll find: gcc-9-20180520.tar.xz Complete GCC SHA256=7c55b0fb9f2d3369ab52ef9107602bb7c9dedc5f6dbcb9242e0a7fe56945eab5 SHA1=4ada8e391c4752188209ebe77b7c341a9a17d04f Diffs from 9-20180513 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-9 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. From jason.vas.dias@gmail.com Mon May 21 16:04:00 2018 From: jason.vas.dias@gmail.com (Jason Vas Dias) Date: Mon, 21 May 2018 16:04:00 -0000 Subject: gcc-6-branch test failures: g++ c-c++-common/asan/pr59063-2.c Message-ID: Good day - Attempts to build latest GCC gcc-6-branch version from SVN ( Revision 260441 ), with the GCC 6.4.1 from the last time I built it ( git commit starting '4f2cbe2' ), now fail in 'make check' , on a Linux x86_64 host (RHEL 7.5, glibc 2.17) : When 'pr59063-2.c' is built with the '-static-libasan' option and WITHOUT any '-lasan' option being the first DT_NEEDED shared library, (ie.
being the FIRST -l option) then the program produced emits the message: '==$pid==ASan runtime does not come first in initial library list; \ you should either link runtime to your application or manually preload \ it with LD_PRELOAD. ' . Then all tests that run that program fail. So, if I modify gcc/testsuite/lib/asan-dg.exp to add '-lasan' as the first '-l' argument, OR I set libasan to be preloaded, the program loads, but coredumps, as shown in this gdb session: Reading symbols from ./pr59063-2.exe...done. (gdb) b sigaction Breakpoint 1 at 0x42d800: file ../../.././libsanitizer/asan/asan_interceptors.cc, line 287. (gdb) b real_sigaction Breakpoint 2 at 0x4ac710: file ../../.././libsanitizer/asan/asan_interceptors.cc, line 297. (gdb) c The program is not being run. (gdb) run Starting program: /tmp/pr59063-2.exe [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib64/libthread_db.so.1". Breakpoint 2, __sanitizer::real_sigaction (signum=11, act=0x7fffffffe230, oldact=0x0) at ../../.././libsanitizer/asan/asan_interceptors.cc:297 297 (struct sigaction *)oldact); (gdb) n Breakpoint 1, __interceptor_sigaction (signum=11, act=0x7fffffffe230, oldact=0x0) at ../../.././libsanitizer/asan/asan_interceptors.cc:287 287 struct sigaction *oldact) { (gdb) 288 if (!IsDeadlySignal(signum) || common_flags()->allow_user_segv_handler) { (gdb) 289 return REAL(sigaction)(signum, act, oldact); (gdb) 292 } (gdb) 289 return REAL(sigaction)(signum, act, oldact); (gdb) 292 } (gdb) 289 return REAL(sigaction)(signum, act, oldact); (gdb) 0x0000000000000000 in ?? () (gdb) c Continuing. Program received signal SIGSEGV, Segmentation fault. 0x0000000000000000 in ?? () (gdb) where #0 0x0000000000000000 in ?? 
() #1 0x00000000004d02a2 in __sanitizer::MaybeInstallSigaction (signum=11, handler=0x4c6880 <__asan::AsanOnDeadlySignal(int, void*, void*)>) at ../../.././libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cc:178 #2 0x00000000004d0603 in __sanitizer::InstallDeadlySignalHandlers (handler=0x4c6880 <__asan::AsanOnDeadlySignal(int, void*, void*)>) at ../../.././libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cc:187 #3 0x00000000004cd50d in __asan::AsanInitInternal () at ../../.././libsanitizer/asan/asan_rtl.cc:502 #4 0x00007ffff7de8d83 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2 #5 0x00007ffff7dda0ba in _dl_start_user () from /lib64/ld-linux-x86-64.so.2 #6 0x0000000000000001 in ?? () #7 0x00007fffffffe5ab in ?? () #8 0x0000000000000000 in ?? () (gdb) It appears that the '__interception::PTR_TO_REAL' mechanism REAL(sigaction) is using to resolve address of sigaction is being foxed by the fact that libasan.a is statically linked, AND the libasan.so is also linked to, but hardly any symbols are resolved to it. When I remove '-static-libasan', and supply only '-lasan', in the build command line generated by asan-dg.exp, the test compiles and runs fine . I am not using libasan for my project, just trying to build latest GCC 6.4.1 with latest retpoline backports & bugfixes into a version that passes its test suite, and I noticed this problem - this is the main 'unexpected failure' source . I guess I should class this an 'unexpected but don't care' failure ? I just thought I should let the GCC developers know . Thanks & Best Regards, Jason
From jason.vas.dias@gmail.com Mon May 21 17:14:00 2018 From: jason.vas.dias@gmail.com (Jason Vas Dias) Date: Mon, 21 May 2018 17:14:00 -0000 Subject: gcc-6-branch test failures: g++ c-c++-common/asan/pr59063-2.c In-Reply-To: (Jason Vas Dias's message of "Mon, 21 May 2018 16:04:21 +0000") References: Message-ID: To me it looks like the definition of 'real_sigaction' in 'asan_interceptors.cc' is actually going to recursively call itself - so I tried patching libsanitizer: {BEGIN PATCH: Index: asan/asan_interceptors.cc =================================================================== --- asan/asan_interceptors.cc (revision 260441) +++ asan/asan_interceptors.cc (working copy) @@ -293,8 +293,8 @@ namespace __sanitizer { int real_sigaction(int signum, const void *act, void *oldact) { - return REAL(sigaction)(signum, (const struct sigaction *)act, - (struct sigaction *)oldact); + return ::sigaction(signum, (const struct sigaction *)act, + (struct sigaction *)oldact); } } // namespace __sanitizer Index:
sanitizer_common/sanitizer_common_interceptors.inc =================================================================== --- sanitizer_common/sanitizer_common_interceptors.inc (revision 260441) +++ sanitizer_common/sanitizer_common_interceptors.inc (working copy) @@ -2446,7 +2446,7 @@ INTERCEPTOR(uptr, ptrace, int request, int pid, void *addr, void *data) { void *ctx; COMMON_INTERCEPTOR_ENTER(ctx, ptrace, request, pid, addr, data); - __sanitizer_iovec local_iovec; + __sanitizer_iovec local_iovec {}; if (data) { if (request == ptrace_setregs) END PATCH} The second patch line removes a warning about local_iovec being used uninitialized. With the patch applied, the program does not coredump and shows more what is going on: $ ./pr59063-2.exe ==5898==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING. ==5898==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range. ==5898==Process memory map follows: 0x000000400000-0x000000522000 /tmp/pr59063-2.exe 0x000000721000-0x000000723000 /tmp/pr59063-2.exe 0x000000723000-0x000000726000 /tmp/pr59063-2.exe 0x000000726000-0x000001398000 0x00007fff7000-0x00008fff7000 0x00008fff7000-0x02008fff7000 0x02008fff7000-0x10007fff8000 0x600000000000-0x640000000000 0x640000000000-0x640000003000 0x7f05baec2000-0x7f05bb566000 0x7f05bb566000-0x7f05bb72f000 /usr/lib64/libc-2.17.so 0x7f05bb72f000-0x7f05bb92f000 /usr/lib64/libc-2.17.so 0x7f05bb92f000-0x7f05bb933000 /usr/lib64/libc-2.17.so 0x7f05bb933000-0x7f05bb935000 /usr/lib64/libc-2.17.so 0x7f05bb935000-0x7f05bb93a000 0x7f05bb93a000-0x7f05bb950000 /home/devel/OS/gcc-6-branch/host-x86_64-linux-gnu/gcc/libgcc_s.so.1 0x7f05bb950000-0x7f05bbb4f000 /home/devel/OS/gcc-6-branch/host-x86_64-linux-gnu/gcc/libgcc_s.so.1 0x7f05bbb4f000-0x7f05bbb50000 /home/devel/OS/gcc-6-branch/host-x86_64-linux-gnu/gcc/libgcc_s.so.1 0x7f05bbb50000-0x7f05bbb51000 /home/devel/OS/gcc-6-branch/host-x86_64-linux-gnu/gcc/libgcc_s.so.1 
0x7f05bbb51000-0x7f05bbb69000 /usr/lib64/libpthread-2.17.so 0x7f05bbb69000-0x7f05bbd68000 /usr/lib64/libpthread-2.17.so 0x7f05bbd68000-0x7f05bbd69000 /usr/lib64/libpthread-2.17.so 0x7f05bbd69000-0x7f05bbd6a000 /usr/lib64/libpthread-2.17.so 0x7f05bbd6a000-0x7f05bbd6e000 0x7f05bbd6e000-0x7f05bbd70000 /usr/lib64/libdl-2.17.so 0x7f05bbd70000-0x7f05bbf70000 /usr/lib64/libdl-2.17.so 0x7f05bbf70000-0x7f05bbf71000 /usr/lib64/libdl-2.17.so 0x7f05bbf71000-0x7f05bbf72000 /usr/lib64/libdl-2.17.so 0x7f05bbf72000-0x7f05bbf79000 /usr/lib64/librt-2.17.so 0x7f05bbf79000-0x7f05bc178000 /usr/lib64/librt-2.17.so 0x7f05bc178000-0x7f05bc179000 /usr/lib64/librt-2.17.so 0x7f05bc179000-0x7f05bc17a000 /usr/lib64/librt-2.17.so 0x7f05bc17a000-0x7f05bc27b000 /usr/lib64/libm-2.17.so 0x7f05bc27b000-0x7f05bc47a000 /usr/lib64/libm-2.17.so 0x7f05bc47a000-0x7f05bc47b000 /usr/lib64/libm-2.17.so 0x7f05bc47b000-0x7f05bc47c000 /usr/lib64/libm-2.17.so 0x7f05bc47c000-0x7f05bc5f6000 /home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6.0.22 0x7f05bc5f6000-0x7f05bc7f6000 /home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6.0.22 0x7f05bc7f6000-0x7f05bc800000 /home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6.0.22 0x7f05bc800000-0x7f05bc802000 /home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6.0.22 0x7f05bc802000-0x7f05bc806000 0x7f05bc806000-0x7f05bc92f000 /home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libsanitizer/asan/.libs/libasan.so.3.0.0 0x7f05bc92f000-0x7f05bcb2e000 /home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libsanitizer/asan/.libs/libasan.so.3.0.0 0x7f05bcb2e000-0x7f05bcb31000 /home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libsanitizer/asan/.libs/libasan.so.3.0.0 0x7f05bcb31000-0x7f05bcb34000 /home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libsanitizer/asan/.libs/libasan.so.3.0.0 0x7f05bcb34000-0x7f05bd7a6000 0x7f05bd7a6000-0x7f05bd7ca000 /usr/lib64/ld-2.17.so 
0x7f05bd998000-0x7f05bd99c000 0x7f05bd9a0000-0x7f05bd9c9000 0x7f05bd9c9000-0x7f05bd9ca000 /usr/lib64/ld-2.17.so 0x7f05bd9ca000-0x7f05bd9cb000 /usr/lib64/ld-2.17.so 0x7f05bd9cb000-0x7f05bd9cc000 0x7ffc828f0000-0x7ffc82911000 [stack] 0x7ffc82984000-0x7ffc82986000 [vdso] 0xffffffffff600000-0xffffffffff601000 [vsyscall] ==5898==End of process memory map. So: ==5898==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range. And there are existing maps in that range : 0x00007fff7000-0x00008fff7000 0x00008fff7000-0x02008fff7000 0x02008fff7000-0x10007fff8000 Could it be that somehow a link module init() (__attribute__((constructor))) 'constructor' function in the static libasan constructor has run. and set up the maps, and then when dynamic libraries are loaded, the constructor in the dynamic library tries to construct the same maps ? Really, I think perhaps the 'static-libasan' option ought to be disabled until it is implemented correctly , and/or the constructor should not be insisting on a DT_NEEDED of libasan.so if -static-libasan was supplied, and in any case it should be impossible to call the constructor twice , even if the library is loaded twice. It looks like libasan still needs some major re-working and I question why it is being included in the standard GCC distribution & whether its test suite has ever passed. Best Regards, Jason From edelsohn@gnu.org Mon May 21 17:20:00 2018 From: edelsohn@gnu.org (David Edelsohn) Date: Mon, 21 May 2018 17:20:00 -0000 Subject: Bin Cheng appointed Loop Optimizer co-maintainer Message-ID: I am pleased to announce that the GCC Steering Committee has appointed Bin Cheng as Loop Optimizer co-maintainer. Please join me in congratulating Bin on his new role. Bin, please update your listing in the MAINTAINERS file. Happy hacking! 
David From law@redhat.com Mon May 21 22:43:00 2018 From: law@redhat.com (Jeff Law) Date: Mon, 21 May 2018 22:43:00 -0000 Subject: wrong comment in gcc/testsuite/gcc.c-torture/compile/simd-5.c In-Reply-To: <5fb48660-4ec7-fee8-213c-4d1b68ec4755@groessler.org> References: <5fb48660-4ec7-fee8-213c-4d1b68ec4755@groessler.org> Message-ID: <551e8409-a26c-cc09-1239-60031030f9f3@redhat.com> On 05/02/2018 04:49 AM, Christian Groessler wrote: > Hi, > > --- a/gcc/testsuite/gcc.c-torture/compile/simd-5.c > +++ b/gcc/testsuite/gcc.c-torture/compile/simd-5.c > @@ -6,7 +6,7 @@ main(){ > ??vector64 int a = {1, -1}; > ??vector64 int b = {2, -2}; > ??c = -a + b*b*(-1LL); > -/* c is now {5, 3} */ > +/* c is now {-5, -3} */ Thanks. Fixed. jeff From kugan.vivekanandarajah@linaro.org Mon May 21 22:51:00 2018 From: kugan.vivekanandarajah@linaro.org (Kugan Vivekanandarajah) Date: Mon, 21 May 2018 22:51:00 -0000 Subject: Generating gimple assign stmt that changes sign Message-ID: Hi, I am looking to introduce ABSU_EXPR and that would create: unsigned short res = ABSU_EXPR (short); Note that the argument is signed and result is unsigned. As per the review, I have a match.pd entry to generate this as: (simplify (abs (convert @0)) (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))) (convert (absu @0)))) Now when gimplifying the converted tree, how do we tell that ABSU_EXPR will take a signed arg and return unsigned. I will have other match.pd entries so this will be generated while in gimple.passes too. Should I add new functions in gimple.[h|c] for this. Is there any examples I can refer to. Conversion expressions seems to be the only place where sign can change in gimple assignment but they are very specific. 
Thanks, Kugan From law@redhat.com Mon May 21 23:10:00 2018 From: law@redhat.com (Jeff Law) Date: Mon, 21 May 2018 23:10:00 -0000 Subject: Generating gimple assign stmt that changes sign In-Reply-To: References: Message-ID: <39988331-1f6a-514b-7774-b7eea63b3ef1@redhat.com> On 05/21/2018 04:50 PM, Kugan Vivekanandarajah wrote: > Hi, > > I am looking to introduce ABSU_EXPR and that would create: > > unsigned short res = ABSU_EXPR (short); > > Note that the argument is signed and result is unsigned. As per the > review, I have a match.pd entry to generate this as: > (simplify (abs (convert @0)) > (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))) > (convert (absu @0)))) > > > Now when gimplifying the converted tree, how do we tell that ABSU_EXPR > will take a signed arg and return unsigned. I will have other match.pd > entries so this will be generated while in gimple.passes too. Should I > add new functions in gimple.[h|c] for this. > > Is there any examples I can refer to. Conversion expressions seems to > be the only place where sign can change in gimple assignment but they > are very specific. What's the value in representing ABSU vs a standard ABS followed by a conversion? You'll certainly want to do verification of the type signedness in the gimple verifier. In general the source and destination types have to be the same. Conversions are the obvious exception. There's a few other nodes that have more complex type rules (MEM_REF, COND_EXPR and a few others). But I don't think they're necessarily going to be helpful. jeff From kugan.vivekanandarajah@linaro.org Mon May 21 23:26:00 2018 From: kugan.vivekanandarajah@linaro.org (Kugan Vivekanandarajah) Date: Mon, 21 May 2018 23:26:00 -0000 Subject: Generating gimple assign stmt that changes sign In-Reply-To: <39988331-1f6a-514b-7774-b7eea63b3ef1@redhat.com> References: <39988331-1f6a-514b-7774-b7eea63b3ef1@redhat.com> Message-ID: Hi Jeff, Thanks for the prompt reply. 
On 22 May 2018 at 09:10, Jeff Law wrote: > On 05/21/2018 04:50 PM, Kugan Vivekanandarajah wrote: >> Hi, >> >> I am looking to introduce ABSU_EXPR and that would create: >> >> unsigned short res = ABSU_EXPR (short); >> >> Note that the argument is signed and result is unsigned. As per the >> review, I have a match.pd entry to generate this as: >> (simplify (abs (convert @0)) >> (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))) >> (convert (absu @0)))) >> >> >> Now when gimplifying the converted tree, how do we tell that ABSU_EXPR >> will take a signed arg and return unsigned. I will have other match.pd >> entries so this will be generated while in gimple.passes too. Should I >> add new functions in gimple.[h|c] for this. >> >> Is there any examples I can refer to. Conversion expressions seems to >> be the only place where sign can change in gimple assignment but they >> are very specific. > What's the value in representing ABSU vs a standard ABS followed by a > conversion? It is based on PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64946. Specifically, comment 13. > > You'll certainly want to do verification of the type signedness in the > gimple verifier. I am doing it and it is failing now. > > In general the source and destination types have to be the same. > Conversions are the obvious exception. There's a few other nodes that > have more complex type rules (MEM_REF, COND_EXPR and a few others). But > I don't think they're necessarily going to be helpful. Thanks, Kugan > > jeff From marc.glisse@inria.fr Tue May 22 05:38:00 2018 From: marc.glisse@inria.fr (Marc Glisse) Date: Tue, 22 May 2018 05:38:00 -0000 Subject: Generating gimple assign stmt that changes sign In-Reply-To: References: Message-ID: On Tue, 22 May 2018, Kugan Vivekanandarajah wrote: > Hi, > > I am looking to introduce ABSU_EXPR and that would create: > > unsigned short res = ABSU_EXPR (short); > > Note that the argument is signed and result is unsigned. 
As per the > review, I have a match.pd entry to generate this as: > (simplify (abs (convert @0)) > (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))) > (convert (absu @0)))) Not sure, but we may want a few more restrictions on this transformation. > Now when gimplifying the converted tree, how do we tell that ABSU_EXPR > will take a signed arg and return unsigned. I will have other match.pd > entries so this will be generated while in gimple.passes too. Should I > add new functions in gimple.[h|c] for this. > > Is there any examples I can refer to. Conversion expressions seems to > be the only place where sign can change in gimple assignment but they > are very specific. You'll probably want to patch genmatch.c (near get_operand_type maybe?) so it doesn't try to guess that the type of absu is the same as its argument. You can also specify a type in transformations, look for :utype or :etype in match.pd. -- Marc Glisse
From amker.cheng@gmail.com Tue May 22 08:31:00 2018 From: amker.cheng@gmail.com (Bin.Cheng) Date: Tue, 22 May 2018 08:31:00 -0000 Subject: Bin Cheng appointed Loop Optimizer co-maintainer In-Reply-To: References: Message-ID: On Mon, May 21, 2018 at 6:20 PM, David Edelsohn wrote: > I am pleased to announce that the GCC Steering Committee has > appointed Bin Cheng as Loop Optimizer co-maintainer. > > Please join me in congratulating Bin on his new role. > Bin, please update your listing in the MAINTAINERS file. Thanks very much for the trust in me. Patch applied as r260500. Thanks, bin > > Happy hacking! > David > From richard.guenther@gmail.com Tue May 22 08:49:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Tue, 22 May 2018 08:49:00 -0000 Subject: How do I stop gcc from loading data into registers when that's not needed? In-Reply-To: <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net> References: <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net> Message-ID: On Tue, May 22, 2018 at 2:19 AM Paul Koning wrote: > > On May 18, 2018, at 2:07 PM, Richard Biener wrote: > > > > On May 18, 2018 8:03:05 PM GMT+02:00, Paul Koning < paulkoning@comcast.net> wrote: > >> Gents, > >> > >> In some targets, like pdp11 and vax, most instructions can reference > >> data in memory directly.
> >> > >> So when I have "if (x < y) ..." I would expect something like this: > >> > >> cmpw x, y > >> bgeq 1f > >> ... > >> > >> What I actually see, with -O2 and/or -Os, is: > >> > >> movw x, r0 > >> movw y, r1 > >> cmpw r0, r1 > >> bgeq 1f > >> ... > >> > >> which is both longer and slower. I can't tell why this happens, or how > >> to stop it. The machine description has "general_operand" so it > >> doesn't seem to be the place that forces things into registers. > > > > I would expect combine to merge the load and arithmetic and thus it is eventually the target costing that makes that not succeed. > > > > Richard. > Thanks Richard. I am not having a whole lot of luck figuring out where precisely I need to adjust or how to make the adjustment. I'm doing the adjusting on the pdp11 port right now. That has a TARGET_RTX_COSTS hook which looks fairly plausible. It doesn't currently have a TARGET_MEMORY_MOVE_COST, or TARGET_ADDRESS_COST, or TARGET_INSN_COST. It is likely that I need some or all of those to get this working better? If yes, any hints you can offer where to start? Sorry, I'm not very familiar with this area of GCC either. Did you confirm that combine at least tries to merge the memory ops into the instruction? Maybe it is only RA / reload that will try. You can look at how it works for x86, for example whether there's already memory ops in the stmts during RTL expansion which can happen I think when the load has a single use (and if that works for pdp11). Richard. 
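[Editorial note] The pattern under discussion can be reproduced with a minimal, self-contained test case (a sketch; variable and function names are illustrative). Compiling it with `gcc -O2 -S -fdump-rtl-expand -fdump-rtl-combine-all` and reading the dump files shows whether the loads are merged into the compare:

```c
/* Minimal reproducer for the memory-vs-register compare question.
   On targets whose compare instructions accept memory operands
   (pdp11, vax, x86), the ideal output compares x and y in place
   rather than loading both into registers first. */
int x, y, z;

void c3(void)
{
    if (x < y)
        z = 1;
    else
        z = 9;
}
```

On x86-64, `gcc -O2 -S` typically emits a `cmp` with one memory operand here; the thread discusses why the pdp11 port instead loads both operands into registers first.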
> paul From richard.guenther@gmail.com Tue May 22 08:58:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Tue, 22 May 2018 08:58:00 -0000 Subject: Generating gimple assign stmt that changes sign In-Reply-To: References: Message-ID: On Tue, May 22, 2018 at 12:51 AM Kugan Vivekanandarajah < kugan.vivekanandarajah@linaro.org> wrote: > Hi, > I am looking to introduce ABSU_EXPR and that would create: > unsigned short res = ABSU_EXPR (short); > Note that the argument is signed and result is unsigned. As per the > review, I have a match.pd entry to generate this as: > (simplify (abs (convert @0)) > (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))) > (convert (absu @0)))) This would convert abs ((int) unsigned_int_var) to (int) absu (unsigned_int_var) which is bogus given it results in -1 for abs (-1). You want to restrict this to sign-extensions, thus !TYPE_UNSIGNED (TREE_TYPE (@0)) && !TYPE_UNSIGNED (type) && element_precision (type) < element_precision (TREE_TYPE (@0)) > Now when gimplifying the converted tree, how do we tell that ABSU_EXPR > will take a signed arg and return unsigned. I will have other match.pd > entries so this will be generated while in gimple.passes too. Should I > add new functions in gimple.[h|c] for this. As Marc says you can use sth like (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); } (convert (absu:utype @0)) or you can hack genmatch to recognize ABSU_EXPR. > Is there any examples I can refer to. Conversion expressions seems to > be the only place where sign can change in gimple assignment but they > are very specific. > Thanks, > Kugan From macro@mips.com Tue May 22 11:15:00 2018 From: macro@mips.com (Maciej W. Rozycki) Date: Tue, 22 May 2018 11:15:00 -0000 Subject: [PING][PATCH] gdb/x86: Fix `-Wstrict-overflow' build error in `i387_collect_xsave' In-Reply-To: References: Message-ID: On Tue, 15 May 2018, Maciej W. Rozycki wrote: > gdb/ > * i387-tdep.c (i387_collect_xsave): Make `i' unsigned. Ping for: . 
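[Editorial note] Returning to the ABSU_EXPR thread above: the failure mode Richard describes for the unrestricted `(simplify (abs (convert @0)) ...)` pattern can be demonstrated in plain C (a sketch; function names are illustrative, and the conversion of UINT_MAX to int is implementation-defined in C but wraps to -1 on GCC targets):

```c
#include <limits.h>
#include <stdlib.h>

/* abs applied after the unsigned -> int conversion: the expression
   the match.pd pattern (abs (convert @0)) starts from. */
int abs_after_convert(unsigned int u)
{
    return abs((int) u);   /* for u == UINT_MAX: abs(-1) == 1 */
}

/* What the unrestricted rewrite (convert (absu @0)) would compute:
   the "absolute value" of an already-unsigned operand is the
   identity, so converting back to int just reinterprets the bits. */
int rewritten(unsigned int u)
{
    return (int) u;        /* for u == UINT_MAX: -1, not 1 */
}
```

The two disagree for any argument with the sign bit set, which is why the pattern must be restricted to sign-extending conversions as Richard suggests.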
Maciej From palves@redhat.com Tue May 22 13:34:00 2018 From: palves@redhat.com (Pedro Alves) Date: Tue, 22 May 2018 13:34:00 -0000 Subject: [PING][PATCH] gdb/x86: Fix `-Wstrict-overflow' build error in `i387_collect_xsave' In-Reply-To: References: Message-ID: <4bb6c426-5bbd-fc09-38b5-52c260f310b4@redhat.com> On 05/22/2018 12:14 PM, Maciej W. Rozycki wrote: > On Tue, 15 May 2018, Maciej W. Rozycki wrote: > >> gdb/ >> * i387-tdep.c (i387_collect_xsave): Make `i' unsigned. > > Ping for: . OK. Thanks, Pedro Alves From jason.vas.dias@gmail.com Tue May 22 16:26:00 2018 From: jason.vas.dias@gmail.com (Jason Vas Dias) Date: Tue, 22 May 2018 16:26:00 -0000 Subject: gcc-6-branch test failures: g++ c-c++-common/asan/pr59063-2.c Message-ID: <8t8birms.fsf@gmail.com> This problem turned out to be because the objects in libasan.a of gcc-6-branch are built with -fPIC / -DPIC : if 'PIC' is defined, the code in asan_linux.c is assuming it has been dynamically loaded , and so does not allow libasan.so NOT to be the first required dynamic library. But adding '-lasan' is a bad idea, since then the dlsym(RTLD_NEXT,"sigaction") actually resolves to __interception::sigaction which ends up calling itself until stack is exhausted. For some reason (which I am trying to figure out, but which is not obvious) the gcc-7-branch libasan build does build all the libasan.a objects without -fPIC -DPIC correctly, and so does not have this problem. It looks like use of 'static-libasan' in GCC 6 builds is thoroughly disabled and broken because of libasan.a objects ARE built with -fPIC / -DPIC . Maybe I should raise a bug about this? Thanks & Regards, Jason From segher@kernel.crashing.org Tue May 22 17:30:00 2018 From: segher@kernel.crashing.org (Segher Boessenkool) Date: Tue, 22 May 2018 17:30:00 -0000 Subject: DF mangling Message-ID: <20180522173003.GO17342@gate.crashing.org> Hi! 
The Itanium C++ ABI defines https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-builtin DF<N>_ as the mangling for the N-bit IEEE binary float type, i.e. _FloatN. But the libiberty unwinders decode DF as fixed point type, see https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libiberty/cp-demangle.c;h=3f2a097e7f2075e5750e40a31ce46589d4ab83d5;hb=HEAD#l2659 This conflicts. How are we going to resolve it? Segher From joseph@codesourcery.com Tue May 22 17:40:00 2018 From: joseph@codesourcery.com (Joseph Myers) Date: Tue, 22 May 2018 17:40:00 -0000 Subject: DF mangling In-Reply-To: <20180522173003.GO17342@gate.crashing.org> References: <20180522173003.GO17342@gate.crashing.org> Message-ID: On Tue, 22 May 2018, Segher Boessenkool wrote: > Hi! > > The Itanium C++ ABI defines > https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-builtin > DF<N>_ as the mangling for the N-bit IEEE binary float type, > i.e. _FloatN. > > But the libiberty unwinders decode DF as fixed point type, see > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libiberty/cp-demangle.c;h=3f2a097e7f2075e5750e40a31ce46589d4ab83d5;hb=HEAD#l2659 > > This conflicts. How are we going to resolve it? I don't think there's any conflict between the mangling DF<N>_ for _FloatN, and DF<N>[ijstlmxy][sn] for fixed point. If DF is followed by a number, the following character is '_' for _FloatN and the code for an integer type for fixed point. -- Joseph S. Myers joseph@codesourcery.com From segher@kernel.crashing.org Tue May 22 18:29:00 2018 From: segher@kernel.crashing.org (Segher Boessenkool) Date: Tue, 22 May 2018 18:29:00 -0000 Subject: DF mangling In-Reply-To: References: <20180522173003.GO17342@gate.crashing.org> Message-ID: <20180522182908.GP17342@gate.crashing.org> On Tue, May 22, 2018 at 05:40:49PM +0000, Joseph Myers wrote: > On Tue, 22 May 2018, Segher Boessenkool wrote: > > > Hi! 
> > > > The Itanium C++ ABI defines > > https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-builtin > > DF<N>_ as the mangling for the N-bit IEEE binary float type, > > i.e. _FloatN. > > > > But the libiberty unwinders decode DF as fixed point type, see > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libiberty/cp-demangle.c;h=3f2a097e7f2075e5750e40a31ce46589d4ab83d5;hb=HEAD#l2659 > > > > This conflicts. How are we going to resolve it? > > I don't think there's any conflict between the mangling DF<N>_ for > _FloatN, and DF<N>[ijstlmxy][sn] for fixed point. If DF is > followed by a number, the following character is '_' for _FloatN and the > code for an integer type for fixed point. Ah! Sneaky. Yes that will work; I'll make a patch. Segher From segher@kernel.crashing.org Tue May 22 19:26:00 2018 From: segher@kernel.crashing.org (Segher Boessenkool) Date: Tue, 22 May 2018 19:26:00 -0000 Subject: How do I stop gcc from loading data into registers when that's not needed? In-Reply-To: References: <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net> Message-ID: <20180522192609.GQ17342@gate.crashing.org> On Tue, May 22, 2018 at 10:49:35AM +0200, Richard Biener wrote: > On Tue, May 22, 2018 at 2:19 AM Paul Koning wrote: > > > On May 18, 2018, at 2:07 PM, Richard Biener > wrote: > > > On May 18, 2018 8:03:05 PM GMT+02:00, Paul Koning < > paulkoning@comcast.net> wrote: > > >> In some targets, like pdp11 and vax, most instructions can reference > > >> data in memory directly. > > >> > > >> So when I have "if (x < y) ..." I would expect something like this: > > >> > > >> cmpw x, y > > >> bgeq 1f > > >> ... > > >> > > >> What I actually see, with -O2 and/or -Os, is: > > >> > > >> movw x, r0 > > >> movw y, r1 > > >> cmpw r0, r1 > > >> bgeq 1f > > >> ... > > >> > > >> which is both longer and slower. I can't tell why this happens, or how > > >> to stop it. The machine description has "general_operand" so it > > >> doesn't seem to be the place that forces things into registers. 
> > > > > > I would expect combine to merge the load and arithmetic and thus it is > eventually the target costing that makes that not succeed. > > > Thanks Richard. I am not having a whole lot of luck figuring out where > > precisely I need to adjust or how to make the adjustment. I'm doing the > > adjusting on the pdp11 port right now. That has a TARGET_RTX_COSTS hook > > which looks fairly plausible. It doesn't currently have a > > TARGET_MEMORY_MOVE_COST, or TARGET_ADDRESS_COST, or TARGET_INSN_COST. It > > is likely that I need some or all of those to get this working better? If > > yes, any hints you can offer where to start? -fdump-rtl-combine-all (or just -da or -dap), and then look at the dump file. Does combine try this combination? If so, it will tell you what the resulting costs are. If not, why does it not try it? > Sorry, I'm not very familiar with this area of GCC either. Did you confirm > that combine at least tries to merge the memory ops into the instruction? It should, it's a simple reg dependency. In many cases it will even do it if it is not single-use (via a 3->2 combination). Segher From law@redhat.com Tue May 22 21:16:00 2018 From: law@redhat.com (Jeff Law) Date: Tue, 22 May 2018 21:16:00 -0000 Subject: Generating gimple assign stmt that changes sign In-Reply-To: References: <39988331-1f6a-514b-7774-b7eea63b3ef1@redhat.com> Message-ID: On 05/21/2018 05:25 PM, Kugan Vivekanandarajah wrote: > Hi Jeff, > > Thanks for the prompt reply. > > On 22 May 2018 at 09:10, Jeff Law wrote: >> On 05/21/2018 04:50 PM, Kugan Vivekanandarajah wrote: >>> Hi, >>> >>> I am looking to introduce ABSU_EXPR and that would create: >>> >>> unsigned short res = ABSU_EXPR (short); >>> >>> Note that the argument is signed and result is unsigned. 
As per the >>> review, I have a match.pd entry to generate this as: >>> (simplify (abs (convert @0)) >>> (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))) >>> (convert (absu @0)))) >>> >>> >>> Now when gimplifying the converted tree, how do we tell that ABSU_EXPR >>> will take a signed arg and return unsigned. I will have other match.pd >>> entries so this will be generated while in gimple.passes too. Should I >>> add new functions in gimple.[h|c] for this. >>> >>> Is there any examples I can refer to. Conversion expressions seems to >>> be the only place where sign can change in gimple assignment but they >>> are very specific. >> What's the value in representing ABSU vs a standard ABS followed by a >> conversion? > > It is based on PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64946. > Specifically, comment 13. Thanks. I was wondering if it was supposed to allow you to exploit hardware capability in the vectorizer better. > >> >> You'll certainly want to do verification of the type signedness in the >> gimple verifier. > I am doing it and it is failing now. Yea, you're going to have to dig into those. There may also be some propagations you want to disable or enable given the semantics you're defining for ABSU. Jeff From paulkoning@comcast.net Wed May 23 00:50:00 2018 From: paulkoning@comcast.net (Paul Koning) Date: Wed, 23 May 2018 00:50:00 -0000 Subject: How do I stop gcc from loading data into registers when that's not needed? In-Reply-To: <20180522192609.GQ17342@gate.crashing.org> References: <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net> <20180522192609.GQ17342@gate.crashing.org> Message-ID: <50B057F6-F0AB-4213-91DE-2A988C72436F@comcast.net> > On May 22, 2018, at 3:26 PM, Segher Boessenkool wrote: > > > -fdump-rtl-combine-all (or just -da or -dap), and then look at the dump > file. Does combine try this combination? If so, it will tell you what > the resulting costs are. If not, why does it not try it? 
> >> Sorry, I'm not very familiar with this area of GCC either. Did you confirm >> that combine at least tries to merge the memory ops into the instruction? > > It should, it's a simple reg dependency. In many cases it will even do > it if it is not single-use (via a 3->2 combination). I examined what gcc does with two simple functions: void c2(void) { if (x < y) z = 1; else if (x != y) z = 42; else z = 9; } void c3(void) { if (x < y) z = 1; else z = 9; } Two things popped out. 1. The original RTL (from the expand phase) has a memory->register move for x and y in c2, but it doesn't for c3 (it simply generates a memory/memory compare there). What triggers the different choice in that phase? 2. The reported costs for the various insns are r22:HI=['x'] 6 cmp(r22:HI,r23:HI) 4 cmp(['x'],['y']) 16 so the added cost for the memory argument in the cmp is 6 -- the same as the whole cost for the mov. That certainly explains the behavior. It isn't what I want it to be. Which target hook(s) are involved in these numbers? I don't see them in my rtx_costs hook. paul From sreekanth.gudisi@gmail.com Wed May 23 06:04:00 2018 From: sreekanth.gudisi@gmail.com (Sreekanth G) Date: Wed, 23 May 2018 06:04:00 -0000 Subject: Suse Linux ( s390x-linux-gnu-gcc ) Message-ID: <875E974D-84FE-4AE5-A13C-889EA38091AB@gmail.com> Hi, I have installed GoLang on Suse Linux on IBM cloud. The go version is go1.10.2 linux/s390x. When running the go build command it's showing an error, see below: linux1@sfhyperledger:~/go1.X/src/syndicateLoans> go build # github.com/hyperledger/fabric/vendor/github.com/miekg/pkcs11 exec: "s390x-linux-gnu-gcc": executable file not found in $PATH I have been stuck here for 2 days and don't know how to proceed. Could you help with this error? 
Thanks & regards, Sreekanth G From prathamesh.kulkarni@linaro.org Wed May 23 07:27:00 2018 From: prathamesh.kulkarni@linaro.org (Prathamesh Kulkarni) Date: Wed, 23 May 2018 07:27:00 -0000 Subject: PR80155: Code hoisting and register pressure Message-ID: Hi, I am trying to work on PR80155, which exposes a problem with code hoisting and register pressure on a leading embedded benchmark for ARM cortex-m7, where code-hoisting causes an extra register spill. I have attached two test-cases which (hopefully) are representative of the original test-case. The first one (trans_dfa.c) is bigger and somewhat similar to the original test-case, and trans_dfa_2.c is a hand-reduced version of trans_dfa.c. There are 2 spills caused with trans_dfa.c and one spill with trans_dfa_2.c, due to the smaller number of cases. The test-cases in the PR are probably not relevant. Initially I thought the spill was happening because of "too many hoistings" taking place in the original test-case, thus increasing the register pressure, but it seems the spill is possibly caused because an expression gets hoisted out of a block that is on a loop exit. For example, the following hoistings take place with trans_dfa_2.c: (1) Inserting expression in block 4 for code hoisting: {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) (3) Inserting expression in block 4 for code hoisting: {pointer_plus_expr,s_33,1} (0023) (4) Inserting expression in block 3 for code hoisting: {pointer_plus_expr,s_33,1} (0023) The issue seems to be the hoisting of (*tab + 1), which consists of the first two hoistings in block 4 from blocks 5 and 9, and which causes the extra spill. I verified that by disabling hoisting into block 4, which resulted in no extra spills. I wonder if that's because the expression (*tab + 1) is getting hoisted from blocks 5 and 9, which are on loop exit? 
So the expression that was previously computed in a block on loop exit gets hoisted outside that block, which possibly makes the allocator more defensive? Similarly, disabling hoisting of expressions which appeared in blocks on loop exit in the original test-case prevented the extra spill. The other hoistings didn't seem to matter. If that's the case, would it make sense to add another constraint to hoisting, to not hoist an expression if it appears in a block that is on a loop exit (the exception being if the block has only a single successor), at least for targets like cortex-m7 where the effect of a spill is more pronounced? I tried to write an untested prototype patch (approach-8.diff) on these lines, to refuse hoisting if an expression appears in a block on loop exit. The patch adds a new map pre_expr_block_d, that maps a pre_expr to the set of blocks (as a bitmap) it appears in and that are on loop exit, which is computed during compute_avail. During do_hoist_insertion, it simply checks if the bitmap of blocks is not empty. It works for the attached test-cases and passes the ssa-pre-*.c and ssa-hoist-*.c tests. Alternatively, we could restrict replacing an expression by its leader in eliminate_dom_walker::before_dom_children if the expression appears in a block on loop exit. In principle this is more general than hoisting, but I have restricted it to only hoisted expressions to avoid being overly conservative. Similarly, this constraint could be made conditional, only for targets like cortex-m7. I have attached a prototype patch based on this approach (approach-9.diff). Although it works for the attached test-cases, unfortunately it ICEs with the original test-case in tail-merge; I am working on that. I am not sure though if either of these approaches is in the correct direction, and I would be grateful for suggestions on the issue! Thanks, Prathamesh -------------- next part -------------- A non-text attachment was scrubbed... 
Name: trans_dfa.c Type: text/x-csrc Size: 940 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: trans_dfa_2.c Type: text/x-csrc Size: 385 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: approach-9.diff Type: text/x-patch Size: 6933 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: approach-8.diff Type: text/x-patch Size: 4657 bytes Desc: not available URL: From rguenther@suse.de Wed May 23 08:29:00 2018 From: rguenther@suse.de (Richard Biener) Date: Wed, 23 May 2018 08:29:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: References: Message-ID: On Wed, 23 May 2018, Prathamesh Kulkarni wrote: > Hi, > I am trying to work on PR80155, which exposes a problem with code > hoisting and register pressure on a leading embedded benchmark for ARM > cortex-m7, where code-hoisting causes an extra register spill. > > I have attached two test-cases which (hopefully) are representative of > the original test-case. > The first one (trans_dfa.c) is bigger and somewhat similar to the > original test-case and trans_dfa_2.c is hand-reduced version of > trans_dfa.c. There's 2 spills caused with trans_dfa.c > and one spill with trans_dfa_2.c due to lesser amount of cases. > The test-cases in the PR are probably not relevant. > > Initially I thought the spill was happening because of "too many > hoistings" taking place in original test-case thus increasing the > register pressure, but it seems the spill is possibly caused because > expression gets hoisted out of a block that is on loop exit. 
> > For example, the following hoistings take place with trans_dfa_2.c: > > (1) Inserting expression in block 4 for code hoisting: > {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) > > (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) > > (3) Inserting expression in block 4 for code hoisting: > {pointer_plus_expr,s_33,1} (0023) > > (4) Inserting expression in block 3 for code hoisting: > {pointer_plus_expr,s_33,1} (0023) > > The issue seems to be hoisting of (*tab + 1) which consists of first > two hoistings in block 4 > from blocks 5 and 9, which causes the extra spill. I verified that by > disabling hoisting into block 4, > which resulted in no extra spills. > > I wonder if that's because the expression (*tab + 1) is getting > hoisted from blocks 5 and 9, > which are on loop exit ? So the expression that was previously > computed in a block on loop exit, gets hoisted outside that block > which possibly makes the allocator more defensive ? Similarly > disabling hoisting of expressions which appeared in blocks on loop > exit in original test-case prevented the extra spill. The other > hoistings didn't seem to matter. I think that's simply co-incidence. The only thing that makes a block that also exits from the loop special is that an expression could be sunk out of the loop and hoisting (commoning with another path) could prevent that. But that isn't what is happening here and it would be a pass ordering issue as the sinking pass runs only after hoisting (no idea why exactly but I guess there are cases where we want to prefer CSE over sinking). So you could try if re-ordering PRE and sinking helps your testcase. What I do see is a missed opportunity to merge the successors of BB 4. 
After PRE we have [local count: 159303558]: : pretmp_123 = *tab_37(D); _87 = pretmp_123 + 1; if (c_36 == 65) goto ; [34.00%] else goto ; [66.00%] [local count: 54163210]: *tab_37(D) = _87; _96 = MEM[(char *)s_57 + 1B]; if (_96 != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105140348]: *tab_37(D) = _87; _56 = MEM[(char *)s_57 + 1B]; if (_56 != 0) goto ; [89.00%] else goto ; [11.00%] here at least the stores and loads can be hoisted. Note this may also point at the real issue of the code hoisting which is tearing apart the RMW operation? > If that's the case, would it make sense to add another constraint to > hoisting to not hoist expression if it appears in a block that is on > loop exit (exception is if block has only single successor), at-least > for targets like cortex-m7 where the effect of spill is more > pronounced ? > > I tried to write an untested prototype patch (approach-8.diff) on > these lines, to refuse hoisting if an expression appears in block on > loop exit. The patch adds a new map pre_expr_block_d, that maps > pre_expr to the set of blocks (as a bitmap) it appears in and are on > loop exit, which is computed during compute_avail. > During do_hoist_insertion, it simply checks if the bitmap of blocks is > not empty. > It works for the attached test-cases and passes ssa-pre-*.c and > ssa-hoist-*.c tests. As said to me the heuristic doesn't make much sense. There is btw loop_exits_from_bb_p (). If the issue in the end is a RA one (that is, there _is_ a better allocation possible?) then you may also want to look at out-of-SSA coalescing. So overall I'm not convinced the hoisting decision is wrong. It doesn't increase register pressure at all. It does expose a missed store hoisting and load CSE opportunity (but we don't have a way to "PRE" or hoist stores at the moment). Stores do not fit data flow problems well given they need to be kept in order with respect to other stores and loads and appearantly *tab aliases *s (yeah, s is char *... 
make tab restrict and I guess things will work much smoother). Richard. > Alternatively, we could restrict replacing expression by it's leader > in eliminate_dom_walker::before_dom_children if the expression appears > in a block on loop exit. > In principle this is more general than hoisting, but I have restricted > it to only hoisted expressions to avoid being overly conservative. > Similarly, this constrained could be made conditional, only for > targets like cortex-m7. I have attached a prototype patch based on > this approach (approach-9.diff). Although it works for attached > test-cases, unfortunately it ICE's with the original test-case in > tail-merge, I am working on that. > > I am not sure though if either of these approaches are in the correct > direction and would > be grateful for suggestions on the issue! > > Thanks, > Prathamesh > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) From prathamesh.kulkarni@linaro.org Wed May 23 09:20:00 2018 From: prathamesh.kulkarni@linaro.org (Prathamesh Kulkarni) Date: Wed, 23 May 2018 09:20:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: References: Message-ID: On 23 May 2018 at 13:58, Richard Biener wrote: > On Wed, 23 May 2018, Prathamesh Kulkarni wrote: > >> Hi, >> I am trying to work on PR80155, which exposes a problem with code >> hoisting and register pressure on a leading embedded benchmark for ARM >> cortex-m7, where code-hoisting causes an extra register spill. >> >> I have attached two test-cases which (hopefully) are representative of >> the original test-case. >> The first one (trans_dfa.c) is bigger and somewhat similar to the >> original test-case and trans_dfa_2.c is hand-reduced version of >> trans_dfa.c. There's 2 spills caused with trans_dfa.c >> and one spill with trans_dfa_2.c due to lesser amount of cases. >> The test-cases in the PR are probably not relevant. 
>> >> Initially I thought the spill was happening because of "too many >> hoistings" taking place in original test-case thus increasing the >> register pressure, but it seems the spill is possibly caused because >> expression gets hoisted out of a block that is on loop exit. >> >> For example, the following hoistings take place with trans_dfa_2.c: >> >> (1) Inserting expression in block 4 for code hoisting: >> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >> >> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) >> >> (3) Inserting expression in block 4 for code hoisting: >> {pointer_plus_expr,s_33,1} (0023) >> >> (4) Inserting expression in block 3 for code hoisting: >> {pointer_plus_expr,s_33,1} (0023) >> >> The issue seems to be hoisting of (*tab + 1) which consists of first >> two hoistings in block 4 >> from blocks 5 and 9, which causes the extra spill. I verified that by >> disabling hoisting into block 4, >> which resulted in no extra spills. >> >> I wonder if that's because the expression (*tab + 1) is getting >> hoisted from blocks 5 and 9, >> which are on loop exit ? So the expression that was previously >> computed in a block on loop exit, gets hoisted outside that block >> which possibly makes the allocator more defensive ? Similarly >> disabling hoisting of expressions which appeared in blocks on loop >> exit in original test-case prevented the extra spill. The other >> hoistings didn't seem to matter. > > I think that's simply co-incidence. The only thing that makes > a block that also exits from the loop special is that an > expression could be sunk out of the loop and hoisting (commoning > with another path) could prevent that. But that isn't what is > happening here and it would be a pass ordering issue as > the sinking pass runs only after hoisting (no idea why exactly > but I guess there are cases where we want to prefer CSE over > sinking). So you could try if re-ordering PRE and sinking helps > your testcase. 
Thanks for the suggestions. Placing sink pass before PRE works for both these test-cases! Sadly it still causes the spill for the benchmark -:( I will try to create a better approximation of the original test-case. > > What I do see is a missed opportunity to merge the successors > of BB 4. After PRE we have > > [local count: 159303558]: > : > pretmp_123 = *tab_37(D); > _87 = pretmp_123 + 1; > if (c_36 == 65) > goto ; [34.00%] > else > goto ; [66.00%] > > [local count: 54163210]: > *tab_37(D) = _87; > _96 = MEM[(char *)s_57 + 1B]; > if (_96 != 0) > goto ; [89.00%] > else > goto ; [11.00%] > > [local count: 105140348]: > *tab_37(D) = _87; > _56 = MEM[(char *)s_57 + 1B]; > if (_56 != 0) > goto ; [89.00%] > else > goto ; [11.00%] > > here at least the stores and loads can be hoisted. Note this > may also point at the real issue of the code hoisting which is > tearing apart the RMW operation? Indeed, this possibility seems much more likely than block being on loop exit. I will try to "hardcode" the load/store hoists into block 4 for this specific test-case to check if that prevents the spill. > >> If that's the case, would it make sense to add another constraint to >> hoisting to not hoist expression if it appears in a block that is on >> loop exit (exception is if block has only single successor), at-least >> for targets like cortex-m7 where the effect of spill is more >> pronounced ? >> >> I tried to write an untested prototype patch (approach-8.diff) on >> these lines, to refuse hoisting if an expression appears in block on >> loop exit. The patch adds a new map pre_expr_block_d, that maps >> pre_expr to the set of blocks (as a bitmap) it appears in and are on >> loop exit, which is computed during compute_avail. >> During do_hoist_insertion, it simply checks if the bitmap of blocks is >> not empty. >> It works for the attached test-cases and passes ssa-pre-*.c and >> ssa-hoist-*.c tests. > > As said to me the heuristic doesn't make much sense. 
There is > btw loop_exits_from_bb_p (). > > If the issue in the end is a RA one (that is, there _is_ a better > allocation possible?) then you may also want to look at out-of-SSA > coalescing. > > So overall I'm not convinced the hoisting decision is wrong. > It doesn't increase register pressure at all. It does expose > a missed store hoisting and load CSE opportunity (but we don't > have a way to "PRE" or hoist stores at the moment). Stores > do not fit data flow problems well given they need to be kept > in order with respect to other stores and loads and appearantly > *tab aliases *s (yeah, s is char *... make tab restrict and I > guess things will work much smoother). Well strangely, making tab restrict resulted in two extra spills compared to without hoisting. Thanks, Prathamesh > > Richard. > >> Alternatively, we could restrict replacing expression by it's leader >> in eliminate_dom_walker::before_dom_children if the expression appears >> in a block on loop exit. >> In principle this is more general than hoisting, but I have restricted >> it to only hoisted expressions to avoid being overly conservative. >> Similarly, this constrained could be made conditional, only for >> targets like cortex-m7. I have attached a prototype patch based on >> this approach (approach-9.diff). Although it works for attached >> test-cases, unfortunately it ICE's with the original test-case in >> tail-merge, I am working on that. >> >> I am not sure though if either of these approaches are in the correct >> direction and would >> be grateful for suggestions on the issue! 
>> >> Thanks, >> Prathamesh >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) From thomas@codesourcery.com Wed May 23 09:23:00 2018 From: thomas@codesourcery.com (Thomas Schwinge) Date: Wed, 23 May 2018 09:23:00 -0000 Subject: [og8] New Git-only development branch: openacc-gcc-8-branch Message-ID: <87sh6iafoz.fsf@euler.schwinge.homeip.net> Hi! There is a new Git-only development branch: openacc-gcc-8-branch, , . This branch is for collaborative development of OpenACC and related functionality, based on gcc-8-branch. Please send email with a short-hand "[og8]" tag, and use "ChangeLog.openacc" files. With some clean-up, the new branch contains the features/changes of the current openacc-gcc-7-branch (commit 917e247055a37f912129ed545719182de0046adb), and is based on the gcc-8-branch branch point, trunk r259629, with the recent gcc-8-branch r260395 merged in. In contrast to openacc-gcc-7-branch, instead of *merging*, I have this time *rebased* all the many commits accumulated on gomp-4_0-branch, and later openacc-gcc-7-branch, and weeded out all (most of?) what is no longer required, grouped/combined related changes, and so on. This shall make it easier to get changes into GCC trunk. OpenACC wiki page updated: , and committed to wwwdocs: Index: htdocs/svn.html =================================================================== RCS file: /cvs/gcc/wwwdocs/htdocs/svn.html,v retrieving revision 1.219 retrieving revision 1.220 diff -u -p -r1.219 -r1.220 --- htdocs/svn.html 2 Apr 2018 15:22:36 -0000 1.219 +++ htdocs/svn.html 23 May 2018 09:02:54 -0000 1.220 @@ -289,18 +289,18 @@ the command svn log --stop-on-copy Patches should be marked with the tag [no-undefined-overflow] in the subject line. The branch is maintained by Richard Biener. -
openacc-gcc-7-branch
+
openacc-gcc-8-branch
This Git-only branch is used for collaborative development - of OpenACC and related + of OpenACC support and related functionality, such as offloading support. The - branch is based on gcc-7-branch. Find it + branch is based on gcc-8-branch. Find it at git://gcc.gnu.org/git/gcc.git, - <https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-7-branch>, + <https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-8-branch>, or - <https://github.com/gcc-mirror/gcc/tree/openacc-gcc-7-branch>. - Please send email with a short-hand [og7] tag in the subject + <https://github.com/gcc-mirror/gcc/tree/openacc-gcc-8-branch>. + Please send email with a short-hand [og8] tag in the subject line, and use ChangeLog.openacc files.
plugins
@@ -787,8 +787,8 @@ inactive.

GCC as well as support for OpenACC. These features got merged into trunk. Based on gcc-6-branch then, this branch was used for - on-going development of OpenACC support and related functionality, which has - now moved to a new openacc-gcc-7-branch. + on-going development of OpenACC support and related functionality, which then + moved to openacc-gcc-7-branch, and now openacc-gcc-8-branch.
java-gui-20050128-branch
This was a temporary branch for development of java GUI libraries @@ -824,6 +824,11 @@ inactive.

<eager@eagercon.com>. All changes have been merged into mainline.
+
openacc-gcc-7-branch
+
Based on gcc-7-branch, this branch was used for on-going development + of OpenACC support and related + functionality, which now moved openacc-gcc-8-branch.
+
pch-branch
tree-ssa-20020619-branch
var-tracking-assignments*-branch
Grüße Thomas From jwakely.gcc@gmail.com Wed May 23 09:31:00 2018 From: jwakely.gcc@gmail.com (Jonathan Wakely) Date: Wed, 23 May 2018 09:31:00 -0000 Subject: Suse Linux ( s390x-linux-gnu-gcc ) In-Reply-To: <875E974D-84FE-4AE5-A13C-889EA38091AB@gmail.com> References: <875E974D-84FE-4AE5-A13C-889EA38091AB@gmail.com> Message-ID: You already sent this email to the gcc-help list, please don't cross-post here as well, see https://gcc.gnu.org/lists.html#policies It is almost never appropriate to send to both lists, either you're asking for help using GCC or you're talking about development of GCC, not both. On 23 May 2018 at 07:04, Sreekanth G wrote: > Hi, > > I have install GoLang on Suse Linux on IBM cloud. > go version is go1.10.2 linux/s390x > > When running the go build command it's showing error. see below , > > linux1@sfhyperledger:~/go1.X/src/syndicateLoans> go build > # github.com/hyperledger/fabric/vendor/github.com/miekg/pkcs11 > exec: "s390x-linux-gnu-gcc": executable file not found in $PATH > > I got stuck here from 2 days.. I don't know how will proceed. Could help from this error ? > > > > Thanks & regards, > Sreekanth G From amker.cheng@gmail.com Wed May 23 09:40:00 2018 From: amker.cheng@gmail.com (Bin.Cheng) Date: Wed, 23 May 2018 09:40:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: References: Message-ID: On Wed, May 23, 2018 at 9:28 AM, Richard Biener wrote: > On Wed, 23 May 2018, Prathamesh Kulkarni wrote: > >> Hi, >> I am trying to work on PR80155, which exposes a problem with code >> hoisting and register pressure on a leading embedded benchmark for ARM >> cortex-m7, where code-hoisting causes an extra register spill. >> >> I have attached two test-cases which (hopefully) are representative of >> the original test-case. >> The first one (trans_dfa.c) is bigger and somewhat similar to the >> original test-case and trans_dfa_2.c is hand-reduced version of >> trans_dfa.c.
There's 2 spills caused with trans_dfa.c >> and one spill with trans_dfa_2.c due to lesser amount of cases. >> The test-cases in the PR are probably not relevant. >> >> Initially I thought the spill was happening because of "too many >> hoistings" taking place in original test-case thus increasing the >> register pressure, but it seems the spill is possibly caused because >> expression gets hoisted out of a block that is on loop exit. >> >> For example, the following hoistings take place with trans_dfa_2.c: >> >> (1) Inserting expression in block 4 for code hoisting: >> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >> >> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) >> >> (3) Inserting expression in block 4 for code hoisting: >> {pointer_plus_expr,s_33,1} (0023) >> >> (4) Inserting expression in block 3 for code hoisting: >> {pointer_plus_expr,s_33,1} (0023) >> >> The issue seems to be hoisting of (*tab + 1) which consists of first >> two hoistings in block 4 >> from blocks 5 and 9, which causes the extra spill. I verified that by >> disabling hoisting into block 4, >> which resulted in no extra spills. >> >> I wonder if that's because the expression (*tab + 1) is getting >> hoisted from blocks 5 and 9, >> which are on loop exit ? So the expression that was previously >> computed in a block on loop exit, gets hoisted outside that block >> which possibly makes the allocator more defensive ? Similarly >> disabling hoisting of expressions which appeared in blocks on loop >> exit in original test-case prevented the extra spill. The other >> hoistings didn't seem to matter. > > I think that's simply co-incidence. The only thing that makes > a block that also exits from the loop special is that an > expression could be sunk out of the loop and hoisting (commoning > with another path) could prevent that. 
But that isn't what is > happening here and it would be a pass ordering issue as > the sinking pass runs only after hoisting (no idea why exactly > but I guess there are cases where we want to prefer CSE over > sinking). So you could try if re-ordering PRE and sinking helps > your testcase. > > What I do see is a missed opportunity to merge the successors > of BB 4. After PRE we have > > [local count: 159303558]: > : > pretmp_123 = *tab_37(D); > _87 = pretmp_123 + 1; > if (c_36 == 65) > goto ; [34.00%] > else > goto ; [66.00%] > > [local count: 54163210]: > *tab_37(D) = _87; > _96 = MEM[(char *)s_57 + 1B]; > if (_96 != 0) > goto ; [89.00%] > else > goto ; [11.00%] > > [local count: 105140348]: > *tab_37(D) = _87; > _56 = MEM[(char *)s_57 + 1B]; > if (_56 != 0) > goto ; [89.00%] > else > goto ; [11.00%] > > here at least the stores and loads can be hoisted. Note this > may also point at the real issue of the code hoisting which is > tearing apart the RMW operation? > >> If that's the case, would it make sense to add another constraint to >> hoisting to not hoist expression if it appears in a block that is on >> loop exit (exception is if block has only single successor), at-least >> for targets like cortex-m7 where the effect of spill is more >> pronounced ? >> >> I tried to write an untested prototype patch (approach-8.diff) on >> these lines, to refuse hoisting if an expression appears in block on >> loop exit. The patch adds a new map pre_expr_block_d, that maps >> pre_expr to the set of blocks (as a bitmap) it appears in and are on >> loop exit, which is computed during compute_avail. >> During do_hoist_insertion, it simply checks if the bitmap of blocks is >> not empty. >> It works for the attached test-cases and passes ssa-pre-*.c and >> ssa-hoist-*.c tests. > > As said to me the heuristic doesn't make much sense. There is > btw loop_exits_from_bb_p (). > > If the issue in the end is a RA one (that is, there _is_ a better > allocation possible?) 
then you may also want to look at out-of-SSA > coalescing. > > So overall I'm not convinced the hoisting decision is wrong. > It doesn't increase register pressure at all. It does expose Not quite. There are two hoistings in the case of trans_dfa_2.c. For the first one: Inserting expression in block 4 for code hoisting: {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) Inserted pretmp_30 = *tab_20(D); in predecessor 4 (0005) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) Inserted _12 = pretmp_30 + 1; in predecessor 4 (0006) I agree hoisting separates RM from W, but for the second one: Inserting expression in block 4 for code hoisting: {pointer_plus_expr,s_33,1} (0023) Inserted _14 = s_33 + 1; in predecessor 4 (0023) Inserting expression in block 3 for code hoisting: {pointer_plus_expr,s_33,1} (0023) Inserted _62 = s_33 + 1; in predecessor 3 (0023) It does increase register pressure because _62 (originally s_34 and s_27) is extended and s_33 is still alive. Thanks, bin > a missed store hoisting and load CSE opportunity (but we don't > have a way to "PRE" or hoist stores at the moment). Stores > do not fit data flow problems well given they need to be kept > in order with respect to other stores and loads and appearantly > *tab aliases *s (yeah, s is char *... make tab restrict and I > guess things will work much smoother). > > Richard. > >> Alternatively, we could restrict replacing expression by it's leader >> in eliminate_dom_walker::before_dom_children if the expression appears >> in a block on loop exit. >> In principle this is more general than hoisting, but I have restricted >> it to only hoisted expressions to avoid being overly conservative. >> Similarly, this constrained could be made conditional, only for >> targets like cortex-m7. I have attached a prototype patch based on >> this approach (approach-9.diff). Although it works for attached >> test-cases, unfortunately it ICE's with the original test-case in >> tail-merge, I am working on that.
>> >> I am not sure though if either of these approaches are in the correct >> direction and would >> be grateful for suggestions on the issue! >> >> Thanks, >> Prathamesh >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) From richard.guenther@gmail.com Wed May 23 09:47:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Wed, 23 May 2018 09:47:00 -0000 Subject: How do I stop gcc from loading data into registers when that's not needed? In-Reply-To: <50B057F6-F0AB-4213-91DE-2A988C72436F@comcast.net> References: <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net> <20180522192609.GQ17342@gate.crashing.org> <50B057F6-F0AB-4213-91DE-2A988C72436F@comcast.net> Message-ID: On Wed, May 23, 2018 at 2:50 AM Paul Koning wrote: > > On May 22, 2018, at 3:26 PM, Segher Boessenkool < segher@kernel.crashing.org> wrote: > > > > > > -fdump-rtl-combine-all (or just -da or -dap), and then look at the dump > > file. Does combine try this combination? If so, it will tell you what > > the resulting costs are. If not, why does it not try it? > > > >> Sorry, I'm not very familiar with this area of GCC either. Did you confirm > >> that combine at least tries to merge the memory ops into the instruction? > > > > It should, it's a simple reg dependency. In many cases it will even do > > it if it is not single-use (via a 3->2 combination). > I examined what gcc does with two simple functions: > void c2(void) > { > if (x < y) > z = 1; > else if (x != y) > z = 42; > else > z = 9; > } > void c3(void) > { > if (x < y) > z = 1; > else > z = 9; > } > Two things popped out. > 1. The original RTL (from the expand phase) has a memory->register move for x and y in c2, but it doesn't for c3 (it simply generates a memory/memory compare there). What triggers the different choice in that phase? > 2. 
The reported costs for the various insns are > r22:HI=['x'] 6 > cmp(r22:HI,r23:HI) 4 > cmp(['x'],['y']) 16 > so the added cost for the memory argument in the cmp is 6 -- the same as the whole cost for the mov. That certainly explains the behavior. It isn't what I want it to be. Which target hook(s) are involved in these numbers? I don't see them in my rtx_costs hook. The rtx_cost hook. I think the costs above make sense. There's also a new insn_cost hook but you have to dig whether combine uses that. Otherwise address_cost might be involved. Richard. > paul From matz@suse.de Wed May 23 12:55:00 2018 From: matz@suse.de (Michael Matz) Date: Wed, 23 May 2018 12:55:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) In-Reply-To: References: <20180517211036.GB17342@gate.crashing.org> Message-ID: Hi, On Fri, 18 May 2018, Richard Biener wrote: > Interesting. Do they allow merging across such sections? Consider a 8 > byte entity 0x12345678 and 4 byte entities 0x1234 0x5678, will the 4 > byte entities share the rodata with the 8 byte one? There's no language to forbid this (as long as the alignments are respected), but at least GNU ld currently only merges same-sized entities. > I believe GCC pulls off such tricks in its internal constant pool > merging code. > > It might be worth gathering statistics on the size of constant pool > entries for this. > > Now the question is of course if BFD contains support for optimizing > those sections. You mean to ask if GNU ld is actually uniquifying contents of such mergeable sections over object files? If so the answer is yes. Ciao, Michael.
From law@redhat.com Wed May 23 13:07:00 2018 From: law@redhat.com (Jeff Law) Date: Wed, 23 May 2018 13:07:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: References: Message-ID: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: > On 23 May 2018 at 13:58, Richard Biener wrote: >> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: >> >>> Hi, >>> I am trying to work on PR80155, which exposes a problem with code >>> hoisting and register pressure on a leading embedded benchmark for ARM >>> cortex-m7, where code-hoisting causes an extra register spill. >>> >>> I have attached two test-cases which (hopefully) are representative of >>> the original test-case. >>> The first one (trans_dfa.c) is bigger and somewhat similar to the >>> original test-case and trans_dfa_2.c is hand-reduced version of >>> trans_dfa.c. There's 2 spills caused with trans_dfa.c >>> and one spill with trans_dfa_2.c due to lesser amount of cases. >>> The test-cases in the PR are probably not relevant. >>> >>> Initially I thought the spill was happening because of "too many >>> hoistings" taking place in original test-case thus increasing the >>> register pressure, but it seems the spill is possibly caused because >>> expression gets hoisted out of a block that is on loop exit. >>> >>> For example, the following hoistings take place with trans_dfa_2.c: >>> >>> (1) Inserting expression in block 4 for code hoisting: >>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >>> >>> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) >>> >>> (3) Inserting expression in block 4 for code hoisting: >>> {pointer_plus_expr,s_33,1} (0023) >>> >>> (4) Inserting expression in block 3 for code hoisting: >>> {pointer_plus_expr,s_33,1} (0023) >>> >>> The issue seems to be hoisting of (*tab + 1) which consists of first >>> two hoistings in block 4 >>> from blocks 5 and 9, which causes the extra spill. 
I verified that by >>> disabling hoisting into block 4, >>> which resulted in no extra spills. >>> >>> I wonder if that's because the expression (*tab + 1) is getting >>> hoisted from blocks 5 and 9, >>> which are on loop exit ? So the expression that was previously >>> computed in a block on loop exit, gets hoisted outside that block >>> which possibly makes the allocator more defensive ? Similarly >>> disabling hoisting of expressions which appeared in blocks on loop >>> exit in original test-case prevented the extra spill. The other >>> hoistings didn't seem to matter. >> >> I think that's simply co-incidence. The only thing that makes >> a block that also exits from the loop special is that an >> expression could be sunk out of the loop and hoisting (commoning >> with another path) could prevent that. But that isn't what is >> happening here and it would be a pass ordering issue as >> the sinking pass runs only after hoisting (no idea why exactly >> but I guess there are cases where we want to prefer CSE over >> sinking). So you could try if re-ordering PRE and sinking helps >> your testcase. > Thanks for the suggestions. Placing sink pass before PRE works > for both these test-cases! Sadly it still causes the spill for the benchmark -:( > I will try to create a better approximation of the original test-case. >> >> What I do see is a missed opportunity to merge the successors >> of BB 4. After PRE we have >> >> [local count: 159303558]: >> : >> pretmp_123 = *tab_37(D); >> _87 = pretmp_123 + 1; >> if (c_36 == 65) >> goto ; [34.00%] >> else >> goto ; [66.00%] >> >> [local count: 54163210]: >> *tab_37(D) = _87; >> _96 = MEM[(char *)s_57 + 1B]; >> if (_96 != 0) >> goto ; [89.00%] >> else >> goto ; [11.00%] >> >> [local count: 105140348]: >> *tab_37(D) = _87; >> _56 = MEM[(char *)s_57 + 1B]; >> if (_56 != 0) >> goto ; [89.00%] >> else >> goto ; [11.00%] >> >> here at least the stores and loads can be hoisted. 
Note this >> may also point at the real issue of the code hoisting which is >> tearing apart the RMW operation? > Indeed, this possibility seems much more likely than block being on loop exit. > I will try to "hardcode" the load/store hoists into block 4 for this > specific test-case to check > if that prevents the spill. Even if it prevents the spill in this case, it's likely a good thing to do. The statements prior to the conditional in bb5 and bb8 should be hoisted, leaving bb5 and bb8 with just their conditionals. Jeff From law@redhat.com Wed May 23 15:01:00 2018 From: law@redhat.com (Jeff Law) Date: Wed, 23 May 2018 15:01:00 -0000 Subject: Bug in m68k ASM softloat implementation? In-Reply-To: References: Message-ID: On 05/16/2018 03:17 PM, Kalamatee wrote: > Hi, > > After hunting out a problem using the softloat code on m68k, with the > assistance of the WinUAE author (Toni Wilen) we think we have noticed a bug > dating back to 1994. > > Laddsf$nf returns values with the wrong sign, because it clears the sign > bit, before caching the wrong value and then attempting to use it. > > the movel d0,d7 is either in the wrong place (should be before the sign bit > is cleared), or it should be removed completely and the following andl IMM > (0x80000000),d7 should be changed to use d2 instead (which should have the > correct sign from the prior eorl) It would really help if you could provide the input values that trigger this erroneous path. It helps with initial validation of the change as well as long term as we can add it to the regression suite. Jeff From fche@redhat.com Wed May 23 22:04:00 2018 From: fche@redhat.com (Frank Ch. Eigler) Date: Wed, 23 May 2018 22:04:00 -0000 Subject: So what's the status of the Git migration? References: <20180517162730.GA9989@thyrsus.com> Message-ID: <87wovuav0p.fsf@redhat.com> esr wrote: > [...] >> Another year; another release; and still no sign of progress on the git >> migration. > [...] 
> The current issue - and, I think, the last major one - is that there are > over 150 nid-branch deletes to be resolved. Is there a mandate that this conversion be Perfect? How harmful would it be to retain some ambiguity / imperfection in the resulting git repo, considering that the svn repo can stick around indefinitely as a historical reference? - FChE From gccadmin@gcc.gnu.org Wed May 23 22:42:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Wed, 23 May 2018 22:42:00 -0000 Subject: gcc-6-20180523 is now available Message-ID: <20180523224222.71733.qmail@sourceware.org> Snapshot gcc-6-20180523 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/6-20180523/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 6 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch revision 260626 You'll find: gcc-6-20180523.tar.xz Complete GCC SHA256=9ab74d00b5fc4d0f1062e2c7b0c79ae790c06e6b98a5101ecdca4c003a2dc235 SHA1=289e9f8cf30e73cfa8d8e7178da6b97f696a920b Diffs from 6-20180516 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-6 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. From paulkoning@comcast.net Thu May 24 00:33:00 2018 From: paulkoning@comcast.net (Paul Koning) Date: Thu, 24 May 2018 00:33:00 -0000 Subject: How do I stop gcc from loading data into registers when that's not needed? In-Reply-To: References: <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net> <20180522192609.GQ17342@gate.crashing.org> <50B057F6-F0AB-4213-91DE-2A988C72436F@comcast.net> Message-ID: <7B2B2AAD-9A14-42AD-B05F-0447178E4BC8@comcast.net> > On May 23, 2018, at 5:46 AM, Richard Biener wrote: > > ... > >> 2. 
The reported costs for the various insns are >> r22:HI=['x'] 6 >> cmp(r22:HI,r23:HI) 4 >> cmp(['x'],['y']) 16 >> so the added cost for the memory argument in the cmp is 6 -- the same >> as the whole cost for the mov. That certainly explains the behavior. It >> isn't what I want it to be. Which target hook(s) are involved in these >> numbers? I don't see them in my rtx_costs hook. > > The rtx_cost hook. I think the costs above make sense. There's also a > new insn_cost hook but you have to dig whether combine uses that. > Otherwise address_cost might be involved. Thanks. For a pdp11, those costs aren't right because mov and cmp and a memory reference each have about the same cost. So 8, 4, 12 would be closer. But the real question for me at this point is where to find the knobs that adjust these choices. The various cost hooks have me confused, and the GCCINT manual is not really enlightening. There is rtx_costs, insn_cost, and addr_cost. It sort of feels like insn_cost and addr_cost together would provide roughly the same information that rtx_costs gives. In the existing platforms, I see rtx_costs everywhere, addr_cost in a fair number of targets, and insn_cost in just one (rs6000). Can someone explain the interaction of these various cost hooks, and what happens if you define various combinations of the three? paul
From fweimer@redhat.com Thu May 24 12:14:00 2018 From: fweimer@redhat.com (Florian Weimer) Date: Thu, 24 May 2018 12:14:00 -0000 Subject: Auto-generated .rodata contents and __attribute__((section)) In-Reply-To: References: <20180517211036.GB17342@gate.crashing.org> Message-ID: <0fa77795-5077-8ac6-dd99-118762aa71d3@redhat.com> On 05/23/2018 02:55 PM, Michael Matz wrote: > On Fri, 18 May 2018, Richard Biener wrote: > >> Interesting. Do they allow merging across such sections? Consider a 8 >> byte entity 0x12345678 and 4 byte entities 0x1234 0x5678, will the 4 >> byte entities share the rodata with the 8 byte one? > There's no language to forbid this (as long as the alignments > are respected), but at least GNU ld currently only merges same-sized > entities. I'm not entirely sure if this is valid for C, particularly if the objects have the same address, but not the same type. There is a discussion on the generic-abi list about providing information to the linker about when it is safe to do such merging: https://groups.google.com/forum/#!topic/generic-abi/MPr8TVtnVn4 Thanks, Florian
From sellcey@cavium.com Thu May 24 17:50:00 2018 From: sellcey@cavium.com (Steve Ellcey) Date: Thu, 24 May 2018 17:50:00 -0000 Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP In-Reply-To: <87a7sznw5c.fsf@linaro.org> References: <1518212868.14236.47.camel@cavium.com> <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> <1526487700.29509.6.camel@cavium.com> <1526491802.29509.19.camel@cavium.com> <87a7sznw5c.fsf@linaro.org> Message-ID: <1527184223.22014.13.camel@cavium.com> On Wed, 2018-05-16 at 22:11 +0100, Richard Sandiford wrote: > > TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way > of saying that an rtl instruction preserves the low part of a > register but clobbers the high part. We would need something like > Alan H's CLOBBER_HIGH patches to do it using explicit clobbers. > > Another approach would be to piggy-back on the -fipa-ra > infrastructure > and record that vector PCS functions only clobber Q0-Q7. If -fipa-ra > knows that a function doesn't clobber Q8-Q15 then that should > override > TARGET_HARD_REGNO_CALL_PART_CLOBBERED. (I'm not sure whether it does > in practice, but it should :-) And if it doesn't that's a bug that's > worth fixing for its own sake.) > > Thanks, > Richard Alan, I have been looking at your CLOBBER_HIGH patches to see if they might be helpful in implementing the ARM SIMD Vector ABI in GCC. I have also been looking at the -fipa-ra flag and how it works. I was wondering if you considered using the ipa-ra infrastructure for the SVE work that you are currently trying to support with the CLOBBER_HIGH macro? My current thought for the ABI work is to mark all the floating point / vector registers as caller saved (the lower half of V8-V15 are currently callee saved) and remove TARGET_HARD_REGNO_CALL_PART_CLOBBERED. This should work but would be inefficient.
The next step would be to split get_call_reg_set_usage up into two functions so that I don't have to pass in a default set of registers. One function would return call_used_reg_set by default (but could return a smaller set if it had actual used register information) and the other would return regs_invalidated_by_call by default (but could also return a smaller set). Next I would add a 'largest mode used' array to call_cgraph_rtl_info structure in addition to the current function_used_regs register set. Then I could turn the get_call_reg_set_usage replacement functions into target specific functions and with the information in the call_cgraph_rtl_info structure and any simd attribute information on a function I could modify what registers are really being used/invalidated without being saved. If the called function only uses the bottom half of a register it would not be marked as used/invalidated. If it uses the entire register and the function is not marked as simd, then the register would be marked as used/invalidated. If the function was marked as simd the register would not be marked because a simd function would save both the upper and lower halves of a callee saved register (whereas a non simd function would only save the lower half). Does this sound like something that could be used in place of your CLOBBER_HIGH patch? Steve Ellcey sellcey@cavium.com From jozef.l@mittosystems.com Thu May 24 18:01:00 2018 From: jozef.l@mittosystems.com (Jozef Lawrynowicz) Date: Thu, 24 May 2018 18:01:00 -0000 Subject: Must TYPE_MODE of a UNION_TYPE be of MODE_INT class? Message-ID: <3115e5bc-cd54-24e8-727d-6e88b79f3f46@mittosystems.com> I've written a patch to fix the transparent_union attribute when the first field in a union is MODE_PARTIAL_INT, but I noticed that the current code only allows the TYPE_MODE of a UNION_TYPE to be of MODE_INT class.
See stor-layout.c (compute_record_mode), particularly this section: /* If we only have one real field; use its mode if that mode's size matches the type's size. This only applies to RECORD_TYPE. This does not apply to unions. */ if (TREE_CODE (type) == RECORD_TYPE && mode != VOIDmode && tree_fits_uhwi_p (TYPE_SIZE (type)) && known_eq (GET_MODE_BITSIZE (mode), tree_to_uhwi (TYPE_SIZE (type)))) ; else mode = mode_for_size_tree (TYPE_SIZE (type), MODE_INT, 1).else_blk (); Is there a reason the type of the union must be of MODE_INT class but a RECORD_TYPE with one field can have the class of its single field? If the else clause is changed to the following, then transparent_union unions with a first field of double (e.g. gcc/testsuite/g++.dg/ext/transparent-union.C) no longer error. mode = mode_for_size_tree (TYPE_SIZE (type), (GET_MODE_CLASS (mode) != MODE_RANDOM ? GET_MODE_CLASS (mode) : MODE_INT), 1).else_blk (); Could anyone provide some insight on whether the TYPE_MODE of a union should stay as a MODE_INT class or if it would be acceptable for the TYPE_MODE to be other classes e.g. MODE_FLOAT? From segher@kernel.crashing.org Thu May 24 18:25:00 2018 From: segher@kernel.crashing.org (Segher Boessenkool) Date: Thu, 24 May 2018 18:25:00 -0000 Subject: How do I stop gcc from loading data into registers when that's not needed? In-Reply-To: <7B2B2AAD-9A14-42AD-B05F-0447178E4BC8@comcast.net> References: <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net> <20180522192609.GQ17342@gate.crashing.org> <50B057F6-F0AB-4213-91DE-2A988C72436F@comcast.net> <7B2B2AAD-9A14-42AD-B05F-0447178E4BC8@comcast.net> Message-ID: <20180524182436.GC17342@gate.crashing.org> On Wed, May 23, 2018 at 08:33:13PM -0400, Paul Koning wrote: > > On May 23, 2018, at 5:46 AM, Richard Biener wrote: > >> 2.
The reported costs for the various insns are > >> r22:HI=['x'] 6 > >> cmp(r22:HI,r23:HI) 4 > >> cmp(['x'],['y']) 16 > >> so the added cost for the memory argument in the cmp is 6 -- the same > >> as the whole cost for the mov. That certainly explains the behavior. It > >> isn't what I want it to be. Which target hook(s) are involved in these > >> numbers? I don't see them in my rtx_costs hook. > > > > The rtx_cost hook. I think the costs above make sense. There's also a > > new insn_cost hook but you have to dig whether combine uses that. > > Otherwise address_cost might be involved. > > Thanks. For a pdp11, those costs aren't right because mov and cmp and > a memory reference each have about the same cost. So 8, 4, 12 would be > closer. But the real question for me at this point is where to find > the knobs that adjust these choices. > > The various cost hooks have me confused, and the GCCINT manual is not > really enlightening. There is rtx_costs, insn_cost, and addr_cost. > It sort of feels like insn_cost and addr_cost together would provide > roughly the same information that rtx_costs gives. In the existing > platforms, I see rtx_costs everywhere, addr_cost in a fair number of > targets, and insn_cost in just one (rs6000). Can someone explain the > interaction of these various cost hooks, and what happens if you define > various combinations of the three? rtx_costs computes the cost for any rtx (an insn, a set, a set source, any random piece of one). set_src_cost, set_rtx_cost, etc. are helper functions that use that. Those functions do not work for parallels. Also, costs are not additive like this simplified model assumes. Also, more complex backends tend to miss many cases in their rtx_costs function. Many passes that want costs want to know the cost of a full insn. Like combine. That's why I created insn_cost: it solves all of the above problems. I'll hopefully make most passes use insn_cost for GCC 9. All of the very easy ones already do. 
Segher From ebotcazou@adacore.com Thu May 24 19:14:00 2018 From: ebotcazou@adacore.com (Eric Botcazou) Date: Thu, 24 May 2018 19:14:00 -0000 Subject: Must TYPE_MODE of a UNION_TYPE be of MODE_INT class? In-Reply-To: <3115e5bc-cd54-24e8-727d-6e88b79f3f46@mittosystems.com> References: <3115e5bc-cd54-24e8-727d-6e88b79f3f46@mittosystems.com> Message-ID: <2007177.c9C0e82VNH@polaris> > See stor-layout.c (compute_record_mode), particularly this section: > > /* If we only have one real field; use its mode if that mode's size > matches the type's size. This only applies to RECORD_TYPE. This > does not apply to unions. */ > if (TREE_CODE (type) == RECORD_TYPE && mode != VOIDmode > && tree_fits_uhwi_p (TYPE_SIZE (type)) > && known_eq (GET_MODE_BITSIZE (mode), tree_to_uhwi (TYPE_SIZE > (type)))) > ; > else > mode = mode_for_size_tree (TYPE_SIZE (type), MODE_INT, 1).else_blk (); > > Is there a reason the type of the union must be of MODE_INT class but a > RECORD_TYPE with one field can have the class of it's single field? Yes, ABIs that pass structures or unions in registers traditionally pass the unions always in integer registers, whereas for structures it's dependent on the types of the fields. > Could anyone provide some insight on whether the TYPE_MODE of a union should > stay as a MODE_INT class or if it would be acceptable for the TYPE_MODE to > be other classes e.g. MODE_FLOAT? No, I don't think we want to change that. -- Eric Botcazou From dani.ward@managebusinesses.com Thu May 24 19:33:00 2018 From: dani.ward@managebusinesses.com (dani.ward@managebusinesses.com) Date: Thu, 24 May 2018 19:33:00 -0000 Subject: Blockchain users contacts list Message-ID: <00000000000082b748056cf8baef@google.com>
Hi,

Would you be interested in a Blockchain users list which can help you grow your business?

Titles: C-level, VP-level, Directors, Managers Etc.

We also have 5000+ Technology verified users contacts.

Please let me know if you're interested, and I will get back to you with more information on the same.

Feel free to share your thoughts.

Regards,
Dani

 

powered by GSM. Free mail merge and email marketing software for Gmail. From msebor@gmail.com Thu May 24 21:36:00 2018 From: msebor@gmail.com (Martin Sebor) Date: Thu, 24 May 2018 21:36:00 -0000 Subject: [PATCH] tighten up -Wclass-memaccess for ctors/dtors (PR 84851) Message-ID: A fix for 84851 - missing -Wclass-memaccess for a memcpy in a copy ctor with a non-trivial member - was implemented but, because it was late, disabled for GCC 8, with the expectation that we would enable it for GCC 9. The attached patch removes the code that guards the full fix, enabling it. Martin -------------- next part -------------- A non-text attachment was scrubbed... Name: gcc-84851.diff Type: text/x-patch Size: 1479 bytes Desc: not available URL: From ibayzjst@netvigator.com Thu May 24 21:46:00 2018 From: ibayzjst@netvigator.com (Dena Schroeder) Date: Thu, 24 May 2018 21:46:00 -0000 Subject: Account Over Due Gcc, hanson brenda beneath lissajous Message-ID: <1BDEAB0FA65C2EC@soon> aldrich [4 From gccadmin@gcc.gnu.org Thu May 24 22:40:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Thu, 24 May 2018 22:40:00 -0000 Subject: gcc-7-20180524 is now available Message-ID: <20180524224041.116401.qmail@sourceware.org> Snapshot gcc-7-20180524 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/7-20180524/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 7 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch revision 260697 You'll find: gcc-7-20180524.tar.xz Complete GCC SHA256=19a344865b725953665d4794930e95f875a28f5359843215ebb66afcf1e19471 SHA1=2bae79e6b1072545e338c8f959de62a405efda1a Diffs from 7-20180517 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-7 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
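The SHA256 line in the snapshot announcement above can be verified after download with sha256sum's check mode, which reads "HASH  FILENAME" lines. A minimal sketch of the mechanism, demonstrated on a generated throwaway file since the tarball itself is not present here; for the real check, substitute the published hash and gcc-7-20180524.tar.xz:

```shell
# sha256sum -c verifies each "HASH  FILENAME" line on stdin.
# snapshot.bin stands in for the downloaded tarball.
printf 'example payload\n' > snapshot.bin
hash=$(sha256sum snapshot.bin | cut -d' ' -f1)
echo "$hash  snapshot.bin" | sha256sum -c -
```

A mismatched hash makes sha256sum -c report FAILED and exit nonzero, so the same pattern works in scripts that refuse to unpack a corrupted download.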
From paulkoning@comcast.net Fri May 25 00:35:00 2018 From: paulkoning@comcast.net (Paul Koning) Date: Fri, 25 May 2018 00:35:00 -0000 Subject: virtual-stack-vars reference not resolved in vregs Message-ID: <815571FD-AE59-4D56-9AFA-73F605A4DCC0@comcast.net> I'm doing cleanup on the pdp11 back end to get rid of a number of ICE in the test suite. One is in gcc.c-torture/compile/20001221.c -- it works in GCC 4 but fails in GCC 5 and later. In the dumps, I see in the output from the expand phase a large number of memory reference via the "virtual-stack-vars" pseudo-register. In vregs those all become frame pointer references in V4, but in V5 and later, about 10% of them remain pseudo-register references all the way through final assembly output generation. There things blow up because it's an invalid base register. Here are some snippets: static void foo () { long maplength; int type; { const long nibbles = 8; char buf1[nibbles + 1]; ... V4, expand output: ;; nibbles_18 = 8; (insn 7 6 0 (set (mem/c:SI (plus:HI (reg/f:HI 17 virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [0 nibbles+0 S4 A16]) (const_int 8 [0x8])) /Users/pkoning/Documents/svn/gcc/gcc/testsuite/gcc.c-torture/compile/20001221-1.c:8 -1 (nil)) ;; _19 = (sizetype) nibbles_18; (insn 8 7 0 (set (mem/c:HI (plus:HI (reg/f:HI 17 virtual-stack-vars) (const_int -10 [0xfffffffffffffff6])) [0 D.1480+0 S2 A16]) (subreg:HI (mem/c:SI (plus:HI (reg/f:HI 17 virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [0 nibbles+0 S4 A16]) 0)) /Users/pkoning/Documents/svn/gcc/gcc/testsuite/gcc.c-torture/compile/20001221-1.c:9 -1 (nil)) The second insn is picking up the low order half of "nibbles" to use as the index; note that Pmode is HI for this target. 
The vregs pass turns it into this: (insn 7 6 8 2 (set (mem/c:SI (plus:HI (reg/f:HI 14 fp) (const_int -8 [0xfffffffffffffff8])) [0 nibbles+0 S4 A16]) (const_int 8 [0x8])) /Users/pkoning/Documents/svn/gcc/gcc/testsuite/gcc.c-torture/compile/20001221-1.c:8 12 {movsi} (nil)) (insn 8 7 9 2 (set (mem/c:HI (plus:HI (reg/f:HI 14 fp) (const_int -10 [0xfffffffffffffff6])) [0 D.1480+0 S2 A16]) (subreg:HI (mem/c:SI (plus:HI (reg/f:HI 17 virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [0 nibbles+0 S4 A16]) 0)) /Users/pkoning/Documents/svn/gcc/gcc/testsuite/gcc.c-torture/compile/20001221-1.c:9 14 {movhi} (nil)) which is what I would expect. In V5, the expand output is the same: (insn 7 6 0 (set (mem/c:SI (plus:HI (reg/f:HI 17 virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [0 nibbles+0 S4 A16]) (const_int 8 [0x8])) /Users/pkoning/Documents/svn/gcc/gcc/testsuite/gcc.c-torture/compile/20001221-1.c:8 -1 (nil)) ;; _19 = (sizetype) nibbles_18; (insn 8 7 0 (set (mem/c:HI (plus:HI (reg/f:HI 17 virtual-stack-vars) (const_int -10 [0xfffffffffffffff6])) [0 D.1480+0 S2 A16]) (subreg:HI (mem/c:SI (plus:HI (reg/f:HI 17 virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [0 nibbles+0 S4 A16]) 0)) /Users/pkoning/Documents/svn/gcc/gcc/testsuite/gcc.c-torture/compile/20001221-1.c:9 -1 (nil)) but now vregs doesn't convert the virtual-stack-vars reference in the subreg operand: (insn 7 6 8 2 (set (mem/c:SI (plus:HI (reg/f:HI 14 fp) (const_int -8 [0xfffffffffffffff8])) [0 nibbles+0 S4 A16]) (const_int 8 [0x8])) /Users/pkoning/Documents/svn/gcc/gcc/testsuite/gcc.c-torture/compile/20001221-1.c:8 12 {movsi} (nil)) (insn 8 7 9 2 (set (mem/c:HI (plus:HI (reg/f:HI 14 fp) (const_int -10 [0xfffffffffffffff6])) [0 D.1480+0 S2 A16]) (subreg:HI (mem/c:SI (plus:HI (reg/f:HI 17 virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [0 nibbles+0 S4 A16]) 0)) /Users/pkoning/Documents/svn/gcc/gcc/testsuite/gcc.c-torture/compile/20001221-1.c:9 14 {movhi} (nil)) Is this something the back 
end is responsible for getting right, for example via the machine description file? If so, any hints where to start? paul From ebotcazou@adacore.com Fri May 25 06:11:00 2018 From: ebotcazou@adacore.com (Eric Botcazou) Date: Fri, 25 May 2018 06:11:00 -0000 Subject: virtual-stack-vars reference not resolved in vregs In-Reply-To: <815571FD-AE59-4D56-9AFA-73F605A4DCC0@comcast.net> References: <815571FD-AE59-4D56-9AFA-73F605A4DCC0@comcast.net> Message-ID: <6976655.QsN8PLxYON@polaris> > Is this something the back end is responsible for getting right, for example > via the machine description file? If so, any hints where to start? The SUBREG of MEM is invalid at this stage. -- Eric Botcazou From xiaohong.a.wang@ericsson.com Fri May 25 06:51:00 2018 From: xiaohong.a.wang@ericsson.com (Xiaohong Wang A) Date: Fri, 25 May 2018 06:51:00 -0000 Subject: about update gcc Message-ID: Hi: My Wind River kernel version is 4.1 and the gcc version in it is 5.2.0. How can I update the gcc tool to 6.0 or a later version? From prathamesh.kulkarni@linaro.org Fri May 25 09:23:00 2018 From: prathamesh.kulkarni@linaro.org (Prathamesh Kulkarni) Date: Fri, 25 May 2018 09:23:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> References: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> Message-ID: On 23 May 2018 at 18:37, Jeff Law wrote: > On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: >> On 23 May 2018 at 13:58, Richard Biener wrote: >>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: >>> >>>> Hi, >>>> I am trying to work on PR80155, which exposes a problem with code >>>> hoisting and register pressure on a leading embedded benchmark for ARM >>>> cortex-m7, where code-hoisting causes an extra register spill. >>>> >>>> I have attached two test-cases which (hopefully) are representative of >>>> the original test-case.
>>>> The first one (trans_dfa.c) is bigger and somewhat similar to the >>>> original test-case and trans_dfa_2.c is hand-reduced version of >>>> trans_dfa.c. There's 2 spills caused with trans_dfa.c >>>> and one spill with trans_dfa_2.c due to lesser amount of cases. >>>> The test-cases in the PR are probably not relevant. >>>> >>>> Initially I thought the spill was happening because of "too many >>>> hoistings" taking place in original test-case thus increasing the >>>> register pressure, but it seems the spill is possibly caused because >>>> expression gets hoisted out of a block that is on loop exit. >>>> >>>> For example, the following hoistings take place with trans_dfa_2.c: >>>> >>>> (1) Inserting expression in block 4 for code hoisting: >>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >>>> >>>> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) >>>> >>>> (3) Inserting expression in block 4 for code hoisting: >>>> {pointer_plus_expr,s_33,1} (0023) >>>> >>>> (4) Inserting expression in block 3 for code hoisting: >>>> {pointer_plus_expr,s_33,1} (0023) >>>> >>>> The issue seems to be hoisting of (*tab + 1) which consists of first >>>> two hoistings in block 4 >>>> from blocks 5 and 9, which causes the extra spill. I verified that by >>>> disabling hoisting into block 4, >>>> which resulted in no extra spills. >>>> >>>> I wonder if that's because the expression (*tab + 1) is getting >>>> hoisted from blocks 5 and 9, >>>> which are on loop exit ? So the expression that was previously >>>> computed in a block on loop exit, gets hoisted outside that block >>>> which possibly makes the allocator more defensive ? Similarly >>>> disabling hoisting of expressions which appeared in blocks on loop >>>> exit in original test-case prevented the extra spill. The other >>>> hoistings didn't seem to matter. >>> >>> I think that's simply co-incidence. 
The only thing that makes >>> a block that also exits from the loop special is that an >>> expression could be sunk out of the loop and hoisting (commoning >>> with another path) could prevent that. But that isn't what is >>> happening here and it would be a pass ordering issue as >>> the sinking pass runs only after hoisting (no idea why exactly >>> but I guess there are cases where we want to prefer CSE over >>> sinking). So you could try if re-ordering PRE and sinking helps >>> your testcase. >> Thanks for the suggestions. Placing sink pass before PRE works >> for both these test-cases! Sadly it still causes the spill for the benchmark -:( >> I will try to create a better approximation of the original test-case. >>> >>> What I do see is a missed opportunity to merge the successors >>> of BB 4. After PRE we have >>> >>> [local count: 159303558]: >>> : >>> pretmp_123 = *tab_37(D); >>> _87 = pretmp_123 + 1; >>> if (c_36 == 65) >>> goto ; [34.00%] >>> else >>> goto ; [66.00%] >>> >>> [local count: 54163210]: >>> *tab_37(D) = _87; >>> _96 = MEM[(char *)s_57 + 1B]; >>> if (_96 != 0) >>> goto ; [89.00%] >>> else >>> goto ; [11.00%] >>> >>> [local count: 105140348]: >>> *tab_37(D) = _87; >>> _56 = MEM[(char *)s_57 + 1B]; >>> if (_56 != 0) >>> goto ; [89.00%] >>> else >>> goto ; [11.00%] >>> >>> here at least the stores and loads can be hoisted. Note this >>> may also point at the real issue of the code hoisting which is >>> tearing apart the RMW operation? >> Indeed, this possibility seems much more likely than block being on loop exit. >> I will try to "hardcode" the load/store hoists into block 4 for this >> specific test-case to check >> if that prevents the spill. > Even if it prevents the spill in this case, it's likely a good thing to > do. The statements prior to the conditional in bb5 and bb8 should be > hoisted, leaving bb5 and bb8 with just their conditionals. Hi, It seems disabling forwprop somehow works for causing no extra spills on the original test-case. 
For instance, Hoisting without forwprop: bb 3: _1 = tab_1(D) + 8 pretmp_268 = MEM[tab_1(D) + 8B]; _2 = pretmp_268 + 1; goto or bb 4: *_1 = _ 2 bb 5: *_1 = _2 Hoisting with forwprop: bb 3: pretmp_164 = MEM[tab_1(D) + 8B]; _2 = pretmp_164 + 1 goto or bb 4: MEM[tab_1(D) + 8] = _2; bb 5: MEM[tab_1(D) + 8] = _2; Although in both cases, we aren't hoisting stores, the issues with forwprop for this case seems to be the folding of *_1 = _2 into MEM[tab_1(D) + 8] = _2 ? Disabling folding to mem_ref[base + offset] in forwprop "works" in the sense it created same set of hoistings as without forwprop, however it still results in additional spills (albeit different registers). That's because forwprop seems to be increasing live range of prephitmp_217 by substituting _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 + 1). On the other hand, Bin pointed out to me in private that forwprop also helps to restrict register pressure by propagating "tab + const_int" for same test-case. So I am not really sure if there's an easier fix than having heuristics for estimating register pressure at TREE level ? I would be grateful for suggestions on how to proceed from here. Thanks! Regards, Prathamesh > > Jeff From amker.cheng@gmail.com Fri May 25 09:49:00 2018 From: amker.cheng@gmail.com (Bin.Cheng) Date: Fri, 25 May 2018 09:49:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: References: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> Message-ID: On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni wrote: > On 23 May 2018 at 18:37, Jeff Law wrote: >> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: >>> On 23 May 2018 at 13:58, Richard Biener wrote: >>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: >>>> >>>>> Hi, >>>>> I am trying to work on PR80155, which exposes a problem with code >>>>> hoisting and register pressure on a leading embedded benchmark for ARM >>>>> cortex-m7, where code-hoisting causes an extra register spill. 
>>>>> >>>>> I have attached two test-cases which (hopefully) are representative of >>>>> the original test-case. >>>>> The first one (trans_dfa.c) is bigger and somewhat similar to the >>>>> original test-case and trans_dfa_2.c is hand-reduced version of >>>>> trans_dfa.c. There's 2 spills caused with trans_dfa.c >>>>> and one spill with trans_dfa_2.c due to lesser amount of cases. >>>>> The test-cases in the PR are probably not relevant. >>>>> >>>>> Initially I thought the spill was happening because of "too many >>>>> hoistings" taking place in original test-case thus increasing the >>>>> register pressure, but it seems the spill is possibly caused because >>>>> expression gets hoisted out of a block that is on loop exit. >>>>> >>>>> For example, the following hoistings take place with trans_dfa_2.c: >>>>> >>>>> (1) Inserting expression in block 4 for code hoisting: >>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >>>>> >>>>> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) >>>>> >>>>> (3) Inserting expression in block 4 for code hoisting: >>>>> {pointer_plus_expr,s_33,1} (0023) >>>>> >>>>> (4) Inserting expression in block 3 for code hoisting: >>>>> {pointer_plus_expr,s_33,1} (0023) >>>>> >>>>> The issue seems to be hoisting of (*tab + 1) which consists of first >>>>> two hoistings in block 4 >>>>> from blocks 5 and 9, which causes the extra spill. I verified that by >>>>> disabling hoisting into block 4, >>>>> which resulted in no extra spills. >>>>> >>>>> I wonder if that's because the expression (*tab + 1) is getting >>>>> hoisted from blocks 5 and 9, >>>>> which are on loop exit ? So the expression that was previously >>>>> computed in a block on loop exit, gets hoisted outside that block >>>>> which possibly makes the allocator more defensive ? Similarly >>>>> disabling hoisting of expressions which appeared in blocks on loop >>>>> exit in original test-case prevented the extra spill. 
The other >>>>> hoistings didn't seem to matter. >>>> >>>> I think that's simply co-incidence. The only thing that makes >>>> a block that also exits from the loop special is that an >>>> expression could be sunk out of the loop and hoisting (commoning >>>> with another path) could prevent that. But that isn't what is >>>> happening here and it would be a pass ordering issue as >>>> the sinking pass runs only after hoisting (no idea why exactly >>>> but I guess there are cases where we want to prefer CSE over >>>> sinking). So you could try if re-ordering PRE and sinking helps >>>> your testcase. >>> Thanks for the suggestions. Placing sink pass before PRE works >>> for both these test-cases! Sadly it still causes the spill for the benchmark -:( >>> I will try to create a better approximation of the original test-case. >>>> >>>> What I do see is a missed opportunity to merge the successors >>>> of BB 4. After PRE we have >>>> >>>> [local count: 159303558]: >>>> : >>>> pretmp_123 = *tab_37(D); >>>> _87 = pretmp_123 + 1; >>>> if (c_36 == 65) >>>> goto ; [34.00%] >>>> else >>>> goto ; [66.00%] >>>> >>>> [local count: 54163210]: >>>> *tab_37(D) = _87; >>>> _96 = MEM[(char *)s_57 + 1B]; >>>> if (_96 != 0) >>>> goto ; [89.00%] >>>> else >>>> goto ; [11.00%] >>>> >>>> [local count: 105140348]: >>>> *tab_37(D) = _87; >>>> _56 = MEM[(char *)s_57 + 1B]; >>>> if (_56 != 0) >>>> goto ; [89.00%] >>>> else >>>> goto ; [11.00%] >>>> >>>> here at least the stores and loads can be hoisted. Note this >>>> may also point at the real issue of the code hoisting which is >>>> tearing apart the RMW operation? >>> Indeed, this possibility seems much more likely than block being on loop exit. >>> I will try to "hardcode" the load/store hoists into block 4 for this >>> specific test-case to check >>> if that prevents the spill. >> Even if it prevents the spill in this case, it's likely a good thing to >> do. 
The statements prior to the conditional in bb5 and bb8 should be >> hoisted, leaving bb5 and bb8 with just their conditionals. > Hi, > It seems disabling forwprop somehow works for causing no extra spills > on the original test-case. > > For instance, > Hoisting without forwprop: > > bb 3: > _1 = tab_1(D) + 8 > pretmp_268 = MEM[tab_1(D) + 8B]; > _2 = pretmp_268 + 1; > goto or > > bb 4: > *_1 = _ 2 > > bb 5: > *_1 = _2 > > Hoisting with forwprop: > > bb 3: > pretmp_164 = MEM[tab_1(D) + 8B]; > _2 = pretmp_164 + 1 > goto or > > bb 4: > MEM[tab_1(D) + 8] = _2; > > bb 5: > MEM[tab_1(D) + 8] = _2; > > Although in both cases, we aren't hoisting stores, the issues with forwprop > for this case seems to be the folding of > *_1 = _2 > into > MEM[tab_1(D) + 8] = _2 ? This isn't an issue, right? IIUC, tab_1(D) used all over the loop thus propagating _1 using (tab_1(D) + 8) actually removes one live range. > > Disabling folding to mem_ref[base + offset] in forwprop "works" in the > sense it created same set of hoistings as without forwprop, however it > still results in additional spills (albeit different registers). > > That's because forwprop seems to be increasing live range of > prephitmp_217 by substituting > _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 + 1). Hmm, it's hard to discuss private benchmarks, not sure which dump shall I find prephitmp_221/prephitmp_217 stuff. > On the other hand, Bin pointed out to me in private that forwprop also > helps to restrict register pressure by propagating "tab + const_int" > for same test-case. > > So I am not really sure if there's an easier fix than having > heuristics for estimating register pressure at TREE level ? I would be Easy fix, maybe not. OTOH, I am more convinced passes like forwprop/sink/hoisting can be improved by taking live range into consideration. 
Specifically, to direct such passes when moving code around different basic blocks, because inter-block register pressure is hard to resolve afterwards. As suggested by Jeff and Richi, I guess the first step would be doing experiments, collecting more benchmark data for reordering sink before pre? It enables code sink as well as decreases register pressure in the original reduced cases IIRC. Thanks, bin > grateful for suggestions on how to proceed from here. > Thanks! > > Regards, > Prathamesh >> >> Jeff From rguenther@suse.de Fri May 25 09:58:00 2018 From: rguenther@suse.de (Richard Biener) Date: Fri, 25 May 2018 09:58:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: References: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> Message-ID: On Fri, 25 May 2018, Bin.Cheng wrote: > On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni > wrote: > > On 23 May 2018 at 18:37, Jeff Law wrote: > >> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: > >>> On 23 May 2018 at 13:58, Richard Biener wrote: > >>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: > >>>> > >>>>> Hi, > >>>>> I am trying to work on PR80155, which exposes a problem with code > >>>>> hoisting and register pressure on a leading embedded benchmark for ARM > >>>>> cortex-m7, where code-hoisting causes an extra register spill. > >>>>> > >>>>> I have attached two test-cases which (hopefully) are representative of > >>>>> the original test-case. > >>>>> The first one (trans_dfa.c) is bigger and somewhat similar to the > >>>>> original test-case and trans_dfa_2.c is hand-reduced version of > >>>>> trans_dfa.c. There's 2 spills caused with trans_dfa.c > >>>>> and one spill with trans_dfa_2.c due to lesser amount of cases. > >>>>> The test-cases in the PR are probably not relevant. 
> >>>>> > >>>>> Initially I thought the spill was happening because of "too many > >>>>> hoistings" taking place in original test-case thus increasing the > >>>>> register pressure, but it seems the spill is possibly caused because > >>>>> expression gets hoisted out of a block that is on loop exit. > >>>>> > >>>>> For example, the following hoistings take place with trans_dfa_2.c: > >>>>> > >>>>> (1) Inserting expression in block 4 for code hoisting: > >>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) > >>>>> > >>>>> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) > >>>>> > >>>>> (3) Inserting expression in block 4 for code hoisting: > >>>>> {pointer_plus_expr,s_33,1} (0023) > >>>>> > >>>>> (4) Inserting expression in block 3 for code hoisting: > >>>>> {pointer_plus_expr,s_33,1} (0023) > >>>>> > >>>>> The issue seems to be hoisting of (*tab + 1) which consists of first > >>>>> two hoistings in block 4 > >>>>> from blocks 5 and 9, which causes the extra spill. I verified that by > >>>>> disabling hoisting into block 4, > >>>>> which resulted in no extra spills. > >>>>> > >>>>> I wonder if that's because the expression (*tab + 1) is getting > >>>>> hoisted from blocks 5 and 9, > >>>>> which are on loop exit ? So the expression that was previously > >>>>> computed in a block on loop exit, gets hoisted outside that block > >>>>> which possibly makes the allocator more defensive ? Similarly > >>>>> disabling hoisting of expressions which appeared in blocks on loop > >>>>> exit in original test-case prevented the extra spill. The other > >>>>> hoistings didn't seem to matter. > >>>> > >>>> I think that's simply co-incidence. The only thing that makes > >>>> a block that also exits from the loop special is that an > >>>> expression could be sunk out of the loop and hoisting (commoning > >>>> with another path) could prevent that. 
But that isn't what is > >>>> happening here and it would be a pass ordering issue as > >>>> the sinking pass runs only after hoisting (no idea why exactly > >>>> but I guess there are cases where we want to prefer CSE over > >>>> sinking). So you could try if re-ordering PRE and sinking helps > >>>> your testcase. > >>> Thanks for the suggestions. Placing sink pass before PRE works > >>> for both these test-cases! Sadly it still causes the spill for the benchmark -:( > >>> I will try to create a better approximation of the original test-case. > >>>> > >>>> What I do see is a missed opportunity to merge the successors > >>>> of BB 4. After PRE we have > >>>> > >>>> [local count: 159303558]: > >>>> : > >>>> pretmp_123 = *tab_37(D); > >>>> _87 = pretmp_123 + 1; > >>>> if (c_36 == 65) > >>>> goto ; [34.00%] > >>>> else > >>>> goto ; [66.00%] > >>>> > >>>> [local count: 54163210]: > >>>> *tab_37(D) = _87; > >>>> _96 = MEM[(char *)s_57 + 1B]; > >>>> if (_96 != 0) > >>>> goto ; [89.00%] > >>>> else > >>>> goto ; [11.00%] > >>>> > >>>> [local count: 105140348]: > >>>> *tab_37(D) = _87; > >>>> _56 = MEM[(char *)s_57 + 1B]; > >>>> if (_56 != 0) > >>>> goto ; [89.00%] > >>>> else > >>>> goto ; [11.00%] > >>>> > >>>> here at least the stores and loads can be hoisted. Note this > >>>> may also point at the real issue of the code hoisting which is > >>>> tearing apart the RMW operation? > >>> Indeed, this possibility seems much more likely than block being on loop exit. > >>> I will try to "hardcode" the load/store hoists into block 4 for this > >>> specific test-case to check > >>> if that prevents the spill. > >> Even if it prevents the spill in this case, it's likely a good thing to > >> do. The statements prior to the conditional in bb5 and bb8 should be > >> hoisted, leaving bb5 and bb8 with just their conditionals. > > Hi, > > It seems disabling forwprop somehow works for causing no extra spills > > on the original test-case. 
> > > > For instance, > > Hoisting without forwprop: > > > > bb 3: > > _1 = tab_1(D) + 8 > > pretmp_268 = MEM[tab_1(D) + 8B]; > > _2 = pretmp_268 + 1; > > goto or > > > > bb 4: > > *_1 = _ 2 > > > > bb 5: > > *_1 = _2 > > > > Hoisting with forwprop: > > > > bb 3: > > pretmp_164 = MEM[tab_1(D) + 8B]; > > _2 = pretmp_164 + 1 > > goto or > > > > bb 4: > > MEM[tab_1(D) + 8] = _2; > > > > bb 5: > > MEM[tab_1(D) + 8] = _2; > > > > Although in both cases, we aren't hoisting stores, the issues with forwprop > > for this case seems to be the folding of > > *_1 = _2 > > into > > MEM[tab_1(D) + 8] = _2 ? > > This isn't an issue, right? IIUC, tab_1(D) used all over the loop > thus propagating _1 using (tab_1(D) + 8) actually removes one live > range. > > > > > Disabling folding to mem_ref[base + offset] in forwprop "works" in the > > sense it created same set of hoistings as without forwprop, however it > > still results in additional spills (albeit different registers). > > > > That's because forwprop seems to be increasing live range of > > prephitmp_217 by substituting > > _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 + 1). > Hmm, it's hard to discuss private benchmarks, not sure which dump > shall I find prephitmp_221/prephitmp_217 stuff. > > > On the other hand, Bin pointed out to me in private that forwprop also > > helps to restrict register pressure by propagating "tab + const_int" > > for same test-case. > > > > So I am not really sure if there's an easier fix than having > > heuristics for estimating register pressure at TREE level ? I would be > Easy fix, maybe not. OTOH, I am more convinced passes like > forwprop/sink/hoisting can be improved by taking live range into > consideration. Specifically, to direct such passes when moving code > around different basic blocks, because inter-block register pressure > is hard to resolve afterwards. 
> > As suggested by Jeff and Richi, I guess the first step would be doing > experiments, collecting more benchmark data for reordering sink before > pre? It enables code sink as well as decreases register pressure in > the original reduced cases IIRC. Note sinking also doesn't have a cost model that takes into account register pressure. But yes, I can't think of a reason to do sinking after PRE (well, apart from PRE performing value-numbering and that of course might expose sinking opportunities). So yes, perform some experiments - you should be able to use -fdump-statistics[-stats] and look at the number of sunk stmts reported (and also the number of PRE eliminations ultimately performed). Richard. > Thanks, > bin > > grateful for suggestions on how to proceed from here. > > Thanks! > > > > Regards, > > Prathamesh > >> > >> Jeff > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) From law@redhat.com Fri May 25 16:57:00 2018 From: law@redhat.com (Jeff Law) Date: Fri, 25 May 2018 16:57:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: References: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> Message-ID: <66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com> On 05/25/2018 03:49 AM, Bin.Cheng wrote: > On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni > wrote: >> On 23 May 2018 at 18:37, Jeff Law wrote: >>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: >>>> On 23 May 2018 at 13:58, Richard Biener wrote: >>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: >>>>> >>>>>> Hi, >>>>>> I am trying to work on PR80155, which exposes a problem with code >>>>>> hoisting and register pressure on a leading embedded benchmark for ARM >>>>>> cortex-m7, where code-hoisting causes an extra register spill. >>>>>> >>>>>> I have attached two test-cases which (hopefully) are representative of >>>>>> the original test-case. 
>>>>>> The first one (trans_dfa.c) is bigger and somewhat similar to the >>>>>> original test-case and trans_dfa_2.c is hand-reduced version of >>>>>> trans_dfa.c. There's 2 spills caused with trans_dfa.c >>>>>> and one spill with trans_dfa_2.c due to lesser amount of cases. >>>>>> The test-cases in the PR are probably not relevant. >>>>>> >>>>>> Initially I thought the spill was happening because of "too many >>>>>> hoistings" taking place in original test-case thus increasing the >>>>>> register pressure, but it seems the spill is possibly caused because >>>>>> expression gets hoisted out of a block that is on loop exit. >>>>>> >>>>>> For example, the following hoistings take place with trans_dfa_2.c: >>>>>> >>>>>> (1) Inserting expression in block 4 for code hoisting: >>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >>>>>> >>>>>> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) >>>>>> >>>>>> (3) Inserting expression in block 4 for code hoisting: >>>>>> {pointer_plus_expr,s_33,1} (0023) >>>>>> >>>>>> (4) Inserting expression in block 3 for code hoisting: >>>>>> {pointer_plus_expr,s_33,1} (0023) >>>>>> >>>>>> The issue seems to be hoisting of (*tab + 1) which consists of first >>>>>> two hoistings in block 4 >>>>>> from blocks 5 and 9, which causes the extra spill. I verified that by >>>>>> disabling hoisting into block 4, >>>>>> which resulted in no extra spills. >>>>>> >>>>>> I wonder if that's because the expression (*tab + 1) is getting >>>>>> hoisted from blocks 5 and 9, >>>>>> which are on loop exit ? So the expression that was previously >>>>>> computed in a block on loop exit, gets hoisted outside that block >>>>>> which possibly makes the allocator more defensive ? Similarly >>>>>> disabling hoisting of expressions which appeared in blocks on loop >>>>>> exit in original test-case prevented the extra spill. The other >>>>>> hoistings didn't seem to matter. >>>>> >>>>> I think that's simply co-incidence. 
The only thing that makes >>>>> a block that also exits from the loop special is that an >>>>> expression could be sunk out of the loop and hoisting (commoning >>>>> with another path) could prevent that. But that isn't what is >>>>> happening here and it would be a pass ordering issue as >>>>> the sinking pass runs only after hoisting (no idea why exactly >>>>> but I guess there are cases where we want to prefer CSE over >>>>> sinking). So you could try if re-ordering PRE and sinking helps >>>>> your testcase. >>>> Thanks for the suggestions. Placing sink pass before PRE works >>>> for both these test-cases! Sadly it still causes the spill for the benchmark -:( >>>> I will try to create a better approximation of the original test-case. >>>>> >>>>> What I do see is a missed opportunity to merge the successors >>>>> of BB 4. After PRE we have >>>>> >>>>> [local count: 159303558]: >>>>> : >>>>> pretmp_123 = *tab_37(D); >>>>> _87 = pretmp_123 + 1; >>>>> if (c_36 == 65) >>>>> goto ; [34.00%] >>>>> else >>>>> goto ; [66.00%] >>>>> >>>>> [local count: 54163210]: >>>>> *tab_37(D) = _87; >>>>> _96 = MEM[(char *)s_57 + 1B]; >>>>> if (_96 != 0) >>>>> goto ; [89.00%] >>>>> else >>>>> goto ; [11.00%] >>>>> >>>>> [local count: 105140348]: >>>>> *tab_37(D) = _87; >>>>> _56 = MEM[(char *)s_57 + 1B]; >>>>> if (_56 != 0) >>>>> goto ; [89.00%] >>>>> else >>>>> goto ; [11.00%] >>>>> >>>>> here at least the stores and loads can be hoisted. Note this >>>>> may also point at the real issue of the code hoisting which is >>>>> tearing apart the RMW operation? >>>> Indeed, this possibility seems much more likely than block being on loop exit. >>>> I will try to "hardcode" the load/store hoists into block 4 for this >>>> specific test-case to check >>>> if that prevents the spill. >>> Even if it prevents the spill in this case, it's likely a good thing to >>> do. The statements prior to the conditional in bb5 and bb8 should be >>> hoisted, leaving bb5 and bb8 with just their conditionals. 
>> Hi, >> It seems disabling forwprop somehow works for causing no extra spills >> on the original test-case. >> >> For instance, >> Hoisting without forwprop: >> >> bb 3: >> _1 = tab_1(D) + 8 >> pretmp_268 = MEM[tab_1(D) + 8B]; >> _2 = pretmp_268 + 1; >> goto or >> >> bb 4: >> *_1 = _ 2 >> >> bb 5: >> *_1 = _2 >> >> Hoisting with forwprop: >> >> bb 3: >> pretmp_164 = MEM[tab_1(D) + 8B]; >> _2 = pretmp_164 + 1 >> goto or >> >> bb 4: >> MEM[tab_1(D) + 8] = _2; >> >> bb 5: >> MEM[tab_1(D) + 8] = _2; >> >> Although in both cases, we aren't hoisting stores, the issues with forwprop >> for this case seems to be the folding of >> *_1 = _2 >> into >> MEM[tab_1(D) + 8] = _2 ? > > This isn't an issue, right? IIUC, tab_1(D) used all over the loop > thus propagating _1 using (tab_1(D) + 8) actually removes one live > range. > >> >> Disabling folding to mem_ref[base + offset] in forwprop "works" in the >> sense it created same set of hoistings as without forwprop, however it >> still results in additional spills (albeit different registers). >> >> That's because forwprop seems to be increasing live range of >> prephitmp_217 by substituting >> _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 + 1). > Hmm, it's hard to discuss private benchmarks, not sure which dump > shall I find prephitmp_221/prephitmp_217 stuff. > >> On the other hand, Bin pointed out to me in private that forwprop also >> helps to restrict register pressure by propagating "tab + const_int" >> for same test-case. >> >> So I am not really sure if there's an easier fix than having >> heuristics for estimating register pressure at TREE level ? I would be > Easy fix, maybe not. OTOH, I am more convinced passes like > forwprop/sink/hoisting can be improved by taking live range into > consideration. Specifically, to direct such passes when moving code > around different basic blocks, because inter-block register pressure > is hard to resolve afterwards. 
> > As suggested by Jeff and Richi, I guess the first step would be doing > experiments, collecting more benchmark data for reordering sink before > pre? It enables code sink as well as decreases register pressure in > the original reduced cases IIRC. We might even consider re-evaluating Bernd's work on what is effectively a gimple scheduler to minimize register pressure. Or we could look to extend your work into a generalized pressure reducing pass that we could run near the gimple/rtl border. The final possibility would be Click's algorithm from '95 adjusted to just do pressure reduction. jeff From rguenther@suse.de Fri May 25 17:54:00 2018 From: rguenther@suse.de (Richard Biener) Date: Fri, 25 May 2018 17:54:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: <66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com> References: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> <66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com> Message-ID: <3D61699E-73EC-4936-9FF3-494DF6816616@suse.de> On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law wrote: >On 05/25/2018 03:49 AM, Bin.Cheng wrote: >> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni >> wrote: >>> On 23 May 2018 at 18:37, Jeff Law wrote: >>>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: >>>>> On 23 May 2018 at 13:58, Richard Biener wrote: >>>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: >>>>>> >>>>>>> Hi, >>>>>>> I am trying to work on PR80155, which exposes a problem with >code >>>>>>> hoisting and register pressure on a leading embedded benchmark >for ARM >>>>>>> cortex-m7, where code-hoisting causes an extra register spill. >>>>>>> >>>>>>> I have attached two test-cases which (hopefully) are >representative of >>>>>>> the original test-case. >>>>>>> The first one (trans_dfa.c) is bigger and somewhat similar to >the >>>>>>> original test-case and trans_dfa_2.c is hand-reduced version of >>>>>>> trans_dfa.c. 
There's 2 spills caused with trans_dfa.c >>>>>>> and one spill with trans_dfa_2.c due to lesser amount of cases. >>>>>>> The test-cases in the PR are probably not relevant. >>>>>>> >>>>>>> Initially I thought the spill was happening because of "too many >>>>>>> hoistings" taking place in original test-case thus increasing >the >>>>>>> register pressure, but it seems the spill is possibly caused >because >>>>>>> expression gets hoisted out of a block that is on loop exit. >>>>>>> >>>>>>> For example, the following hoistings take place with >trans_dfa_2.c: >>>>>>> >>>>>>> (1) Inserting expression in block 4 for code hoisting: >>>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >>>>>>> >>>>>>> (2) Inserting expression in block 4 for code hoisting: >{plus_expr,_4,1} (0006) >>>>>>> >>>>>>> (3) Inserting expression in block 4 for code hoisting: >>>>>>> {pointer_plus_expr,s_33,1} (0023) >>>>>>> >>>>>>> (4) Inserting expression in block 3 for code hoisting: >>>>>>> {pointer_plus_expr,s_33,1} (0023) >>>>>>> >>>>>>> The issue seems to be hoisting of (*tab + 1) which consists of >first >>>>>>> two hoistings in block 4 >>>>>>> from blocks 5 and 9, which causes the extra spill. I verified >that by >>>>>>> disabling hoisting into block 4, >>>>>>> which resulted in no extra spills. >>>>>>> >>>>>>> I wonder if that's because the expression (*tab + 1) is getting >>>>>>> hoisted from blocks 5 and 9, >>>>>>> which are on loop exit ? So the expression that was previously >>>>>>> computed in a block on loop exit, gets hoisted outside that >block >>>>>>> which possibly makes the allocator more defensive ? Similarly >>>>>>> disabling hoisting of expressions which appeared in blocks on >loop >>>>>>> exit in original test-case prevented the extra spill. The other >>>>>>> hoistings didn't seem to matter. >>>>>> >>>>>> I think that's simply co-incidence. 
The only thing that makes >>>>>> a block that also exits from the loop special is that an >>>>>> expression could be sunk out of the loop and hoisting (commoning >>>>>> with another path) could prevent that. But that isn't what is >>>>>> happening here and it would be a pass ordering issue as >>>>>> the sinking pass runs only after hoisting (no idea why exactly >>>>>> but I guess there are cases where we want to prefer CSE over >>>>>> sinking). So you could try if re-ordering PRE and sinking helps >>>>>> your testcase. >>>>> Thanks for the suggestions. Placing sink pass before PRE works >>>>> for both these test-cases! Sadly it still causes the spill for the >benchmark -:( >>>>> I will try to create a better approximation of the original >test-case. >>>>>> >>>>>> What I do see is a missed opportunity to merge the successors >>>>>> of BB 4. After PRE we have >>>>>> >>>>>> [local count: 159303558]: >>>>>> : >>>>>> pretmp_123 = *tab_37(D); >>>>>> _87 = pretmp_123 + 1; >>>>>> if (c_36 == 65) >>>>>> goto ; [34.00%] >>>>>> else >>>>>> goto ; [66.00%] >>>>>> >>>>>> [local count: 54163210]: >>>>>> *tab_37(D) = _87; >>>>>> _96 = MEM[(char *)s_57 + 1B]; >>>>>> if (_96 != 0) >>>>>> goto ; [89.00%] >>>>>> else >>>>>> goto ; [11.00%] >>>>>> >>>>>> [local count: 105140348]: >>>>>> *tab_37(D) = _87; >>>>>> _56 = MEM[(char *)s_57 + 1B]; >>>>>> if (_56 != 0) >>>>>> goto ; [89.00%] >>>>>> else >>>>>> goto ; [11.00%] >>>>>> >>>>>> here at least the stores and loads can be hoisted. Note this >>>>>> may also point at the real issue of the code hoisting which is >>>>>> tearing apart the RMW operation? >>>>> Indeed, this possibility seems much more likely than block being >on loop exit. >>>>> I will try to "hardcode" the load/store hoists into block 4 for >this >>>>> specific test-case to check >>>>> if that prevents the spill. >>>> Even if it prevents the spill in this case, it's likely a good >thing to >>>> do. 
The statements prior to the conditional in bb5 and bb8 should >be >>>> hoisted, leaving bb5 and bb8 with just their conditionals. >>> Hi, >>> It seems disabling forwprop somehow works for causing no extra >spills >>> on the original test-case. >>> >>> For instance, >>> Hoisting without forwprop: >>> >>> bb 3: >>> _1 = tab_1(D) + 8 >>> pretmp_268 = MEM[tab_1(D) + 8B]; >>> _2 = pretmp_268 + 1; >>> goto or >>> >>> bb 4: >>> *_1 = _ 2 >>> >>> bb 5: >>> *_1 = _2 >>> >>> Hoisting with forwprop: >>> >>> bb 3: >>> pretmp_164 = MEM[tab_1(D) + 8B]; >>> _2 = pretmp_164 + 1 >>> goto or >>> >>> bb 4: >>> MEM[tab_1(D) + 8] = _2; >>> >>> bb 5: >>> MEM[tab_1(D) + 8] = _2; >>> >>> Although in both cases, we aren't hoisting stores, the issues with >forwprop >>> for this case seems to be the folding of >>> *_1 = _2 >>> into >>> MEM[tab_1(D) + 8] = _2 ? >> >> This isn't an issue, right? IIUC, tab_1(D) used all over the loop >> thus propagating _1 using (tab_1(D) + 8) actually removes one live >> range. >> >>> >>> Disabling folding to mem_ref[base + offset] in forwprop "works" in >the >>> sense it created same set of hoistings as without forwprop, however >it >>> still results in additional spills (albeit different registers). >>> >>> That's because forwprop seems to be increasing live range of >>> prephitmp_217 by substituting >>> _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 + >1). >> Hmm, it's hard to discuss private benchmarks, not sure which dump >> shall I find prephitmp_221/prephitmp_217 stuff. >> >>> On the other hand, Bin pointed out to me in private that forwprop >also >>> helps to restrict register pressure by propagating "tab + const_int" >>> for same test-case. >>> >>> So I am not really sure if there's an easier fix than having >>> heuristics for estimating register pressure at TREE level ? I would >be >> Easy fix, maybe not. OTOH, I am more convinced passes like >> forwprop/sink/hoisting can be improved by taking live range into >> consideration. 
Specifically, to direct such passes when moving code >> around different basic blocks, because inter-block register pressure >> is hard to resolve afterwards. >> >> As suggested by Jeff and Richi, I guess the first step would be doing >> experiments, collecting more benchmark data for reordering sink >before >> pre? It enables code sink as well as decreases register pressure in >> the original reduced cases IIRC. >We might even consider re-evaluating Bernd's work on what is >effectively >a gimple scheduler to minimize register pressure. Sure. The main issue here I see is with the interaction with TER which we unfortunately still rely on. Enough GIMPLE instruction selection might help to get rid of the remaining pieces... >Or we could look to extend your work into a generalized pressure >reducing pass that we could run near the gimple/rtl border. > >The final possibility would be Click's algorithm from '95 adjusted to >just do pressure reduction. > >jeff From paulkoning@comcast.net Fri May 25 18:05:00 2018 From: paulkoning@comcast.net (Paul Koning) Date: Fri, 25 May 2018 18:05:00 -0000 Subject: not computable at load time Message-ID: <1A72BAC7-9DFB-4F98-9191-DDE896021A41@comcast.net> One of my testsuite failures for the pdp11 back end is gcc.c-torture/compile/930326-1.c which is: struct { char a, b, f[3]; } s; long i = s.f-&s.b; It fails with "error: initializer element is not computable at load time". I don't understand why because it seems to be a perfectly reasonable compile time constant; "load time" doesn't enter into the picture that I can see. If I replace "long" by "short" it works correctly. So presumably it has something to do with the fact that Pmode == HImode. But how that translates into this failure I don't know. 
paul From law@redhat.com Fri May 25 19:26:00 2018 From: law@redhat.com (Jeff Law) Date: Fri, 25 May 2018 19:26:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: <3D61699E-73EC-4936-9FF3-494DF6816616@suse.de> References: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> <66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com> <3D61699E-73EC-4936-9FF3-494DF6816616@suse.de> Message-ID: On 05/25/2018 11:54 AM, Richard Biener wrote: > On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law wrote: >> On 05/25/2018 03:49 AM, Bin.Cheng wrote: >>> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni >>> wrote: >>>> On 23 May 2018 at 18:37, Jeff Law wrote: >>>>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: >>>>>> On 23 May 2018 at 13:58, Richard Biener wrote: >>>>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> I am trying to work on PR80155, which exposes a problem with >> code >>>>>>>> hoisting and register pressure on a leading embedded benchmark >> for ARM >>>>>>>> cortex-m7, where code-hoisting causes an extra register spill. >>>>>>>> >>>>>>>> I have attached two test-cases which (hopefully) are >> representative of >>>>>>>> the original test-case. >>>>>>>> The first one (trans_dfa.c) is bigger and somewhat similar to >> the >>>>>>>> original test-case and trans_dfa_2.c is hand-reduced version of >>>>>>>> trans_dfa.c. There's 2 spills caused with trans_dfa.c >>>>>>>> and one spill with trans_dfa_2.c due to lesser amount of cases. >>>>>>>> The test-cases in the PR are probably not relevant. >>>>>>>> >>>>>>>> Initially I thought the spill was happening because of "too many >>>>>>>> hoistings" taking place in original test-case thus increasing >> the >>>>>>>> register pressure, but it seems the spill is possibly caused >> because >>>>>>>> expression gets hoisted out of a block that is on loop exit. 
>>>>>>>> >>>>>>>> For example, the following hoistings take place with >> trans_dfa_2.c: >>>>>>>> >>>>>>>> (1) Inserting expression in block 4 for code hoisting: >>>>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >>>>>>>> >>>>>>>> (2) Inserting expression in block 4 for code hoisting: >> {plus_expr,_4,1} (0006) >>>>>>>> >>>>>>>> (3) Inserting expression in block 4 for code hoisting: >>>>>>>> {pointer_plus_expr,s_33,1} (0023) >>>>>>>> >>>>>>>> (4) Inserting expression in block 3 for code hoisting: >>>>>>>> {pointer_plus_expr,s_33,1} (0023) >>>>>>>> >>>>>>>> The issue seems to be hoisting of (*tab + 1) which consists of >> first >>>>>>>> two hoistings in block 4 >>>>>>>> from blocks 5 and 9, which causes the extra spill. I verified >> that by >>>>>>>> disabling hoisting into block 4, >>>>>>>> which resulted in no extra spills. >>>>>>>> >>>>>>>> I wonder if that's because the expression (*tab + 1) is getting >>>>>>>> hoisted from blocks 5 and 9, >>>>>>>> which are on loop exit ? So the expression that was previously >>>>>>>> computed in a block on loop exit, gets hoisted outside that >> block >>>>>>>> which possibly makes the allocator more defensive ? Similarly >>>>>>>> disabling hoisting of expressions which appeared in blocks on >> loop >>>>>>>> exit in original test-case prevented the extra spill. The other >>>>>>>> hoistings didn't seem to matter. >>>>>>> >>>>>>> I think that's simply co-incidence. The only thing that makes >>>>>>> a block that also exits from the loop special is that an >>>>>>> expression could be sunk out of the loop and hoisting (commoning >>>>>>> with another path) could prevent that. But that isn't what is >>>>>>> happening here and it would be a pass ordering issue as >>>>>>> the sinking pass runs only after hoisting (no idea why exactly >>>>>>> but I guess there are cases where we want to prefer CSE over >>>>>>> sinking). So you could try if re-ordering PRE and sinking helps >>>>>>> your testcase. >>>>>> Thanks for the suggestions. 
Placing sink pass before PRE works >>>>>> for both these test-cases! Sadly it still causes the spill for the >> benchmark -:( >>>>>> I will try to create a better approximation of the original >> test-case. >>>>>>> >>>>>>> What I do see is a missed opportunity to merge the successors >>>>>>> of BB 4. After PRE we have >>>>>>> >>>>>>> [local count: 159303558]: >>>>>>> : >>>>>>> pretmp_123 = *tab_37(D); >>>>>>> _87 = pretmp_123 + 1; >>>>>>> if (c_36 == 65) >>>>>>> goto ; [34.00%] >>>>>>> else >>>>>>> goto ; [66.00%] >>>>>>> >>>>>>> [local count: 54163210]: >>>>>>> *tab_37(D) = _87; >>>>>>> _96 = MEM[(char *)s_57 + 1B]; >>>>>>> if (_96 != 0) >>>>>>> goto ; [89.00%] >>>>>>> else >>>>>>> goto ; [11.00%] >>>>>>> >>>>>>> [local count: 105140348]: >>>>>>> *tab_37(D) = _87; >>>>>>> _56 = MEM[(char *)s_57 + 1B]; >>>>>>> if (_56 != 0) >>>>>>> goto ; [89.00%] >>>>>>> else >>>>>>> goto ; [11.00%] >>>>>>> >>>>>>> here at least the stores and loads can be hoisted. Note this >>>>>>> may also point at the real issue of the code hoisting which is >>>>>>> tearing apart the RMW operation? >>>>>> Indeed, this possibility seems much more likely than block being >> on loop exit. >>>>>> I will try to "hardcode" the load/store hoists into block 4 for >> this >>>>>> specific test-case to check >>>>>> if that prevents the spill. >>>>> Even if it prevents the spill in this case, it's likely a good >> thing to >>>>> do. The statements prior to the conditional in bb5 and bb8 should >> be >>>>> hoisted, leaving bb5 and bb8 with just their conditionals. >>>> Hi, >>>> It seems disabling forwprop somehow works for causing no extra >> spills >>>> on the original test-case. 
>>>> >>>> For instance, >>>> Hoisting without forwprop: >>>> >>>> bb 3: >>>> _1 = tab_1(D) + 8 >>>> pretmp_268 = MEM[tab_1(D) + 8B]; >>>> _2 = pretmp_268 + 1; >>>> goto or >>>> >>>> bb 4: >>>> *_1 = _ 2 >>>> >>>> bb 5: >>>> *_1 = _2 >>>> >>>> Hoisting with forwprop: >>>> >>>> bb 3: >>>> pretmp_164 = MEM[tab_1(D) + 8B]; >>>> _2 = pretmp_164 + 1 >>>> goto or >>>> >>>> bb 4: >>>> MEM[tab_1(D) + 8] = _2; >>>> >>>> bb 5: >>>> MEM[tab_1(D) + 8] = _2; >>>> >>>> Although in both cases, we aren't hoisting stores, the issues with >> forwprop >>>> for this case seems to be the folding of >>>> *_1 = _2 >>>> into >>>> MEM[tab_1(D) + 8] = _2 ? >>> >>> This isn't an issue, right? IIUC, tab_1(D) used all over the loop >>> thus propagating _1 using (tab_1(D) + 8) actually removes one live >>> range. >>> >>>> >>>> Disabling folding to mem_ref[base + offset] in forwprop "works" in >> the >>>> sense it created same set of hoistings as without forwprop, however >> it >>>> still results in additional spills (albeit different registers). >>>> >>>> That's because forwprop seems to be increasing live range of >>>> prephitmp_217 by substituting >>>> _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 + >> 1). >>> Hmm, it's hard to discuss private benchmarks, not sure which dump >>> shall I find prephitmp_221/prephitmp_217 stuff. >>> >>>> On the other hand, Bin pointed out to me in private that forwprop >> also >>>> helps to restrict register pressure by propagating "tab + const_int" >>>> for same test-case. >>>> >>>> So I am not really sure if there's an easier fix than having >>>> heuristics for estimating register pressure at TREE level ? I would >> be >>> Easy fix, maybe not. OTOH, I am more convinced passes like >>> forwprop/sink/hoisting can be improved by taking live range into >>> consideration. Specifically, to direct such passes when moving code >>> around different basic blocks, because inter-block register pressure >>> is hard to resolve afterwards. 
>>> >>> As suggested by Jeff and Richi, I guess the first step would be doing >>> experiments, collecting more benchmark data for reordering sink >> before >>> pre? It enables code sink as well as decreases register pressure in >>> the original reduced cases IIRC. >> We might even consider re-evaluating Bernd's work on what is >> effectively >> a gimple scheduler to minimize register pressure. > > Sure. The main issue here I see is with the interaction with TER which we unfortunately still rely on. Enough GIMPLE instruction selection might help to get rid of the remaining pieces... I really wonder how bad it would be to walk over expr.c and change the expanders to be able to walk SSA_NAME_DEF_STMT to potentially get at the more complex statements rather than relying on TER. That's really all TER is supposed to be doing anyway. Jeff From msebor@gmail.com Fri May 25 20:16:00 2018 From: msebor@gmail.com (Martin Sebor) Date: Fri, 25 May 2018 20:16:00 -0000 Subject: [PATCH] tighten up -Wclass-memaccess for ctors/dtors (PR 84851) Message-ID: <05e52e3d-9788-9113-85b5-bfeacf56424b@gmail.com> A fix for 84851 - missing -Wclass-memaccess for a memcpy in a copy ctor with a non-trivial member was implemented but disabled for GCC 8 because it was late, with the expectation we would enable it for GCC 9. The attached removes the code that guards the full fix to enable it. Martin -------------- next part -------------- A non-text attachment was scrubbed... 
Name: gcc-84851.diff Type: text/x-patch Size: 1480 bytes Desc: not available URL: From segher@kernel.crashing.org Fri May 25 22:25:00 2018 From: segher@kernel.crashing.org (Segher Boessenkool) Date: Fri, 25 May 2018 22:25:00 -0000 Subject: virtual-stack-vars reference not resolved in vregs In-Reply-To: <6976655.QsN8PLxYON@polaris> References: <815571FD-AE59-4D56-9AFA-73F605A4DCC0@comcast.net> <6976655.QsN8PLxYON@polaris> Message-ID: <20180525222529.GO17342@gate.crashing.org> On Fri, May 25, 2018 at 08:11:43AM +0200, Eric Botcazou wrote: > > Is this something the back end is responsible for getting right, for example > > via the machine description file? If so, any hints where to start? > > The SUBREG of MEM is invalid at this stage. From rtl.texi: --- There are currently three supported types for the first operand of a @code{subreg}: @itemize @item pseudo registers This is the most common case. Most @code{subreg}s have pseudo @code{reg}s as their first operand. @item mem @code{subreg}s of @code{mem} were common in earlier versions of GCC and are still supported. During the reload pass these are replaced by plain @code{mem}s. On machines that do not do instruction scheduling, use of @code{subreg}s of @code{mem} are still used, but this is no longer recommended. Such @code{subreg}s are considered to be @code{register_operand}s rather than @code{memory_operand}s before and during reload. Because of this, the scheduling passes cannot properly schedule instructions with @code{subreg}s of @code{mem}, so for machines that do scheduling, @code{subreg}s of @code{mem} should never be used. To support this, the combine and recog passes have explicit code to inhibit the creation of @code{subreg}s of @code{mem} when @code{INSN_SCHEDULING} is defined. --- It would be very nice if we got rid of subreg-of-mem completely once and for all. 
The code following the comment /* In the general case, we expect virtual registers to appear only in operands, and then only as either bare registers or inside memories. */ in function.c:instantiate_virtual_regs_in_insn does not handle the subreg in this example instruction. Segher From sellcey@cavium.com Fri May 25 22:36:00 2018 From: sellcey@cavium.com (Steve Ellcey) Date: Fri, 25 May 2018 22:36:00 -0000 Subject: Why is REG_ALLOC_ORDER not defined on Aarch64 Message-ID: <1527287751.22014.45.camel@cavium.com> I was curious if there was any reason that REG_ALLOC_ORDER is not defined for Aarch64. Has anyone tried this to see if it could help performance? It is defined for many other platforms. Steve Ellcey sellcey@cavium.com From pinskia@gmail.com Fri May 25 22:41:00 2018 From: pinskia@gmail.com (Andrew Pinski) Date: Fri, 25 May 2018 22:41:00 -0000 Subject: Why is REG_ALLOC_ORDER not defined on Aarch64 In-Reply-To: <1527287751.22014.45.camel@cavium.com> References: <1527287751.22014.45.camel@cavium.com> Message-ID: On Fri, May 25, 2018 at 3:35 PM, Steve Ellcey wrote: > I was curious if there was any reason that REG_ALLOC_ORDER is not > defined for Aarch64. Has anyone tried this to see if it could help > performance? It is defined for many other platforms. https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01815.html https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01822.html > > Steve Ellcey > sellcey@cavium.com From gccadmin@gcc.gnu.org Fri May 25 22:43:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Fri, 25 May 2018 22:43:00 -0000 Subject: gcc-8-20180525 is now available Message-ID: <20180525224237.126803.qmail@sourceware.org> Snapshot gcc-8-20180525 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/8-20180525/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. 
This snapshot has been generated from the GCC 8 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-8-branch revision 260785 You'll find: gcc-8-20180525.tar.xz Complete GCC SHA256=96f117eaacacd8b31f527fcb5133bbf6b9efb4773c040e324664361a75ed6ebb SHA1=ebfd27eeffadb79da92386bc57fb9496996a18e0 Diffs from 8-20180518 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-8 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. From rguenther@suse.de Sat May 26 06:10:00 2018 From: rguenther@suse.de (Richard Biener) Date: Sat, 26 May 2018 06:10:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: References: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> <66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com> <3D61699E-73EC-4936-9FF3-494DF6816616@suse.de> Message-ID: On May 25, 2018 9:25:51 PM GMT+02:00, Jeff Law wrote: >On 05/25/2018 11:54 AM, Richard Biener wrote: >> On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law >wrote: >>> On 05/25/2018 03:49 AM, Bin.Cheng wrote: >>>> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni >>>> wrote: >>>>> On 23 May 2018 at 18:37, Jeff Law wrote: >>>>>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: >>>>>>> On 23 May 2018 at 13:58, Richard Biener >wrote: >>>>>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> I am trying to work on PR80155, which exposes a problem with >>> code >>>>>>>>> hoisting and register pressure on a leading embedded benchmark >>> for ARM >>>>>>>>> cortex-m7, where code-hoisting causes an extra register spill. >>>>>>>>> >>>>>>>>> I have attached two test-cases which (hopefully) are >>> representative of >>>>>>>>> the original test-case. 
>>>>>>>>> The first one (trans_dfa.c) is bigger and somewhat similar to >>> the >>>>>>>>> original test-case and trans_dfa_2.c is hand-reduced version >of >>>>>>>>> trans_dfa.c. There's 2 spills caused with trans_dfa.c >>>>>>>>> and one spill with trans_dfa_2.c due to lesser amount of >cases. >>>>>>>>> The test-cases in the PR are probably not relevant. >>>>>>>>> >>>>>>>>> Initially I thought the spill was happening because of "too >many >>>>>>>>> hoistings" taking place in original test-case thus increasing >>> the >>>>>>>>> register pressure, but it seems the spill is possibly caused >>> because >>>>>>>>> expression gets hoisted out of a block that is on loop exit. >>>>>>>>> >>>>>>>>> For example, the following hoistings take place with >>> trans_dfa_2.c: >>>>>>>>> >>>>>>>>> (1) Inserting expression in block 4 for code hoisting: >>>>>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >>>>>>>>> >>>>>>>>> (2) Inserting expression in block 4 for code hoisting: >>> {plus_expr,_4,1} (0006) >>>>>>>>> >>>>>>>>> (3) Inserting expression in block 4 for code hoisting: >>>>>>>>> {pointer_plus_expr,s_33,1} (0023) >>>>>>>>> >>>>>>>>> (4) Inserting expression in block 3 for code hoisting: >>>>>>>>> {pointer_plus_expr,s_33,1} (0023) >>>>>>>>> >>>>>>>>> The issue seems to be hoisting of (*tab + 1) which consists of >>> first >>>>>>>>> two hoistings in block 4 >>>>>>>>> from blocks 5 and 9, which causes the extra spill. I verified >>> that by >>>>>>>>> disabling hoisting into block 4, >>>>>>>>> which resulted in no extra spills. >>>>>>>>> >>>>>>>>> I wonder if that's because the expression (*tab + 1) is >getting >>>>>>>>> hoisted from blocks 5 and 9, >>>>>>>>> which are on loop exit ? So the expression that was previously >>>>>>>>> computed in a block on loop exit, gets hoisted outside that >>> block >>>>>>>>> which possibly makes the allocator more defensive ? 
Similarly >>>>>>>>> disabling hoisting of expressions which appeared in blocks on >>> loop >>>>>>>>> exit in original test-case prevented the extra spill. The >other >>>>>>>>> hoistings didn't seem to matter. >>>>>>>> >>>>>>>> I think that's simply co-incidence. The only thing that makes >>>>>>>> a block that also exits from the loop special is that an >>>>>>>> expression could be sunk out of the loop and hoisting >(commoning >>>>>>>> with another path) could prevent that. But that isn't what is >>>>>>>> happening here and it would be a pass ordering issue as >>>>>>>> the sinking pass runs only after hoisting (no idea why exactly >>>>>>>> but I guess there are cases where we want to prefer CSE over >>>>>>>> sinking). So you could try if re-ordering PRE and sinking >helps >>>>>>>> your testcase. >>>>>>> Thanks for the suggestions. Placing sink pass before PRE works >>>>>>> for both these test-cases! Sadly it still causes the spill for >the >>> benchmark -:( >>>>>>> I will try to create a better approximation of the original >>> test-case. >>>>>>>> >>>>>>>> What I do see is a missed opportunity to merge the successors >>>>>>>> of BB 4. After PRE we have >>>>>>>> >>>>>>>> [local count: 159303558]: >>>>>>>> : >>>>>>>> pretmp_123 = *tab_37(D); >>>>>>>> _87 = pretmp_123 + 1; >>>>>>>> if (c_36 == 65) >>>>>>>> goto ; [34.00%] >>>>>>>> else >>>>>>>> goto ; [66.00%] >>>>>>>> >>>>>>>> [local count: 54163210]: >>>>>>>> *tab_37(D) = _87; >>>>>>>> _96 = MEM[(char *)s_57 + 1B]; >>>>>>>> if (_96 != 0) >>>>>>>> goto ; [89.00%] >>>>>>>> else >>>>>>>> goto ; [11.00%] >>>>>>>> >>>>>>>> [local count: 105140348]: >>>>>>>> *tab_37(D) = _87; >>>>>>>> _56 = MEM[(char *)s_57 + 1B]; >>>>>>>> if (_56 != 0) >>>>>>>> goto ; [89.00%] >>>>>>>> else >>>>>>>> goto ; [11.00%] >>>>>>>> >>>>>>>> here at least the stores and loads can be hoisted. Note this >>>>>>>> may also point at the real issue of the code hoisting which is >>>>>>>> tearing apart the RMW operation? 
>>>>>>> Indeed, this possibility seems much more likely than block being >>> on loop exit. >>>>>>> I will try to "hardcode" the load/store hoists into block 4 for >>> this >>>>>>> specific test-case to check >>>>>>> if that prevents the spill. >>>>>> Even if it prevents the spill in this case, it's likely a good >>> thing to >>>>>> do. The statements prior to the conditional in bb5 and bb8 >should >>> be >>>>>> hoisted, leaving bb5 and bb8 with just their conditionals. >>>>> Hi, >>>>> It seems disabling forwprop somehow works for causing no extra >>> spills >>>>> on the original test-case. >>>>> >>>>> For instance, >>>>> Hoisting without forwprop: >>>>> >>>>> bb 3: >>>>> _1 = tab_1(D) + 8 >>>>> pretmp_268 = MEM[tab_1(D) + 8B]; >>>>> _2 = pretmp_268 + 1; >>>>> goto or >>>>> >>>>> bb 4: >>>>> *_1 = _ 2 >>>>> >>>>> bb 5: >>>>> *_1 = _2 >>>>> >>>>> Hoisting with forwprop: >>>>> >>>>> bb 3: >>>>> pretmp_164 = MEM[tab_1(D) + 8B]; >>>>> _2 = pretmp_164 + 1 >>>>> goto or >>>>> >>>>> bb 4: >>>>> MEM[tab_1(D) + 8] = _2; >>>>> >>>>> bb 5: >>>>> MEM[tab_1(D) + 8] = _2; >>>>> >>>>> Although in both cases, we aren't hoisting stores, the issues with >>> forwprop >>>>> for this case seems to be the folding of >>>>> *_1 = _2 >>>>> into >>>>> MEM[tab_1(D) + 8] = _2 ? >>>> >>>> This isn't an issue, right? IIUC, tab_1(D) used all over the loop >>>> thus propagating _1 using (tab_1(D) + 8) actually removes one live >>>> range. >>>> >>>>> >>>>> Disabling folding to mem_ref[base + offset] in forwprop "works" in >>> the >>>>> sense it created same set of hoistings as without forwprop, >however >>> it >>>>> still results in additional spills (albeit different registers). >>>>> >>>>> That's because forwprop seems to be increasing live range of >>>>> prephitmp_217 by substituting >>>>> _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 >+ >>> 1). >>>> Hmm, it's hard to discuss private benchmarks, not sure which dump >>>> shall I find prephitmp_221/prephitmp_217 stuff. 
>>>> >>>>> On the other hand, Bin pointed out to me in private that forwprop >>> also >>>>> helps to restrict register pressure by propagating "tab + >const_int" >>>>> for same test-case. >>>>> >>>>> So I am not really sure if there's an easier fix than having >>>>> heuristics for estimating register pressure at TREE level ? I >would >>> be >>>> Easy fix, maybe not. OTOH, I am more convinced passes like >>>> forwprop/sink/hoisting can be improved by taking live range into >>>> consideration. Specifically, to direct such passes when moving >code >>>> around different basic blocks, because inter-block register >pressure >>>> is hard to resolve afterwards. >>>> >>>> As suggested by Jeff and Richi, I guess the first step would be >doing >>>> experiments, collecting more benchmark data for reordering sink >>> before >>>> pre? It enables code sink as well as decreases register pressure >in >>>> the original reduced cases IIRC. >>> We might even consider re-evaluating Bernd's work on what is >>> effectively >>> a gimple scheduler to minimize register pressure. >> >> Sure. The main issue here I see is with the interaction with TER >which we unfortunately still rely on. Enough GIMPLE instruction >selection might help to get rid of the remaining pieces... >I really wonder how bad it would be to walk over expr.c and change the >expanders to be able to walk SSA_NAME_DEF_STMT to potentially get at >the >more complex statements rather than relying on TER. But that's what they do... TER computes when this is valid and not break due to coalescing. >That's really all TER is supposed to be doing anyway. Yes. Last year I posted patches to apply the scheduling TER does on top of the above but it was difficult to dissect from the above... Maybe we need to try harder here... Richard. 
>Jeff From linux@carewolf.com Sat May 26 09:32:00 2018 From: linux@carewolf.com (Allan Sandfeld Jensen) Date: Sat, 26 May 2018 09:32:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os Message-ID: <2659301.XPQk3P0qmd@twilight> I brought this subject up earlier, and was told to suggest it again for gcc 9, so I have attached the preliminary changes. My studies have shown that with generic x86-64 optimization it reduces binary size by around 0.5%, and when optimizing for x64 targets with SSE4 or better, it reduces binary size by 2-3% on average. The performance changes are negligible however*, and I haven't been able to detect changes in compile time big enough to penetrate general noise on my platform, but perhaps someone has a better setup for that? * I believe that is because it currently works best on non-optimized code, it is better at big basic blocks doing all kinds of things than tightly written inner loops. Anything else I should test or report? Best regards 'Allan diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index beba295bef5..05851229354 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -7612,6 +7612,7 @@ also turns on the following optimization flags: -fstore-merging @gol -fstrict-aliasing @gol -ftree-builtin-call-dce @gol +-ftree-slp-vectorize @gol -ftree-switch-conversion -ftree-tail-merge @gol -fcode-hoisting @gol -ftree-pre @gol @@ -7635,7 +7636,6 @@ by @option{-O2} and also turns on the following optimization flags: -floop-interchange @gol -floop-unroll-and-jam @gol -fsplit-paths @gol --ftree-slp-vectorize @gol -fvect-cost-model @gol -ftree-partial-pre @gol -fpeel-loops @gol @@ -8932,7 +8932,7 @@ Perform loop vectorization on trees. This flag is enabled by default at @item -ftree-slp-vectorize @opindex ftree-slp-vectorize Perform basic block vectorization on trees. This flag is enabled by default at -@option{-O3} and when @option{-ftree-vectorize} is enabled. 
+@option{-O2} or higher, and when @option{-ftree-vectorize} is enabled. @item -fvect-cost-model=@var{model} @opindex fvect-cost-model diff --git a/gcc/opts.c b/gcc/opts.c index 33efcc0d6e7..11027b847e8 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -523,6 +523,7 @@ static const struct default_options default_options_table[] = { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 }, + { OPT_LEVELS_2_PLUS, OPT_ftree_slp_vectorize, NULL, 1 }, /* -O3 optimizations. */ { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 }, @@ -539,7 +540,6 @@ static const struct default_options default_options_table[] = { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 }, { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 }, { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 }, - { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 }, { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC }, { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 }, { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 }, From richard.sandiford@linaro.org Sat May 26 09:39:00 2018 From: richard.sandiford@linaro.org (Richard Sandiford) Date: Sat, 26 May 2018 09:39:00 -0000 Subject: Why is REG_ALLOC_ORDER not defined on Aarch64 In-Reply-To: (Andrew Pinski's message of "Fri, 25 May 2018 15:41:14 -0700") References: <1527287751.22014.45.camel@cavium.com> Message-ID: <87fu2ebvt6.fsf@linaro.org> Andrew Pinski writes: > On Fri, May 25, 2018 at 3:35 PM, Steve Ellcey wrote: >> I was curious if there was any reason that REG_ALLOC_ORDER is not >> defined for Aarch64. Has anyone tried this to see if it could help >> performance? It is defined for many other platforms. > > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01815.html > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01822.html It looks like the immediate reason for reverting was the effect of listing the argument registers in reverse order. 
I wonder how much that actually helps with IRA and LRA? They track per-register costs, and would be able to increase the cost of a pseudo that conflicts with a hard-register call argument. It just felt like it might have been a "best practice" idea passed down from the old local.c and global.c days. Thanks, Richard From richard.sandiford@linaro.org Sat May 26 10:09:00 2018 From: richard.sandiford@linaro.org (Richard Sandiford) Date: Sat, 26 May 2018 10:09:00 -0000 Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP In-Reply-To: <1527184223.22014.13.camel@cavium.com> (Steve Ellcey's message of "Thu, 24 May 2018 10:50:23 -0700") References: <1518212868.14236.47.camel@cavium.com> <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> <1526487700.29509.6.camel@cavium.com> <1526491802.29509.19.camel@cavium.com> <87a7sznw5c.fsf@linaro.org> <1527184223.22014.13.camel@cavium.com> Message-ID: <87a7smbuej.fsf@linaro.org> Steve Ellcey writes: > On Wed, 2018-05-16 at 22:11 +0100, Richard Sandiford wrote: >> >> TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way >> of saying that an rtl instruction preserves the low part of a >> register but clobbers the high part. We would need something like >> Alan H's CLOBBER_HIGH patches to do it using explicit clobbers. >> >> Another approach would be to piggy-back on the -fipa-ra >> infrastructure >> and record that vector PCS functions only clobber Q0-Q7. If -fipa-ra >> knows that a function doesn't clobber Q8-Q15 then that should >> override >> TARGET_HARD_REGNO_CALL_PART_CLOBBERED. (I'm not sure whether it does >> in practice, but it should :-) And if it doesn't that's a bug that's >> worth fixing for its own sake.) >> >> Thanks, >> Richard > > Alan, > > I have been looking at your CLOBBER_HIGH patches to see if they > might be helpful in implementing the ARM SIMD Vector ABI in GCC. > I have also been looking at the -fipa-ra flag and how it works. 
> > I was wondering if you considered using the ipa-ra infrastructure > for the SVE work that you are currently trying to support with > the CLOBBER_HIGH macro? > > My current thought for the ABI work is to mark all the floating > point / vector registers as caller saved (the lower half of V8-V15 > are currently callee saved) and remove > TARGET_HARD_REGNO_CALL_PART_CLOBBERED. > This should work but would be inefficient. > > The next step would be to split get_call_reg_set_usage up into > two functions so that I don't have to pass in a default set of > registers. One function would return call_used_reg_set by > default (but could return a smaller set if it had actual used > register information) and the other would return regs_invalidated > by_call by default (but could also return a smaller set). > > Next I would add a 'largest mode used' array to call_cgraph_rtl_info > structure in addition to the current function_used_regs register > set. > > Then I could turn the get_call_reg_set_usage replacement functions > into target specific functions and with the information in the > call_cgraph_rtl_info structure and any simd attribute information on > a function I could modify what registers are really being used/invalidated > without being saved. > > If the called function only uses the bottom half of a register it would not > be marked as used/invalidated. If it uses the entire register and the > function is not marked as simd, then the register would be marked as > used/invalidated. If the function was marked as simd the register would not > be marked because a simd function would save both the upper and lower halves > of a callee saved register (whereas a non simd function would only save the > lower half). > > Does this sound like something that could be used in place of your > CLOBBER_HIGH patch? One of the advantages of CLOBBER_HIGH is that it can be attached to arbitrary instructions, not just calls. 
The motivating example was tlsdesc_small_, which isn't treated as a call but as a normal instruction. (And I don't think we want to change that, since it's much easier for rtl optimisers to deal with normal instructions compared to calls. In general a call is part of a longer sequence of instructions that includes setting up arguments, etc.) The other use case (not implemented in the posted patches) would be to represent the effect of syscalls, which clobber the "SVE part" of all vector registers. In that case the clobber would need to be attached to an inline asm insn. On the wider point about changing the way call clobber information is represented: I agree it would be good to generalise what we have now. But if possible I think we should avoid target hooks that take a specific call, and instead make it an inherent part of the call insn itself, much like CALL_INSN_FUNCTION_USAGE is now. E.g. we could add a field that points to an ABI description, with -fipa-ra effectively creating ad-hoc ABIs. That ABI description could start out with whatever we think is relevant now and could grow over time. Thanks, Richard From richard.guenther@gmail.com Sat May 26 10:36:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Sat, 26 May 2018 10:36:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <2659301.XPQk3P0qmd@twilight> References: <2659301.XPQk3P0qmd@twilight> Message-ID: <5A85555D-FF52-4666-88EE-FFBD8C498294@gmail.com> On May 26, 2018 11:32:29 AM GMT+02:00, Allan Sandfeld Jensen wrote: >I brought this subject up earlier, and was told to suggest it again for >gcc 9, >so I have attached the preliminary changes. > >My studies have show that with generic x86-64 optimization it reduces >binary >size with around 0.5%, and when optimizing for x64 targets with SSE4 or > >better, it reduces binary size by 2-3% on average. 
The performance >changes are >negligible however*, and I haven't been able to detect changes in >compile time >big enough to penetrate general noise on my platform, but perhaps >someone has >a better setup for that? > >* I believe that is because it currently works best on non-optimized >code, it >is better at big basic blocks doing all kinds of things than tightly >written >inner loops. > >Anythhing else I should test or report? If you have access to SPEC CPU I'd like to see performance, size and compile-time effects of the patch on that. Embedded folks may want to run their favorite benchmark and report results as well. Richard. >Best regards >'Allan > > >diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi >index beba295bef5..05851229354 100644 >--- a/gcc/doc/invoke.texi >+++ b/gcc/doc/invoke.texi >@@ -7612,6 +7612,7 @@ also turns on the following optimization flags: > -fstore-merging @gol > -fstrict-aliasing @gol > -ftree-builtin-call-dce @gol >+-ftree-slp-vectorize @gol > -ftree-switch-conversion -ftree-tail-merge @gol > -fcode-hoisting @gol > -ftree-pre @gol >@@ -7635,7 +7636,6 @@ by @option{-O2} and also turns on the following >optimization flags: > -floop-interchange @gol > -floop-unroll-and-jam @gol > -fsplit-paths @gol >--ftree-slp-vectorize @gol > -fvect-cost-model @gol > -ftree-partial-pre @gol > -fpeel-loops @gol >@@ -8932,7 +8932,7 @@ Perform loop vectorization on trees. This flag is > >enabled by default at > @item -ftree-slp-vectorize > @opindex ftree-slp-vectorize >Perform basic block vectorization on trees. This flag is enabled by >default >at >-@option{-O3} and when @option{-ftree-vectorize} is enabled. >+@option{-O2} or higher, and when @option{-ftree-vectorize} is enabled. 
> > @item -fvect-cost-model=@var{model} > @opindex fvect-cost-model >diff --git a/gcc/opts.c b/gcc/opts.c >index 33efcc0d6e7..11027b847e8 100644 >--- a/gcc/opts.c >+++ b/gcc/opts.c >@@ -523,6 +523,7 @@ static const struct default_options >default_options_table[] = > { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 }, > { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 }, > { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 }, >+ { OPT_LEVELS_2_PLUS, OPT_ftree_slp_vectorize, NULL, 1 }, > > /* -O3 optimizations. */ > { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 }, >@@ -539,7 +540,6 @@ static const struct default_options >default_options_table[] = > { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 }, > { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 }, > { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 }, >- { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 }, >{ OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, >VECT_COST_MODEL_DYNAMIC >}, > { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 }, > { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 }, From fw@deneb.enyo.de Sat May 26 12:25:00 2018 From: fw@deneb.enyo.de (Florian Weimer) Date: Sat, 26 May 2018 12:25:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <2659301.XPQk3P0qmd@twilight> (Allan Sandfeld Jensen's message of "Sat, 26 May 2018 11:32:29 +0200") References: <2659301.XPQk3P0qmd@twilight> Message-ID: <87h8mu1u61.fsf@mid.deneb.enyo.de> * Allan Sandfeld Jensen: > Anythhing else I should test or report? Interaction with -mstackrealign on i386, where it is required for system libraries to support applications which use the legacy ABI without stack alignment if you compile with -msse2 or -march=x86-64 -mtune=generic (and -mfpmath=sse). 
From sebastian.huber@embedded-brains.de Sat May 26 13:04:00 2018 From: sebastian.huber@embedded-brains.de (Sebastian Huber) Date: Sat, 26 May 2018 13:04:00 -0000 Subject: RISC-V ELF multilibs Message-ID: <233244769.189066.1527339877501.JavaMail.zimbra@embedded-brains.de> Hello, I built a riscv64-rtems5 GCC (it uses gcc/config/riscv/t-elf-multilib). The following multilibs are built: riscv64-rtems5-gcc -print-multi-lib .; rv32i/ilp32;@march=rv32i@mabi=ilp32 rv32im/ilp32;@march=rv32im@mabi=ilp32 rv32iac/ilp32;@march=rv32iac@mabi=ilp32 rv32imac/ilp32;@march=rv32imac@mabi=ilp32 rv32imafc/ilp32f;@march=rv32imafc@mabi=ilp32f rv64imac/lp64;@march=rv64imac@mabi=lp64 rv64imafdc/lp64d;@march=rv64imafdc@mabi=lp64d If I print out the builtin defines and search paths for the default settings and the -march=rv64imafdc and compare the results I get: riscv64-rtems5-gcc -E -P -v -dD empty.c > def.txt 2>&1 riscv64-rtems5-gcc -E -P -v -dD empty.c -march=rv64imafdc > rv64imafdc.txt 2>&1 diff -u def.txt rv64imafdc.txt --- def.txt 2018-05-26 14:53:26.277760090 +0200 +++ rv64imafdc.txt 2018-05-26 14:53:47.705638409 +0200 @@ -4,8 +4,8 @@ Configured with: ../gcc-7.3.0/configure --prefix=/opt/rtems/5 --bindir=/opt/rtems/5/bin --exec_prefix=/opt/rtems/5 --includedir=/opt/rtems/5/include --libdir=/opt/rtems/5/lib --libexecdir=/opt/rtems/5/libexec --mandir=/opt/rtems/5/share/man --infodir=/opt/rtems/5/share/info --datadir=/opt/rtems/5/share --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=riscv64-rtems5 --disable-libstdcxx-pch --with-gnu-as --with-gnu-ld --verbose --with-newlib --disable-nls --without-included-gettext --disable-win32-registry --enable-version-specific-runtime-libs --disable-lto --enable-newlib-io-c99-formats --enable-newlib-iconv 
--enable-newlib-iconv-encodings=big5,cp775,cp850,cp852,cp855,cp866,euc_jp,euc_kr,euc_tw,iso_8859_1,iso_8859_10,iso_8859_11,iso_8859_13,iso_8859_14,iso_8859_15,iso_8859_2,iso_8859_3,iso_8859_4,iso_8859_5,iso_8859_6,iso_8859_7,iso_8859_8,iso_8859_9,iso_ir_111,koi8_r,koi8_ru,koi8_u,koi8_uni,ucs_2,ucs_2_internal,ucs_2be,ucs_2le,ucs_4,ucs_4_internal,ucs_4be,ucs_4le,us_ascii,utf_16,utf_16be,utf_16le,utf_8,win_1250,win_1251,win_1252,win_1253,win_1254,win_1255,win_1256,win_1257,win_1258 --enable-threads --disable-plugin --enable-libgomp --enable-languages=c,c++,ada Thread model: rtems gcc version 7.3.0 20180125 (RTEMS 5, RSB a3a6c34c150a357e57769a26a460c475e188438f, Newlib 3.0.0) (GCC) -COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64gc' '-mabi=lp64d' - /opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/cc1 -E -quiet -v -P -imultilib rv64imafdc/lp64d empty.c -march=rv64gc -mabi=lp64d -dD +COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64imafdc' '-mabi=lp64d' + /opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/cc1 -E -quiet -v -P -imultilib rv64imafdc/lp64d empty.c -march=rv64imafdc -mabi=lp64d -dD ignoring nonexistent directory "/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/sys-include" #include "..." 
search starts here: #include <...> search starts here: @@ -338,4 +338,4 @@ #define __ELF__ 1 COMPILER_PATH=/opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/libexec/gcc/riscv64-rtems5/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/lib/gcc/riscv64-rtems5/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/bin/ LIBRARY_PATH=/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/rv64imafdc/lp64d/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/lib/rv64imafdc/lp64d/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/lib/:/lib/:/usr/lib/ -COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64gc' '-mabi=lp64d' +COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64imafdc' '-mabi=lp64d' This looks pretty much the same and the documentation says that G == IMAFD. Why are the default multilib and a variant identical? Most variants include the C extension. Would it be possible to add -march=rv32g and -march=rv64g variants? -- Sebastian Huber, embedded brains GmbH Address : Dornierstr. 4, D-82178 Puchheim, Germany Phone : +49 89 189 47 41-16 Fax : +49 89 189 47 41-09 This message is not a business communication within the meaning of the EHUG. 
From amker.cheng@gmail.com Sat May 26 18:07:00 2018 From: amker.cheng@gmail.com (Bin.Cheng) Date: Sat, 26 May 2018 18:07:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: <3D61699E-73EC-4936-9FF3-494DF6816616@suse.de> References: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> <66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com> <3D61699E-73EC-4936-9FF3-494DF6816616@suse.de> Message-ID: On Fri, May 25, 2018 at 5:54 PM, Richard Biener wrote: > On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law wrote: >>On 05/25/2018 03:49 AM, Bin.Cheng wrote: >>> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni >>> wrote: >>>> On 23 May 2018 at 18:37, Jeff Law wrote: >>>>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: >>>>>> On 23 May 2018 at 13:58, Richard Biener wrote: >>>>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> I am trying to work on PR80155, which exposes a problem with >>code >>>>>>>> hoisting and register pressure on a leading embedded benchmark >>for ARM >>>>>>>> cortex-m7, where code-hoisting causes an extra register spill. >>>>>>>> >>>>>>>> I have attached two test-cases which (hopefully) are >>representative of >>>>>>>> the original test-case. >>>>>>>> The first one (trans_dfa.c) is bigger and somewhat similar to >>the >>>>>>>> original test-case and trans_dfa_2.c is hand-reduced version of >>>>>>>> trans_dfa.c. There's 2 spills caused with trans_dfa.c >>>>>>>> and one spill with trans_dfa_2.c due to lesser amount of cases. >>>>>>>> The test-cases in the PR are probably not relevant. >>>>>>>> >>>>>>>> Initially I thought the spill was happening because of "too many >>>>>>>> hoistings" taking place in original test-case thus increasing >>the >>>>>>>> register pressure, but it seems the spill is possibly caused >>because >>>>>>>> expression gets hoisted out of a block that is on loop exit. 
>>>>>>>> >>>>>>>> For example, the following hoistings take place with >>trans_dfa_2.c: >>>>>>>> >>>>>>>> (1) Inserting expression in block 4 for code hoisting: >>>>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >>>>>>>> >>>>>>>> (2) Inserting expression in block 4 for code hoisting: >>{plus_expr,_4,1} (0006) >>>>>>>> >>>>>>>> (3) Inserting expression in block 4 for code hoisting: >>>>>>>> {pointer_plus_expr,s_33,1} (0023) >>>>>>>> >>>>>>>> (4) Inserting expression in block 3 for code hoisting: >>>>>>>> {pointer_plus_expr,s_33,1} (0023) >>>>>>>> >>>>>>>> The issue seems to be hoisting of (*tab + 1) which consists of >>first >>>>>>>> two hoistings in block 4 >>>>>>>> from blocks 5 and 9, which causes the extra spill. I verified >>that by >>>>>>>> disabling hoisting into block 4, >>>>>>>> which resulted in no extra spills. >>>>>>>> >>>>>>>> I wonder if that's because the expression (*tab + 1) is getting >>>>>>>> hoisted from blocks 5 and 9, >>>>>>>> which are on loop exit ? So the expression that was previously >>>>>>>> computed in a block on loop exit, gets hoisted outside that >>block >>>>>>>> which possibly makes the allocator more defensive ? Similarly >>>>>>>> disabling hoisting of expressions which appeared in blocks on >>loop >>>>>>>> exit in original test-case prevented the extra spill. The other >>>>>>>> hoistings didn't seem to matter. >>>>>>> >>>>>>> I think that's simply co-incidence. The only thing that makes >>>>>>> a block that also exits from the loop special is that an >>>>>>> expression could be sunk out of the loop and hoisting (commoning >>>>>>> with another path) could prevent that. But that isn't what is >>>>>>> happening here and it would be a pass ordering issue as >>>>>>> the sinking pass runs only after hoisting (no idea why exactly >>>>>>> but I guess there are cases where we want to prefer CSE over >>>>>>> sinking). So you could try if re-ordering PRE and sinking helps >>>>>>> your testcase. >>>>>> Thanks for the suggestions. 
Placing sink pass before PRE works >>>>>> for both these test-cases! Sadly it still causes the spill for the >>benchmark -:( >>>>>> I will try to create a better approximation of the original >>test-case. >>>>>>> >>>>>>> What I do see is a missed opportunity to merge the successors >>>>>>> of BB 4. After PRE we have >>>>>>> >>>>>>> [local count: 159303558]: >>>>>>> : >>>>>>> pretmp_123 = *tab_37(D); >>>>>>> _87 = pretmp_123 + 1; >>>>>>> if (c_36 == 65) >>>>>>> goto ; [34.00%] >>>>>>> else >>>>>>> goto ; [66.00%] >>>>>>> >>>>>>> [local count: 54163210]: >>>>>>> *tab_37(D) = _87; >>>>>>> _96 = MEM[(char *)s_57 + 1B]; >>>>>>> if (_96 != 0) >>>>>>> goto ; [89.00%] >>>>>>> else >>>>>>> goto ; [11.00%] >>>>>>> >>>>>>> [local count: 105140348]: >>>>>>> *tab_37(D) = _87; >>>>>>> _56 = MEM[(char *)s_57 + 1B]; >>>>>>> if (_56 != 0) >>>>>>> goto ; [89.00%] >>>>>>> else >>>>>>> goto ; [11.00%] >>>>>>> >>>>>>> here at least the stores and loads can be hoisted. Note this >>>>>>> may also point at the real issue of the code hoisting which is >>>>>>> tearing apart the RMW operation? >>>>>> Indeed, this possibility seems much more likely than block being >>on loop exit. >>>>>> I will try to "hardcode" the load/store hoists into block 4 for >>this >>>>>> specific test-case to check >>>>>> if that prevents the spill. >>>>> Even if it prevents the spill in this case, it's likely a good >>thing to >>>>> do. The statements prior to the conditional in bb5 and bb8 should >>be >>>>> hoisted, leaving bb5 and bb8 with just their conditionals. >>>> Hi, >>>> It seems disabling forwprop somehow works for causing no extra >>spills >>>> on the original test-case. 
>>>> >>>> For instance, >>>> Hoisting without forwprop: >>>> >>>> bb 3: >>>> _1 = tab_1(D) + 8 >>>> pretmp_268 = MEM[tab_1(D) + 8B]; >>>> _2 = pretmp_268 + 1; >>>> goto or >>>> >>>> bb 4: >>>> *_1 = _ 2 >>>> >>>> bb 5: >>>> *_1 = _2 >>>> >>>> Hoisting with forwprop: >>>> >>>> bb 3: >>>> pretmp_164 = MEM[tab_1(D) + 8B]; >>>> _2 = pretmp_164 + 1 >>>> goto or >>>> >>>> bb 4: >>>> MEM[tab_1(D) + 8] = _2; >>>> >>>> bb 5: >>>> MEM[tab_1(D) + 8] = _2; >>>> >>>> Although in both cases, we aren't hoisting stores, the issues with >>forwprop >>>> for this case seems to be the folding of >>>> *_1 = _2 >>>> into >>>> MEM[tab_1(D) + 8] = _2 ? >>> >>> This isn't an issue, right? IIUC, tab_1(D) used all over the loop >>> thus propagating _1 using (tab_1(D) + 8) actually removes one live >>> range. >>> >>>> >>>> Disabling folding to mem_ref[base + offset] in forwprop "works" in >>the >>>> sense it created same set of hoistings as without forwprop, however >>it >>>> still results in additional spills (albeit different registers). >>>> >>>> That's because forwprop seems to be increasing live range of >>>> prephitmp_217 by substituting >>>> _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 + >>1). >>> Hmm, it's hard to discuss private benchmarks, not sure which dump >>> shall I find prephitmp_221/prephitmp_217 stuff. >>> >>>> On the other hand, Bin pointed out to me in private that forwprop >>also >>>> helps to restrict register pressure by propagating "tab + const_int" >>>> for same test-case. >>>> >>>> So I am not really sure if there's an easier fix than having >>>> heuristics for estimating register pressure at TREE level ? I would >>be >>> Easy fix, maybe not. OTOH, I am more convinced passes like >>> forwprop/sink/hoisting can be improved by taking live range into >>> consideration. Specifically, to direct such passes when moving code >>> around different basic blocks, because inter-block register pressure >>> is hard to resolve afterwards. 
>>> >>> As suggested by Jeff and Richi, I guess the first step would be doing >>> experiments, collecting more benchmark data for reordering sink >>before >>> pre? It enables code sink as well as decreases register pressure in >>> the original reduced cases IIRC. >>We might even consider re-evaluating Bernd's work on what is >>effectively >>a gimple scheduler to minimize register pressure. Could you please point me to Bernd's work? Does it schedule around different basic blocks to minimize register pressure? Like in this case, various optimizers extend live range to different basic blocks. I once prototyped a single-block gimple scheduler, it isn't very useful IMHO. It will be a huge amount of work to take register pressure into consideration in various optimizers. OTOH, having a single inter-block live range shrink pass before expanding is great, but I think it would be at least equally difficult. > > Sure. The main issue here I see is with the interaction with TER which we unfortunately still rely on. Enough GIMPLE instruction selection might help to get rid of the remaining pieces... If the scheduler is designed to focus on inter-block live range shrink, it won't interfere with TER which only replaces single use in each basic block. Thanks, bin > >>Or we could look to extend your work into a generalized pressure >>reducing pass that we could run near the gimple/rtl border. >> >>The final possibility would be Click's algorithm from '95 adjusted to >>just do pressure reduction. 
>> >>jeff > From segher@kernel.crashing.org Sat May 26 22:05:00 2018 From: segher@kernel.crashing.org (Segher Boessenkool) Date: Sat, 26 May 2018 22:05:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <2659301.XPQk3P0qmd@twilight> References: <2659301.XPQk3P0qmd@twilight> Message-ID: <20180526220532.GS17342@gate.crashing.org> On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote: > I brought this subject up earlier, and was told to suggest it again for gcc 9, > so I have attached the preliminary changes. > > My studies have show that with generic x86-64 optimization it reduces binary > size with around 0.5%, and when optimizing for x64 targets with SSE4 or > better, it reduces binary size by 2-3% on average. The performance changes are > negligible however*, and I haven't been able to detect changes in compile time > big enough to penetrate general noise on my platform, but perhaps someone has > a better setup for that? > > * I believe that is because it currently works best on non-optimized code, it > is better at big basic blocks doing all kinds of things than tightly written > inner loops. > > Anythhing else I should test or report? What does it do on other architectures? 
Segher From segher@kernel.crashing.org Sat May 26 22:13:00 2018 From: segher@kernel.crashing.org (Segher Boessenkool) Date: Sat, 26 May 2018 22:13:00 -0000 Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP In-Reply-To: <87a7smbuej.fsf@linaro.org> References: <1518212868.14236.47.camel@cavium.com> <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> <1526487700.29509.6.camel@cavium.com> <1526491802.29509.19.camel@cavium.com> <87a7sznw5c.fsf@linaro.org> <1527184223.22014.13.camel@cavium.com> <87a7smbuej.fsf@linaro.org> Message-ID: <20180526221240.GT17342@gate.crashing.org> On Sat, May 26, 2018 at 11:09:24AM +0100, Richard Sandiford wrote: > On the wider point about changing the way call clobber information > is represented: I agree it would be good to generalise what we have > now. But if possible I think we should avoid target hooks that take > a specific call, and instead make it an inherent part of the call insn > itself, much like CALL_INSN_FUNCTION_USAGE is now. E.g. we could add > a field that points to an ABI description, with -fipa-ra effectively > creating ad-hoc ABIs. That ABI description could start out with > whatever we think is relevant now and could grow over time. Somewhat related: there still is PR68150 open for problems with HARD_REGNO_CALL_PART_CLOBBERED in postreload-gcse (it ignores it). Segher From linux@carewolf.com Sat May 26 23:25:00 2018 From: linux@carewolf.com (Allan Sandfeld Jensen) Date: Sat, 26 May 2018 23:25:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <20180526220532.GS17342@gate.crashing.org> References: <2659301.XPQk3P0qmd@twilight> <20180526220532.GS17342@gate.crashing.org> Message-ID: <20109354.MqXXt4BNHg@twilight> On Sonntag, 27. 
Mai 2018 00:05:32 CEST Segher Boessenkool wrote: > On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote: > > I brought this subject up earlier, and was told to suggest it again for > > gcc 9, so I have attached the preliminary changes. > > > > My studies have show that with generic x86-64 optimization it reduces > > binary size with around 0.5%, and when optimizing for x64 targets with > > SSE4 or better, it reduces binary size by 2-3% on average. The > > performance changes are negligible however*, and I haven't been able to > > detect changes in compile time big enough to penetrate general noise on > > my platform, but perhaps someone has a better setup for that? > > > > * I believe that is because it currently works best on non-optimized code, > > it is better at big basic blocks doing all kinds of things than tightly > > written inner loops. > > > > Anythhing else I should test or report? > > What does it do on other architectures? > > I believe NEON would do the same as SSE4, but I can do a check. For architectures without SIMD it essentially does nothing. 'Allan From segher@kernel.crashing.org Sun May 27 01:23:00 2018 From: segher@kernel.crashing.org (Segher Boessenkool) Date: Sun, 27 May 2018 01:23:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <20109354.MqXXt4BNHg@twilight> References: <2659301.XPQk3P0qmd@twilight> <20180526220532.GS17342@gate.crashing.org> <20109354.MqXXt4BNHg@twilight> Message-ID: <20180527012336.GU17342@gate.crashing.org> On Sun, May 27, 2018 at 01:25:25AM +0200, Allan Sandfeld Jensen wrote: > On Sonntag, 27. Mai 2018 00:05:32 CEST Segher Boessenkool wrote: > > On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote: > > > I brought this subject up earlier, and was told to suggest it again for > > > gcc 9, so I have attached the preliminary changes. 
> > > > > > My studies have show that with generic x86-64 optimization it reduces > > > binary size with around 0.5%, and when optimizing for x64 targets with > > > SSE4 or better, it reduces binary size by 2-3% on average. The > > > performance changes are negligible however*, and I haven't been able to > > > detect changes in compile time big enough to penetrate general noise on > > > my platform, but perhaps someone has a better setup for that? > > > > > > * I believe that is because it currently works best on non-optimized code, > > > it is better at big basic blocks doing all kinds of things than tightly > > > written inner loops. > > > > > > Anythhing else I should test or report? > > > > What does it do on other architectures? > > > I believe NEON would do the same as SSE4, but I can do a check. For > architectures without SIMD it essentially does nothing. Sorry, I wasn't clear. What does it do to performance on other architectures? Is it (almost) always a win (or neutral)? If not, it doesn't belong in -O2, not for the generic options at least. (We'll test it on Power soon, it's weekend now :-) ). Segher From richard.guenther@gmail.com Sun May 27 05:37:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Sun, 27 May 2018 05:37:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <20109354.MqXXt4BNHg@twilight> References: <2659301.XPQk3P0qmd@twilight> <20180526220532.GS17342@gate.crashing.org> <20109354.MqXXt4BNHg@twilight> Message-ID: <3562F630-62CE-4F03-B0FC-2D9B4014FA5C@gmail.com> On May 27, 2018 1:25:25 AM GMT+02:00, Allan Sandfeld Jensen wrote: >On Sonntag, 27. Mai 2018 00:05:32 CEST Segher Boessenkool wrote: >> On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen >wrote: >> > I brought this subject up earlier, and was told to suggest it again >for >> > gcc 9, so I have attached the preliminary changes. 
>> > >> > My studies have show that with generic x86-64 optimization it >reduces >> > binary size with around 0.5%, and when optimizing for x64 targets >with >> > SSE4 or better, it reduces binary size by 2-3% on average. The >> > performance changes are negligible however*, and I haven't been >able to >> > detect changes in compile time >big enough to penetrate general noise on my platform, but perhaps >someone has >a better setup for that? > > >* I believe that is because it currently works best on >non-optimized code, it >is better at big basic blocks doing all kinds of things than tightly >written >inner loops. > > >Anythhing else I should test or report? >> >> What does it do on other architectures? >> >> >I believe NEON would do the same as SSE4, but I can do a check. For >architectures without SIMD it essentially does nothing. By default it combines integer ops where possible into word_mode registers. So yes, almost nothing. Richard. >'Allan From linux@carewolf.com Sun May 27 10:26:00 2018 From: linux@carewolf.com (Allan Sandfeld Jensen) Date: Sun, 27 May 2018 10:26:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <20180527012336.GU17342@gate.crashing.org> References: <2659301.XPQk3P0qmd@twilight> <20109354.MqXXt4BNHg@twilight> <20180527012336.GU17342@gate.crashing.org> Message-ID: <9618682.omils26g26@twilight> On Sonntag, 27. Mai 2018 03:23:36 CEST Segher Boessenkool wrote: > On Sun, May 27, 2018 at 01:25:25AM +0200, Allan Sandfeld Jensen wrote: > > On Sonntag, 27. 
Mai 2018 00:05:32 CEST Segher Boessenkool wrote: > > > On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote: > > > > I brought this subject up earlier, and was told to suggest it again > > > > for > > > > gcc 9, so I have attached the preliminary changes. > > > > > > > > My studies have show that with generic x86-64 optimization it reduces > > > > binary size with around 0.5%, and when optimizing for x64 targets with > > > > SSE4 or better, it reduces binary size by 2-3% on average. The > > > > performance changes are negligible however*, and I haven't been able > > > > to > > > > detect changes in compile time big enough to penetrate general noise > > > > on > > > > my platform, but perhaps someone has a better setup for that? > > > > > > > > * I believe that is because it currently works best on non-optimized > > > > code, > > > > it is better at big basic blocks doing all kinds of things than > > > > tightly > > > > written inner loops. > > > > > > > > Anythhing else I should test or report? > > > > > > What does it do on other architectures? > > > > I believe NEON would do the same as SSE4, but I can do a check. For > > architectures without SIMD it essentially does nothing. > > Sorry, I wasn't clear. What does it do to performance on other > architectures? Is it (almost) always a win (or neutral)? If not, it > doesn't belong in -O2, not for the generic options at least. > It shouldn't have any way of making code slower, so it is neutral or a win in performance; similarly for code size, since merged instructions mean fewer instructions. I never found a benchmark where it really made a measurable difference in performance, but I found many large binaries such as Qt or Chromium, where it made the binaries a few percent smaller. 
Allan From law@redhat.com Sun May 27 15:59:00 2018 From: law@redhat.com (Jeff Law) Date: Sun, 27 May 2018 15:59:00 -0000 Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP In-Reply-To: <87a7smbuej.fsf@linaro.org> References: <1518212868.14236.47.camel@cavium.com> <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> <1526487700.29509.6.camel@cavium.com> <1526491802.29509.19.camel@cavium.com> <87a7sznw5c.fsf@linaro.org> <1527184223.22014.13.camel@cavium.com> <87a7smbuej.fsf@linaro.org> Message-ID: On 05/26/2018 04:09 AM, Richard Sandiford wrote: > Steve Ellcey writes: >> On Wed, 2018-05-16 at 22:11 +0100, Richard Sandiford wrote: >>> >>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way >>> of saying that an rtl instruction preserves the low part of a >>> register but clobbers the high part. We would need something like >>> Alan H's CLOBBER_HIGH patches to do it using explicit clobbers. >>> >>> Another approach would be to piggy-back on the -fipa-ra >>> infrastructure >>> and record that vector PCS functions only clobber Q0-Q7. If -fipa-ra >>> knows that a function doesn't clobber Q8-Q15 then that should >>> override >>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED. (I'm not sure whether it does >>> in practice, but it should :-) And if it doesn't that's a bug that's >>> worth fixing for its own sake.) >>> >>> Thanks, >>> Richard >> >> Alan, >> >> I have been looking at your CLOBBER_HIGH patches to see if they >> might be helpful in implementing the ARM SIMD Vector ABI in GCC. >> I have also been looking at the -fipa-ra flag and how it works. >> >> I was wondering if you considered using the ipa-ra infrastructure >> for the SVE work that you are currently trying to support with >> the CLOBBER_HIGH macro? 
>> >> My current thought for the ABI work is to mark all the floating >> point / vector registers as caller saved (the lower half of V8-V15 >> are currently callee saved) and remove >> TARGET_HARD_REGNO_CALL_PART_CLOBBERED. >> This should work but would be inefficient. >> >> The next step would be to split get_call_reg_set_usage up into >> two functions so that I don't have to pass in a default set of >> registers. One function would return call_used_reg_set by >> default (but could return a smaller set if it had actual used >> register information) and the other would return regs_invalidated >> by_call by default (but could also return a smaller set). >> >> Next I would add a 'largest mode used' array to call_cgraph_rtl_info >> structure in addition to the current function_used_regs register >> set. >> >> Then I could turn the get_call_reg_set_usage replacement functions >> into target specific functions and with the information in the >> call_cgraph_rtl_info structure and any simd attribute information on >> a function I could modify what registers are really being used/invalidated >> without being saved. >> >> If the called function only uses the bottom half of a register it would not >> be marked as used/invalidated. If it uses the entire register and the >> function is not marked as simd, then the register would marked as >> used/invalidated. If the function was marked as simd the register would not >> be marked because a simd function would save both the upper and lower halves >> of a callee saved register (whereas a non simd function would only save the >> lower half). >> >> Does this sound like something that could be used in place of your >> CLOBBER_HIGH patch? > > One of the advantages of CLOBBER_HIGH is that it can be attached to > arbitrary instructions, not just calls. The motivating example was > tlsdesc_small_, which isn't treated as a call but as a normal > instruction. 
(And I don't think we want to change that, since it's much > easier for rtl optimisers to deal with normal instructions compared to > calls. In general a call is part of a longer sequence of instructions > that includes setting up arguments, etc.) Yea. I don't think we want to change tlsdesc*. Representing them as normal insns rather than calls seems reasonable to me. Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff. When we left things I think we were trying to decide between CLOBBER_HIGH and clobbering the appropriate subreg. The problem with the latter is the dataflow we compute is inaccurate (overly pessimistic) so that'd have to be fixed. Jeff From paulkoning@comcast.net Sun May 27 17:09:00 2018 From: paulkoning@comcast.net (Paul Koning) Date: Sun, 27 May 2018 17:09:00 -0000 Subject: virtual-stack-vars reference not resolved in vregs In-Reply-To: <6976655.QsN8PLxYON@polaris> References: <815571FD-AE59-4D56-9AFA-73F605A4DCC0@comcast.net> <6976655.QsN8PLxYON@polaris> Message-ID: <31D2FA22-C91C-4C32-A82A-AFC1FB7D64DD@comcast.net> > On May 25, 2018, at 2:11 AM, Eric Botcazou wrote: > >> Is this something the back end is responsible for getting right, for example >> via the machine description file? If so, any hints where to start? > > The SUBREG of MEM is invalid at this stage. Thanks. That pointed me to the problem: the .md file contained a define_expand for truncsihi2, which is not useful given that the word length is 2. Deleting it cured the problem. paul From gccadmin@gcc.gnu.org Sun May 27 22:41:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Sun, 27 May 2018 22:41:00 -0000 Subject: gcc-9-20180527 is now available Message-ID: <20180527224054.93917.qmail@sourceware.org> Snapshot gcc-9-20180527 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/9-20180527/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. 
This snapshot has been generated from the GCC 9 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 260810 You'll find: gcc-9-20180527.tar.xz Complete GCC SHA256=2dd4561d7288f1296b44683240b0d0371bb4a8e560bca3e147089f47d3e05e3e SHA1=a67f60c5b5b4bc2ec920fa3cc207f9fd441f0bae Diffs from 9-20180520 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-9 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. From richard.guenther@gmail.com Mon May 28 09:03:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Mon, 28 May 2018 09:03:00 -0000 Subject: not computable at load time In-Reply-To: <1A72BAC7-9DFB-4F98-9191-DDE896021A41@comcast.net> References: <1A72BAC7-9DFB-4F98-9191-DDE896021A41@comcast.net> Message-ID: On Fri, May 25, 2018 at 8:05 PM Paul Koning wrote: > One of my testsuite failures for the pdp11 back end is gcc.c-torture/compile/930326-1.c which is: > struct > { > char a, b, f[3]; > } s; > long i = s.f-&s.b; > It fails with "error: initializer element is not computable at load time". > I don't understand why because it seems to be a perfectly reasonable > compile time constant; "load time" doesn't enter into the picture that > I can see. It means there's no relocation that can express the result of 's.f - &s.b' and the frontend doesn't consider this a constant expression (likely because of the conversion). > If I replace "long" by "short" it works correctly. So presumably it has > something to do with the fact that Pmode == HImode. 
But how that translates > into this failure I don't know. > paul From richard.guenther@gmail.com Mon May 28 10:58:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Mon, 28 May 2018 10:58:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <5A85555D-FF52-4666-88EE-FFBD8C498294@gmail.com> References: <2659301.XPQk3P0qmd@twilight> <5A85555D-FF52-4666-88EE-FFBD8C498294@gmail.com> Message-ID: On Sat, May 26, 2018 at 12:36 PM Richard Biener wrote: > On May 26, 2018 11:32:29 AM GMT+02:00, Allan Sandfeld Jensen < linux@carewolf.com> wrote: > >I brought this subject up earlier, and was told to suggest it again for > >gcc 9, > >so I have attached the preliminary changes. > > > >My studies have show that with generic x86-64 optimization it reduces > >binary > >size with around 0.5%, and when optimizing for x64 targets with SSE4 or > > > >better, it reduces binary size by 2-3% on average. The performance > >changes are > >negligible however*, and I haven't been able to detect changes in > >compile time > >big enough to penetrate general noise on my platform, but perhaps > >someone has > >a better setup for that? > > > >* I believe that is because it currently works best on non-optimized > >code, it > >is better at big basic blocks doing all kinds of things than tightly > >written > >inner loops. > > > >Anythhing else I should test or report? > If you have access to SPEC CPU I'd like to see performance, size and compile-time effects of the patch on that. Embedded folks may want to run their favorite benchmark and report results as well. So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile and run, and the compile-time effect, where measurable (SPEC records on a second granularity), is within one second per benchmark apart from 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). Performance-wise I notice significant slowdowns for SPEC FP and some for SPEC INT (I only did a train run so far). 
I'll re-run with ref input now and will post those numbers. binary size numbers show an increase for 403.gcc, 433.milc 444.namd and otherwise decreases or no changes. The changes are in the sub-percentage area of course. Overall 12583 "BBs" are vectorized. I need to improve that reporting for multiple (non-)overlapping instances. I realize that combining -O2 with -march=haswell might not be what people do but I tried to increase the number of vectorized BBs. Richard. > Richard. > >Best regards > >'Allan > > > > > >diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > >index beba295bef5..05851229354 100644 > >--- a/gcc/doc/invoke.texi > >+++ b/gcc/doc/invoke.texi > >@@ -7612,6 +7612,7 @@ also turns on the following optimization flags: > > -fstore-merging @gol > > -fstrict-aliasing @gol > > -ftree-builtin-call-dce @gol > >+-ftree-slp-vectorize @gol > > -ftree-switch-conversion -ftree-tail-merge @gol > > -fcode-hoisting @gol > > -ftree-pre @gol > >@@ -7635,7 +7636,6 @@ by @option{-O2} and also turns on the following > >optimization flags: > > -floop-interchange @gol > > -floop-unroll-and-jam @gol > > -fsplit-paths @gol > >--ftree-slp-vectorize @gol > > -fvect-cost-model @gol > > -ftree-partial-pre @gol > > -fpeel-loops @gol > >@@ -8932,7 +8932,7 @@ Perform loop vectorization on trees. This flag is > > > >enabled by default at > > @item -ftree-slp-vectorize > > @opindex ftree-slp-vectorize > >Perform basic block vectorization on trees. This flag is enabled by > >default > >at > >-@option{-O3} and when @option{-ftree-vectorize} is enabled. > >+@option{-O2} or higher, and when @option{-ftree-vectorize} is enabled. 
> > > > @item -fvect-cost-model=@var{model} > > @opindex fvect-cost-model > >diff --git a/gcc/opts.c b/gcc/opts.c > >index 33efcc0d6e7..11027b847e8 100644 > >--- a/gcc/opts.c > >+++ b/gcc/opts.c > >@@ -523,6 +523,7 @@ static const struct default_options > >default_options_table[] = > > { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 }, > > { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 }, > > { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 }, > >+ { OPT_LEVELS_2_PLUS, OPT_ftree_slp_vectorize, NULL, 1 }, > > > > /* -O3 optimizations. */ > > { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 }, > >@@ -539,7 +540,6 @@ static const struct default_options > >default_options_table[] = > > { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 }, > > { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 }, > > { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 }, > >- { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 }, > >{ OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, > >VECT_COST_MODEL_DYNAMIC > >}, > > { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 }, > > { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 }, From sebastian.huber@embedded-brains.de Mon May 28 13:32:00 2018 From: sebastian.huber@embedded-brains.de (Sebastian Huber) Date: Mon, 28 May 2018 13:32:00 -0000 Subject: RISC-V problem with weak function references and -mcmodel=medany Message-ID: Hello, I am trying to build a 64-bit RISC-V tool chain for RTEMS. RTEMS doesn't use virtual memory. The reference chips for 64-bit RISC-V such as the FU540-C000 locate the RAM at 0x8000_0000. This forces me to use -mcmodel=medany in 64-bit mode. The crtbegin.o contains this code (via crtstuff.c): extern void *__deregister_frame_info (const void *) __attribute__ ((weak)); ... # 370 "libgcc/crtstuff.c" static void __attribute__((used)) __do_global_dtors_aux (void) { static _Bool completed; if (__builtin_expect (completed, 0)) return; # 413 "libgcc/crtstuff.c" 
deregister_tm_clones (); # 423 "libgcc/crtstuff.c" if (__deregister_frame_info) __deregister_frame_info (__EH_FRAME_BEGIN__); completed = 1; } Which is: .text .align 1 .type __do_global_dtors_aux, @function __do_global_dtors_aux: lbu a5,completed.3298 bnez a5,.L22 addi sp,sp,-16 sd ra,8(sp) call deregister_tm_clones lla a5,__deregister_frame_info beqz a5,.L17 lla a0,__EH_FRAME_BEGIN__ call __deregister_frame_info .L17: ld ra,8(sp) li a5,1 sb a5,completed.3298,a4 addi sp,sp,16 jr ra .L22: ret If I link an executable I get this: /opt/rtems/5/lib64/gcc/riscv64-rtems5/9.0.0/../../../../riscv64-rtems5/bin/ld: /opt/rtems/5/lib64/gcc/riscv64-rtems5/9.0.0/crtbegin.o: in function `.L0 ': crtstuff.c:(.text+0x72): relocation truncated to fit: R_RISCV_CALL against undefined symbol `__deregister_frame_info' I guess that the resolution of the weak reference to the undefined symbol __deregister_frame_info somehow sets __deregister_frame_info to the absolute address 0, which is illegal in the following "call __deregister_frame_info"? Is this construct with weak references and -mcmodel=medany supported on RISC-V at all? If I change crtstuff.c like this, using weak function definitions diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c index 5e894455e16..770e3420c92 100644 --- a/libgcc/crtstuff.c +++ b/libgcc/crtstuff.c @@ -177,13 +177,24 @@ call_ ## FUNC (void) \ /* References to __register_frame_info and __deregister_frame_info should be weak in this file if at all possible. */ -extern void __register_frame_info (const void *, struct object *) - TARGET_ATTRIBUTE_WEAK; +extern void __register_frame_info (const void *, struct object *); +TARGET_ATTRIBUTE_WEAK void __register_frame_info (const void *unused, struct object *unused2) +{ + (void)unused; + 
(void)unused2; +} + extern void __register_frame_info_bases (const void *, struct object *, void *, void *) TARGET_ATTRIBUTE_WEAK; -extern void *__deregister_frame_info (const void *) - TARGET_ATTRIBUTE_WEAK; + +extern void *__deregister_frame_info (const void *); +TARGET_ATTRIBUTE_WEAK void *__deregister_frame_info (const void *unused) +{ + (void)unused; + return 0; +} + extern void *__deregister_frame_info_bases (const void *) TARGET_ATTRIBUTE_WEAK; extern void __do_global_ctors_1 (void); then the example program links. -- Sebastian Huber, embedded brains GmbH Address : Dornierstr. 4, D-82178 Puchheim, Germany Phone : +49 89 189 47 41-16 Fax : +49 89 189 47 41-09 E-Mail : sebastian.huber@embedded-brains.de PGP : Public key available on request. Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG. From rguenther@suse.de Mon May 28 15:22:00 2018 From: rguenther@suse.de (Richard Biener) Date: Mon, 28 May 2018 15:22:00 -0000 Subject: PR80155: Code hoisting and register pressure In-Reply-To: References: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> <66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com> <3D61699E-73EC-4936-9FF3-494DF6816616@suse.de> Message-ID: On Sat, 26 May 2018, Bin.Cheng wrote: > On Fri, May 25, 2018 at 5:54 PM, Richard Biener wrote: > > On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law wrote: > >>On 05/25/2018 03:49 AM, Bin.Cheng wrote: > >>> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni > >>> wrote: > >>>> On 23 May 2018 at 18:37, Jeff Law wrote: > >>>>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: > >>>>>> On 23 May 2018 at 13:58, Richard Biener wrote: > >>>>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: > >>>>>>> > >>>>>>>> Hi, > >>>>>>>> I am trying to work on PR80155, which exposes a problem with > >>code > >>>>>>>> hoisting and register pressure on a leading embedded benchmark > >>for ARM > >>>>>>>> cortex-m7, where 
code-hoisting causes an extra register spill. > >>>>>>>> > >>>>>>>> I have attached two test-cases which (hopefully) are > >>representative of > >>>>>>>> the original test-case. > >>>>>>>> The first one (trans_dfa.c) is bigger and somewhat similar to > >>the > >>>>>>>> original test-case and trans_dfa_2.c is hand-reduced version of > >>>>>>>> trans_dfa.c. There's 2 spills caused with trans_dfa.c > >>>>>>>> and one spill with trans_dfa_2.c due to lesser amount of cases. > >>>>>>>> The test-cases in the PR are probably not relevant. > >>>>>>>> > >>>>>>>> Initially I thought the spill was happening because of "too many > >>>>>>>> hoistings" taking place in original test-case thus increasing > >>the > >>>>>>>> register pressure, but it seems the spill is possibly caused > >>because > >>>>>>>> expression gets hoisted out of a block that is on loop exit. > >>>>>>>> > >>>>>>>> For example, the following hoistings take place with > >>trans_dfa_2.c: > >>>>>>>> > >>>>>>>> (1) Inserting expression in block 4 for code hoisting: > >>>>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) > >>>>>>>> > >>>>>>>> (2) Inserting expression in block 4 for code hoisting: > >>{plus_expr,_4,1} (0006) > >>>>>>>> > >>>>>>>> (3) Inserting expression in block 4 for code hoisting: > >>>>>>>> {pointer_plus_expr,s_33,1} (0023) > >>>>>>>> > >>>>>>>> (4) Inserting expression in block 3 for code hoisting: > >>>>>>>> {pointer_plus_expr,s_33,1} (0023) > >>>>>>>> > >>>>>>>> The issue seems to be hoisting of (*tab + 1) which consists of > >>first > >>>>>>>> two hoistings in block 4 > >>>>>>>> from blocks 5 and 9, which causes the extra spill. I verified > >>that by > >>>>>>>> disabling hoisting into block 4, > >>>>>>>> which resulted in no extra spills. > >>>>>>>> > >>>>>>>> I wonder if that's because the expression (*tab + 1) is getting > >>>>>>>> hoisted from blocks 5 and 9, > >>>>>>>> which are on loop exit ? 
So the expression that was previously > >>>>>>>> computed in a block on loop exit, gets hoisted outside that > >>block > >>>>>>>> which possibly makes the allocator more defensive ? Similarly > >>>>>>>> disabling hoisting of expressions which appeared in blocks on > >>loop > >>>>>>>> exit in original test-case prevented the extra spill. The other > >>>>>>>> hoistings didn't seem to matter. > >>>>>>> > >>>>>>> I think that's simply co-incidence. The only thing that makes > >>>>>>> a block that also exits from the loop special is that an > >>>>>>> expression could be sunk out of the loop and hoisting (commoning > >>>>>>> with another path) could prevent that. But that isn't what is > >>>>>>> happening here and it would be a pass ordering issue as > >>>>>>> the sinking pass runs only after hoisting (no idea why exactly > >>>>>>> but I guess there are cases where we want to prefer CSE over > >>>>>>> sinking). So you could try if re-ordering PRE and sinking helps > >>>>>>> your testcase. > >>>>>> Thanks for the suggestions. Placing sink pass before PRE works > >>>>>> for both these test-cases! Sadly it still causes the spill for the > >>benchmark -:( > >>>>>> I will try to create a better approximation of the original > >>test-case. > >>>>>>> > >>>>>>> What I do see is a missed opportunity to merge the successors > >>>>>>> of BB 4. 
After PRE we have > >>>>>>> > >>>>>>> [local count: 159303558]: > >>>>>>> : > >>>>>>> pretmp_123 = *tab_37(D); > >>>>>>> _87 = pretmp_123 + 1; > >>>>>>> if (c_36 == 65) > >>>>>>> goto ; [34.00%] > >>>>>>> else > >>>>>>> goto ; [66.00%] > >>>>>>> > >>>>>>> [local count: 54163210]: > >>>>>>> *tab_37(D) = _87; > >>>>>>> _96 = MEM[(char *)s_57 + 1B]; > >>>>>>> if (_96 != 0) > >>>>>>> goto ; [89.00%] > >>>>>>> else > >>>>>>> goto ; [11.00%] > >>>>>>> > >>>>>>> [local count: 105140348]: > >>>>>>> *tab_37(D) = _87; > >>>>>>> _56 = MEM[(char *)s_57 + 1B]; > >>>>>>> if (_56 != 0) > >>>>>>> goto ; [89.00%] > >>>>>>> else > >>>>>>> goto ; [11.00%] > >>>>>>> > >>>>>>> here at least the stores and loads can be hoisted. Note this > >>>>>>> may also point at the real issue of the code hoisting which is > >>>>>>> tearing apart the RMW operation? > >>>>>> Indeed, this possibility seems much more likely than block being > >>on loop exit. > >>>>>> I will try to "hardcode" the load/store hoists into block 4 for > >>this > >>>>>> specific test-case to check > >>>>>> if that prevents the spill. > >>>>> Even if it prevents the spill in this case, it's likely a good > >>thing to > >>>>> do. The statements prior to the conditional in bb5 and bb8 should > >>be > >>>>> hoisted, leaving bb5 and bb8 with just their conditionals. > >>>> Hi, > >>>> It seems disabling forwprop somehow works for causing no extra > >>spills > >>>> on the original test-case. 
> >>>> > >>>> For instance, > >>>> Hoisting without forwprop: > >>>> > >>>> bb 3: > >>>> _1 = tab_1(D) + 8 > >>>> pretmp_268 = MEM[tab_1(D) + 8B]; > >>>> _2 = pretmp_268 + 1; > >>>> goto or > >>>> > >>>> bb 4: > >>>> *_1 = _ 2 > >>>> > >>>> bb 5: > >>>> *_1 = _2 > >>>> > >>>> Hoisting with forwprop: > >>>> > >>>> bb 3: > >>>> pretmp_164 = MEM[tab_1(D) + 8B]; > >>>> _2 = pretmp_164 + 1 > >>>> goto or > >>>> > >>>> bb 4: > >>>> MEM[tab_1(D) + 8] = _2; > >>>> > >>>> bb 5: > >>>> MEM[tab_1(D) + 8] = _2; > >>>> > >>>> Although in both cases, we aren't hoisting stores, the issues with > >>forwprop > >>>> for this case seems to be the folding of > >>>> *_1 = _2 > >>>> into > >>>> MEM[tab_1(D) + 8] = _2 ? > >>> > >>> This isn't an issue, right? IIUC, tab_1(D) used all over the loop > >>> thus propagating _1 using (tab_1(D) + 8) actually removes one live > >>> range. > >>> > >>>> > >>>> Disabling folding to mem_ref[base + offset] in forwprop "works" in > >>the > >>>> sense it created same set of hoistings as without forwprop, however > >>it > >>>> still results in additional spills (albeit different registers). > >>>> > >>>> That's because forwprop seems to be increasing live range of > >>>> prephitmp_217 by substituting > >>>> _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 + > >>1). > >>> Hmm, it's hard to discuss private benchmarks, not sure which dump > >>> shall I find prephitmp_221/prephitmp_217 stuff. > >>> > >>>> On the other hand, Bin pointed out to me in private that forwprop > >>also > >>>> helps to restrict register pressure by propagating "tab + const_int" > >>>> for same test-case. > >>>> > >>>> So I am not really sure if there's an easier fix than having > >>>> heuristics for estimating register pressure at TREE level ? I would > >>be > >>> Easy fix, maybe not. OTOH, I am more convinced passes like > >>> forwprop/sink/hoisting can be improved by taking live range into > >>> consideration. 
Specifically, to direct such passes when moving code > >>> around different basic blocks, because inter-block register pressure > >>> is hard to resolve afterwards. > >>> > >>> As suggested by Jeff and Richi, I guess the first step would be doing > >>> experiments, collecting more benchmark data for reordering sink > >>before > >>> pre? It enables code sink as well as decreases register pressure in > >>> the original reduced cases IIRC. > >>We might even consider re-evaluating Bernd's work on what is > >>effectively > >>a gimple scheduler to minimize register pressure. > Could you please point me to Bernd's work? Does it schedule around > different basic blocks to minimize register pressure? Like in this > case, various optimizers extend live ranges across different basic blocks. > I once prototyped a single-block gimple scheduler, it isn't very > useful IMHO. Possibly https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01058.html. Quickly re-skimming the patch it seems to work on the set of TER-able stmts, scheduling them, and if scheduled, remove them from the TER-able set (because TER will undo scheduling later). So it's partly a register-pressure-driven cost model for TER and partly doing scheduling on GIMPLE. Of course it suffers from the issue I mentioned - we rely on TER (too much) to do combine-like work during RTL expansion. > It will be a huge amount of work to take register pressure into > consideration in various optimizers. OTOH, having a single > inter-block live range shrink pass before expanding is great, but I > think it would be at least equally difficult. Yes, I've always wanted to do some GIMPLE-level scheduling before expanding. And yes, it's equally difficult because it means we have to do all the instruction selection TER enables on GIMPLE - which is of course also a good thing.
And yes, the difficulty is to actually see all the expand-time benefits we get from TER - most of them are _not_ places that look at def stmts during expansion but are those that benefit from being fed "complex" RTL when expanding a stmt's operands. > > Sure. The main issue I see here is with the interaction with TER, which we unfortunately still rely on. Enough GIMPLE instruction selection might help to get rid of the remaining pieces... > > If the scheduler is designed to focus on inter-block live range > shrink, it won't interfere with TER, which only replaces single uses in > each basic block. True. Richard. From linux@carewolf.com Mon May 28 15:51:00 2018 From: linux@carewolf.com (Allan Sandfeld Jensen) Date: Mon, 28 May 2018 15:51:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: References: <2659301.XPQk3P0qmd@twilight> <5A85555D-FF52-4666-88EE-FFBD8C498294@gmail.com> Message-ID: <16437261.TNubWdSUOO@twilight> On Monday, 28 May 2018 12:58:20 CEST Richard Biener wrote: > compile-time effects of the patch on that. Embedded folks may want to run > their favorite benchmark and report results as well. > > So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile > and run and the compile-time > effect where measurable (SPEC records on a second granularity) is within > one second per benchmark > apart from 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). > Performance-wise I notice significant > slowdowns for SPEC FP and some for SPEC INT (I only did a train run > so far). I'll re-run with ref input now > and will post those numbers. > If you continue to see slowdowns, could you check with either no AVX, or with -mprefer-avx128? The occasional AVX256 instructions might be downclocking the CPU. But yes, that would be a problem for this change on its own.
'Allan From schwab@suse.de Mon May 28 15:53:00 2018 From: schwab@suse.de (Andreas Schwab) Date: Mon, 28 May 2018 15:53:00 -0000 Subject: not computable at load time In-Reply-To: (Richard Biener's message of "Mon, 28 May 2018 11:02:46 +0200") References: <1A72BAC7-9DFB-4F98-9191-DDE896021A41@comcast.net> Message-ID: On Mai 28 2018, Richard Biener wrote: > It means there's no relocation that can express the result of 's.f - &s.b' > and the frontend doesn't consider this a constant expression (likely because > of the conversion). Shouldn't the frontend notice that s.f - &s.b by itself is a constant? Andreas. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." From richard.guenther@gmail.com Mon May 28 16:03:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Mon, 28 May 2018 16:03:00 -0000 Subject: not computable at load time In-Reply-To: References: <1A72BAC7-9DFB-4F98-9191-DDE896021A41@comcast.net> Message-ID: On May 28, 2018 12:45:04 PM GMT+02:00, Andreas Schwab wrote: >On Mai 28 2018, Richard Biener wrote: > >> It means there's no relocation that can express the result of 's.f - >&s.b' >> and the frontend doesn't consider this a constant expression (likely >because >> of the conversion). > >Shouldn't the frontend notice that s.f - &s.b by itself is a constant? Sure - the question is whether it is required to and why it doesn't. Richard. >Andreas. 
From paulkoning@comcast.net Mon May 28 18:34:00 2018 From: paulkoning@comcast.net (Paul Koning) Date: Mon, 28 May 2018 18:34:00 -0000 Subject: not computable at load time In-Reply-To: References: <1A72BAC7-9DFB-4F98-9191-DDE896021A41@comcast.net> Message-ID: <5D1DB310-D460-4A04-A0ED-8C9941D8A9F9@comcast.net> > On May 28, 2018, at 12:03 PM, Richard Biener wrote: > > On May 28, 2018 12:45:04 PM GMT+02:00, Andreas Schwab wrote: >> On May 28 2018, Richard Biener wrote: >> >>> It means there's no relocation that can express the result of 's.f - >> &s.b' >>> and the frontend doesn't consider this a constant expression (likely >> because >>> of the conversion). >> >> Shouldn't the frontend notice that s.f - &s.b by itself is a constant? > > Sure - the question is whether it is required to and why it doesn't. This is a test case in the C torture test suite. The only reason I can see for it being there is to verify that GCC resolves this as a compile time constant. The issue can be masked by changing the "long" in that test case to a ptrdiff_t, which eliminates the conversion. Should I do that? It would make the test pass, at the expense of masking this glitch. By the way, I get the same error if I change the "long" to a "long long" and then compile for 32-bit Intel. paul
From umesh.kalappa0@gmail.com Tue May 29 04:20:00 2018 From: umesh.kalappa0@gmail.com (Umesh Kalappa) Date: Tue, 29 May 2018 04:20:00 -0000 Subject: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction In-Reply-To: <20180507083830.GF8577@tucnak> References: <20180507083830.GF8577@tucnak> Message-ID: Ok, thanks for the clarification, Jakub. Umesh On Mon, May 7, 2018, 2:08 PM Jakub Jelinek wrote: > On Mon, May 07, 2018 at 01:58:48PM +0530, Umesh Kalappa wrote: > > CCed Jakub, > > > > Agreed that float division doesn't touch memory, but the fdiv result (stack > > register) is stored back to memory, i.e. fResult. > > That doesn't really matter. It is stored to a stack spill slot, something > that doesn't have its address taken and that other code (e.g. in other threads) > can't access in a valid program. That is not considered memory for the > inline-asm; only objects that must live in memory count. > > Jakub
From richard.guenther@gmail.com Tue May 29 09:34:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Tue, 29 May 2018 09:34:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <16437261.TNubWdSUOO@twilight> References: <2659301.XPQk3P0qmd@twilight> <5A85555D-FF52-4666-88EE-FFBD8C498294@gmail.com> <16437261.TNubWdSUOO@twilight> Message-ID: On Mon, May 28, 2018 at 5:50 PM Allan Sandfeld Jensen wrote: > On Monday, 28 May 2018 12:58:20 CEST Richard Biener wrote: > > compile-time effects of the patch on that. Embedded folks may want to run > > their favorite benchmark and report results as well. > > > > So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile > > and run and the compile-time > > effect where measurable (SPEC records on a second granularity) is within > > one second per benchmark > > apart from 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). > > Performance-wise I notice significant > > slowdowns for SPEC FP and some for SPEC INT (I only did a train run > > so far). I'll re-run with ref input now > > and will post those numbers. > > > If you continue to see slowdowns, could you check with either no AVX, or with > -mprefer-avx128? The occasional AVX256 instructions might be downclocking the > CPU. But yes that would be a problem for this change on its own. So here's a complete two-run with ref input, peak is -O2 -march=haswell -ftree-slp-vectorize. It confirms the slowdowns in SPEC FP but not in SPEC INT. You are right that using AVX256 (or AVX512) might be problematic on its own but that is not restricted to -O2 -ftree-slp-vectorize but also -O3. I will re-benchmark the SPEC FP part with -mprefer-avx128 to see if that is the issue. Note I did not use any -ffast-math flags in the experiment - those are as "unlikely" as using -march=native together with -O2. In theory another issue is the ability to debug code.

Base Base Base Peak Peak Peak Benchmarks Ref.
Run Time Ratio
-------------- ------ --------- --------- ------ --------- ---------
410.bwaves      13590    362    37.5 *    13590    370    36.7 *
410.bwaves      13590    365    37.2 S    13590    377    36.0 S
416.gamess      19580    558    35.1 *    19580    598    32.7 *
416.gamess      19580    560    35.0 S    19580    600    32.6 S
433.milc         9180    331    27.8 S     9180    374    24.6 *
433.milc         9180    331    27.8 *     9180    383    24.0 S
434.zeusmp       9100    301    30.2 S     9100    301    30.2 *
434.zeusmp       9100    301    30.2 *     9100    302    30.1 S
435.gromacs      7140    300    23.8 S     7140    303    23.6 S
435.gromacs      7140    298    23.9 *     7140    301    23.8 *
436.cactusADM   11950    495    24.1 S    11950    482    24.8 *
436.cactusADM   11950    486    24.6 *    11950    484    24.7 S
437.leslie3d     9400    289    32.5 *     9400    288    32.6 *
437.leslie3d     9400    301    31.3 S     9400    289    32.5 S
444.namd         8020    301    26.6 *     8020    301    26.6 *
444.namd         8020    301    26.6 S     8020    301    26.6 S
447.dealII      11440    255    44.9 *    11440    252    45.3 *
447.dealII      11440    255    44.9 S    11440    253    45.3 S
450.soplex       8340    212    39.4 S     8340    213    39.1 S
450.soplex       8340    211    39.5 *     8340    211    39.5 *
453.povray       5320    111    47.9 S     5320    113    47.0 S
453.povray       5320    111    48.0 *     5320    113    47.2 *
454.calculix     8250    748    11.0 *     8250    835    9.88 *
454.calculix     8250    748    11.0 S     8250    835    9.88 S
459.GemsFDTD    10610    324    32.8 S    10610    324    32.8 S
459.GemsFDTD    10610    323    32.9 *    10610    323    32.9 *
465.tonto        9840    449    21.9 S     9840    469    21.0 *
465.tonto        9840    446    22.0 *     9840    469    21.0 S
470.lbm         13740    253    54.3 *    13740    255    53.9 S
470.lbm         13740    253    54.2 S    13740    254    54.2 *
481.wrf         11170    415    26.9 *    11170    416    26.9 S
481.wrf         11170    417    26.8 S    11170    416    26.9 *
482.sphinx3     19490    456    42.7 *    19490    465    41.9 *
482.sphinx3     19490    464    42.0 S    19490    468    41.6 S

Base Base Base Peak Peak Peak Benchmarks Ref.
Run Time Ratio
-------------- ------ --------- --------- ------ --------- ---------
400.perlbench    9770    251    38.9 S     9770    252    38.8 S
400.perlbench    9770    250    39.1 *     9770    251    39.0 *
401.bzip2        9650    399    24.2 S     9650    397    24.3 S
401.bzip2        9650    395    24.4 *     9650    395    24.4 *
403.gcc          8050    246    32.8 S     8050    245    32.9 S
403.gcc          8050    244    33.0 *     8050    243    33.1 *
429.mcf          9120    251    36.3 S     9120    248    36.8 *
429.mcf          9120    250    36.5 *     9120    248    36.8 S
445.gobmk       10490    394    26.6 S    10490    392    26.8 *
445.gobmk       10490    393    26.7 *    10490    392    26.8 S
456.hmmer        9330    389    24.0 S     9330    388    24.0 *
456.hmmer        9330    389    24.0 *     9330    389    24.0 S
458.sjeng       12100    447    27.1 *    12100    439    27.5 *
458.sjeng       12100    449    27.0 S    12100    449    26.9 S
462.libquantum  20720    309    67.0 S    20720    307    67.5 S
462.libquantum  20720    302    68.7 *    20720    300    69.1 *
464.h264ref     22130    457    48.5 S    22130    459    48.2 S
464.h264ref     22130    456    48.6 *    22130    459    48.2 *
471.omnetpp      6250    307    20.4 *     6250    308    20.3 *
471.omnetpp      6250    317    19.7 S     6250    310    20.2 S
473.astar        7020    346    20.3 *     7020    347    20.2 *
473.astar        7020    346    20.3 S     7020    347    20.2 S
483.xalancbmk    6900    198    34.8 *     6900    199    34.7 *
483.xalancbmk    6900    202    34.2 S     6900    203    34.1 S

> 'Allan From richard.guenther@gmail.com Tue May 29 09:49:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Tue, 29 May 2018 09:49:00 -0000 Subject: not computable at load time In-Reply-To: <5D1DB310-D460-4A04-A0ED-8C9941D8A9F9@comcast.net> References: <1A72BAC7-9DFB-4F98-9191-DDE896021A41@comcast.net> <5D1DB310-D460-4A04-A0ED-8C9941D8A9F9@comcast.net> Message-ID: On Mon, May 28, 2018 at 8:34 PM Paul Koning wrote: > > On May 28, 2018, at 12:03 PM, Richard Biener > > wrote: > > > > On May 28, 2018 12:45:04 PM GMT+02:00, Andreas Schwab wrote: > >> On May 28 2018, Richard Biener wrote: > >> > >>> It means there's no relocation that can express the result of 's.f - > >> &s.b' > >>> and the frontend doesn't consider this a constant expression (likely > >> because > >>> of the conversion).
> >> > >> Shouldn't the frontend notice that s.f - &s.b by itself is a constant? > > > > Sure - the question is whether it is required to and why it doesn't. > This is a test case in the C torture test suite. The only reason > I can see for it being there is to verify that GCC resolves this as > a compile time constant. > The issue can be masked by changing the "long" in that test case to > a ptrdiff_t, which eliminates the conversion. Should I do that? > It would make the test pass, at the expense of masking this glitch. > By the way, I get the same error if I change the "long" to a "long long" > and them compile for 32-bit Intel. The testcase dates back to some repository creation rev. (egcs?) and I'm not sure we may compute the difference of addresses of structure members. So that GCC accepts this is probably not required. Joseph may have a definitive answer here. It might be a "regression" with the POINTER_MINUS_EXPR introduction. You can debug this with gdb when you break on 'pointer_diff'. For me on x86_64 this builds a POINTER_DIFF_EXPR: (char *) &s.f - &s.b of ptrdiff_t. That a conversion breaks the simplification tells us that somewhere we possibly fail to simplify it (maybe even during assembling). You might want to file a bug for the 'long long' issue. Richard. 
> paul From richard.sandiford@linaro.org Tue May 29 10:06:00 2018 From: richard.sandiford@linaro.org (Richard Sandiford) Date: Tue, 29 May 2018 10:06:00 -0000 Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP In-Reply-To: (Jeff Law's message of "Sun, 27 May 2018 09:59:31 -0600") References: <1518212868.14236.47.camel@cavium.com> <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> <1526487700.29509.6.camel@cavium.com> <1526491802.29509.19.camel@cavium.com> <87a7sznw5c.fsf@linaro.org> <1527184223.22014.13.camel@cavium.com> <87a7smbuej.fsf@linaro.org> Message-ID: <871sdubwv6.fsf@linaro.org> Jeff Law writes: > Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff. > When we left things I think we were trying to decide between > CLOBBER_HIGH and clobbering the appropriate subreg. The problem with > the latter is the dataflow we compute is inaccurate (overly pessimistic) > so that'd have to be fixed. The clobbered part of the register in this case is a high-part subreg, which is ill-formed for single registers. It would also be difficult to represent in terms of the mode, since there are no defined modes for what can be stored in the high part of an SVE register. For 128-bit SVE that mode would have zero bits. :-) I thought the alternative suggestion was instead to have: (set (reg:M X) (reg:M X)) when X is preserved in mode M but not in wider modes. But that seems like too much of a special case to me, both in terms of the source and the destination: - On the destination side, a SET normally provides something for later instructions to use, whereas here the effect is intended to be the opposite: the instruction has no effect at all on a value of mode M in X. As you say, this would pessimise df without specific handling. 
But I think all optimisations that look for the definition of a value would need to be taught to "look through" this set to find the real definition of (reg:M X) (or any value of a mode no larger than M in X). Very few passes use the df def-use chains for this due to their high cost. - On the source side, the instruction doesn't actually care what's in X, but nevertheless appears to use it. This means that most passes would need to be taught that a use of X on the rhs of a no-op SET is special and should usually be ignored. More fundamentally, it should be possible in RTL to express an instruction J that *does* read X in mode M and clobbers its high part. If we use the SET above to represent the clobber, and treat the rhs use as special, then presumably J would need two uses of X, one "dummy" one on the no-op SET and one "real" one on some other SET (or perhaps in a top-level USE). Having the number of uses determine this seems a bit awkward. IMO CLOBBER and SET have different semantics for good reason: CLOBBER represents an optimisation barrier for things that care about the value of a certain rtx object, while SET represents a productive effect or side-effect. The effect we want here is the same as a normal clobber, except that the clobber is mode-dependent. Thanks, Richard From sebastian.huber@embedded-brains.de Tue May 29 11:19:00 2018 From: sebastian.huber@embedded-brains.de (Sebastian Huber) Date: Tue, 29 May 2018 11:19:00 -0000 Subject: RISC-V problem with weak function references and -mcmodel=medany In-Reply-To: References: Message-ID: Changing the code to something like this

void f(void) __attribute__((__weak__));

void _start(void)
{
	void (*g)(void) = f;

	if (g != 0) {
		(*g)();
	}
}

doesn't work either, since this is optimized to

	.option nopic
	.text
	.align	1
	.globl	_start
	.type	_start, @function
_start:
	lla	a5,f
	beqz	a5,.L1
	tail	f
.L1:
	ret
	.size	_start, .-_start
	.weak	f

Why doesn't the RISC-V backend generate trampoline code to call far functions? The non-optimized example code with "tail f" replaced by "jalr a5" links well:

	.option nopic
	.text
	.align	1
	.globl	_start
	.type	_start, @function
_start:
	addi	sp,sp,-32
	sd	ra,24(sp)
	sd	s0,16(sp)
	addi	s0,sp,32
	lla	a5,f
	sd	a5,-24(s0)
	ld	a5,-24(s0)
	beqz	a5,.L3
	ld	a5,-24(s0)
	jalr	a5
.L3:
	nop
	ld	ra,24(sp)
	ld	s0,16(sp)
	addi	sp,sp,32
	jr	ra
	.size	_start, .-_start
	.weak	f

-- Sebastian Huber, embedded brains GmbH Address : Dornierstr. 4, D-82178 Puchheim, Germany Phone : +49 89 189 47 41-16 Fax : +49 89 189 47 41-09 E-Mail : sebastian.huber@embedded-brains.de PGP : Public key available on request. This message is not a business communication within the meaning of the EHUG. From paulkoning@comcast.net Tue May 29 13:35:00 2018 From: paulkoning@comcast.net (Paul Koning) Date: Tue, 29 May 2018 13:35:00 -0000 Subject: not computable at load time In-Reply-To: References: <1A72BAC7-9DFB-4F98-9191-DDE896021A41@comcast.net> <5D1DB310-D460-4A04-A0ED-8C9941D8A9F9@comcast.net> Message-ID: > On May 29, 2018, at 5:49 AM, Richard Biener wrote: > ... > It might be a "regression" with the POINTER_MINUS_EXPR introduction. > You can debug this with gdb when you break on 'pointer_diff'. For me > on x86_64 this builds a POINTER_DIFF_EXPR: (char *) &s.f - &s.b > of ptrdiff_t. That a conversion breaks the simplification tells us that > somewhere we possibly fail to simplify it (maybe even during assembling). > > You might want to file a bug for the 'long long' issue.
Done, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85974 paul From law@redhat.com Tue May 29 13:53:00 2018 From: law@redhat.com (Jeff Law) Date: Tue, 29 May 2018 13:53:00 -0000 Subject: not computable at load time In-Reply-To: References: <1A72BAC7-9DFB-4F98-9191-DDE896021A41@comcast.net> <5D1DB310-D460-4A04-A0ED-8C9941D8A9F9@comcast.net> Message-ID: <19a8917f-0f2c-0487-da99-bee528f44dc0@redhat.com> On 05/29/2018 03:49 AM, Richard Biener wrote: > On Mon, May 28, 2018 at 8:34 PM Paul Koning wrote: > > > >>> On May 28, 2018, at 12:03 PM, Richard Biener >> > wrote: >>> >>> On May 28, 2018 12:45:04 PM GMT+02:00, Andreas Schwab > wrote: >>>> On Mai 28 2018, Richard Biener wrote: >>>> >>>>> It means there's no relocation that can express the result of 's.f - >>>> &s.b' >>>>> and the frontend doesn't consider this a constant expression (likely >>>> because >>>>> of the conversion). >>>> >>>> Shouldn't the frontend notice that s.f - &s.b by itself is a constant? >>> >>> Sure - the question is whether it is required to and why it doesn't. > >> This is a test case in the C torture test suite. The only reason >> I can see for it being there is to verify that GCC resolves this as >> a compile time constant. > >> The issue can be masked by changing the "long" in that test case to >> a ptrdiff_t, which eliminates the conversion. Should I do that? >> It would make the test pass, at the expense of masking this glitch. > >> By the way, I get the same error if I change the "long" to a "long long" >> and them compile for 32-bit Intel. > > The testcase dates back to some repository creation rev. (egcs?) and > I'm not sure we may compute the difference of addresses of structure > members. So that GCC accepts this is probably not required. Joseph > may have a definitive answer here. Given the name 93xxxx.c it goes back to the c-torture releases from Torbjorn which were separate from GCC releases. 
His c-torture suite helped seed the integrated regression testsuite. Jeff From richard.guenther@gmail.com Tue May 29 14:58:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Tue, 29 May 2018 14:58:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <16437261.TNubWdSUOO@twilight> References: <2659301.XPQk3P0qmd@twilight> <5A85555D-FF52-4666-88EE-FFBD8C498294@gmail.com> <16437261.TNubWdSUOO@twilight> Message-ID: On Tue, May 29, 2018 at 11:32 AM Richard Biener wrote: > On Mon, May 28, 2018 at 5:50 PM Allan Sandfeld Jensen > wrote: > > On Monday, 28 May 2018 12:58:20 CEST Richard Biener wrote: > > > compile-time effects of the patch on that. Embedded folks may want to > run > > > their favorite benchmark and report results as well. > > > > > > So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 > compile > > > and run and the compile-time > > > effect where measurable (SPEC records on a second granularity) is within > > > one second per benchmark > > > apart from 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). > > > Performance-wise I notice significant > > > slowdowns for SPEC FP and some for SPEC INT (I only did a train run > > > so far). I'll re-run with ref input now > > > and will post those numbers. > > > > > If you continue to see slowdowns, could you check with either no AVX, or > with > > -mprefer-avx128? The occasional AVX256 instructions might be downclocking > the > > CPU. But yes that would be a problem for this change on its own. > So here's a complete two-run with ref input, peak is -O2 -march=haswell > -ftree-slp-vectorize. > It confirms the slowdowns in SPEC FP but not in SPEC INT. You are right > that using > AVX256 (or AVX512) might be problematic on its own but that is not > restricted to > -O2 -ftree-slp-vectorize but also -O3. I will re-benchmark the SPEC FP > part with > -mprefer-avx128 to see if that is the issue.
Note I did not use any > -ffast-math flags in the > experiment - those are as "unlikely" as using -march=native together with > -O2. In theory > another issue is the ability to debug code. > Base Base Base Peak Peak Peak > Benchmarks Ref. Run Time Ratio Ref. Run Time Ratio > -------------- ------ --------- --------- ------ --------- --------- > 410.bwaves 13590 362 37.5 * 13590 370 36.7 > * > 410.bwaves 13590 365 37.2 S 13590 377 36.0 > S > 416.gamess 19580 558 35.1 * 19580 598 32.7 > * > 416.gamess 19580 560 35.0 S 19580 600 32.6 > S > 433.milc 9180 331 27.8 S 9180 374 24.6 > * > 433.milc 9180 331 27.8 * 9180 383 24.0 > S > 434.zeusmp 9100 301 30.2 S 9100 301 30.2 > * > 434.zeusmp 9100 301 30.2 * 9100 302 30.1 > S > 435.gromacs 7140 300 23.8 S 7140 303 23.6 > S > 435.gromacs 7140 298 23.9 * 7140 301 23.8 > * > 436.cactusADM 11950 495 24.1 S 11950 482 24.8 > * > 436.cactusADM 11950 486 24.6 * 11950 484 24.7 > S > 437.leslie3d 9400 289 32.5 * 9400 288 32.6 > * > 437.leslie3d 9400 301 31.3 S 9400 289 32.5 > S > 444.namd 8020 301 26.6 * 8020 301 26.6 > * > 444.namd 8020 301 26.6 S 8020 301 26.6 > S > 447.dealII 11440 255 44.9 * 11440 252 45.3 > * > 447.dealII 11440 255 44.9 S 11440 253 45.3 > S > 450.soplex 8340 212 39.4 S 8340 213 39.1 > S > 450.soplex 8340 211 39.5 * 8340 211 39.5 > * > 453.povray 5320 111 47.9 S 5320 113 47.0 > S > 453.povray 5320 111 48.0 * 5320 113 47.2 > * > 454.calculix 8250 748 11.0 * 8250 835 9.88 > * > 454.calculix 8250 748 11.0 S 8250 835 9.88 > S > 459.GemsFDTD 10610 324 32.8 S 10610 324 32.8 > S > 459.GemsFDTD 10610 323 32.9 * 10610 323 32.9 > * > 465.tonto 9840 449 21.9 S 9840 469 21.0 > * > 465.tonto 9840 446 22.0 * 9840 469 21.0 > S > 470.lbm 13740 253 54.3 * 13740 255 53.9 > S > 470.lbm 13740 253 54.2 S 13740 254 54.2 > * > 481.wrf 11170 415 26.9 * 11170 416 26.9 > S > 481.wrf 11170 417 26.8 S 11170 416 26.9 > * > 482.sphinx3 19490 456 42.7 * 19490 465 41.9 > * > 482.sphinx3 19490 464 42.0 S 19490 468 41.6 > S Numbers with 
-mprefer-avx128:

                 Base    Base    Base      Peak    Peak    Peak
Benchmarks       Ref. Run Time  Ratio      Ref. Run Time  Ratio
-------------- ------ --------- --------- ------ --------- ---------
410.bwaves      13590    365    37.2 *
410.bwaves      13590    374    36.4 S
416.gamess      19580    596    32.9 *
416.gamess      19580    596    32.8 S
433.milc         9180    378    24.3 S
433.milc         9180    375    24.5 *
434.zeusmp       9100    302    30.1 S
434.zeusmp       9100    302    30.2 *
435.gromacs      7140    299    23.9 *
435.gromacs      7140    299    23.9 S
436.cactusADM   11950    483    24.7 S
436.cactusADM   11950    482    24.8 *
437.leslie3d     9400    290    32.5 *
437.leslie3d     9400    302    31.1 S
444.namd         8020    301    26.6 *
444.namd         8020    301    26.6 S
447.dealII      11440    253    45.2 *
447.dealII      11440    253    45.2 S
450.soplex       8340    212    39.3 S
450.soplex       8340    211    39.5 *
454.calculix     8250    750    11.0 *
454.calculix     8250    750    11.0 S
459.GemsFDTD    10610    323    32.9 *
459.GemsFDTD    10610    323    32.8 S
465.tonto        9840    466    21.1 *
465.tonto        9840    466    21.1 S
470.lbm         13740    254    54.2 *
470.lbm         13740    255    54.0 S
481.wrf         11170    417    26.8 *
481.wrf         11170    417    26.8 S
482.sphinx3     19490    465    41.9 *
482.sphinx3     19490    473    41.2 S

so the situation improves but isn't fully fixed (STLF issues maybe?)

> Base Base Base Peak Peak Peak > Benchmarks Ref.
Run Time Ratio > -------------- ------ --------- --------- ------ --------- --------- > 400.perlbench 9770 251 38.9 S 9770 252 38.8 > S > 400.perlbench 9770 250 39.1 * 9770 251 39.0 > * > 401.bzip2 9650 399 24.2 S 9650 397 24.3 > S > 401.bzip2 9650 395 24.4 * 9650 395 24.4 > * > 403.gcc 8050 246 32.8 S 8050 245 32.9 > S > 403.gcc 8050 244 33.0 * 8050 243 33.1 > * > 429.mcf 9120 251 36.3 S 9120 248 36.8 > * > 429.mcf 9120 250 36.5 * 9120 248 36.8 > S > 445.gobmk 10490 394 26.6 S 10490 392 26.8 > * > 445.gobmk 10490 393 26.7 * 10490 392 26.8 > S > 456.hmmer 9330 389 24.0 S 9330 388 24.0 > * > 456.hmmer 9330 389 24.0 * 9330 389 24.0 > S > 458.sjeng 12100 447 27.1 * 12100 439 27.5 > * > 458.sjeng 12100 449 27.0 S 12100 449 26.9 > S > 462.libquantum 20720 309 67.0 S 20720 307 67.5 > S > 462.libquantum 20720 302 68.7 * 20720 300 69.1 > * > 464.h264ref 22130 457 48.5 S 22130 459 48.2 > S > 464.h264ref 22130 456 48.6 * 22130 459 48.2 > * > 471.omnetpp 6250 307 20.4 * 6250 308 20.3 > * > 471.omnetpp 6250 317 19.7 S 6250 310 20.2 > S > 473.astar 7020 346 20.3 * 7020 347 20.2 > * > 473.astar 7020 346 20.3 S 7020 347 20.2 > S > 483.xalancbmk 6900 198 34.8 * 6900 199 34.7 > * > 483.xalancbmk 6900 202 34.2 S 6900 203 34.1 > S > > 'Allan From linux@carewolf.com Tue May 29 15:29:00 2018 From: linux@carewolf.com (Allan Sandfeld Jensen) Date: Tue, 29 May 2018 15:29:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: References: <2659301.XPQk3P0qmd@twilight> Message-ID: <5190617.IRreCldYRv@twilight> On Dienstag, 29. Mai 2018 16:57:56 CEST Richard Biener wrote: > > so the situation improves but isn't fully fixed (STLF issues maybe?) > That raises the question if it helps in these cases even in -O3? Anyway it doesn't look good for it. Did the binary size at least improve with prefer-avx128, or was that also worse or insignificant? 
'Allan

From hrishikeshparag@gmail.com Tue May 29 17:03:00 2018 From: hrishikeshparag@gmail.com (Hrishikesh Kulkarni) Date: Tue, 29 May 2018 17:03:00 -0000 Subject: [GSOC] LTO dump tool project Message-ID:

Hi,

My exams have finally ended and I have started working on the GSOC project. I have forked the GCC mirror (https://github.com/hrisearch/gcc) and created an option for dumping functions and variables used in the IL. Please find the patch attached herewith.

Regards,
Hrishikesh

-------------- next part -------------- A non-text attachment was scrubbed... Name: symbols-dump.diff Type: text/x-patch Size: 1638 bytes Desc: not available URL:

From prathamesh.kulkarni@linaro.org Tue May 29 17:17:00 2018 From: prathamesh.kulkarni@linaro.org (Prathamesh Kulkarni) Date: Tue, 29 May 2018 17:17:00 -0000 Subject: [GSOC] LTO dump tool project In-Reply-To: References: Message-ID:

On 29 May 2018 at 22:33, Hrishikesh Kulkarni wrote:
> Hi,
>
> My exams have finally ended and I have started working on the GSOC project.
> I have forked the GCC mirror (https://github.com/hrisearch/gcc) and
> created an option for dumping functions and variables used in the IL.
> Please find the patch attached herewith.

diff --git a/gcc/lto/lang.opt b/gcc/lto/lang.opt
index 1083f9b..ae66c06 100644
--- a/gcc/lto/lang.opt
+++ b/gcc/lto/lang.opt
@@ -66,7 +66,11 @@ Whole program analysis (WPA) mode with number of parallel jobs specified.

 fdump
 LTO Var(flag_lto_dump)
-Call the dump function
+Call the dump function.
+
+fdump-lto-list
+LTO Var(flag_lto_dump_list)
+Call the dump function for variables and function in IL.

Instead of making separate options -fdump and -fdump-lto-list, would it be a good idea to make it a "sub option" to -fdump, like lto1 -fdump,-l, which would list all symbols within the LTO object file?
 fresolution=
 LTO Joined

diff --git a/gcc/lto/lto-dump.c b/gcc/lto/lto-dump.c
index b6a8b45..5e4d069 100644
--- a/gcc/lto/lto-dump.c
+++ b/gcc/lto/lto-dump.c
@@ -38,4 +38,21 @@ along with GCC; see the file COPYING3.  If not see
 void
 dump()
 {
   fprintf(stderr, "\nHello World!\n");
+}
+
+void dump_list()
+{
+
+  fprintf (stderr, "Call Graph:\n\n");
+  cgraph_node *cnode;
+  FOR_EACH_FUNCTION (cnode)
+    cnode->dump (stderr);
+  fprintf(stderr, "\n\n" );
+
+  fprintf (stderr, "Varpool:\n\n");
+  varpool_node *vnode;
+  FOR_EACH_VARIABLE (vnode)
+    vnode->dump (stderr);
+  fprintf(stderr, "\n\n" );
+ }
\ No newline at end of file

Formatting nit - Add comments for the newly added functions.

diff --git a/gcc/lto/lto-dump.h b/gcc/lto/lto-dump.h
index 4a06217..5ee71c6 100644
--- a/gcc/lto/lto-dump.h
+++ b/gcc/lto/lto-dump.h
@@ -21,5 +21,6 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_LTO_DUMP_H_

 void dump();
+void dump_list();

 #endif
\ No newline at end of file

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 9c79242..93ef52b 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3360,6 +3360,11 @@ lto_main (void)
       dump();
     }

+  if (flag_lto_dump_list)
+    {
+      dump_list();
+    }
+

Formatting nit - Avoid braces for single statement within if.

Shouldn't fdump-lto-list be enabled only if fdump is enabled ?

Thanks,
Prathamesh

   timevar_stop (TV_PHASE_STREAM_IN);

   if (!seen_error ())

>
> Regards,
> Hrishikesh

From mliska@suse.cz Tue May 29 17:38:00 2018 From: mliska@suse.cz (Martin Liška) Date: Tue, 29 May 2018 17:38:00 -0000 Subject: [GSOC] LTO dump tool project In-Reply-To: References: Message-ID: <0b7f9c71-3b4c-6720-ae02-35bb7c8caeb7@suse.cz>

On 05/29/2018 07:03 PM, Hrishikesh Kulkarni wrote:
> Hi,
>
> My exams have finally ended and I have started working on the GSOC project.
> I have forked the GCC mirror (https://github.com/hrisearch/gcc) and
> created an option for dumping functions and variables used in the IL.
> Please find the patch attached herewith.

Hello.

Good start.
You branched the repository but you forgot to push the commit you sent as an attachment. The second issue is that the patch is not against GCC trunk, but against your local branch. Thus one can't apply it.

About the options:

- once you send a new functionality, it's fine to paste a sample output
- for now I would remove the dummy flag_lto_dump flag
- I would expect for -fdump-lto-list something like what nm does:

$ nm main.o
00000000 T main
00000000 T mystring
00000000 C pole

Then of course you can add some level of verbosity which can print what you have. It would also be handy at some point to come up with some sorting, but that can wait.

That said, the direction is fine. Please carry on.

Thanks,
Martin

>
> Regards,
> Hrishikesh
>

From mliska@suse.cz Tue May 29 17:39:00 2018 From: mliska@suse.cz (Martin Liška) Date: Tue, 29 May 2018 17:39:00 -0000 Subject: [GSOC] LTO dump tool project In-Reply-To: References: Message-ID:

On 05/29/2018 07:17 PM, Prathamesh Kulkarni wrote:
> Shouldn't fdump-lto-list be enabled only if fdump is enabled ?

The option is dummy, and eventually all the options will be moved to a separate tool called lto-dump. Thus all the prefixed '-fdump-lto-foo' options will be replaced with -foo, I guess.
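[As an aside: the nm-style listing suggested above takes very little code to produce. The sketch below is a standalone C illustration with a made-up symbol record and an invented `format_nm_line` helper; it is not GCC code, and a real lto-dump would read its symbols from the LTO symbol table instead.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy symbol record (invented for illustration, not a GCC type).  */
struct sym
{
  unsigned long value;  /* symbol value/address */
  char kind;            /* nm-style type letter: T, C, ...  */
  const char *name;     /* symbol name */
};

/* Format one symbol the way nm prints it: zero-padded value,
   type letter, then the name.  */
int
format_nm_line (char *buf, size_t len, const struct sym *s)
{
  return snprintf (buf, len, "%08lx %c %s", s->value, s->kind, s->name);
}
```

With a symbol table of `{0, 'T', "main"}`-style records, printing each formatted line reproduces the `nm main.o` output shown above.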
Martin

From mliska@suse.cz Tue May 29 17:43:00 2018 From: mliska@suse.cz (Martin Liška) Date: Tue, 29 May 2018 17:43:00 -0000 Subject: [GSOC] LTO dump tool project In-Reply-To: <0b7f9c71-3b4c-6720-ae02-35bb7c8caeb7@suse.cz> References: <0b7f9c71-3b4c-6720-ae02-35bb7c8caeb7@suse.cz> Message-ID: <621f6e76-8a34-575a-9b43-dbf9e22804c7@suse.cz>

On 05/29/2018 07:38 PM, Martin Liška wrote:
> $ nm main.o
> 00000000 T main
> 00000000 T mystring
> 00000000 C pole

Or we can be inspired by readelf:

$ readelf -s a.out
[snip]
Symbol table '.symtab' contains 74 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    66: 0000000000601250     0 NOTYPE  GLOBAL DEFAULT   24 _end
    67: 00000000004004b0    43 FUNC    GLOBAL DEFAULT   13 _start
    68: 0000000000601038     0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
    69: 0000000000400582    70 FUNC    GLOBAL DEFAULT   13 main
    70: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fwrite@@GLIBC_2.2.5

Martin

From jimw@sifive.com Tue May 29 18:03:00 2018 From: jimw@sifive.com (Jim Wilson) Date: Tue, 29 May 2018 18:03:00 -0000 Subject: RISC-V ELF multilibs In-Reply-To: <233244769.189066.1527339877501.JavaMail.zimbra@embedded-brains.de> References: <233244769.189066.1527339877501.JavaMail.zimbra@embedded-brains.de> Message-ID:

On 05/26/2018 06:04 AM, Sebastian Huber wrote:
> Why is the default multilib and a variant identical?

This is supposed to be a single multilib, with two names. We use MULTILIB_REUSE to map the two names to a single multilib.

rohan:1030$ ./xgcc -B./ -march=rv64imafdc -mabi=lp64d --print-libgcc
./rv64imafdc/lp64d/libgcc.a
rohan:1031$ ./xgcc -B./ -march=rv64gc -mabi=lp64d --print-libgcc
./rv64imafdc/lp64d/libgcc.a
rohan:1032$ ./xgcc -B./ --print-libgcc
./libgcc.a
rohan:1033$

So this is working right when the -march option is given, but not when no -march is given. I'd suggest a bug report so I can track this, if you haven't already filed one.

> Most variants include the C extension. Would it be possible to add -march=rv32g and -march=rv64g variants?
The expectation is that most implementations will include the C extension. It reduces code size, improves performance, and I think I read somewhere that it takes only 400 gates to implement. It isn't practical to try to support every possible combination of architecture and ABI here, as there are too many possible combinations. But if there is a major RISC-V target that is rv32g or rv64g then we should consider it. You can of course define your own set of multilibs.

Jim

From jimw@sifive.com Tue May 29 18:27:00 2018 From: jimw@sifive.com (Jim Wilson) Date: Tue, 29 May 2018 18:27:00 -0000 Subject: RISC-V problem with weak function references and -mcmodel=medany In-Reply-To: References: Message-ID: <341fefb2-6c8c-b097-ce5f-093907e3ff21@sifive.com>

On 05/28/2018 06:32 AM, Sebastian Huber wrote:
> I guess, that the resolution of the weak reference to the undefined
> symbol __deregister_frame_info somehow sets __deregister_frame_info to
> the absolute address 0 which is illegal in the following "call
> __deregister_frame_info"? Is this construct with weak references and a
> -mcmodel=medany supported on RISC-V at all?

Yes. It works for me. Given a simple testcase

extern void *__deregister_frame_info (const void *)
  __attribute__ ((weak));
void * foo;

int
main (void)
{
  if (__deregister_frame_info)
    __deregister_frame_info (foo);
  return 0;
}

and compiling with -mcmodel=medany -O -Ttext=0x80000000, I get

80000158:	80000097          	auipc	ra,0x80000
8000015c:	ea8080e7          	jalr	-344(ra) # 0 <_start-0x80000000>

for the weak call. It isn't clear what you are doing differently.

Jim

From paulkoning@comcast.net Tue May 29 18:33:00 2018 From: paulkoning@comcast.net (Paul Koning) Date: Tue, 29 May 2018 18:33:00 -0000 Subject: Adding a libgcc file Message-ID: <3C557486-765A-4A37-89CD-34F01984B2FA@comcast.net>

Question about proper target maintainer procedures...

The pdp11 target needs udivhi3 in libgcc. There's udivsi3, and it's really easy to tweak those files for HImode. And that works.
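[For reference, a generic HImode unsigned divide of the kind being discussed can be written as a restoring shift-and-subtract loop. The sketch below uses standard C types and an invented name `udivhi3`; it is only an illustration of the algorithm, not the actual libgcc source, which has its own type and naming conventions.]

```c
#include <assert.h>
#include <stdint.h>

/* Restoring shift-and-subtract division for 16-bit unsigned values:
   feed the dividend in one bit at a time from the top, and subtract
   the divisor from the running remainder whenever it fits.
   Division by zero is not trapped here; like the real helpers, the
   result in that case is unspecified.  */
uint16_t
udivhi3 (uint16_t num, uint16_t den)
{
  uint16_t quot = 0, rem = 0;
  for (int i = 15; i >= 0; i--)
    {
      /* Shift the next dividend bit into the remainder.  */
      rem = (uint16_t) ((rem << 1) | ((num >> i) & 1));
      if (rem >= den)
        {
          rem = (uint16_t) (rem - den);
          quot = (uint16_t) (quot | (1u << i));
        }
    }
  return quot;
}
```

The matching `__umodhi3` would simply return `rem` instead of `quot` from the same loop.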
Should I add the HI files to the libgcc directory, or under config/pdp11? There's nothing target specific about them, though I don't know of other targets that might want this.

And would this change fall under target maintainer write privileges, or should I get the patch reviewed first?

paul

From law@redhat.com Tue May 29 18:39:00 2018 From: law@redhat.com (Jeff Law) Date: Tue, 29 May 2018 18:39:00 -0000 Subject: Adding a libgcc file In-Reply-To: <3C557486-765A-4A37-89CD-34F01984B2FA@comcast.net> References: <3C557486-765A-4A37-89CD-34F01984B2FA@comcast.net> Message-ID:

On 05/29/2018 12:33 PM, Paul Koning wrote:
> Question about proper target maintainer procedures...
>
> The pdp11 target needs udivhi3 in libgcc. There's udivsi3, and it's really easy to tweak those files for HImode. And that works.
>
> Should I add the HI files to the libgcc directory, or under config/pdp11? There's nothing target specific about them, though I don't know of other targets that might want this.
>
> And would this change fall under target maintainer write privileges, or should I get the patch reviewed first?

If it's easy to tweak for HImode and you think it's generally applicable, go ahead and post and we'll put them into the toplevel libgcc directory so that other targets can use them.

Jeff

From jimw@sifive.com Tue May 29 18:42:00 2018 From: jimw@sifive.com (Jim Wilson) Date: Tue, 29 May 2018 18:42:00 -0000 Subject: RISC-V problem with weak function references and -mcmodel=medany In-Reply-To: References: Message-ID: <81ddb05f-8a8a-c4bf-b026-ac0dc68d2ff6@sifive.com>

On 05/29/2018 04:19 AM, Sebastian Huber wrote:
> Changing the code to something like this
>
> void f(void) __attribute__((__weak__));
>
> void _start(void)
> {
>         void (*g)(void) = f;
>
>         if (g != 0) {
>                 (*g)();
>         }
> }

This testcase works for me also, using -mcmodel=medany -O tmp.c -Ttext=0x80000000 -nostdlib -nostartfiles.
I need enough info to reproduce your problem in order to look at it. One thing you can try is adding -Wl,--noinhibit-exec, which will produce an executable even though there was a linker error, and then you can disassemble the binary to see what you have for the weak call. That might give a clue as to what is wrong.

> Why doesn't RISC-V generate trampoline code to call far functions?

RISC-V is a new target. The answer to questions like this is that we haven't needed it yet, and hence haven't implemented it yet. But I don't see any need for trampolines to support a call to 0. We can reach anywhere in the low 32-bit address space with auipc/jalr. We can also use zero-relative addressing via the x0 register if necessary. We already have some linker relaxation support for that, but it doesn't seem to be triggering for this testcase.

Jim

From sebastian.huber@embedded-brains.de Tue May 29 18:43:00 2018 From: sebastian.huber@embedded-brains.de (Sebastian Huber) Date: Tue, 29 May 2018 18:43:00 -0000 Subject: RISC-V problem with weak function references and -mcmodel=medany In-Reply-To: <341fefb2-6c8c-b097-ce5f-093907e3ff21@sifive.com> References: <341fefb2-6c8c-b097-ce5f-093907e3ff21@sifive.com> Message-ID: <1614615988.24007.1527619385712.JavaMail.zimbra@embedded-brains.de>

Hello Jim,

----- On 29 May 2018 at 20:27, Jim Wilson jimw@sifive.com wrote:

> On 05/28/2018 06:32 AM, Sebastian Huber wrote:
>> I guess, that the resolution of the weak reference to the undefined
>> symbol __deregister_frame_info somehow sets __deregister_frame_info to
>> the absolute address 0 which is illegal in the following "call
>> __deregister_frame_info"? Is this construct with weak references and a
>> -mcmodel=medany supported on RISC-V at all?
>
> Yes. It works for me.
Given a simple testcase
>
> extern void *__deregister_frame_info (const void *)
>   __attribute__ ((weak));
> void * foo;
>
> int
> main (void)
> {
>   if (__deregister_frame_info)
>     __deregister_frame_info (foo);
>   return 0;
> }
>
> and compiling with -mcmodel=medany -O -Ttext=0x80000000, I get

would you mind trying this with -Ttext=0x90000000?

Please have a look at:
https://sourceware.org/bugzilla/show_bug.cgi?id=23244
https://sourceware.org/ml/binutils/2018-05/msg00296.html

From jimw@sifive.com Tue May 29 18:51:00 2018 From: jimw@sifive.com (Jim Wilson) Date: Tue, 29 May 2018 18:51:00 -0000 Subject: RISC-V problem with weak function references and -mcmodel=medany In-Reply-To: <1614615988.24007.1527619385712.JavaMail.zimbra@embedded-brains.de> References: <341fefb2-6c8c-b097-ce5f-093907e3ff21@sifive.com> <1614615988.24007.1527619385712.JavaMail.zimbra@embedded-brains.de> Message-ID:

On Tue, May 29, 2018 at 11:43 AM, Sebastian Huber wrote:
> would you mind trying this with -Ttext=0x90000000?

This gives me for the weak call

90000014:	70000097          	auipc	ra,0x70000
90000018:	fec080e7          	jalr	-20(ra) # 0 <__global_pointer$+0x6fffe7d4>

> Please have a look at:
> https://sourceware.org/bugzilla/show_bug.cgi?id=23244
> https://sourceware.org/ml/binutils/2018-05/msg00296.html

OK. I'm still catching up on mailing lists after the US holiday weekend.

Jim

From amacleod@redhat.com Tue May 29 23:53:00 2018 From: amacleod@redhat.com (Andrew MacLeod) Date: Tue, 29 May 2018 23:53:00 -0000 Subject: Project Ranger Message-ID: <5607b582-639b-7517-e052-014fabfe0ad4@redhat.com>

I'd like to introduce a project we've been working on for the past year and a half.

The original project goal was to see if we could derive accurate range information from the IL without requiring much effort on the client side. The idea being that a pass could simply ask "what is the range of this ssa_name on this statement?" and the compiler would go figure it out.
After lots of experimenting and prototyping, the project evolved into what we are introducing here. I call it the Ranger.

Existing range infrastructure in the compiler works from the top down. It walks through the IL computing all ranges and propagates these values forward in case they are needed. For the most part, other passes are required to either use global information, or process things in dominator order and work in lockstep with EVRP to get more context-sensitive ranges.

The Ranger's model is purely on-demand, and designed to have minimal overhead. When a range is requested, the Ranger walks backwards through the use-def chains to determine what ranges it can find relative to the name being requested. This means it only looks at statements which are deemed necessary to evaluate a range. This can result in some significant speedups when a pass is only interested in a few specific cases, as is demonstrated in some of the pass conversions we have performed. We have also implemented a "quick and dirty" vrp-like pass using the ranger to demonstrate that it can also handle much heavier duty range work and still perform well.

The code is located on an svn branch, ssa-range. It is based on trunk at revision 259405, circa mid-April 2018. The branch currently bootstraps with no regressions.

The top level ranger class is called 'path_ranger' and is found in the file ssa-range.h. It has 4 primary APIs:

* bool path_range_edge (irange& r, tree name, edge e);
* bool path_range_entry (irange& r, tree name, basic_block bb);
* bool path_range_on_stmt (irange& r, tree name, gimple *g);
* bool path_range_stmt (irange& r, gimple *g);

This allows queries for a range on an edge, on entry to a block, as an operand on a specific statement, or to calculate the range of the result of a statement. There are no prerequisites to use it; simply create a path ranger and start using the API.
There is even an available function which can be lightly called and doesn't require knowledge of the ranger:

static inline bool
on_demand_get_range_on_stmt (irange &r, tree ssa, gimple *stmt)
{
  path_ranger ranger;
  return ranger.path_range_on_stmt (r, ssa, stmt);
}

The Ranger consists of 3 primary components:

* range.[ch] - A new range representation purely based on wide-int, which allows ranges to consist of multiple non-overlapping sub-ranges.
* range-op.[ch] - Implements centralized tree-code operations on the irange class (allowing adding, masking, multiplying, etc).
* ssa-range*.[ch] - Files containing a set of classes which implement the Ranger.

We have set up a project page on the wiki which contains documentation for the approach as well as some converted pass info and a to-do list here: https://gcc.gnu.org/wiki/AndrewMacLeod/Ranger

We would like to include the ranger in GCC for this release, and realize there are still numerous things to be done before it's ready for integration. It has been in prototype mode until now, so we have not prepared the code for a merge yet. No real performance analysis has been done on it either, but there is an integration page where you will find information about the 4 passes that have been converted to use the Ranger and the performance of those: https://gcc.gnu.org/wiki/AndrewMacLeod/RangerIntegration

One of the primary tasks over the next month or two is to improve the sharing of operation code between the VRPs and the Ranger. We haven't done a very good job of that so far. This is included along with a list of known issues we need to look at on the to-do page: https://gcc.gnu.org/wiki/AndrewMacLeod/RangerToDo

The Ranger is far enough along now that we have confidence in both its approach and ability to perform, and would like to solicit feedback on what you think of it, any questions, possible uses, as well as potential requirements to integrate with trunk later this stage.
Please visit the project page and have a look. We've put as much documentation, comments, and to-dos there as we could think of. We will try to keep it up-to-date.

Andrew, Aldy and Jeff

From ebotcazou@adacore.com Wed May 30 07:49:00 2018 From: ebotcazou@adacore.com (Eric Botcazou) Date: Wed, 30 May 2018 07:49:00 -0000 Subject: Project Ranger In-Reply-To: <5607b582-639b-7517-e052-014fabfe0ad4@redhat.com> References: <5607b582-639b-7517-e052-014fabfe0ad4@redhat.com> Message-ID: <1946549.HoKu4gN0qI@polaris>

> The Ranger is far enough along now that we have confidence in both its
> approach and ability to perform, and would like to solicit feedback on
> what you think of it, any questions, possible uses, as well as
> potential requirements to integrate with trunk later this stage.

The PDF document mentions that you first intended to support symbolic ranges but eventually dropped them as "too complex, and ultimately not necessary". I don't entirely disagree with the former part, but I'm curious about the latter part: how do you intend to deal in the long term with cases that do require symbolic information to optimize things? The TODO page seems to acknowledge the loophole but only mentions a plan to deal with equivalences, which is not sufficient in the general case (as acknowledged too on the page).

--
Eric Botcazou

From richard.guenther@gmail.com Wed May 30 09:26:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Wed, 30 May 2018 09:26:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <5190617.IRreCldYRv@twilight> References: <2659301.XPQk3P0qmd@twilight> <5190617.IRreCldYRv@twilight> Message-ID:

On Tue, May 29, 2018 at 5:24 PM Allan Sandfeld Jensen wrote:
> On Tuesday, 29 May 2018 16:57:56 CEST Richard Biener wrote:
> >
> > so the situation improves but isn't fully fixed (STLF issues maybe?)
> >
> That raises the question if it helps in these cases even in -O3?

That's a good question indeed.
We might end up (partly) BB vectorizing loop bodies that we'd otherwise loop vectorize with SLP. Benchmarking with BB vectorization disabled at -O3+ isn't something I've done in the past. I'm now doing a 2-run with -march=haswell -Ofast [-fno-tree-slp-vectorize] for the FP benchmarks.

Note that there were some cases where disabling vectorization wholly improved things.

> Anyway it doesn't look good for it. Did the binary size at least improve with
> prefer-avx128, or was that also worse or insignificant?

Similar to the AVX256 results. I guess where AVX256 applied we now simply do two vector ops with AVX128.

Richard.

> 'Allan

From amacleod@redhat.com Wed May 30 14:03:00 2018 From: amacleod@redhat.com (Andrew MacLeod) Date: Wed, 30 May 2018 14:03:00 -0000 Subject: Project Ranger In-Reply-To: <1946549.HoKu4gN0qI@polaris> References: <5607b582-639b-7517-e052-014fabfe0ad4@redhat.com> <1946549.HoKu4gN0qI@polaris> Message-ID: <23520229-0872-f990-9273-f0c0c61635f4@redhat.com>

On 05/30/2018 03:41 AM, Eric Botcazou wrote:
>> The Ranger is far enough along now that we have confidence in both its
>> approach and ability to perform, and would like to solicit feedback on
>> what you think of it, any questions, possible uses, as well as
>> potential requirements to integrate with trunk later this stage.
> The PDF document mentions that you first intended to support symbolic ranges
> but eventually dropped them as "too complex, and ultimately not necessary".
>
> I don't entirely disagree with the former part, but I'm curious about the
> latter part: how do you intend to deal in the long term with cases that do
> require symbolic information to optimize things? The TODO page seems to
> acknowledge the loophole but only mentions a plan to deal with equivalences,
> which is not sufficient in the general case (as acknowledged too on the page).
>

First, we'll collect the cases that demonstrate a unique situation we care about.
I have 4 very specific cases that show current shortcomings, not just with the Ranger, but a couple we don't handle with VRP today. I'll eventually get those put onto the wiki so the list can be updated.

I think most of these cases that care about symbolics are not so much range related, but rather an algorithmic layer on top. Any follow-on optimization to either enhance or replace vrp or anything similar will simply use the ranger as a client. If it turns out there are cases where we *have* to remember the end point of a range as a symbolic, then the algorithm needs to track that symbolic along with the range, and request a re-evaluation of the range when the value of that symbolic changes.

That's the high-level view. I'm not convinced the symbolic has to be in the range in order to solve problems, for 2 reasons:

1) The Ranger maintains some definition chains internally and has a clear idea of what ssa_names can affect the outcome of a range. It tracks all these dependencies on a per-bb basis in the gori-map structure as imports and exports. The iterative approach I mentioned in the document would use this info to decide that ranges in a block need to be re-evaluated because an input to this block has changed. This is similar to the way VRP and other passes iterate until nothing changes. If we get a better range for an ssa_name than we had before, push it on the stack to look for potential re-evaluation, and keep going.

This is what I referred to as the Level 4 ranger in the document. I kind of glossed over it because I didn't want to get into the full-blown design I originally had, nor to rework it based on the current incarnation of the Ranger. I wanted to primarily focus on what we currently have that is working so we can move forward with it.

I don't think we need to track the symbolic name in the range because the Ranger tracks these dependencies for each ssa-name and can indicate when we may need to reevaluate them.
There is an exported routine from the block ranger:

    tree single_import (tree name);

If there is a sole import, it will return the ssa-name that NAME is dependent on that can affect the range of NAME. We added that API so Aldy's new threading code could utilize this ability to a small degree.

Bottom line: the Ranger has information that a pass can use to decide that a range could benefit from being reevaluated. This identifies any symbolic component of a range from that block.

2) This entire approach is modeled on walking the IL to evaluate a range. If we put symbolics and expressions in the range, we are really duplicating information that is already in the IL, and have to make a choice of exactly what and how we do it:

BB3:
  j_1 = q_6 / 2
  i_2 = j_1 + 3
  if (i_2 < k_4)

We could store the range of k_4 as [i_2 + 1, MAX] (which seems the obvious one); we could also store it as [j_1 + 4, MAX] or even [q_6 / 2 + 4, MAX]. But we have to decide in advance, and we have extra work to do if it turns out to be one of the other names we ended up wanting.

At some point later on we decide we either don't know anything about i_2 (or j_1, or q_6), or we have found a range for it, and now need to take that value and evaluate the expression stashed in the range in order to get the final result. Note that whatever algorithm is doing this must also keep track of this range somehow in order to use it later.

With the Ranger model, the same algorithm gets a range, and if it thinks it might need to be re-evaluated for whatever reason, it can just track an extra bit of info (like i_2 for instance) alongside the range (rather than in it). If it thinks the range needs to be re-evaluated, it can simply request a new range from the ranger. You also don't have to decide whether to track the range with i_2 or j_1 (or even q_6). The Ranger can tell you that the range it gives you for k_4 is accurate unless you get a new value for q_6.
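[To make the BB3 back-substitution concrete, here is a small self-contained C sketch. It is a toy model only, not the Ranger's actual irange or API: a single-interval type, one function per statement in the chain, and the simplifying assumption that the input range for q_6 is non-negative so that dividing the endpoints by 2 is valid.]

```c
#include <assert.h>
#include <limits.h>

/* Toy single interval; the real irange supports multiple sub-ranges.  */
struct interval { int lo, hi; };

/* j_1 = q_6 / 2 -- valid endpoint-wise for non-negative q_6.  */
struct interval
eval_j1 (struct interval q6)
{
  struct interval r = { q6.lo / 2, q6.hi / 2 };
  return r;
}

/* i_2 = j_1 + 3  */
struct interval
eval_i2 (struct interval j1)
{
  struct interval r = { j1.lo + 3, j1.hi + 3 };
  return r;
}

/* On the true edge of (i_2 < k_4), k_4 must exceed the smallest
   possible i_2, giving [i_2.lo + 1, MAX] as in the text.  */
struct interval
k4_on_true_edge (struct interval i2)
{
  struct interval r = { i2.lo + 1, INT_MAX };
  return r;
}

/* Walk the whole chain: the range of k_4 as a function of q_6,
   which is exactly why q_6 is the import to track.  */
struct interval
range_of_k4 (struct interval q6)
{
  return k4_on_true_edge (eval_i2 (eval_j1 (q6)));
}
```

For q_6 in [0, 10] this gives j_1 in [0, 5], i_2 in [3, 8], and k_4 in [4, INT_MAX] on the true edge; re-running `range_of_k4` with a sharper q_6 is the "re-evaluate when the import changes" step.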
That is really what you want to track. You might later want to reevaluate the range only if q_6 changes. If it doesn't, you are done.

Bottom line: the ranger indicates what the symbolic aspect of the range is with the import. The net effect is that the symbolic expression using that import is also the longest possible expression available in the block... it just picks it up from the IL rather than storing it in the range.

I would also note that we track multiple imports, they just aren't really exposed as yet since they aren't really being used by anyone. k_4 is also tagged as an import to that block, and if you ask for the range of i_2, you'd get a range, and k_4 would be listed as the import.

Also note more complexity is available. Once we hit statements with multiple ssa_names, we stop tracking currently, but we do note the imports at that point:

BB4:
  z_4 = a_3 + c_2
  z_5 = z_4 + 3
  if (q_8 < z_5)

We can get a range for q_8, and the ranger does know that a_3 and c_2 are both imports to defining z_5. By using the import information, we effectively get a "symbolic" range of [MIN, a_3 + c_2 + 3] for q_8 in this case. Which means I think the import approach of the Ranger has the benefit of being simpler in many ways, yet more powerful should we wish to explore that route.

The one place this falls down is if you get a range back from a call and you have no idea where it came from, but want to be able to re-evaluate it later. I am not sure what this use case looks like (if it exists :-), but I would be surprised if it wasn't something that could be handled with an algorithm change. I know in discussions with Aldy and Jeff as we went through various use cases, this model does sometimes require a bit of rethinking of how you approach using the information, since a lot of things we're used to worrying about just happen under the covers.

Does that help? If it does, I'll add this to the coverage in the wiki page.
Andrew

From dmalcolm@redhat.com Wed May 30 14:39:00 2018 From: dmalcolm@redhat.com (David Malcolm) Date: Wed, 30 May 2018 14:39:00 -0000 Subject: Project Ranger In-Reply-To: <5607b582-639b-7517-e052-014fabfe0ad4@redhat.com> References: <5607b582-639b-7517-e052-014fabfe0ad4@redhat.com> Message-ID: <1527691152.606.27.camel@redhat.com>

On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:

[...snip...]

> The code is located on an svn branch, ssa-range. It is based on trunk
> at revision 259405, circa mid-April 2018.

Is this svn branch mirrored on gcc's git mirror? I tried to clone it from there, but failed.

[...snip...]

Thanks
Dave

From amacleod@redhat.com Wed May 30 14:45:00 2018 From: amacleod@redhat.com (Andrew MacLeod) Date: Wed, 30 May 2018 14:45:00 -0000 Subject: Project Ranger In-Reply-To: <1527691152.606.27.camel@redhat.com> References: <5607b582-639b-7517-e052-014fabfe0ad4@redhat.com> <1527691152.606.27.camel@redhat.com> Message-ID:

On 05/30/2018 10:39 AM, David Malcolm wrote:
> On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:
>
> [...snip...]
>
>> The code is located on an svn branch, ssa-range. It is based on trunk
>> at revision 259405, circa mid-April 2018.
>
> Is this svn branch mirrored on gcc's git mirror? I tried to clone it
> from there, but failed.
>
> [...snip...]
>
> Thanks
> Dave

I don't know :-) I know that

    svn diff -r 259405

worked and appeared to give me a diff of everything. Aldy uses git, maybe he can tell you. That was a merge from trunk to start with, to an existing branch I had cut a month earlier.
The ACTUAL original branch cut was revision 258524, if that makes any difference.

Andrew

From schwab@suse.de Wed May 30 14:51:00 2018 From: schwab@suse.de (Andreas Schwab) Date: Wed, 30 May 2018 14:51:00 -0000 Subject: Project Ranger In-Reply-To: <1527691152.606.27.camel@redhat.com> (David Malcolm's message of "Wed, 30 May 2018 10:39:12 -0400") References: <5607b582-639b-7517-e052-014fabfe0ad4@redhat.com> <1527691152.606.27.camel@redhat.com> Message-ID:

On Mai 30 2018, David Malcolm wrote:

> On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:
>
> [...snip...]
>
>> The code is located on an svn branch, ssa-range. It is based on trunk
>> at revision 259405, circa mid-April 2018.
>
> Is this svn branch mirrored on gcc's git mirror? I tried to clone it
> from there, but failed.

It's in refs/remotes/ssa-range, which isn't fetched by default.

Andreas.

--
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

From aldyh@redhat.com Wed May 30 14:53:00 2018 From: aldyh@redhat.com (Aldy Hernandez) Date: Wed, 30 May 2018 14:53:00 -0000 Subject: Project Ranger In-Reply-To: References: <5607b582-639b-7517-e052-014fabfe0ad4@redhat.com> <1527691152.606.27.camel@redhat.com> Message-ID: <5dc17379-19e3-fc88-8c30-41213ac32c42@redhat.com>

On 05/30/2018 10:51 AM, Andreas Schwab wrote:
> On Mai 30 2018, David Malcolm wrote:
>
>> On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:
>>
>> [...snip...]
>>
>>> The code is located on an svn branch, ssa-range. It is based on trunk
>>> at revision 259405, circa mid-April 2018.
>>
>> Is this svn branch mirrored on gcc's git mirror? I tried to clone it
>> from there, but failed.
>
> It's in refs/remotes/ssa-range, which isn't fetched by default.

Right.
From my tree: $ git branch -a |grep svn.ssa.range remotes/svn/ssa-range From richard.guenther@gmail.com Wed May 30 16:40:00 2018 From: richard.guenther@gmail.com (Richard Biener) Date: Wed, 30 May 2018 16:40:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: References: <2659301.XPQk3P0qmd@twilight> <5190617.IRreCldYRv@twilight> Message-ID: On Wed, May 30, 2018 at 11:25 AM Richard Biener wrote: > On Tue, May 29, 2018 at 5:24 PM Allan Sandfeld Jensen > wrote: > > On Dienstag, 29. Mai 2018 16:57:56 CEST Richard Biener wrote: > > > > > > so the situation improves but isn't fully fixed (STLF issues maybe?) > > > > > That raises the question if it helps in these cases even in -O3? > That's a good question indeed. We might end up (partly) BB vectorizing > loop bodies that we'd otherwise loop vectorize with SLP. Benchmarking > with BB vectorization disabled at -O3+ isn't something I've done in the > past. I'm now doing a 2-run with -march=haswell -Ofast > [-fno-tree-slp-vectorize] > for the FP benchmarks. Base Base Base Peak Peak Peak Benchmarks Ref. Run Time Ratio Ref. 
Run Time Ratio
-------------- ------ --------- --------- ------ --------- ---------
410.bwaves      13590    178      76.4 *   13590    180      75.5 S
410.bwaves      13590    180      75.6 S   13590    179      76.0 *
416.gamess      19580    604      32.4 S   19580    576      34.0 S
416.gamess      19580    604      32.4 *   19580    575      34.0 *
433.milc         9180    339      27.1 *    9180    345      26.6 S
433.milc         9180    343      26.7 S    9180    343      26.8 *
434.zeusmp       9100    234      38.9 *    9100    234      38.9 S
434.zeusmp       9100    234      38.8 S    9100    234      38.9 *
435.gromacs      7140    251      28.5 *    7140    251      28.4 *
435.gromacs      7140    252      28.3 S    7140    252      28.3 S
436.cactusADM   11950    278      43.0 S   11950    222      53.8 S
436.cactusADM   11950    223      53.7 *   11950    221      54.1 *
437.leslie3d     9400    214      43.9 *    9400    215      43.6 *
437.leslie3d     9400    217      43.3 S    9400    222      42.4 S
444.namd         8020    302      26.5 S    8020    303      26.5 S
444.namd         8020    302      26.6 *    8020    303      26.5 *
447.dealII      11440    259      44.2 *   11440    246      46.6 *
447.dealII      11440    259      44.1 S   11440    246      46.6 S
450.soplex       8340    219      38.0 *    8340    219      38.0 *
450.soplex       8340    221      37.7 S    8340    221      37.7 S
453.povray       5320    108      49.2 *    5320    109      48.7 S
453.povray       5320    108      49.1 S    5320    109      48.8 *
454.calculix     8250    270      30.6 *    8250    269      30.6 *
454.calculix     8250    271      30.5 S    8250    270      30.5 S
459.GemsFDTD    10610    308      34.5 S   10610    306      34.7 S
459.GemsFDTD    10610    306      34.7 *   10610    306      34.7 *
465.tonto        9840    428      23.0 S    9840    423      23.3 *
465.tonto        9840    426      23.1 *    9840    423      23.2 S
470.lbm         13740    253      54.4 S   13740    252      54.5 *
470.lbm         13740    252      54.5 *   13740    252      54.5 S
481.wrf         11170    265      42.1 *   11170    265      42.2 S
481.wrf         11170    266      42.1 S   11170    264      42.3 *
482.sphinx3     19490    401      48.6 *   19490    402      48.5 S
482.sphinx3     19490    405      48.1 S   19490    399      48.9 *

so we can indeed see similar detrimental effects on 416.gamess; 447.dealII seems to improve with BB vectorization. That means the 416.gamess slowdown is definitely worth investigating since it reproduces with both AVX128 and AVX256 and with loop vectorization. I'll open a bug for it. > Note that there were some cases where disabling vectorization wholly > improved things. > > Anyway it doesn't look good for it. 
Did the binary size at least improve > with > > prefer-avx128, or was that also worse or insignificant? > Similar to the AVX256 results. I guess where AVX256 applied we now simply > do two vector ops with AVX128. > Richard. > > 'Allan From msebor@gmail.com Wed May 30 17:47:00 2018 From: msebor@gmail.com (Martin Sebor) Date: Wed, 30 May 2018 17:47:00 -0000 Subject: bootstrap failure due to declaration mismatch in r260956 Message-ID: <1c6d4fa4-d1c9-02de-7d28-f380cc225ca7@gmail.com> Honza, I think your r260956 is missing the following hunk:

Index: include/simple-object.h
===================================================================
--- include/simple-object.h (revision 260969)
+++ include/simple-object.h (working copy)
@@ -203,7 +203,7 @@ simple_object_release_write (simple_object_write *
 extern const char *
 simple_object_copy_lto_debug_sections (simple_object_read *src_object,
                                        const char *dest,
-                                       int *err);
+                                       int *err, int rename);
 
 #ifdef __cplusplus
 }

Martin From gerald@pfeifer.com Wed May 30 18:27:00 2018 From: gerald@pfeifer.com (Gerald Pfeifer) Date: Wed, 30 May 2018 18:27:00 -0000 Subject: bootstrap failure due to declaration mismatch in r260956 In-Reply-To: <1c6d4fa4-d1c9-02de-7d28-f380cc225ca7@gmail.com> References: <1c6d4fa4-d1c9-02de-7d28-f380cc225ca7@gmail.com> Message-ID: On Wed, 30 May 2018, Martin Sebor wrote: > I think your r260956 is missing the following hunk: If this fixes the bootstrap for you (also ran into this myself just now), can you please go ahead and commit? We can always sort out things later, if there are details to be tweaked, but fixing a bootstrap failure with a simple one-liner like this, let's not get process-heavy and just do it. 
;-) Gerald From msebor@gmail.com Wed May 30 18:46:00 2018 From: msebor@gmail.com (Martin Sebor) Date: Wed, 30 May 2018 18:46:00 -0000 Subject: bootstrap failure due to declaration mismatch in r260956 In-Reply-To: References: <1c6d4fa4-d1c9-02de-7d28-f380cc225ca7@gmail.com> Message-ID: <8d96ea03-f558-889e-3183-eea25fbc91e8@gmail.com> On 05/30/2018 12:27 PM, Gerald Pfeifer wrote: > On Wed, 30 May 2018, Martin Sebor wrote: >> I think your r260956 is missing the following hunk: > > If this fixes the bootstrap for you (also ran into this myself > just now), can you please go ahead and commit? > > We can always sort out things later, if there are details to be > tweaked, but fixing a bootstrap failure with a simple one-liner > like this, let's not get process-heavy and just do it. ;-) Jakub already committed the missing change in r260970 so bootstrap should be working again. Thanks Martin From hubicka@ucw.cz Wed May 30 22:40:00 2018 From: hubicka@ucw.cz (Jan Hubicka) Date: Wed, 30 May 2018 22:40:00 -0000 Subject: bootstrap failure due to declaration mismatch in r260956 In-Reply-To: <8d96ea03-f558-889e-3183-eea25fbc91e8@gmail.com> References: <1c6d4fa4-d1c9-02de-7d28-f380cc225ca7@gmail.com> <8d96ea03-f558-889e-3183-eea25fbc91e8@gmail.com> Message-ID: <20180530223913.GF55777@kam.mff.cuni.cz> > On 05/30/2018 12:27 PM, Gerald Pfeifer wrote: > >On Wed, 30 May 2018, Martin Sebor wrote: > >>I think your r260956 is missing the following hunk: > > > >If this fixes the bootstrap for you (also ran into this myself > >just now), can you please go ahead and commit? > > > >We can always sort out things later, if there are details to be > >tweaked, but fixing a bootstrap failure with a simple one-liner > >like this, let's not get process-heavy and just do it. ;-) > > Jakub already committed the missing change in r260970 so bootstrap > should be working again. I apologize for that. I left the svn commit waiting for a commit log entry and did not notice that :( Thanks for fixing it! 
Honza > > Thanks > Martin > From gccadmin@gcc.gnu.org Wed May 30 22:52:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Wed, 30 May 2018 22:52:00 -0000 Subject: gcc-6-20180530 is now available Message-ID: <20180530225151.52508.qmail@sourceware.org> Snapshot gcc-6-20180530 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/6-20180530/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 6 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch revision 260976 You'll find: gcc-6-20180530.tar.xz Complete GCC SHA256=34b2ec9f1c047cde51d35c7aea31952f0f40006dd64df78943edfffc6294d9ff SHA1=9ab3ed7e4e237611dcfabf42d4e46bbb9fb5a7a4 Diffs from 6-20180523 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-6 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way. From palmer@sifive.com Thu May 31 09:09:00 2018 From: palmer@sifive.com (Palmer Dabbelt) Date: Thu, 31 May 2018 09:09:00 -0000 Subject: RISC-V ELF multilibs In-Reply-To: Message-ID: On Tue, 29 May 2018 11:02:58 PDT (-0700), Jim Wilson wrote: > On 05/26/2018 06:04 AM, Sebastian Huber wrote: >> Why is the default multilib and a variant identical? > > This is supposed to be a single multilib, with two names. We use > MULTILIB_REUSE to map the two names to a single multilib. > > rohan:1030$ ./xgcc -B./ -march=rv64imafdc -mabi=lp64d --print-libgcc > ./rv64imafdc/lp64d/libgcc.a > rohan:1031$ ./xgcc -B./ -march=rv64gc -mabi=lp64d --print-libgcc > ./rv64imafdc/lp64d/libgcc.a > rohan:1032$ ./xgcc -B./ --print-libgcc > ./libgcc.a > rohan:1033$ > > So this is working right when the -march option is given, but not when > no -march is given. I'd suggest a bug report so I can track this, if > you haven't already filed one. 
IIRC this is actually a limit of the GCC build system: there needs to be some default multilib, and it has to be unprefixed. I wanted to keep the library paths orthogonal (ie, not bake in a default that rv64gc/lp64d lives at /lib), so I chose to just build a redundant multilib. It'd be great to get rid of this, but I'm afraid it's way past my level of understanding as to how all this works. >> Most variants include the C extension. Would it be possible to add -march=rv32g and -march=rv64g variants? > > The expectation is that most implementations will include the C > extension. It reduces code size, improves performance, and I think I > read somewhere that it takes only 400 gates to implement. > > It isn't practical to try to support every possible combination of > architecture and ABI here, as there are too many possible combinations. > But if there is a major RISC-V target that is rv32g or rv64g then we > should consider it. You can of course define your own set of multilibs. While that's the standard answer, note that Sebastian added the RISC-V RTEMS target in the first place so as far as I'm concerned he can add additional multilibs to it if he wants. While I'm not opposed to RTEMS multilibs for rv32g/ilp32d and rv64g/lp64d, note that we have made the decision in Linux land that the C extension will be common and thus I expect most RISC-V implementations with virtual memory to also have the C extension. If you go down this route then you should move RTEMS to its own multilib target fragment (t-rtems-multilib or something similar). If you need help figuring out the patch feel free to ask. 
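[Editor's sketch: the shape of the RTEMS-specific fragment suggested above, with rv32g/rv64g variants and a MULTILIB_REUSE mapping, might look roughly like the following. This is illustrative only — t-rtems-multilib is a hypothetical filename, and the option spellings and the direction of the reuse rule should be double-checked against gcc/config/riscv/t-elf-multilibs and the GCC internals manual.]

```make
# Hypothetical gcc/config/riscv/t-rtems-multilib fragment (a sketch).
# Build two library variants in addition to the default multilib:
MULTILIB_OPTIONS  = march=rv32g/march=rv64g mabi=ilp32d/mabi=lp64d
MULTILIB_DIRNAMES = rv32g rv64g ilp32d lp64d
MULTILIB_REQUIRED = march=rv32g/mabi=ilp32d march=rv64g/mabi=lp64d
# rv64g names the same ISA as rv64imafd, so map that spelling onto the
# rv64g/lp64d multilib instead of building it twice; in reuse rules the
# '=' of an option is written as '.':
MULTILIB_REUSE    = march.rv64g/mabi.lp64d=march.rv64imafd/mabi.lp64d
```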
I wrote a blog that might be useful https://www.sifive.com/blog/2017/09/18/all-aboard-part-5-risc-v-multilib/ From Alan.Hayward@arm.com Thu May 31 10:39:00 2018 From: Alan.Hayward@arm.com (Alan Hayward) Date: Thu, 31 May 2018 10:39:00 -0000 Subject: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP In-Reply-To: <871sdubwv6.fsf@linaro.org> References: <1518212868.14236.47.camel@cavium.com> <32617133-64DC-4F62-B7A0-A6B417C5B14E@arm.com> <1526487700.29509.6.camel@cavium.com> <1526491802.29509.19.camel@cavium.com> <87a7sznw5c.fsf@linaro.org> <1527184223.22014.13.camel@cavium.com> <87a7smbuej.fsf@linaro.org> <871sdubwv6.fsf@linaro.org> Message-ID: <8F853649-0510-4397-B255-C64FA42C671A@arm.com> (Missed this thread initially due to incorrect email address) > On 29 May 2018, at 11:05, Richard Sandiford wrote: > > Jeff Law writes: >> Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff. >> When we left things I think we were trying to decide between >> CLOBBER_HIGH and clobbering the appropriate subreg. The problem with >> the latter is the dataflow we compute is inaccurate (overly pessimistic) >> so that'd have to be fixed. Yes, I want to get back to looking at this again, however I've been busy elsewhere. > > The clobbered part of the register in this case is a high-part subreg, > which is ill-formed for single registers. It would also be difficult > to represent in terms of the mode, since there are no defined modes for > what can be stored in the high part of an SVE register. For 128-bit > SVE that mode would have zero bits. :-) > > I thought the alternative suggestion was instead to have: > > (set (reg:M X) (reg:M X)) > > when X is preserved in mode M but not in wider modes. But that seems > like too much of a special case to me, both in terms of the source and > the destination: Agreed. 
When I looked at doing it that way back in Jan, my conclusion was that if we did it that way we would end up with more or less the same code, but instead of:

  if (GET_CODE (setter) == CLOBBER_HIGH
      && reg_is_clobbered_by_clobber_high (REGNO (dest),
                                           GET_MODE (rsp->last_set_value))

it now becomes something like:

  if (GET_CODE (setter) == SET
      && REG_P (dest) && HARD_REGISTER_P (dest)
      && REG_P (src) && REGNO (dest) == REGNO (src)
      && reg_is_clobbered_by_self_set (REGNO (dest),
                                       GET_MODE (rsp->last_set_value))

Ok, some of that code can go into a macro, but it feels much clearer to explicitly check for CLOBBER_HIGH rather than applying special semantics to a specific SET case. Alan. From linux@carewolf.com Thu May 31 11:12:00 2018 From: linux@carewolf.com (Allan Sandfeld Jensen) Date: Thu, 31 May 2018 11:12:00 -0000 Subject: Enabling -ftree-slp-vectorize on -O2/Os In-Reply-To: <2659301.XPQk3P0qmd@twilight> References: <2659301.XPQk3P0qmd@twilight> Message-ID: <3970227.yigO0oRqvu@twilight> Okay, I think I can withdraw the suggestion. It is apparently not providing stable end performance. I would like to end by sharing the measurements I made that motivated me to suggest the change. Hopefully they can be useful if tree-slp-vectorize gets improved and the suggestion comes up again. As I said previously, the benchmarks I ran were not affected, probably because most things we benchmark in Qt are hand-optimized already, but the binary size with tree-slp-vectorize was on average one or two percent smaller, though this was not universal, and many smaller libraries were unaffected. 
----------------------------
gcc-8 version 8.1.0 (Debian 8.1.0-4)
gcc-7 version 7.3.0 (Debian 7.3.0-19)
Qt version 5.11.0 (edited to override selective use of -O3)

   size  library

g++-7 -march=corei7 -O2
8015632  libQt5Widgets.so.5.11.0
6194288  libQt5Gui.so.5.11.0
 760016  libQt5DBus.so.5.11.0
5603160  libQt5Core.so.5.11.0

g++-7 -march=corei7 -O2 -ftree-slp-vectorize
8007440  libQt5Widgets.so.5.11.0
6182000  libQt5Gui.so.5.11.0
 760016  libQt5DBus.so.5.11.0
5603224  libQt5Core.so.5.11.0

g++-8 -O2
8062520  libQt5Widgets.so.5.11.0
6232160  libQt5Gui.so.5.11.0
 765584  libQt5DBus.so.5.11.0
5848528  libQt5Core.so.5.11.0

g++-8 -O2 -ftree-slp-vectorize
8058424  libQt5Widgets.so.5.11.0
6219872  libQt5Gui.so.5.11.0
 769680  libQt5DBus.so.5.11.0
5844560  libQt5Core.so.5.11.0

g++-8 -march=corei7 -O2
8062520  libQt5Widgets.so.5.11.0
6215584  libQt5Gui.so.5.11.0
 765584  libQt5DBus.so.5.11.0
5844440  libQt5Core.so.5.11.0

g++-8 -march=corei7 -O2 -ftree-slp-vectorize
8046136  libQt5Widgets.so.5.11.0
6191008  libQt5Gui.so.5.11.0
 765584  libQt5DBus.so.5.11.0
5840472  libQt5Core.so.5.11.0

g++-8 -march=haswell -O2
8046136  libQt5Widgets.so.5.11.0
6170408  libQt5Gui.so.5.11.0
 765584  libQt5DBus.so.5.11.0
5852448  libQt5Core.so.5.11.0

g++-8 -march=haswell -O2 -ftree-slp-vectorize
8046136  libQt5Widgets.so.5.11.0
6158120  libQt5Gui.so.5.11.0
 765584  libQt5DBus.so.5.11.0
5848480  libQt5Core.so.5.11.0

g++-8 -march=haswell -Os
6990368  libQt5Widgets.so.5.11.0
5030616  libQt5Gui.so.5.11.0
 624160  libQt5DBus.so.5.11.0
4847056  libQt5Core.so.5.11.0

g++-8 -march=haswell -Os -ftree-slp-vectorize
6986272  libQt5Widgets.so.5.11.0
5018328  libQt5Gui.so.5.11.0
 624160  libQt5DBus.so.5.11.0
4847120  libQt5Core.so.5.11.0

g++-8 -march=haswell -Os -flto
6785760  libQt5Widgets.so.5.11.0
4844464  libQt5Gui.so.5.11.0
 593488  libQt5DBus.so.5.11.0
4688432  libQt5Core.so.5.11.0

g++-8 -march=haswell -Os -flto -ftree-slp-vectorize
6777568  libQt5Widgets.so.5.11.0
4836272  libQt5Gui.so.5.11.0
 593488  libQt5DBus.so.5.11.0
4688472  libQt5Core.so.5.11.0

From 
Matthew.Fortune@mips.com Thu May 31 14:23:00 2018 From: Matthew.Fortune@mips.com (Matthew Fortune) Date: Thu, 31 May 2018 14:23:00 -0000 Subject: RISC-V ELF multilibs In-Reply-To: References: Message-ID: Palmer Dabbelt writes: > On Tue, 29 May 2018 11:02:58 PDT (-0700), Jim Wilson wrote: > > On 05/26/2018 06:04 AM, Sebastian Huber wrote: > >> Why is the default multilib and a variant identical? > > > > This is supposed to be a single multilib, with two names. We use > > MULTILIB_REUSE to map the two names to a single multilib. > > > > rohan:1030$ ./xgcc -B./ -march=rv64imafdc -mabi=lp64d --print-libgcc > > ./rv64imafdc/lp64d/libgcc.a > > rohan:1031$ ./xgcc -B./ -march=rv64gc -mabi=lp64d --print-libgcc > > ./rv64imafdc/lp64d/libgcc.a > > rohan:1032$ ./xgcc -B./ --print-libgcc > > ./libgcc.a > > rohan:1033$ > > > > So this is working right when the -march option is given, but not > when > > no -march is given. I'd suggest a bug report so I can track this, if > > you haven't already filed one. > > IIRC this is actually a limit of the GCC build system: there needs to > be some > default multilib, and it has to be unprefixed. I wanted to keep the > library > paths orthogonal (ie, not bake in a default that rv64gc/lp64d lives at > /lib), > so I chose to just build a redundant multilib. > > It'd be great to get rid of this, but I'm afraid it's way past my level > of > understanding as to how all this works. I do actually have a solution for this but it is not submitted upstream. MIPS has basically the same set of problems that RISC-V does in this area and in an ideal world there would be no 'fallback' multilib such that if you use compiler options that map to a library variant that does not exist then the linker just fails to find any libraries at all rather than using the default multilib. I can share the raw patch for this and try to give you some idea about how it works. 
I am struggling to find time to do much open source support at the moment so may not be able to do all the due diligence to get it committed. Would you be willing to take a look and do some of the work to get it in tree? Matthew From joseph@codesourcery.com Thu May 31 16:26:00 2018 From: joseph@codesourcery.com (Joseph Myers) Date: Thu, 31 May 2018 16:26:00 -0000 Subject: not computable at load time In-Reply-To: References: <1A72BAC7-9DFB-4F98-9191-DDE896021A41@comcast.net> <5D1DB310-D460-4A04-A0ED-8C9941D8A9F9@comcast.net> Message-ID: On Tue, 29 May 2018, Richard Biener wrote: > The testcase dates back to some repository creation rev. (egcs?) and > I'm not sure we may compute the difference of addresses of structure > members. So that GCC accepts this is probably not required. Joseph > may have a definitive answer here. My model of constant expressions for GNU C says this sort of subtraction (of two address constants based on the same object or function address) is a symbolic difference constant expression, which should be accepted as constant in initializers as an extension; it should be folded like other offsetof-like constructs. That should not depend on whether the result gets converted to a type of different width. However, the result of converting an address constant pointer to an integer type is only expected to be a constant expression if the resulting type *is* the same width as pointers. -- Joseph S. Myers joseph@codesourcery.com From joseph@codesourcery.com Thu May 31 16:35:00 2018 From: joseph@codesourcery.com (Joseph Myers) Date: Thu, 31 May 2018 16:35:00 -0000 Subject: Adding a libgcc file In-Reply-To: <3C557486-765A-4A37-89CD-34F01984B2FA@comcast.net> References: <3C557486-765A-4A37-89CD-34F01984B2FA@comcast.net> Message-ID: On Tue, 29 May 2018, Paul Koning wrote: > Question about proper target maintainer procedures... > > The pdp11 target needs udivhi3 in libgcc. There's udivsi3, and it's > really easy to tweak those files for HImode. 
And that works. > > Should I add the HI files to the libgcc directory, or under > config/pdp11? There's nothing target specific about them, though I > don't know of other targets that might want this. The existing mechanism for building libgcc functions for different types is LIBGCC2_UNITS_PER_WORD. That may be defined in target .h files (currently those in gcc/), and also via siditi-object.mk which is used to build certain conversion functions for multiple types. As I understand it, you want to build certain non-conversion functions for multiple type as well. There are a few libgcc/config files that do define LIBGCC2_UNITS_PER_WORD to 2 before defining some L_* macros and including libgcc2.c, in order to define HImode functions (libgcc2.h then deals with getting the functions appropriately named via the definitions of __NW and __NDW, and getting them to use the right types via definitions of Wtype etc.). So you could just add such a file to config/pdp11, or you could try to develop a more general mechanism for targets to select HImode functions they want built in libgcc and for the common build machinery then to build those functions (and then refactor how existing targets build such functions accordingly). -- Joseph S. Myers joseph@codesourcery.com From paulkoning@comcast.net Thu May 31 18:33:00 2018 From: paulkoning@comcast.net (Paul Koning) Date: Thu, 31 May 2018 18:33:00 -0000 Subject: Adding a libgcc file In-Reply-To: References: <3C557486-765A-4A37-89CD-34F01984B2FA@comcast.net> Message-ID: <51DBDFB3-76A1-4C0F-BA05-10035567AE1D@comcast.net> > On May 31, 2018, at 12:35 PM, Joseph Myers wrote: > > On Tue, 29 May 2018, Paul Koning wrote: > >> Question about proper target maintainer procedures... >> >> The pdp11 target needs udivhi3 in libgcc. There's udivsi3, and it's >> really easy to tweak those files for HImode. And that works. >> >> Should I add the HI files to the libgcc directory, or under >> config/pdp11? 
There's nothing target specific about them, though I >> don't know of other targets that might want this. > > The existing mechanism for building libgcc functions for different types > is LIBGCC2_UNITS_PER_WORD. That may be defined in target .h files > (currently those in gcc/), and also via siditi-object.mk which is used to > build certain conversion functions for multiple types. > > As I understand it, you want to build certain non-conversion functions for > multiple type as well. There are a few libgcc/config files that do define > LIBGCC2_UNITS_PER_WORD to 2 before defining some L_* macros and including > libgcc2.c, in order to define HImode functions (libgcc2.h then deals with > getting the functions appropriately named via the definitions of __NW and > __NDW, and getting them to use the right types via definitions of Wtype > etc.). > > So you could just add such a file to config/pdp11, or you could try to > develop a more general mechanism for targets to select HImode functions > they want built in libgcc and for the common build machinery then to build > those functions (and then refactor how existing targets build such > functions accordingly). I see udivdi in libgcc2.c, but udivsi is provided in separate file udivmod.c. That was introduced in 2001 by Rainer Orth. This code is used in three targets that I can see: cr16, iq2000, and pdp11. So it sounds like the cleaner answer is to generalize the libgcc2 code to provide the different length [u]div functions needed: DI for whoever uses it now, SI for those three, and HI for pdp11. I can give that a try, it's a more complex change but generality seems good. paul From jimw@sifive.com Thu May 31 19:28:00 2018 From: jimw@sifive.com (Jim Wilson) Date: Thu, 31 May 2018 19:28:00 -0000 Subject: RISC-V ELF multilibs In-Reply-To: References: Message-ID: On Thu, May 31, 2018 at 7:23 AM, Matthew Fortune wrote: > I do actually have a solution for this but it is not submitted upstream. 
> MIPS has basically the same set of problems that RISC-V does in this area > and in an ideal world there would be no 'fallback' multilib such that if > you use compiler options that map to a library variant that does not > exist then the linker just fails to find any libraries at all rather than > using the default multilib. > > I can share the raw patch for this and try to give you some idea about how > it works. I am struggling to find time to do much open source support at > the moment so may not be able to do all the due diligence to get it > committed. Would you be willing to take a look and do some of the work to > get it in tree? I have a long list of things on my to do list. RISC-V is a new target, and there is lots of stuff that needs to be bug fixed, finished, or added. I can't make any guarantees. But if you file a bug report and then attach a patch to it, someone might volunteer to help finish it. Or if it is too big to be reasonably attached to a bug report (like the nano mips work) you could put it on a branch, and mention the branch name as unfinished work in a bug report. Jim From gccadmin@gcc.gnu.org Thu May 31 22:41:00 2018 From: gccadmin@gcc.gnu.org (gccadmin@gcc.gnu.org) Date: Thu, 31 May 2018 22:41:00 -0000 Subject: gcc-7-20180531 is now available Message-ID: <20180531224113.110540.qmail@sourceware.org> Snapshot gcc-7-20180531 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/7-20180531/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 7 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch revision 261044 You'll find: gcc-7-20180531.tar.xz Complete GCC SHA256=d2115e1a8be5c02f8426dff6035a1e4d74257f41597a04c6ab5a322c9dee9862 SHA1=e1c6c5cc8e4cae17182e25da778c4efa4f59d0ea Diffs from 7-20180524 are available in the diffs/ subdirectory. 
When a particular snapshot is ready for public consumption the LATEST-7 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
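[Editor's note: the SHA256 line published in these announcements can be fed straight to sha256sum to verify a downloaded tarball. A sketch, using the filename and checksum from the announcement above; the existence check keeps the snippet harmless when the tarball is not present.]

```shell
# Verify the snapshot tarball against the published SHA256 checksum.
tarball=gcc-7-20180531.tar.xz
sum=d2115e1a8be5c02f8426dff6035a1e4d74257f41597a04c6ab5a322c9dee9862
if [ -f "$tarball" ]; then
  # sha256sum -c reads "checksum  filename" lines and prints OK or FAILED.
  echo "$sum  $tarball" | sha256sum -c -
else
  echo "$tarball not found; download it first"
fi
```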