This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PING] lfloor/lceil and rint SSE expansion for x86_64/i?86


> 
> On Sun, 29 Oct 2006, Richard Guenther wrote:
> > [PATCH][4.3] Expand lfloor/lceil inline for x86_64/i?86 SSE math
> > http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00912.html
> 
> This is OK for mainline.
> 
> > [PATCH][4.3] Expand rint inline for x86_64/i?86 SSE math
> > http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00987.html
> 
> As is this.

Hi,
this patch seems to noticeably increase memory consumption.  Perhaps
it is the setup cost of the optabs?

Honza

comparing combine.c compilation at -O0 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 9150k to 9282k, overall 1.44%
  Peak amount of GGC memory still allocated after garbage collecting increased from 8689k to 8821k, overall 1.52%
  Amount of memory still referenced at the end of compilation increased from 6242k to 6441k, overall 3.20%
    Overall memory needed: 28191k -> 28319k
    Peak memory use before GGC: 9150k -> 9282k
    Peak memory use after GGC: 8689k -> 8821k
    Maximum of released memory in single GGC run: 2665k -> 2666k
    Garbage: 36843k -> 36829k
    Leak: 6242k -> 6441k
    Overhead: 4803k -> 4856k
    GGC runs: 288 -> 280
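
The "overall" percentages in these reports read as the increase relative to the old value. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and since the tester presumably works from unrounded byte counts, recomputing from the rounded kilobyte figures may differ in the last digit for some lines.

```python
def relative_increase(old_kb: int, new_kb: int) -> float:
    """Return the increase from old_kb to new_kb as a percentage of old_kb."""
    return (new_kb - old_kb) / old_kb * 100.0


# Peak GGC memory before collection for combine.c at -O0: 9150k -> 9282k
print(f"{relative_increase(9150, 9282):.2f}%")  # prints 1.44%
```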

comparing combine.c compilation at -O1 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 17138k to 17269k, overall 0.76%
  Peak amount of GGC memory still allocated after garbage collecting increased from 16963k to 17094k, overall 0.77%
  Amount of produced GGC garbage increased from 57220k to 57302k, overall 0.14%
  Amount of memory still referenced at the end of compilation increased from 6298k to 6482k, overall 2.92%
    Overall memory needed: 39599k -> 39731k
    Peak memory use before GGC: 17138k -> 17269k
    Peak memory use after GGC: 16963k -> 17094k
    Maximum of released memory in single GGC run: 2311k -> 2263k
    Garbage: 57220k -> 57302k
    Leak: 6298k -> 6482k
    Overhead: 6104k -> 6165k
    GGC runs: 363 -> 354

comparing combine.c compilation at -O2 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 17134k to 17266k, overall 0.77%
  Peak amount of GGC memory still allocated after garbage collecting increased from 16963k to 17094k, overall 0.77%
  Amount of memory still referenced at the end of compilation increased from 6407k to 6582k, overall 2.74%
    Overall memory needed: 29646k -> 29778k
    Peak memory use before GGC: 17134k -> 17266k
    Peak memory use after GGC: 16963k -> 17094k
    Maximum of released memory in single GGC run: 2873k -> 2855k
    Garbage: 77596k -> 77544k
    Leak: 6407k -> 6582k
    Overhead: 8891k -> 8943k
    GGC runs: 432 -> 421

comparing combine.c compilation at -O3 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 18138k to 18270k, overall 0.73%
  Peak amount of GGC memory still allocated after garbage collecting increased from 17694k to 17826k, overall 0.75%
  Amount of memory still referenced at the end of compilation increased from 6472k to 6660k, overall 2.90%
    Overall memory needed: 28750k -> 28882k
    Peak memory use before GGC: 18138k -> 18270k
    Peak memory use after GGC: 17694k -> 17826k
    Maximum of released memory in single GGC run: 4032k -> 4036k
    Garbage: 108216k -> 108102k
    Leak: 6472k -> 6660k
    Overhead: 12528k -> 12567k
    GGC runs: 479 -> 470

comparing insn-attrtab.c compilation at -O0 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 69635k to 69766k, overall 0.19%
  Peak amount of GGC memory still allocated after garbage collecting increased from 44045k to 44176k, overall 0.30%
  Amount of memory still referenced at the end of compilation increased from 9302k to 9486k, overall 1.98%
    Overall memory needed: 88090k -> 88222k
    Peak memory use before GGC: 69635k -> 69766k
    Peak memory use after GGC: 44045k -> 44176k
    Maximum of released memory in single GGC run: 36964k
    Garbage: 129046k -> 129071k
    Leak: 9302k -> 9486k
    Overhead: 16936k -> 16989k
    GGC runs: 225 -> 217

comparing insn-attrtab.c compilation at -O1 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 89703k to 89834k, overall 0.15%
  Peak amount of GGC memory still allocated after garbage collecting increased from 83079k to 83210k, overall 0.16%
  Amount of memory still referenced at the end of compilation increased from 9144k to 9328k, overall 2.01%
    Overall memory needed: 114014k -> 114138k
    Peak memory use before GGC: 89703k -> 89834k
    Peak memory use after GGC: 83079k -> 83210k
    Maximum of released memory in single GGC run: 31805k -> 31806k
    Garbage: 276279k -> 276300k
    Leak: 9144k -> 9328k
    Overhead: 29429k -> 29482k
    GGC runs: 228 -> 223

comparing insn-attrtab.c compilation at -O2 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 91980k to 92111k, overall 0.14%
  Peak amount of GGC memory still allocated after garbage collecting increased from 84085k to 84216k, overall 0.16%
  Amount of memory still referenced at the end of compilation increased from 9145k to 9329k, overall 2.01%
    Overall memory needed: 111510k -> 111658k
    Peak memory use before GGC: 91980k -> 92111k
    Peak memory use after GGC: 84085k -> 84216k
    Maximum of released memory in single GGC run: 30368k
    Garbage: 319256k -> 319269k
    Leak: 9145k -> 9329k
    Overhead: 36755k -> 36808k
    GGC runs: 254 -> 249

comparing insn-attrtab.c compilation at -O3 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 92006k to 92137k, overall 0.14%
  Peak amount of GGC memory still allocated after garbage collecting increased from 84111k to 84242k, overall 0.16%
  Amount of memory still referenced at the end of compilation increased from 9148k to 9332k, overall 2.01%
    Overall memory needed: 111538k -> 111686k
    Peak memory use before GGC: 92006k -> 92137k
    Peak memory use after GGC: 84111k -> 84242k
    Maximum of released memory in single GGC run: 30559k
    Garbage: 319864k -> 319883k
    Leak: 9148k -> 9332k
    Overhead: 36945k -> 36998k
    GGC runs: 257 -> 252

comparing Gerald's testcase PR8361 compilation at -O0 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 92503k to 92635k, overall 0.14%
  Peak amount of GGC memory still allocated after garbage collecting increased from 91586k to 91717k, overall 0.14%
  Amount of memory still referenced at the end of compilation increased from 47478k to 47662k, overall 0.39%
    Overall memory needed: 119362k -> 119490k
    Peak memory use before GGC: 92503k -> 92635k
    Peak memory use after GGC: 91586k -> 91717k
    Maximum of released memory in single GGC run: 19296k -> 19299k
    Garbage: 205539k -> 205556k
    Leak: 47478k -> 47662k
    Overhead: 20759k -> 20811k
    GGC runs: 403 -> 402

comparing Gerald's testcase PR8361 compilation at -O1 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 97690k to 97821k, overall 0.13%
  Peak amount of GGC memory still allocated after garbage collecting increased from 95480k to 95611k, overall 0.14%
  Amount of memory still referenced at the end of compilation increased from 49811k to 49994k, overall 0.37%
    Overall memory needed: 119114k -> 119222k
    Peak memory use before GGC: 97690k -> 97821k
    Peak memory use after GGC: 95480k -> 95611k
    Maximum of released memory in single GGC run: 18569k
    Garbage: 440912k -> 440853k
    Leak: 49811k -> 49994k
    Overhead: 32086k -> 32124k
    GGC runs: 550

comparing Gerald's testcase PR8361 compilation at -O2 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 97690k to 97820k, overall 0.13%
  Peak amount of GGC memory still allocated after garbage collecting increased from 95481k to 95611k, overall 0.14%
  Amount of memory still referenced at the end of compilation increased from 50527k to 50711k, overall 0.36%
    Overall memory needed: 119102k -> 119178k
    Peak memory use before GGC: 97690k -> 97820k
    Peak memory use after GGC: 95481k -> 95611k
    Maximum of released memory in single GGC run: 18569k
    Garbage: 507731k -> 507740k
    Leak: 50527k -> 50711k
    Overhead: 40563k -> 40630k
    GGC runs: 614 -> 613

comparing Gerald's testcase PR8361 compilation at -O3 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 97735k to 97868k, overall 0.14%
  Peak amount of GGC memory still allocated after garbage collecting increased from 96767k to 96898k, overall 0.14%
  Amount of memory still referenced at the end of compilation increased from 50103k to 50287k, overall 0.37%
    Overall memory needed: 118774k -> 118882k
    Peak memory use before GGC: 97735k -> 97868k
    Peak memory use after GGC: 96767k -> 96898k
    Maximum of released memory in single GGC run: 18831k
    Garbage: 526554k -> 526587k
    Leak: 50103k -> 50287k
    Overhead: 41008k -> 41062k
    GGC runs: 624 -> 622

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 81755k to 81886k, overall 0.16%
  Peak amount of GGC memory still allocated after garbage collecting increased from 58635k to 58766k, overall 0.22%
  Amount of memory still referenced at the end of compilation increased from 7323k to 7507k, overall 2.51%
    Overall memory needed: 137806k -> 137934k
    Peak memory use before GGC: 81755k -> 81886k
    Peak memory use after GGC: 58635k -> 58766k
    Maximum of released memory in single GGC run: 45494k
    Garbage: 147237k -> 147250k
    Leak: 7323k -> 7507k
    Overhead: 25243k -> 25296k
    GGC runs: 85 -> 83

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
  Amount of memory still referenced at the end of compilation increased from 47389k to 47572k, overall 0.39%
    Overall memory needed: 425902k -> 426038k
    Peak memory use before GGC: 203328k -> 203459k
    Peak memory use after GGC: 199104k -> 199235k
    Maximum of released memory in single GGC run: 100817k
    Garbage: 268716k -> 268729k
    Leak: 47389k -> 47572k
    Overhead: 30176k -> 30229k
    GGC runs: 103 -> 101

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
  Amount of memory still referenced at the end of compilation increased from 47972k to 48156k, overall 0.38%
    Overall memory needed: 349298k -> 349494k
    Peak memory use before GGC: 204084k -> 204210k
    Peak memory use after GGC: 199860k -> 199987k
    Maximum of released memory in single GGC run: 107085k -> 107089k
    Garbage: 358220k -> 358246k
    Leak: 47972k -> 48156k
    Overhead: 47776k -> 47830k
    GGC runs: 110 -> 108

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
  Amount of memory still referenced at the end of compilation increased from 65304k to 65488k, overall 0.28%
    Overall memory needed: 535318k -> 535282k
    Peak memory use before GGC: 314776k -> 314907k
    Peak memory use after GGC: 293119k -> 293250k
    Maximum of released memory in single GGC run: 163448k
    Garbage: 491181k -> 491201k
    Leak: 65304k -> 65488k
    Overhead: 59034k -> 59087k
    GGC runs: 97 -> 95
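
For anyone wanting to tabulate the indented "old -> new" summary lines above, here is a rough sketch of a parser. It is not the tester's own script; the regular expression and function name are assumptions based only on the layout visible in this message. Lines that carry a single value (metric unchanged) are skipped.

```python
import re

# Matches indented summary lines such as "    Leak: 6242k -> 6441k".
# The "k" suffix is optional so that "GGC runs: 288 -> 280" also matches.
SUMMARY_RE = re.compile(
    r"^\s+(?P<name>[^:]+):\s+(?P<old>\d+)k?\s+->\s+(?P<new>\d+)k?\s*$"
)


def parse_summary(text: str) -> dict[str, tuple[int, int]]:
    """Map each changed metric name to its (old, new) pair."""
    result = {}
    for line in text.splitlines():
        m = SUMMARY_RE.match(line)
        if m:
            result[m.group("name")] = (int(m.group("old")), int(m.group("new")))
    return result


# Lines copied from the combine.c -O0 and insn-attrtab.c -O0 reports.
sample = """\
    Overall memory needed: 28191k -> 28319k
    Leak: 6242k -> 6441k
    GGC runs: 288 -> 280
    Maximum of released memory in single GGC run: 36964k
"""

for name, (old, new) in parse_summary(sample).items():
    print(f"{name}: {new - old:+d}")
```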

Head of the ChangeLog is:

--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog	2006-10-29 03:48:29.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog	2006-10-29 15:20:12.000000000 +0000
@@ -1,3 +1,47 @@
+2006-10-29  Richard Guenther  <rguenther@suse.de>
+
+	* genopinit.c (optabs): Change lfloor_optab and lceil_optab
+	to conversion optabs.
+	* optabs.c (init_optabs): Initialize lfloor_optab and lceil_optab
+	as conversion optab.
+	* optabs.h (enum optab_index): Remove OTI_lfloor and OTI_lceil.
+	(enum convert_optab_index): Add COI_lfloor and COI_lceil.
+	(lfloor_optab, lceil_optab): Adjust defines.
+	* builtins.c (expand_builtin_int_roundingfn): Adjust for
+	lfloor and lceil optabs now being conversion optabs.
+	* config/i386/i386-protos.h (ix86_expand_lfloorceil): Declare.
+	* config/i386/i386.c (ix86_expand_sse_compare_and_jump):
+	New static helper function.
+	(ix86_expand_lfloorceil): New function to expand lfloor and
+	lceil inline.
+	* config/i386/i386.md (lfloor<mode>2): Split into ...
+	(lfloorxf<mode>2): ... x87 variant
+	(lfloor<mode>di2, lfloor<mode>si2): ... and SSE variants
+	using ix86_expand_lfloorceil.
+	(lceil<mode>2, lceilxf<mode>2, lceil<mode>di2, lceil<mode>si2):
+	Likewise.
+	* doc/md.texi (lfloorMN, lceilMN): Document.
+
+2006-10-29  Richard Sandiford  <richard@codesourcery.com>
+
+	* configure.ac (HAVE_AS_NO_SHARED): New AC_DEFINE.  Test for the
+	-mno-shared assembler option on mips targets.
+	* configure, config.in: Regenerate.
+	* config/mips/linux.h (NO_SHARED_SPECS): New macro.
+	(DRIVER_SELF_SPECS): Define to NO_SHARED_SPECS if non-empty.
+	* config/mips/linux64.h (DRIVER_SELF_SPECS): Include NO_SHARED_SPECS.
+
+2006-10-29  Richard Sandiford  <richard@codesourcery.com>
+
+	* config/mips/mips.c (mips_classify_symbol): Test DECL_WEAK as well
+	as TREE_PUBLIC when deciding whether to return SYMBOL_GOT_GLOBAL.
+
+2006-10-29  Kazu Hirata  <kazu@codesourcery.com>
+
+	* config/darwin.c, config/darwin.opt, config/ia64/itanium1.md,
+	config/ia64/itanium2.md, real.c, tree-ssa-structalias.c: Fix
+	comment typos.
+
 2006-10-28  Kaveh R. Ghazi  <ghazi@caip.rutgers.edu>
 
 	PR middle-end/29335
--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog.cp	2006-10-29 03:48:29.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/cp/ChangeLog	2006-10-29 15:20:12.000000000 +0000
@@ -1,3 +1,7 @@
+2006-10-29  Kazu Hirata  <kazu@codesourcery.com>
+
+	* decl.c: Fix a comment typo.
+
 2006-10-28  Andrew Pinski  <andrew_pinski@playstation.sony.com>
 
 	PR C++/29295


The results can be reproduced by building a compiler with

--enable-gather-detailed-mem-stats targeting x86-64

and compiling preprocessed combine.c or the testcase from PR8361 with:

-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q
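
Reading "-Ox" as a placeholder for the optimization level, the per-level command lines could be assembled roughly as follows. The flags are the ones quoted above; the compiler path "./cc1" and the input name "combine.i" are assumptions for illustration, as is the helper name.

```python
# Flags quoted from the message, split around the -Ox placeholder.
FLAGS_BEFORE_LEVEL = [
    "-fmem-report",
    "--param=ggc-min-heapsize=1024",
    "--param=ggc-min-expand=1",
]
FLAGS_AFTER_LEVEL = ["-Q"]


def build_commands(compiler: str, source: str,
                   levels=("0", "1", "2", "3")) -> list[list[str]]:
    """Return one argv list per optimization level."""
    return [
        [compiler, *FLAGS_BEFORE_LEVEL, f"-O{lvl}", *FLAGS_AFTER_LEVEL, source]
        for lvl in levels
    ]


# "./cc1" and "combine.i" are hypothetical names, not taken from the message.
for cmd in build_commands("./cc1", "combine.i"):
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run` as is; the memory report and the `-Q` progress output would then need to be captured from the compiler's diagnostic stream.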

The memory consumption summary appears in the dump after the detailed
listing of the places where memory is allocated.  Peak memory consumption
is actually computed by looking for the maximal value in the
{GC XXXX -> YYYY} reports.
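
A minimal sketch of that peak computation, assuming the markers look like "{GC 9150k -> 8689k}" and that the first number is the heap size before a collection and the second the size after it. The function name and the sample text are made up for illustration; this is not the tester's actual script.

```python
import re

# Assumed shape of the markers in the dump: "{GC 9150k -> 8689k}".
GC_RE = re.compile(r"\{GC (\d+)k -> (\d+)k\}")


def peak_memory(dump: str) -> tuple[int, int]:
    """Return (peak before GGC, peak after GGC) in kilobytes."""
    pairs = [(int(before), int(after)) for before, after in GC_RE.findall(dump)]
    if not pairs:
        raise ValueError("no {GC ...} markers found")
    return max(b for b, _ in pairs), max(a for _, a in pairs)


# Made-up dump fragment for illustration only.
sample = "foo {GC 4100k -> 3900k} bar {GC 9150k -> 8689k} baz {GC 7000k -> 6500k}"
print(peak_memory(sample))  # prints (9150, 8689)
```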

Your testing script.

