This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH for PR 14699


>wrong formula" (the formula used may not look like the usual one, but
>it is correct to the best of my knowledge) and also why this causes an

Yes. My fail, 'wrong' is too loud:

original variant:

  exp_len = (double)total_bytes / (double)nelts;
  exp2_len = exp_len * exp_len;
  exp_len2 = (double) sum_of_squares / (double) nelts;

// ...

  fprintf (stderr, "avg. entry\t%.2f bytes (+/- %.2f)\n",
           exp_len, approx_sqrt (exp_len2 - exp2_len));

folded together:

           -----------------------------------------------
          |
          |         N                /    N     \  2
          |       ----              |   ----     |
          |   1   \      2       1  |   \        |
S =   \   |  ---   )    X    +  --- |    )    X  |
       \  |   N   /      i        2 |   /      i |
        \ |       ----           N  |   ----     |
         \|       i = 1              \  i = 1   /


but exact variant is:

           -----------------------------------------------
          |
          |         N                    /    N     \  2
          |       ----                  |   ----     |
          |   1   \      2         1    |   \        |
S =   \   |  ---   )    X    +  ------  |    )    X  |
       \  |  N-1  /      i      N (N-1) |   /      i |
        \ |       ----                  |   ----     |
         \|       i = 1                  \  i = 1   /



Obviously when N > 1000 difference is negligible, so your formula
_is_ correct.

>abort instead of garbage results.  (Presumably the issue is integer
>overflow?)

Yes, look at hashtable.c:244:
  do
    if (*p)
      {
        size_t n = HT_LEN (*p);

        total_bytes += n;
        sum_of_squares += n * n;
        if (n > longest)
          longest = n;
        nids++;
      }
  while (++p < limit);

So if size_t 32 bit wide, and n >= 65536 we get an overflow at line

        sum_of_square += n * n;

And sum_of_squares becomes fewer than it ought to be, and after

  exp_len = (double)total_bytes / (double)nelts;
  exp2_len = exp_len * exp_len;
  exp_len2 = (double) sum_of_squares / (double) nelts;

// ...

  fprintf (stderr, "avg. entry\t%.2f bytes (+/- %.2f)\n",
           exp_len, approx_sqrt (exp_len2 - exp2_len));

(exp_len2 - exp2_len) becomes less than zero, so we get

hashtable.c:292:
approx_sqrt (double x)
{
  double s, d;

  if (x < 0)
    abort ();

BAMM.

Below is a reduced patch to fix this problem:

2004-03-24  Serge Belyshev  <1319@bot.ru>

	PR 14699
        * hashtable.c (ht_dump_statistics): Change type of sum_of_squares
        from size_t to double.

diff -u hashtable.c.~1.16.~ hashtable.c
--- hashtable.c.~1.16.~	2003-08-23 02:29:17.000000000 +0400
+++ hashtable.c	2004-03-25 01:37:01.998704560 +0300
@@ -228,8 +228,8 @@
 ht_dump_statistics (hash_table *table)
 {
   size_t nelts, nids, overhead, headers;
-  size_t total_bytes, longest, sum_of_squares;
-  double exp_len, exp_len2, exp2_len;
+  size_t total_bytes, longest;
+  double exp_len, exp_len2, exp2_len, sum_of_squares;
   hashnode *p, *limit;
 
 #define SCALE(x) ((unsigned long) ((x) < 1024*10 \
@@ -248,7 +248,7 @@
 	size_t n = HT_LEN (*p);
 
 	total_bytes += n;
-	sum_of_squares += n * n;
+	sum_of_squares += (double) n * n;
 	if (n > longest)
 	  longest = n;
 	nids++;


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]