This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.
Re: Need C optimization help
- From: Mark Rose <markrose at markrose dot ca>
- To: gcc-help at gcc dot gnu dot org
- Date: Sat, 15 May 2010 16:01:39 -0400
- Subject: Re: Need C optimization help
- References: <201005151559.30889.markrose@markrose.ca>
I forgot to save the reduced file before attaching it. Here is the
proper attachment.
-Mark
On Saturday 15 May 2010 3:59:30 pm Mark Rose wrote:
> Hi All,
>
> I have a line that gets called approximately 2 trillion times according to
> valgrind, and I'd love any suggestions for speeding it up. I have taken over
> the project from someone else and my C abilities are only intermediate.
> I've attached a trimmed-down version of the function.
>
> Here are some notes about the code:
> * tsMemAlloc and tsMemFree are basically wrappers to malloc/free.
> * dsfmt_genrand_close_open() generates a number in the range [0,1)
>
> And notes about the data:
> * n_topics is 35 (loaded from a configuration file; changeable)
> * n_tokens is about 7M (and increases as the data set grows)
> * params->iter is 500
> * topics->wcount size is about 190k items
> * topics->dcount size is about 275k items
> * topics->tcount size is 35 items
>
> I'm compiling on an Opteron, with options: -O3 -std=gnu99 -march=native
> -ffast-math -ftree-loop-distribution -ftree-loop-linear
> -ftree-interchange -floop-block -ftree-vectorizer-verbose=8
>
> I'm using GCC 4.4.3.
>
> The tree vectorizer spits out this note about the loop I'm particularly
> interested in:
>
> for (int k = 1; k < n_topics; k++)
>     cdf[k] = cdf[k-1] + (dcount[k+d_offset] + alpha)*(wcount[k+w_offset] + beta)/(tcount[k] + wbeta);
> note: not vectorized: data ref analysis failed D.8597_92 = *D.8596_91;
>
> I've spent hours googling, playing with restrict keywords, and splitting the
> cdf[k-1] addition into another loop, and nothing has helped. The error
> message itself is very poor; I can't find a decent explanation anywhere
> online of what it means or how to fix it.
>
> Thanks for whatever insight you can give!
>
> -Mark
>
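A minimal sketch of what the helpers mentioned in the notes above are assumed to look like, with hypothetical signatures inferred from the call sites in the attached function (the real project code may differ); dsfmt_init_gen_rand() and dsfmt_genrand_close_open() are the standard dSFMT routines for seeding the generator and for drawing a double in [0,1):

#include <stdlib.h>

/* Hypothetical sketch of the allocator wrappers ("basically wrappers to
 * malloc/free"), inferred from calls such as
 * tsMemAlloc((void **)&cdf, nbytes) and tsMemFree(cdf). */
static int tsMemAlloc(void **ptr, size_t nbytes)
{
    *ptr = malloc(nbytes);
    return (*ptr != NULL) ? 0 : -1;   /* assumed 0-on-success convention */
}

static void tsMemFree(void *ptr)
{
    free(ptr);
}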
// Perform the LDA operation
void tsTopicLDA(TSTopics * topics, TSParams * params, TSCorpus * corpus)
{
    // Initialize the pseudo-random number generator (PRNG)
    dsfmt_t prng;
    dsfmt_init_gen_rand(&prng, params->seed);
    // Initialize the solver
    randomInit(topics, corpus->tokens, &prng);
    // Stores the estimated CDF
    double * cdf;
    tsMemAlloc((void **)&cdf, sizeof(double)*params->topics);
    // Dereference all of the pointers (most are array pointers anyway so this can be done)
    const unsigned int n_tokens = corpus->n_tokens;
    const unsigned int n_topics = topics->n_topics;
    const double alpha = params->alpha;
    const double beta = params->beta;
    const double wbeta = (double)corpus->n_words*params->beta;
    unsigned int * wrd_tokens = &corpus->tokens[0];
    unsigned int * doc_tokens = &corpus->tokens[n_tokens];
    unsigned int * restrict wcount = topics->wcount;
    unsigned int * restrict dcount = topics->dcount;
    unsigned int * restrict tcount = topics->tcount;
    unsigned int * restrict assign = topics->assign;
    // Run the Gibbs Sampler for the given burn-in duration
    for (int i = 0; i < params->iter; i++)
    {
        // Sample each word
        for (int j = 0; j < corpus->n_tokens; j++)
        {
            const unsigned int w = wrd_tokens[j];
            const unsigned int d = doc_tokens[j];
            const unsigned int t = assign[j];
            const unsigned int w_offset = w*n_topics;
            const unsigned int d_offset = d*n_topics;
            double * restrict rcdf = cdf;
            // Decrement the counters for the current word to remove its influence
            --wcount[t + w_offset];
            --dcount[t + d_offset];
            --tcount[t];
            // Infer the CDF from the current state
            cdf[0] = (dcount[d_offset] + alpha)*(wcount[w_offset] + beta)/(tcount[0] + wbeta);
            for (int k = 1; k < n_topics; k++)
                cdf[k] = cdf[k-1] + (dcount[k+d_offset] + alpha)*(wcount[k+w_offset] + beta)/(tcount[k] + wbeta);
            // Choose the most likely topic assignment based on the CDF
            double u = cdf[n_topics-1]*dsfmt_genrand_close_open(&prng);
            int new_topic = -1;
            for (int k = 0; k < n_topics; k++)
                if (rcdf[k] > u)
                {
                    new_topic = k;
                    break;
                }
            assign[j] = new_topic;
            // Increment the appropriate counts
            ++dcount[new_topic + d_offset];
            ++wcount[new_topic + w_offset];
            ++tcount[new_topic];
        }
    }
    // Free unused resources
    tsMemFree(cdf);
}
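For reference, here is a minimal sketch of the "splitting the cdf[k-1] addition into another loop" idea mentioned in the quoted message, using the same variable names as the attached function; it is a sketch, not the poster's code. The element-wise ratio pass has no dependence between iterations, so it is the natural vectorization candidate, while the remaining prefix sum is inherently sequential because each cdf[k] depends on cdf[k-1]. On its own it does not address the "data ref analysis failed" note, which concerns the analysis of one of the array accesses rather than the recurrence.

// Sketch (assumed restructuring of the CDF loop in the inner sampler body):
// build the per-topic ratios in one pass, then accumulate the running sum.
for (int k = 0; k < n_topics; k++)
    cdf[k] = (dcount[k+d_offset] + alpha)*(wcount[k+w_offset] + beta)
             /(tcount[k] + wbeta);
for (int k = 1; k < n_topics; k++)
    cdf[k] += cdf[k-1];   // prefix sum: each element depends on the previous one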