This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: _mm_store_pd/s translated to movntpd/s


Matthias Kretz <kretz@compeng.uni-frankfurt.de> writes:

> On Monday 21 March 2011 15:23:02 Matthias Kretz wrote:
>> I tested the GCC 4.6.0 RC on Intel systems with good success so far. Now I
>> tested on an AMD Magny-Cours using the -march=barcelona flag and gcc
>> translated _mm_store_pd/s calls in the code to streaming stores in the
>> resulting binary.
>> 
>> Where does this "optimization" come from and how can I disable it? This
>> doesn't make much sense on a working set that fits into the cache...
>> 
>> Is this intended behavior or a bug?
>
> Additional info: If I add -fno-prefetch-loop-arrays I get normal stores as 
> expected. I don't consider this a solution, though.

That option is precisely where this optimization comes from: with
-fprefetch-loop-arrays in effect, the vectorizer pretty much assumes
that the working set doesn't fit in the cache.  I think it would be
reasonable to have an option to control this.  Please consider filing a
bug report as described at http://gcc.gnu.org/bugs/ , ideally with a
test case.

Ian
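
A reduced test case for such a report might look roughly like the sketch
below. It is not the code from the original report: the function name
scale_store, the scaling loop, and the alignment checks are all
hypothetical, chosen only to give the compiler a simple loop that ends
in _mm_store_pd. Whether this particular loop is actually turned into
movntpd depends on the compiler version and flags and has to be checked
in the generated assembly.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_pd, _mm_mul_pd, _mm_set1_pd, _mm_store_pd */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced test case: scale n doubles from src into dst.
 * Both pointers must be 16-byte aligned and n must be even. */
void scale_store(double *dst, const double *src, size_t n, double factor)
{
    assert(((uintptr_t)dst % 16) == 0 && ((uintptr_t)src % 16) == 0);
    assert(n % 2 == 0);

    const __m128d f = _mm_set1_pd(factor);
    for (size_t i = 0; i < n; i += 2) {
        __m128d v = _mm_load_pd(src + i);
        /* Plain aligned store; the report is that this can be emitted
         * as a streaming store (movntpd) instead of movapd. */
        _mm_store_pd(dst + i, _mm_mul_pd(v, f));
    }
}
```

One way to check would be to compile this once with -O3 -march=barcelona -S
and once with -fno-prefetch-loop-arrays added, then compare the two
assembly files for movntpd versus movapd. The -O3 level is an assumption
here; the original report only mentions -march=barcelona.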

