[J-core] [musl] Re: Aligned copies and cacheline conflicts?

Sat Sep 17 09:58:21 EDT 2016

On 9/17/2016 3:25 AM, D. Jeff Dionne wrote:
> On Sep 17, 2016, at 17:08, BGB <cr88192 at gmail.com> wrote:
>> I am left wondering what prevents using a hash in the cache lookup, where presumably a simplistic (presumably XOR based, 1) hash isn't that hard to pull off in VHDL? granted, I don't really know the specifics here.
> Cache lookup and fill are fast path operations (no waiting).  Hit shall not cause a CPU stall, and fill shall not incur any more cycles than necessary before the CPU gets the data.  Hash is hugely expensive.  Therefore, N way cache, where N is 1 to about 4 is the order of the day...  Where N>1 requires least recently used logic, etc and we have not implemented that.  Direct mapping gives you by far the largest boost.

I didn't mean searching the entire hash table, but rather doing 
something analogous to:
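     paddr=addr>>12;                      // page number (4K pages)
     h=paddr^(paddr>>7)^(paddr>>14);      // XOR-fold the page number
     h=h&255;                             // 256 buckets
     mem1=cache[h];                       // first candidate
     mem2=cache[h|256];                   // second candidate
     if(mem1->paddr==paddr)
         { /* hit in first slot */ }
     else if(mem2->paddr==paddr)
     {
         // hit in second slot: swap so it is checked first next time
         cache[h]=mem2;
         cache[h|256]=mem1;
     }else
     {
         // miss: load page, put in cache[h]
     }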

presumably, this should not require any additional iteration or similar 
to pull off.
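
For reference, a minimal software sketch of the same lookup as compilable 
C. The struct layout, the valid flag, the names xor_hash and cache_lookup, 
and the return codes are all assumptions for illustration. The miss path 
here also demotes the old first entry into the second slot, which is my 
assumption: the pseudocode above only says to put the page in cache[h].

```c
#include <stdint.h>

#define WAY_SIZE 256u  /* entries per way, matching the "& 255" mask */

/* Hypothetical entry: the page number as tag, plus a valid flag. */
struct cache_entry {
    uint32_t paddr;
    int      valid;
};

/* Two ways stored back to back: slot h and slot h|256. */
static struct cache_entry cache[2 * WAY_SIZE];

/* XOR-fold hash: only shifts, XORs and a mask. */
static unsigned xor_hash(uint32_t paddr)
{
    uint32_t h = paddr ^ (paddr >> 7) ^ (paddr >> 14);
    return h & (WAY_SIZE - 1);
}

/* Returns 1 for a hit in the first slot, 2 for a hit in the second
 * slot (entries are swapped), 0 for a miss (first slot is refilled). */
static int cache_lookup(uint32_t addr)
{
    uint32_t paddr = addr >> 12;
    unsigned h = xor_hash(paddr);
    struct cache_entry *e1 = &cache[h];
    struct cache_entry *e2 = &cache[h | WAY_SIZE];

    if (e1->valid && e1->paddr == paddr)
        return 1;

    if (e2->valid && e2->paddr == paddr) {
        /* Swap so the most recently used entry is checked first. */
        struct cache_entry tmp = *e1;
        *e1 = *e2;
        *e2 = tmp;
        return 2;
    }

    /* Miss. Assumption: demote the old first entry instead of
     * dropping it, then install the new page in the first slot. */
    *e2 = *e1;
    e1->paddr = paddr;
    e1->valid = 1;
    return 0;
}
```

Both candidate slots come from the same hash value, so there is no loop 
over the table, only two tag compares.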

admittedly, for related reasons (the cost of a full hash search), I 
don't do N-way lookups in emulators either, instead typically doing 
1-way or 2-way, albeit typically with a hash more like 
"h=((paddr*65521)>>16)&255;". the multiply would probably be a lot more 
expensive to pull off with logic gates than a few XORs (as code on a 
CPU the two cost about the same, and a prime tends to be more effective).
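
A small sketch putting the two hashes side by side in C. The function 
names and the buckets_used helper are hypothetical, and treating paddr as 
a 32-bit unsigned value (so the multiply wraps at 32 bits) is an 
assumption.

```c
#include <stdint.h>

/* XOR-fold hash: shifts and XORs only. */
static unsigned hash_xor_fold(uint32_t paddr)
{
    return (paddr ^ (paddr >> 7) ^ (paddr >> 14)) & 255u;
}

/* Multiplicative hash using the prime 65521; the product wraps
 * modulo 2^32 because of the assumed 32-bit unsigned type. */
static unsigned hash_mul_prime(uint32_t paddr)
{
    return ((paddr * 65521u) >> 16) & 255u;
}

/* Hypothetical helper: how many of the 256 buckets get used by n
 * page numbers spaced 'stride' apart, starting from page 0. */
static unsigned buckets_used(unsigned (*hash)(uint32_t),
                             uint32_t stride, unsigned n)
{
    unsigned char seen[256] = {0};
    unsigned used = 0;

    for (unsigned i = 0; i < n; i++) {
        unsigned h = hash(i * stride);
        if (!seen[h]) {
            seen[h] = 1;
            used++;
        }
    }
    return used;
}
```

Calling buckets_used with each hash and a stride taken from a real access 
pattern gives a rough idea of how evenly the buckets fill. It says nothing 
about gate cost.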

personally, I haven't usually seen enough gain much past 2-way to 
really make it worthwhile (vs the added cost of dealing with it).