[J-core] [musl] Re: Aligned copies and cacheline conflicts?
BGB
cr88192 at gmail.com
Sat Sep 17 09:58:21 EDT 2016
On 9/17/2016 3:25 AM, D. Jeff Dionne wrote:
> On Sep 17, 2016, at 17:08, BGB <cr88192 at gmail.com> wrote:
>> I am left wondering what prevents using a hash in the cache lookup, where presumably a simplistic (presumably XOR based, 1) hash isn't that hard to pull off in VHDL? granted, I don't really know the specifics here.
> Cache lookup and fill are fast path operations (no waiting). Hit shall not cause a CPU stall, and fill shall not incur any more cycles than necessary before the CPU gets the data. Hash is hugely expensive. Therefore, N way cache, where N is 1 to about 4 is the order of the day... Where N>1 requires least recently used logic, etc and we have not implemented that. Direct mapping gives you by far the largest boost.
I didn't mean searching the entire hash table, but rather doing
something analogous to:
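    paddr=addr>>12;                     //page number
    h=paddr^(paddr>>7)^(paddr>>14);     //fold upper bits into the index
    h=h&255;
    mem1=cache[h];                      //primary slot
    mem2=cache[h|256];                  //secondary slot
    if(mem1->paddr==paddr)
    {
        yay...
    }
    else if(mem2->paddr==paddr)
    {
        //hit in the secondary slot: swap so it is checked first next time
        cache[h]=mem2;
        cache[h|256]=mem1;
        yay...
    }else
    {
        //miss: keep the old primary entry as the secondary one,
        //otherwise cache[h|256] would never get filled
        cache[h|256]=mem1;
        load page, put in cache[h].
    }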
Presumably this should not require any additional iteration or similar
to pull off: both slots sit at fixed indices derived from the hash, so
the lookup is two reads and two compares.
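
For reference, the lookup above can be written out as compilable C. This is only a minimal sketch, not anything from the J-core sources: the names slot_t, SETS, xor_hash and lookup are made up for illustration, slots hold page numbers by value rather than pointers, and "load page" is reduced to recording the page number. Demoting the old primary entry on a miss is an assumption about the intended behaviour.

```c
#include <stdint.h>

#define SETS 256                 /* entries per way; matches the h&255 mask */

typedef struct {
    uint32_t paddr;              /* page number held in this slot */
    int      valid;              /* 0 until the slot is first filled */
} slot_t;

/* way 0 lives at [0, 255], way 1 at [256, 511] */
static slot_t cache[2 * SETS];

/* XOR-fold the page number down to an 8-bit index. */
static unsigned xor_hash(uint32_t paddr)
{
    return (paddr ^ (paddr >> 7) ^ (paddr >> 14)) & (SETS - 1);
}

/* Returns 1 on a hit, 0 on a miss (after filling way 0). */
static int lookup(uint32_t addr)
{
    uint32_t paddr = addr >> 12;
    unsigned h = xor_hash(paddr);
    slot_t *w0 = &cache[h];
    slot_t *w1 = &cache[h | SETS];

    if (w0->valid && w0->paddr == paddr)
        return 1;

    if (w1->valid && w1->paddr == paddr) {
        /* hit in way 1: swap so this page is checked first next time */
        slot_t t = *w0;
        *w0 = *w1;
        *w1 = t;
        return 1;
    }

    /* miss: demote the old way-0 entry (assumed policy), then fill way 0 */
    *w1 = *w0;
    w0->paddr = paddr;           /* stand-in for "load page" */
    w0->valid = 1;
    return 0;
}
```

There is no loop anywhere in lookup(): the work per access is one hash, two reads and at most two compares, which is the property being argued for.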
Admittedly, for related reasons (the cost of a full hash search), I
don't do N-way lookups in emulators either. I typically use 1-way or
2-way, albeit with a hash more like "h=((paddr*65521)>>16)&255;".
That would probably be a lot more expensive to build out of logic
gates than a few XORs. In terms of code on the CPU the two cost about
the same, and a prime multiplier tends to be more effective.
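
To make the gate-cost comparison concrete, here is a small C sketch of both hashes. The function names are made up for illustration. It relies on the identity 65521 = 65536 - 16 + 1, so the multiply can be rewritten as shifts plus one subtract and one add. In hardware that would still mean carry-propagating adders roughly as wide as the product, whereas the XOR hash is only wiring and a couple of XOR gates per output bit. How that trades off in an actual design is not something this sketch measures.

```c
#include <stdint.h>

/* XOR-fold hash from earlier in the thread: shifts are free wiring. */
static unsigned hash_xor(uint32_t paddr)
{
    return (paddr ^ (paddr >> 7) ^ (paddr >> 14)) & 255;
}

/* Multiplicative hash with the prime 65521, as used in the emulator case. */
static unsigned hash_mul(uint32_t paddr)
{
    return ((paddr * 65521u) >> 16) & 255;
}

/* Same result without a multiplier: 65521 = 65536 - 16 + 1.
 * Unsigned arithmetic wraps modulo 2^32, so this matches hash_mul()
 * for every 32-bit input. */
static unsigned hash_mul_shift_add(uint32_t paddr)
{
    uint32_t prod = (paddr << 16) - (paddr << 4) + paddr;
    return (prod >> 16) & 255;
}
```

On a CPU with a fast multiplier, hash_mul() is one multiply, one shift and one mask, which is why the two hashes are close in software cost.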
Personally, I haven't usually seen enough gain beyond 2-way to make it
worthwhile, given the added cost of dealing with it.