[J-core] MMU-Design

D. Jeff Dionne Jeff at SE-Instruments.com
Mon Dec 4 10:08:03 EST 2017

Excellent paper with real metrics wrt design decisions.   Old enough that the patents have expired:  ftp://ftp.cs.wisc.edu/pub/techreports/1995/TR1277.pdf


> On Dec 4, 2017, at 09:01, BGB <cr88192 at gmail.com> wrote:
>> On 12/3/2017 1:27 PM, Fabjan Sukalia wrote:
>> Hi!
>> Other architectures that do page table walking in hardware have
>> additional caches dedicated for page tables. AFAIK it is called PDE
>> cache for x86 and intermediate table walk cache for ARM. These caches
>> are crucial for good performance but not mandatory for a working
>> implementation. Maybe these caches are a bit too much for j2+mmu/j3/j32
>> or so.
>> Personally I like the suggestion ([1]) to include a simple design like
>> that of the LM32. The LM32 design has a small overhead of two TLBs and
>> an additional interrupt source, but it lacks two features, ASID and
>> multiple page sizes. Both features can reduce the number of TLB misses
>> and the resulting performance should be good enough for embedded devices.
> ASID and multiple page sizes:
> * these are also features I was leaving out of my attempt.
> * basically, the complexity they would add didn't really seem "worth it"
> ** a traditional page table layout effectively also leaves pages hard-wired at 4kB.
> ** if a TLB is both small and set associative, ASIDs aren't likely to be terribly effective.
> *** essentially, the TLB will likely have "forgotten" by the time it gets back to the process in question.
> my MMU prototypes thus far have looked like:
> * between 16x4 (64) and 64x4 (256) entries
> ** basically 16 or 64 buckets but each bucket is 4-way set-associative.
> ** IME, a smaller 2-way or 4-way lookup seems to be better than a bigger 1-way lookup.
> ** fully associative is expensive though, so I am not doing this.
> *** this could be more viable if the TLB were small, say, 8 or 16 entries.
> ** I haven't yet settled on a final TLB size (still a lot more testing is needed in this area).
> *** my more recent design was using a 16x4 TLB.
> *** sadly, the memory budget (for buffers/caches/...) isn't particularly generous.
> *** ( though, synthesis seems to start going a bit berserk somewhat below the theoretical limits ).
> * a single TLB was also shared for both code and data.
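The 16x4 layout described above (16 buckets, each 4-way set-associative, 4kB pages, one TLB shared by code and data) can be sketched in C roughly as follows. This is only an illustrative software model, not code from any of the designs discussed: the names `tlb`, `tlb_lookup` and `tlb_fill`, the indexing by the low bits of the virtual page number, and the round-robin replacement are all assumptions.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define TLB_SETS   16u   /* "16 buckets" */
#define TLB_WAYS   4u    /* each bucket is 4-way set-associative */
#define PAGE_SHIFT 12u   /* 4kB pages */

typedef struct {
    bool     valid;
    uint32_t vpn;        /* virtual page number */
    uint32_t ppn;        /* physical page number */
} tlb_entry;

typedef struct {
    tlb_entry e[TLB_SETS][TLB_WAYS];
    uint8_t   next[TLB_SETS];  /* round-robin victim per set (assumption) */
} tlb;

static void tlb_init(tlb *t)
{
    memset(t, 0, sizeof *t);   /* all entries invalid */
}

/* Returns true on a hit and stores the translated address in *paddr. */
static bool tlb_lookup(const tlb *t, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t set = vpn % TLB_SETS;          /* low VPN bits select the bucket */

    for (unsigned w = 0; w < TLB_WAYS; w++) {
        const tlb_entry *e = &t->e[set][w];
        if (e->valid && e->vpn == vpn) {
            *paddr = (e->ppn << PAGE_SHIFT)
                   | (vaddr & ((1u << PAGE_SHIFT) - 1u));
            return true;
        }
    }
    return false;                           /* miss: the TLB fill code runs */
}

/* Installs a translation, evicting the round-robin victim of the set. */
static void tlb_fill(tlb *t, uint32_t vaddr, uint32_t paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t set = vpn % TLB_SETS;
    unsigned w   = t->next[set];

    t->e[set][w] = (tlb_entry){ true, vpn, paddr >> PAGE_SHIFT };
    t->next[set] = (uint8_t)((w + 1u) % TLB_WAYS);
}
```

With this indexing, a fifth page that maps to the same bucket evicts the oldest of the four, which is the "forgotten by the time it gets back" effect mentioned above for small set-associative TLBs.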
>> Kind regards,
>> Fabjan
>> [1] http://lists.j-core.org/pipermail/j-core/2017-March/000560.html
>>> On 2017-12-03 06:35, D. Jeff Dionne wrote:
>>> Page table walking can be implemented, it's just an FSM (these sorts of things are a little easier in VHDL than Verilog, but I digress and others will disagree).
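To make the "it's just an FSM" point concrete, here is a minimal C model of a two-level page table walk written as an explicit state machine, doing at most one memory read per step. Everything specific in it is an assumption for illustration only: the 10/10/12 address split, the `PTE_PRESENT` bit in bit 0, the state names, and the `walker` structure are hypothetical and do not describe the SH-4, J-core, or Linux formats.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum {
    WALK_IDLE, WALK_READ_L1, WALK_READ_L2, WALK_DONE, WALK_FAULT
} walk_state;

typedef struct {
    walk_state state;
    uint32_t   vaddr;   /* address being translated */
    uint32_t   table;   /* byte address of the table to read next */
    uint32_t   paddr;   /* result, valid in WALK_DONE */
} walker;

#define PTE_PRESENT 1u          /* assumed "present" bit */
#define PTE_ADDR    (~0xFFFu)   /* assumed 4kB-aligned address field */

static void walk_start(walker *w, uint32_t root, uint32_t vaddr)
{
    w->state = WALK_READ_L1;
    w->vaddr = vaddr;
    w->table = root;
    w->paddr = 0;
}

/* One "clock": performs at most one read from mem (an array of 32-bit
 * words indexed by byte address / 4) and moves to the next state. */
static void walk_step(walker *w, const uint32_t *mem)
{
    uint32_t pte;

    switch (w->state) {
    case WALK_READ_L1:                       /* top 10 bits index level 1 */
        pte = mem[w->table / 4u + (w->vaddr >> 22)];
        if (pte & PTE_PRESENT) {
            w->table = pte & PTE_ADDR;
            w->state = WALK_READ_L2;
        } else {
            w->state = WALK_FAULT;
        }
        break;
    case WALK_READ_L2:                       /* next 10 bits index level 2 */
        pte = mem[w->table / 4u + ((w->vaddr >> 12) & 0x3FFu)];
        if (pte & PTE_PRESENT) {
            w->paddr = (pte & PTE_ADDR) | (w->vaddr & 0xFFFu);
            w->state = WALK_DONE;
        } else {
            w->state = WALK_FAULT;
        }
        break;
    default:                                 /* IDLE, DONE, FAULT: hold */
        break;
    }
}
```

In hardware each `case` would correspond to a state with a pending bus read; the same sequence of steps is what a software TLB miss handler performs when the walk is left to an ISR.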
> FWIW: I wasn't saying it couldn't be done; just that trying to do so is painful.
> and noting that one way it could be done more easily (namely via microcode) had since been dropped in favor of using an emulation ISR instead.
> but, this particular sub-project has in general gotten a bit discouraging (mostly trying to make everything fit; targeting, e.g., an XC7S25 or similar).
>>> The problem is it's large.   I think BGB is correct, and there need to be options for not including that sort of hardware.   The number of TLB entries, and the sizes of them available, will
>>> make a large difference in the achievable performance here.   The TLB fill code is a place where you really want it to be fast, and have the normal working sets cause that code to execute infrequently.   Likely, it should live in SRAM, on chip.
>>> Cheers,
>>> J.
>>>>> On Dec 3, 2017, at 03:47, BGB <cr88192 at gmail.com> wrote:
>>>>> On 12/2/2017 10:57 AM, Fabjan Sukalia wrote:
>>>>> Hello,
>>>>> I'm curious about the memory management unit in j2, as it is needed to port Debian to j-core. The last I read on this mailing list was that the design of the MMU will be discussed during a meeting in Japan.
>>>>> Is there any new information that can be shared with the community? Will it have a soft-mmu, like MIPS and LM32 have, or a hard-mmu like in x86? Also, will the design be compatible with the MMU in SH-4?
>>>> AFAIK:
>>>> * J2 doesn't have an MMU ( and presumably won't? )
>>>> * presumably, the MMU would likely be for different cores ( J4 / etc... ).
>>>> can't speak for J-Core here, but have noted:
>>>> * for many use-cases, an x86 style hard-MMU is preferable;
>>>> * however, a soft MMU is easier to implement in hardware (*1).
>>>> * the existing OS's (such as Linux) seem to basically just emulate a page table on SH4.
>>>> * the existing SH4 MMU design has a lot of "unnecessary complexity".
>>>> ** in the form of features and memory-mapped structures which make little sense to be visible.
>>>> ** Linux does not appear to make use of these, but I could be wrong here.
>>>> what seems like a sane option, at least, from my POV (as an outsider):
>>>> * optionally mandate the use of a conventional page-table style layout;
>>>> ** only mandated if the use of a firmware or "hard" MMU is enabled.
>>>> ** leave it as "undefined" if something different is done while hard-MMU flag is set.
>>>> ** page table bit-layout is to be kept basically equivalent to the Linux page-tables.
>>>> * implemented as a soft MMU in hardware.
>>>> ** likely significantly stripped down vs the SH4 design.
>>>> ** presumably, the registers/... are basically the same as SH4, but specifics differ.
>>>> ** likely omits MMIO visibility for the TLB and caches.
>>>> * potentially, the MMU is (optionally) moved into firmware.
>>>> ** say, there is an MMU flag which indicates where TLB miss ISR's go.
>>>> *** say: Set=0xA0000400, Clear=VBR+0x400.
>>>> ** if there is an (optional) hardware MMU, this flag would also enable its use.
>>>> ** an OS supplied TLB ISR could still exist as a fallback (if the feature is not supported).
>>>> ** by extension, things built as ROM would need to supply a TLB ISR though.
>>>> ** the firmware ISR or hard-MMU would instead generate a Page-fault ISR.
>>>> *** say, as using the TLBMISS exception code, but transferring control to VBR+0x100
>>>> * ...
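The vector selection proposed in the list above can be written down as a small C sketch. The addresses 0xA0000400, VBR+0x400 and VBR+0x100 are the ones suggested in the post; the function names and the `fw_mmu_flag` parameter are hypothetical, and nothing here reflects an actual J-core or SH-4 control register.

```c
#include <stdint.h>
#include <stdbool.h>

/* Where a TLB miss transfers control, per the proposal:
 * flag set   -> fixed firmware handler at 0xA0000400
 * flag clear -> OS-supplied handler at VBR + 0x400 */
static uint32_t tlb_miss_vector(bool fw_mmu_flag, uint32_t vbr)
{
    return fw_mmu_flag ? 0xA0000400u : vbr + 0x400u;
}

/* When the firmware ISR (or a hard MMU) cannot resolve the miss, the
 * proposal raises a page fault using the TLBMISS exception code but
 * enters the OS at VBR + 0x100. */
static uint32_t page_fault_vector(uint32_t vbr)
{
    return vbr + 0x100u;
}
```

For example, with the flag clear and VBR at 0x8C000000, a TLB miss would enter the OS handler at 0x8C000400, which keeps an OS-supplied TLB ISR usable as the fallback on cores without the feature.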
>>>> but, as noted, these are just outsider thoughts (from someone working on an independent side-project...).
>>>> *1: I was initially pretty much on the "do a hard MMU" side of the fence, but a very brief attempt at doing page-walking logic in Verilog pushed me over to the soft-MMU side of things ( pretty much anything that requires performing multiple sequential steps is a horrible PITA; and "microcode" ended up just sort of decaying into "just generate an ISR" and "casually overlook that using certain instructions within an ISR will effectively crash the CPU..."; but the "lame jokes" never end here, and this was by no means the worst of it... ).
>>>> _______________________________________________
>>>> J-core mailing list
>>>> J-core at lists.j-core.org
>>>> http://lists.j-core.org/mailman/listinfo/j-core