[J-core] MMU-Design

BGB cr88192 at gmail.com
Sun Dec 3 19:01:19 EST 2017


On 12/3/2017 1:27 PM, Fabjan Sukalia wrote:
> Hi!
>
> Other architectures that do page table walking in hardware have
> additional caches dedicated for page tables. AFAIK it is called PDE
> cache for x86 and intermediate table walk cache for ARM. These caches
> are crucial for good performance but not mandatory for a working
> implementation. Maybe these caches are a bit too much for j2+mmu/j3/j32
> or so.
>
> Personally I like the suggestion ([1]) to include a simple design like
> that of the LM32. The LM32 design has a small overhead of two TLBs and
> an additional interrupt source, but it lacks two features, ASID and
> multiple page sizes. Both features can reduce the number of TLB misses
> and the resulting performance should be good enough for embedded devices.

ASID and multiple page sizes:
* these are also features I was leaving out of my attempt.
* basically, the complexity they would add didn't really seem "worth it"
** a traditional page table layout also effectively leaves pages 
hard-wired at 4kB.
** if a TLB is both small and set associative, ASIDs aren't likely to be 
terribly effective.
*** essentially, the TLB will likely have "forgotten" by the time it 
gets back to the process in question.
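
FWIW, a toy model of the "forgotten" effect, as a rough sketch only. Everything here is an assumption for illustration: the 16x4 size, the round-robin replacement, the ASID-tagged entry layout, and the names (touch, asid_survivors). Process 1 touches some pages, process 2 then touches its own, and we count how many of process 1's entries are still around.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define SETS 16   /* assumed: the 16x4 configuration */
#define WAYS 4

typedef struct { bool valid; uint8_t asid; uint32_t vpn; } entry;
typedef struct { entry way[WAYS]; unsigned next; } set;

static set tlb[SETS];

/* Touch one page: on a miss, replace the round-robin victim (assumed policy). */
static void touch(uint8_t asid, uint32_t vpn)
{
    set *s = &tlb[vpn % SETS];
    for (int w = 0; w < WAYS; w++)
        if (s->way[w].valid && s->way[w].asid == asid && s->way[w].vpn == vpn)
            return;                                   /* hit, nothing to do */
    s->way[s->next] = (entry){ true, asid, vpn };
    s->next = (s->next + 1) % WAYS;
}

/* Process 1 touches a_pages pages, then process 2 touches b_pages pages.
   Returns how many of process 1's entries are still in the TLB afterwards. */
static unsigned asid_survivors(unsigned a_pages, unsigned b_pages)
{
    unsigned n = 0;
    memset(tlb, 0, sizeof tlb);                       /* start with an empty TLB */
    for (uint32_t p = 0; p < a_pages; p++) touch(1, p);
    for (uint32_t p = 0; p < b_pages; p++) touch(2, p);
    for (int i = 0; i < SETS; i++)
        for (int w = 0; w < WAYS; w++)
            if (tlb[i].way[w].valid && tlb[i].way[w].asid == 1) n++;
    return n;
}
```

Under these assumptions, asid_survivors(32, 96) comes out as 0 (nothing of process 1 is left), while asid_survivors(32, 32) comes out as 32. So the ASID tags only pay off when the other process has a small footprint relative to the TLB.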

my MMU prototypes thus far have looked like:
* between 16x4 (64) and 64x4 (256) entries
** basically 16 or 64 buckets, but each bucket is 4-way set-associative.
** IME, a smaller 2-way or 4-way lookup seems to be better than a bigger 
1-way lookup.
** fully associative is expensive though, so I am not doing this.
*** this could be more viable if the TLB were small, say, 8 or 16 entries.
** I haven't yet settled on a final TLB size (still a lot more testing 
is needed in this area).
*** my more recent design was using a 16x4 TLB.
*** sadly, the memory budget (for buffers/caches/...) isn't particularly 
generous.
*** ( though, synthesis seems to start going a bit berserk somewhat 
below the theoretical limits ).
* a single TLB was also shared for both code and data.
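
Roughly, the organization above behaves like the following C model. This is a sketch and not the actual Verilog: the entry layout, the round-robin victim choice, and the function names (tlb_lookup, tlb_fill) are all assumptions made for the example. It uses 16 buckets of 4 ways, 4kB pages, no ASID, and one TLB for both code and data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define TLB_SETS   16   /* 16 buckets (64 is the other size mentioned) */
#define TLB_WAYS   4    /* each bucket is 4-way set-associative */
#define PAGE_SHIFT 12   /* 4kB pages only */

typedef struct {
    bool     valid;
    uint32_t vpn;       /* virtual page number */
    uint32_t ppn;       /* physical page number */
} tlb_entry;

typedef struct {
    tlb_entry way[TLB_WAYS];
    unsigned  next;     /* round-robin victim pointer (assumed policy) */
} tlb_set;

static tlb_set tlb[TLB_SETS];   /* shared by instruction and data accesses */

/* Returns true and writes the physical address on a hit, false on a miss. */
static bool tlb_lookup(uint32_t vaddr, uint32_t *paddr)
{
    assert(paddr != NULL);
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    const tlb_set *s = &tlb[vpn % TLB_SETS];      /* low VPN bits pick the bucket */
    for (int w = 0; w < TLB_WAYS; w++) {          /* hardware compares all ways at once */
        if (s->way[w].valid && s->way[w].vpn == vpn) {
            *paddr = (s->way[w].ppn << PAGE_SHIFT)
                   | (vaddr & ((1u << PAGE_SHIFT) - 1));
            return true;
        }
    }
    return false;                                 /* would raise a TLB miss */
}

/* Install a translation, overwriting the bucket's current victim way. */
static void tlb_fill(uint32_t vaddr, uint32_t ppn)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    tlb_set *s = &tlb[vpn % TLB_SETS];
    s->way[s->next] = (tlb_entry){ true, vpn, ppn };
    s->next = (s->next + 1) % TLB_WAYS;
}
```

In this model a fifth page landing in the same bucket evicts the oldest of the four, which is where the 4-way vs 1-way tradeoff shows up: a direct-mapped table of the same total size would evict on the second conflicting page.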


> Kind regards,
>
> Fabjan
>
>
> [1] http://lists.j-core.org/pipermail/j-core/2017-March/000560.html
>
>
> On 2017-12-03 06:35, D. Jeff Dionne wrote:
>> Page table walking can be implemented, it's just an FSM (these sorts of things are a little easier in VHDL than Verilog, but I digress and others will disagree).

FWIW: I wasn't saying it couldn't be done; just that trying to do so is 
painful.
and noting that one way it could be done more easily (namely via 
microcode) had since been dropped in favor of using an emulation ISR 
instead.
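
For reference, the sequential steps in question look something like this when written as the body of a TLB-miss handler in C. This is only a sketch: the 10/10/12 address split, the PTE_PRESENT bit, and the use of host pointers for the directory are assumptions for illustration, not the SH4 or Linux layout. Each dependent memory read here would be a separate state in a hardware FSM.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define PTE_PRESENT 0x1u   /* assumed flag bit, for illustration only */
#define ENTRIES     1024   /* 10 index bits per level (assumed split) */

typedef struct { uint32_t pte[ENTRIES]; } page_table;
typedef struct { page_table *pde[ENTRIES]; } page_dir;

/* Walk a two-level table for vaddr.
   Returns true and sets *paddr if a mapping exists; returns false when
   the walk hits a missing entry, i.e. a genuine page fault. */
static bool walk(const page_dir *dir, uint32_t vaddr, uint32_t *paddr)
{
    /* step 1: top 10 bits index the page directory */
    const page_table *pt = dir->pde[vaddr >> 22];
    if (pt == NULL)
        return false;

    /* step 2: next 10 bits index the page table found in step 1 */
    uint32_t pte = pt->pte[(vaddr >> 12) & 0x3FFu];
    if (!(pte & PTE_PRESENT))
        return false;

    /* step 3: combine the page frame with the 12-bit page offset */
    *paddr = (pte & ~0xFFFu) | (vaddr & 0xFFFu);
    return true;
}
```

In an ISR this is a handful of loads and branches; the painful part in hardware is that step 2 cannot start until the read from step 1 has come back.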

but this particular sub-project has in general gotten a bit 
discouraging (mostly from trying to make everything fit; targeting, e.g., an 
XC7S25 or similar).


>> The problem is it's large.   I think BGB is correct, and there need to be options for not including that sort of hardware.   The number of TLB entries, and the sizes of them available, will
>> make a large difference in the achievable performance here.   The TLB fill code is a place where you really want it to be fast, and have the normal working sets cause that code to execute infrequently.   Likely, it should live in SRAM, on chip.
>>
>> Cheers,
>> J.
>>
>>> On Dec 3, 2017, at 03:47, BGB <cr88192 at gmail.com> wrote:
>>>
>>>> On 12/2/2017 10:57 AM, Fabjan Sukalia wrote:
>>>> Hello,
>>>>
>>>> I'm curious about the memory management unit in j2, as it is needed to port Debian to j-core. The last I read on this mailing list was that the design of the MMU will be discussed during a meeting in Japan.
>>>> Is there any new information that can be shared with the community? Will it have a soft-mmu, like MIPS and LM32 have, or a hard-mmu like in x86? Also, will the design be compatible with the MMU in SH-4?
>>>>
>>> AFAIK:
>>> * J2 doesn't have an MMU ( and presumably won't? )
>>> * presumably, the MMU would likely be for different cores ( J4 / etc... ).
>>>
>>>
>>> can't speak for J-Core here, but have noted:
>>> * for many use-cases, an x86 style hard-MMU is preferable;
>>> * however, a soft MMU is easier to implement in hardware (*1).
>>> * the existing OS's (such as Linux) seem to basically just emulate a page table on SH4.
>>> * the existing SH4 MMU design has a lot of "unnecessary complexity".
>>> ** in the form of features and memory-mapped structures which make little sense to be visible.
>>> ** Linux does not appear to make use of these, but I could be wrong here.
>>>
>>>
>>> what seems like a sane option, at least, from my POV (as an outsider):
>>> * optionally mandate the use of a conventional page-table style layout;
>>> ** only mandated if the use of a firmware or "hard" MMU is enabled.
>>> ** leave it as "undefined" if something different is done while the hard-MMU flag is set.
>>> ** page table bit-layout is to be kept basically equivalent to the Linux page-tables.
>>> * implemented as a soft MMU in hardware.
>>> ** likely significantly stripped down vs the SH4 design.
>>> ** presumably, the registers/... are basically the same as SH4, but specifics differ.
>>> ** likely omits MMIO visibility for the TLB and caches.
>>> * potentially, the MMU is (optionally) moved into firmware.
>>> ** say, there is an MMU flag which indicates where TLB miss ISR's go.
>>> *** say: Set=0xA0000400, Clear=VBR+0x400.
>>> ** if there is an (optional) hardware MMU, this flag would also enable its use.
>>> ** an OS supplied TLB ISR could still exist as a fallback (if the feature is not supported).
>>> ** by extension, things built as ROM would need to supply a TLB ISR though.
>>> ** the firmware ISR or hard-MMU would instead generate a Page-fault ISR.
>>> *** say, as using the TLBMISS exception code, but transferring control to VBR+0x100
>>> * ...
>>>
>>>
>>> but, as noted, these are just outsider thoughts (from someone working on an independent side-project...).
>>>
>>> *1: I was initially pretty much on the "do a hard MMU" side of the fence, but a very brief attempt at doing page-walking logic in Verilog pushed me over to the soft-MMU side of things ( pretty much anything that requires performing multiple sequential steps is a horrible PITA; and "microcode" ended up just sort of decaying into "just generate an ISR" and "casually overlook that using certain instructions within an ISR will effectively crash the CPU..."; but the "lame jokes" never end here, and this was by no means the worst of it... ).
>>>
>>> _______________________________________________
>>> J-core mailing list
>>> J-core at lists.j-core.org
>>> http://lists.j-core.org/mailman/listinfo/j-core



