<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div>Excellent paper with real metrics wrt design decisions.   Old enough that the patents have expired:  <a href="ftp://ftp.cs.wisc.edu/pub/techreports/1995/TR1277.pdf">ftp://ftp.cs.wisc.edu/pub/techreports/1995/TR1277.pdf</a><br><br>Cheers,<div>J</div></div><div><br>On Dec 4, 2017, at 09:01, BGB <<a href="mailto:cr88192@gmail.com">cr88192@gmail.com</a>> wrote:<br><br></div><blockquote type="cite"><div><span>On 12/3/2017 1:27 PM, Fabjan Sukalia wrote:</span><br><blockquote type="cite"><span>Hi!</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Other architectures that do page table walking in hardware have</span><br></blockquote><blockquote type="cite"><span>additional caches dedicated for page tables. AFAIK it is called PDE</span><br></blockquote><blockquote type="cite"><span>cache for x86 and intermediate table walk cache for ARM. These caches</span><br></blockquote><blockquote type="cite"><span>are crucial for good performance but not mandatory for a working</span><br></blockquote><blockquote type="cite"><span>implementation. Maybe these caches are a bit too much for j2+mmu/j3/j32</span><br></blockquote><blockquote type="cite"><span>or so.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Personally I like the suggestion ([1]) to include a simple design like</span><br></blockquote><blockquote type="cite"><span>that of the LM32. The LM32 design has a small overhead of two TLBs and</span><br></blockquote><blockquote type="cite"><span>an additional interrupt source, but it lacks two features, ASID and</span><br></blockquote><blockquote type="cite"><span>multiple page sizes. 
Both features can reduce the number of TLB misses</span><br></blockquote><blockquote type="cite"><span>and the resulting performance should be good enough for embedded devices.</span><br></blockquote><span></span><br><span>ASID and multiple page sizes:</span><br><span>* these are also features I was leaving out of my attempt.</span><br><span>* basically, the complexity they would add didn't really seem "worth it"</span><br><span>** a traditional page table layout effectively also leaves pages hard-wired at 4kB.</span><br><span>** if a TLB is both small and set-associative, ASIDs aren't likely to be terribly effective.</span><br><span>*** essentially, the TLB will likely have "forgotten" a process's entries by the time it gets back to the process in question.</span><br><span></span><br><span>my MMU prototypes thus far have looked like:</span><br><span>* between 16x4 (64) and 64x4 (256) entries</span><br><span>** basically 16 or 64 buckets but each bucket is 4-way set-associative.</span><br><span>** IME, a smaller 2-way or 4-way lookup seems to be better than a bigger 1-way lookup.</span><br><span>** fully associative is expensive though, so I am not doing this.</span><br><span>*** this could be more viable if the TLB were small, say, 8 or 16 entries.</span><br><span>** I haven't yet settled on a final TLB size (still a lot more testing is needed in this area).</span><br><span>*** my more recent design was using a 16x4 TLB.</span><br><span>*** sadly, the memory budget (for buffers/caches/...) 
isn't particularly generous.</span><br><span>*** ( though, synthesis seems to start going a bit berserk somewhat below the theoretical limits ).</span><br><span>* a single TLB was also shared for both code and data.</span><br><span></span><br><span></span><br><blockquote type="cite"><span>Kind regards,</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Fabjan</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>[1] <a href="http://lists.j-core.org/pipermail/j-core/2017-March/000560.html">http://lists.j-core.org/pipermail/j-core/2017-March/000560.html</a></span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>On 2017-12-03 06:35, D. Jeff Dionne wrote:</span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span>Page table walking can be implemented, it's just an FSM (these sorts of things are a little easier in VHDL than Verilog, but I digress and others will disagree).</span><br></blockquote></blockquote><span></span><br><span>FWIW: I wasn't saying it couldn't be done; just that trying to do so is painful.</span><br><span>and noting that one way it could be done more easily (namely via microcode) had since been dropped in favor of using an emulation ISR instead.</span><br><span></span><br><span>but, this particular sub-project has in general gotten a bit discouraging (mostly trying to make everything fit; targeting, e.g., an XC7S25 or similar).</span><br><span></span><br><span></span><br><blockquote type="cite"><blockquote type="cite"><span>The problem is it's large.   I think BGB is correct, and there need to be options for not including that sort of hardware.   
The number of TLB entries, and the sizes of them available, will</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>make a large difference in the achievable performance here.   The TLB fill code is a place where you really want it to be fast, and have the normal working sets cause that code to execute infrequently.   Likely, it should live in SRAM, on chip.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Cheers,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>J.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>On Dec 3, 2017, at 03:47, BGB <<a href="mailto:cr88192@gmail.com">cr88192@gmail.com</a>> wrote:</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>On 12/2/2017 10:57 AM, Fabjan Sukalia wrote:</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>Hello,</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>I'm curious about the memory management unit in j2, as it is needed to port Debian to j-core. 
The last I read on this mailing list was that the design of the MMU would be discussed during a meeting in Japan.</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>Is there any new information that can be shared with the community? Will it have a soft-mmu, like MIPS and LM32 have, or a hard-mmu like in x86? Also, will the design be compatible with the MMU in SH-4?</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>AFAIK:</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>* J2 doesn't have an MMU ( and presumably won't? )</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>* presumably, the MMU would likely be for different cores ( J4 / etc... 
).</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>can't speak for J-Core here, but have noted:</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>* for many use-cases, an x86 style hard-MMU is preferable;</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>* however, a soft MMU is easier to implement in hardware (*1).</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>* the existing OS's (such as Linux) seem to basically just emulate a page table on SH4.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>* the existing SH4 MMU design has a lot of "unnecessary complexity".</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** in the form of features and memory-mapped structures which make little sense to be visible.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** Linux does not appear to make use of these, but I could be wrong here.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote 
type="cite"><blockquote type="cite"><span>what seems like a sane option, at least, from my POV (as an outsider):</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>* optionally mandate the use of a conventional page-table style layout;</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** only mandated if the use of a firmware or "hard" MMU is enabled.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** leave it as "undefined" if something different is done while the hard-MMU flag is set.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** page table bit-layout is to be kept basically equivalent to the Linux page-tables.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>* implemented as a soft MMU in hardware.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** likely significantly stripped down vs the SH4 design.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** presumably, the registers/... 
are basically the same as SH4, but specifics differ.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** likely omits MMIO visibility for the TLB and caches.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>* potentially, the MMU is (optionally) moved into firmware.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** say, there is an MMU flag which indicates where TLB miss ISR's go.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>*** say: Set=0xA0000400, Clear=VBR+0x400.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** if there is an (optional) hardware MMU, this flag would also enable its use.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** an OS supplied TLB ISR could still exist as a fallback (if the feature is not supported).</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** by extension, things built as ROM would need to supply a TLB ISR though.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>** the firmware ISR or hard-MMU would instead generate a Page-fault ISR.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>*** say, as using the TLBMISS exception code, but transferring control to VBR+0x100</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>* 
...</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>but, as noted, these are just outsider thoughts (from someone working on an independent side-project...).</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>*1: I was initially pretty much on the "do a hard MMU" side of the fence, but a very brief attempt at doing page-walking logic in Verilog pushed me over to the soft-MMU side of things ( pretty much anything that requires performing multiple sequential steps is a horrible PITA; and "microcode" ended up just sort of decaying into "just generate an ISR" and "casually overlook that using certain instructions within an ISR will effectively crash the CPU..."; but the "lame jokes" never end here, and this was by no means the worst of it... 
).</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>_______________________________________________</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>J-core mailing list</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span><a href="mailto:J-core@lists.j-core.org">J-core@lists.j-core.org</a></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span><a href="http://lists.j-core.org/mailman/listinfo/j-core">http://lists.j-core.org/mailman/listinfo/j-core</a></span><br></blockquote></blockquote></blockquote></div></blockquote></body></html>