[J-core] j3/mmu and first silicon status.

Wed Mar 22 18:15:53 EDT 2017

On 3/22/2017 8:20 AM, Rob Landley wrote:
> Part of the reason we've been so quiet recently is that J2 development
> is now in feature freeze preparing for first silicon. We're happy enough
> with what we've got that we want to do a small initial proving run of
> actual ASICs. There's still some development on peripherals and bus size
> tweaks going on, but the SOC core you'd run on Numato or Turtle hasn't
> changed in a while and isn't expected to start changing again for a few
> months.
>
> That said, Jeff recently looked at the mmu design Hitachi did for the
> sh3, and it's very heavyweight. We'd rather do a simpler implementation
> for our first pass, and need to study what Linux actually wants out of
> an MMU first. And if we're doing a new mmu design, we'd rather not open
> that can of worms until after the ASIC tapes out. (Which is mostly
> testing work, not development work.)
>
> The problem with the existing sh3 MMU is there's no WAY you can fit it
> in an lx9, and probably not in an lx25. The problem is their MMU walks
> the page tables in software, which completely flushes a simple L1 cache
> like j-core has. To work around that they added L2 cache and made the L1
> 4-way associative, and which drives the FPGA routing nuts. A simple
> implementation of what they did would be several times larger than the
> existing j2 SOC implementation _combined_. (We talked about making our
> L1 2-way associative someday, but what the hitachi mmu needs is not a
> "sweet spot".)
>
> What we really want, to go along with our tiny processor implementation,
> is an mmu that walks the page tables in _hardware_, at least for simple
> faults where the TLB refill is just looking up the translation of an
> existing physical page address. This would avoid the need to run page
> fault handling code through the cache for "soft faults" (or run the same
> code uncached, which would suck about as badly).

... yes, if I remember correctly, I wasn't so happy with the SH3/SH4 MMU 
design either (for similar reasons).

this was part of why, in my own efforts, I mostly threw out that part 
and used a good ol' page-table based MMU.

in my case, TTB basically holds the address of the page directory for a 
2-level page table (using the same basic layout as in 32-bit x86 and ARM).
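
as a rough sketch of what a walker would have to compute for such a 
2-level layout: the 10/10/12 split below is the classic 32-bit x86 one and 
is an assumption here (the exact field widths aren't spelled out above), 
and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-style split: 10-bit directory index, 10-bit table index,
 * 12-bit page offset. All names here are hypothetical. */
#define PAGE_SHIFT 12u
#define INDEX_BITS 10u
#define INDEX_MASK ((1u << INDEX_BITS) - 1u)
#define ENTRY_SIZE 4u   /* 32-bit entries */

/* Index into the page directory (top 10 bits of the virtual address). */
static uint32_t pd_index(uint32_t va)
{
    return (va >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK;
}

/* Index into the second-level page table (middle 10 bits). */
static uint32_t pt_index(uint32_t va)
{
    return (va >> PAGE_SHIFT) & INDEX_MASK;
}

/* Byte offset within the 4kB page (low 12 bits). */
static uint32_t page_offset(uint32_t va)
{
    return va & ((1u << PAGE_SHIFT) - 1u);
}

/* Address of the directory entry the walker fetches first.
 * ttb is the page-directory base and must be page aligned. */
static uint32_t pde_address(uint32_t ttb, uint32_t va)
{
    assert((ttb & ((1u << PAGE_SHIFT) - 1u)) == 0);
    return ttb + pd_index(va) * ENTRY_SIZE;
}

/* Address of the PTE, given the table base taken from the directory entry. */
static uint32_t pte_address(uint32_t table_base, uint32_t va)
{
    assert((table_base & ((1u << PAGE_SHIFT) - 1u)) == 0);
    return table_base + pt_index(va) * ENTRY_SIZE;
}
```

so a TLB refill done this way is two dependent memory reads (directory 
entry, then PTE), whether it is done by hardware or by a miss handler.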

as-is, PTEs are roughly:
     0..8: map to the same bits as in PTEL.
     9..11: reserved/undefined
     12..28: physical page (4kB only; see note 1)
     29..31: reserved (probably more page bits; see note 2)

could probably add an execute-disable bit.

1: SZ0=1, SZ1=0; anything else should probably page-fault.
     could possibly also support large pages (4MB) via a mechanism 
similar to the one in x86.
         PDE: SZ0=1, SZ1=1: indicates 4MB page
2: I am still using the 29 bit space from SH4, rather than a full 32-bit 
space (ex: SH4A/SH5).
     granted, this limits maximum RAM to around 304MB (0C000000..1EFFFFFF).
     could possibly support extended (32-bit) addressing via the MMU,
         at the cost of needing bank-switching type hacks in some 
cases.
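
a minimal sketch of decoding such a PTE, for illustration only. the bit 
positions of V, SZ1 and SZ0 are taken from my recollection of the SH-4 
PTEL register (V = bit 8, SZ1 = bit 7, SZ0 = bit 4) and should be treated 
as an assumption, as should every name in the code. it also checks the 
304MB figure from note 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag positions assumed from the SH-4 PTEL register layout. */
#define PTE_SZ0   (1u << 4)
#define PTE_SZ1   (1u << 7)
#define PTE_V     (1u << 8)
#define PTE_FLAGS 0x000001FFu  /* bits 0..8: same meaning as in PTEL */
#define PTE_PPN   0x1FFFF000u  /* bits 12..28: physical page (4kB pages) */

/* RAM window quoted in note 2 (29-bit physical space). */
#define RAM_FIRST 0x0C000000u
#define RAM_LAST  0x1EFFFFFFu

/* Note 1: only SZ0=1, SZ1=0 (4kB) is accepted. */
static int pte_is_4k(uint32_t pte)
{
    return (pte & (PTE_SZ0 | PTE_SZ1)) == PTE_SZ0;
}

/* Returns 0 and stores the physical address in *pa, or returns -1 where
 * a real implementation would raise a page fault. */
static int pte_translate(uint32_t pte, uint32_t va, uint32_t *pa)
{
    assert(pa != NULL);
    if (!(pte & PTE_V) || !pte_is_4k(pte))
        return -1;
    *pa = (pte & PTE_PPN) | (va & 0xFFFu);
    return 0;
}

/* Size of the RAM window in MB (1MB = 2^20 bytes). */
static uint32_t ram_window_mb(void)
{
    return (RAM_LAST - RAM_FIRST + 1u) >> 20;
}
```

with these definitions, ram_window_mb() gives 304, since 
0x1EFFFFFF - 0x0C000000 + 1 = 0x13000000 bytes.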

this basic design also appears to be more or less what Linux is already 
using internally.
whether Linux SH4 can use it as-is is still unverified (the theory is 
that Linux should just "automagically" work with it, but sadly I can't 
confirm this short of getting an unmodified Linux to boot and run on it).

> But what we really want is an mmu that's a good fit for Linux. Note that
> Linus has strong opinions about this sort of thing:
>
> http://yarchive.net/comp/powerpc_page_tables.html
> http://yarchive.net/comp/linux/page_tables.html
>
> And apparently wrote his master's thesis on it:
>
> ftp://ftp.polsl.pl/pub/linux/kernel/people/torvalds/thesis/torvalds97.pdf
>
> And so did Mel Gorman a few years later:
>
> https://www.kernel.org/doc/gorman/
>
> That said stuff's changed a lot since then (we're up to what, 5 levels
> now?) so we'd like to take the time and get the design right.
>
> We'd also want to have a patch ready to push into qemu to support the
> new design if it's not the existing one...
>
> So that's the current state of that work.

yep.

the number of page-table levels mostly has to do with the address-space 
and physical-address size.

for a 32-bit arch, 2 or 3 should be sufficient (3 if using 64-bit PTEs, 
as in PAE).

for a 64-bit arch, could need up to 6 levels (for a full 64-bit address 
space with 4kB pages).
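
the arithmetic behind those numbers, as a small sketch: it assumes every 
table fills exactly one page, so each level resolves 
page_shift - log2(PTE size) bits of the address. the function name and 
parameters are hypothetical.

```c
#include <assert.h>

/* Number of page-table levels needed to cover va_bits of virtual address,
 * assuming each table occupies exactly one page of (1 << page_shift) bytes
 * and each entry is pte_bytes wide (a power of two). */
static unsigned page_table_levels(unsigned va_bits, unsigned page_shift,
                                  unsigned pte_bytes)
{
    unsigned index_bits = 0;
    unsigned remaining;

    assert(pte_bytes != 0 && (pte_bytes & (pte_bytes - 1u)) == 0);
    assert(page_shift < 32u && va_bits > page_shift);

    /* Entries per table = page size / PTE size; index_bits is its log2. */
    while ((pte_bytes << index_bits) < (1u << page_shift))
        index_bits++;
    assert(index_bits > 0);

    /* Bits left to translate once the page offset is removed,
     * rounded up to a whole number of levels. */
    remaining = va_bits - page_shift;
    return (remaining + index_bits - 1u) / index_bits;
}
```

for 4kB pages this gives 2 levels for 32-bit addresses with 4-byte PTEs 
(20 bits at 10 per level), 3 with 8-byte PTEs (20 bits at 9 per level, as 
in PAE), and 6 for a full 64-bit address space (52 bits at 9 per level).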