[J-core] j3/mmu and first silicon status.

D. Jeff Dionne jeff at SE-Instruments.co.jp
Fri Mar 24 04:00:49 EDT 2017


On Mar 23, 2017, at 7:15, BGB <cr88192 at gmail.com> wrote:

I’m all for simple.

Can you write up a concise specification of what you have implemented and/or
are proposing?  I think two things are important: support for both 8k and 4Meg
pages, and a simple hardware implementation.  Diagramming the translation,
tables and entries would go a long way…

Then the Kernel guys can suggest advantageous changes/optimizations…

Cheers,
J.

> On 3/22/2017 8:20 AM, Rob Landley wrote:
>> Part of the reason we've been so quiet recently is that J2 development
>> is now in feature freeze preparing for first silicon. We're happy enough
>> with what we've got that we want to do a small initial proving run of
>> actual ASICs. There's still some development on peripherals and bus size
>> tweaks going on, but the SOC core you'd run on Numato or Turtle hasn't
>> changed in a while and isn't expected to start changing again for a few
>> months.
>> 
>> That said, Jeff recently looked at the mmu design Hitachi did for the
>> sh3, and it's very heavyweight. We'd rather do a simpler implementation
>> for our first pass, and need to study what Linux actually wants out of
>> an MMU first. And if we're doing a new mmu design, we'd rather not open
>> that can of worms until after the ASIC tapes out. (Which is mostly
>> testing work, not development work.)
>> 
>> The problem with the existing sh3 MMU is there's no WAY you can fit it
>> in an lx9, and probably not in an lx25. Their MMU walks the page tables
>> in software, which completely flushes a simple L1 cache like j-core's.
>> To work around that they added an L2 cache and made the L1 4-way
>> associative, which drives the FPGA routing nuts. A simple
>> implementation of what they did would be several times larger than the
>> existing j2 SOC implementation _combined_. (We talked about making our
>> L1 2-way associative someday, but what the hitachi mmu needs is not a
>> "sweet spot".)
>> 
>> What we really want, to go along with our tiny processor implementation,
>> is an mmu that walks the page tables in _hardware_, at least for simple
>> faults where the TLB refill is just looking up the translation of an
>> existing physical page address. This would avoid the need to run page
>> fault handling code through the cache for "soft faults" (or run the same
>> code uncached, which would suck about as badly).
> 
> ... yes, as I recall, I wasn't so happy with the SH3/SH4 MMU design either (for similar reasons).
> 
> 
> this was part of why, in my own efforts, I mostly threw that part out and used a good ol' page-table based MMU.
> 
> in my case, basically TTB holds the page-directory for a 2-level page-table (using the same basic layout as in 32-bit x86 and ARM).
> 
> as-is, PTEs are roughly:
>   bits 0..8: map to the same bits as in PTEL.
>   bits 9..11: reserved/undefined
>   bits 12..28: physical page number (4kB pages only; see note 1)
>   bits 29..31: reserved (probably more physical-page bits; see note 2)
> 
> could probably add an execute-disable bit.
> 
> 1: SZ0=1, SZ1=0; anything else should probably page-fault.
>   could possibly also support large pages (4MB) via a mechanism similar to x86's:
>       a PDE with SZ0=1, SZ1=1 indicates a 4MB page.
> 2: I am still using the 29-bit physical space from SH4, rather than a full 32-bit space (as in SH4A/SH5).
>   granted, this limits maximum RAM to around 304MB (0C000000..1EFFFFFF).
>   possibly could support extended (32-bit) addressing via the MMU,
>       at the cost of possibly needing bank-switching type hacks in some cases.
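> 
> to make the above concrete: a rough C model of the walk (untested;
> all the names and macros here are made up for illustration, with bit
> positions taken from the SH-4 PTEL layout). a hardware walker would
> implement the same flow as a small state machine: two dependent
> memory reads, then either a TLB fill or a page-fault trap.
> 
>   #include <stdint.h>
> 
>   /* PTE/PDE low bits, matching the SH-4 PTEL layout (bits 0..8): */
>   #define PT_WT   (1u << 0)   /* write-through */
>   #define PT_SH   (1u << 1)   /* shared */
>   #define PT_D    (1u << 2)   /* dirty */
>   #define PT_C    (1u << 3)   /* cacheable */
>   #define PT_SZ0  (1u << 4)   /* page-size bit 0 */
>   #define PT_PR   (3u << 5)   /* protection bits */
>   #define PT_SZ1  (1u << 7)   /* page-size bit 1 */
>   #define PT_V    (1u << 8)   /* valid */
> 
>   #define PPN_MASK 0x1FFFF000u /* bits 12..28: the 29-bit physical space */
> 
>   extern uint32_t ttb;                      /* page-directory base (TTB) */
>   extern uint32_t phys_read32(uint32_t pa); /* one bus read, bypassing L1 */
> 
>   /* Translate 'va': returns 0 and fills *pa on success, -1 to raise a
>    * page fault (the only case the software handler would ever see). */
>   int walk(uint32_t va, uint32_t *pa)
>   {
>       /* level 1: VA bits 31..22 index a 1024-entry page directory */
>       uint32_t pde = phys_read32((ttb & ~0xFFFu) | ((va >> 22) << 2));
>       if (!(pde & PT_V))
>           return -1;
> 
>       if ((pde & (PT_SZ1 | PT_SZ0)) == (PT_SZ1 | PT_SZ0)) {
>           /* SZ1=1, SZ0=1 in a PDE: a 4MB large page, x86-style */
>           *pa = (pde & 0x1FC00000u) | (va & 0x003FFFFFu);
>           return 0;
>       }
> 
>       /* level 2: assuming a non-large PDE holds the page table's
>        * physical address in the same bits 12..28 field */
>       uint32_t pte = phys_read32((pde & PPN_MASK) | (((va >> 12) & 0x3FFu) << 2));
>       if (!(pte & PT_V) || (pte & (PT_SZ1 | PT_SZ0)) != PT_SZ0)
>           return -1;  /* invalid, or a size other than 4kB: fault */
> 
>       *pa = (pte & PPN_MASK) | (va & 0xFFFu);
>       return 0;       /* hardware would load this result into the TLB */
>   }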
> 
> 
> this basic design also appears to be more or less what Linux is already using internally.
> in theory, Linux/SH4 should just "automagically" be able to use it as-is, but I can't verify that yet short of getting an unmodified Linux to boot and run on it.
> 
> 
>> But what we really want is an mmu that's a good fit for Linux. Note that
>> Linus has strong opinions about this sort of thing:
>> 
>> http://yarchive.net/comp/powerpc_page_tables.html
>> http://yarchive.net/comp/linux/page_tables.html
>> 
>> And apparently wrote his master's thesis on it:
>> 
>> ftp://ftp.polsl.pl/pub/linux/kernel/people/torvalds/thesis/torvalds97.pdf
>> 
>> And so did Mel Gorman a few years later:
>> 
>> https://www.kernel.org/doc/gorman/
>> 
>> That said, stuff's changed a lot since then (we're up to what, 5 levels
>> now?), so we'd like to take the time to get the design right.
>> 
>> We'd also want to have a patch ready to push into qemu to support the
>> new design if it's not the existing one...
>> 
>> So that's the current state of that work.
> 
> yep.
> 
> the number of page-table levels mostly comes down to the virtual address-space size and the PTE size (which in turn depends on the physical-address size).
> 
> for a 32-bit arch, 2 or 3 levels should be sufficient (3 if using 64-bit PTEs, as in PAE).
> 
> for a 64-bit arch, it could take up to 6 levels (for a full 64-bit address space with 4kB pages).
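> 
> a quick sanity check of that arithmetic (a throwaway helper, not from
> any actual kernel): each level of the tree resolves
> log2(page_size / pte_size) bits of the virtual page number, so the
> level count is just a ceiling division.
> 
>   #include <stdio.h>
> 
>   /* levels = ceil(vpn_bits / index_bits), where each level resolves
>    * index_bits = log2(page_size / pte_size) bits of the VA. */
>   static int levels(int va_bits, int page_shift, int pte_shift)
>   {
>       int index_bits = page_shift - pte_shift; /* log2(entries/table) */
>       int vpn_bits   = va_bits - page_shift;   /* VA bits left to map */
>       return (vpn_bits + index_bits - 1) / index_bits; /* round up */
>   }
> 
>   int main(void)
>   {
>       printf("%d\n", levels(32, 12, 2)); /* 32-bit VA, 4-byte PTEs -> 2 */
>       printf("%d\n", levels(32, 12, 3)); /* 8-byte PTEs (PAE-ish)  -> 3 */
>       printf("%d\n", levels(64, 12, 3)); /* full 64-bit VA, 4kB    -> 6 */
>       return 0;
>   }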
> 