[J-core] j3/mmu and first silicon status.
D. Jeff Dionne
jeff at SE-Instruments.co.jp
Fri Mar 24 04:00:49 EDT 2017
On Mar 23, 2017, at 7:15, BGB <cr88192 at gmail.com> wrote:
I’m all for simple.
Can you write up a concise specification of what you have implemented and/or
propose? I think two important things are support for both 8k + 4Meg pages,
and a simple hardware implementation. Diagramming the translation, tables and
entries would go a long way…
Then the Kernel guys can suggest advantageous changes/optimizations…
> On 3/22/2017 8:20 AM, Rob Landley wrote:
>> Part of the reason we've been so quiet recently is that J2 development
>> is now in feature freeze preparing for first silicon. We're happy enough
>> with what we've got that we want to do a small initial proving run of
>> actual ASICs. There's still some development on peripherals and bus size
>> tweaks going on, but the SOC core you'd run on Numato or Turtle hasn't
>> changed in a while and isn't expected to start changing again for a few
>> That said, Jeff recently looked at the mmu design Hitachi did for the
>> sh3, and it's very heavyweight. We'd rather do a simpler implementation
>> for our first pass, and need to study what Linux actually wants out of
>> an MMU first. And if we're doing a new mmu design, we'd rather not open
>> that can of worms until after the ASIC tapes out. (Which is mostly
>> testing work, not development work.)
>> The problem with the existing sh3 MMU is there's no WAY you can fit it
>> in an lx9, and probably not in an lx25. The problem is their MMU walks
>> the page tables in software, which completely flushes a simple L1 cache
>> like j-core has. To work around that they added L2 cache and made the L1
>> 4-way associative, which drives the FPGA routing nuts. A simple
>> implementation of what they did would be several times larger than the
>> existing j2 SOC implementation _combined_. (We talked about making our
>> L1 2-way associative someday, but what the hitachi mmu needs is not a
>> "sweet spot".)
>> What we really want, to go along with our tiny processor implementation,
>> is an mmu that walks the page tables in _hardware_, at least for simple
>> faults where the TLB refill is just looking up the translation of an
>> existing physical page address. This would avoid the need to run page
>> fault handling code through the cache for "soft faults" (or run the same
>> code uncached, which would suck about as badly).
> ... yes, as I recall, I wasn't so happy with the SH3/SH4 MMU design either (for similar reasons).
> this was part of why in my efforts I mostly threw out this part and used a good ol' page-table based MMU.
> in my case, basically TTB holds the page-directory for a 2-level page-table (using the same basic layout as in 32-bit x86 and ARM).
> as-is, PTEs are roughly:
> 0..8: map to the same bits as in PTEL.
> 9..11: reserved/undefined
> 12..28: physical page (4kB only, 1)
> 29..31: reserved (probably more page bits, 2)
> could probably add an execute-disable bit.
> 1: SZ0=1, SZ1=0; otherwise should probably page-fault.
> could possibly also support large-pages (4MB) via a mechanism similar to the one in x86.
> PDE: SZ0=1, SZ1=1: indicates 4MB page
> 2: I am still using the 29 bit space from SH4, rather than a full 32-bit space (ex: SH4A/SH5).
> granted, this limits maximum RAM to around 304MB (0C000000..1EFFFFFF).
> possibly could support extended (32-bit) addressing via the MMU
> at the cost of possibly needing bank switching type hacks in some cases.
> this basic design also appears to be more or less what Linux is already using internally.
> I can't verify yet whether Linux SH4 can use it as-is (theory goes Linux should just "automagically" be able to work with it; but sadly I can't verify this short of getting Linux to boot and work on it unmodified).
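For readers following along, here is a minimal C sketch of the two-level walk described above. It is not BGB's code. It assumes an x86-style 10/10/12 split of the virtual address, 4-byte entries, and that the V, SZ1 and SZ0 flags sit at the same bit positions as in the SH-4 PTEL register (bits 8, 7 and 4 by my reading; check the hardware manual). The names walk, read32, write32, the simulated ram array, and the choice to treat a PDE with SZ0=1, SZ1=1 as a 4MB mapping whose base lives in bits 22..28 are all illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Flag positions assumed to match SH-4 PTEL; verify before relying on them. */
#define PTE_V        (1u << 8)     /* entry valid */
#define PTE_SZ1      (1u << 7)     /* page size, high bit */
#define PTE_SZ0      (1u << 4)     /* page size, low bit */
#define PTE_PPN_MASK 0x1FFFF000u   /* bits 12..28: physical page */
#define PDE_4M_MASK  0x1FC00000u   /* bits 22..28: assumed 4MB page base */
#define WALK_FAULT   0xFFFFFFFFu   /* outside the 29-bit space, so unambiguous */

/* Hypothetical simulated RAM: three 4kB pages starting at 0x0C000000. */
#define RAM_BASE  0x0C000000u
#define RAM_WORDS (3u * 1024u)
static uint32_t ram[RAM_WORDS];

/* Out-of-range reads return 0, which decodes as an invalid entry. */
static uint32_t read32(uint32_t paddr)
{
    uint32_t idx = (paddr - RAM_BASE) / 4u;
    return idx < RAM_WORDS ? ram[idx] : 0u;
}

/* Helper for building tables in the simulated RAM. */
static void write32(uint32_t paddr, uint32_t value)
{
    uint32_t idx = (paddr - RAM_BASE) / 4u;
    assert(idx < RAM_WORDS);
    ram[idx] = value;
}

/* Translate vaddr using the page directory at physical address ttb.
 * Returns the physical address, or WALK_FAULT where a real MMU would
 * raise a page fault. */
static uint32_t walk(uint32_t ttb, uint32_t vaddr)
{
    uint32_t pde = read32(ttb + 4u * (vaddr >> 22));
    if (!(pde & PTE_V))
        return WALK_FAULT;

    /* PDE with SZ0=1, SZ1=1: 4MB page, no second-level lookup. */
    if ((pde & PTE_SZ0) && (pde & PTE_SZ1))
        return (pde & PDE_4M_MASK) | (vaddr & 0x003FFFFFu);

    uint32_t pte = read32((pde & PTE_PPN_MASK) + 4u * ((vaddr >> 12) & 0x3FFu));
    if (!(pte & PTE_V))
        return WALK_FAULT;

    /* Only 4kB pages (SZ0=1, SZ1=0) are accepted at this level. */
    if (!(pte & PTE_SZ0) || (pte & PTE_SZ1))
        return WALK_FAULT;

    return (pte & PTE_PPN_MASK) | (vaddr & 0xFFFu);
}
```

A hardware walker would do the same two reads directly on the bus, bypassing the L1 cache, which is the property Rob asks for above for soft faults.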
>> But what we really want is an mmu that's a good fit for Linux. Note that
>> Linus has strong opinions about this sort of thing:
>> And apparently wrote his master's thesis on it:
>> And so did Mel Gorman a few years later:
>> That said, stuff's changed a lot since then (we're up to what, 5 levels
>> now?) so we'd like to take the time and get the design right.
>> We'd also want to have a patch ready to push into qemu to support the
>> new design if it's not the existing one...
>> So that's the current state of that work.
> the number of page-table levels mostly has to do with the address-space and physical-address size.
> for a 32-bit arch, 2 or 3 should be sufficient (3 if using 64-bit PTE's, as in PAE).
> for a 64-bit arch, could need up to 6 levels (for a full 64-bit address space with 4kB pages).
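The level counts above follow from simple arithmetic, sketched here in C. The sketch assumes every table occupies exactly one page, so each level resolves log2(page size / PTE size) bits of the virtual address; real designs often use a smaller top-level table instead. The function names are illustrative.

```c
#include <stdint.h>

/* Integer log2 for powers of two. */
static unsigned log2u(uint64_t x)
{
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Levels needed to translate va_bits of virtual address when each
 * page-table page holds page_bytes / pte_bytes entries. */
static unsigned levels_needed(unsigned va_bits, uint64_t page_bytes,
                              uint64_t pte_bytes)
{
    unsigned offset_bits = log2u(page_bytes);             /* bits within a page */
    unsigned index_bits  = log2u(page_bytes / pte_bytes); /* bits per level */
    unsigned remaining   = va_bits - offset_bits;         /* bits to resolve */
    return (remaining + index_bits - 1) / index_bits;     /* round up */
}
```

With 4kB pages, a 32-bit address needs 2 levels with 4-byte PTEs (20 bits at 10 per level) and 3 with 8-byte PTEs (9 per level), while a full 64-bit address with 8-byte PTEs leaves 52 bits at 9 per level, which rounds up to 6.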