[J-core] j3/mmu and first silicon status.
cr88192 at gmail.com
Fri Mar 24 11:09:15 EDT 2017
On 3/24/2017 3:00 AM, D. Jeff Dionne wrote:
> On Mar 23, 2017, at 7:15, BGB <cr88192 at gmail.com> wrote:
> I’m all for simple.
> Can you write up a concise specification of what you have implemented and/or
> propose? I think 2 things that are important are support for both 8k + 4Meg pages,
> and simple hardware implementation. Diagramming the translation, tables and
> entries would go a long way…
> Then the Kernel guys can suggest advantageous changes/optimizations…
here is a spec for what I have thus far:
I think you mean 4k?... 8k pages would be a rather unusual size.
4MB comes about because a 4kB page table holds 1024 32-bit PTEs, and 1024 x 4kB = 4MB.
if using 64-bit PTEs, a 4kB table only holds 512 entries, so large pages would work out to 2MB.
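as a quick check of that arithmetic, here is a minimal sketch. `large_page_size` is just an illustrative helper name, and it assumes each page table fills exactly one base page:

```python
def large_page_size(pte_bytes, page_size=4096):
    """Bytes mapped by one full last-level table (i.e. the 'large page' size).

    Assumes (for illustration) that a page table occupies exactly one page.
    """
    entries_per_table = page_size // pte_bytes
    return entries_per_table * page_size

MIB = 1024 * 1024
print(large_page_size(4) // MIB, "MB with 32-bit PTEs")
print(large_page_size(8) // MIB, "MB with 64-bit PTEs")
```

by the same rule an 8kB base page with 32-bit PTEs would give 16MB large pages, so 8k + 4MB would not fall out of this scheme naturally.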
I was looking more at the Linux kernel source to try to verify things,
and from what I can tell it mostly matches what the SH3/SH4 kernels are
currently built for (needed to also look in Kconfig and the .config
files and similar, ...).
the main alternative seems to be a 3-level table with 64-bit entries,
which is used in some other cases on SH targets.
it appears the kernel can be built either way, though as-is I don't see
an obvious way to detect which layout the kernel was built to use (since
normally the CPU wouldn't really care; but here it would need some status
flag or similar to tell the hardware walker which layout is in use).
I could probably also spec up the case for using 64-bit PTEs.
>> On 3/22/2017 8:20 AM, Rob Landley wrote:
>>> Part of the reason we've been so quiet recently is that J2 development
>>> is now in feature freeze preparing for first silicon. We're happy enough
>>> with what we've got that we want to do a small initial proving run of
>>> actual ASICs. There's still some development on peripherals and bus size
>>> tweaks going on, but the SOC core you'd run on Numato or Turtle hasn't
>>> changed in a while and isn't expected to start changing again for a few
>>> That said, Jeff recently looked at the mmu design Hitachi did for the
>>> sh3, and it's very heavyweight. We'd rather do a simpler implementation
>>> for our first pass, and need to study what Linux actually wants out of
>>> an MMU first. And if we're doing a new mmu design, we'd rather not open
>>> that can of worms until after the ASIC tapes out. (Which is mostly
>>> testing work, not development work.)
>>> The problem with the existing sh3 MMU is there's no WAY you can fit it
>>> in an lx9, and probably not in an lx25. The problem is their MMU walks
>>> the page tables in software, which completely flushes a simple L1 cache
>>> like j-core has. To work around that they added L2 cache and made the L1
>>> 4-way associative, which drives the FPGA routing nuts. A simple
>>> implementation of what they did would be several times larger than the
>>> existing j2 SOC implementation _combined_. (We talked about making our
>>> L1 2-way associative someday, but what the hitachi mmu needs is not a
>>> "sweet spot".)
>>> What we really want, to go along with our tiny processor implementation,
>>> is an mmu that walks the page tables in _hardware_, at least for simple
>>> faults where the TLB refill is just looking up the translation of an
>>> existing physical page address. This would avoid the need to run page
>>> fault handling code through the cache for "soft faults" (or run the same
>>> code uncached, which would suck about as badly).
>> ... yes, if remembered, I wasn't so happy with the SH3/SH4 MMU design either (for similar reasons).
>> this was part of why in my efforts I mostly threw out this part and used a good ol' page-table based MMU.
>> in my case, basically TTB holds the page-directory for a 2-level page-table (using the same basic layout as in 32-bit x86 and ARM).
>> as-is, PTEs are roughly:
>>   bits 0..8: map to the same bits as in PTEL.
>>   bits 9..11: reserved/undefined
>>   bits 12..28: physical page (4kB only, see note 1)
>>   bits 29..31: reserved (probably more page bits, see note 2)
>> could probably add an execute-disable bit.
>> note 1: SZ0=1, SZ1=0; otherwise should probably page-fault.
>>   could possibly also support large pages (4MB) via a mechanism similar to the one in x86.
>>   PDE: SZ0=1, SZ1=1: indicates 4MB page
>> note 2: I am still using the 29-bit space from SH4, rather than a full 32-bit space (ex: SH4A/SH5).
>>   granted, this limits maximum RAM to around 304MB (0C000000..1EFFFFFF).
>>   possibly could support extended (32-bit) addressing via the MMU
>>   at the cost of possibly needing bank-switching type hacks in some cases.
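the 304MB figure follows directly from the address range quoted above. a small check (the constant names are only illustrative):

```python
# RAM window within the 29-bit SH4 physical space, as quoted above.
RAM_FIRST = 0x0C000000
RAM_LAST = 0x1EFFFFFF

ram_bytes = RAM_LAST - RAM_FIRST + 1
ram_mib = ram_bytes // (1024 * 1024)

# The whole window has to fit below the 29-bit limit.
fits_in_29_bits = RAM_LAST < (1 << 29)
print(ram_mib, "MB, fits in 29 bits:", fits_in_29_bits)
```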
>> this basic design also appears to be more or less what Linux is already using internally.
>> I can't verify yet whether Linux SH4 can use it as-is (theory goes Linux should just "automagically" be able to work with it; but sadly I can't verify this short of getting Linux to boot and work on it unmodified).
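to make the walk concrete, here is a rough software model of the 2-level lookup described above. this is a sketch under assumptions, not the actual implementation: the bit positions for V, SZ0 and SZ1 are taken from the SH-4 PTEL register layout, the PDE is assumed to use the same layout as a PTE, permission bits are ignored, and `translate`, `PageFault` and the `mem` dict are made-up names for illustration.

```python
# Assumed bit positions (from the SH-4 PTEL layout, not from the spec above).
PTE_V = 1 << 8          # valid
PTE_SZ1 = 1 << 7        # page-size bit 1
PTE_SZ0 = 1 << 4        # page-size bit 0
PPN_MASK = 0x1FFFF000   # bits 12..28: physical page number

class PageFault(Exception):
    """Raised where the hardware would take a page fault."""

def translate(mem, ttb, vaddr):
    """Walk a 2-level table rooted at ttb.

    mem is a dict mapping physical addresses to 32-bit words (a stand-in
    for RAM); missing entries read as 0, i.e. invalid.
    """
    # Level 1: bits 31..22 of the virtual address index the page directory.
    pde = mem.get(ttb + 4 * ((vaddr >> 22) & 0x3FF), 0)
    if not pde & PTE_V:
        raise PageFault(hex(vaddr))
    if pde & PTE_SZ0 and pde & PTE_SZ1:
        # Large page: the PDE maps 4MB directly, no second level.
        return (pde & PPN_MASK & ~0x3FFFFF) | (vaddr & 0x3FFFFF)

    # Level 2: bits 21..12 index the page table the PDE points at.
    table = pde & PPN_MASK
    pte = mem.get(table + 4 * ((vaddr >> 12) & 0x3FF), 0)
    if not pte & PTE_V:
        raise PageFault(hex(vaddr))
    if not pte & PTE_SZ0 or pte & PTE_SZ1:
        # Only the 4kB encoding (SZ0=1, SZ1=0) is accepted at this level.
        raise PageFault(hex(vaddr))
    return (pte & PPN_MASK) | (vaddr & 0xFFF)
```

a hardware walker would do the same two loads on a TLB miss and only trap to software when one of the fault cases is hit.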
>>> But what we really want is an mmu that's a good fit for Linux. Note that
>>> Linus has strong opinions about this sort of thing:
>>> And apparently wrote his master's thesis on it:
>>> And so did Mel Gorman a few years later:
>>> That said, stuff's changed a lot since then (we're up to what, 5 levels
>>> now?) so we'd like to take the time and get the design right.
>>> We'd also want to have a patch ready to push into qemu to support the
>>> new design if it's not the existing one...
>>> So that's the current state of that work.
>> the number of page-table levels mostly has to do with the address-space and physical-address size.
>> for a 32-bit arch, 2 or 3 should be sufficient (3 if using 64-bit PTEs, as in PAE).
>> for a 64-bit arch, could need up to 6 levels (for a full 64-bit address space with 4kB pages).
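the level counts above can be reproduced with a short calculation. this is a sketch that assumes every table fills exactly one page, so each level resolves log2(page size / PTE size) address bits; `page_table_levels` is an illustrative name:

```python
def page_table_levels(va_bits, page_size=4096, pte_bytes=4):
    """Number of table levels needed to translate va_bits of address.

    Assumes power-of-two sizes and that each table occupies one page.
    """
    offset_bits = page_size.bit_length() - 1                # 12 for 4kB pages
    index_bits = (page_size // pte_bytes).bit_length() - 1  # 10 or 9
    remaining = va_bits - offset_bits
    return -(-remaining // index_bits)                      # ceiling division

print(page_table_levels(32, pte_bytes=4))   # 32-bit PTEs
print(page_table_levels(32, pte_bytes=8))   # 64-bit PTEs, as in PAE
print(page_table_levels(64, pte_bytes=8))   # full 64-bit space
```

with 64-bit PTEs and 4kB pages, a 48-bit address space needs 4 levels and a 57-bit one needs 5, which is where the 5-level figure comes from.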