[J-core] working on SH-2 emulator.

BGB cr88192 at gmail.com
Wed Sep 7 17:02:41 EDT 2016

On 9/7/2016 1:20 PM, Rich Felker wrote:
> On Tue, Sep 06, 2016 at 02:04:48AM -0500, Rob Landley wrote:
>> On 09/06/2016 12:53 AM, D. Jeff Dionne wrote:
>>> On Sep 6, 2016, at 2:46 PM, BGB <cr88192 at gmail.com> wrote:
>>> Oh, excellent.  This means you have passed all the CPU tests, and dropped into
>>> a GDB stub.  Did you implement and test the J2 CAS instruction, or
>>> patch it out?
>> CAS and the two bit-shifts from sh3. (All mentioned on http://j-core.org
>> but I really should have better docs.)
>> Alas, our arch/sh/configs/j2_defconfig does not appear to have
>> EARLY_PRINTK set, most likely because we haven't got an early printk
>> driver for our serial device. So you've got to get pretty far along in
>> the kernel boot before it dumps the printk buffer out to the serial device.
>> (I remember Rich poking at that area before, but I don't remember how it
>> turned out. Possibly it's just not enabled in the config?)
> EARLY_PRINTK is deprecated and requires arch-specific hooks. The
> modern replacement is EARLYCON, and these Kconfig settings:
> CONFIG_CMDLINE="console=ttyUL0 earlycon"
> plus an appropriate node in the device tree, like:
> 	chosen {
> 		stdout-path = "serial0";
> 	};
> where "serial0" is an alias assigning a name for the uartlite node,
> should make it work.
> It looks like I somehow omitted CONFIG_SERIAL_EARLYCON=y from
> j2_defconfig; I'll make a note to add it.


I still sort of want a working Linux, but ran into a problem where the 
build fails with:
     sh2elf-ld: target elf32-shbig-linux not found

granted, I may have been trying to use the wrong compiler for this, but 
sh2eb doesn't build...

grep only finds one occurrence of this (in a ".S" file), but commenting 
it out doesn't work.

trying to set to little endian (just to see if it builds):
     sh2elf-ld: target elf32-sh-linux not found

this was well after the point where the main aboriginal build process 
blows up, and I was trying to get the kernel built more manually.

on-off battling with this for several days now has left me a bit 

trying to get the Linux kernel built is proving somewhat more 
frustrating than it was to pull an SH-2 emulator out of thin air. I 
think maybe this says something...

in other news, I am considering adding a subset of the FPU and MMU 
facilities to my emulator.
for reference, I would want to be able to look at the SH-3 MMU and see 
how it compares with the SH-4 MMU, but a spec for SH-3 is proving elusive.

I am not sure what the standing is for the 32-bit SH-DSP/SH-2A 
instructions (e.g., whether these ones are "safe").

FPU would probably be a scalar only subset (no vector ops for now).

MMU would probably be a hack using an x86-like design. the TTB register 
would be treated as holding the base of a page-table, which would be 
walked similarly to how x86 and ARM page-tables work.

some specifics are still TBD, as this wouldn't map exactly onto how the 
SH-4 does it.
in particular, the SH-4 TLB effectively uses 64-bit entries, whereas a 
normal page-table would need 32-bit entries.

skimming through Linux source: looks like they do it with 32-bit entries 
and a 2-level table.
     PTEH: set with address from PTE and a faked ASID;
     PTEL: just copies low-order bits from PTE.

if done this way, it is possible that this hacked MMU design could "just 
work" if running Linux, or another OS which does basically the same thing.

the main alternative, though, is correctly implementing the SH-4 MMU, at 
a likely performance cost.
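
a minimal sketch of what the hacked page-table variant might look like, 
assuming a 2-level table of 32-bit entries. everything about the layout 
here is a placeholder and not the SH-4 format or the Linux PTE layout: 
4 KiB pages, a 10+10+12 address split, bit 0 of an entry meaning 
"present", and the names mmu_walk / phys_read32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout (an assumption, not taken from SH-4 or Linux):
 * bit 0 = present, bits 31..12 = physical frame address. */
#define PT_PRESENT 0x00000001u
#define PT_FRAME   0xFFFFF000u

/* Read one big-endian 32-bit word from emulated physical memory.
 * Out-of-range reads return 0, which the walker treats as "not present". */
static uint32_t phys_read32(const uint8_t *ram, size_t ram_size, uint32_t pa)
{
    assert(ram != NULL);
    if (ram_size < 4 || pa > ram_size - 4)
        return 0;
    return ((uint32_t)ram[pa] << 24) | ((uint32_t)ram[pa + 1] << 16) |
           ((uint32_t)ram[pa + 2] << 8) | (uint32_t)ram[pa + 3];
}

/* Walk a 2-level table rooted at TTB.
 * Returns 0 and stores the physical address on success, or -1 on a miss
 * (where a real emulator would raise a TLB-miss exception instead). */
static int mmu_walk(const uint8_t *ram, size_t ram_size,
                    uint32_t ttb, uint32_t va, uint32_t *pa_out)
{
    /* first level: top 10 bits of the virtual address index the directory */
    uint32_t pde = phys_read32(ram, ram_size,
                               (ttb & PT_FRAME) + ((va >> 22) & 0x3FFu) * 4);
    if (!(pde & PT_PRESENT))
        return -1;

    /* second level: next 10 bits index the page table */
    uint32_t pte = phys_read32(ram, ram_size,
                               (pde & PT_FRAME) + ((va >> 12) & 0x3FFu) * 4);
    if (!(pte & PT_PRESENT))
        return -1;

    /* low 12 bits pass through unchanged as the page offset */
    *pa_out = (pte & PT_FRAME) | (va & 0xFFFu);
    return 0;
}
```

in practice the result of a successful walk would be cached in a 
software TLB, so the two memory reads only happen on a miss.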

some details would need to be decided WRT the handling of the 
trace-cache and SMC detection bitmaps in relation to the MMU. done 
naively, swapping page-tables or other things would also imply flushing 
the trace-cache and zeroing the SMC bitmap.

possible tweak could be (quietly) keying TTB with a separate internal 
context holding TLB and trace-cache state, so that changing address 
spaces doesn't necessarily flush the caches (but, instead swaps to a 
different set of caches).

however, this would be rendered ineffective (and detrimental to 
performance) if the number of active address spaces (non-sleeping 
processes) exceeds the number of caches, which would need to be fairly 
small to avoid potentially excessive memory use.
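
a rough sketch of the TTB-keyed idea, assuming a small fixed pool of 
contexts with least-recently-used replacement. the pool size, the field 
sizes, and the names (mmu_ctx, ctx_switch) are all hypothetical; the tlb 
and smc_bitmap arrays just stand in for whatever per-address-space state 
the emulator really keeps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_CTX 4  /* assumption: kept small to bound memory use */

typedef struct {
    uint32_t ttb;             /* key: TTB value this state was built under */
    int      valid;
    uint64_t last_use;        /* LRU timestamp */
    uint32_t tlb[64];         /* placeholder for software TLB state */
    uint8_t  smc_bitmap[128]; /* placeholder for SMC detection bitmap */
} mmu_ctx;

typedef struct {
    mmu_ctx  ctx[N_CTX];
    uint64_t clock;
    unsigned flushes;         /* how often live state had to be thrown away */
} mmu_ctx_pool;

/* Called when the guest writes TTB. Returns the context to use from now on:
 * either the preserved one for this TTB, or a freshly cleared slot. */
static mmu_ctx *ctx_switch(mmu_ctx_pool *p, uint32_t ttb)
{
    assert(p != NULL);
    p->clock++;

    /* hit: this address space still has its caches, nothing to flush */
    for (int i = 0; i < N_CTX; i++) {
        mmu_ctx *c = &p->ctx[i];
        if (c->valid && c->ttb == ttb) {
            c->last_use = p->clock;
            return c;
        }
    }

    /* miss: prefer an unused slot, otherwise evict the least recently used */
    mmu_ctx *victim = &p->ctx[0];
    for (int i = 1; i < N_CTX && victim->valid; i++) {
        mmu_ctx *c = &p->ctx[i];
        if (!c->valid || c->last_use < victim->last_use)
            victim = c;
    }
    if (victim->valid)
        p->flushes++;         /* this is the case that costs performance */

    memset(victim, 0, sizeof *victim);
    victim->ttb = ttb;
    victim->valid = 1;
    victim->last_use = p->clock;
    return victim;
}
```

with more runnable address spaces than N_CTX, round-robin scheduling 
would evict on nearly every switch, which is the degenerate case noted 
above: the same flush as the naive design plus the lookup overhead.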

I am thinking I may also add a MOV.L constant-load optimization for the 
case where the source address and the instruction fall within the same 
page. the SMC handling mechanism is page-granular, so writing into a 
page in which code is executing already triggers SMC handling; 
invalidating the cached constant therefore effectively comes "for 
free", in contrast to the case where the constant and instruction would 
fall into different pages.

or such...
