[J-core] Hi - more info on Super-H advantages

BGB cr88192 at gmail.com
Thu Jan 11 16:58:51 EST 2018


(outsider perspective / thoughts; my efforts are independent from those 
of the J-Core project, so sentiments may differ).

On 1/11/2018 5:06 AM, Dru Nelson wrote:
>
> Hi,
>
> I saw a video of the announcement of the j-core project a while back.
> The project impressed me, and I made a note to learn more.
>
> Without that presentation, I would not have known about the advantages 
> of the SH architecture.
> For example, I would not have known about the code-density of the SH.
> I would have just assumed it was like all the other RISCs. (32 bit 
> instructions)
>

luckily, code density is pretty good if compared with 32-bit RISCs.

but, it comes with a drawback in frequently needing to execute longer 
sequences of instructions for many operations, which partly works as 
counter-balance to the shorter instructions in some cases.


in my own testing, it is loosely comparable to x86 (though this varies a 
fair bit by compiler and other factors, and the comparison requires 
compensating for the relative sizes of the C libraries and similar).

Thumb2 is a fair bit more formidable in code density though, and 
generally gives better performance relative to the total number of 
instructions executed (this is a weak area for the basic SH ISA vs 
Thumb2 or RISC-V's RVC coding or similar).


experimentally, it is possible to improve the ISA on both factors 
(making the code smaller and getting more work done in fewer 
instructions), but this comes at the cost of extending it with 
variable-length instructions (opcode words are either 16 bits, or a 
32-bit word pair). I still haven't quite matched Thumb2's density yet, 
but am "getting closer" (it now seems to be within about 7% or so, vs 
about 17% worse than Thumb2 for the baseline SH4 ISA).

though, Thumb has a few cheap-shot instructions (like PUSH/POP 
multiple). an approximation (which still works with plain SH) is 
basically to call or branch to previous prologs and epilogs (just this 
trick saved ~4% off the size). this works mostly because these sequences 
tend to be fairly repetitive (so, very often, there is a match within 
branch range). granted, there is a slight cost associated with the 
branches or calls (but, avoids the hair that would be needed to 
implement something like this in hardware).
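
A minimal sketch of the epilog-sharing idea described above, in Python 
for illustration only (this is not the actual compiler code). The 
function name, the 2-byte instruction size, the branch size, and the 
branch range are all assumptions, and delay slots are ignored to keep 
it short:

```python
# Sketch of epilog sharing: if an identical epilog was already emitted and
# is within branch range, emit a branch to it instead of a second copy.
# All sizes and the branch range are illustrative assumptions.

BRANCH_SIZE = 2        # bytes for one 16-bit branch (assumed)
BRANCH_RANGE = 4096    # max backward distance in bytes (assumed)


def emit_with_shared_epilogs(functions):
    """functions: list of (body_size_bytes, epilog), where epilog is a tuple
    of instruction strings, each assumed to take 2 bytes.
    Returns (total_size_with_sharing, total_size_without_sharing)."""
    seen = {}            # epilog -> address of the most recent full copy
    addr = 0             # current output address with sharing
    plain = 0            # running size without sharing
    for body_size, epilog in functions:
        epilog_size = 2 * len(epilog)
        plain += body_size + epilog_size
        addr += body_size
        prev = seen.get(epilog)
        if (prev is not None and addr - prev <= BRANCH_RANGE
                and epilog_size > BRANCH_SIZE):
            addr += BRANCH_SIZE          # branch to the earlier copy
        else:
            seen[epilog] = addr          # emit a full copy, remember it
            addr += epilog_size
    return addr, plain


# Hypothetical example: three functions ending in the same epilog.
epi = ("mov.l @r15+,r14", "lds.l @r15+,pr", "rts", "nop")
funcs = [(100, epi), (60, epi), (5000, epi)]
shared, plain = emit_with_shared_epilogs(funcs)
```

Here the second function can branch to the first copy, while the third 
is too far away and gets a full copy of its own, which matches the 
"very often, there is a match within branch range" observation.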

but, in general, I guess it is kind of in a similar category to the 
other ISAs with 16/32-bit instruction coding, namely Thumb2 and RVC 
(Compressed RISC-V). though, it differs in "escaping" to the wider forms, 
rather than being based on "compressing" an initially wider ISA, leading 
to a different design aesthetic.
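
To make the "escaping" idea concrete, here is a rough Python sketch of 
finding instruction boundaries in such a 16/32 coding. The escape 
prefix (0x8Exx) and the function name are made up for illustration; 
the real encoding is not given here:

```python
# Sketch: finding instruction boundaries in a 16/32 "escape" encoding.
# ESCAPE_HIGH_BYTES is a made-up set of prefixes, not the real encoding.

ESCAPE_HIGH_BYTES = {0x8E}   # hypothetical: words 0x8Exx start a 32-bit pair


def split_instructions(words):
    """words: list of 16-bit opcode words. Returns a list of tuples, each
    holding the one or two words that make up a single instruction."""
    out = []
    i = 0
    while i < len(words):
        if (words[i] >> 8) in ESCAPE_HIGH_BYTES:
            if i + 1 >= len(words):
                raise ValueError("truncated 32-bit instruction")
            out.append((words[i], words[i + 1]))   # escape word + payload
            i += 2
        else:
            out.append((words[i],))                # ordinary 16-bit op
            i += 1
    return out
```

The point of the sketch is that only the first word needs to be looked 
at to know the length, so existing 16-bit opcodes keep their meaning and 
the wider forms are purely additive.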

decided mostly to leave out specifics for 32 vs 64-bit ISA variants 
(as-is there are several 64b variants, still TBD which will be 
"canonical"), but currently I am still managing to produce slightly 
smaller code with a 64-bit extended ISA than GCC manages to produce with 
SH4 (with "-Os" and strip); so probably not doing too terribly.


in my case, the project to do an FPGA implementation of these is still 
ongoing, but is going slowly; basically, making something that works 
plausibly and synthesizes with a plausible resource cost is a fair bit 
harder than writing code in C or similar (this, combined with my 
relative inexperience here and being new to CPU design, ..., is making 
this project go a bit slowly).

interestingly, it isn't really the instruction decoder which is the hard 
part in this case, rather it is mostly the memory/cache subsystem and 
similar (trying to make it all work and keep synthesis from eating the 
FPGA; ...).

then there are a few cases of things which seem simple enough, but have 
a fairly high synthesis cost (for example, doing multiplies or shifts 
with large quantities, etc...).


> I did a little digging, but there is not a lot of info out there.
>
> Given your access to the original designers (or not), what are the 
> other clever design choices that this CPU pioneered?
>

can't answer this directly, but what I liked about the ISA is that it 
sort of looked like a 32-bit version of the MSP430, which was initially 
impressive with how effective it could be with such a small and simple ISA.

likewise, unlike Thumb, it is unencumbered, and not so much of an ugly 
bit-twiddly mess.

in a way, it can be contrasted with x86, which effectively has thousands 
of instruction forms for many hundreds of mnemonics... (and I actually 
have little idea what a hardware-based x86 decoder would look like).



>
> BTW, I also perused the programmers guide, and one thing struck me.
> The assembly code for the SH looks a bit like 68K assembly. They even hint
> at this with mentions of "Other CPU" in the beginning of the document.
> The fact that it is 16 registers and uses a 16 bit instruction format 
> furthers that notion.
> Yet, I couldn't find a mention of that anywhere.
>

I sort of noticed before that both sort of look like the PDP-11 in some 
ways.

AFAICT: M68K sort of grew out of the PDP-11 design, but in a more CISCy 
direction (and was more free-form variable-length).

others, like SH and MSP430, also borrowed some design elements, but went 
in a more RISCy direction.


there are differences, for example MSP430 had "@PC+" constant-loading, 
which I decided against as it "just wasn't worth it" (it has the 
drawbacks of both a large instruction form and a memory load at the same 
time).

things like "load-shift chains" and "combiner ops" can achieve similar 
ends inline without needing either memory loads or overly long 
instruction forms or similar (and can offer a more plausible alternative 
to PC-relative constant loads and similar), and also have the advantage 
of being more flexible.
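
As a rough illustration of a "load-shift chain", the Python sketch below 
builds a 32-bit constant inline from 8-bit immediates. "MOV" is modelled 
on a sign-extending 8-bit immediate load; "LDSH8" (Rn = (Rn << 8) | imm8) 
is a hypothetical op name used only for this example, as are the helper 
names:

```python
# Sketch of a "load-shift chain": build a 32-bit constant inline from 8-bit
# immediates. "LDSH8" (Rn = (Rn << 8) | imm8) is a hypothetical op.

MASK32 = 0xFFFFFFFF


def sign_extend8(b):
    """Interpret an 8-bit value as a signed integer."""
    return b - 0x100 if b & 0x80 else b


def load_shift_chain(value):
    """Return a list of (op, imm8) pairs that builds `value` in a register."""
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    # Use the fewest bytes whose sign extension reproduces the value.
    for nbytes in range(1, 5):
        lo, hi = -(1 << (8 * nbytes - 1)), (1 << (8 * nbytes - 1)) - 1
        if lo <= signed <= hi:
            break
    data = [(value >> (8 * k)) & 0xFF for k in reversed(range(nbytes))]
    return [("MOV", data[0])] + [("LDSH8", b) for b in data[1:]]


def run_chain(chain):
    """Simulate the chain and return the resulting 32-bit register value."""
    reg = 0
    for op, imm in chain:
        if op == "MOV":
            reg = sign_extend8(imm) & MASK32    # sign-extending load
        else:  # LDSH8
            reg = ((reg << 8) | imm) & MASK32   # shift in 8 more bits
    return reg
```

Small constants need only one or two ops, and the worst case is four 
ops with no memory load and no wide instruction form; whether that is a 
net win over a PC-relative load depends on the constants a given program 
actually uses.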


or such...


