[J-core] Sega saturn retrocomputing.

Tue Jul 19 16:44:08 EDT 2016

On 07/19/2016 08:55 AM, Christopher Friedt wrote:
> On Jul 18, 2016 11:51 PM, "Rob Landley" <rob at landley.net> wrote:
>>
>> FYI:
>>
>> http://hackaday.com/2016/07/11/cracking-the-sega-saturn-after-20-years/
> 
> I think I sent a link on the ML to the YouTube video about a week ago.
> 
>> (Alas, when we looked at possibly doing an FPGA Saturn clone with J2 as
>> a side project for the 20th anniversary, it turned out there were some
>> tight timing constraints that make games misbehave very easily, at least
>> in software emulation. I think some J2 instructions take different
>> numbers of clock cycles than SH2 did? Don't remember the details, but we
>> shelved the idea...)
> 
> I'm still interested in doing some Qemu work. Currently the SH core
> (even before J2 additions) is not cycle-accurate, which is a bit of a
> critical error made by the initial Qemu SH port.

No, qemu is never cycle accurate for anything it emulates. It's a
dynamic code translator, not a cycle accurate simulator. (Cycle accurate
simulators run at something like 1/100th the speed of qemu. At a design
level you can only ever get cycle accuracy by slowing DOWN from the
fastest you can do. Caring about that level at all opens a can of worms
of complexity that costs you an order of magnitude of speed before
that's even an issue in the first place.)

There are entire papers on this, really:

http://web.eece.maine.edu/~vweaver/papers/wddd08/wddd08_workshop.pdf

This wasn't a "mistake", this was Fabrice Bellard creating a new
category of emulator with an order of magnitude performance increase
over the previous best-of-breed stuff (like "bochs"). QEMU succeeded
where bochs became a footnote precisely BECAUSE it is not cycle
accurate.

> Emulated peripherals are not likely to be cycle accurate,

If it's something like USB you can break out the packet protocol.
Otherwise you can break it down to the primitive operations (port I/O,
interrupts, and memory bus transactions). Either way, qemu doesn't
emulate peripherals at the hardware level, it just turns a device driver
inside out and intercepts/responds to the primitive operations the
driver does.
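
As a rough sketch of what "turning a driver inside out" looks like, here
is a toy serial port model in Python. The class name, register offsets,
and status bit are all hypothetical and don't correspond to any real
chip or to qemu's actual device API. The point is only that the model
answers register reads and writes and never simulates wires, baud rate
clocks, or shift registers.

```python
class ToyUart:
    """Hypothetical serial port with two registers (offsets made up)."""

    DATA = 0x0    # write: transmit a byte; read: receive a byte
    STATUS = 0x4  # read-only: bit 0 set when receive data is waiting

    def __init__(self):
        self.tx_log = bytearray()    # bytes the guest "sent"
        self.rx_queue = bytearray()  # bytes the host queued for the guest

    def write(self, offset, value):
        """Called when the guest driver stores to a device register."""
        if offset == self.DATA:
            self.tx_log.append(value & 0xFF)
        # Writes to any other offset are ignored in this toy model.

    def read(self, offset):
        """Called when the guest driver loads from a device register."""
        if offset == self.DATA:
            return self.rx_queue.pop(0) if self.rx_queue else 0
        if offset == self.STATUS:
            return 1 if self.rx_queue else 0
        return 0


def guest_driver_puts(uart, text):
    """Stand-in for guest driver code: one register write per byte."""
    for byte in text.encode("ascii"):
        uart.write(ToyUart.DATA, byte)
```

The guest side only ever sees the write() and read() calls, so the
"transmission" of each byte takes no simulated time whatsoever.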

> but more
> importantly, simultaneously running CPUs with tightly coupled timing
> requirements is almost guaranteed to be cycle inaccurate,

Which is why qemu -smp didn't use multiple threads for its first decade
(and for all I know still doesn't, but I'm a bit behind the times on
that). There was an entire presentation from the kvm guys about why kvm
could do that when qemu couldn't (oh, 5 years ago).
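
Here's a toy illustration in Python of what single-threaded SMP looks
like, assuming a scheme where one host thread takes turns between guest
CPUs and runs one whole translated block per turn. The data layout and
function names are invented for this sketch, and it is not qemu's real
scheduler. What it shows is that the interleaving happens at block
granularity, never at clock cycle granularity.

```python
def make_cpu(name, blocks):
    """A toy CPU: a name plus a list of already-translated blocks.

    Each block is a list of instruction labels that run back to back.
    """
    return {"name": name, "blocks": list(blocks)}


def run_round_robin(cpus):
    """Run one whole block per CPU per turn on a single host thread."""
    trace = []
    pending = [cpu for cpu in cpus if cpu["blocks"]]
    while pending:
        for cpu in pending:
            block = cpu["blocks"].pop(0)
            # No other CPU can observe the middle of this block.
            trace.extend(f'{cpu["name"]}:{insn}' for insn in block)
        pending = [cpu for cpu in pending if cpu["blocks"]]
    return trace


cpu_a = make_cpu("A", [["a1", "a2", "a3"], ["a4"]])
cpu_b = make_cpu("B", [["b1"], ["b2", "b3"]])
order = run_round_robin([cpu_a, cpu_b])
```

On real hardware a1 and b1 would execute on the same clock edge. Here
b1 doesn't start until all three instructions of A's first block have
finished, which is exactly the kind of skew that tightly timed code
trips over.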

> as there is no
> efficient means of "clocking" them synchronously using Qemu's method of
> emulation (binary translated & cached blocks).

Because it doesn't operate at that level.

This is like saying "I have a compiled C binary with no source code, and
I need to insert a missing pair of curly brackets and a semicolon whose
absence made an if/else staircase parse differently than I wanted, so
the compiler optimized out this entire chunk of code via dead code
elimination. Where do I insert the semicolon in this binary to re-enable
this missing chunk of code?"

That's not how it works. Wrong level. And the fact it's compiled instead
of interpreted a line at a time makes it noticeably faster.

Rob