[J-core] Jcore mailing list and turtle board

Sat Jul 8 17:17:57 EDT 2017

On 07/06/2017 03:12 PM, David Summers wrote:
> Rob,
> 
> I'll just reply to the bits that aren't covered in conversation with
> Jeff ...
> 
> On 05/07/17 21:58, Rob Landley wrote:
>>
>>> Alas here at work, we go for the high end stuff; and often anti fuse
>>> FPGA - such is the space industry...
>> We talked about adding SECDED to our DRAM controller last year, but
>> current customers don't need it. And ITAR means all the US space guys
>> are read-only consumers of open source and thus developers in Canada and
>> Japan never hear from them. Oh well.
>>
>> http://www.thespacereview.com/article/528/1
>> https://en.wikipedia.org/wiki/International_Traffic_in_Arms_Regulations#Effects_on_the_U.S._space_industry
>>
>
> Well ITAR is your side of the pond, not ours!

Ah, not working for the _US_ space program. That explains it.

(In the 1990's a bunch of cryptographers voluntarily gave up their US
citizenship to keep working on cryptography. And then they inflicted
that nonsense on the space program just as the cryptographers got out
from under it.)

> My understanding though is
> that ITAR is typically attached to devices, and not algorithms. So if we
> take a SECDED algorithm such as Hamming[8,4], I think it's just the chip
> that implements the algorithm that gets ITAR - and not the Hamming[8,4]
> algorithm.

It rates software as a munition. A software implementation is a device
according to that nonsense.

> What's kind of interesting, is ITAR is one of the motivations behind
> ESA/Gaisler to do a VHDL of SPARC-V8 ISA. Here in Europe we had
> relatively few fault tolerant CPUs that were available, ERC32, and there
> was one other whose name slips me now.

We mentioned in our presentation that they looked at Leon Sparc before
doing j-core, and used that project's "two process" method of VHDL
design in doing j-core.

The problem is sparc's ISA requires way too much memory bus bandwidth.

> Now fault tolerant cpu can typically survive one bit flip, and correct
> it.

SECDED: Single Error Correction, Double Error Detection. (Published by a
Bell Labs guy in 1950, totally out of patent and they can't
retroactively classify it either.)
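
For reference, a minimal sketch of the Hamming[8,4] SECDED code mentioned
above, assuming the textbook bit layout (parity bits at positions 1, 2 and
4, plus an overall parity bit at position 0). The function names are
hypothetical and this is not the j-core DRAM controller logic, just an
illustration of the technique:

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    # Data bits live at positions 3, 5, 6, 7 of the Hamming(7,4) layout.
    code[3], code[5], code[6], code[7] = d
    # Each parity bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Position 0 is the extra overall parity bit (total parity is even).
    code[0] = sum(code[1:]) & 1
    return sum(bit << i for i, bit in enumerate(code))


def decode(word):
    """Return (nibble, status); nibble is None on an uncorrectable error."""
    code = [(word >> i) & 1 for i in range(8)]
    # XOR of the positions of all set bits is zero for a valid codeword,
    # and equals the flipped position after a single-bit error.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) & 1
    if syndrome and not overall:
        return None, "double"      # two flips: detected, not correctable
    if overall:
        code[syndrome] ^= 1        # syndrome 0 means bit 0 itself flipped
        status = "corrected"
    else:
        status = "ok"
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


word = encode(0b1011)
print(decode(word))                # clean word
print(decode(word ^ (1 << 5)))     # one flipped bit
print(decode(word ^ 0b110))        # two flipped bits
```

Any single flipped bit is repaired, and any two flipped bits are reported
rather than silently miscorrected, which is the whole point of the extra
parity bit.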

We were looking at adding it to our dram controller because we were
redoing the dram controller at the time. (New one's twice as fast.)
Other bits of the chip would need something similar, but usually they
just have multiple instances run the same code and compare the results
and achieve a quorum.
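
A rough sketch of that quorum idea, assuming three redundant copies of a
result and a bitwise two-out-of-three vote (the function name is
hypothetical, and real designs do this in hardware rather than software):

```python
def vote3(a, b, c):
    """Bitwise majority of three redundant results."""
    # Each output bit is 1 when at least two of the three inputs agree on 1.
    return (a & b) | (a & c) | (b & c)


good = 0xDEADBEEF
upset = good ^ (1 << 4)            # one copy took a single bit flip
print(hex(vote3(good, upset, good)))
```

One corrupted copy gets outvoted by the other two, whichever copy it is.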

> It means it's technology that can be used in ICBMs - so fault
> tolerant cpus from US typically do have ITAR. And ITAR is far more
> paperwork than is fun going through.

An ICBM is in space for what, 15 minutes? And the "ballistic" part of it
points out that at least initially they were basically chucking a big
rock and calculating where it would come down based on physics. (A nuke
doesn't have to be _that_ precise.)

A satellite is in space for _years_. Far more interesting problem. Japan
has a nice space program, pity Elon Musk came in and offered to launch stuff
at a loss (half the price of any other bid; his salesbeing said that
before the other people had _made_ their bid) to drive out

> So ESA developed the LEON, the SPARC-V8 open CPU design; Gaisler
> continued the development, to several fault tolerant designs (majority
> voting flip flops etc). This means we have reasonably powerful flight
> CPUs available here in Europe.

All for it. As I said, we looked at that before doing j-core. The
problem wasn't the chip design, it was the inefficiency of the sparc
instruction set. (Your memory bus and l1 cache are generally bottlenecks
at the best of times. Being able to fit twice as much code in each cache
line is a _big_win_.)
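
The arithmetic behind that, as a small sketch. It assumes fixed 16-bit
instructions on the j-core side, fixed 32-bit instructions for SPARC, and a
32-byte cache line picked purely for illustration (the line size is an
assumption, not a j-core figure):

```python
CACHE_LINE_BYTES = 32   # hypothetical line size, for illustration only


def instructions_per_line(instruction_bytes, line_bytes=CACHE_LINE_BYTES):
    """How many fixed-width instructions fit in one cache line."""
    return line_bytes // instruction_bytes


sparc = instructions_per_line(4)   # 32-bit instruction words
jcore = instructions_per_line(2)   # 16-bit instruction words
print(sparc, jcore)
```

This only counts instructions per line. It says nothing about how many
instructions each ISA needs for the same work, so treat it as a rough
upper bound on the benefit.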

> But there it took space flight to develop an open CPU; so it amuses me
> that at least some of your motivation is CPU for synchrophasors. So two
> open cpu designs, but with motivation from opposite ends of the
> applications!

Moore's Law's brought the price of FPGAs and development workstations
down. You can squeeze our design into a dinky $50 off the shelf fpga
board (Numato Mimas v2) and we're making a really _nice_ FPGA board that
can handle an SMP instance of our SOC for about twice that (if that darn
hardware order ever unblocks, grrr). The 8x SMP laptop with 16 gigs ram
I run my copy of the VHDL tools on was something like $1300.

This was very much _not_ the case 15 years ago. Your entry level systems
on both sides were multiple thousands of dollars. Not exactly hobbyist
territory.

>> ..
>> Last I tried using OpenOCD to flash something on a regular basis (the
>> Tin Can Tools nail board), using the fully open source drivers was
>> _really_slow_. I mean amazingly slow. There was a binary-only module you
>> could use to speed it up, but even then it was still pretty darn slow.
>>
>> *shrug* Maybe it's improved since then...
>
> well openocd is 10 years or so old.

And has not advanced that I've noticed.

> However yes its interface is
> horrible. So last time I used it (few years ago) it wasn't nice.

If even you haven't used it in years, why are you advocating its use here?

> Speed, well jtag it's the programmer that sets the speed. Need to make
> sure it's not too fast, as the device may not cope. But this means the
> default speed in openocd is often quite slow, as then you can check you
> can talk to the device. Once you can talk though - you can up the speed,
> at least until things start falling over ...

The atmel programs the thing pretty snappily.

And the setup is 100% portable: I'm often out with a battery powered
netbook, turtle board powered by USB cable (I have one of the
prototypes), and a USB SD card programmer (because the one built into my
netbook filled up with cat hair over the years). It's a much bigger ask
to set up a jtag connector in a coffee shop, let alone fit it in my
messenger bag.

>>>> C) Requiring openocd setup as a hard prerequisite would probably
>>>> eliminate about 2/3 of our userbase. (That's why things like Numato
>>>> don't. Nor does our in-house EVB because customers wouldn't do it.)
>>> Ah - must admit I started using it because it is open! So anyone can
>>> download it. Horrible interface though, but then again when doing JTAG -
>>> one shouldn't expect it to be easy ;)

Open source things often have horrible user interfaces. It's a
structural problem with the open source development model. I've
explained why in talks ala:

  https://www.youtube.com/watch?v=SGmtP5Lg_t0#t=11m30s

That said, some open source UI is worse than others. Things with a large
userbase of people who also program tend to get enough feedback to at
least file the rough edges off. Things requiring significant domain
expertise that _doesn't_ necessarily overlap with software application
development... The gimp is very much not photoshop. Crypto code is
eldritch and creaky because normal programmers introduce side channel
attacks if we touch anything. Git's development has been dominated
by kernel programmers who should not be allowed anywhere near userspace.
And with openocd you need to be a hardware dev _and_ a programmer to
poke at this code.

>> We don't expect jtag to be easy. We do expect using our board to be
>> easy. That's why we made an easy way to reflash the board.
> Understood. Yes different use case.

Not necessarily. It would be great if we could level people up into jtag
development. We just refuse to have it as a prerequisite.

The simple path we have mapped out for turtle (which requires me to redo
the website but I _plan_ to when the darn hardware shows up) is:

1) Download a j-core bitstream binary.

There's a whole sidequest about building it and digging down into the
VHDL which is its own page and you can go down that rathole at your
leisure (with "VHDL toolchain install" page, "jcore build system
walkthrough" page, and "learning to program in VHDL with the 2-process
method and our coding style" pages.)

2) Download a vmlinux binary with built-in initramfs. (More sidequest
pages about installing or building cross compiler toolchain, building
userspace and kernel from source, explanations of nommu programming and
ELF vs fdpic and such, maybe some stuff on qemu...)

3) Boot to Linux shell prompt.

4) Build your own binary, copy it to the board, and run it.

The jtag stuff would be yet another sidequest. There should be pages on
it, but it's not in the critical path to start _using_ the thing. It's
more or less in the "fun with GPIO pins" or "how to use gdbserver" pile.

>>> Surprised though that not used in-house. Here it's the first thing we go
>>> to - means that a chip that is otherwise dead, can be reprogrammed.
>>
>> We have an atmel boot processor that can reprogram a chip that is
>> otherwise dead. Built in.
>>
>> Keep in mind we have _3_ levels of jtag todo items. There's "jtag talks
>> to board hardware at xilinx/flash level", "jtag talks to j-core SOC",
>> and "gdb can single-step our processor using upstream vanilla gdb
>> source".
>
> Yes - and it's understanding the different levels that is needed.

There's a dependency tree here. It's hard enough to explain to people
the different layers that are already there.

Installing your app means modifying the root filesystem, which can have
dependencies on what's installed in the root filesystem, but also on the
root filesystem format. We're using external cpio archives to load the
initramfs so they can be swapped out without replacing the kernel, but
you still can't incrementally modify them. EXCEPT in the most recent
toybox release I changed the cpio generator
(https://github.com/landley/toybox/commit/32550751997d) to not add the
TRAILER!!! entry, so you can concatenate the uncompressed cpio archives
and thus yes you _can_ append to an existing one, but that's recent.
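
To illustrate why the trailer matters, here is a rough sketch that builds
"newc" format cpio entries by hand and walks them with a simple reader that
stops at the first TRAILER!!! entry. The helper names are hypothetical and
this is not the toybox implementation; the stop-at-trailer behavior is a
property of this sketch's reader, chosen to show the problem:

```python
def cpio_entry(name, data=b"", mode=0o100644):
    """Build one newc cpio entry: 110-byte header, name, then file data."""
    name_bytes = name.encode() + b"\0"
    # Field order: ino, mode, uid, gid, nlink, mtime, filesize, devmajor,
    # devminor, rdevmajor, rdevminor, namesize, check.
    fields = [0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_bytes), 0]
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    out += b"\0" * (-len(out) % 4)     # header + name padded to 4 bytes
    out += data
    out += b"\0" * (-len(out) % 4)     # file data padded to 4 bytes
    return out


def list_names(archive):
    """List entry names, stopping at the first TRAILER!!! entry."""
    names, pos = [], 0
    while pos < len(archive):
        assert archive[pos:pos + 6] == b"070701"
        fields = [int(archive[pos + 6 + 8 * i:pos + 14 + 8 * i], 16)
                  for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        start = pos + 110
        name = archive[start:start + namesize - 1].decode()
        if name == "TRAILER!!!":
            break
        names.append(name)
        pos = start + namesize
        pos += -pos % 4
        pos += filesize
        pos += -pos % 4
    return names


base = cpio_entry("init", b"#!/bin/sh\n", 0o100755)
extra = cpio_entry("hello", b"hi\n")
trailer = cpio_entry("TRAILER!!!", mode=0)

print(list_names(base + extra + trailer))            # both files visible
print(list_names(base + trailer + extra + trailer))  # reader stops early
```

With no trailer on the first archive, appending is plain concatenation and
every entry stays reachable.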

In any case it's easier to stick your new code in the vfat partition if
you want it to persist (modulo naming weirdness). So you need to mount
the vfat partition of the sdcard somewhere you know about it.

Those two filesystems live on top of the kernel; your kernel config has
to support them, your kernel command line may need stuff, your init
scripts need to call the right setup stuff, your kernel command line and
bootloader setup need to point to the right stuff... we provide an
example, but it's nice to understand what you're doing here (and that's
not even talking about _modifying_ the kernel and/or understanding the
relevant kernel source).

Building this requires a toolchain! With a C library. We're using
musl-cross-make for that but there should be a walkthrough explaining
how that works at some point. And of course cross compiling sucks:

  http://landley.net/writing/docs/cross-compiling.html

If you build an app you'll currently need to cross compile it, which is
fun. We have a todo item to get codelite working, but I only do Linux
and they want it to work from macos and windows as well, which means
every time I sit down to poke at it there are these two giant unknowns
and what seems trivial goes back on the todo list because of that. Any
time there's a "I could do this simple thing..." "Oh but that's useless
because I want much bigger thing". "The perfect is the enemy of the
good, the good has been killed, moving on." And thus I have yet to learn
nontrivial uses of codelite. It's on the todo list...

Oh, and there's how to program for nommu systems so you _can_ write your
app (http://nommu.org has a bit of that, I need to push that into the
Linux Documentation directory, and write an fdpic document, and write up
warnings about how Rich Felker insists on breaking nommu on musl-libc
and how to patch his source to work around that when building your
toolchain...)

All of this needs an "installing Linux stuff onto the board" layer;
network mount, ftp server, wget, popping out the sdcard and copying
files onto it, etc. (Popping out the sdcard gives you an OS-level
"unbrick the board" option if you try to boot an unusable vmlinux.)

Now we dig down to the bitstream layer. Let's skip the "building it"
part and the "developing it" part, and just talk about the "installing
it" part. On power on, the atmel chip loads the FPGA from the spi flash,
so to update you write the bitstream in the spi flash and reboot. In
theory we could reflash from within Linux (older systems forced that
read-only by holding a line high, but the current one changed that. I
don't remember if it's in the 50 mhz release, but it should be in the
62.5 mhz one), but if that gets interrupted it can't load a good SOC
image to run vmlinux to get to the point where you try again. I.E.
brickable.

We already need the atmel chip to load the fpga image from the SPI flash
at boot time, so we also taught it to _write_ the SPI flash. It already
needs to be there and be able to talk to the spi flash, this is a minor
repurposing of existing hardware.

In theory, you can also do this flash writing with a jtag, but nothing
up to this point has required the jtag to exist, and you still need the
atmel to _read_ the spi flash. We added the

You can also use a jtag for debugging, but there's like 5 other ways to
do that. We've occasionally pulled out one of the inner logic states and
run it to a GPIO pin or an LED, which is the hardware version of
sticking a printf into the code. We also have a simulator version (ghdl
or nvc) where we CAN stick a printf into the code.

So getting back to the "many layers" thing, when you fire up the jtag,
what guarantees the FPGA has been programmed? Talking to the spi flash
when the FPGA isn't running yet, vs talking to the j-core SOC when it's
enabled but not running anything yet (kernel or bootloader), vs using
the gdb protocol (ala qemu -s) to "debug" the processor, vs using that
same gdb protocol to talk to the Linux kernel (kgdb) or a userspace
process (gdbserver) without needing a jtag...
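
The gdb protocol in question frames every request the same way no matter
which of those layers answers it: `$`, the payload, `#`, then a two digit
hex checksum (the sum of the payload bytes modulo 256). A minimal sketch of
that framing, with hypothetical helper names, ignoring escaping and
run-length encoding:

```python
def frame(payload):
    """Wrap a payload string as a gdb remote serial protocol packet."""
    body = payload.encode("ascii")
    return b"$" + body + b"#" + b"%02x" % (sum(body) % 256)


def unframe(packet):
    """Check the framing and checksum, then return the payload string."""
    if not (packet.startswith(b"$") and packet[-3:-2] == b"#"):
        raise ValueError("not a framed packet")
    body, check = packet[1:-3], int(packet[-2:], 16)
    if sum(body) % 256 != check:
        raise ValueError("bad checksum")
    return body.decode("ascii")


print(frame("g"))                  # "read registers" request
print(unframe(frame("m1000,4")))   # "read 4 bytes at 0x1000" request
```

The same bytes go over a TCP socket to qemu, a serial line to kgdb, or
whatever transport a jtag bridge provides, which is why the stub side is
swappable.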

If you're wondering why this is nowhere near the top of our todo list,
it's because there's literally dozens of individual things higher up the
list just in terms of writing documentation for j-core.org to explain to
people how to do this thing, and where it fits in Giant Dependency Tree.

> That is both the strength of jtag, but also what makes it a nightmare to
> use.

If you look at the different layers in context, the jtag isn't really
necessary. If you've got an FPGA and/or software simulation of your VHDL
soc, you don't need the jtag to view/control the internal state. (Heck,
we can run a control line out to the GPIO pins to stop/start the
processor clock, and if we use two pins we can trivially single step it.
We just haven't bothered yet.)

P.S. in theory our bootloader has a GDB stub talking the gdbserver
protocol. I've never bothered to use it, but there are multiple
solutions to a bunch of these problems, which we've used at various
points as we needed them. There's been basically working jtag support in
our SOC since late 2014, but it didn't turn out to be something we
really needed. (I vaguely recall when we left off there was a single
stepping bug, which didn't always stop at the right place. I don't
remember if we've bothered to fix that yet because nobody's needed to
use it.)

>> I think the main problem is everybody tries to do DRAM init
>> _through_the_jtag_, which is black magic at a level you really don't
>> want to get on you if you can avoid it. Boards with buckets of SRAM are
>> easier to bring up, but then you need dedicated multi-stage bootloaders
>> with relocation code and you've basically added a second layer of mess...
> DRAM init I agree isn't fun! It's why computers have BIOS, so the BIOS
> does the hard stuff like DRAM init.
> 
> At work, we had one board we couldn't get working with dram, so it had a
> *whole* board of SRAM - just so we could get it to come up ...

That ice40 stuff we were trying to get nvc to target to run j1 on is all
sram. (When we say 'arduino country' that implies no dram controller.)
And a lot of chips have a bunch of onboard sram even when they do have
dram, so you can do your dram init from software. (Or NOR flash if you
don't care about speed.) Heck, for years the reason you couldn't run
u-boot under qemu is they refused to let you configure it NOT to do DRAM
init, and although people eventually hacked their way around Wolfgang's
persistent refusal/incomprehension:

https://balau82.wordpress.com/2010/04/12/booting-linux-with-u-boot-on-qemu-arm/

They still updated the u-boot FAQ entry saying "no, clearly not doing
this is impossible because reasons, what's a QEMU?"

https://www.denx.de/wiki/DULG/CanUBootBeConfiguredSuchThatItCanBeStartedInRAM

As for j-core, we got dram init working long ago (there was an on-chip
xilinx dram controller, then we wrote our own VHDL dram controller, then
we wrote a _better_ one to replace the first one). So sure our designs
have dram. Of course using it requires instantiating the j-core SOC on
the board and running the boot ROM, which once again means your jtag is
only useful in certain contexts unless you want to do a whole lot more
work making it useful via duplicate infrastructure...

Once again: jtag's totally a side issue here.

> Anyway JTAG is a bit like using a puppet, and you try knitting using a
> puppet!

I'm aware of the difficulty, yes. :)

Rob