[J-core] Meetup in San Francisco

BGB cr88192 at gmail.com
Mon Sep 19 04:40:20 EDT 2016

On 9/18/2016 9:04 PM, Rob Landley wrote:
> On 09/18/2016 12:14 PM, BGB wrote:
>> OTOH: I am still left half-wondering if anyone would care if I tried to
>> throw together a basic RTOS with SH as a target (although it would be a
>> lot of effort, half tempted to try to do a mock up of something sort-of
>> like QNX).
> I've vaguely wondered for a while about porting xv6 to sh.
>    https://pdos.csail.mit.edu/6.828/2016/xv6.html
> If MIT is going to have a modern operating systems course based on that,
> then we should have an equivalent processor design course that
> interlocks with it. :)

will have to look into that some.

>> partial motivation is partly the levels of pain I need to go through to
>> get RT stuff to work sort-of ok on Linux, and things like FreeRTOS
>> lacking network and filesystem (and, I have previously implemented a lot
>> of the same mechanisms as parts of previous projects).
> In theory Google Fuchsia is a realtime system. In practice most things
> start out as realtime systems because there's a very primitive scheduler
> and no background processes so why NOT make it realtime. (If it grows
> into something usable, that tends to fall by the wayside.)


a lot also comes down to how one defines "real-time".

the way the RT features in Linux (and a lot of other things) define it 
is basically: if something is scheduled to happen at a certain time, 
then the scheduler will make a best effort to run it at that time (and 
they make a big deal about how strongly guaranteed it is to be 
scheduled on-time).

the issue, however, is the matter of time-scale. if it can very 
reliably deliver one-shot events within a time-scale of 250-500us, that 
is something. but if 250-500us isn't precise enough, then there is a 
problem, and ~250us (or +/- 125us) seems to be about the lower bound 
for the Linux scheduler.

but, for what I am often doing where I might need RT, this sort of 
guarantee isn't really all that helpful.

more often, what I am doing involves various forms of "bit banging" 
(driving serial signals, driving servo PWM, ...) from GPIO pins. in 
this case, the events tend to be more rapid-fire (driving IO pins at 
30-60 kHz isn't particularly uncommon).

so, in effect, one needs to build a second layer of scheduler, and get 
fancy about when to sleep, ..., to try to minimize Linux causing the 
occasional random 250-380us delay (and to schedule things so that, if 
Linux does cause a delay, it is preferably not at a bad time).

experimentally, on the hardware I was testing on (a 700MHz ARM11), I 
was able to push my scheduler to running events at around 1 MHz (could 
possibly have been more if my code were better, but I was also at the 
limits of the hardware-provided 1 MHz clock register; see 1).

1: pretty essential for this sort of thing is having some form of 
high-precision clock, where ARM has a memory-mapped 1 MHz clock 
register. I have not seen one listed, but I will probably request that 
J2/J4 have something similar. note that it need not exactly match wall 
time and is allowed to periodically overflow/wrap; it just needs to 
have reasonably high precision. ( FWIW: a clock-cycle counter can serve 
a similar purpose, assuming it updates at a reasonably constant 
frequency. )

there are a few problems here (with high-speed scheduling):
Linux drives its scheduler off of timer interrupts, and (typically) 
seems to set the PIT to run at ~8 kHz or so;
context switches are by no means free (it seems unlikely that context 
switches could be achieved at these speeds).

I suspect that pushing an OS scheduler up to the desired speeds would 
require somewhat rethinking how scheduling works.

a few thoughts:
     RT and non-RT parts of the OS are kept separate;
         non-RT parts of the OS are effectively suspended whenever 
higher-priority RT events are active.
     RT stuff likely uses a single large address space and 
cooperative/event-driven scheduling;
         in effect, sort of like 16-bit Windows, with RT code mostly 
running via callbacks;
         nothing in this mode is allowed to block or run 
indeterminate-length (2) loops.

APIs are similarly non-blocking and callback driven. if an API call may 
not complete immediately, it can return a handle and set itself up to 
try to run asynchronously. when the task completes, it invokes a 
user-supplied callback. alternately, a special compiler/ABI could be 
used, but I am imagining an idea here which could be done with a more 
generic compiler.

sequential operations (with API calls) will generally be structured as 
a sequence of events (without specific times given), where each event 
queues up the next event.
the scheduler effectively works by pulling items from the queue, 
examining them, and either executing them or re-adding them to the 
queue (often at the end). different event classes generally go in 
different queues (ex: one knows that there are no time-critical events 
by noting that the associated queue is empty).

2: something like:
     for(i=0; i<4; i++)
is probably ok, but:
     for(i=0; i<nItems; i++)
is probably not ok.

instead, generally, each loop iteration would be scheduled (at lower 
priority), with each event rescheduling itself for the next item. this 
may seem inefficient, but avoiding loops/recursion/... makes it a lot 
easier to determine that code reliably runs within the time limit.

in my ideas, I figured there could be several classes of code:
     RT: code which actually uses the high-speed scheduler.
         example: logic for bit-banging IO pins.
         needs to be mapped across all address spaces (so it can be run 
without context switches);
             may never be paged out, likely to have size constraints;
             likely used in a way more comparable to a kernel module 
(and run from kernel space).
     RTA (Real Time Aware): code which doesn't stop running in RT mode, 
but doesn't itself need RT.
         example: filesystem and network could go here;
             likely only allowing IO to already-open files/sockets in 
RT mode, which simplifies things by avoiding lookups / buffer 
allocation / ...
         these would need to work within the confines of the high-speed 
scheduler;
         these will all use a shared address space (potentially 
disjoint from RT, or potentially also in kernel space).
     non-RTA: traditional Unix-style processes (and a POSIX-style API);
         would have access to full APIs;
         effectively, these would be temporarily suspended whenever RT 
code runs;
         these would normally use preemptive threading.
             a non-RTA process will never take priority over real-time 
code;
             stuff that needs to run during RT mode will be written as 
RTA code in the kernel.

which mode the OS would be operating in would depend on whether any 
events scheduled to run at a specific time are in the scheduled list 
within a given target tick (say, if the scheduler sees that it needs 
to do something timing-sensitive within around 250us).

note that for prior VMs I would run in this sort of scheduler, I 
effectively load and "pre-warm" the VM image in advance (ex: decoding 
functions into their threaded-code forms, ...), so that nothing fancy 
needs to be done at runtime (and so, hopefully, nothing is capable of 
going over the time limit). note that this VM lacked any sort of 
dynamic lookups, so these were not a problem (matters of scope and 
object layout and similar were determined in advance).

AFAICT, none of the freely available RTOSes do all of this (though 
there is one option which uses POSIX APIs but only runs a single 
process at a time), but some stuff I have read implies that this sort 
of approach may basically be possible with QNX and VxWorks (though 
these are proprietary OSes, and I have no real first-hand experience 
with them).

or such...
