[J-core] Meetup in San Francisco
BGB
cr88192 at gmail.com
Mon Sep 19 04:40:20 EDT 2016
On 9/18/2016 9:04 PM, Rob Landley wrote:
> On 09/18/2016 12:14 PM, BGB wrote:
>> OTOH: I am still left half-wondering if anyone would care if I tried to
>> throw together a basic RTOS with SH as a target (although it would be a
>> lot of effort, half tempted to try to do a mock up of something sort-of
>> like QNX).
> I've vaguely wondered for a while about porting xv6 to sh.
>
> https://pdos.csail.mit.edu/6.828/2016/xv6.html
>
> If MIT is going to have a modern operating systems course based on that,
> then we should have an equivalent processor design course that
> interlocks with it. :)
will have to look into that some.
>> part of the motivation is the levels of pain I need to go through to
>> get RT stuff to work sort-of ok on Linux, and things like FreeRTOS
>> lacking network and filesystem support (and, I have previously
>> implemented a lot of the same mechanisms as parts of prior projects).
> In theory Google Fuchsia is a realtime system. In practice most things
> start out as realtime systems because there's a very primitive scheduler
> and no background processes so why NOT make it realtime. (If it grows
> into something usable, that tends to fall by the wayside.)
yeah.
a lot also depends on how one defines "real-time".
the way the RT features in Linux (and a lot of other things) define it
is basically that if something is scheduled to happen at a certain
time, then the scheduler will make a best effort to run it at that time
(and they make a big deal of how strongly the on-time guarantee holds).
the issue, however, is the matter of time-scale. if it can very
reliably deliver one-shot events within a time-scale of 250-500us, that
is something. but if 250-500us isn't precise enough, then there is a
problem, and generally 250us (or +/- 125us) seems to be about the lower
bound for the Linux scheduler.
but, for what I am often doing where I might need RT, this variant
isn't really all that helpful.
more often, what I am doing involves various forms of "bit banging"
(driving serial signals, driving servo PWM, ...) from GPIO pins. in
this case, the events tend to be a bit more rapid-fire (driving IO
pins at 30-60 kHz isn't particularly uncommon).
so, in effect, one needs to build a second scheduler layer on top, and
get fancy about when to sleep, ..., to try to minimize Linux causing
the occasional random 250-380us delay (and to try to schedule things
so that, if Linux does cause a delay, it preferably doesn't land at a
bad time).
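as a rough sketch of the usual trick here: sleep while the deadline is
still far off (letting Linux's coarse wakeup jitter land harmlessly),
then busy-spin for the final approach. the 500us/100us margins below
are illustrative assumptions, not values from my actual code:

    #include <stdint.h>
    #include <time.h>

    /* current time in microseconds from the monotonic clock */
    static uint64_t now_us(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000u + ts.tv_nsec / 1000u;
    }

    /* sleep in short naps while the target is far away, so the
       kernel's ~250us wakeup jitter happens during the sleep,
       then spin for the last stretch to hit the deadline */
    static void wait_until_us(uint64_t target_us)
    {
        while (now_us() + 500 < target_us) {
            struct timespec ts = { 0, 100000 };  /* 100us nap */
            nanosleep(&ts, NULL);
        }
        while (now_us() < target_us)
            ;  /* busy-spin for the final approach */
    }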
experimentally, on the hardware I was testing on (a 700MHz ARM11), I
was able to push my scheduler to running events at around 1 MHz. it
could possibly have been more if my code were better, but I was also
at the limits of the 1 MHz hardware-provided clock register (1).
1: pretty essential for these sorts of things is having some form of
high-precision clock, where ARM has a memory-mapped 1 MHz clock
register. I have not seen one listed, but I will probably request that
J2/J4 have something similar. note that it need not exactly match wall
time and is allowed to periodically overflow/wrap, it just needs to
have reasonably high precision. ( FWIW: a clock-cycle counter can
serve a similar purpose, assuming it updates at a reasonably constant
frequency. )
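for reference, on Linux such a register can typically be mapped into
user space from /dev/mem, something like this sketch (the physical
address here is hypothetical; the real one comes from the SoC manual):

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>

    #define TIMER_PHYS 0x20003000u  /* hypothetical; SoC-specific */

    static volatile uint32_t *timer_reg;

    int timer_init(void)
    {
        int fd = open("/dev/mem", O_RDONLY | O_SYNC);
        if (fd < 0) return -1;
        void *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED,
                       fd, TIMER_PHYS);
        if (p == MAP_FAILED) return -1;
        timer_reg = (volatile uint32_t *)p;
        return 0;
    }

    /* free-running 1 MHz count; wraps every ~71 minutes, which is
       fine as long as deltas use unsigned arithmetic */
    static inline uint32_t timer_now(void) { return *timer_reg; }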
there are a few problems here (with high-speed scheduling):
    Linux drives its scheduler off of timer interrupts, and
(typically) seems to set the PIT to run at ~8 kHz or so;
    context switches are by no means free (it seems unlikely context
switches could achieve these speeds).
I suspect, pushing an OS scheduler up to the desired speeds would
require somewhat rethinking how the scheduling works.
a few thoughts:
    RT and non-RT parts of the OS are kept separate;
    non-RT parts of the OS are effectively suspended whenever
higher-priority RT events are active;
    RT stuff likely uses a single large address space and
cooperative/event-driven scheduling.
        in effect, sort of like 16-bit Windows, with RT code mostly
running via callbacks.
        nothing in this mode is allowed to block or run
indeterminate-length (2) loops;
APIs are similarly non-blocking and callback-driven. if an API call
may not complete immediately, it can return a handle and set itself up
to run asynchronously. when the task completes, it invokes a
user-supplied callback. alternatively, a special compiler/ABI could be
used, but I am imagining something here which could be done with a
more generic compiler.
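for example, the shape of such an API might look something like this
(hypothetical names, just to illustrate the callback-driven style):

    typedef void (*io_callback)(int status, void *userdata);

    /* hypothetical non-blocking read: if the data is already
       buffered, it completes immediately; otherwise it queues the
       request, returns a handle, and later invokes 'done' from the
       scheduler (never blocking the caller) */
    int rt_read_async(int fd, void *buf, int len,
                      io_callback done, void *userdata);

    static void on_read(int status, void *userdata)
    {
        /* runs later as a scheduler callback; must not block or
           run unbounded loops */
    }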
sequential operations (with API calls) will generally be structured as
a sequence of events (without specific times given), where each event
queues up the next event.
the scheduler effectively works by pulling items from the queue,
examining them, and either executing them or re-adding them to the
queue (often at the end). different event classes generally go in
different queues (ex: one knows that there are no time-critical events
by noting that the associated queue is empty).
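roughly, the core loop might look something like this sketch
(queue_pop/queue_push and the event struct are hypothetical names,
and it assumes the now_us() helper from earlier):

    struct event {
        struct event *next;
        uint64_t due_us;              /* 0 = run whenever */
        void (*run)(struct event *);  /* must finish quickly */
    };

    void scheduler_loop(struct queue *timed, struct queue *bulk)
    {
        for (;;) {
            struct event *ev = queue_pop(timed);
            if (ev) {
                if (now_us() >= ev->due_us) {
                    ev->run(ev);           /* due now: execute it */
                    continue;
                }
                queue_push(timed, ev);     /* not yet due: put back */
            }
            /* nothing time-critical is due: run one item of bulk
               work (a real version would also check how soon the
               next timed event is due, per the mode logic below) */
            if ((ev = queue_pop(bulk)))
                ev->run(ev);
        }
    }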
2: something like:
    for(i=0; i<4; i++)
        TrivialTask();
is probably ok, but:
    for(i=0; i<nItems; i++)
        NonTrivialTask();
is probably not ok.
instead, generally, each loop iteration would be scheduled (at lower
priority), and would then reschedule the event for the next item. this
may seem inefficient, but things like avoiding loops/recursion/...
make it a lot easier to determine that code reliably runs within the
time limit.
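so the second loop above might instead be expressed as a
self-rescheduling event, something like this sketch (schedule_event()
and PRIO_LOW are hypothetical names):

    struct iter_state { int i, n; };

    static void do_one_item(void *p)
    {
        struct iter_state *st = p;
        NonTrivialTask();                /* one bounded-time step */
        if (++st->i < st->n)             /* more items: requeue */
            schedule_event(do_one_item, st, PRIO_LOW);
    }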
in the ideas I have sketched out, there could be several classes of code:
    RT: code which actually uses the high-speed scheduler.
        example: logic for bit-banging IO pins.
        needs to be mapped across all address spaces (so it can be run
without context switches).
        may never be paged out, and is likely to have size constraints.
        likely used in a way more comparable to a kernel module (and
run from kernel space).
    RTA (Real-Time Aware): code which doesn't stop running in RT mode,
but doesn't itself need RT.
        example: the filesystem and network stack could go here.
        likely only allowing IO to already-open files/sockets in RT
mode.
            this simplifies things by avoiding lookups / buffer
allocation / ...
        these would need to work within the confines of the high-speed
scheduler.
        these would all use a shared address space (potentially
disjoint from RT, or potentially also in kernel space).
    non-RTA:
        traditional Unix-style processes (and a POSIX-style API);
        would have access to the full APIs.
        effectively, these would be temporarily suspended whenever RT
code runs.
        these would normally use preemptive threading.
        a non-RTA process will never take priority over real-time code.
            stuff that needs to run during RT mode will be written as
RTA code in the kernel.
which mode the OS is operating in would depend on whether any events
scheduled to run at a specific time are in the schedule list within a
given target window (say, if the scheduler sees that it needs to do
something timing-sensitive within around 250us).
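as a sketch, the check might amount to something like this
(queue_peek is a hypothetical helper, reusing the event struct from
the earlier sketch):

    /* RT mode is entered whenever the nearest timed event is close
       enough that letting non-RTA code run could make us miss it;
       the 250us window is the illustrative threshold from above */
    int rt_mode_needed(struct queue *timed)
    {
        struct event *ev = queue_peek(timed);  /* earliest event */
        return ev && (int64_t)(ev->due_us - now_us()) < 250;
    }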
note that, for prior VMs I have run in this sort of scheduler, I
effectively load and "pre-warm" the VM image in advance (ex: decode
functions into their threaded-code forms, ...), so that nothing fancy
needs to be done at runtime (and so that, hopefully, nothing is
capable of going over the time limit). note that this VM lacked any
sort of dynamic lookups, so these were not a problem (matters of
scope, object layout, and similar were determined in advance).
AFAICT, none of the freely available RTOS's do all of this (though there
is an option which uses POSIX APIs but only runs a single process at a
time), but some stuff I have read implies that this sort of approach may
basically be possible with QNX and VxWorks (though these are also
proprietary OS's, and I have no real first-hand experience with them).
or such...