[J-core] PC-relative loads and delay slots

Mon Jul 18 23:23:25 EDT 2016

On Mon, Jul 18, 2016 at 09:52:16AM -0400, Rich Felker wrote:
> On Mon, Jul 18, 2016 at 02:37:55AM -0700, Robert Ou wrote:
> > Hi,
> > 
> > What is the correct behavior of PC-relative instructions such as
> > "mov.l @(disp, PC), Rn" in a branch delay slot? Is this even allowed?
> > From my testing, GAS seems to think it is "disp is multiplied by 4 and
> > added to the address of the mov.l opcode + 2" but J-core seems to
> > execute it as "disp is multiplied by 4 and added to the address of the
> > branch target + 2". I discovered this while working on my MyHDL
> > demonstration, and you can compare the difference in my demonstration
> > by running the master branch and the branch_delay_test branch.
> 
> If true I think it's a bug. The original SH ISA documentation
> specifies the behavior for "PC-relative" mov instructions as:
> 
>   "The PC points to the starting address of the second instruction
>   after this MOV instruction"
> 
> (as opposed to the actual current value of the program counter). This
> text is found on page 202 of document REJ09B0171-0500O.
> 
> I'm quite surprised we haven't run into this bug, since I would expect
> gcc to generate code with immediate loads in branch delay slots (e.g.
> when making function calls with constant arguments).

Some further info:

mova is documented to produce a result relative to the branch
destination, whereas PC-relative mov.l seems to be documented to
behave as I described above. However, this only applies to sh1/2/3.
On sh4, both mova and PC-relative mov.l (and mov.w) are illegal in
delay slots and result in a trap (so the kernel can emulate them,
very slowly, if you really want them). This is presumably why gcc
never generates PC-relative mov.l in delay slots, and thus why the
bug has never affected us.
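
To make the two candidate behaviors concrete, here is a minimal
sketch in C of the address arithmetic being discussed. The function
names are hypothetical and are not taken from the J-core or binutils
sources. It assumes the usual SH rule that mov.l masks the low two
bits of the PC value before adding disp * 4, and it assumes (this is
not confirmed in the report) that the same masking applies in the
behavior observed on J-core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* PC value per the SH documentation quoted above: the start of the
 * second instruction after the mov.l. SH opcodes are 2 bytes each,
 * so that is the mov.l address + 4. */
static uint32_t pc_documented(uint32_t insn_addr)
{
    assert((insn_addr & 1u) == 0);  /* opcodes are 2-byte aligned */
    return insn_addr + 4u;
}

/* PC value matching the behavior reported for J-core when the mov.l
 * sits in a delay slot: branch target + 2. */
static uint32_t pc_reported_jcore(uint32_t branch_target)
{
    assert((branch_target & 1u) == 0);
    return branch_target + 2u;
}

/* Effective address of mov.l @(disp,PC),Rn: disp is an 8-bit
 * zero-extended field scaled by 4, added to the PC value with its
 * low two bits cleared. ASSUMPTION: the masking also applies in the
 * reported J-core case. */
static uint32_t movl_pcrel_ea(uint32_t pc, uint8_t disp)
{
    return (pc & ~UINT32_C(3)) + ((uint32_t)disp << 2);
}
```

For example, with the mov.l at 0x1002 (in the delay slot of a branch
at 0x1000) and disp = 3, movl_pcrel_ea(pc_documented(0x1002), 3)
gives 0x1010, while for a branch target of 0x2000,
movl_pcrel_ea(pc_reported_jcore(0x2000), 3) gives 0x200c, i.e. the
load is taken from near the branch target instead of near the mov.l.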

Rich