[J-core] PC-relative loads and delay slots

Tue Jul 19 00:02:24 EDT 2016

On Jul 19, 2016, at 12:23 PM, Rich Felker <dalias at libc.org> wrote:
> On Mon, Jul 18, 2016 at 09:52:16AM -0400, Rich Felker wrote:
>> On Mon, Jul 18, 2016 at 02:37:55AM -0700, Robert Ou wrote:
>>> Hi,
>>> 
>>> What is the correct behavior of PC-relative instructions such as
>>> "mov.l @(disp, PC), Rn" in a branch delay slot? Is this even allowed?
>>> From my testing, GAS seems to think it is "disp is multiplied by 4 and
>>> added to the address of the mov.l opcode + 2" but J-core seems to
>>> execute it as "disp is multiplied by 4 and added to the address of the
>>> branch target + 2". I discovered this while working on my MyHDL
>>> demonstration, and you can compare the difference in my demonstration
>>> by running the master branch and the branch_delay_test branch.
>> 
>> If true I think it's a bug. The original SH ISA documentation
>> specifies the behavior for "PC-relative" mov instructions as:
>> 
>>  "The PC points to the starting address of the second instruction
>>  after this MOV instruction"
>> 
>> (as opposed to the actual current value of the program counter). This
>> text is found on page 202 of document REJ09B0171-0500O.
>> 
>> I'm quite surprised we haven't run into this bug, since I would expect
>> gcc to generate code with immediate loads in branch delay slots (e.g.
>> when making function calls with constant arguments).
> 
> Some further info:
> 
> mova is documented to produce a result relative to the branch
> destination, but pc-relative mov.l seems to be documented to behave as
> I described above.

After some internal discussion, I had a look at the SH3 manual. A member/associate of that design team seems to remember the SH3 having the same behaviour as the SH1/2:

REJ09B0317-0400, page 216:
"When this MOV instruction is placed immediately after a delayed branch instruction, the PC points to an address specified by (the starting address of the branch destination) + 2."

If I’m reading this correctly, the J-Core implementation is correct, even though it violates the principle of least surprise.  Making the PC point where one would intuitively expect would be very inconvenient, because that PC value doesn’t actually exist in the pipeline at the right time...
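
To make the two cases concrete, here is a minimal sketch of the address calculation as I read the two manual quotes above. The function name and parameters are made up for illustration. It assumes 16-bit instructions (so "second instruction after this MOV" means opcode address + 4), an 8-bit zero-extended disp, and that the low two bits of PC are masked off for the longword form; the last two points come from my recollection of the SH manuals, not from this thread.

```python
def movl_pc_rel_address(disp, insn_addr, in_delay_slot=False, branch_target=None):
    """Effective address of 'mov.l @(disp, PC), Rn' (illustrative sketch only)."""
    if not 0 <= disp <= 0xFF:
        raise ValueError("disp is assumed to be an 8-bit unsigned field")
    if in_delay_slot:
        if branch_target is None:
            raise ValueError("branch_target is required for the delay slot case")
        # Quoted SH3 manual text: PC = branch destination + 2
        pc = branch_target + 2
    else:
        # Quoted SH ISA text: PC = start of the second instruction after the MOV
        pc = insn_addr + 4
    # Assumption: longword access masks the low two bits of PC, disp scaled by 4
    return ((pc & 0xFFFFFFFC) + disp * 4) & 0xFFFFFFFF


# Same opcode, same disp, very different literal addresses:
print(hex(movl_pc_rel_address(3, 0x1002)))                      # 0x1010
print(hex(movl_pc_rel_address(3, 0x1002, in_delay_slot=True,
                              branch_target=0x2000)))           # 0x200c
```

So an assembler that computes disp relative to the mov.l itself will produce a load from the wrong place whenever the instruction sits in a delay slot.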

> However, this is only valid for sh1/2/3. On sh4,
> both mova and pc-relative mov.l (and mov.w) are illegal in delay slots
> and result in a trap (so the kernel can emulate them very slowly if
> you really want them). This is presumably why gcc never generates
> the pc-relative mov.l in delay slots and thus why the bug has never
> affected us.

I personally think the SH4 behaviour is the right one: if the pipeline were made multi-issue, etc., even keeping the current (non-intuitive) behaviour might be difficult.  It is much better to have these instructions be consistently slot-illegal than to have execution differences.  Maybe there should be a generic to make these instructions trap...
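
In the meantime, something along these lines could flag the problematic pattern in existing binaries. This is only a rough sketch: the function names are hypothetical, the opcode masks are from my memory of the SH encoding tables, and it assumes the input is a plain list of 16-bit instruction words with no literal pool data mixed in (data words would produce false positives).

```python
def is_delayed_branch(op):
    """True if the 16-bit opcode is a branch with a delay slot (assumed encodings)."""
    if (op >> 12) in (0xA, 0xB):                 # bra, bsr
        return True
    if (op & 0xFF00) in (0x8D00, 0x8F00):        # bt/s, bf/s
        return True
    if (op & 0xF0FF) in (0x402B, 0x400B,         # jmp @Rn, jsr @Rn
                         0x0023, 0x0003):        # braf Rn, bsrf Rn
        return True
    return op in (0x000B, 0x002B)                # rts, rte


def is_pc_relative(op):
    """True for mov.w/mov.l @(disp, PC), Rn and mova @(disp, PC), R0."""
    return (op >> 12) in (0x9, 0xD) or (op & 0xFF00) == 0xC700


def find_slot_violations(opcodes):
    """Indices of PC-relative instructions sitting in a branch delay slot."""
    return [i + 1 for i in range(len(opcodes) - 1)
            if is_delayed_branch(opcodes[i]) and is_pc_relative(opcodes[i + 1])]


# bra; mov.l @(3, PC), R1; nop; rts; nop
print(find_slot_violations([0xA002, 0xD103, 0x0009, 0x000B, 0x0009]))  # [1]
```

It only checks the three PC-relative forms discussed here, not the full set of instructions the SH4 treats as slot-illegal.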

J.

> 
> Rich