<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">On 10/26/2017 8:59 AM, Christopher

      Friedt wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAF4BF-Swq2Y-ztG++Ygrtsi0rXNJ6keoHTzaud-91=SsrS1eDA@mail.gmail.com">

      <div dir="auto"><br>

        <div class="gmail_extra" dir="auto"><br>

          <div class="gmail_quote">On Oct 26, 2017 9:04 AM, "emanuel

            stiebler" &lt;<a href="mailto:emu@e-bbes.com"

              target="_blank" moz-do-not-send="true">emu@e-bbes.com</a>&gt;

            wrote:<br type="attribution">

            <blockquote class="m_8244701099308911606quote"

              style="margin:0 0 0 .8ex;border-left:1px #ccc

              solid;padding-left:1ex">

              <div class="m_8244701099308911606quoted-text">On

                2017-10-25 17:59, Ken Phillis Jr wrote:<br>

                <br>

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  New CPUID Flags:<br>

                  SIMD_INTEGER8<br>

                  SIMD_INTEGER16<br>

                  SIMD_INTEGER32<br>

                  SIMD_INTEGER64<br>

                  SIMD_HALF_PRECISION_FLOAT<br>

                  SIMD_SINGLE_PRECISION_FLOAT<br>

                  SIMD_DOUBLE_PRECISION_FLOAT<br>

                  <br>

                </blockquote>

                <br>

              </div>

              Just a short one,<br>

              for the floating points, I would prefer<br>

              <br>

              SIMD_FLOAT16<br>

              SIMD_FLOAT32<br>

              SIMD_FLOAT64<br>

              SIMD_FLOAT128<br>

            </blockquote>

          </div>

        </div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">I would suggest shortening those further.</div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">simd.u8</div>

        <div dir="auto">simd.s16</div>

        <div dir="auto">simd.f32</div>

        <div dir="auto">simd.f16</div>

        <div dir="auto"><br>

        </div>

      </div>

    </blockquote>

    <br>

    yeah, we don't need Double/Float64 SIMD.<br>

    <br>

    Float128 would be pretty absurd given very little hardware currently does it

    natively (also it would be crazy expensive, a full-width MAC unit

    quite possibly wouldn't fit into the FPGA...).<br>

    <br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BF-Swq2Y-ztG++Ygrtsi0rXNJ6keoHTzaud-91=SsrS1eDA@mail.gmail.com">

      <div dir="auto">

        <div dir="auto">* dot notation is a bit nicer</div>

        <div dir="auto">* u := unsigned</div>

        <div dir="auto">* s := signed</div>

        <div dir="auto">* f32 := IEEE 754 32-bit float</div>

        <div dir="auto">* f16 := half precision</div>

        <div dir="auto">* similarly, f64, u64, ...</div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">BGB - could you mention on the list how the FPU

          design differs between SH and x86/mmx ?</div>

        <div dir="auto"><br>

        </div>

      </div>

    </blockquote>

    <br>

    x87 had 8x 80-bit FPU registers organized as a stack; so you

    couldn't perform operations between arbitrary registers, and instead

    had to go through the top of the stack with push/pop/swap/...<br>

    <br>

    MMX allowed using them instead as 8x 64-bit registers, but doing so

    required entering/exiting MMX mode, which would basically trash the

    FPU registers.<br>

    <br>

    SSE basically reduced this mess, and since then compilers have

    mostly abandoned x87.<br>

    <br>

    <br>

    the SH style FPU uses two banks of 16x 32-bit registers (labeled

    FR0..FR15 and XF0..XF15; though some SH variants only have a single

    bank);<br>

    operations on these registers are basically direct register

    operations (more like GPRs, or like the more modern x86-64 strategy

    of doing all the FPU stuff using SSE);<br>

    the SH FPU also implements Double operations by working on pairs of

    registers (kind of funky, but works).<br>
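    <br>
    as a rough illustration of the stack vs direct-register difference
    (Python, toy model only; the class and method names here are made up
    for the example and don't come from any real emulator or manual):<br>
    <pre wrap="">
```python
# Toy model only: names are hypothetical, values are plain Python floats.

class StackFpu:
    """x87-like: 8 slots, arithmetic always involves the top of stack."""

    def __init__(self):
        self.stack = []

    def push(self, value):
        if len(self.stack) == 8:
            raise OverflowError("x87-style stack holds only 8 values")
        self.stack.append(value)

    def swap(self, depth):
        # Exchange the top of stack with the slot `depth` below it.
        top = len(self.stack) - 1
        self.stack[top], self.stack[top - depth] = (
            self.stack[top - depth], self.stack[top])

    def add_pop(self):
        # Add the top two slots and leave the single result on top.
        b = self.stack.pop()
        a = self.stack.pop()
        self.stack.append(a + b)


class FlatFpu:
    """SH-like: any register can be a source or a destination."""

    def __init__(self, count=16):
        self.regs = [0.0] * count

    def add(self, dst, src):
        # Two-operand form: dst = dst + src.
        self.regs[dst] += self.regs[src]


def stack_example():
    fpu = StackFpu()
    for v in (1.0, 2.0, 4.0):
        fpu.push(v)
    fpu.swap(2)      # bring the bottom value (1.0) to the top
    fpu.add_pop()    # 2.0 + 1.0, result replaces both
    return fpu.stack


def flat_example():
    fpu = FlatFpu()
    fpu.regs[3] = 1.0
    fpu.regs[7] = 2.0
    fpu.add(7, 3)    # FR7 = FR7 + FR3, no shuffling needed
    return fpu.regs[7]
```
</pre>
    in the stack version, getting at a value that isn't on top takes an
    extra swap first; in the flat version the operands are just named.<br>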

    <br>

    so, unlike x87, there is no metadata and no stack, and likewise no

    need to clear the registers if moving between SIMD mode and FPU mode

    (if desired).<br>

    likewise, sharing the register state would make it possible

    to do arbitrary scalar operations within vector elements, which is

    not something readily supported by SSE.<br>
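    <br>
    a small sketch of the register-pair idea (Python, illustrative only;
    the helper names are made up, and putting the high word in the
    even-numbered register is an assumption based on how I understand
    SH-4 to lay out its Double registers):<br>
    <pre wrap="">
```python
import struct

# Hypothetical model: 16 raw 32-bit register slots, stored as ints.
fr = [0] * 16


def write_double(n, value):
    # Assumption: even register holds the high word, odd register the low word.
    hi, lo = struct.unpack("!II", struct.pack("!d", value))
    fr[n], fr[n + 1] = hi, lo


def read_double(n):
    # Reassemble the 64-bit value from the two 32-bit slots.
    return struct.unpack("!d", struct.pack("!II", fr[n], fr[n + 1]))[0]


def read_float(n):
    # The same slot viewed as a 32-bit single, with no mode switch or
    # register clearing in between.
    return struct.unpack("!f", struct.pack("!I", fr[n]))[0]
```
</pre>
    e.g. after write_double(2, 1.0), slot 2 holds 0x3FF00000 and slot 3
    holds 0, and either slot can still be read on its own as a single;
    the same idea is what would let scalar ops poke at individual vector
    elements.<br>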

    <br>

    a partial limitation comes from how the "bank" mechanism works,

    though: with 128-bit vectors, it would hinder doing operations

    between the high and low halves of a vector using 16-bit instructions

    (my SIMD extensions would remedy this by allowing 32-bit FPU

    I-forms to access all 32 float registers, rather than only a bank of

    16 registers; with 8x SIMD registers this basically means direct

    access to the entire space).<br>

    <br>

    likewise, the space can also be used as 16x 64-bit vectors, given the

    vectors themselves are implemented via register pairs (and in this

    case, equivalent to the Double registers).<br>

    <br>

    <br>

    a minor extension to the FPU is that Double operations may access

    all 16x Double registers at the same time;<br>

    likewise, I "borrowed" the feature allowing both SZ and PR bits to be

    set, allowing moving Doubles to/from memory in the correct order

    (usually these require doing the loads/stores as pairs of

    instructions).<br>

    <br>

    sadly also, the FPU design requires frequently toggling bits in the

    FPSCR register (to move between Float/Double), but I added an op to

    load the relevant bits in the register directly (vs, say, needing to

    do a constant load and then shove the desired FPU state into this

    register).<br>
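    <br>
    to make the FPSCR juggling concrete, a sketch in Python (illustrative
    only; the bit positions are the ones I believe the SH-4 FPSCR uses,
    so treat them as an assumption and check the manual for a given core;
    load_mode is a hypothetical stand-in for the added op, not its actual
    encoding or semantics):<br>
    <pre wrap="">
```python
# Assumed SH-4 style FPSCR bit positions (verify against the core's manual).
PR = 2 ** 19  # precision: 0 = Float, 1 = Double
SZ = 2 ** 20  # transfer size: 0 = 32-bit moves, 1 = 64-bit pair moves
FR = 2 ** 21  # register bank select


def toggle(fpscr, bits):
    # Flip the given bits and leave everything else alone.
    return fpscr ^ bits


def load_mode(fpscr, mode):
    # Hypothetical "load the mode bits directly" step: replace PR/SZ/FR
    # with `mode` in one go, keeping all other FPSCR bits as they were.
    mask = PR | SZ | FR
    return (fpscr & ~mask) | mode
```
</pre>
    e.g. load_mode(fpscr, PR | SZ) selects Double arithmetic with 64-bit
    moves in one step, where the older approach would need a constant
    load plus a separate move into FPSCR.<br>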

    <br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BF-Swq2Y-ztG++Ygrtsi0rXNJ6keoHTzaud-91=SsrS1eDA@mail.gmail.com">

      <div dir="auto">

        <div dir="auto">Having ported FFTW over to ARM neon I'm

          extensively familiar with it and know it is strikingly similar

          to mmx. I know that (at least on ARM) the contention you've

          mentioned is quite significant. Pipeline stalls must be

          precisely inserted to ensure correct results for simd

          instructions are obtained at the correct times, etc. Vector

          loads and stores, cache-prefetching, and i/o alignment were

          critical.</div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">For A8 that meant fine-tuning the instructions

          generated by the compiler. There were also some memory

          barriers, iirc. Mostly hand-written assembly. Intrinsics were

          ~ meh.</div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">I did work on it before the A9 OOO pipeline was

          introduced, but also know that having an out-of-order unit

          helped simd on ARM.<br>

        </div>

        <div dir="auto"><br>

        </div>

        <div dir="auto">Again, I'm really curious how the FPU design

          differs, because if SH / J-Core can avoid that mess, it would

          be better off.</div>

        <div dir="auto"><br>

        </div>

      </div>

    </blockquote>

    <br>

    sadly, basically out of time, so no response.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAF4BF-Swq2Y-ztG++Ygrtsi0rXNJ6keoHTzaud-91=SsrS1eDA@mail.gmail.com">

      <div dir="auto">

        <div dir="auto"><br>

        </div>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

J-core mailing list

<a class="moz-txt-link-abbreviated" href="mailto:J-core@lists.j-core.org">J-core@lists.j-core.org</a>

<a class="moz-txt-link-freetext" href="http://lists.j-core.org/mailman/listinfo/j-core">http://lists.j-core.org/mailman/listinfo/j-core</a>

</pre>

    </blockquote>

    <p><br>

    </p>

  </body>

</html>