FPU assembler programming



By Erik H. Bakke



Written  13/10-93





--- I ------------------------- INTRODUCTION ---------------------- I ---



1.1 Introduction



Many people have asked me to explain how to program the 68881/68882/040

floating point coprocessors, and here it is: a guide to the "magic art".

I have tried to keep this text as system neutral as possible, but it

may, like the other articles in this series, be influenced by the fact

that I usually program on Amiga computers.

If you need more information about the topics discussed herein, please

contact the author.



1.2 Index



  Chapter I--------Introduction--------------------

  1.1  Introduction

  1.2  Index

  Chapter II-----The Coprocessor interface---------

  2.1  The Interface mechanics

  2.2  The Floating Point Coprocessor

  Chapter III----Floating Point Programming--------

  3.1  Floating Point Data Formats

  3.2  Floating Point Constant ROM

  3.3  Floating Point Instructions

       1...Data transfer instructions

       2...Dyadic operations

       3...Monadic operations

       4...Program control instructions

       5...System control instructions

  Chapter IV-------The 68040 FPU-------------------

  4.1  Differences

  4.2  Instruction set

  Chapter V------------Sources---------------------

  5.1  Sourcecodes







--- II -------------------THE COPROCESSOR INTERFACE -------------- II ---



2.1  The Interface Mechanics



A coprocessor may be thought of as an extension to the main CPU,

extending its register set and instructions.

Different coprocessors that can be interfaced to the 68020+ CPUs

are the 68881/2 FPU and 68851 MMU.

Coprocessor instructions are placed inline with ordinary CPU codes,

and are all recognized by being LINE-F instructions (having the op-code

format $Fxxx).  In assembler, they are generally written as cpXXXX

instructions.



The coprocessors require a communication protocol with the CPU for

various reasons:



-----------------------------------------------------------

1.  The CPU must recognize that a coprocessor is to receive

    the LINE-F op-code, and establish contact with that

    coprocessor.



2.  The coprocessor may need to signal its internal status

    to the CPU.



3.  The coprocessor may need to read/write data to/from

    system memory or CPU registers.



4.  The coprocessor may have to inform the CPU of error

    conditions, such as an illegal instruction or divide by

    zero.  The CPU will have to process the corresponding

    exceptions.

-----------------------------------------------------------



This protocol is called the MC68000 coprocessor interface.  Knowledge

of this interface is not required of a programmer who wishes to

utilize a coprocessor, therefore I will not go into specific detail

about the interface, but briefly sum up the main mechanisms.



  The coprocessor instructions are F-line instructions that have

  all bits in the upper nibble set to generate $Fxxx op-codes.

  Up to 8 coprocessors may reside on the bus (The Amiga coprocessors

  are NOT part of this system, and should not be counted in).

  Each of these coprocessors has its own 3-bit address.

  Two such addresses are reserved by Motorola:

                         %000   MC68851 PMMU

                         %001   MC68881/2 FPU

  

  It is perfectly possible to install 6 FPUs in the same

  system.

  

  The general format of a coprocessor op-code is shown below:

  

  15    12 11  9 8  6 5                   0

  =========================================

  1 1 1 1  Cp-ID Type Instruction dependent

  

  Followed by a number of coprocessor defined extension words and

  effective address extension words.

  

  If the instruction is not accepted by the coprocessor it is

  addressed to (for example, if the CP is not present), the CPU will

  take an F-line exception.
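
  As an illustration of the op-code format above, this is how the
  instruction FMOVE.L #10,FP3 could be assembled by hand, for instance
  with an assembler that has no FPU support.  Treat it as a sketch: the
  layout of the extension word comes from Motorola's FMOVE encoding and
  is not described in this text, and the DC directives assume a
  Motorola-style assembler.

```asm
; FMOVE.L #10,FP3 assembled by hand
        dc.w    $F23C           ; %1111 001 000 111100
                                ;  F-line, Cp-ID=%001 (FPU), type=%000,
                                ;  effective address = immediate
        dc.w    $4180           ; coprocessor extension word:
                                ;  source is <ea>, format L, dest FP3, FMOVE
        dc.l    10              ; the immediate longword operand
```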



2.2 The Floating Point Coprocessor



  The Motorola floating point coprocessor has the number 68881 or

  68882.  The 68882 is considerably faster than the 881, due

  to optimized internal design.  In addition, the 68040 CPU has an

  internal FPU that is even faster than the 882.  There are other

  differences between the 881/2 and the 040, but I'll return to

  those later.  The 68881 and 68882 are pin compatible, and available

  in speeds up to 20MHz and 50MHz respectively.

  The FPU implements IEEE compatible floating point formats, and

  implements instructions to perform arithmetics on these formats,

  as well as several transcendental functions, such as SIN(x), E^x

  and so on.  In addition the FPU has an on-chip constant ROM where

  different mathematical constants are stored.



2.2.1  The floating point registers



  The FPU has 8 floating point registers, each 80 bits wide.

  These are named FP0-FP7, just as D0-D7 in the CPU.  In addition

  the FPU has 3 32-bit registers:

  

  Control Register       FPCR

  

  31.................15..........7..........0

                       Exception     Mode

                        Enable      Control

  

  Status Register       FPSR

  

  31.......23........15..........7..........0

  Condition  Quotient  Exception   Accrued

    Codes               Status    Exception

  

  Instruction Address Register    FPIAR



  31.......23........15..........7..........0



2.2.1.1  Floating point data registers



  The data registers always contain an 80 bit wide extended precision

  floating point number.  Before any floating point data is used in

  calculation, it is converted to extended-precision.

  For example, the instruction   FMOVE.L #10,FP3  converts the

  longword #10 to extended precision before transferring it to register

  FP3.  All calculations with the FPU use the internal registers

  as either source or destination, or both.
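
  A short sketch of this automatic conversion follows.  The choice of
  registers is arbitrary, and D0 is assumed to hold a signed word.

```asm
        fmove.l #10,fp3         ; longword 10 -> extended precision in FP3
        fmove.w d0,fp0          ; signed word in D0 -> extended in FP0
        fadd.x  fp3,fp0         ; FP0 = FP0 + FP3, computed in extended
        fmove.l fp0,d1          ; converted back to a longword integer,
                                ; rounded according to the mode control byte
```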

  

2.2.1.2  Floating Point Control Register



  The used part of this register is split into two bytes: the exception

  enable byte (bits 15-8) and the mode control byte (bits 7-0).



2.2.1.2.1  Exception Enable byte



  This byte contains a bit for each of the eight possible

  exceptions that may be generated by the FPU.  Setting a bit

  enables the corresponding exception, and clearing it disables it.



  The exception enable byte is organized this way:

  

  Bit   Name    Meaning

  ========================

  7     BSUN    Branch/Set on UNordered

  6     SNAN    Signalling Not A Number

  5     OPERR   OPerand ERRor

  4     OVFL    OVerFLow

  3     UNFL    UNderFLow

  2     DZ      Divide by Zero

  1     INEX2   INEXact operation

  0     INEX1   INEXact decimal input



2.2.1.2.2  Mode control byte



  This byte controls the rounding modes and rounding precisions.

  A result may be rounded or chopped to either double, single or

  extended precision.  For most uses of the FPU, however, this

  byte can be set to all zeroes, which will round the result

  to the nearest extended precision value.

  Mode control byte:

  

  Bit   Name    Meaning

  ========================

  7     PREC1   Precision bit 1

  6     PREC0   Precision bit 0

  5     RND1    Rounding bit 1

  4     RND0    Rounding bit 0

  3     ----    -----

  2     ----    -----

  1     ----    -----

  0     ----    -----



  PREC=00  Round to extended precision

  PREC=01  Round to single precision

  PREC=10  Round to double precision

  

  RND=00   Round toward nearest possible number

  RND=01   Round toward zero

  RND=10   Round toward minus infinity

  RND=11   Round toward plus infinity
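
  As an example of using these fields, the sketch below selects double
  precision rounding (PREC=10) with rounding toward zero (RND=01).
  Using D0 as scratch register is an arbitrary choice.  The byte-sized
  AND/OR leave the exception enable byte untouched.

```asm
        fmove.l fpcr,d0         ; read the control register
        and.b   #$0f,d0         ; clear PREC and RND (bits 7-4)
        or.b    #$90,d0         ; PREC=%10 (double), RND=%01 (toward zero)
        fmove.l d0,fpcr         ; write it back
```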

  

2.2.1.3  Floating point Status Register



  This register is just what you may think it is, the parallel to

  the CPU CCR register, and reflects the status of the last

  floating point computation.  The quotient byte is used with

  floating point remaindering operations.  The exception status

  byte tells which exceptions occurred during the last operation.

  The accrued exception byte contains a bitmask of the exceptions

  that have occurred since the last time this field was cleared.



  Status bits:

  

  Bit   Name    Meaning

  =============================

  7     ----    -----

  6     ----    -----

  5     ----    -----

  4     ----    -----

  3     N       Negative

  2     Z       Zero

  1     I       Infinity

  0     NAN     Not A Number
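
  Normally these bits are tested with the FBcc/FScc instructions, but
  they can also be read directly.  The sketch below assumes that the
  condition code byte occupies bits 31-24 of the FPSR, as in the register
  diagram earlier, so that N, Z, I and NAN are bits 27-24 of the
  longword.  The labels are made up for the example.

```asm
        ftst.x  fp0             ; set the FPU condition codes from FP0
        fmove.l fpsr,d0         ; copy the status register to D0
        btst    #24,d0          ; NAN bit (bit 0 of the condition code byte)
        bne.s   is_nan
        btst    #26,d0          ; Z bit (bit 2 of the condition code byte)
        bne.s   is_zero
        rts                     ; nonzero number or infinity

is_nan: rts                     ; placeholder: handle a NAN here
is_zero:
        rts                     ; placeholder: handle zero here
```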



2.2.1.4  Floating point Instruction Address Register



  This register contains the address of the instruction currently

  executing.  The FPU can execute instructions in parallel with

  the CPU, so that time-consuming instructions, such as division

  and multiplication, don't tie up the CPU unnecessarily.  This means

  that if an exception occurs in the floating point operation,

  the address that is pushed on to the stack is not necessarily

  the address of the instruction that caused the exception.

  The exception handler would have to read this register to find

  the address of the offending instruction.







--- III ----------------FLOATING POINT PROGRAMMING ------------- III ---





3.1  Floating Point Data Formats



  The FPU can handle 3 integer formats, and 2 IEEE compatible formats.

  In addition, it has an extended-precision format and can handle a

  Packed-Decimal Real format.



3.1.1  Integer Formats



  The 3 integer formats that are supported by the FPU are compatible

  with the formats used by the 68000 CPUs.  They are Byte (8 bits),

  Word (16 bits), and Longword (32 bits).



3.1.2  Real Formats



  The FPU supports 4 real formats, the Single precision (32 bits),

  Double precision (64 bits), Extended precision (80 bits), and

  Packed-decimal string (96 bits).



3.1.2.1 Single Precision



  The single precision format is indicated with the extension .S.

  It consists of a 23-bit fraction, an 8-bit exponent, and 1 bit

  indicating the sign of the fraction.  The Single Precision format

  is defined by IEEE and uses excess-127 notation for the exponent.

  A hidden 1 is assumed as the most significant digit of the fraction.

  The format is defined as follows:

  

        30       22                    0

      S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF

      

      S=Sign of fraction

      E=Exponent

      F=Fraction

  

  The single precision format takes 4 bytes when written to memory.
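
  To make the layout concrete, here are two single precision constants
  built by hand and used in a calculation.  The labels are invented for
  the example, and absolute addressing is used only for brevity.

```asm
        fmove.s one,fp0         ; FP0 = 1.0
        fadd.s  mtwo5,fp0       ; FP0 = 1.0 + (-2.5) = -1.5
        rts

one:    dc.l    $3F800000       ; S=0, E=127 ($7F), F=0        -> +1.0
mtwo5:  dc.l    $C0200000       ; S=1, E=128 ($80), F=%0100... -> -2.5
                                ; (hidden 1 + 0.25) * 2^(128-127) = 2.5
```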



3.1.2.2 Double Precision



  The double precision format is indicated with the extension .D.

  It consists of a 52-bit fraction, an 11-bit exponent, and 1 bit

  indicating the sign of the fraction.  As the single precision,

  this format is also defined by the IEEE, and uses excess-1023

  notation for the exponent.  A hidden 1 is assumed as the most

  significant digit of the fraction.  The format is defined as

  follows:

  

        62          51                      0

      S EEEEEEEEEEE FFFFFFF........FFFFFFFFFF

      

      S=Sign of fraction

      E=Exponent

      F=Fraction

      

  The double precision format takes 8 bytes when written to memory.



3.1.2.3 Extended precision



  The extended precision is indicated with the extension .X.

  This is the format that is used in all computations, and

  consists of a 64-bit mantissa, a 15-bit exponent and 1 bit

  indicating the sign of the mantissa.  A hidden 1 is not assumed,

  so all digits of the mantissa are present.  Excess-16383

  is used for the exponent.  When data of this format is written

  to memory, 16 zero bits are inserted between the exponent and

  the mantissa in order to make it a whole number of longwords.

  The extended-precision format is defined as follows:

  

    78              63                             0

  S EEEEEEEEEEEEEEE MMMMMMMMMMMMMMM............MMMMM

  

  When written to memory, it looks like this:

    94            80          63                   0

  S EEEEEEEEEEEEEEE 000...000 MMMMMMMM...........MMM

  

  This format takes 12 bytes when written to memory.
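
  As a worked example of the 12-byte memory layout, the constant 1.0
  could be written like this.  The label name is hypothetical.  Note the
  explicit leading 1 in the mantissa, since no hidden bit is assumed.

```asm
        fmove.x one_x,fp0       ; FP0 = 1.0, no conversion needed
        rts

one_x:  dc.w    $3FFF           ; S=0, exponent 16383 (excess-16383 -> 2^0)
        dc.w    $0000           ; the 16 padding bits
        dc.l    $80000000       ; mantissa, upper half: explicit leading 1
        dc.l    $00000000       ; mantissa, lower half
```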



3.1.2.4  Packed-decimal string



  To simplify input/output of floating point numbers, a special

  96-bit packed-decimal format is provided.

  This format consists of 17 BCD digits mantissa, some padding

  bits, 4 BCD digits exponent, 2 control bits, 1 bit indicating

  the sign of the exponent, and 1 indicating the sign of the

  mantissa.

  Bits 68-79 are stored as zero bits unless an overflow occurs

  during the conversion.

  Positive and negative infinity are represented by numbers that are

  outside the range of the floating point representation used.

  If the result of an operation has no mathematical meaning, a NAN

  is produced.  In the case of a NAN or infinity, bits 92 and 93

  are both 1.



3.2  Floating point Constant ROM



  The FPU has an on-chip ROM where frequently used mathematical

  constants are stored.  How to retrieve these constants will be

  discussed below.  Each constant has its own address in the ROM:

  

  Offset     Constant

  =============================

  $00        Pi

  $0b        Log10(2)

  $0c        e

  $0d        Log2(e)

  $0e        Log10(e)

  $0f        0.0

  $30        ln(2)

  $31        ln(10)

  $32        10^0

  $33        10^1

  $34        10^2

  $35        10^4

  .

  .

  .

  $3e        10^2048

  $3f        10^4096



3.3  Floating Point Instructions



  The FPU provides an extension to the normal 68000 instruction set.

  The floating point instructions can be divided into 5 groups:

  

     1...Data transfer instructions

     2...Dyadic instructions (two operands)

     3...Monadic instructions (one operand)

     4...Program control instructions

     5...System control instructions

     

  The syntax and addressing modes for these instructions are the same

  as those of the ordinary 68000 instructions.



3.3.1  Data Transfer Instructions



  Instruction Syntax     Op. Sizes      Operation

  ==================================================================

  FMOVE  FPm,FPn      | X             | The FMOVE instruction

  FMOVE  <ea>,FPn     | B/W/L/S/D/X/P | copies data from the

  FMOVE  FPm,<ea>     | B/W/L/S/D/X   | source operand to the

                      |               | destination operand

  ==================================================================

  FMOVE  FPm,<ea>{#k} | P             | When writing a .P real, an

  FMOVE  FPm,<ea>{Dn} | P             | optional rounding precision

                      |               | may be specified as a constant

                      |               | or in a data register

                      |               | See below for details

  ==================================================================

  FMOVE  <ea>,FPcr    | L             | These two FMOVEs copy

  FMOVE  FPcr,<ea>    | L             | data to/from control registers

  ==================================================================

  FMOVECR #ccc,FPn    | X             | This instruction retrieves a

                      |               | constant from the ROM, where

                      |               | ccc is the offset to be read

  ==================================================================

  FMOVEM <ea>,<list>  | L/X           | Moves multiple FP registers

  FMOVEM <ea>,Dn      | X             | The register list may be

  FMOVEM <list>,<ea>  | L/X           | specified as in 68000 assembler,

  FMOVEM Dn,<ea>      | X             | or be contained as a bitmask

                      |               | in a data register

  ==================================================================

  

  When writing floating point numbers to the memory using the .P

  format, an optional precision may be specified as a constant or

  in a data register.

  

  Meaning of the precision:

  

  -64<=k<=0  Rounded to |k| digits to the right of the decimal point

  0<k<=17    Mantissa is rounded to k significant digits

  

  Register bit mask of the FMOVEM instruction:

  

  Addressing mode    Bit 7   Bit 6 -------- Bit 1   Bit 0

  =========================================================

  -(An)           |   FP7  |  FP6  |------|  FP1  |  FP0  |

  All other modes |   FP0  |  FP1  |------|  FP6  |  FP7  |

  =========================================================
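
  The transfer instructions above might be used as in the following
  sketch.  It assumes A0 points to a free 12-byte buffer.  Some
  assemblers want FMOVECR written with an explicit .X size, and Motorola
  syntax puts the precision of a .P write in braces.

```asm
        fmovecr #$00,fp0        ; FP0 = pi from the constant ROM
        fmovecr #$32,fp1        ; FP1 = 10^0 = 1.0

        fmovem.x fp0-fp1,-(sp)  ; push FP0 and FP1 (12 bytes each)
        fmovem.x (sp)+,fp0-fp1  ; and pop them again

        moveq   #%00000011,d0   ; -(An) mask: bit 0 = FP0, bit 1 = FP1
        fmovem.x d0,-(sp)       ; same push, register list taken from D0
        move.l  #%11000000,d0   ; other modes: bit 7 = FP0, bit 6 = FP1
        fmovem.x (sp)+,d0       ; same pop, note the reversed mask

        fmove.p fp0,(a0){#5}    ; write pi as packed decimal with the
                                ; mantissa rounded to 5 digits
```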



3.3.2  Dyadic Operations



  Dyadic operations are operations computing with two operands.

  The first operand may be addressed with any addressing mode,

  while the second operand (destination) must be an FPU register.

  

  Instruction  Function

  ===================================================

  FADD       | Add two FP numbers

  FCMP       | Compare two FP numbers

  FDIV       | FP Division

  FMOD       | FP Modulo

  FMUL       | Multiply two FP numbers

  FREM       | Get IEEE remainder

  FSCALE     | Scale exponent

  FSGLDIV    | Single-precision Division

  FSGLMUL    | Single-precision Multiplication

  FSUB       | FP Subtraction

  ===================================================

  

  FSCALE adds the first operand to the exponent of the second

  operand.

  FREM returns the remainder of a division as specified by the IEEE

  definition.
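
  A small sketch of dyadic operations in use, computing (a*b+c)*8 on
  double precision values.  It assumes A0 points to three consecutive
  doubles a, b and c, and A1 to room for one double result.

```asm
        fmove.d (a0),fp0        ; FP0 = a
        fmul.d  8(a0),fp0       ; FP0 = a * b
        fadd.d  16(a0),fp0      ; FP0 = a * b + c
        fscale.l #3,fp0         ; FP0 = FP0 * 2^3 (3 added to the exponent)
        fmove.d fp0,(a1)        ; store the result as a double
```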



3.3.3  Monadic Operations



  Monadic operations are operations computing with only one operand.

  The operand may be addressed with any addressing mode.

  The syntax of monadic operations is:

  

    Instruction.Size  <ea>,FPn

  

  Instruction  Function

  ====================================================

  FABS       | Absolute value

  FACOS      | FP Arc Cosine

  FASIN      | FP Arc Sine

  FATAN      | FP Arc Tangent

  FATANH     | FP Hyperbolic Arc Tangent

  FCOS       | FP Cosine

  FCOSH      | FP Hyperbolic Cosine

  FETOX      | FP e^x

  FETOXM1    | FP e^x - 1

  FGETEXP    | Get exponent

  FGETMAN    | Get mantissa

  FINT       | Round to integer (current rounding mode)

  FINTRZ     | Round to integer, rounding toward zero

  FLOGN      | FP Ln(n)

  FLOGNP1    | FP Ln(n+1)

  FLOG10     | FP Log10(n)

  FLOG2      | FP Log2(n)

  FNEG       | Negate a floating point number

  FSIN       | FP Sine

  FSINH      | FP Hyperbolic Sine

  FSQRT      | FP Square Root

  FTAN       | FP Tangent

  FTANH      | FP Hyperbolic Tangent

  FTENTOX    | FP 10^x

  FTWOTOX    | FP 2^x

  ====================================================



  There is one more monadic operation that uses a double

  destination operand, the FSINCOS instruction:



  FSINCOS <ea>,FPc:FPs        Calculates sine and cosine

  FSINCOS FPm,FPc:FPs         of the same argument



  All trigonometric operations operate on values in

  radians.
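
  As an illustration, the sketch below converts polar coordinates to
  Cartesian ones.  The register assignment is an assumption made for the
  example: FP0 holds the angle in degrees and FP3 the radius on entry,
  and FP1/FP2 return x and y.

```asm
        fmovecr #$00,fp4        ; FP4 = pi
        fmul.x  fp4,fp0         ; FP0 = degrees * pi
        fdiv.l  #180,fp0        ; FP0 = angle in radians
        fsincos.x fp0,fp1:fp2   ; FP1 = cos(FP0), FP2 = sin(FP0)
        fmul.x  fp3,fp1         ; x = radius * cos
        fmul.x  fp3,fp2         ; y = radius * sin
```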



3.3.4 Program Control Instructions



3.3.4.1  Instructions

  This group of instructions allows control of program flow based

  on condition codes generated by the FPU.  These instructions are

  parallels of the 68000 instructions with the same names.



  Instruction      Formats  Operation

  =====================================================================

  FBcc <Label>    | W/L   | Branch on FPU conditions

  FDBcc Dn,<Label>| W     | Test FPU conditions, decrement and branch

  FNOP            |       | No Operation   (Like NOP)

  FScc <ea>       | B     | Set on FPU conditions

  FTST <ea>       | All   | Test FP number at <ea>

  FTST FPn        | X     | Test FP number in FPn

  =====================================================================



3.3.4.2 Condition codes



  The FPU condition codes that may be acted upon are divided into two

  groups: those that raise the BSUN exception when a NAN is involved

  (the operands are unordered), and those that do not.



3.3.4.2.1  Condition codes with NAN



  Symbol  Meaning

  ===========================================

  GE    | Greater or equal

  GL    | Greater or less

  GLE   | Greater, less or equal

  GT    | Greater

  LE    | Less or equal

  LT    | Less

  NGE   | Not (greater or equal)

  NGL   | Not (greater or less)

  NGLE  | Not (greater, less or equal)

  NGT   | Not greater

  NLE   | Not (less or equal)

  NLT   | Not less

  SEQ   | Signalling equal

  SNE   | Signalling unequal

  SF    | Signalling Always FALSE

  ST    | Signalling Always TRUE

  ===========================================



3.3.4.2.2  Condition codes without NAN



  Symbol  Meaning

  ===========================================

  OGE   | Ordered and greater or equal

  OGL   | Ordered and greater or less

  OR    | Ordered

  OGT   | Ordered and greater

  OLE   | Ordered and less or equal

  OLT   | Ordered and less

  UGE   | Unordered or greater or equal

  UEQ   | Unordered or equal

  UN    | Unordered

  UGT   | Unordered or greater

  ULE   | Unordered or less or equal

  ULT   | Unordered or less

  EQ    | Equal

  NE    | Unequal

  F     | Always FALSE

  T     | Always TRUE

  ===========================================



3.3.4.2.3  Notes



  You might wonder why there are different symbols for (unequal) and

  (greater or less).

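  Floating point numbers may represent an infinity, something that is

  impossible in the 68000 CPU.  In addition, they may represent illegal

  values (NAN).  A NAN is unordered: it is neither greater than, less

  than, nor equal to any number, so (unequal) is true for it while

  (greater or less) is false.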

  For detailed information on how these condition code symbols relate to the

  condition code flags, consult programming references for the 68881 FPU.
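
  A sketch of how the branch instructions and the unordered conditions
  might be combined: it leaves the larger of FP0 and FP1 in FP0, and
  takes a separate exit if either operand is a NAN.  The labels are
  made up for the example.

```asm
        fcmp.x  fp1,fp0         ; condition codes from FP0 - FP1
        fbun.w  no_order        ; unordered: at least one operand is a NAN
        fboge.w done            ; FP0 >= FP1: keep FP0
        fmove.x fp1,fp0         ; otherwise FP1 is the larger
done:   rts

no_order:
        rts                     ; placeholder: handle the NAN case here
```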



3.3.5  System Control Instructions



  This group consists of 3 instructions, FSAVE, FRESTORE and FTRAPcc.

  FSAVE and FRESTORE are privileged instructions and are used to save

  and restore the FPU state frame to memory.



  FSAVE <ea>     Copies the internal registers to the specified state frame

  FRESTORE <ea>  Loads the internal registers with the specified state frame.



  The FTRAPcc instruction can generate an exception depending on the condition

  codes.



  FTRAPcc                        If the specified condition is true, TRAP.

  FTRAPcc #<data>.(W/L)
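
  One common use of FSAVE/FRESTORE is to preserve the complete FPU
  context, for example in a task switcher.  The fragment below is only a
  sketch: it must run in supervisor mode, it uses the stack as storage,
  and it skips refinements such as checking the saved state frame before
  touching the registers.

```asm
        fsave   -(sp)                   ; internal state frame
        fmovem.x fp0-fp7,-(sp)          ; data registers, 8 * 12 bytes
        fmovem.l fpcr/fpsr/fpiar,-(sp)  ; control registers, 3 longwords

        ; ... code that may use the FPU freely ...

        fmovem.l (sp)+,fpcr/fpsr/fpiar  ; restore in reverse order
        fmovem.x (sp)+,fp0-fp7
        frestore (sp)+
```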





--- IV --------------------------- THE 68040 FPU ----------------- IV ---





4.1  Differences



  The FPU that is built into the 040, and indeed the one in the yet to come

  060, is highly optimized.  It omits some instructions that are found on

  the 68881/2, but the ones that are unimplemented are usually emulated

  in software.  By avoiding the emulated instructions, a program can get

  a multiple speed increase when run on the 040.

  The 040 also omits the Packed Decimal format (.P).  When an attempt is

  made to use it, the 040 will respond with an illegal format exception.
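
  As an example of avoiding an emulated instruction, a constant such as
  pi can be kept in memory in extended precision and loaded with a plain
  FMOVE instead of FMOVECR.  This is a sketch with a hypothetical label.
  The bit pattern is pi rounded to 64 mantissa bits.

```asm
        fmove.x pi_x,fp0        ; FP0 = pi, without using FMOVECR
        rts

pi_x:   dc.l    $40000000       ; S=0, exponent $4000 (2^1), 16 padding bits
        dc.l    $C90FDAA2       ; mantissa, upper half
        dc.l    $2168C235       ; mantissa, lower half
```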



4.2  Instruction set of the FPU-40  (Or whatever it is called)



4.2.1  68881/2 instructions that are unimplemented



  Type        Instructions

  =================================================================

  Monadic    | FACOS,FASIN,FATAN,FATANH,FCOS,FCOSH,FETOX,FETOXM1,

             | FGETEXP,FGETMAN,FINT,FINTRZ,FLOG10,FLOG2,FLOGN,

             | FLOGNP1,FSIN,FSINCOS,FSINH,FTAN,FTANH,FTENTOX,

             | FTWOTOX

  Dyadic     | FMOD,FREM,FSCALE,FSGLDIV,FSGLMUL

  Transfer   | FMOVECR

  =================================================================



4.2.2  68881/2 instructions that WORK on an 040



  FABS     FADD      FBcc      FCMP      FDBcc     FDIV      FMOVE

  FMOVEM   FMUL      FNEG      FNOP      FRESTORE  FSAVE     FScc

  FSQRT    FSUB      FTRAPcc   FTST





--- V ----------------------------- SOURCES ----------------------- V ---



5.1  Sourcecodes



  There are no sourcecodes in this text, but in the same archive as you

  got this file there should be an assembler source for a Julia fractal

  using the FPU.

  The program is a very simple one, and doesn't use the most advanced

  operations, but illustrates clearly how to program for the FPU.

  The program is written for Amiga computers with AGA chipset and at

  least OS 2.0, but it should be easy to degrade to earlier Amiga

  versions and to other platforms.





************************************************************************************************

*             This text was written by Erik H. Bakke 14/10-93   © 1993 Bakke SoftDev           *

*                                                                                              *

* This text is freely redistributable as long as all files are kept together and in unmodified *

* form.                                                                                        *

*             Permission is granted to include this text in the HowToCode archive              *

************************************************************************************************

*  Error corrections, comments and questions should be directed to the author:                 *

*                                                                                              *

*                               e-mail:   db36@hp825.bih.no                                    *

*                               phone:    +47-5630-5537   (13:00-21:00 GMT)                    *

*                               Post:     Erik H. Bakke                                        *

*                                         Bjørnen                                              *

*                                         N-5227 SØRE NESET                                    *

*                                         NORWAY                                               *

************************************************************************************************
