
BTI 8000

While BTI did well selling BTI 5000 systems to car dealerships across the country, management boldly decided to move into the super-minicomputer market. This was a multi-year effort, and as revenue from the older systems peaked and started to decline, the pressure for the 8000 to succeed mounted.

BTI 8000 Timeline

(The following timeline was supplied mostly by Ron Crandall)

While the BTI 3000, 4000, and 5000 were keeping BTI growing, a few people in both the hardware and software departments started thinking about a next-generation design, one that broke from the HP CPU heritage. In 1974 or 1975, Ron Crandall, George Lewis, Bill Cargile, and Bill Quackenbush started preliminary investigations along these lines. The effort was short-lived, as BTI 4000 and 5000 work kept everyone too busy to do much else.

Later in 1975 the idea was resurrected, and Jim Meeker set about specifying the very CISC-y BTI 8000 instruction set, while Ron Crandall architected a robust file system structure, one that would be "crash-proof".

By 1976 some people began full time work on the 8000, and many more joined the effort in February 1977, after a reorganization.

When I was there in 1985, I recall seeing some early design documents dated 1976. In particular, one notebook contained Bill Quackenbush's work on the synchronous backplane bus. The word was that someone had attempted an asynchronous backplane protocol, but Bill vetoed it on the grounds that such a system would always suffer from unpredictable bus failures.

BTI started letting the world know about the 8000 in 1978. They presented papers at technical conferences; the 8000 was mentioned in sales literature; glossy brochures were produced touting its advanced features.

The problem was that in 1978, the machine didn't exist in a shippable form. The hardware wasn't done yet, and software was developed on a system rigged with a BTI 5000 hanging off the BTI 8000 backplane, acting as a slow emulator for all the I/O devices which were not done or not working reliably.

The 8000 didn't really start shipping until 1981, and even then, the first few systems were very unreliable. Making matters worse, the remote diagnostic facility (RDF) wasn't in place, so failures in the field couldn't be diagnosed and repaired in a timely manner. By late 1981, these issues had been worked out.

BTI ramped up staff in anticipation of shipping many of the new BTI 8000 systems in 1982 ... but the orders only trickled in. A massive layoff was the result, and two smaller ones followed the same year. Finally, in March 1986, the decision was made to halt all development on the 8000 and simply support existing customers. Slowly these customers dwindled, and BTI kept downsizing.

In the end, there were perhaps 30 paying customers for the system. BTI built around 15 others; some were used internally, and others served as demo systems to lure customers into a purchase.

By 1993, BTI was down to about a dozen employees, but surprisingly, there were still 19 systems in the field as of 1995. Phil Deal supported the few remaining customers, but the final systems were retired in 2002, and the US part of BTI was closed down.

Variable Resource Architecture (VRA)

IBM pioneered the idea of a system architecture with the IBM/360 family. The 360 architecture was an abstract model of computation, where many different machines implementing that model could span a couple orders of magnitude in performance while sharing the cost of developing the OS and other tools and preserving the customer's own software investments.

BTI didn't have the resources to develop a family of computers, and took a different approach. They decided to build a multiprocessor, where a low end system contained a single CPU, a single memory controller, and a single I/O controller. Higher end systems were built by adding more resources, instead of having a family of uniprocessors with a range of performance.

BTI developed a model where a single high speed backplane connected together one to many instances of each of a few computing resources, with each type of resource being identical and treated equally. This is known as a symmetric multiprocessor. BTI didn't invent the idea (see, for example, the Burroughs 5000 and Tandem T/16), but it wasn't very common either.

BTI called this idea Variable Resource Architecture, or VRA for short.

Here are some key design features of the BTI 8000 VRA:

Fail-Soft Behavior

Because of its focus on banking, accounting, and business customers, BTI wanted to assure customers that their data was safe. Although the 8000 was not nearly as fault tolerant as the Tandem line of computers, real effort was put into making it "fail-soft."

BTI defined this to mean that when hardware failed, the system would not cause harm and would be easy to repair. Fail-Soft was engineered into different aspects of the system; it was not any one single piece of technology.

Virtual Machine Multiprocessing (VMM)

As alluded to above, one of the concepts of the 8000 was Virtual Machine Multiprocessing, or VMM. This meant that user programs were entirely unaware of how much memory or how many CPUs the system had, and any attempt to manipulate a resource was mediated by the OS.

Virtualization of the user state was, and is, very common; it is required to protect processes from each other, whether from malice or error.

But virtualization was especially important for BTI because it also meant that a user program couldn't tell whether it was running on a single-CPU system or one with eight. When a system was reconfigured, either by adding or removing resources, user programs didn't need to be modified in any way, and nothing needed to be recompiled.

BTI 8000 Operating System (Monitor)

The BTI 8000 OS was frequently called the monitor, as it monitored and controlled the system activities.

Like user programs, the monitor was distributed and ran on any and all CPUs. The only time there was any asymmetry was at boot time: after boot up diagnostics had finished, the SSU would enable the CPUs, and the CPUs would attempt to lock out all the other CPUs, but only one would succeed. That winning CPU would be responsible for bootstrapping the monitor into memory, and patching various configuration tables based on the resources which were available in the system.

Pervasive Security Model

A well thought-out security model was designed to appeal to the business, accounting, and banking markets.

The core security boundary was the account, which was arranged in a hierarchy of four levels of account control. It is described in the next section.

Like most operating systems, the 8000 had the concept of user state and privileged state. User-mode programs could not access resources directly, other than the memory pages owned by that user. All other resources were handled by making an "XREQ" (eXecutive REQuest) call into the monitor.

All files stored on removable media, including the primary disk packs as well as tape backups, were encrypted. It took extra work to save unencrypted files, typically when generating a tape to be exported to a different computer system.

Each account had limits on the resources made available by its superior account, including cumulative and per-session CPU time, cumulative wall-clock time, saved file block limit, and scratch file block limit.

Account Model

The operating system had the concept of "accounts," with four levels of account hierarchy. The "system" accounts were at the top of the pyramid. These in turn granted resources and privileges to division level accounts. Division level accounts delegated to project level accounts, and these controlled user-level accounts.

The default state was for all files to be private to an account. An account had an access control list, basically a list of other accounts which were granted various levels of permission to use all files within the account. For example, an account might permit everyone in its division to read all files, and grant a specific list of people read/write privileges.

Besides the per-account access list, there was a per-file access list offering the same types of privileges. In both per-account and per-file access lists, permissions could also be tied to a password as an extra measure of security.

Although superior accounts were able to do anything an inferior account had permission to do, a superior account could also relinquish some or all of those permissions. Once done, these permissions could be restored by the inferior account.

The security system also recognized that the responsibilities of the system administrators often were different than those of managers. Although system accounts were higher in the security hierarchy, individual administrator accounts were typically set up so they didn't have permission to access private files. Instead, they were in charge of managing print queues, mounting and dismounting disk volumes, and monitoring the process table. There was a MASTER account, though, that had the ability to do anything.

Groups of accounts could be "encapsulated," meaning they were walled off from the rest of the system. An account inside the barrier couldn't share any files with people outside the barrier, and was unable to write to files outside the barrier. This would be useful for ensuring the accounting department files were inaccessible to anyone outside of accounting, even if someone in accounting attempted to defeat security.
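To make the interplay of these rules concrete, here is a minimal Python sketch of an access check. It is an illustration under stated assumptions, not BTI's algorithm: the names (`Account`, `File`, `may_access`, and the `enclave` tag) are hypothetical, passwords and the relinquishing of superior privileges are omitted, and encapsulation is modeled as a single group tag per account.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

READ, WRITE = "read", "write"


@dataclass(eq=False)
class Account:
    name: str
    superior: Optional["Account"] = None
    acl: Dict[str, Set[str]] = field(default_factory=dict)   # per-account list
    enclave: Optional[str] = None   # hypothetical tag for an encapsulated group


@dataclass(eq=False)
class File:
    owner: Account
    acl: Dict[str, Set[str]] = field(default_factory=dict)   # per-file list


def is_superior(a: Account, b: Account) -> bool:
    """True if a sits anywhere above b in the account hierarchy."""
    node = b.superior
    while node is not None:
        if node is a:
            return True
        node = node.superior
    return False


def may_access(requester: Account, f: File, perm: str) -> bool:
    owner = f.owner
    # Encapsulation: files inside the barrier are never shared outward...
    if owner.enclave is not None and requester.enclave != owner.enclave:
        return False
    # ...and accounts inside the barrier may not write to files outside it.
    if requester.enclave is not None and owner.enclave != requester.enclave and perm == WRITE:
        return False
    # The owner and any superior account can do whatever the owner can.
    if requester is owner or is_superior(requester, owner):
        return True
    # Otherwise consult the per-account and per-file access lists.
    granted = owner.acl.get(requester.name, set()) | f.acl.get(requester.name, set())
    return perm in granted
```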

File System

The file system was flat for an account, other than the schism between the normal files and library files.

Each user process had 202 virtual I/O channels available, known as LUNs (Logical Unit Numbers). By default, LUNs 1 and 2 were the standard input and output channels, respectively. The first 200 could be assigned by the process to point to a file or other device. LUN 200 always pointed at the file holding the executable. LUNs 201 and 202 were not assignable; in interactive mode they always pointed at the user's terminal, while in a batch process they pointed at the virtual card reader and the spooled line printer device.

Besides mapping a LUN to a specific file, a process could map it to one of a number of logical devices:

.TERM
user terminal
.LP
line printer (spooled)
.MT
magnetic tape drive
.NULL
"write only memory"
.CDR
card-image reader's view of .TASK
.DIR
the directory of an account library
.LOCK
inter-process semaphore
.PATH
inter-process communication link
.CODE
executable program memory image file
.SAF
sequential access file
.RAF
random access file

Note that the .CDR card reader was used by batch programs; what it actually pointed at wasn't a card reader but a file containing lines of text emulating card images.

Different logical devices had various properties associated with the logical file type. For instance, the .TERM type had information about the width of the terminal, the number of lines per page, baud rate, terminal type, etc.
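A LUN table can be pictured as a small per-process map from channel number to target. The sketch below is hypothetical (`LunTable`, `assign`, and `lookup` are invented names). Because the text says both that the first 200 LUNs were assignable and that LUN 200 always pointed at the executable, the sketch assumes LUNs 1 through 199 were assignable and LUN 200 was reserved; it also assumes LUNs 1 and 2 start out on the same targets as 201 and 202.

```python
from typing import Dict

# Logical device names from the list above; any other target is a file name.
LOGICAL_DEVICES = {".TERM", ".LP", ".MT", ".NULL", ".CDR", ".DIR", ".LOCK",
                   ".PATH", ".CODE", ".SAF", ".RAF"}
EXECUTABLE_LUN = 200          # always the file holding the executable
ASSIGNABLE = range(1, 200)    # assumption: LUNs 1..199


class LunTable:
    """Hypothetical per-process LUN table."""

    def __init__(self, program_file: str, interactive: bool):
        self.map: Dict[int, str] = {EXECUTABLE_LUN: program_file}
        if interactive:
            self.map[201] = self.map[202] = ".TERM"
        else:
            # Batch: virtual card reader in, spooled line printer out.
            self.map[201], self.map[202] = ".CDR", ".LP"
        # Assumption: standard in/out (LUNs 1 and 2) default to the same targets.
        self.map[1], self.map[2] = self.map[201], self.map[202]

    def assign(self, lun: int, target: str) -> None:
        if lun not in ASSIGNABLE:
            raise ValueError(f"LUN {lun} is not assignable")
        if target.startswith(".") and target not in LOGICAL_DEVICES:
            raise ValueError(f"unknown logical device {target}")
        self.map[lun] = target

    def lookup(self, lun: int) -> str:
        return self.map[lun]
```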

BTI 8000 Software

This needs to be fleshed out, but in short, major tools were:

In 1985/1986, there was a project to develop a C compiler for the machine. One engineer worked on it, and supposedly got pretty far: it parsed standard C and had multiple passes performing both high-level and peephole optimizations. It was canceled before it was production ready, and the engineer working on it left to develop mapping software for some type of charitable organization. (Anybody remember?)

BTI 8000 Compute Modules

The BTI 8000 had four different classes of resources. A minimal system had one of each, and more than one of each could be added to a system to increase throughput and capacity. These four were named:

SSU
System Services Unit
MCU
Memory Control Unit
CPU
Computational Processing Unit
PPU
Peripheral Processing Unit

Each board in the system was a 20" x 23" card that plugged into a 16 slot backplane. The CPU was self-contained, but some of the others were connected via ribbon cables to more distant resources; for example, the memory controller was cabled over to another cabinet containing the core memory modules. Every board in the backplane was microcoded to allow self-test and intelligent configuration.

At the time the 8000 was introduced, it wasn't practical to build an eight layer 20" x 23" board. Instead, the bus interface logic and power distribution were laid out using the copper on the board, and the rest of the wiring was wire wrapped by machine. A Plexiglas sheet was mounted on the rear of each board to prevent accidentally snagging any wires while adding or removing boards from the system.

SSU (System Services Unit)

The SSU performed some simple management functions. Normally, only one SSU was used in a system, but a second SSU could be installed, in which case the redundant one was used as a hot backup in case the primary SSU failed.

The SSU contained:

  • the master system oscillator
  • a real time clock
  • an interface to front panel switches and a vacuum fluorescent status display
  • a system monitor for abnormal power and temperature conditions
  • a remote diagnostic interface, allowing BTI technicians to log in remotely
  • a permanent, unique ID, which allowed both BTI and 3rd party software to be locked to a specific machine

At system reset, all the resources would perform self-test. The SSU would wait until all tests were complete, then poll all the slots to figure out which cards were present and healthy. Any failures would halt the system and be reported on the monitor panel, a small vacuum fluorescent display. The SSU would then enable all the CPUs to run. One CPU would lock out the other CPUs from running, and would then start the monitor bootstrap process. The monitor would use the system configuration data collected by the SSU to establish OS configuration tables.
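The reset sequence amounts to a scan-and-tabulate pass over the backplane. This is a minimal sketch with hypothetical names (`Card`, `ssu_boot_scan`); it assumes a pass/fail self-test result is available per slot, treats any failure as fatal as described, and enforces the minimal configuration of one card of each class.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SLOTS = 16
KINDS = ("SSU", "MCU", "CPU", "PPU")


@dataclass
class Card:
    kind: str            # one of KINDS
    self_test_ok: bool   # result of the card's own power-up self-test


def ssu_boot_scan(backplane: List[Optional[Card]]) -> Dict[str, List[int]]:
    """Poll every slot and return {kind: [slot numbers]} for the monitor's tables."""
    if len(backplane) != SLOTS:
        raise ValueError("expected a 16-slot backplane")
    config: Dict[str, List[int]] = {kind: [] for kind in KINDS}
    for slot, card in enumerate(backplane):
        if card is None:
            continue                      # empty slot
        if not card.self_test_ok:
            # The real system halted and showed the failure on the status display.
            raise RuntimeError(f"slot {slot}: {card.kind} failed self-test")
        config[card.kind].append(slot)
    missing = [kind for kind in KINDS if not config[kind]]
    if missing:
        raise RuntimeError(f"minimal configuration incomplete: {missing}")
    return config
```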

The SSU was usually positioned in one of the middle slots. As the source of the backplane clock, and thus the clock for the entire system, a central position minimized the clock skew between boards.

The SSU used the Signetics 8X300 microcontroller for its intelligence. The 8X300 was one of the earliest microcontrollers, and had a reputation for being an ugly beast to program and for running quite hot, as it was implemented in bipolar logic.

MCU (Memory Control Unit)

Originally, and for most of the life of the 8000, an MCU was simply an interface, and didn't directly control any memory. The MCU ran some quick diagnostics after reset and took care of the backplane bus protocol.

Requests from the bus were sent via ribbon cables to an external box, mounted in a second cabinet, which contained core memory and the actual core memory timing, driver, and sense circuitry.

The MCU had minimal pipelining. It could accept two operations before it started turning away new requests. Even those two requests weren't pipelined, other than the act of transmitting the request across the bus. While the first command was being processed, the second command simply sat in an input buffer, waiting its turn.

It was a system feature that the MCU directly supported atomic operations. The CPU defined a number of instructions as atomic, meaning a memory location would be read and the MCU would deny all other requests until the card performing the read/modify/write wrote back the updated value.
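The two behaviors described above, a two-entry input buffer and a lock held for the duration of an atomic read/modify/write, can be modeled in a few lines. This Python sketch is hypothetical: the request names (`read_lock`, `write_unlock`) and the `MCU` class are invented, timing is ignored, and the lock is assumed to take effect as soon as the locking read is accepted.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Request = Tuple[int, str, int, Optional[int]]   # (slot, op, addr, value)


class MCU:
    """Hypothetical model of an MCU with a two-entry input buffer."""

    CAPACITY = 2   # operations accepted before new requests are turned away

    def __init__(self, words: int):
        self.mem = [0] * words
        self.queue: Deque[Request] = deque()
        self.locked_by: Optional[int] = None   # slot holding an atomic RMW lock

    def request(self, slot: int, op: str, addr: int, value: Optional[int] = None) -> bool:
        """Return True if accepted, False if the requester must retry later."""
        if self.locked_by is not None and self.locked_by != slot:
            return False                  # another card is mid read/modify/write
        if len(self.queue) >= self.CAPACITY:
            return False                  # input buffer full
        if op == "read_lock":
            self.locked_by = slot         # deny everyone else until write-back
        self.queue.append((slot, op, addr, value))
        return True

    def step(self) -> Tuple[int, Optional[int]]:
        """Service the oldest buffered request; return (slot, read data or None)."""
        slot, op, addr, value = self.queue.popleft()
        if op in ("read", "read_lock"):
            return slot, self.mem[addr]
        if op in ("write", "write_unlock"):
            self.mem[addr] = value
            if op == "write_unlock":
                self.locked_by = None
            return slot, None
        raise ValueError(f"unknown op {op}")
```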

Although the backplane bus protocol was largely fair, there was a slight latency advantage to cards in the lower slot numbers. Therefore, it was advantageous to place the memory controllers in the lower numbered slots, since read latency critically affected system performance.

The core-based MCUs could be expanded in increments of 128 KB. A minimal system required at least 256 KB total, although practically all systems had more than this.

In 1985 or 1986, BTI designed a new memory controller that had an array of 64Kb DRAM chips mounted on board. SECDED ECC logic performed error checking and correction; a Z80 performed extensive diagnostics at power up of both the DRAM array and the board's control logic. The Z80 also logged any corrected errors so later this log could be inspected to see if a given chip or column was marginal. The board also used idle cycles to sweep through memory, with the hope that single bit errors could be repaired before they turned into double bit (uncorrectable) errors.
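For readers unfamiliar with SECDED, the textbook construction for a 32-bit word is an extended Hamming code: six Hamming check bits plus one overall parity bit, 39 bits in total. The sketch below shows that construction and a scrub pass like the one described. It is not BTI's actual check-bit layout, which isn't documented here, and `encode`, `decode`, and `scrub` are hypothetical names.

```python
from typing import List, Tuple

CODE_LEN = 38                          # Hamming positions 1..38: 32 data + 6 check bits
CHECK_POSITIONS = (1, 2, 4, 8, 16, 32)


def _is_data_position(pos: int) -> bool:
    return (pos & (pos - 1)) != 0      # data bits sit at non-powers of two


def encode(data: int) -> int:
    """Encode a 32-bit word into a 39-bit codeword (bit 0 = overall parity)."""
    code = [0] * (CODE_LEN + 1)
    d = 0
    for pos in range(1, CODE_LEN + 1):
        if _is_data_position(pos):
            code[pos] = (data >> d) & 1
            d += 1
    for p in CHECK_POSITIONS:
        # Each check bit makes the parity of all positions containing p even.
        code[p] = sum(code[i] for i in range(1, CODE_LEN + 1) if i & p) % 2
    code[0] = sum(code) % 2            # overall parity over positions 1..38
    return sum(bit << i for i, bit in enumerate(code))


def decode(word: int) -> Tuple[int, str]:
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = [(word >> i) & 1 for i in range(CODE_LEN + 1)]
    syndrome = 0
    for pos in range(1, CODE_LEN + 1):
        if code[pos]:
            syndrome ^= pos
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome <= CODE_LEN:
        code[syndrome] ^= 1            # single-bit error (syndrome 0: the parity bit)
        status = "corrected"
    else:
        status = "uncorrectable"       # double-bit error detected
    data, d = 0, 0
    for pos in range(1, CODE_LEN + 1):
        if _is_data_position(pos):
            data |= code[pos] << d
            d += 1
    return data, status


def scrub(memory: List[int]) -> List[int]:
    """One sweep: rewrite words that had a corrected error; return their addresses."""
    log = []
    for addr, word in enumerate(memory):
        data, status = decode(word)
        if status == "corrected":
            memory[addr] = encode(data)
            log.append(addr)           # corrected errors are logged for inspection
    return log
```

Rewriting a corrected word during the sweep is what keeps a lone soft error from lingering until a second error in the same word makes it uncorrectable.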

CPU (Computational Processing Unit)

A minimal system would have a single CPU, and high-end systems could have up to eight CPUs. It wasn't architecturally specified as such, but none of the CPUs designed by BTI had any cache. Every instruction or data operation went over the backplane to an MCU. Instructions were fetched with double-word transfers, which both increased backplane efficiency and acted as a kind of prefetch.

Because of the long latency to memory (around 12 cycles, or 750 ns, in the best case), the CPU would initiate the fetch of the next instruction as soon as it knew the instruction wasn't a branch.

An earlier CPU design used eight 74F181 4b ALUs as the core computing element.

Five different CPU designs were done, although I'm not sure how many of them were shipped. The last one, CPU5, was completed in 1985. John Kinsel designed the hardware; Jeff Libby wrote the microcode. The heart of the CPU used eight 29C03 4b bit slice chips and a 29C11 (?) sequencer. It also used a Z80 supervisory processor to run diagnostics at power up.
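Eight 4-bit slices make a 32-bit datapath by passing the carry from each slice to the next. The Python sketch below models only that cascading for addition, using simple ripple carry; it is an illustration rather than the 2903's full function set, and real bit-slice designs commonly used a separate lookahead carry generator to avoid the ripple delay. The function names are hypothetical.

```python
from typing import Tuple

SLICES = 8          # eight 4-bit slices -> 32-bit datapath
SLICE_MASK = 0xF


def slice_add(a4: int, b4: int, carry_in: int) -> Tuple[int, int]:
    """One 4-bit slice: returns (4-bit sum, carry out)."""
    total = (a4 & SLICE_MASK) + (b4 & SLICE_MASK) + carry_in
    return total & SLICE_MASK, total >> 4


def add32(a: int, b: int, carry_in: int = 0) -> Tuple[int, int]:
    """Cascade eight slices, least significant first, rippling the carry."""
    result, carry = 0, carry_in
    for i in range(SLICES):
        shift = 4 * i
        nibble, carry = slice_add(a >> shift, b >> shift, carry)
        result |= nibble << shift
    return result, carry
```

Subtraction falls out the same way: add the one's complement of the subtrahend with a carry-in of 1.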

CPU5 was very horizontally microcoded, with a 108 bit wide microword (96 functional bits, 12 parity bits). The microcode store was 8K words deep and held entirely in RAM, loaded from a daughter card containing a bank of EPROMs; upgrading the microcode in the field meant swapping in a different daughter card.

In addition to the 2903 ALU, the CPU5 datapath logic contained a barrel shifter and some other assorted logic. The microcode assembler was intelligent and could compute the number of cycles it would take to compute a given result; this value was stored in the microword. Thus, different microinstructions took different amounts of time.

Considering the multicycle microinstructions, the lack of a cache, and the long latency to read memory from an MCU, the performance of a given CPU wasn't stellar. A single CPU typically used less than 10 percent of the available backplane bandwidth. Even though it was slow compared to higher-end 32b CPUs of the era, the BTI 8000 was still at least three times faster than the then top-end BTI 5000.

The slow CPU perversely had a system benefit; it made the simple backplane bus practical as a means of increasing system throughput. If a single CPU had been able to saturate the backplane bandwidth, it would have precluded adding more CPUs as a means of increasing performance. As it was, BTI estimated that seven CPUs in a system running a typical mix of operations ran as fast as about five and a half ideal CPUs.
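Put in terms of speedup and efficiency, BTI's estimate and the bandwidth figure from the previous paragraph work out as follows, where S is the speedup over one CPU, E the parallel efficiency, and U the fraction of backplane bandwidth used by seven CPUs:

```latex
S_7 \approx 5.5, \qquad
E_7 = \frac{S_7}{7} \approx \frac{5.5}{7} \approx 0.79, \qquad
U_7 < 7 \times 0.10 = 0.70
```

So the bus itself was not saturated, and the roughly 21 percent shortfall from ideal scaling presumably came from other contention, such as requests queuing at the MCUs; the text doesn't break it down.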

PPU (Peripheral Processing Unit)

The PPU was essentially a DMA engine. Each PPU could connect to four I/O controllers over two high speed and two low speed channels. For instance, the disk controller used a high speed channel, and the terminal muxes sat on a low speed channel.

A CPU could set up a DMA channel operation in memory, consisting of a list of registers to poke in a given I/O controller and a transfer of a given size to or from a given memory block; a sequence of these could be chained together. Once the channel program was constructed, the CPU would point the PPU at it, go on to some other process, and the PPU would take care of it.

Because there were multiple CPUs and a process could switch between CPUs frequently, it made no sense for a completed PPU program to interrupt a CPU. Instead, the PPU channel program would be told to write a given word to a particular location in memory. The next time the monitor program was sweeping the suspended process list, looking for work to do, it would find the notice from the PPU that the requested work was done, and the CPU would move the process from the suspended process list (or whatever action was appropriate).
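The channel-program and completion-word scheme can be sketched as follows. Everything here is hypothetical: the descriptor fields, the `chain` flag, the byte-addressed memory, and the `FakeController` stand-in are illustrative choices, not BTI's formats. The point is the control flow: the CPU builds a list, the PPU walks it, and completion is signaled by a memory write that the monitor later finds while sweeping, rather than by an interrupt.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChannelCommand:
    reg_writes: Dict[int, int]   # controller register -> value to poke first
    direction: str               # "in" (device to memory) or "out" (memory to device)
    mem_addr: int                # start of the memory block
    length: int                  # transfer size in bytes
    chain: bool = True           # hypothetical: continue with the next command


@dataclass
class ChannelProgram:
    commands: List[ChannelCommand]
    done_addr: int               # location the PPU writes when the program finishes
    done_value: int = 1


@dataclass
class FakeController:
    """Stand-in for an I/O controller on one PPU channel."""
    rx: bytearray = field(default_factory=bytearray)   # data the device will supply
    tx: bytearray = field(default_factory=bytearray)   # data sent to the device
    regs: Dict[int, int] = field(default_factory=dict)


def ppu_run(prog: ChannelProgram, ctl: FakeController, memory: bytearray) -> None:
    for cmd in prog.commands:
        ctl.regs.update(cmd.reg_writes)
        if cmd.direction == "in":
            chunk = ctl.rx[:cmd.length]
            del ctl.rx[:cmd.length]
            memory[cmd.mem_addr:cmd.mem_addr + len(chunk)] = chunk
        else:
            ctl.tx += memory[cmd.mem_addr:cmd.mem_addr + cmd.length]
        if not cmd.chain:
            break
    memory[prog.done_addr] = prog.done_value   # no interrupt, just a memory write


def monitor_sweep(suspended: Dict[str, int], memory: bytearray) -> List[str]:
    """Find suspended processes whose completion word is set; clear and wake them."""
    ready = []
    for proc, done_addr in list(suspended.items()):
        if memory[done_addr]:
            memory[done_addr] = 0
            del suspended[proc]
            ready.append(proc)
    return ready
```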

The PPU, acting as a DMA engine, had a byte-wide interface to each I/O controller (via ribbon cables), with FIFO decoupling on each channel. The PPU took care of all the byte packing and unpacking and address generation, so each I/O controller could be simplified. The high speed channels ran at 10 MHz, and the low speed channels at 5 MHz.

The list of I/O controllers included:

  • disk controller
  • 9-track reel-to-reel tape controller
  • cartridge tape controller (good for backups)
  • printer controller
  • terminal controller (up to 19.2 Kbps)

BMB (Bus Monitor Board)

This board was used only by the developers. In essence, it was a logic analyzer custom made for the BTI 8000 bus protocol. Because only a couple were ever built, it was a wire-wrapped affair.

A Z80 was able to set up a few triggers and capture events meeting some constraint. It was useful for, say, finding all the traffic between the MCU and a given disk controller, or looking for the first read after a certain address was written with a certain value.

It was flexible enough that I was able to write code to do statistical analysis of the mix of reads, double word reads, writes, callbacks, etc. on the bus.
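The kinds of queries described above reduce to filters over a captured stream of bus events. This Python sketch is an illustration under assumptions: the `BusEvent` fields and operation names are hypothetical, and "first read after" is interpreted as the first read of any address following the matching write.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional


class BusEvent(NamedTuple):
    cycle: int
    src_slot: int
    dst_slot: int
    op: str          # hypothetical names: "read", "read2", "write", "callback"
    addr: int
    data: int


READS = ("read", "read2")   # single- and double-word reads


def first_read_after_write(events: Iterable[BusEvent], addr: int,
                           value: int) -> Optional[BusEvent]:
    """Two-stage trigger: arm on a matching write, fire on the next read."""
    armed = False
    for ev in events:
        if not armed:
            armed = ev.op == "write" and ev.addr == addr and ev.data == value
        elif ev.op in READS:
            return ev
    return None


def traffic_between(events: Iterable[BusEvent], a: int, b: int) -> List[BusEvent]:
    """All transfers in either direction between two slots."""
    return [ev for ev in events if {ev.src_slot, ev.dst_slot} == {a, b}]


def op_mix(events: Iterable[BusEvent]) -> Counter:
    """Histogram of transaction types on the bus."""
    return Counter(ev.op for ev in events)
```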

BTI 8000 Instruction Set Architecture

The BTI 8000 was architected in the mid 1970s, when complex instruction sets, as typified by the DEC VAX computer, were state of the art. Memory was a very expensive commodity, and it was thought that highly encoded instruction sets would make the best use of this expensive resource. At the time, core memory was still a viable technology for main memory.

The instruction set was defined by the software architecture group. Many features of the instruction set were chosen for performing OS-centric operations, such as operating on linked lists, performing atomic read/modify/write operations, and automating subroutine linkage. The focus was on encoding as much information as possible in as few bits as possible, and on making operations on arbitrary-sized fields fundamental. While these choices did make efficient use of the limited memory, they greatly complicated the CPU design and made some operations very slow.

Here are a few examples of the complications.

Supposedly the person writing the microcode for the first CPU exclaimed, facetiously, that the listing for the CPU microcode was larger than the listing for the OS.

User State

As in most operating systems, there was an explicit model of the user state, which BTI called the virtual machine. Because this state was clearly defined, multiple generations of CPUs could run the same code without modification. It was even normal to have a system containing CPUs of different generations, with a process running on a different type of CPU each time slice (typically 100 ms, or until the process blocked on a resource).

The user's concept of the computer was:

  • one 17b program counter

    17 bits was enough because the virtual address space was 128 Kwords (512 KB).

  • one 15b process status register

    This held various user-accessible mode bits and flag state. For instance, mode bits indicated if memory locations containing uninitialized values were to cause a trap. Other bits contained the results of the most recent comparison.

  • eight 32b registers

    All eight registers were nearly general purpose, although a few specialized instructions were hardwired to use certain registers, such as CMOVE (character move). Other registers were assigned a dedicated function via software convention, such as using R6 as the stack pointer.

  • a 17b current console area register

    This pointed to a 10-word area in the user space where the user state was stored in the event of an interrupt. Ten words were enough to hold the program counter, the process status register, and the eight general purpose registers.

  • 512 KB of memory

    A system could contain more than 512 KB, but any single process was limited to a total of 512 KB virtual address space to hold all the code and data. Although the user saw 512 KB, it was actually organized into 4 KB pages that could be swapped between main memory and disk. The OS also allowed limiting a given process to less than the total 512 KB.

Memory Paging

With a limited virtual address space of only 128K words, the paging system was very simple: a single table containing the mapping for 128 pages of 4 KB per page was sufficient. This table lived on the CPU in a small SRAM. The bottom 10 bits of the address were unchanged and indexed a word within the page, and upper 7 bits of the virtual address indexed the mapping table, producing the physical page address and other status.

The page mapper had 256 entries: 128 for the current user space, and 128 for the monitor. One bit in the monitor status register indicated if the CPU was in user mode or privileged mode, and that selected which half of the mapping table was in use.

Each page table entry had 20 bits, with various fields.

  • 4 bits indicated which slot to address. Any slot could be specified, not just an MCU
  • 12 bits provided the physical page address (up to 16 MB per slot)
  • four control bits, indicating, among other things, whether the page was resident, whether it had been modified, and whether it had been accessed (useful for LRU aging)
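The translation path just described can be sketched in Python. The field widths come from the text; the order of fields within the 20-bit entry, the individual control-bit assignments, and the automatic update of the accessed and modified bits are assumptions, as are the names `PageMapper`, `make_pte`, and `PageFault`. Addresses are word addresses, so the 22-bit result (12-bit page plus 10-bit offset) covers 4M words, or 16 MB, within the selected slot.

```python
OFFSET_BITS = 10                      # 1024 words = 4 KB per page
PAGE_WORDS = 1 << OFFSET_BITS
USER_BASE, MONITOR_BASE = 0, 128      # halves of the 256-entry mapper

# Hypothetical control-bit assignments within the low 4 bits of an entry.
RESIDENT, MODIFIED, ACCESSED = 0x8, 0x4, 0x2


class PageFault(Exception):
    pass


class PageMapper:
    def __init__(self) -> None:
        self.entries = [0] * 256

    @staticmethod
    def make_pte(slot: int, page: int, control: int) -> int:
        # Assumed layout, most significant first: [4 slot][12 page][4 control]
        return (slot & 0xF) << 16 | (page & 0xFFF) << 4 | (control & 0xF)

    def translate(self, vaddr: int, privileged: bool, write: bool = False):
        """Map a 17-bit virtual word address to (slot, physical word address)."""
        vaddr &= 0x1FFFF
        vpn, offset = vaddr >> OFFSET_BITS, vaddr & (PAGE_WORDS - 1)
        index = (MONITOR_BASE if privileged else USER_BASE) + vpn
        pte = self.entries[index]
        if not pte & RESIDENT:
            raise PageFault(f"virtual page {vpn} not resident")
        # Assumption: accessed/modified bits are updated on every reference.
        self.entries[index] = pte | ACCESSED | (MODIFIED if write else 0)
        slot, page = (pte >> 16) & 0xF, (pte >> 4) & 0xFFF
        return slot, page << OFFSET_BITS | offset
```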

Data Types

The BTI 8000 had instructions that operated on a number of data types. Most of them are tersely listed here.

  • 32 bit fixed point
  • 64 bit fixed point
  • 64 bit floating point
  • 1-32 bit field
  • 8 bit character (extra support vs. the generic bit field addressing)
  • 32b pointer
  • linked list primitives
  • pushdown stack primitives
  • miscellaneous

The machine used two's complement arithmetic, but an optional commercial instruction set added extensive operations for supporting variable sized BCD math operations, and things like "FIELD EDIT" opcodes (like a PRINT USING statement in a single instruction).

For integer and floating point values, a unique "uninitialized" value was defined by the instruction set. The uninitialized value was an msb of 1, with 31 or 63 trailing zeros. This corresponds to the most negative value in a two's complement number system. If the uninitialized value checking was enabled, a trap occurred if any operands were seen with that value.
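A minimal sketch of the uninitialized-value check, treating operands as raw 32- or 64-bit patterns. The function names and the exception are hypothetical; on the real machine the check was a hardware trap enabled by a mode bit in the process status register.

```python
class UninitializedOperand(Exception):
    """Stands in for the hardware trap."""


def sentinel(width: int) -> int:
    """The 'uninitialized' pattern: msb set, all other bits zero."""
    if width not in (32, 64):
        raise ValueError("width must be 32 or 64")
    return 1 << (width - 1)


def check_operand(bits: int, width: int, checking_enabled: bool) -> int:
    """Return the operand unchanged, or trap if it is the sentinel and checking is on."""
    bits &= (1 << width) - 1
    if checking_enabled and bits == sentinel(width):
        raise UninitializedOperand(f"{width}-bit operand {bits:#x}")
    return bits


def to_signed(bits: int, width: int) -> int:
    """Interpret a raw pattern as a two's complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits
```

Using the most negative value as the sentinel costs little, since it is the one two's complement value without a positive counterpart.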

Instruction Formats

Lacking an instruction set reference manual, the following information has been paraphrased from a paper BTI presented in AFIPS Volume 48 National Computer Conference (1979, pp. 513-528).

All instructions in the BTI 8000, without exception, were 32 bits wide and aligned on 32b boundaries. The first ten bits supplied the major opcode, but some instruction formats encoded sub-opcodes in other parts of the instruction word.

Like most computers, the BTI 8000 trapped any illegally encoded instructions. The designers designated a word of all 0s or all 1s to be illegal, as well as any opcode that started with 0x20, ASCII space. These values were deemed the most likely data words, and so making them illegal meant that errant programs would more likely get trapped before doing harm.
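Checking a fetched word against these rules is cheap. This sketch assumes the "first ten bits" are the ten most significant bits and reads "started with 0x20" as a most significant byte of 0x20; both are interpretations of the text rather than documented facts, and the function names are hypothetical.

```python
def opcode(word: int) -> int:
    """Major opcode: assumed to be the ten most significant bits of the word."""
    return (word >> 22) & 0x3FF


def is_illegal_word(word: int) -> bool:
    """Mirror the three illegal patterns described above."""
    word &= 0xFFFFFFFF
    if word in (0x00000000, 0xFFFFFFFF):
        return True
    # Assumption: a top byte of ASCII space (0x20), which corresponds to
    # major opcodes 0x080 through 0x083.
    return (word >> 24) == 0x20
```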

There were a total of about 200 opcodes, and around 30 different addressing modes. Helping keep things sane, just about any address mode that made sense could be used for any opcode.

  1. Immediate

    Format: [10b opcode][5b mode][17b field]

    In this format, bits [16:0] are used to form either an immediate value, or an immediate address. Depending on the size of the operand called out by the opcode, the immediate value may be expanded to 32 bits or 64 bits.

    • the 17b field is right justified and zero filled to form an immediate
    • the 17b field is right justified and ones filled to form an immediate
    • the 17b field is left justified and zero filled to form an immediate
    • the 17b field is the word address of an operand in memory
    • the 17b field is the word address of an indirect pointer in memory
  2. Indexed Memory

    Format: [10b opcode][2b mode][3b idx reg][17b address]

    This either supplies the address of a word in memory, or it supplies a location in memory of a pointer to another location in memory. The index register value is then added to that address to provide the location in memory where the operand resides. Instructions with double word length use an offset of two times the index register value.

    • 17b direct address
    • 17b indirect address
  3. Base Register

    Format: [10b opcode][5b mode][3b base reg][4b submode][10b offset]

    There are six different modes that use this format; their behaviors are complicated and not described here.

    • register to register
    • register indirect
    • word array
    • character array
    • formal parameter
    • stack
  4. Indexed Base Register

    Format: [10b opcode][5b mode][3b base reg][3b idx reg][1b submode][10b offset]

    This format is like the Base Register format, except there is a smaller offset field, and an index register value is added to the effective address that the plain Base Register format would compute.

    • register indirect
    • word array
    • character array
    • formal parameter
  5. Type Conversion

    Format: [10b opcode][5b mode][3b reg][4b submode][3b unused][2b type][5b unused]

    This format is used to convert between 32 bit fixed point, 64 bit fixed point, and 64 bit floating point formats. The fixed point formats can be treated as signed or unsigned, and conversions can be specified to round or not in case of loss of precision.

  6. Byte

    Format: [10b opcode][5b mode][3b base reg][5b bit][5b field len][4b offset]

    The instruction set has no shift or rotate instruction. Instead, this format is used by some instructions. In one mode a register is viewed as a circular list of bits and the instruction specifies an arbitrary field starting at an arbitrary offset. In the other mode the register specifies a word in memory where the bit field exists and again, an arbitrary field can be extracted. Rotates and shifts can be obtained by using Load-Effective-Address instruction and this addressing mode.

    • register ("circular")
    • array ("zigzag")
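The "circular" view of a register is what lets field extraction stand in for shift and rotate instructions. The sketch below models only the result, not the instruction encoding or the Load-Effective-Address sequence; it assumes bit 0 is the least significant bit and that a field runs toward higher-numbered bits, wrapping from bit 31 back to bit 0. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def circular_field(reg: int, start: int, length: int) -> int:
    """Extract a 1-32 bit field starting at bit `start`, wrapping from bit 31 to bit 0."""
    if not 1 <= length <= 32:
        raise ValueError("field length must be 1..32")
    reg &= MASK32
    start %= 32
    rotated = ((reg >> start) | (reg << (32 - start))) & MASK32
    return rotated & ((1 << length) - 1)


def rotate_right(reg: int, n: int) -> int:
    """A full-width circular field is a rotate."""
    return circular_field(reg, n, 32)


def rotate_left(reg: int, n: int) -> int:
    return circular_field(reg, 32 - n % 32, 32)


def shift_right_logical(reg: int, n: int) -> int:
    """A field that stops at bit 31 (no wrap) is a logical right shift."""
    if n == 0:
        return reg & MASK32
    if n >= 32:
        return 0
    return circular_field(reg, n, 32 - n)
```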

Words of memory which are used as pointers are also encoded:

[2b character][3b bit][5b field len][5b mode][17b address or immediate]

The "mode" field is akin to the mode field of the Immediate format (1) above. Which fields were used, and how they were interpreted, depended on the operation. Note that a pointer could point not just to a word in memory, but to an arbitrary 1-32b field in memory. Other wonders were possible: in array mode, the offset value is multiplied by the field size and the appropriate math is carried out so that a packed array of arbitrary (1-32b) values could be directly addressed.
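This pointer arithmetic, including the INCP/DECP instructions listed in the appendix, can be sketched by treating the character and bit fields together as a 5-bit bit offset within a 32-bit word. Several details are assumptions: the field order shown above is taken as most significant first, a field length of 0 is taken to encode 32, and bit offsets are treated as increasing linearly across word boundaries. The names `Pointer`, `incp`, `decp`, and `array_element` are hypothetical.

```python
from typing import NamedTuple


class Pointer(NamedTuple):
    char: int     # 2 bits: character (byte) within the word
    bit: int      # 3 bits: bit within the character
    length: int   # field length in bits, 1..32
    mode: int     # 5 bits, interpreted per operation
    word: int     # 17-bit word address


def decode_pointer(w: int) -> Pointer:
    # Assumed layout, most significant first: [2 char][3 bit][5 len][5 mode][17 address]
    length = (w >> 22) & 0x1F
    return Pointer(char=(w >> 30) & 0x3, bit=(w >> 27) & 0x7,
                   length=length or 32,            # assumption: 0 encodes 32
                   mode=(w >> 17) & 0x1F, word=w & 0x1FFFF)


def encode_pointer(p: Pointer) -> int:
    return ((p.char & 0x3) << 30 | (p.bit & 0x7) << 27 | (p.length & 0x1F) << 22
            | (p.mode & 0x1F) << 17 | (p.word & 0x1FFFF))


def bit_address(p: Pointer) -> int:
    return p.word * 32 + p.char * 8 + p.bit


def with_bit_address(p: Pointer, addr: int) -> Pointer:
    word, within = divmod(addr, 32)
    return p._replace(word=word & 0x1FFFF, char=within // 8, bit=within % 8)


def incp(p: Pointer) -> Pointer:
    """Advance one entry: add the entity's bit length to the bit address."""
    return with_bit_address(p, bit_address(p) + p.length)


def decp(p: Pointer) -> Pointer:
    return with_bit_address(p, bit_address(p) - p.length)


def array_element(p: Pointer, index: int) -> Pointer:
    """Array mode: scale the index by the field size."""
    return with_bit_address(p, bit_address(p) + index * p.length)
```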

Instruction Set Summary

This set of instructions was lifted from BTI_8000_Technical_Summary_Sep78.pdf.

APPENDIX A: SUMMARY OF USER-MODE CPU INSTRUCTIONS

A.1 Fixed Point Arithmetic

ADD
operand added to contents of specified register, result stored back in that register
ADDM
("add to memory") as above, but result replaces operand instead of register
ADDB
("add to both") as in ADDM, but result also stored in register
ADD2, ADD2M, ADD2B
double-word analogs of above
SUB
operand subtracted from contents of specified register, result stored back in that register
SUBM, SUB2, SUB2M
see ADD family
RSB
("reverse subtract") contents of specified register subtracted from operand, result stored back in that register
RSBM, RSB2, RSB2M
see SUB family
MUL, MULM, MUL2, MUL2M
multiply family (see ADD, SUB)
DIV, DIVM, DIV2, DIV2M
divide family
RDV, RDVM, RDV2, RDV2M
reverse divide family
LD, LDN (N="negate"), LD2, LDN2
load register family
INCL, INCL2
Increment operand by 1, then load reg. with this new value
ST, ST2
store register (single, double)
STW, STW2, STMW, STMW2
store the value "one" (W) or "minus one" (MW)
STU, STU2
store the value "undefined" (hexadecimal 80000000)
STZ, STZ2
store the value "zero"
EXCH, EXCH2
exchange register, operand
INC, INC2, DEC, DEC2
increment/decrement operand by one
INCP, DECP
increment/decrement pointer. These instructions assume the operand is a pointer. The bit length of the pointed-to entity (carried in the pointer) is added to/subtracted from its bit address, thus moving the pointer forward/backward one entry, no matter what the size of the entry.
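The register-versus-memory destination variants of the fixed-point family can be summarized with a toy Python model. It assumes 32-bit wraparound, leaves out the condition bits entirely, and uses a placeholder register count; the `Machine` class is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # results wrap to 32 bits; condition bits are not modeled


class Machine:
    """Toy model of register/memory destinations for the fixed-point family."""

    def __init__(self, nregs: int = 16):  # register count is a placeholder
        self.reg = [0] * nregs
        self.mem = {}  # word address -> 32-bit value

    def _op(self, addr: int) -> int:
        return self.mem.get(addr, 0)

    def add(self, r: int, addr: int) -> None:   # ADD: register <- register + operand
        self.reg[r] = (self.reg[r] + self._op(addr)) & MASK32

    def addm(self, r: int, addr: int) -> None:  # ADDM: operand <- register + operand
        self.mem[addr] = (self.reg[r] + self._op(addr)) & MASK32

    def addb(self, r: int, addr: int) -> None:  # ADDB: sum goes to both places
        total = (self.reg[r] + self._op(addr)) & MASK32
        self.reg[r] = total
        self.mem[addr] = total

    def sub(self, r: int, addr: int) -> None:   # SUB: register <- register - operand
        self.reg[r] = (self.reg[r] - self._op(addr)) & MASK32

    def rsb(self, r: int, addr: int) -> None:   # RSB: register <- operand - register
        self.reg[r] = (self._op(addr) - self.reg[r]) & MASK32
```

Starting from register 5 and operand 7, ADD leaves (12, 7), ADDM leaves (5, 12) and ADDB leaves (12, 12). RSB exists so that "operand minus register" needs no extra register shuffling.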

A.2 Floating Point Arithmetic

These instructions deal with 64-bit (double word) floating-point operands, which have 11-bit biased exponents and 52-bit mantissas. Double-precision floating-point operands (128 bits) are generated and manipulated by software.

FAD, FADM, FADB
floating add ("to memory", "to both")
FSB, FSBM, FMU, FMUM, FDV, FDVM
floating subtract, multiply, divide
FRSB, FRSBM, FRDV, FRDVM
floating reverse subtract, reverse divide
FINC, FDEC
floating increment, decrement memory (by one)
FINCL
increment floating-point operand by 1, then load adjacent registers with this new value
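The stated sizes leave exactly one bit over (1 + 11 + 52 = 64), presumably the sign. The Python sketch below splits such a word, assuming the order [sign][exponent][mantissa] and an exponent bias of 1023. Both are borrowed from IEEE 754 by analogy; the summary gives neither the bit order nor the bias, and says nothing about a hidden leading bit.

```python
FRACTION_BITS = 52
EXPONENT_BITS = 11
BIAS = 1023  # assumption, by analogy with IEEE 754; the summary does not give it


def split_float64(word: int):
    """Split a 64-bit word into (sign, biased exponent, mantissa field),
    assuming the layout [1b sign][11b exponent][52b mantissa]."""
    word &= (1 << 64) - 1
    sign = word >> (EXPONENT_BITS + FRACTION_BITS)
    exponent = (word >> FRACTION_BITS) & ((1 << EXPONENT_BITS) - 1)
    mantissa = word & ((1 << FRACTION_BITS) - 1)
    return sign, exponent, mantissa


def unbiased_exponent(word: int) -> int:
    return split_float64(word)[1] - BIAS
```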

A.3 Boolean Arithmetic

AND, ANDM, AND2
similar to fixed-point ADD family
BSUB, BSUBM
result = register AND NOT operand (Boolean subtract)
BRSBM
Boolean reverse subtract to memory
IOR, IORM, IOR2
inclusive OR family
XOR, XORM, XOR2
exclusive OR family
SETT
(set and test) set operand to one after setting condition bits to comparison of register and operand (used for locking of critical regions)
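BSUB and the SETT locking idiom can be sketched in Python. The lock convention (0 = free, 1 = held, register holding 0 before SETT) is one plausible reading of "used for locking of critical regions", not something the summary states. The string condition codes stand in for the PSR condition bits, and the compare-and-store is assumed to be indivisible on the real CPU, which a single-threaded Python model cannot demonstrate. `Memory`, `try_lock` and `unlock` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def bsub(reg: int, operand: int) -> int:
    """BSUB: register AND NOT operand, i.e. clear the bits set in operand."""
    return reg & ~operand & MASK32


class Memory:
    def __init__(self):
        self.words = {}   # word address -> value
        self.cond = None  # "LT", "EQ" or "GT", standing in for the PSR condition bits

    def sett(self, reg_value: int, addr: int) -> None:
        """SETT: compare register with operand, record the result,
        then set the operand to one. Assumed indivisible on the real CPU."""
        old = self.words.get(addr, 0)
        if reg_value == old:
            self.cond = "EQ"
        elif reg_value < old:
            self.cond = "LT"
        else:
            self.cond = "GT"
        self.words[addr] = 1


def try_lock(mem: Memory, lock_addr: int) -> bool:
    """Hypothetical lock protocol: 0 = free, 1 = held.
    With 0 in the register, EQ means the lock was free and is now ours."""
    mem.sett(0, lock_addr)
    return mem.cond == "EQ"


def unlock(mem: Memory, lock_addr: int) -> None:
    mem.words[lock_addr] = 0  # e.g. an STZ on the lock word
```

The point of doing the compare and the store in one instruction is that no other process can slip in between seeing "free" and marking the lock "held".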

A.4 Jumps

Unconditional
JMP (load Program Counter with operand)
Conditioned on PSR condition bits
JCC, JCS (if carry clear/set), JOC, JOS (if overflow clear/set), JEQ, JNE, JLT, JGT, JLE, JGE
Conditioned on comparison of register contents to zero ("Z") or minus one ("MW")
JEQZ, JEQZ2, JNEZ, JNEZ2, JLTZ, JLTZ2, JGTZ, JGTZ2, JLEZ, JLEZ2, JGEZ, JGEZ2, JEQMW, JNEMW
Bit tests
JBT, JBF (if bit in register true/false)
Address tests
JZA, JNZA (if address field of register zero/non-zero)
Register increment/decrement
IRJ, DRJ (inc/dec register, then jump if result not equal to zero); JIR, JDR (if register not equal to zero, inc/dec register and jump)
Linkage jumps, conditioned on zero/non-zero address field fetched through register
LJZA, LJNA (load register with address field of word it points to, then jump if result zero/non-zero); RLJZA, RLJNA (remember, link, and jump -- save register in adjacent register, then proceed as in LJZA, LJNA)
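The two register increment/decrement jump styles differ in whether the test happens before or after the count changes, which changes how many times a loop body runs. Here is a small Python sketch, assuming 32-bit wraparound; the functions return the new register value and whether the jump is taken, and `count_iterations` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF


def irj(reg: int):
    """IRJ: increment the register, then jump if the result is non-zero."""
    reg = (reg + 1) & MASK32
    return reg, reg != 0


def drj(reg: int):
    """DRJ: decrement the register, then jump if the result is non-zero."""
    reg = (reg - 1) & MASK32
    return reg, reg != 0


def jir(reg: int):
    """JIR: if the register is non-zero, increment it and jump."""
    if reg != 0:
        return (reg + 1) & MASK32, True
    return reg, False


def jdr(reg: int):
    """JDR: if the register is non-zero, decrement it and jump."""
    if reg != 0:
        return (reg - 1) & MASK32, True
    return reg, False


def count_iterations(step, reg: int) -> int:
    """Execute a loop body, then `step`, until the jump is not taken."""
    n = 0
    while True:
        n += 1  # one pass through the loop body
        reg, jumped = step(reg)
        if not jumped:
            return n
```

With a count of 3, a DRJ-closed loop runs its body 3 times while a JDR-closed loop runs it 4 times; IRJ and JIR behave the same way when counting up from minus three. A DRJ loop entered with zero wraps to minus one and keeps going, so the count must start non-zero.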

A.5 Subroutine Linkage

Several instructions are provided for subroutine linkage; they check entry points and provide parameter type-checking for the subroutine. The calling sequence and the entry sequence are executed part by part, passing one parameter at a time with the PAR (pass parameter) instructions on the calling side and corresponding STP (store parameter) instructions on the subprogram side. These instructions specify the parameter type (including "2" for doubleword), whether the parameter is being passed by location or by value ("V"), and whether this is the last ("L") parameter in the protocol.

CALL, CALLNP ("NP" = no parameters)
begin linkage from calling side
ENTR, ENTRNP, ENTRS ("S" = start, for non-standard parameter passing)
begin subroutine
PAR, PARZ, PARL, PARZL, PARV, PARVZ, PARVL, PARVZL
pass parameter
STP, STPZ, STPL, STPZL, STPV, STPVZ, STPVL, STPVZL
store parameter
LEAVE
leave subroutine
LDPC, LDPCS
load Program Counter ("S" = also load status bits)
EXPC, EXPCS
exchange Program Counter (and status) with operand
JSR
jump and save return address in register
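The summary does not describe how the type checking is carried out, so here is a toy Python model under the simplest reading: each PAR on the calling side must be matched by an STP on the subroutine side with the same type, the same by-value flag and the same "last" flag. The type tags, the queue, and the `LinkageError` name are hypothetical. On the real machine the calling and entry sequences interleave step by step rather than sharing a queue, and the "Z" variants are not modeled because the summary does not say what "Z" means here.

```python
from collections import deque
from dataclasses import dataclass


class LinkageError(Exception):
    pass


@dataclass(frozen=True)
class Param:
    ptype: str       # hypothetical type tag, e.g. "INT" or "INT2" (doubleword)
    by_value: bool   # the "V" variants (PARV, STPV)
    last: bool       # the "L" variants (PARL, STPL)
    payload: object  # the value, or a location if passed by location


class Linkage:
    """Toy model: the caller's PARs queue descriptors, the callee's STPs consume them."""

    def __init__(self):
        self.pending = deque()

    def par(self, p: Param) -> None:
        self.pending.append(p)

    def stp(self, ptype: str, by_value: bool, last: bool) -> object:
        if not self.pending:
            raise LinkageError("callee expects more parameters than were passed")
        p = self.pending.popleft()
        if (p.ptype, p.by_value, p.last) != (ptype, by_value, last):
            raise LinkageError(
                f"mismatch: passed {(p.ptype, p.by_value, p.last)}, "
                f"expected {(ptype, by_value, last)}"
            )
        return p.payload
```

The "L" flag lets both sides agree on where the parameter list ends, so a caller passing too few or too many parameters is caught as well as one passing the wrong type.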

A.6 Compare Instructions

CKB, CKB2, FPCKB, I2CKB
bounds checking for array indexing
CPR, CPR2, UCPR, UCPR2
signed/unsigned compare register with operand
MCPR
masked compare register with operand (adjacent register selects bits)
CMZ, CMZ2
compare operand ("memory") to zero
STLEQ
store logical one ("1") iff condition bits = "EQ", else store zero
STLNE, STLLT, STLGT, STLLE, STLGE
as above for other conditions
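Signed and unsigned compares can disagree on the same bit patterns, and the STLxx family then turns the resulting condition into a stored 0 or 1. A short Python sketch, assuming 32-bit two's-complement operands; the string condition codes stand in for the PSR condition bits, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def compare(a: int, b: int, signed: bool = True) -> str:
    """CPR (signed) / UCPR (unsigned): return the condition 'LT', 'EQ' or 'GT'."""
    if signed:
        a, b = to_signed(a), to_signed(b)
    else:
        a, b = a & MASK32, b & MASK32
    return "LT" if a < b else "EQ" if a == b else "GT"


STL_TESTS = {
    "STLEQ": lambda c: c == "EQ",
    "STLNE": lambda c: c != "EQ",
    "STLLT": lambda c: c == "LT",
    "STLGT": lambda c: c == "GT",
    "STLLE": lambda c: c != "GT",
    "STLGE": lambda c: c != "LT",
}


def stl(op: str, cond: str) -> int:
    # Store logical one iff the condition holds, else zero.
    return 1 if STL_TESTS[op](cond) else 0
```

`0xFFFFFFFF` compared with 1 is "LT" under CPR (it is minus one) but "GT" under UCPR, so `stl("STLGT", ...)` yields 0 after the first and 1 after the second.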

A.7 Character Instructions

These instructions are interruptible, and deal with character strings whose starting address and length are given by register values. The CMOVE instruction loads and stores whole words and thus is quite efficient no matter what the character alignment might be.

CSRCH
search for a specified character in a specified string
CMS
compare strings (can be paired with CSRCH to search for substrings)
CMOVE
move string
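The CMS entry notes that it can be paired with CSRCH to search for substrings. A Python sketch of that pairing: scan for the pattern's first character, then compare the whole pattern at that position, and resume just past the candidate on a mismatch. The byte-string model and function names are assumptions; the real instructions work on register-described strings and report results through condition bits.

```python
def csrch(s: bytes, start: int, ch: int) -> int:
    """CSRCH-like: index of the first occurrence of ch at or after start, or -1."""
    for i in range(start, len(s)):
        if s[i] == ch:
            return i
    return -1


def cms(a: bytes, a_start: int, b: bytes) -> bool:
    """CMS-like: does a[a_start:a_start + len(b)] equal b?"""
    return a[a_start:a_start + len(b)] == b


def find_substring(text: bytes, pattern: bytes) -> int:
    if not pattern:
        return 0
    i = 0
    while True:
        i = csrch(text, i, pattern[0])           # candidate start
        if i < 0 or i + len(pattern) > len(text):
            return -1
        if cms(text, i, pattern):                 # full comparison
            return i
        i += 1                                    # resume past this candidate
```

For example, `find_substring(b"BTI 8000 and BTI 5000", b"5000")` returns 17.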

A.8 Miscellaneous Instructions

LDPSR, STPSR
load/store Process Status Register
CLPSR, IORPSR, XORPSR
PSR bit manipulation
HIB, HIB2
find location of leftmost one-bit in operand
LEA, LEA2
load effective address (generate a pointer)
XCT
execute operand as if it were an instruction (one level only)
LSRCH
linked list search. Searches through a linked list of structures for a match between the value in a specified part of each structure and a value in a register (or register pair)
PMUT
(permute) Using a 32-word table, this instruction can permute bits in a register, encrypt data, compute parity, and form block checksums.
NOP
no operation
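The LSRCH description amounts to a linked-list walk done in a single instruction. A minimal Python equivalent, assuming each structure holds its link word and its key word at fixed offsets from the structure's address, and that a link of 0 ends the list; the function name and parameters are hypothetical, and only the single-register key case is shown.

```python
from typing import Dict, Optional


def lsrch(memory: Dict[int, int], head: int, link_offset: int,
          key_offset: int, key: int) -> Optional[int]:
    """Follow link words from `head` until a structure's key word equals `key`.
    Returns that structure's address, or None at the end of the list.
    A link of 0 is assumed to terminate the list."""
    addr = head
    while addr != 0:
        if memory.get(addr + key_offset) == key:
            return addr
        addr = memory.get(addr + link_offset, 0)
    return None
```

With structures at addresses 100, 200 and 300 (link at offset 0, key at offset 1) holding keys 7, 42 and 9, searching for 42 returns 200 and searching for 5 returns None.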

A.9 Address Modes

In addition to specifying a register, many instructions also specify an operand through an address mode field. Address mode parameters can in turn involve the specification of one or two registers used to arrive at an operand. Indirect addressing proceeds through "pointers", which themselves specify five different methods of addressing. The following summary is by class, with the number in parentheses representing the total number of modes in each major class. The distinction between single-word and double-word addressing (for word-size operands) is not considered in this count, since that distinction is made in the instruction operation-code field.

( 1) DIRECT
( 1) INDEXED
( 3) IMMEDIATE
( 5) INDIRECT
( 2) INDIRECT AND INDEXED (first indirect, then indexed)
( 1) REG1 (register select, with value biased)
( 1) ARWD1 (offset from base register)
( 1) CACH1 (offset to character from base register)
( 5) FPVR1 (offset from base register, then indirect)
( 1) REG2 (as in REG1, but indexed)
( 1) ARWD2 (offset from base register, then indexed)
( 1) CACH2 (offset from base register, then indexed to character)
( 2) FPVR2 (offset from base register, then indirect, then indexed)
( 1) CBM (circular bit-string mode)
( 1) ZBM (zig-zag bit-string mode)
( 1) STK (stack mode)
( 4) TCONV (type conversions: integer/floating-point, etc.)

Totals: 32 address modes through 17 classes
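The totals can be checked by tallying the table. That the result is exactly 2^5 = 32, the range of a 5-bit mode field like the one in the pointer word, is our own observation rather than a claim made in the summary.

```python
# Mode counts per class, copied from the table above.
MODE_CLASSES = {
    "DIRECT": 1, "INDEXED": 1, "IMMEDIATE": 3, "INDIRECT": 5,
    "INDIRECT AND INDEXED": 2, "REG1": 1, "ARWD1": 1, "CACH1": 1,
    "FPVR1": 5, "REG2": 1, "ARWD2": 1, "CACH2": 1, "FPVR2": 2,
    "CBM": 1, "ZBM": 1, "STK": 1, "TCONV": 4,
}

total = sum(MODE_CLASSES.values())
print(len(MODE_CLASSES), total, total == 2 ** 5)  # prints: 17 32 True
```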

Trivia