
Pippin Reference

Prof. Bartenstein

The Pippin is a greatly simplified computer architecture designed for teaching. Pippin was first introduced by Rick Decker and Stuart Hirshfield in their paper, "The Pippin Machine: Simulations of Language Processing" [2001], to describe tools used in their book, The Analytical Engine: An Introduction to Computer Science Using the Internet [1998]. This web page describes a slightly modified version of that architecture. There is no actual Pippin hardware (to my knowledge) because the Pippin architecture is too simple to be practical, but it is easy to simulate how Pippin would work and to demonstrate what it can do.

Introduction to Pippin

The Pippin architecture has two main components, a Central Processing Unit (CPU) and a memory, both described in more detail below. The CPU is primarily responsible for executing single Pippin instructions. Those instructions update the internal state of the Pippin machine by modifying the registers in the CPU and possibly the memory. A Pippin program is a list of instructions to be executed on the Pippin CPU. A Pippin program can be described using a human-readable Pippin assembly language, which can be converted by an assembler into machine-readable "binary" Pippin object code. A Pippin job is an execution of a Pippin program.

For simplicity, the basic data type used in Pippin is a 32 bit (4 byte) two's complement "word". Pippin does not support floating point data types, or characters or strings.
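As a rough illustration of what a 32 bit two's complement word means, the sketch below wraps an arbitrary integer into that range. The function name to_word is hypothetical, and the wrap-around on overflow is an assumption; this page does not say how Pippin handles overflow.

```python
def to_word(value: int) -> int:
    """Wrap an arbitrary integer into a signed 32 bit two's complement word.

    Assumption: overflow wraps around silently. The Pippin description
    does not specify overflow behavior.
    """
    value &= 0xFFFFFFFF          # keep only the low 32 bits
    if value & 0x80000000:       # sign bit set, so the word is negative
        value -= 0x100000000
    return value


largest = to_word(2**31 - 1)     # the largest positive word stays unchanged
wrapped = to_word(2**31)         # one past the largest word wraps to negative
```

With this convention a word can hold any value from -2,147,483,648 to 2,147,483,647.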

Pippin CPU

The Pippin CPU consists of an Arithmetic/Logic Unit (ALU) and several registers. The ALU takes one or two operands and an operation, and uses that operation to produce a result. For instance, the operands 6 and 21 with an operation of ADD produce a result of 27. All the different operations are described in the Instructions section below.

The Pippin CPU also contains several registers to keep track of the state of the Pippin machine. These registers are as follows:

accumulator: A general purpose 32 bit register to keep the result of most instructions.
instructionPointer: A 32 bit register to keep track of where the next instruction to be executed is in memory.
dataMemoryBase: A 32 bit register to keep track of where this program's data memory starts in memory.
halted: A one bit flag used to keep track of whether the program is running (false), or has finished (true). When a program is started, halted is set to false. When a HALT instruction is executed or when an abnormal event (such as a memory exception) occurs, halted is set to true.

The CPU also keeps track of the memory it is connected to, and the current job that is executing.

In our Pippin simulation, the CPU is modeled using the CPU class.

Pippin memory

The Pippin memory is very simple, modeled in the Memory class. The memory consists of an array (really a vector) of Pippin words (4 byte integers). The size of the memory is configurable, but is currently defined in Pippin.MEMORY_SIZE in the Pippin class as 4096 words (which is a 16 KB memory).

Each word in memory is individually accessible by the get and set methods, using the address in memory, which is just the index into the memory array. For example, I can use the address 141 to read or write a specific word in memory.

In Pippin, a job (a running program) will take a specific subset of memory. The instructions for the program will be loaded into memory starting at a specific codeStart location in memory. After the last instruction is the start of the data used for the program. When the program is running, the dataMemoryBase register points to the start of the data. The program also has a dataSize that indicates how many words of data memory the program will use.

A program only uses memory from the codeStart location to the dataMemoryBase + dataSize location. Therefore, it is possible to load several programs into the Pippin memory. As long as the programs' memory requirements don't overlap, multiple programs can use the memory without causing incorrect results.

A program should use the CPU's getData and setData methods to access memory when reading or writing data. The getData and setData methods use the dataMemoryBase register as the base, and the loc argument as an offset. This allows programs to be loaded at different locations in memory, but use consistent data memory indexes.
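The base-plus-offset addressing described above can be sketched as follows. This is a minimal Python stand-in, not the actual simulator classes: the method names get_data and set_data mirror getData and setData, and the bounds check that raises IndexError is an assumption about how a memory exception might be reported.

```python
class Memory:
    """A flat array of words addressed by index."""

    def __init__(self, size: int = 4096):
        self.words = [0] * size

    def _check(self, address: int) -> None:
        # Assumption: an out-of-range address is reported as an exception.
        if not 0 <= address < len(self.words):
            raise IndexError(f"address {address} is outside memory")

    def get(self, address: int) -> int:
        self._check(address)
        return self.words[address]

    def set(self, address: int, value: int) -> None:
        self._check(address)
        self.words[address] = value


class CPU:
    """Only the parts of the CPU needed to show data addressing."""

    def __init__(self, memory: Memory):
        self.memory = memory
        self.data_memory_base = 0

    def get_data(self, loc: int) -> int:
        # loc is an offset from the start of this program's data memory
        return self.memory.get(self.data_memory_base + loc)

    def set_data(self, loc: int, value: int) -> None:
        self.memory.set(self.data_memory_base + loc, value)


memory = Memory()
cpu = CPU(memory)
cpu.data_memory_base = 100   # a program whose data starts at address 100
cpu.set_data(3, 42)          # writes absolute address 103
cpu.data_memory_base = 200   # the same program loaded elsewhere
cpu.set_data(3, 7)           # same offset, now absolute address 203
```

The program always refers to offset 3; only the dataMemoryBase value decides which absolute address is touched.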

Pippin Instructions

A Pippin instruction has three components: an operation, a mode, and an argument. The operation defines what the ALU should perform, and how the ALU results should be handled. The mode defines how to use the argument. The argument is a numeric value. All three are described in greater detail below. Instructions are modeled using the Instruction class.

A Pippin instruction may be represented using an encoded binary format in a single Pippin 32 bit word. There are methods in the Instruction class to take an operation, mode, and argument and encode it into a binary instruction, or to take a binary instruction, and decode it into an operation, mode, and argument.

Pippin Instruction Operations

In Pippin, there are four classes of operations: memory operations, arithmetic operations, logical operations, and flow control operations.

Memory Operations

LOD: Stores the argument value into the accumulator register.
STO: Writes the accumulator register value to memory.

Arithmetic Operations

ADD: Adds the accumulator to the argument value and stores the result in the accumulator register.
SUB: Subtracts the argument value from the accumulator register and stores the result in the accumulator.
MUL: Multiplies the argument value by the accumulator and stores the result in the accumulator.
DIV: Divides the accumulator by the argument value and stores the result in the accumulator.

Logical Operations

NOT: Sets the accumulator to 1 if it was 0, or 0 if it was non-zero. (No argument value used.)
AND: Sets the accumulator to 1 if both the accumulator and argument value are non-zero. Sets the accumulator to 0 in all other cases.
CML: Sets the accumulator to 1 if the argument value is less than zero, 0 in all other cases.
CMZ: Sets the accumulator to 1 if the argument value is zero or 0 if the argument value is non-zero.

Flow Control Operations

JMP: Adds the argument value to the location of the JMP instruction, and stores the result in the instructionPointer.
JMZ: Adds the argument value to the location of the JMZ instruction, and if the accumulator is zero, stores the result in the instructionPointer.
HLT: Sets the halted register to true.
NOP: Does nothing.
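To make the effect of the operations concrete, here is a small sketch of a single execution step. It takes the already-resolved argument value and returns the new accumulator, the new instructionPointer, and the halted flag. The function name step is hypothetical. STO and DIV are left out on purpose: STO needs memory, and this page does not say how DIV rounds or handles a zero divisor. Advancing the instructionPointer by one after a non-jump instruction is also an assumption, as is ignoring 32 bit overflow.

```python
def step(op, value, accumulator, ip):
    """Execute one operation and return (accumulator, next_ip, halted).

    value is the resolved argument value (None for NOT, HLT, and NOP).
    ip is the location of the instruction being executed.
    """
    next_ip = ip + 1   # assumption: fall through to the next instruction
    halted = False
    if op == "LOD":
        accumulator = value
    elif op == "ADD":
        accumulator = accumulator + value
    elif op == "SUB":
        accumulator = accumulator - value
    elif op == "MUL":
        accumulator = accumulator * value
    elif op == "NOT":
        accumulator = 1 if accumulator == 0 else 0
    elif op == "AND":
        accumulator = 1 if accumulator != 0 and value != 0 else 0
    elif op == "CML":
        accumulator = 1 if value < 0 else 0
    elif op == "CMZ":
        accumulator = 1 if value == 0 else 0
    elif op == "JMP":
        next_ip = ip + value          # relative to the JMP instruction itself
    elif op == "JMZ":
        if accumulator == 0:
            next_ip = ip + value      # taken only when the accumulator is zero
    elif op == "HLT":
        halted = True
    elif op == "NOP":
        pass
    else:
        raise ValueError(f"operation {op!r} is not covered by this sketch")
    return accumulator, next_ip, halted
```

Note that the jumps are relative: a JMZ at location 10 with an argument value of -3 continues at location 7 when the accumulator is zero, and at location 11 otherwise.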

Pippin Instruction Modes

The instruction mode describes how to convert the argument to the "argument value" described in the operations above. The valid instruction modes are as follows:

NOM: No Mode - no argument value is required for the operation, so the argument is ignored.
IMM: IMMediate - the argument itself is the argument value.
DIR: DIRect - the argument contains an index into memory. The argument value is the value at that index in memory, or if you think of memory as an array, MEM[arg]. (In the STO operation, the DIR mode indicates that accumulator should get written to memory using the argument as an index, MEM[arg]=accumulator.)
IND: INDirect - the argument contains an index in memory. That index contains *another* index in memory. The argument value is the value at the second index. If you think of memory as an array, MEM[MEM[arg]]. For a STO operation, write to the memory using the second index, or MEM[MEM[arg]] = accumulator.
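The four modes can be sketched as two small functions, one for reading an argument value and one for the STO case. Here mem is a plain Python list standing in for the program's data memory; the function names are hypothetical.

```python
def argument_value(mode, arg, mem):
    """Convert an instruction argument into its argument value."""
    if mode == "NOM":
        return None             # the argument is ignored
    if mode == "IMM":
        return arg              # the argument itself
    if mode == "DIR":
        return mem[arg]         # MEM[arg]
    if mode == "IND":
        return mem[mem[arg]]    # MEM[MEM[arg]]
    raise ValueError(f"unknown mode {mode!r}")


def store(mode, arg, mem, accumulator):
    """Perform the memory write of a STO instruction."""
    if mode == "DIR":
        mem[arg] = accumulator          # MEM[arg] = accumulator
    elif mode == "IND":
        mem[mem[arg]] = accumulator     # MEM[MEM[arg]] = accumulator
    else:
        raise ValueError(f"STO cannot be used with mode {mode!r}")


mem = [3, 0, 0, 42]
direct = argument_value("DIR", 0, mem)     # reads mem[0]
indirect = argument_value("IND", 0, mem)   # reads mem[mem[0]], that is mem[3]
```

In this example location 0 holds the value 3, so the direct read yields 3 while the indirect read follows that 3 as a second index and yields 42.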

Pippin Instruction Arguments

We normally think of an instruction argument as a simple Pippin 32 bit word. However, in order to fit the entire instruction in a single Pippin 32 bit word, the argument is restricted to a 24 bit two's complement value instead of a 32 bit two's complement value. (In decimal, that restricts the argument to -8,388,608 <= arg <= 8,388,607.) Depending on the mode, the argument may be used as an immediate value, an operand to the ALU, or an address in memory, where the value in memory is used as an operand to the ALU.
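The sketch below shows one way an operation, a mode, and a 24 bit argument could be packed into a single 32 bit word. The bit layout (operation number in the top 6 bits, mode in the next 2 bits, argument in the low 24 bits) and the numbering of operations and modes are assumptions made for illustration only; this page does not document the layout that the Instruction class actually uses.

```python
# Assumed numbering: position in these lists (not taken from the Instruction class).
OPS = ["LOD", "STO", "ADD", "SUB", "MUL", "DIV", "NOT",
       "AND", "CML", "CMZ", "JMP", "JMZ", "HLT", "NOP"]
MODES = ["NOM", "IMM", "DIR", "IND"]


def encode(op, mode, arg):
    """Pack (op, mode, arg) into one word using the assumed layout."""
    if not -(1 << 23) <= arg <= (1 << 23) - 1:
        raise ValueError("argument does not fit in 24 bit two's complement")
    # Masking with 0xFFFFFF keeps the low 24 bits of a negative argument.
    return (OPS.index(op) << 26) | (MODES.index(mode) << 24) | (arg & 0xFFFFFF)


def decode(word):
    """Unpack a word produced by encode back into (op, mode, arg)."""
    word &= 0xFFFFFFFF
    op = OPS[word >> 26]
    mode = MODES[(word >> 24) & 0x3]
    arg = word & 0xFFFFFF
    if arg & 0x800000:          # sign bit of the 24 bit field
        arg -= 1 << 24          # sign-extend to a negative value
    return op, mode, arg


word = encode("JMP", "IMM", -3)
fields = decode(word)           # round-trips to ("JMP", "IMM", -3)
```

Whatever the real layout is, the same two ideas apply: the argument must be range-checked before encoding, and it must be sign-extended from 24 bits when decoding.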

Pippin Programs

A Pippin program consists of two components: a list of instructions, and a list of initial value / data location pairs. It should be possible to run a Pippin program by starting at the first instruction and executing sequentially, except where jump instructions redirect execution, until a halt instruction is reached. There is no guarantee that a halt will be reached, but well-formed programs will eventually halt.

We model a Pippin program with the Program class.

A Pippin program typically starts as a Pippin assembly language description of the program in a .pasm file. A Pippin Assembler converts the assembly language program into "binary" object language form in a .pexe file.

A Pippin program in a .pexe file can be loaded into memory at a starting location by writing the instructions into memory starting at the location specified, setting the dataMemoryBase register to the first word after the last instruction, and then writing data initialization values to the data memory. Once this is complete, the instructionPointer register can be set to the first instruction, the halted register can be set to false, and the program can be run.
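The loading steps just described can be sketched as a single function. Memory is a plain list here, the registers are returned as a dictionary, and the function name load is hypothetical. Starting the accumulator at 0 is an assumption; the text only specifies the other three registers.

```python
def load(memory, code_start, instructions, data_init):
    """Load a program and return the initial register values.

    instructions is a list of encoded instruction words.
    data_init maps a data memory location (offset) to its initial value.
    """
    # 1. Write the instructions starting at code_start.
    for offset, word in enumerate(instructions):
        memory[code_start + offset] = word
    # 2. Data memory begins at the first word after the last instruction.
    data_memory_base = code_start + len(instructions)
    # 3. Write the data initialization values relative to that base.
    for loc, value in data_init.items():
        memory[data_memory_base + loc] = value
    # 4. Point at the first instruction and mark the program as running.
    return {
        "accumulator": 0,               # assumption: not stated in the text
        "instructionPointer": code_start,
        "dataMemoryBase": data_memory_base,
        "halted": False,
    }


memory = [0] * 32
registers = load(memory, 10, [111, 222, 333], {0: 5, 2: -1})
```

In this example the three instruction words occupy addresses 10 through 12, so the data memory base is 13 and data location 2 lands at address 15.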

A running program may be interrupted or "swapped out" by saving all the register values. At some later point, that same program can be "swapped in" by restoring all of the register values. Since the memory footprint of a program does not overlap the memory footprint of a different program, it is possible to swap out a running program, swap in another program, run the other program for a while, then swap out the other program and swap in the original program by restoring all the registers, and continue processing without causing any errors.

Pippin Jobs

A Pippin job, as modeled by the Job class, keeps track of a running program, and models information that is normally handled by an operating system. A job keeps track of:

The program associated with that job.
The location where the first instruction is loaded, the codeStart value.
The size of the data memory reserved for this program.
Whether this program has been loaded in memory or not.

When a job is swapped out, the job also keeps the values of all the registers, the accumulator, the instructionPointer, the dataMemoryBase, and the halted flag, saved at the time the program was swapped out. The swap in process consists of restoring all of these values to the real CPU registers.
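A minimal sketch of swapping, assuming the registers are kept in one small record. The Registers and Job classes below are Python stand-ins for illustration, not the simulator's own CPU and Job classes, and the snake_case method names are mine.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class Registers:
    accumulator: int = 0
    instruction_pointer: int = 0
    data_memory_base: int = 0
    halted: bool = False


class Job:
    def __init__(self):
        self.saved = Registers()

    def swap_out(self, cpu: Registers) -> None:
        # Keep a private copy so later CPU changes do not alter the saved state.
        self.saved = replace(cpu)

    def swap_in(self, cpu: Registers) -> None:
        # Restore every saved value into the real CPU registers.
        for field in fields(Registers):
            setattr(cpu, field.name, getattr(self.saved, field.name))


cpu = Registers(accumulator=5, instruction_pointer=12, data_memory_base=20)
job_a, job_b = Job(), Job()
job_b.saved = Registers(instruction_pointer=40, data_memory_base=45)

job_a.swap_out(cpu)       # interrupt job A
job_b.swap_in(cpu)        # run job B for a while
cpu.accumulator = 99
job_b.swap_out(cpu)
job_a.swap_in(cpu)        # job A resumes exactly where it left off
```

Only the registers are saved and restored; the memory contents stay where they are, which is why the two jobs must not overlap in memory.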

The Job class has methods to load the job into memory at a specific location, which writes the instructions into memory, performs data initializations, and leaves the CPU and memory in a state where instructions in the program can be executed. The reload method resets the memory initializations and the halted flag, but does not rewrite the program instructions. The class also has swapIn and swapOut methods.

Pippin Assembly Code

Assembler file (.pasm) Syntax

The overall syntax of the Assembler file is as follows:

A comment is defined as everything from the first # found on a line until the end of the line. The assembler may ignore all comments.
Empty lines or lines that contain only comments should be counted as an assembler file line, but otherwise may be ignored.
Every non-empty line up to the line that contains the string ---data--- is considered an instruction line. The syntax of instruction lines is described in the Assembler Instruction Line Syntax section below.
Everything after the ---data--- line is considered a data line. The syntax of data lines is defined below as well.
Only one ---data--- data delimiter is allowed. A second data delimiter line should cause the assembler to detect an error and ignore all the lines following the second delimiter.

While the Assembler is processing instructions, it needs to keep track of two different locations - the line number in the .pasm file, and the index of the instruction being processed. Since each instruction will eventually take one word of memory when it is loaded into Pippin memory, the index of the instruction represents the offset from the beginning of the program.
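The file-level rules above can be sketched as a first pass that strips comments, separates instruction lines from data lines, and records both the file line number and the instruction index. The function name split_pasm and the shape of the returned tuples are hypothetical.

```python
def split_pasm(text):
    """Split .pasm source into instruction lines, data lines, and errors.

    Returns (instructions, data, errors) where
      instructions is a list of (line_number, instruction_index, text),
      data is a list of (line_number, text),
      errors is a list of (line_number, message).
    """
    instructions, data, errors = [], [], []
    in_data = False
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop the comment, if any
        if not line:
            continue                          # counted as a line, otherwise ignored
        if "---data---" in line:
            if in_data:
                errors.append((line_number, "second ---data--- delimiter"))
                break                         # ignore everything after it
            in_data = True
        elif in_data:
            data.append((line_number, line))
        else:
            # The instruction index is the offset from the start of the program.
            instructions.append((line_number, len(instructions), line))
    return instructions, data, errors


source = "# demo\nLOD 5   # load five\n\nSTO x\n---data---\nx = 0\n"
instructions, data, errors = split_pasm(source)
```

In the example, STO x is on line 4 of the file but is instruction index 1, because the comment line and the empty line are counted as file lines and not as instructions.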

Assembler Instruction Line Syntax

After removing any comments, an assembler instruction line must consist of an optional label, followed by an operation mnemonic, and an optional argument.

A label is any alpha-numeric token, followed by a colon (:). The label must be unique in this program; it may not match any other label. An alpha-numeric token is a white-space delimited token that consists of an alphabetic first character, a-z or A-Z, followed by any number of alphabetic characters or digits, a-z, A-Z, and/or 0-9. The trailing colon is not a part of the label. If a label is not unique, the assembler should report an error on this line and ignore the label. If the token preceding a colon contains characters which are not alphabetic or numeric, the assembler should report an error and ignore the label. The label represents the index in the program of the next instruction, and can be referenced as an argument in JMP or JMZ instructions.

If the first token on an assembler program line is not followed by a colon, the assembler should assume that there is no label on this line. In this case, the first token is assumed to be the operation mnemonic.

The next token after the optional label (or the first token if no label is present) is the operation mnemonic. This token must consist of one of the valid three-letter operations defined in the Instruction Operations section above. The operation mnemonic may be upper case, lower case, or mixed case. If the operation mnemonic field in the assembler instruction line does not match any of the valid operations, then the assembler should report an error on this line, and ignore the rest of the line.

For operations that do not require an argument, specifically, NOT, HLT, and NOP, no argument token should be present. If an argument token is present, the assembler should report an error for this instruction.
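The label and mnemonic rules so far can be sketched as a function that splits one comment-free instruction line into its parts. Several details are assumptions: the colon is taken to be attached directly to the label token, operations that take an argument are required to have exactly one argument token, and label uniqueness is not checked because that needs knowledge of the whole program. The function name is hypothetical.

```python
import re

OPERATIONS = {"LOD", "STO", "ADD", "SUB", "MUL", "DIV", "NOT",
              "AND", "CML", "CMZ", "JMP", "JMZ", "HLT", "NOP"}
NO_ARGUMENT = {"NOT", "HLT", "NOP"}
NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def parse_instruction_line(line):
    """Return (label, operation, argument_token, errors) for one line."""
    tokens = line.split()
    label, errors = None, []
    if tokens and tokens[0].endswith(":"):
        candidate = tokens.pop(0)[:-1]        # the colon is not part of the label
        if NAME.fullmatch(candidate):
            label = candidate
        else:
            errors.append(f"invalid label {candidate!r}")
    if not tokens:
        errors.append("missing operation mnemonic")
        return label, None, None, errors
    operation = tokens[0].upper()             # mnemonics are case-insensitive
    if operation not in OPERATIONS:
        errors.append(f"unknown operation {tokens[0]!r}")
        return label, None, None, errors      # ignore the rest of the line
    arguments = tokens[1:]
    if operation in NO_ARGUMENT:
        if arguments:
            errors.append(f"{operation} does not take an argument")
        return label, operation, None, errors
    if len(arguments) != 1:
        errors.append(f"{operation} expects exactly one argument")
        return label, operation, None, errors
    return label, operation, arguments[0], errors


parsed = parse_instruction_line("loop: add @x")
```

The argument token is returned as text; deciding its mode and numeric value is a separate step.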

For all other operations, the last non-comment token on an instruction line is the argument. The argument may be one of several kinds of values:

Numeric Token: A numeric argument has an optional '+' or '-' prefix, followed by numeric characters between 0 and 9. Examples include "7", "+09", and "-234".
Alpha-Numeric Token: An alpha-numeric token is a token which starts with an alphabetic first character, a-z or A-Z; followed by any number of alphabetic characters or numbers, a-z and/or A-Z, and/or 0-9. Examples include "area", "Width10", or "END5IF".
Direct Numeric Token: A token which starts with a single at sign (@), followed by a numeric token. Examples include "@7", "@+09", and "@-234".
Indirect Alpha-Numeric Token: A token which starts with an at sign (@), followed by an Alpha-Numeric token. Examples include "@area", "@Width10", and "@END5IF".
Indirect Numeric Token: A token which starts with two at signs (@@), followed by a Numeric token. Examples include "@@7", "@@+09", and "@@-234".
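The five token kinds map naturally onto regular expressions. The sketch below is one possible way to classify an argument token; the kind names returned as strings and the function name classify are my own.

```python
import re

NUMERIC = r"[+-]?[0-9]+"
ALPHA_NUMERIC = r"[A-Za-z][A-Za-z0-9]*"

TOKEN_KINDS = [
    ("numeric", re.compile(NUMERIC)),
    ("alpha-numeric", re.compile(ALPHA_NUMERIC)),
    ("direct numeric", re.compile("@" + NUMERIC)),
    ("indirect alpha-numeric", re.compile("@" + ALPHA_NUMERIC)),
    ("indirect numeric", re.compile("@@" + NUMERIC)),
]


def classify(token):
    """Return the kind of an argument token, or None if it matches no kind."""
    for kind, pattern in TOKEN_KINDS:
        if pattern.fullmatch(token):   # the whole token must match
            return kind
    return None


kinds = [classify(t) for t in ["-234", "Width10", "@7", "@area", "@@+09", "@@area"]]
```

Because fullmatch requires the entire token to match, something like "@@area" or "9lives" falls through every pattern and is reported as invalid.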

The following define the interpretation of the argument token, depending on what kind of token it is, and what operation has been specified in the operation mnemonic for this instruction.

If the argument is a numeric token, it represents an immediate value. In this case, the mode should be IMM, and the numeric argument should be converted to an integer. If a numeric token is supplied for a STO, CML, or CMZ operation, none of which support immediate arguments, an error should be reported on this line by the assembler.
For JMP and JMZ operations, the argument may contain an alpha-numeric token which represents a line label. In this case, the line label must appear as a line label token (the first token of a line, followed by a colon) somewhere else in the program (either before or after the current instruction). If the argument is a valid label, the jump instruction should use immediate mode (IMM), and the argument value should consist of the instruction number of the line label minus the instruction number of the current instruction. The value may be positive, representing a jump forward, or negative, representing a jump backward.
For operations which support a direct mode argument, the argument token may be an alpha-numeric token that represents a variable name. Each unique variable name should be associated with one word in the program's data memory. The Assembler should assign each unique variable name to a location in the program's data memory, starting at 0. For variable name tokens, the mode should be direct (DIR), and the argument should be the location of that variable in the program's data memory.
Another way of specifying a direct argument for operations which support direct arguments is a direct numeric token. In this case, the instruction mode should be DIR, and the value of the argument should be the numeric argument converted to an integer.
For operations which support an indirect mode, the argument token may consist of an Indirect Alpha-Numeric token in which the alpha-numeric part of the token consists of a variable name. In this case, the instruction mode should be indirect (IND), and the argument value should be the location of the variable in the program's data memory.
Another way of specifying an indirect argument for operations that allow indirect arguments is to use an Indirect Numeric token. In this case, the instruction mode should be indirect (IND), and the value should be the numeric argument converted to an integer.

Any argument that does not match the specifications outlined above should be considered an error by the assembler.
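The interpretation rules above can be sketched as a function that turns an operation and an argument token into a mode and a numeric argument. The name resolve is hypothetical. The sketch assumes that all labels have already been collected into a dictionary of label name to instruction index (so forward references work), that variables are numbered in order of first use, and it does not check which operations support DIR or IND because this page does not list them.

```python
import re

NUM = r"[+-]?[0-9]+"
NAME = r"[A-Za-z][A-Za-z0-9]*"
NO_IMMEDIATE = {"STO", "CML", "CMZ"}
JUMPS = {"JMP", "JMZ"}


def resolve(op, token, index, labels, variables):
    """Return (mode, argument) for an argument token.

    index is the instruction index of the current instruction.
    labels maps label names to instruction indexes.
    variables maps variable names to data locations and is extended in place.
    """
    if re.fullmatch(NUM, token):
        if op in NO_IMMEDIATE:
            raise ValueError(f"{op} does not support an immediate argument")
        return "IMM", int(token)
    if re.fullmatch(NAME, token):
        if op in JUMPS:
            if token not in labels:
                raise ValueError(f"undefined label {token!r}")
            return "IMM", labels[token] - index    # relative jump distance
        # A new variable gets the next free data location, starting at 0.
        return "DIR", variables.setdefault(token, len(variables))
    if re.fullmatch("@" + NUM, token):
        return "DIR", int(token[1:])
    if re.fullmatch("@" + NAME, token):
        return "IND", variables.setdefault(token[1:], len(variables))
    if re.fullmatch("@@" + NUM, token):
        return "IND", int(token[2:])
    raise ValueError(f"invalid argument {token!r}")


variables = {}
backward = resolve("JMP", "loop", 5, {"loop": 2}, variables)   # label at 2, jump at 5
first = resolve("ADD", "x", 0, {}, variables)                  # x gets location 0
second = resolve("STO", "y", 1, {}, variables)                 # y gets location 1
```

A jump from instruction 5 back to a label at instruction 2 gets the argument -3, matching the rule that jump arguments are relative to the jump instruction itself.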

Assembler Data Line Syntax

A Pippin assembler data line consists of three white-space delimited tokens: the location specification, followed by an equal sign (=), followed by a value specification.

The location specification may either be a numeric token, in which case the location is simply the conversion of that token to an integer, or an alpha-numeric token, in which case the location is the location of that variable in the program's data memory. The second token must be an equal sign, and nothing else. The third token must be a numeric token, and the value is that token converted to an integer.
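A data line can be parsed along these lines. The function name parse_data_line is hypothetical, and rejecting a variable name that never appeared in the instructions is an assumption; the text does not say what should happen in that case.

```python
import re

NUM = re.compile(r"[+-]?[0-9]+")
NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def parse_data_line(line, variables):
    """Return (location, value) for one comment-free data line.

    variables maps variable names to data memory locations.
    """
    tokens = line.split()
    if len(tokens) != 3 or tokens[1] != "=":
        raise ValueError("expected: <location> = <value>")
    location_token, _, value_token = tokens
    if NUM.fullmatch(location_token):
        location = int(location_token)
    elif NAME.fullmatch(location_token):
        if location_token not in variables:
            # Assumption: an unknown variable name is treated as an error.
            raise ValueError(f"unknown variable {location_token!r}")
        location = variables[location_token]
    else:
        raise ValueError(f"invalid location {location_token!r}")
    if not NUM.fullmatch(value_token):
        raise ValueError(f"invalid value {value_token!r}")
    return location, int(value_token)


pair = parse_data_line("total = -7", {"x": 0, "total": 1})
```

Because the three tokens must be white-space delimited, a line written as "x=5" is a single token and is rejected by this sketch.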

Pippin Object Code

Pippin object code is the "binary" representation of a Pippin program. The "binary" is in quotes because, in order to make Pippin programs easier to debug, we aren't actually reading and writing binary data to the .pexe files. Instead, we are writing the ASCII representation of integers.

A Pippin object code file consists of two sections - an instruction section, and a data initialization section, separated by a divider line that contains the string "---init---". Everything before the divider line is an instruction line. Everything after the divider line is a data initialization line.

Instruction lines consist of the character representation of the instruction, encoded as an integer.

Data initialization lines consist of the character representation of the integer location in the program's data memory, followed by an equals sign (=), followed by the character representation of the initial value for that location.
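The object code format is simple enough to sketch in a few lines. The two functions below work on strings rather than files and are illustrative stand-ins, not the Program class's own methods. Writing one instruction per line and omitting white space around the equals sign are assumptions.

```python
def write_pexe(instructions, data_init):
    """Return .pexe text for a list of instruction words and a
    mapping of data location to initial value."""
    lines = [str(word) for word in instructions]
    lines.append("---init---")
    lines.extend(f"{loc}={value}" for loc, value in data_init.items())
    return "\n".join(lines) + "\n"


def read_pexe(text):
    """Parse .pexe text back into (instructions, data_init)."""
    instructions, data_init = [], {}
    in_init = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "---init---" in line:
            in_init = True                      # everything after is data
        elif in_init:
            loc, value = line.split("=")
            data_init[int(loc)] = int(value)    # int() tolerates surrounding spaces
        else:
            instructions.append(int(line))
    return instructions, data_init


text = write_pexe([16777223, 0], {0: 5, 1: -2})
program = read_pexe(text)
```

Because everything is stored as decimal text, a .pexe file can be opened in any editor and checked by hand, which is the debugging benefit mentioned above.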

The Program class has both writePexe and readPexe methods that can write a program to, or read a program from, a .pexe file.
