In the Fall of 1972, my senior year in college, I was enrolled in EE 554: Digital Engineering Laboratory at the University of Wisconsin, Madison, taught by Professor Charles Kime – who also happened to be my adviser. A graduate student, Doug Tietz, served as a teaching assistant that semester. (I must have been a glutton for punishment: that same semester I was enrolled in EE 345, Semiconductor Physics and Devices, taught by Professor Henry Guckel. Both courses were demanding.)
EE 554 was taught using a set of logic frames arranged in four 19-inch equipment racks: the Digital Engineering Laboratory. The logic within a frame was interconnected using plug-boards that appeared to have been formerly used in IBM unit record equipment. The four frames could be interconnected using long cables, on the order of 10 feet each. One of the racks also contained a 12-bit core memory unit (there was also a 16-bit unit, but that unit may not have been operating correctly during our semester). More information on the lab setup can be found in the document: EE554-ZAP-1972-DigitalEngineeringLabEnvironment.pdf (available in the zip file located on Google Drive — see the link below).
Most of the lab experiments were conducted by every student, individually. However, the final project of the course was to design and build a stored-program computer. Our design had four major units – one for each rack in the lab environment: Control (CPU), Arithmetic, Input/Output, and Memory. I was part of “Group III”.
The roster of Group III was:
- Guy Copeland: Group leader, Control
- Louis Chu: Control
- Rod Egan: Arithmetic and Software: Assembler, “Hangman”, Multiply
- Jay Jaeger: I/O and Software: Interpreter, Bootstrap Loader, Typing (documentation)
- Ed Rothen: Memory and Software: Divide
- Larry Stuessy: Arithmetic
- Ray Sundby: Test Board and Documentation: Typing, drawing
- John Wipfli: Arithmetic and Software: ZAP Graphics program.
The computer was named “ZAP” – that name being derived from the assembler symbolic name for the IBM 360 instruction for Zero and Add Decimal. On the IBM 360, decimal data is stored as two 4-bit BCD digits per byte, a format commonly referred to as “packed decimal”. Hence the “P”.
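As a rough illustration of what "packed" means, the sketch below packs an integer two BCD digits per byte. The function name is hypothetical, and the sign nibble in the low half of the last byte (0xC for plus, 0xD for minus) is the usual System/360 convention rather than something stated above.

```python
def pack_decimal(value: int) -> bytes:
    """Pack an integer as packed decimal: one BCD digit per 4-bit nibble.

    Assumption (not from the text above): the sign occupies the low nibble
    of the last byte, 0xC for plus and 0xD for minus.
    """
    sign = 0xD if value < 0 else 0xC
    nibbles = [int(ch) for ch in str(abs(value))] + [sign]
    if len(nibbles) % 2:
        # Pad with a leading zero digit so the nibbles fill whole bytes.
        nibbles.insert(0, 0)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
```

For example, `pack_decimal(123)` yields the two bytes 0x12 and 0x3C: the digits 1 and 2 share the first byte, and the digit 3 shares the second byte with the sign.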
Being something of a pack-rat, I kept much of the documentation that was produced, including these documents, which are located inside the .zip file on Google Drive:
- “Digital Engineering Laboratory Equipment”, which can be found in file EE554-ZAP-1972-DigitalEngineeringLabEnvironment.pdf
- Some preliminary design notes, which can be found in file EE554-ZAP-1972-DesignConcept.pdf
- The preliminary design of the Input/Output subsystem, which can be found in file EE554-ZAP-1972-IO-Preliminary.pdf
- A draft of the final design document, which can be found in file EE554-ZAP-1972-SystemDoc-Ver0.5.pdf
- The final document produced by the group which can be found in file EE554-ZAP-1972-SystemDoc-Ver1.0.pdf
As a result of my effort, I also produced some additional files and documentation, also in the .zip file located on Google Drive (link below). The ZIP file includes:
- A version of the final document, with annotations describing changes I made in translating the design to a Field Programmable Gate Array (FPGA), which can be found in document EE554-ZAP-2015-SystemDoc-Ver1.1.pdf
- A spreadsheet which contains the unit interconnects and the gate allocations (a melding of the original gate allocations, where documented, and ones that I allocated for those that had not been allocated), in file PanelUtilization.xlsx
- Diagnostic programs I wrote while doing the work on the FPGA, named DiagnosticTest1.zap through DiagnosticTest8.zap
- loader.zap (and loader.bin) – the bootstrap loader
- An assembler developed in the Perl language while I was doing the work on the FPGA, ZapAssembler.pl
- VHDL files containing the design, in sub-folder VHDL
Files and documents relating to the original “ZAP” and the FPGA implementation can be found at:
Why did I do this?
My collection includes several DEC digital computers, including the PDP-11/20, PDP-11/40 and PDP-11/45 formerly housed in the College of Engineering in a laboratory operated by Professor Richard Marleau, with whom I arranged to acquire them during an era when University of Wisconsin Surplus had long since discovered that the systems had no resale value.
That said, quite a bit of my focus has been on IBM’s 1410 Data Processing System. My interest in this system, the follow-on to the IBM 1401 Data Processing System, began when I was hired as part-time student help, and later as a graduate Project Assistant, at the University of Wisconsin School of Business Data Processing Center, located in room B-5 of what was then known as the Commerce building. That system had originally been operated by the University of Wisconsin Registrar, and indeed still ran the scheduling program for incoming freshmen students each year while I was there. My interest in the 1410 led me to develop a modern simulator for the IBM 1410 – a full cycle-level simulation of that machine based upon documents IBM developed for training their field engineering staff of the day.
But what I would really like to do is to create a 1410 in actual hardware, and perhaps embed it inside an IBM 1415 console unit. Sufficient documentation exists, in the form of the IBM logic diagrams that accompanied another IBM 1410, on the “bitsavers” web site, at http://www.bitsavers.org/pdf/ibm/1410/drawings/ . I am also pursuing obtaining as much information as I can from IBM directly (for example, wire lists) if possible – but without success to date.
Learning a new technology – FPGAs
The technology that seemed (and still seems) most promising for accomplishing my goal is the Field Programmable Gate Array – a “chip” made of thousands of logic “cells” containing both flip flops and combinatorial logic (in the form of look-up tables, or LUTs as they are called) that can reproduce a logic design.
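To make the look-up table idea concrete, here is a minimal sketch of how a LUT can stand in for an arbitrary gate: the truth table is computed once, and the input bits then simply address it. The function names and the choice of four inputs are assumptions made for illustration only.

```python
from itertools import product


def make_lut(func, n_inputs=4):
    """Precompute the truth table of func, the way a LUT would store it."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]


def lut_read(table, *bits):
    """Return the stored output; the input bits form the address (first bit is MSB)."""
    address = 0
    for b in bits:
        address = (address << 1) | (b & 1)
    return table[address]


# A 4-input NAND gate expressed purely as table contents.
nand4 = make_lut(lambda a, b, c, d: not (a and b and c and d))
```

Here `lut_read(nand4, 1, 1, 1, 1)` returns 0 and every other input combination returns 1. Because any function of the inputs fits in the same table, the synthesizer is free to merge several original gates into one LUT.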
Thus a few years ago I requested as a Christmas gift, and obtained, a Digilent Nexys 2 FPGA design board. (This product has since been retired by Digilent.) This board, based on the Xilinx Spartan-3E technology, is the target for the ZAP computer FPGA.
Clearly jumping into replicating the IBM 1410 design would not be a good first step. Instead, I drew up a plan, which I have been executing since, that includes:
- Getting up to speed on current logic design (FPGA) technologies
- Replicating a much simpler design in an FPGA to test the constraints and practices one might use in replicating an older computer design: the ZAP computer.
- Creating a database of the IBM 1410 design, based on the aforementioned documents, which was completed in 2018.
- Finally, replicating the IBM 1410 design.
To accomplish the first step, I sought out textbooks and other books on current design practices. There was quite a bit of value in waiting as long as I did (until after I retired): FPGAs came into being in the meantime. But I still had a lot to learn. One of the places I looked was the local Half-Price book store in Madison, WI. There I found quite a few books on logic design. Coincidentally, several of them bore the mark of my former Professor – apparently having once been a part of his library. The book which I settled on was “Logic and Computer Design Fundamentals, Third Edition” by M. Morris Mano along with my former Professor, Charles R. Kime. This book was published in 2004, though of course more current editions exist.
I set out to go through this book much as a student might do. Of course, it was largely material that I had seen before in EE 554 in 1972. I was actually a bit surprised at how little had really changed. The biggest change, of course, is the expression of logic designs in formal languages, such as VHDL and Verilog, both covered in the text. Fortunately for me, Xilinx makes quite usable editions of their ISE Design and Vivado software available at no cost, and those tools, along with the Adept application made available by Digilent, were all I needed to transfer designs I created for problem sets and to simulate them to confirm they were correct. I was also aided by the fact that answers to the problem sets are largely available out there on the Internet.
Thus I pursued the first objective from October of 2013 until I completed it in June of 2014, with quite a few gaps in between for extended vacations, house remodeling and the like. I then focused on other things until January of 2015.
ZAP – in an FPGA
Given that I had quite a bit of documentation on the machine, a natural next step would be to replicate the design of the ZAP computer. Other alternatives might have included the 12-bit Digital Equipment PDP-8 computer family or the 16-bit Data General Nova computer family. The PDP-8/L, for which I have the engineering drawings, is not much more complicated than “ZAP”.
Thus I began to replicate the design of ZAP. I chose VHDL, but Verilog would certainly have been equally suitable. Since the intention was to replicate and document the original design as closely as possible, I chose to model the logic in a structural way. In the beginning, the basic structural elements were chosen to match those available in the original logic gates: NAND gates, Inverters, And-Or-Invert logic, and negative edge-triggered JK Flip Flops. I later added some D Flip Flops – more on that later.
The 12-bit core memory unit was modeled initially in a behavioral model, for simulation, and then using the SRAM mode of the cellular RAM available on the Nexys 2 board. Input/Output was accomplished using the RS-232 interface on the Digilent Nexys 2 board.
Each section of the design was modeled structurally, starting with memory, then arithmetic, control and finally I/O. A behavioral model test bench was also developed for each section, to test the logic under simulation. A spreadsheet was created (included in the zip file of the project as file “PanelUtilization.xlsx”) to document the interconnects for each rack, and logic module allocation within each rack.
Synopsis: Observations and Lessons Learned
These are covered in more detail under “Challenges” below, but the most significant things that I observed and learned from this exercise included:
- Given the nature of the original effort (a bunch of students who were not always working in close physical or time proximity to one another), it really was a significant achievement that the machine ran as well as it did, behaving flawlessly during the requisite demonstration even though it failed to do so at other times.
- Again, given the context of the effort, the documentation for ZAP, albeit not without its share of inaccuracies, was pretty darn good – good enough that I was able to reproduce the design in an FPGA environment and get it working.
- Latches (in the case of ZAP, RS latches) can be problematic, in part because they can behave unexpectedly when the synthesizer optimizes the look-up tables (LUTs) that implement the RS latch together with the gates that the RS latch drives. The mitigation is to add a D flip flop clocked much faster than the original system clock (the 50 MHz board clock, a 20 ns period, vs. the machine's 1 us clock) to isolate the RS latch from what follows it.
- Asynchronous control signals (used, for example, to load registers) can also be problematic, though not as often as the latches. The mitigation process is the same, to introduce a D Flip Flop clocked at a much higher speed to synchronize the signal.
- If the logic which generates an asynchronous control signal can glitch – can enter unexpected states as inputs change simultaneously – this can be problematic. Again the mitigation was the same: to introduce a D flip flop after the gate whose output glitched, so that the signal was only sampled synchronously.
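The effect of sampling a glitchy signal synchronously can be sketched in a few lines. This is only a timing illustration under assumed numbers: the waveform, the 6 ns glitch and the function names are hypothetical, and metastability is not modeled.

```python
def sample_on_clock(signal, period_ns, duration_ns):
    """Model a D flip flop: capture signal(t) at each rising clock edge."""
    return [signal(t) for t in range(0, duration_ns, period_ns)]


def glitchy_load(t_ns):
    """Hypothetical control signal: a 6 ns glitch starting at 23 ns,
    followed by the intended pulse from 100 ns to 400 ns."""
    return 1 if 23 <= t_ns < 29 or 100 <= t_ns < 400 else 0


# 50 MHz board clock: one sample every 20 ns.
samples = sample_on_clock(glitchy_load, period_ns=20, duration_ns=500)
```

The glitch falls between the clock edges at 20 ns and 40 ns, so it never appears in `samples`, while the intended pulse is seen on every edge from 100 ns through 380 ns. A glitch that happened to straddle a clock edge would still be captured, so this reduces exposure rather than removing it.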
Challenges, Simulation, Testing and more Challenges
The effort was not without its challenges. Documentation errors, sub-optimal design choices, unfamiliarity with the likely causes of certain kinds of issues when debugging (particularly when bringing a signal out to an FPGA IO pin so that I could observe it with a logic analyzer) all caused their share of confusion and frustration. All told, I probably spent around 80 hours on the effort. Below is a description of some of those challenges. While the description below tries to present things within a given unit in the order in which they were encountered, the testing and debugging in fact occurred in four distinct phases:
- Coding the VHDL for each individual unit, and testing that under simulation, using a simulated behaviorally modeled memory unit.
- Developing a console that I could use to load the MAR, examine memory, deposit into memory, load the PCR, start the machine, and, finally, to automagically load the ZAP memory with a bootstrap loader.
- Integrating the units one at a time with each other and the console: Memory first, then the CPU, then Arithmetic and finally I/O.
- Debugging of the entire machine and writing and running diagnostics.
An early challenge was that there was not uniform documentation in each section as to how the gates were allocated and identified, and how the units were interconnected. This mattered to me because I wanted to make sure that the machine I ended up with reflected the original constraints on the number of gates available. Some sections did not document gate allocation completely and some did not document it at all – in those cases I just made it up, to ensure that the design was realistic given the constraints imposed by the original rack units. Also, the individual sections did not use a uniform notation to identify each gate. In the end, I settled on the top left INPUT to identify each logic unit. Another issue identified early on was that not all active low signals were always identified that way. A lot of those have been corrected in the annotated documentation, but not necessarily all of them. Also, there were some signals that were just passed through from one unit to another in ways that were not completely clear. There was also at least one interconnect signal left off: CLRPCR (clear the PCR). It was probably a late addition, necessary for the JMP instruction. In addition, the signals from TC and TD on the CPU to RC and RD on the Arithmetic unit were reversed in the documentation. Finally, the IO Done was documented as arriving at the CPU by two different paths: one directly and another via Memory. Odds are the direct connection was originally used, and the second was really intended for a direct-to-memory load capability that was never implemented.
In a very few cases there were undocumented (unconnected) inputs, which had to be tied to logic high in order for things to work properly.
During the design, I also recognized an issue with the manual controls. Originally each rack had its own set of push buttons and switches, and I knew that I could never have enough of those to replicate the design as originally envisioned. During unit testing of the individual units, I came up with a structure wherein each of the four major units had its own set of virtualized rocker switches, and a shared set of pushbuttons. Along with this design decision came the “TAR” (totally arbitrary rule) that the default operation would be with all rocker switches off. This in turn required changing the sense of a few of the units’ use of the rocker switches. It also had some impact on memory, particularly the READ and WRITE RS latches – more on that later.
Later, I came up with a design where these virtual rocker switches and pushbuttons were driven by a virtual console unit, controlled by a keypad “Pmod” (peripheral module) attached to the Nexys2. A real pushbutton was retained for master clear.
As I worked on the memory unit, I noticed that the documentation on page 28 had some errors. Firstly, to generate the active low signal at time CT3, the inputs for that logic gate for Q2 and Q3 needed to be the complemented signals. On the same page, the signal to generate the active low TMR signal at CT9 requires that the input for Q1 and Q2 be the complemented signals. Also, page 29 refers to signals “CPU Read” and “CPU Write”, but by the time they are used after CT0, the CPU can have withdrawn its active low request signal (once START sends MEMDONE off). I guessed that perhaps the READ and WRITE RS latches were added a little later, and the diagram on page 29 was not updated accordingly.
There was also a question of what went on when the CPU was not cabled in. In such a case, the signal showing up on the output of the NAND gate receiver would be 0 (an unconnected input to the destination rack receiver which was an inverter). If that signal were used directly in the Memory Sequence Control on page 34, then under those conditions the inputs to the NAND gates preceding the R/S latches for C/W and R/R would always be 0 – disabling them. If, however, those signals go only to the R/S latches on page 30, then the READ and WRITE signals would always be ‘1’, allowing the manual switch to take over. This also implies that during normal operation those two switches (rocker switches 8 and 9) would be left “ON”. As my convention was to have rocker switches off in normal operation, I had to invert those two signals.
Another set of challenges stemmed from the various RS latches, particularly those in the memory unit. There were two kinds of challenges associated with these latches. First of all, because they are asynchronous with respect to any of the ZAP system clocks, they could create glitches. Secondly, there were issues caused by the Xilinx ISE XST synthesizer optimizing the lookup tables (LUTs) associated with the gates in the RS latches and those inputs the latches fed, and so on.
The first glitch from an RS latch was associated with the RUN RS latch on page 29, figure 3.4. This RS latch is reset by an active low signal DONE’. Not only does this signal reset the RS latch, but it was used to set the DONE latch on page 30, figure 3.5. However, DONE’ is also dependent upon RUN. (Note: There is an error on page 29, figure 3.4, which shows the input to the gate generating DONE’ as the complement of RUN, when in fact it has to be RUN – active high.) The result was that the DONE’ pulse did not last long enough to reliably set the DONE latch: as soon as DONE’ reset RUN, RUN in turn removed DONE’. While the design worked under simulation, the issue was visible there as well; I simply did not notice it until it came up during integration with the CPU. Because I had not yet run into the situation with RS latches and optimization, I endeavored to fix this particular problem using structures similar to those in the original design. Thus I created an additional RS latch, RunLatchedUntilCT0, added to page 30, figure 3.5. This solved that particular problem.
Next I ran into issues with the RS latches that generate the READ and WRITE signals, also on page 30, figure 3.5. These were also reset by the active low signal DONE’ – and were thus now fed by RunLatchedUntilCT0. To make that work properly, I also gated this signal with the MEM_Start signal to make sure that they stayed set through the entire memory cycle. In addition, these were troublesome in part because of the earlier design rule regarding rocker switches. I worked around that issue by adding some combinatorial logic that gated the manual signals using the manual memory cycle initiate signal (see page 30, figure 3.5).
In a similar vein, there were issues with the CPU initiated memory cycles, in part because of the coexistence with the manual controls. I found this easiest to resolve by removing the RS latch at the top of page 30, figure 3.5, and replacing it with a signal called “CPU Request” which is the XOR of the CPU read and CPU write signals coming from the control unit. Until that change was made, there were issues with the active low version of MEM_Start forcing MemoryDone to stay high. I suspect something similar was done in the original implementation.
In part due to the aforementioned changes to the READ and WRITE RS latches in the memory unit, I found that the active low signal TMR used to load the Memory Data Register (MDR) did not work properly. To fix that I created an RS latch (again, trying to keep things consistent with how the memory unit was originally designed) to hold the READ signal until time CT11, because the R/R signal was reset at time CT3. Thus the signal Memory_Read_Cycle and associated RS latch, on page 29, figure 3.4 was created. This signal was then also used to gate the memory data into the MDR (see page 32, figure 3.7).
Note that the bus gating in and out of the memory unit was not really documented. I have added page 30.1 to document the bus gating of the memory unit.
(Incidentally, there was an opportunity here for the original machine to operate much faster. On a read, data was actually available 4us after the read was initiated, with the rest of the memory cycle time being taken up with the restore cycle writing the information back into core. On a write, data only needed to be held for 3us after the write to core was initiated. Thus we could have generated Memory Done for the CPU at CT7, knowing that the memory unit would be well past CT11 long before the CPU could change anything).
Another documentation error is on page 31, figure 3.6. The inputs to increment the PCR are active low, not active high. What this means is that the INC’ signal actually causes the PCR to increment when the input signal goes back high (and thus INC’ goes back low). So, the PCR is actually incremented on the trailing edge of the active low signal “CPU INC” as shown on page 31.
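The trailing-edge behavior can be sketched as follows. This is a minimal model that assumes the counter is built from negative-edge-triggered flip flops clocked by an inverted copy of the active low request; the signal names, the sample waveform and the 12-bit width are illustrative assumptions, not taken from figure 3.6.

```python
def count_falling_edges(clock_levels, start=0, width=12):
    """Model a counter built from negative-edge-triggered flip flops:
    it advances each time its clock input goes from 1 to 0."""
    count = start
    previous = clock_levels[0]
    for level in clock_levels[1:]:
        if previous == 1 and level == 0:
            count = (count + 1) % (1 << width)
        previous = level
    return count


# Hypothetical waveform of the active low request: idle high, two low pulses.
cpu_inc_in = [1, 1, 0, 0, 1, 1, 0, 1, 1]
# Inverted copy that actually clocks the counter.
inc_clk = [1 - level for level in cpu_inc_in]
```

With this waveform, `count_falling_edges(inc_clk[:4])` is still 0 while the request is held low, and the count only becomes 1 once the request returns high at the fifth sample.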
With these changes, the memory unit ran correctly under simulation and with the keypad-driven console – though occasionally it would behave oddly. As I investigated those occasional problems, I began to notice situations where, if I brought signals out to the I/O pins on the FPGA so that I could see them with a logic analyzer, whatever problem I was researching would vanish – only to reappear if I removed the I/O pin binding assignment.
The CPU documentation was largely complete and generally quite accurate. There were some errors in the documentation of the Finite State Machines (FSM) in the CPU – which became pretty obvious as they conflicted with the actual logic diagrams. There are notes on these errors in the annotated ZAP manual I created. One thing I noticed was that the “K” inputs to the instruction register were not documented. The most obvious thing was to treat them as having been low – although because of the nature of JK flip flops, a high input would have worked just as well. Another thing I noticed was that in manual clock the sync signal from the 3us clock flip flop 2 would not have been available – thus there would have been no way to sync IO Done or Memory Done if the manual clock was in use. In addition, it was not always clear whether the complement output of a given state flip flop was actually used. Also, the diagrams on page 6 did not show DC resets for these synchronization flip flops, which were required for simulation to work properly (otherwise they ended up as permanently undefined). Another documentation glitch has to do with the generation of the INPUT signal, on page 18 – the three inputs to that NAND gate at T54 would necessarily be the active low, complemented signals to those originally shown.
Perhaps the biggest omission was that the signal to get the BDR to the BUS is not shown in the CPU diagrams. It took a few tries to get that right as I went through testing. The gates I used are in the VHDL model at F33 and T21, and are shown in the annotated documentation on page 19.1.
I noted that in a JMP/JMS instruction, the CPU goes through a memory fetch for the JMP/JMS destination – but does not actually use the address. Perhaps this was a leftover vestige of an idea to support indirect addressing. I also noticed that there is quite a lot of redundant logic between entering state IF1 and (not) entering Halt. One could have used a simpler arrangement where all of those signals fed a single set of gates and then used an XOR with SIE to feed the logic into HALT and IF1 – hindsight being closer to 20/20 than being in the heat of the moment.
Unfortunately the operand fetch cycle input on page 16 was not fully documented. An AND gate is shown, but the gate inputs are for a NAND gate. Because this flip flop was necessarily a Type 1 flip flop, an inverter was required to generate the proper input. Perhaps earlier in the development it was a Type 2 flip flop with 2 inputs into AND logic, but then that Type 2 was needed elsewhere.
It was about this time that I noticed the first case of an incorrectly documented instruction, SHR on page 66. These are corrected in the annotated documentation.
I also noticed that state IN2 triggers the counter sequencer on page 17. At first this seemed odd, but after some thought, I realized that this would trigger the Bus Data Register to hold the result of the Input instruction, as well as the results ending up in the accumulator. This is useful because the output instruction transmits the character present in the BDR.
At this point I found I could reliably run a program that jumped to itself starting at location 0. (As it turned out later, this only worked at location 0 – more on that later).
It was time to bring arithmetic into the fold. The arithmetic unit documentation was generally in the form of template cells, and, as with the memory unit, the bus gating was not documented, but was easily divined. More importantly, the signals to load the Bus Data Register (BDR) as well as the adder conditions (zero, negative, overflow) and to gate the BDR onto the bus were not clear, but as it turned out, these were both just the signals coming from the CPU, used as is. The logic to generate Negative, Overflow and Zero were also not documented, but were easily determined. This added documentation appears on page 42.1.
As with the other units, there were some documentation errors in the Arithmetic unit, as well. The statement on page 36 regarding a Zero condition is incorrect – zero results only from 00..0, because ZAP was a two’s complement computer. On page 37, the documentation on adder inputs is a little confusing. Where the cell indicates “AC or IX” this is in fact the adder left input (which can actually be the Accumulator, Index Register, or neither). Where the cell indicates “1’s C of D” it is referring to the output of the 1’s complement logic. Also, the diagram on page 37 has what are actually a pair of XOR gates incorrectly depicted. The diagram shows the gates at 67M and 67R as part of the same XOR gate – but they are not (see page 10 of the EE554 lab manual). The input labels are fine, and the wiring is fine – it is just that the logic drawing does not match the actual lab circuitry. (I am not sure why the group decided to “expand” the XOR gate into NAND gates – they also did this for some JK flip flops, for reasons unknown.) On page 38, the 1’s C control signal is in fact the two’s complement control signal. An actual one’s complement is accomplished by inhibiting the addition of 1 onto the result by a set of combinatorial gates. On page 39, there is a note “AOI USED”, however the output actually used is the AO output. There is an error on page 41 as well: the right input gating shows the input A(i+1) as the input to the cell “i”. However, because ZAP bits are actually labeled from 1 to 12 where 1 is the most significant bit, that should actually be A(i-1). On the most significant bit, that input would always be 0. There were also errors on page 42 – more on those in a bit.
The I/O unit, the one I had designed during the project, had its share of interesting things in the documentation and design, as well. The first thing I noticed was that it used sets of NAND gates rather than JK flip flops to synchronize the incoming INPUT and OUTPUT requests from the CPU to the local unit clock, and to latch those signals based on state “I”. In addition, the design, rather than using the previous state ANDed with a signal to set or reset each flip flop in the FSM, used the JK flip flop in toggle mode, where setting both “J” and “K” would cause the flip flop to enter the opposite of the state it was currently in – a design much more prone to error. I find myself wondering if that was because my earliest experience with flip flops was with JK flip flops in this mode (going as far back as High School), rather than using them as D flip flops, as was done in the CPU.
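Several of these points (zero being the all-zeros word only, the one's complement being the two's complement without the added 1, and bit 1 being the most significant) can be checked with a small 12-bit model. The helper names are hypothetical, and reading the page 41 input gating as a logical shift right is an assumption.

```python
WIDTH = 12
MASK = (1 << WIDTH) - 1  # 0o7777


def ones_complement(word):
    """Invert all 12 bits."""
    return ~word & MASK


def twos_complement(word):
    """One's complement plus 1; inhibiting the +1 leaves the one's complement."""
    return (ones_complement(word) + 1) & MASK


def is_zero(word):
    """In two's complement only the all-zeros word is zero."""
    return (word & MASK) == 0


def bit(word, i):
    """Return bit i using ZAP numbering: 1 is most significant, 12 least."""
    return (word >> (WIDTH - i)) & 1


def shift_right(word):
    """Cell i takes A(i-1); the most significant cell (i = 1) takes 0."""
    result = 0
    for i in range(1, WIDTH + 1):
        incoming = bit(word, i - 1) if i > 1 else 0
        result |= incoming << (WIDTH - i)
    return result
```

Here `twos_complement(0)` is 0, so negating zero gives zero again, whereas `ones_complement(0)` is 0o7777, a pattern that a one's complement machine would also have to treat as zero. With the 1-to-12 numbering, `shift_right` agrees with an ordinary right shift by one place.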
As with the other units, there are documentation errors and/or ambiguities. One is that the overview diagram for I/O on page 44 might make it seem that the IO register is reset on Master Clear. That is not the case, however. As one can see from page 51, it resets during State “I”. Also, the diagram on page 46 does not make it clear that the State “I” flip flop must be set on Master Clear.
One change I made as a result of testing concerned the signal “INPUT+OUTPUT”, which toggles the State “I” flip flop from on to off in order to exit state “I”. I replaced it with a new signal, “IN + OUT”. The problem I found was that otherwise it might turn ON that flip flop when the FSM was not already in state “I”. I expect this change also occurred in the original implementation, but was not documented. Until I made that change, I had two issues: input resulting in a zero accumulator for every other character, and input hanging.
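Why toggle mode is more error-prone can be seen from a small model of a JK flip flop. This is only a sketch of the general hazard, not the change actually made in the I/O unit (which replaced the driving signal); the function names are hypothetical.

```python
def jk_next(q, j, k):
    """Next state of a JK flip flop after a clock edge."""
    if j and k:
        return 1 - q  # toggle
    if j:
        return 1      # set
    if k:
        return 0      # reset
    return q          # hold


def toggle_style(q, condition):
    """Toggle mode: J and K are both driven by the same condition."""
    return jk_next(q, condition, condition)


def reset_style(q, leave):
    """Explicit reset: only K is driven, so the flip flop can never turn on."""
    return jk_next(q, 0, leave)


def d_style(q, d):
    """D flip flop usage: J = D and K = not D."""
    return jk_next(q, d, 1 - d)
```

If the exit condition arrives while the flip flop is already off, `toggle_style(0, 1)` turns it on, whereas `reset_style(0, 1)` leaves it off. Used as a D flip flop, the next state depends only on the input and not on the current state.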
One other possible error I noticed in the original design was that the FSM did not wait for IDLE to be true before proceeding from State II to State III. I left the design as documented, however, because the presence of a 1 character buffer in the RS-232 interface component made the issue moot.
Once the arithmetic unit passed simple, manually entered test instructions, I then turned my attention to integration of I/O, and writing an interface between the I/O unit and the RS-232 “reference” UART for the Nexys2 available from the Digilent web site. That went reasonably smoothly, except that the reference UART implementation uses 9 bits instead of 8. Since the terminal emulator I was using to connect to the Nexys2 only used 8 bits, this required modifying the reference UART for 8 bits. However, as this only caused problems on characters sent TO ZAP (ZAP input), I only fixed that side of the UART. The output side still sends the extra bit. The transmitter side fix is left as an exercise for the reader. 😉
Having gotten I/O integration working to the point of a simple “echo” program, I next set to writing an assembler and bootstrap loader. Since the original loader was not found in my documentation, I wrote a new one – to slightly different specifications to match the assembler I developed in Perl. The assembler largely follows the original rules and syntax, but is more forgiving in format, and does not require fixed columns. Also, the output is slightly different. See ZapAssembler.pl to see the differences. One such difference is that this assembler only uses a single character for the octal 7 and octal 15 address and end flags, setting the high bit for these control characters – in the knowledge that no data word would have them set. In part this was because the flag words used in the original format, octal 7 and octal 15, could themselves appear as addresses in the actual data. (Perhaps originally all these also had the high bit set in the output?).
The original documentation was unclear as to what order the two 6 bit characters used to create a word would appear in the assembler binary output. I chose to send the most significant 6 bits first. Also, the bit numbering on page 71 is incorrect. It indicates that “the 6 most significant bits of the data being bits 13-8 of the word, and the 6 least significant bits being bits 5-0 of the word”. To be consistent with the other documentation, this should be 1-6 and 7-12, respectively.
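The chosen ordering can be sketched as a pair of helpers that split a 12-bit word into two 6-bit characters and rebuild it. The function names are hypothetical, and the control-character flags described earlier are deliberately left out because their exact encoding is defined in ZapAssembler.pl rather than here.

```python
def word_to_chars(word):
    """Split a 12-bit word into two 6-bit characters, most significant first."""
    if not 0 <= word <= 0o7777:
        raise ValueError("word must fit in 12 bits")
    return (word >> 6) & 0o77, word & 0o77


def chars_to_word(first, second):
    """Rebuild the word: the first character is bits 1-6, the second bits 7-12."""
    return ((first & 0o77) << 6) | (second & 0o77)
```

For example, `word_to_chars(0o1234)` gives `(0o12, 0o34)`, and passing those two characters to `chars_to_word` in the same order restores the original word.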
When I began testing the bootstrap loader program to load simple sequences into memory, I ran into a problem. Some of the gating / masking operations in arithmetic did not function properly. I traced the problem to errors on page 42, figure 5, in the logic table and in the gate connections derived from it. Typically these involved not gating one of the inputs to the adder correctly – 3 lines, as documented, gated nothing into the adder for three combinations. These corrections have been shown on the annotated documentation on page 42.
With those changes, the bootstrap loader could sometimes load short sequences correctly, particularly if I ran it in Single Instruction mode, but never a very long one. I began to “chase my tail”: I would bring some signal out to an I/O pin of the FPGA to trace it, only to have the symptom disappear and be replaced by another one. I also had situations where an Input / Output / JMP loop would sometimes JMP to an address consisting of the previously input character. More on that below.
About this time, after some correspondence on the Xilinx forums, I came up with an idea. The folks on the forums were universally negative, sometimes annoyingly so, about what I was trying to do. Time and again they encouraged me to change the design to be entirely synchronous. I would patiently explain that this was contrary to my goal of retaining as much of the original logic as possible (a goal that will matter even more later, when I try to implement an IBM 1410 in an FPGA). Others suggested that an FPGA would never work for what I was trying to do. One correspondent, however, had an interesting suggestion: place a D “flop” after each and every combinatorial gate. I thought that an extreme solution at first, but it later triggered a more conservative idea, which I eventually put in place: place a D flip flop, clocked by the 50 MHz FPGA board clock, after each RS latch, and use that flip flop to generate both the active high and the active low signals. In so doing, we essentially convert the design to be synchronous with respect to the 50 MHz FPGA board clock, and also isolate the outputs of the RS latches from use in the FPGA look-up tables (LUTs) of the gates that follow. These flip flop additions are shown in the annotated documentation I created, with a vertical bar appearing after each RS latch. While I started doing this one latch at a time, essentially all of them were eventually treated this way, and I would expect to do that as a matter of general practice in the future.
At about this time I discovered the issue in Arithmetic described earlier (page 42, figure 5) and fixed it, in order to get the OMA (Or memory with accumulator) instruction to operate correctly. I also quickly tired of entering instructions manually, so I added the ability to transfer the bootstrap loader code into memory via a single console command. The implementation sees the bootstrap code as existing in a separately addressed, behaviorally modeled ROM.
I mentioned earlier that an Input/Output echo program really only worked at location 0, and that sometimes it seemed that the JMP would jump to the address consisting of the previously input character. This turned out to be a timing issue with the signal “Gate from Bus” in the memory unit. Sometimes it was occurring too late. I then remembered that I had previously noticed that the documentation of this signal on page 27, figure 3.2, was contradictory: the logic diagram shows this signal as Q2 and NOT Q1, whereas the timing diagram shows it as Q1 and Q2. I had implemented the latter, which turned out to be incorrect, and changing that back to be consistent with the logic diagram fixed the problems with JMP.
Around this time I realized that the ZAP bus at times had no unit at all whose signal was driving it – it was essentially connected in a combinatorial loop. This probably would not cause any problems (though in the real machine it is possible that the bus oscillated when in this state), but just to be sure I put in a set of D flip flops after the memory unit’s bus output gating.
At this point the loader kind of ran, but typically not for very long unless I brought test signals out to the I/O pins of the FPGA. I discovered that the BDR in the arithmetic unit would have the incorrect contents before being loaded into the MAR, even when the instruction was not indexed. The BDR contents would start out correctly, but would go “bad” when the signal to load the BDR went from 0 to 1, which was surprising because the flip flops, even in their behavioral model, are negative edge triggered. This made me wonder if maybe that signal “glitched”. That turned out to be exactly the case. The CPU used a counter to time when to trigger loading of the BDR, at gate D54, page 18, using the C1 and Not C2 (10) time to trigger the load. However, after that, the counter keeps on counting to 11 until it is reset at AF4. The hypothesis was that during the reset, it would be possible for C2 to reset before C1, and thus “glitch” the active low signal “TR BDR” from the CPU. This was, in essence, another piece of asynchronous logic, and the approach to fixing it was the same as that used to fix the RS latches: I added a D flip flop in the control unit, clocked by the 50 MHz FPGA board clock, to synchronize the signal so that the aforementioned glitch would occur between clock cycles.
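The glitch hypothesis can be illustrated with a small Python sketch. It is a toy model, not the ZAP logic itself: the function names are hypothetical, the decode is modeled as a simple "asserted" boolean rather than as the active low “TR BDR” line, and the reset is assumed to pass through exactly one intermediate state depending on which counter bit clears first.

```python
# Toy model of the counter-reset glitch; all names here are hypothetical.

def tr_bdr_decode(c1, c2):
    """Decode of counter state 10: asserted when C1 is set and C2 is clear."""
    return bool(c1 and not c2)


def reset_path(first_to_clear):
    """States (C1, C2) seen while an asynchronous reset takes 11 to 00."""
    if first_to_clear == "C2":
        return [(1, 1), (1, 0), (0, 0)]   # passes through 10: decode fires
    if first_to_clear == "C1":
        return [(1, 1), (0, 1), (0, 0)]   # passes through 01: decode stays off
    raise ValueError("first_to_clear must be 'C1' or 'C2'")


def raw_decode(path):
    """The unsynchronized decode at every instant along the path."""
    return [tr_bdr_decode(c1, c2) for c1, c2 in path]


def sampled_decode(path, sample_indices):
    """What a clocked D flip flop sees if it samples only at the given instants."""
    return [tr_bdr_decode(*path[i]) for i in sample_indices]
```

In this model `raw_decode(reset_path("C2"))` contains a momentary assertion, while sampling only the settled states before and after the reset, as a flip flop clocked between transients would, never sees it.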
After that fix, I proceeded to remove the debugging signals one at a time, and things worked until I removed the signal that gates the BDR to the bus. The fix for that was the same, but was added in the Arithmetic unit rather than in the CPU.
Once I did these things, the machine became quite stable. A set of diagnostics was written, following along the same general lines as the DEC PDP-8/I diagnostics, trying as best I could not to rely on anything that had not already been tested earlier in the sequence as they were written. In the process I uncovered a few misdocumented instructions in the instruction set portion of the documentation, and a few errors in the assembler I wrote, as well.
Once the diagnostics were completed, I also did some “margin” testing by speeding up the clock. The machine ran just fine up to a 60 ns clock with 40 ns on and 20 ns off. Remembering that ZAP generally uses negative clock triggering throughout, this left enough time for gate delays plus the additional 20 ns needed for signals to propagate through the D flip flops added on the RS latches and on the two control signals before the next clock cycle.
The development environment used for the Digilent Nexys 2 FPGA development board was the Xilinx ISE integrated development environment, version 14.6. The Digilent Adept application, version 2.4.2, was used to program the FPGA control memory. The installation of these is beyond the scope of this document.
Because this FPGA version does not have the original racks and switches, operation is slightly different. All references here are to the development environment I used, a Digilent Nexys 2 FPGA development board.
- The Console Keyboard (PMod) is attached to connector PA1 on the Nexys 2. The functions of the console keyboard are:
  - 0-7 – These load digits into a virtual pushbutton register (called the “keypad register”). The register is shifted left before adding in the keyed digit. It also causes the 7 segment display to display the keypad register.
  - 8 – This serves as “START”.
  - 9 – This key loads the bootstrap loader (found in the VHDL module ConsoleROM.vhd) into memory starting at location octal 7700.
  - A – This loads the virtual address register (MAR) from the keypad register (“load Address”). (Note: In practice there is a virtual MAR, because the MAR cannot increment.)
  - B – This loads the PCR from the keypad register (“Branch”).
  - C – This sets the keypad register to 0000 (“Clear”), and causes the keypad register to be displayed on the 7 segment display.
  - D – This loads the MAR from the virtual address register, loads the contents of the keypad register into the MDR, initiates a memory write cycle, and then increments the virtual MAR (“Deposit/Next”).
  - E – This loads the MAR from the virtual address register, and initiates the memory read cycle. The memory word retrieved is displayed on the 7 segment display (the MDR is selected to display on the 7 segment display).
  - F – This selects which of several registers will be displayed on the 7 segment display:
    - AA: The Virtual Address Register (AAddress)
    - FC: The PCR (program counter) (PC)
    - AD: The MAR (ADdress)
    - DA: The MDR (DAta)
    - AC: The Accumulator (ACcumulator)
    - 1A: The index register (IndeX)
    - 10: The I/O register (IO)
    - 00: The keypad register
- BTN0: Master Clear (the rightmost button).
- SW0: The rightmost switch. It controls single instruction execution. Normally it is Off.
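The keypad behavior listed above can be modeled in a few lines of Python. This is a hedged sketch, not a translation of the VHDL: the class and method names are hypothetical, and it assumes that each keyed digit shifts the keypad register left by one octal digit (3 bits), that the registers are 12 bits wide, that the virtual MAR wraps at 12 bits, and that unwritten memory reads as 0.

```python
# Hypothetical model of the console keypad; widths and shift amount are assumptions.
MASK12 = 0o7777   # assumed 12-bit registers


class Console:
    def __init__(self):
        self.keypad = 0        # the "keypad register"
        self.virtual_mar = 0   # the virtual address register
        self.pcr = 0           # program counter
        self.memory = {}       # address -> word; unwritten words read as 0 (assumed)

    def digit(self, d):
        """Keys 0-7: shift the keypad register left, then add in the digit."""
        if not 0 <= d <= 7:
            raise ValueError("only octal digits 0-7 are accepted")
        self.keypad = ((self.keypad << 3) | d) & MASK12

    def clear(self):
        """Key C: set the keypad register to 0000."""
        self.keypad = 0

    def load_address(self):
        """Key A: load the virtual address register from the keypad register."""
        self.virtual_mar = self.keypad

    def branch(self):
        """Key B: load the PCR from the keypad register."""
        self.pcr = self.keypad

    def deposit(self):
        """Key D: write the keypad register to memory, then advance the address."""
        self.memory[self.virtual_mar] = self.keypad
        self.virtual_mar = (self.virtual_mar + 1) & MASK12

    def examine(self):
        """Key E: read the word at the virtual address register."""
        return self.memory.get(self.virtual_mar, 0)
```

Because of the shift-and-mask in `digit`, keying a fifth digit pushes the oldest digit out of the register, which is why the Clear key is used before entering a fresh value.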
A typical sequence, once the FPGA has been loaded, would be:
- Press BTN0 – Master Clear
- Switch SW0 Off – SIE (Single Instruction Execution) off
- 9 – load the boot loader
- C – Clear the keypad register
- 7700 – the starting address of the bootstrap loader, into the keypad register
- B – Load the 7700 from the keypad register into the PCR
- 8 – Start
- Use a terminal emulator connected to the development board serial port to send the desired binary output from the assembler to the development board, using eight bits and no parity. The bootstrap loader should halt at location 7707 (7710 in the PCR).
- Press BTN0 – Master Clear
- C – Clear the keypad register
- #### – enter the start address of the program into the keypad register
- B – load the starting address from the keypad register into the PCR
- 8 – Start