So now that I have a fancy programmable computer working with its guts bare to the world, what can I do with it?

Well, not much.

The 8 bits of information the computer can shift around necessarily has to be split into 4 bits for the memory address location and 4 bits for the program counter step. This means that there are only 2^{4} = 16 memory address locations for us to store all the commands and variables for our program – not much to work with at all! While all computers are only *notionally* Turing Complete since *formal* Turing Completeness requires infinite memory, rarely do you actually bump up against such restrictive limits. Even just doubling this to a 16-bit computer would let you write far more ~~wasteful~~ complex programs that work with far bigger numbers.

However even with the 8-bit constraint, it is possible with a little creativity to fit programs that are both interesting and useful in 16 lines of memory! I have copied over Ben’s multiplication program, and have also created two original programs: one which generates the Fibonacci Sequence, and one which checks if a number is prime (and spits out its largest factor if it isn’t).

And if you want to go even further, I thought of some ways that you could extend the instruction set to create programs that are effectively 2-3 times as long as is possible with Ben’s original list of instructions. If someone can think up a way to use this information to create a program that is interesting enough to elicit a “wow!” reaction, shoot me a comment or email and I’ll make a video showing it in action on my computer! I’ve listed a few challenges at the end of the post.

## Programs

**Multiplication **

This takes two numbers x and y, multiplies them together, and displays the result. This comes straight from Ben’s last video in the how-to series.

0: LDA 14 1: SUB 12 2: JC 6 3: LDA 13 4: OUT 5: HLT 6: STA 14 7: LDA 13 8: ADD 15 9: STA 13 10: JMP 0 11: - 12: 1 13: product 14: x 15: y

**Fibonacci Sequence**

Calculates and displays the Fibonacci numbers less than 255: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233].

Ben Eater implemented a Fibonacci sequence program in this video, but it had a bug that stopped 233 from being displayed. This is my re-work of a Fibonacci program, which fixes that bug and also introduces a feature to halt the program at the largest number, with re-running being as simple as pressing the reset switch on the computer. Alternatively, you can easily modify this program to reset automatically and print the sequence in a loop, by changing the “Jump on Carry” instruction on line 11 to point at the beginning of the program. I should note that this relies on knowing beforehand how many numbers in the sequence it will fit in 8 bits – if you didn’t know when the program would reach its max then you would want another JC 13 command in between lines 6 and 7 (but that doesn’t fit!).

0: LDI 1 1: STA 14 2: LDI 0 3: STA 15 4: OUT 5: LDA 14 6: ADD 15 7: STA 14 8: OUT 9: LDA 15 10: ADD 14 11: JC 13 # to loop the program, replace with JC 0 12: JMP 3 13: HLT # to loop the program, this line is not necessary 14: x 15: y

**Largest Divisor (Primality Test)**

This takes a number and calculates and displays the largest number that evenly divides it. If the output is 1, that means the original number was prime! The number x you are investigating has to be entered in both addresses 14 and 15 for the first run, and re-entered in 14 for each repetition. Note the new JNC command, which I’ll explain further on.

I’m pretty proud of coming up with this one – it may be the most interesting program that can physically fit on the computer without modifications. Any hey, it means that you could use this computer to help break really, really basic cryptography!

0: LDA 15 1: OUT 2: LDA 14 3: SUB 13 4: STA 14 5: LDA 15 6: SUB 14 7: JZ 10 8: JNC 2 9: JMP 6 10: LDA 14 11: OUT 12: HLT 13: 1 14: x # must enter each run 15: x # must enter first run

**Smallest Divisor (Primality Test)**

Similar to the above, but starts at 2 and works up. Should be faster than finding the largest divisor in all instances! Halts on the original number if the original number is prime.

0: LDA 15 1: OUT 2: LDA 14 3: ADD 13 4: STA 14 5: LDA 15 6: SUB 14 7: JZ 10 8: JNC 2 9: JMP 6 10: LDA 14 11: OUT 12: HLT 13: 1 14: 1 # must enter each run 15: x # must enter first run

## Commands

The computer as designed by Ben only uses 11 of the 16 slots available for commands, which leaves a lot of room to expand the base instruction set. Here is a list of commands that I think could be useful and would work directly with no hardware changes:

- Increment (INC): Add a specified 4-bit number directly to what’s in the A register. Saves from having to use a memory address to store a 1 (or other number) to count up and down by.
- Decrement (DEC): Similar to INC, subtract a specified 4-bit number directly from what’s in the A register.
- Load B / Store B (LDB / STB): Duplicates the load and store functions for the A register onto the B register.
- Display (DSP): Outputs the value at a specified memory address to the screen.
- Display Immediate (DSI): Outputs a specified 4-bit number to the screen. For example, if your program results in a true/false statement, you could spit out a 1 or 0.
- Jump if Not Carry / Jump if Not Zero (JNC / JNZ): Jump if the condition is
*not*met. Thanks to Micheal Burke with his awesome CPU8 app for this one!

For example, with all these commands you could rewrite the Smallest Divisor program with four fewer lines, which would free you up for things like prepopulating address 14:

0: DSP 15 -: LDI 1 # to prepopulate addr. 14 -: STA 14 # to prepopulate addr. 14 1: LDA 14 2: INC 1 3: STA 14 4: LDA 15 5: SUB 14 6: JZ 9 7: JNC 1 8: JMP 5 9: DSP 14 / HLT 10: - 11: - 12: - 13: - 14: 1 # must enter each run if not prepopulated 15: x

The real room for extensibility however comes from using all 8 of the microcode steps, rather than truncating at 5 to save execution time. Since the “meat” of each command takes only 1-3 steps (after the necessary 2 steps of loading the command instructions from memory), it’s possible to join together the meat of any 2 or 3 of the commands into one line. A useful one for example would be a Jump if Equal (JEQ), which would chain together a subtraction with a Jump if Zero. Or as a simple example, the HLT command can be tacked on for “free” at the last cycle of your last command.

And then if you really want to dive off the deep end, you could tailor-make your commands for a specific program, and fit in 2-3x the number of instructions you could otherwise. Every single address of program memory could have it’s own custom command – essentially you would be coding directly in microinstructions. I’d be really interested to see how complex a program someone could make with this approach!

One last room for optimization: if you know that you are going to need a constant value somewhere in your program, you can make one of the commands without an argument be that value specifically. For example, I need to decrement by 1 in the largest divisor program. I could make the HLT command be command 0000, and when I call it on line 12, call it as HLT 1. This would have unchanged behavior when that line is run, but it would also serve as the value of 1 (0000 0001) on line 13, and line 13 could then be used for something else.

## Challenges

Challenges to the reader which may or may not be possible, and will very likely require modifying the basic instruction set. If you figure out how to do any of these in 16 lines of memory, let me know and I’ll make a video of it running on my computer!

- Compute and display powers: a
^{b}= c- 3
^{5}= 243

- 3
- Sort 3 (or more) numbers in memory from smallest to largest: sort(c, a, b) = [a, b, c]
- sort(3, 2, 1) = 1, 2, 3

- Display the number of 1’s in the binary representation of an input number: countbits(a) = b
- countbits(15) = countbits(00001111) = 4
**Done 2019-07-19!**Thanks to collaboration with Tyler Richard and Micheal Burke. See the comments to this post.

- Compute the dot product of two vectors: dot(a, b, c, d) = e
- dot(1, 2, 3, 4) = [1, 2]
**·**[3, 4] = 1*3 + 2*4 = 11

- dot(1, 2, 3, 4) = [1, 2]
- Add together two 16-bit numbers: add16(a, b, c, d) = [e, f]
- add16(7, 208, 11, 184) = [19, 136] → (7*2
^{8}+ 208) + (11*2^{8}+ 184) = 00000111 11010000 + 00001011 10111000 = 2000 + 3000 = 5000 = 00010011 10001000 = (19*2^{8}+ 136)

- add16(7, 208, 11, 184) = [19, 136] → (7*2
- Something else!

A really simple way to do the count bits function would to be add the number to itself and, if the carry bit is set, add one to the answer.

0000: LDA 1111

0001: ADD 1111

0010: StoreA 1111

0011: Jump Carry 1000

0100: LDA 1101

0101: Sub 1

0110: Jump Zero 1100

0111: Jump 0000

1000: LDA 1110

1001: Add 1

1010: StoreA 1110

1011: Jump 0

1100: HLT

1101: Var Count (Starts at 8)

1110: Var Ans (Starts a 0)

1111: Var A

Awesome, I think that would work! You would need to use the INC/DEC commands I propose rather than ADD/SUB since you don’t have room to store 1, but I see what you mean and this flows really nicely when I work it out on paper. Will try it out on the machine when I get back home in a week. It would be great to cut off one line in order to display the final count with a DSP 1110 command after line 1011, but I can’t figure out a place to trim any lines. Perhaps if you brought in a new command that combined DSP and HLT.

Edit: Actually, I think you still need a STA 1101 after line 0101, otherwise your var count doesn’t get updated. Could probably get there by combining more commands.

I find when you have a JZ followed by a JMP, you can save one instruction using the JNZ instead. I think in this case it would look like:

0110: Jump Not Zero 0000

0111 HLT

That would free address 1100 for something else.

Obviously, a DSP with address or HLT with display could also free up some address space.

I was thinking the DSP instruction could display the contents at the address but add a new option for the HLT instruction to display the actual operand (immediate) value like a return code.

Yep, looks like a JNZ will save a line and do it! I haven’t tested this on the machine yet, but here’s a program that should count and display the number of 1’s in the binary representation of a number.

0000: LDA 1111

0001: ADD 1111

0010: STA 1111

0011: JC 1001

0100: LDA 1101

0101: DEC 1

0110: STA 1101

0111: JNZ 0000

1000: DSP 1110 / HLT

1001: LDA 1110

1010: INC 1

1011: STA 1110

1100: JMP 0000

1101: Count # initialize to 8 (0000 1000)

1110: Ans # initialize to 0 (0000 0000)

1111: X # must enter each run

Just tried this program out on the computer and it works like a charm! See the end of the post here for a new video of that running.

The following program should compute x^2 then display the result (assuming you implement the DSP instruction.)

0: LDA 15

1: JZ 11

2: STA 14

3: LDA 13

4: ADD 15

5: STA 13

6: LDA 14

7: DEC 1

8: JZ 11

9: STA 14

10: JMP 3

11: DSP 13

12: HLT

13: 0 # must enter each run

14: –

15: x # must enter first run

You can switch around 13 and 14 to group the two variables you need to enter together. 14 is used as a temp variable.

I like it! Seems like the multiplication program, but saves you from inputting the variable twice and has a speed up to check if your answer is 0 ahead of time.

Yeah, basically it’s just pre-programmed to calculate x*x instead of having to put it in twice. But might still be useful 🙂

I’ve been trying to do the exponent, where I’m outputting x^y. This is what I have so far:

0. LDA 22

1. SBI 1

2. STA 22

3. LDI x

4. STA 20

5. JZ 17

6. LDA 20

7. SBI 1

8. STA 20

9. JNZ 14

10. LDA 19

11. ADD 21

12. STA 19

13. JMP 6

14. LDA 19

15. STA 21

16. JMP 0

17. LDO 19

18. HLT

19. sum

20. z

21. x

22. y

The new instructions I have are SBI (subtract immediate, where you load in the IR output immediately into the B register and subtract), LDO (which is just a combination of LDA and OUT), and JNZ (jump not zero). I made these just to reduce down the instructions, but I’m still over by a lot.

The variables are: x for the base, y for the exponent (I’m using JZ to test for y=0 because I’m multiplying the base y-1 times), z for the counter index (it goes from x to 0 but uses JZ to test for z=0 because I’m adding x to the sum x-1 times), and sum for the final sum

I’m kind of stuck because I can’t seem to reduce any instructions down. 22 is way over 16, so it’s not looking too good right now. Anyone have any ideas?

Hmm, my first thought is to combine HLT into line 17 which gets you to 21 lines.

I don’t see how right now, but there might be a clever way to combine the chunks in lines 0-2 and 6-8 and knock down another 4 lines, getting you to 17. If that’s possible you’ve only got one line to go, though I’m having a hard time seeing where else you could cut something.

This new command might work or not, but in lines 10-12, perhaps you could rearrange things like this?

10. LDA 21

11. ADD / STORE 19

Alright thanks for the suggestions, I’ll play around with it and see. I think the 0-2 and 6-8 combination would be huge, so I’ll try that.

I have an alternative solution which, I think, sticks to Ben’s original instruction set. I don’t have a Ben original to test on though.

My solution is to do away with the counter since we can finish when the input value reaches zero. The problem is that we can’t test for zero after the counter has been increased because increasing the counter changes the zero flag.

Instead I test if the addition (left shift) produces a carry. If so I branch to code to increase the count. If no carry then I test for zero. If so I jump to the final output and halt code, otherwise I go around the loop again.

This does mean there’s an extra trip around the loop: in the penultimate loop value will be zero and carry will be set. The next and final loop sets both value to zero and carry unset.

One the plus side the code can abort early if the least significant bit(s) are zero.

start:

0000: LD A,value

0001: ADD A,value

0010: STA value

0011: JPC add

0100: JPZ done

0101: JMP start

done:

0110: LDA count

0111: OUT

1000: HLT

add:

1001: LDA count

1010: ADD one

1011: STA count

1100: JP start

one:

1101: 00000001 ;Constant

count:

1110: 00000000 ;Initialise to zero

value:

1111:

Scott, does the Fibonacci Sequence program (along with the rest of the programs) run on a version of the Ben Eater design as he originally published? I’ve entered Fibonacci Sequence program, verified what I wrote down on paper from above, verified that it was entered correctly, but I’m not getting a natural progression. I see the following sequence: 0,1,2,3,5,8,13,8,16, 24, 40, 64, 104, 168. I suspect I have a hardware issue.

Hey Jeff, I didn’t make any hardware changes to the computer from Ben’s original design, and the Fibonacci sequence program uses all original microcode. Some of the other programs use new microcode, but if you’re having issues with the Fibonacci program it’s likely a hardware bug on your end. Since you’re getting a “close” output, it’s probably something diabolical that’s going to need a close stepping through of things. 🙂

here one sort version of 0-255-0 program of Ben’s 8bit cpu with your extra commands. very simple..

0000 OUT

0001 INC 1

0010 JNC 0000

0011 DEC 1

0100 OUT

0101 JNZ 0011

0110 JMP 0000

I came up with this program that divides using only Ben’s commands, figured you all might be interested in it. Due to memory restrictions I couldn’t add a jump, so it will only divide evenly below 63. Please let me know any improvements!

0 LDA 14

1 SUB 15

2 JZ 4

3 JMP 6

4 LDA 12

5 OUT

6 LDA 14

7 ADD 13

8 STA 14

9 LDI 1

10 ADD 12

11 STA 12

12 0x01

13 Divisor

14 Divisor

15 Dividend

13 and 14 have to be the same and 12 has to be 1..

Thanks -Jon, 17

new instructions: JN or jump if negative ; STO or store output reg

I came up with this program that takes 2 numbers and finds the remainder if they were divided.

0: LDA 9

1: SUB 10

2: JN 4

3: JMP 1

4: STO 11

5: LDA 11

6: ADD 10

7: OUT

8: HLT

9: 11

10: 5

I haven’t really worked it out so be sure to reply if there are any bugs.

I implemented three of the instructions you mentioned.

1001 ADI (add immediate, analogous to your INC)

1010 SBI (subtract immediate, analogous to your DEC)

1010 DSP (same as yours)

This allowed me to rewrite the multiplication program to zero out location 13 so you don’t have to reset by hand after each run by using DSP 13, which shortens the program by 1 byte because you don’t have to LDA with the result for the OUT instruction. It also frees up location 12 in case you want to execute another instruction.

00: LDI 0

01: STA 13 //Zero out product

02: LDA 14

03: SBI 1 //Frees up location 12

04: JC 7

05: DSP 13

06: HLT

07: STA 14

08: LDA 13

09: ADD 15

10: STA 13

11: JMP 2

12:

13: product

14: factor x

15: factor y

I have a hard time making my Eater EEPROM programmer work, so I just add the instructions straight onto the chip with a TL866:

(each for x = 0 to 3, at each location and location + 1, all values hexadecimal)

ADI:

x4A: 08 02

xCA: 20 81

SBI:

x52: 08 02

xD2: 20 C1

DSP:

x5A: 48 10

xDA: 00 10

Hope found this interesting.

B

to know that you solved the primality test I took as a challange. Here my solution (with extra-commands, no hardware-changes):

0 SUB 0011 1111

1 STA 0100 1110 DIGIT

2 STA 0100 1101 VARIABLE

3 LDAO 1101 1111 (LOADS AND OUTPUTS)

4 LDA 0001 1101

5 SUBI 1011 0001 (SAME AS DEC)

6 STA 0100 1101

7 LDA 0001 1110

8 ADD 0010 1101

9 JZ 1000 1100

10 JC 0111 0100

11 JMP 0110 1000

12 LDAOH 1100 1101 (LOADS, OUTPUTS AND HALTS)

13

14

15 x # MUST ENTER FIRST

My solution for x^y. With some smaller hardware modifications (Reset A register; decrement a given ram address with one command, needs to reset B register and to set the least significant bit directly by opcode). The code is tested and works:

0000 LDA R begin adding loop

0001 ADD X

0010 STA R

0011 DEC C (decrement counter)

0100 JNZ 0 end adding loop

0101 LDA R

0110 STA C

0111 AR (reset A-register)

1000 STA R

1001 DEC Y (decrement exponent)

1010 JNZ 0

1011 LDAOH R

1100 COUNTER ( = X at start)

1101 RESULT (empty at start)

1110 EXPONENT (program exponent – 1)

1111 BASIS