The code I posted earlier is a basic game loop skeleton, but it could really be used for any assembly program I suppose. There seems to be a lot to it, but most of the code at this point is simply support routines.
If you are following along, you can download the source and compile it with the E/A cartridge (either on the real 99/4A or with Classic99), and run it.
Over the next few posts I'll break down the code and finally start adding some interesting features. So, time to get started.
DEF MAIN
"DEF" is an assembler and loader directive that makes a label visible outside our program; here it names where our program begins. The E/A or XB loader will add this name to the REF/DEF table so our code can be called. I used the label "MAIN" because that is pretty universal in the world of C, Windows, Unix, Mac, etc. programming as the name of a process's entry point.
**
* VDP Memory Map
VDPRD EQU >8800 * VDP read data
VDPSTA EQU >8802 * VDP status
VDPWD EQU >8C00 * VDP write data
VDPWA EQU >8C02 * VDP set read/write address
These are assembler directives that let us use labels instead of numbers. Any place in the code where you see VDPRD, the assembler will replace it with >8800, etc. These are the hardware memory-mapped locations for accessing the VDP. These values are used by the VDP routines I posted previously, which are included in the complete code download.
**
* Workspace
WRKSP EQU >8300 * Workspace
R0LB EQU WRKSP+1 * R0 low byte required by VDP routines
R1LB EQU WRKSP+3 * R1 low byte
R2LB EQU WRKSP+5 * R2 low byte
More equates to specify the workspace and the addresses of the low bytes for R0, R1, and R2. These come in handy particularly when dealing with the VDP routines because it is the MSB of a register that is sent to the VDP and the value we need a lot of the time is in the LSB of another register. The R0LB label is required by the VDP routines, the others are optional.
**
* VRAM Base Locations (must match the values set up in the
* set video mode subroutine.)
NAMETB EQU >0000 * Name table base
PTRNTB EQU >2000 * Pattern generator table base
COLRTB EQU >0300 * Color table base
Equates to use when calculating VRAM addresses. Using the equates allows the VDP tables to be moved without having to change a lot of code. We specify the base address of the various tables here and use the labels in our calculations. These labels must match the table locations set up in the "set video mode" subroutine.
**
* Scratch pad RAM use - Variables
*
* >8300 * Workspace
* >831F * Bottom of workspace
STACK EQU >8320 * Subroutine stack, grows up (8 bytes)
* >8322 * The stack is maintained in R10 and
* >8324 * supports up to 4 BL calls
* >8326
TICK EQU >8328 * 1 tick every 16.6ms (rolls after 18.2 mins)
VSYNC EQU >832A * 1 when VSYNC is detected, otherwise 0
* Random Number Memory Map
RAND16 EQU >83C0 * 16-bit random number
RAND8 EQU >83C1 * 8-bit random number
Here we are setting up equates that specify memory locations in the scratch pad RAM that we will be using. The WP will be loaded with >8300 and will use 32 bytes for the 16 general purpose registers.
Next will be 8 bytes used for the subroutine stack which will support a call depth of 4 (remember, addresses are 16-bit.)
The TICK count will be incremented every time the VDP issues a VSYNC which happens 60 times a second on NTSC consoles, and 50 times a second on PAL consoles. Assuming NTSC, that would be an update every 16.6ms, and since there are 65536 values in a 16-bit value, that means the counter will roll over every 18.2 minutes. This is fine since we are simply using it to determine how much (if any) time has elapsed since a previous event.
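If you want to check those numbers, here is a minimal sketch in Python (used only because it runs anywhere; it is not part of the posted source). It works out the rollover time and shows why a wrapped counter is still fine for measuring elapsed time. The helper name elapsed_ticks is made up for illustration.

```python
TICKS_PER_SECOND = 60      # NTSC; a PAL console would be 50
COUNTER_VALUES = 1 << 16   # TICK is a 16-bit word, so 65536 values

# Minutes until the 16-bit counter wraps back to zero.
rollover_minutes = COUNTER_VALUES / TICKS_PER_SECOND / 60

def elapsed_ticks(now, then):
    """Ticks between two TICK readings, correct across a single wrap."""
    return (now - then) & 0xFFFF

print(round(rollover_minutes, 1))      # 18.2
print(elapsed_ticks(0x0005, 0xFFFE))   # 7, even though the counter wrapped
```

The 16-bit subtraction on the TMS9900 wraps the same way, so subtracting an older TICK reading from a newer one gives the right answer as long as less than one full rollover has passed.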
The VSYNC variable will be set to 0 unless the VSYNC signal was received, at which point it will have a value of 1 for a single pass through the game loop. We can use this variable to quickly check for and synchronize to the VSYNC.
RAND16 and RAND8 store the random numbers generated by our random number generator subroutine. The address >83C0 is used because that is what the console uses to store a random number "seed" in the form of the amount of time the user took to "press any key" from the master title screen. This makes for a really good seed and there is no reason not to use it.
As our program grows, we will be reserving more and more of the scratch pad RAM.
**
* Runtime Constants
* In an EA3 program these will be in 8-bit RAM, in a cartridge they
* will be in 8-bit ROM.
*
VSTAT DATA >8000 * VDP vsync status
NUM01 DATA 1 * 16-bit number 1
These are assembler directives and simply reserve and initialize memory as specified. DATA reserves 16-bit values, BYTE and TEXT reserve 8-bit values. I'm using them as constants because in a cartridge they will be in a ROM file and therefore unchangeable. In a program designed to be loaded (like this one), we could actually write to these values since they will be in RAM, either the low 8K or high 24K of the 32K RAM expansion.
In both cases we are using memory locations to hold the data even though we are treating them as unchangeable values. So why not just use equates (you may be asking)? Good question. The reason is that we can use these memory locations in instructions where an immediate value cannot be used. Remember, an equate is just a "search and replace", but these labels represent real memory locations. For example, take the NUM01 above. There are a lot of times when you need to compare a memory address to a number and the "immediate" instructions only work with registers. The value "one" comes up a lot, as do other values which you will see as the program grows. There are several ways to code the check, and in the example code it is used to test if VSYNC is 0 or 1:
C @VSYNC,@NUM01
-or-
MOV @VSYNC,R1
CI R1,1
-or-
MOV @VSYNC,@VSYNC * This trick uses the CPU "compare to zero"
-but not-
CI @VSYNC,1 * ILLEGAL, CI only accepts a register
The "MOV" trick is okay, but only lets us test if the value is zero or not zero. If we specifically need to test against other values, then it won't help. Also, MOV requires 4 memory accesses minimum but C only needs 3, so C will be faster.
**
* Program execution starts here
MAIN LIMI 0
LWPI WRKSP
This is where execution of our program will start. The first thing we do is shut off interrupts and leave them off. Next the WP is set up with the address we specified via the equate, which is >8300.
* Initialize the call stack and Finite State Machine (FSM)
LI R10,STACK * Set up the stack pointer
In this code R10 is used as a stack pointer. Since the TMS9900 CPU does not have stack support in the form of a real stack register, we will make our own. A stack is just a convention used to store and retrieve temporary data.
To set up a stack you simply set aside some memory and load a register with the first address. If we had a stack pointer we would load that, and use "push" and "pop" instructions. But we don't, so I picked R10 and the "pushing" and "popping" have to be done manually.
So, when we place a value on the stack (push), the data is copied to where the stack pointer (R10) is pointing, then the stack pointer is incremented or decremented depending on if your stack "grows" up or down in memory (up being towards bigger addresses.) In our case the stack grows up. It starts at address >8320 and ends at address >8327 (8 bytes):
          MSB    LSB
R10 --+-> >8320  >8321
grows +-> >8322  >8323
"up"  +-> >8324  >8325
      +-> >8326  >8327
Since we will be using our stack to store addresses, we will always "push" 16-bit values (words) on the stack, and remove (pop) 16-bit values off the stack. When the values are popped off the stack, the data where the stack pointer is pointing is copied to some designated register (or another memory location), and the stack pointer adjusted the opposite direction of a push (so decremented in our case.)
Using a stack like this allows us to have a few levels of subroutine calls (one advantage of BLWP over BL is that you don't need a stack, but you do need an entirely new workspace for each BLWP level.) There are three branching instructions in the TMS9900:
* BLWP: Branch and Load Workspace Pointer
* BL: Branch and Link
* B: Branch
We won't be using BLWP, so I'll leave it as an exercise for you to look it up. The B instruction is very simple, it unconditionally branches to the designated address. The B instruction is just like the unconditional jump instruction JMP, except JMP is restricted to jumps within -128 to +127 "words" of the current location. This is because the location to jump to is stored as part of the JMP instruction's opcode as an offset, and there are only 8 bits to store the offset value (and the range of an 8-bit value (one byte) is 0 to 255 or -128 to +127.)
However, when B is given a full address (as in B @LABEL), the opcode is immediately followed by a complete 16-bit value (one word) that specifies the address to branch to, so it can branch to any *evenly* addressable location in the 64K range of the TMS9900 CPU. Instructions are always on even addresses. So, use B when you need to jump far, and JMP when you are within 127 words (the assembler will let you know if you try to JMP too far.) The main thing to remember is, B @LABEL uses 4 bytes to encode the instruction, JMP only uses 2.
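As a worked example of the JMP limit, this Python sketch (not part of the posted source) computes the offset a JMP would need and checks whether it fits in the 8-bit field. It assumes the offset is counted in words from the instruction following the JMP, which is my understanding of the TMS9900 encoding; the function names are made up for illustration.

```python
def jmp_displacement(jmp_address, target_address):
    """Signed word offset a JMP at jmp_address needs to reach the target."""
    # Counted in words from the instruction that follows the JMP.
    return (target_address - (jmp_address + 2)) // 2

def fits_in_jmp(jmp_address, target_address):
    """True if the offset fits the signed 8-bit field of a JMP."""
    return -128 <= jmp_displacement(jmp_address, target_address) <= 127

print(jmp_displacement(0xA000, 0xA000))   # -1, a jump to itself
print(fits_in_jmp(0xA000, 0xA100))        # True, offset is +127
print(fits_in_jmp(0xA000, 0xA102))        # False, use B instead
```

In practice you never compute this by hand; the assembler reports an out-of-range jump and you change that JMP to a B.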
So, that leaves us with BL, which stands for Branch and Link. The "branch" part is just like the B instruction. However, the "link" part of the instruction is what lets us use this instruction for calling subroutines. To call a subroutine we need to remember where we are, jump to the address where the subroutine starts, then return to where we left off. So, to remember where we are before branching to a subroutine, we need to store the value in the program counter (PC), and that's exactly what the "link" part of BL does. The current PC value, which by then points at the instruction following the BL, is placed in R11 (this cannot be changed, and whatever was in R11 is wiped out) and the branch is taken.
Now we are sitting in our subroutine and when we are done we need to "return" to the code that called the subroutine. Since we were careful not to destroy R11, it still holds the address of where we were before the BL call. Thus, we issue a B instruction using indirect addressing on R11, like this:
B *R11
Note: The assembler has a pseudo instruction "RT" that will be replaced with "B *R11". So any place you see "RT", it is the same as writing out B *R11.
The stack comes in to play when we need to call a subroutine from within a subroutine. Think about it, if we call a subroutine with BL, then that subroutine calls another with BL, the original return address is blown away unless we save it:
BL @SUB1 <--- stores current PC in R11
. . .
SUB1 code
code
code
WIPE BL @SUB2 <--- stores current PC in R11, blowing away previous return value
code
B *R11 <--- original return address is gone, R11 holds the address at WIPE
. . .
SUB2 code
code
code
B *R11 <--- returns to SUB1
To fix this, we have to store R11 in any subroutine that needs to call another. For assembly language, 2 or 3 levels is usually all you need. Any more than that and you need to rethink your program's organization. Thus, I set up a stack to support 4 levels of calls. In "bottom level" subroutines, i.e. those that don't need to BL to any other routine, you do not have to deal with the stack. Thus, in any subroutine that needs to call another subroutine, you do this:
BL @SUB1 <--- stores current PC in R11
. . .
SUB1 MOV R11,*R10+ <--- "push" R11 onto the stack and use auto-increment to adjust stack
code
code
BL @SUB2 <--- stores current PC in R11, blowing away previous return value
code
DECT R10 <--- adjust stack pointer (pop)
MOV *R10,R11 <--- copy address back to R11
B *R11 <--- returns to original calling location
. . .
SUB2 code <--- subroutine does not call any others, no stack required
code
code
B *R11 <--- returns to SUB1
I hope this is clear. If you are not familiar with the addressing modes of the TMS9900, you should read up on them a little so you better understand what is going on.
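If it helps, the two listings above can be boiled down to this little Python model (a sketch, not part of the posted source). It only tracks what ends up in R11 when SUB1 finally executes its B *R11, with and without saving R11 on the stack. The function name run and the descriptive strings are made up for illustration.

```python
def run(save_r11):
    """Simulate main -> SUB1 -> SUB2 and report where SUB1 returns to."""
    stack = []
    r11 = "after BL @SUB1 in main"    # BL @SUB1 sets the return address
    if save_r11:
        stack.append(r11)             # MOV R11,*R10+
    r11 = "after BL @SUB2 in SUB1"    # BL @SUB2 overwrites R11
    # SUB2 runs and returns with B *R11, landing back inside SUB1.
    if save_r11:
        r11 = stack.pop()             # DECT R10 / MOV *R10,R11
    return r11                        # where SUB1's B *R11 ends up

print(run(save_r11=False))   # after BL @SUB2 in SUB1 (stuck inside SUB1)
print(run(save_r11=True))    # after BL @SUB1 in main
```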
LI R15,STINIT * Initial state, one-time initialization
CLR @TICK * Clear the tick counter
In this code, R15 is used as the finite state machine (FSM) state variable. This just sets the initial value and clears the TICK counter.
An FSM is very simple really and you deal with them every day and don't realize it. For example, a stop light is an FSM. Basically an FSM has "states", and depending on the current state there are a fixed number of other states you could go to, meaning a fixed number of possibilities from where you are.
So, for a stop light, this would be the FSM:
state = red
timer = 10
forever
    dec timer
    when state is red:
        if timer = 0 then
            state = green
            timer = 10
        end if
    when state is green:
        if timer = 0 then
            state = yellow
            timer = 5
        end if
    when state is yellow:
        if timer = 0 then
            state = red
            timer = 10
        end if
end forever
Notice that it is illegal to go from yellow to green or from green to red. Once in a given state, based on the current state and external input, you decide the next state, which includes staying in the current state. In a game for example, the "run game" state is maintained until all the lives are gone, at which point you would switch to the "attract mode" state, or the "enter initials" state if the score was high enough. So, the initial state for our game loop is STINIT or "state initialize".
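For anyone who wants to watch the stop light run, here is the same state machine as a short Python sketch (not part of the posted source). The timer values come straight from the pseudocode; the table and function names are made up for illustration.

```python
DURATION = {"red": 10, "green": 10, "yellow": 5}
NEXT_STATE = {"red": "green", "green": "yellow", "yellow": "red"}

def run_light(ticks):
    """Run the stop light for a number of ticks, recording the state."""
    state, timer = "red", DURATION["red"]
    history = []
    for _ in range(ticks):
        timer -= 1                       # dec timer
        if timer == 0:
            state = NEXT_STATE[state]    # the only legal transition
            timer = DURATION[state]
        history.append(state)
    return history

history = run_light(25)
print(history[8], history[9], history[19], history[24])
# red green yellow red
```

Because NEXT_STATE has exactly one entry per state, the illegal moves (yellow to green, green to red) simply cannot happen.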
**
* Finite State Machine (FSM)
FSM00
CI R15,STQUIT * WHILE R15 != STQUIT
JNE FSM10
STQUIT BLWP @>0000 * Quit
This is the top level "forever" loop that contains the state machine. R15 holds the current state, and as long as it is not equal to STQUIT, we will jump to FSM10. BLWP @>0000 performs a power-on reset. >0000 is the address the CPU loads when power is applied, so we are doing the same thing. Currently there is no condition to set R15 to STQUIT, so to end the program you have to power off the console (the QUIT key won't work since interrupts are disabled and it is the console ISR that checks for that key combination.)
FSM10
CLR @VSYNC * VSYNC indicator only active for a single cycle
CLR R1
MOVB @VDPSTA,R1 * Reading clears the VDP sync indicator
COC @VSTAT,R1
JNE FSM20 * No VSYNC, skip updating the TICK
INC @TICK * Increment the tick
INC @VSYNC * Set the VSYNC indicator
This is the guts of the game loop! Not much to it, is there? This stuff is just not really that complicated. First we clear the VSYNC indicator since it is only active for a single loop through the FSM. We also clear R1 because it is about to get the value of the VDP status register and only the MSB will be modified, and when we check the status with COC we need to make sure the LSB is clear.
The MOVB @VDPSTA,R1 reads the VDP's status register into the MSB of R1 and also clears the register in the VDP. COC (compare ones corresponding) checks the VSYNC indicator from the status register. VSTAT was set to >8000 so we are only testing the most significant bit in R1. If there is no VSYNC then we skip forward to the FSM. If the VSYNC indicator was set, we increment the TICK count and set the VSYNC variable to 1.
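Here is the bit test spelled out as a Python sketch (not part of the posted source), assuming the VSYNC flag is the top bit of the status byte as the VSTAT constant implies. The function names are made up; the comments show which instruction each one models.

```python
VSTAT = 0x8000   # mask with only the most significant bit set

def read_status_into_r1(status_byte):
    """CLR R1 then MOVB @VDPSTA,R1 : the byte lands in the MSB of R1."""
    return (status_byte & 0xFF) << 8

def coc_equal(mask, register):
    """COC sets 'equal' when every 1 bit in mask is also 1 in register."""
    return (register & mask) == mask

def vsync_seen(status_byte):
    return coc_equal(VSTAT, read_status_into_r1(status_byte))

print(vsync_seen(0x80))   # True, VSYNC bit set
print(vsync_seen(0x7F))   # False, every other bit set but not VSYNC
```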
* Branch to the current state
FSM20
B *R15 * SWITCH R15
This is the FSM selection. R15 always holds the address of where the code is for the current state. This instruction simply jumps to the current state, which initially is STINIT.
* One time initialization
*
STINIT
BL @GMODE * Set the graphics mode
BL @LSCS * Load standard character set
BL @OTINIT * One time initialization
LI R15,STRUN * Set next state
B @FSM50 * BREAK
This is the one-time initialization. First we set up the graphics mode, then load a decent character set (I really don't like the default character set!), and finally call a one-time initialization function that will do program-specific stuff (since GMODE and LSCS are both pretty generic and meant to be reused.)
Finally we update R15 with the new state, which will be the "run" state. When the current state is done, we use a branch to jump to the bottom of the FSM for any additional processing that may need to happen. All states should jump to a single location! Just because this is assembly language does not mean we don't need to follow good program flow. We are basically performing code similar to C's WHILE loop and SWITCH statement.
* Main state when things are running, game is playing, etc.
*
STRUN
BL @PLOT
B @FSM50 * BREAK
This is the whole "run" state. It calls PLOT which does all the work. Also, we never leave this state since we are not doing any user input and the program is just not complicated enough yet.
* Every state jumps here when complete so any necessary out-of-state
* logic or decision making can happen if necessary.
FSM50
FSM99
B @FSM00 * WEND
*// MAIN
This is the bottom of the FSM. You would place any "out of state" processing here if necessary, then the code branches back to the top. The FSM uses branch because as your game (or whatever you are writing) grows, the bottom of your FSM will be too far from the top to use JMP.
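Putting the pieces together, the shape of the whole loop looks roughly like this in Python (a sketch of the structure only, not part of the posted source). The variable state plays the part of R15 and holds the routine to run next, just as R15 holds an address. The three-frame exit condition is purely hypothetical so the example terminates; the posted code never sets STQUIT.

```python
def st_init(ctx):
    ctx["log"].append("init")      # GMODE, LSCS, OTINIT would go here
    return st_run                  # LI R15,STRUN

def st_run(ctx):
    ctx["log"].append("run")       # BL @PLOT would go here
    ctx["frames"] += 1
    # Hypothetical exit condition, only so this sketch ends.
    return st_quit if ctx["frames"] >= 3 else st_run

def st_quit(ctx):
    return st_quit

def main():
    ctx = {"log": [], "frames": 0}
    state = st_init                # LI R15,STINIT
    while state is not st_quit:    # CI R15,STQUIT / JNE FSM10
        state = state(ctx)         # B *R15, then back to the top
    return ctx["log"]

print(main())   # ['init', 'run', 'run', 'run']
```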
*********************************************************************
*
* <subroutine skeleton>
*
SKEL
MOV R11,*R10+ * Push return address onto the stack
* Subroutine code here ...
DECT R10 * Pop return address off the stack
MOV *R10,R11
B *R11
*// SKEL
This is a skeleton subroutine to copy and paste when adding new subroutines. It contains the code necessary to manage the stack.
Here is the BASIC code we are reproducing below:
100 CALL CLEAR
110 CALL SCREEN(5)
120 A$="007E7E7E7E7E7E7E"
130 FOR I=40 TO 64 STEP 8
140 CALL CHAR(I,A$)
150 NEXT I
160 CALL COLOR(2,8,1)
170 CALL COLOR(3,6,1)
180 CALL COLOR(4,10,1)
190 CALL COLOR(5,15,1)
200 X=INT(RND*32)+1
210 Y=INT(RND*24)+1
220 C=INT(RND*4)*8+40
230 CALL HCHAR(Y,X,C)
240 GOTO 200
The PLOT subroutine. I got the idea from the Raspberry thread where the example of filling the screen with 4-color squares was used to demonstrate the speed. While even in assembly we can't get as fast as the Raspberry demo running on a GHz speed CPU, we do pretty well.
*********************************************************************
*
* Plot a random character
*
PLOT
* Only draw on the VSYNC
C @VSYNC,@NUM01 * If VSYNC is not active, return
JEQ PLOT01
B *R11
PLOT01
MOV R11,*R10+ * Push return address onto the stack
I want to point out that this subroutine has an initial check to see if VSYNC is active, and if not it simply returns. That means this subroutine will only do its work once every 16.6ms, or 60 times a second. Thus, the screen takes a little while to fill up completely with squares. If the VSYNC is active, we jump down to push the return address on the stack because the rest of the subroutine will call other subroutines (the random number generator and VSBW.)
To see how fast assembly language can be, after you run this code once, comment out those first 3 lines so the routine runs every time it is called. The screen fills up in a few seconds! It's pretty cool and makes a nice effect.
* Get a random screen location
BL @RANDNO * Get a random number (in R5)
LI R3,768
CLR R4 * Dividend will be R4,R5
DIV R3,R4 * Make a number between 0 and 767
MOV R5,R0 * Move to R0 for the VDP routine
AI R0,NAMETB * Adjust to the name table base
This code gets a 16-bit random number in R5 and divides it by 768. DIV leaves the quotient in R4 and the remainder in R5, and it is the remainder (0 to 767) that we use as the screen location. Remember that the screen is really a linear block of memory 768 bytes long, so to get an X,Y location we only need 1 number, not 2! Once we have our number, we stuff it in R0 to prepare for the call to VSBW (which requires R0 to contain the VRAM address to write to.) We also add the name table offset to R0 so we are writing to the correct location. This is where using the equates comes in handy. We can generate our screen location as a 0-based index, then add that to the real base address of the name table.
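The division step can be checked with this Python sketch (not part of the posted source). It models DIV as a 32-bit by 16-bit divide returning quotient and remainder, and then splits the linear position back into row and column just to show the two views are equivalent. The function names are made up for illustration.

```python
def div(divisor, high, low):
    """Model of DIV: 32-bit (high:low) / 16-bit -> (quotient, remainder)."""
    dividend = (high << 16) | low
    return dividend // divisor, dividend % divisor

def screen_position(random_word):
    """Return (name table offset, row, column) for a 16-bit random word."""
    # CLR R4 / DIV R3,R4 with R3 = 768: quotient to R4, remainder to R5.
    _quotient, remainder = div(768, 0, random_word)
    row, column = divmod(remainder, 32)   # 32 columns per screen row
    return remainder, row, column

print(screen_position(0xFFFF))   # (255, 7, 31)
print(screen_position(767))      # (767, 23, 31), the last screen cell
```

Because R4 is cleared first, the high word of the dividend is always smaller than the divisor, so the divide cannot overflow here.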
* Get a random character 40, 48, 56, 64
BL @RANDNO * Get a random number (in R5)
SRL R5,14 * Make a number between 0 and 3
SLA R5,3 * Multiply by 8 (number is now 0, 8, 16, 24)
A @CHR040,R5 * Add to the base character
MOV R5,R1
SWPB R1 * Remember, the MSB goes to the VDP!
This is doing the same thing as above except we are getting a random number between 0 and 3 and using that to select 1 of 4 characters to display. The character value goes into the MSB of R1 for VSBW.
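The shifts are easy to follow with real numbers. This Python sketch (not part of the posted source) mirrors each instruction on a 16-bit value; the function name random_character is made up, and the value 40 stands for the first word stored at CHR040.

```python
def random_character(random_word):
    """Return (character code, R1 after SWPB) for a 16-bit random word."""
    r5 = (random_word & 0xFFFF) >> 14        # SRL R5,14 -> 0..3
    r5 = (r5 << 3) & 0xFFFF                  # SLA R5,3  -> 0, 8, 16, 24
    r5 = (r5 + 40) & 0xFFFF                  # A @CHR040,R5
    r1 = ((r5 << 8) | (r5 >> 8)) & 0xFFFF    # MOV R5,R1 / SWPB R1
    return r5, r1

for word in (0x0000, 0x4000, 0x8000, 0xFFFF):
    code, r1 = random_character(word)
    print(code, hex(r1))
# 40 0x2800
# 48 0x3000
# 56 0x3800
# 64 0x4000
```

Only the top two bits of the random number are used, which is why the first shift is by 14.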
BL @VSBW
DECT R10 * Pop return address off the stack
MOV *R10,R11
B *R11
*// PLOT
With R0 and R1 set up, we write the byte to VRAM which displays the character on the screen. Then we clean up the stack and return to the FSM.
DEFTBL
CHR040 DATA 40,8
DATA >007E,>7E7E,>7E7E,>7E7E
CHR048 DATA 48,8
DATA >007E,>7E7E,>7E7E,>7E7E
CHR056 DATA 56,8
DATA >007E,>7E7E,>7E7E,>7E7E
CHR064 DATA 64,8
DATA >007E,>7E7E,>7E7E,>7E7E
DEFEND
This is a table-based tile (character) pattern setup to help make things easier. The format is:
Tile name (0 to 255), number of pattern bytes
Pattern Data, ...
You can use a label to set up references to specific tiles if necessary. The generic labels here should be replaced with something meaningful. This data will be in 8-bit RAM for an EA3 program, and in 8-bit ROM for a cartridge. The "name" and "length" values could be bytes, but using full words makes the code easier. If you have a lot of individual definitions, you may consider changing to BYTE.
COLTBL DATA >7050,>90E0
The color data is so small, 32 bytes total for all 256 tiles, that using a table layout was overkill.
I tend to place the pattern DATA close to the subroutine that reads them, at least until the code is working, by which time I'm used to it being where it is, so I keep it there...
*********************************************************************
*
* One-Time Initialization
*
OTINIT
MOV R11,*R10+ * Push return address onto the stack
* Initialize tile pattern definitions
LI R1,DEFTBL * Start of definition table
OTI01 MOV *R1+,R0 * Move the character code into R0
SLA R0,3 * Mul by 8 to adjust offset into PGT
AI R0,PTRNTB * Add pattern generator table base
MOV *R1+,R2 * Move the byte count into R2
BL @VMBW
CI R1,DEFEND
JNE OTI01 * Loop until end of table
This code loads the pattern data from the table above. Since we are writing multiple bytes to the VDP via VMBW, R1 has to hold the address in CPU RAM of the data to write to the VDP, so we load that address into R1 first.
The first word in the table is the starting character that we are going to write a pattern for, so we move that value to R0 and auto-increment R1 past that word. The next word in the table is the number of bytes to write starting at the character code identified by the 1st word. So, the count goes to R2 and R1 is auto-incremented past that word. Now R1 is pointing at the start of the actual pattern data, R2 holds the count, and R0 holds the starting character.
Now, R0 needs two modifications. First, since each character requires 8 bytes of pattern data, we have to multiply the character code by 8 to get the proper offset into the pattern generator table. So we do that with SLA (shift left arithmetic). In case you do not know, shifting binary values left multiplies by 2, and shifting right divides by 2. This works the same way as moving the decimal point in decimal numbers multiplies or divides by 10. So, shifting left 3 positions multiplies by 8 (2x2x2). Then we add the pattern table base to R0 which is the final VRAM location for the specified character's pattern data.
Then we call VMBW to write the data, and finally check if we are at the end of the table. If not, we go back and start over reading the character code to write pattern data for, the number of bytes that follow, and the next set of pattern data.
Note that if you were defining patterns for consecutive characters, you would just include the pattern data and set the "count" value accordingly. You don't have to set up each character. In this case, the characters were spaced 8 apart to get each one in a different color group.
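To see where each pattern ends up in VRAM, here is the table walk modeled in Python (a sketch, not part of the posted source). It assumes an even byte count per entry, which holds for the table above, and the function name pattern_writes is made up for illustration.

```python
PTRNTB = 0x2000
PATTERN = [0x007E, 0x7E7E, 0x7E7E, 0x7E7E]
DEFTBL = [40, 8, *PATTERN, 48, 8, *PATTERN,
          56, 8, *PATTERN, 64, 8, *PATTERN]

def pattern_writes(table):
    """Yield (vram_address, pattern_bytes) for each entry in the table."""
    i = 0
    while i < len(table):                            # CI R1,DEFEND / JNE
        char_code, count = table[i], table[i + 1]    # two MOV *R1+ reads
        words = table[i + 2 : i + 2 + count // 2]
        data = b"".join(w.to_bytes(2, "big") for w in words)
        yield PTRNTB + (char_code << 3), data        # SLA R0,3 / AI R0,PTRNTB
        i += 2 + count // 2                          # step past the data

for address, data in pattern_writes(DEFTBL):
    print(f">{address:04X}", data.hex().upper())
# >2140 007E7E7E7E7E7E7E
# >2180 007E7E7E7E7E7E7E
# >21C0 007E7E7E7E7E7E7E
# >2200 007E7E7E7E7E7E7E
```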
* Set colors
LI R0,COLRTB+5 * Start with color set 5 (char 40)
LI R1,COLTBL
LI R2,4
BL @VMBW
This simply writes the color data and should be self explanatory by now. If not, ask questions...
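In case the numbers in COLTBL look like magic, this Python sketch (not part of the posted source) shows how they fall out of the CALL COLOR lines in the BASIC listing. It relies on TI BASIC color numbers being one higher than the VDP's, and on each color table byte covering a group of 8 characters. The function names are made up for illustration.

```python
COLRTB = 0x0300

def color_byte(basic_foreground, basic_background):
    """Pack two TI BASIC colors (1..16) into one VDP color table byte."""
    return ((basic_foreground - 1) << 4) | (basic_background - 1)

def color_address(char_code):
    """VRAM address of the color byte for a character's group of 8."""
    return COLRTB + char_code // 8

# CALL COLOR(2,8,1), (3,6,1), (4,10,1), (5,15,1) from the BASIC listing
table = [color_byte(fg, 1) for fg in (8, 6, 10, 15)]
print([f">{b:02X}" for b in table])     # ['>70', '>50', '>90', '>E0']
print(f">{color_address(40):04X}")      # >0305, i.e. COLRTB+5
```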
GMODE
MOV R11,*R10+ * Push return address onto the stack
CLR R0 * M3 is bit 6 and is off for Graphics I
BL @VWTR
* This is the "busy" register
LI R0,>01E0 * 11100000 Graphics I
BL @VWTR * 16K,No Blank,Enable Int,M1,M2,0,8x8,No Mag
LI R0,>0200 * Name Base Table to >0000 - >02FF (768 bytes)
BL @VWTR
LI R0,>030C * Color Table to >0300 - >031F (32 bytes)
BL @VWTR
LI R0,>0404 * Pattern Generator Table
BL @VWTR * >2000 - >27FF (2048 bytes)
LI R0,>0507 * Sprite Attribute Table
BL @VWTR * >0380 - >03FF (128 bytes)
LI R0,>0605 * Sprite Pattern Table
BL @VWTR * >2800 - >2FFF (2048 bytes)
LI R0,>0380 * Disable all sprite processing by writing
LI R1,>D000 * >D0 (208) to the vertical position of the
BL @VSBW * first sprite entry
* Set colors
LI R0,>07F4 * R7 is the text-mode color and border color
BL @VWTR * White on dark blue
LI R0,>0300 * Start of color table
LI R1,>F400 * White on dark blue
LI R2,>0020 * All color table entries (32 bytes)
BL @VSMW
This is a complete "set the VDP" subroutine. The comments should let you know what's going on. Basically it runs through every VDP write-only register and sets each to a specific value, which is the only way to know what is in the registers since they are write only. Sprites are disabled and finally the background (border) color is set. Also, all the character sets are defaulted to the same foreground/background color scheme.
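To check that the register values really match the equates (NAMETB, COLRTB, PTRNTB), this Python sketch (not part of the posted source) converts each table register value into a VRAM base address. The multipliers are the Graphics I units for the 9918A as I understand them, and it assumes the same word layout the listing uses for VWTR: register number in the high byte, value in the low byte. The names are made up for illustration.

```python
# VRAM bytes per step of each table base register (Graphics I).
BASE_UNIT = {2: 0x400, 3: 0x40, 4: 0x800, 5: 0x80, 6: 0x800}

def table_base(vwtr_word):
    """VRAM base address selected by a VWTR word (register in the MSB)."""
    register = vwtr_word >> 8
    value = vwtr_word & 0xFF
    return value * BASE_UNIT[register]

for word in (0x0200, 0x030C, 0x0404, 0x0507, 0x0605):
    print(f"R{word >> 8} -> >{table_base(word):04X}")
# R2 -> >0000
# R3 -> >0300
# R4 -> >2000
# R5 -> >0380
# R6 -> >2800
```

If you ever move a table, change the register value and the matching equate together.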
LSCS
MOV R11,*R10+ * Push return address onto the stack
LI R0,>2000 * Start at the space character
LI R1,SCS1
LI R2,SCS1E-SCS1
BL @VMBW
DECT R10 * Pop return address off the stack
MOV *R10,R11
B *R11
*// LSCS
This loads the "standard character set" from the data I posted very early on in this thread. The data is also included in the complete source .zip download. This is older code that I copied and pasted, so you can see it does not use the equates we set up for the VDP table locations. Note how R0 is loaded with a value that assumes the pattern generator table is at >2000. It is in this case, but we should really fix this to be consistent. The comment is also wrong: the data starts with character >00, not the space >20 (32 decimal).
LI R0,PTRNTB
There, that fixes it. :-)
I think that is it except for the RNG and VDP routines which have been covered already (the RNG has its own thread.) Next time I'll be adding support for reading the joystick so we can get some user input and I'm going to develop a "scrolling within a window" so Owen will have something to mess with.
Side Note: While I appreciate the feedback everyone has given, no one is asking questions... So, either everyone knows all this already, or no one is trying out the code. Either way, I'll continue to post, but I'd like to know if I'm going over stuff people want to learn about, or if this is helping anyone at getting started with assembly? I'm trying to get into the guts of the game stuff, but there was a lot of necessary boring evil that had to be gone through first.
Matthew
Edited by matthew180, Fri May 28, 2010 12:11 AM.