Lean F4 boot framework


Tjoppen

I fiddled around with F4 (32k) bankswitching today and figured out how to boot everything without wasting space in every bank on a stub that jumps to Start in the appropriate bank.

The idea is very simple: put Start in bank 7 and point the initialization vector at $1FFB in all the others. At $1FFB you put $4C (JMP abs). Fetching it causes the cart to switch to bank 7, so the CPU reads the address at $1FFC-$1FFD (Start) from bank 7 and finally jumps to the initialization stuff, just as it would if the machine had booted in bank 7.

The good thing with this is that it doesn't waste any space - it even uses space that would otherwise be wasted ($1FF4-$1FFB)! Just 18 bytes of overhead per bank - even less if you don't need JMPBank in some of them (saves six bytes).
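To make the boot path concrete, here is a minimal Python sketch of it. It assumes a cart that returns the byte from the currently selected bank and only then switches, which is what the description above relies on. The `boot` helper, the `START` address and the dictionary-based bank model are made up for illustration and are not part of the DASM code.

```python
# Toy model of the F4 boot trick. Assumption: reading a hotspot returns the
# byte from the bank that was selected before the read, then switches banks.
START = 0x1000            # hypothetical address of Start in bank 7
HOTSPOT_FIRST = 0x1FF4    # F4 hotspots $1FF4-$1FFB select banks 0-7

def make_banks():
    banks = [dict() for _ in range(8)]
    for n, rom in enumerate(banks):
        if n < 7:
            rom[0x1FFB] = 0x4C                      # JMP abs opcode
            rom[0x1FFC], rom[0x1FFD] = 0xFB, 0x1F   # reset vector -> $1FFB
        else:
            rom[0x1FFC] = START & 0xFF              # reset vector -> Start
            rom[0x1FFD] = START >> 8
    return banks

def boot(power_on_bank):
    banks = make_banks()
    bank = power_on_bank

    def read(addr):
        nonlocal bank
        value = banks[bank].get(addr, 0x00)
        if HOTSPOT_FIRST <= addr <= HOTSPOT_FIRST + 7:
            bank = addr - HOTSPOT_FIRST             # switch after the read
        return value

    pc = read(0x1FFC) | (read(0x1FFD) << 8)         # CPU fetches reset vector
    if pc == 0x1FFB:
        opcode = read(pc)                           # this read selects bank 7
        assert opcode == 0x4C
        pc = read(pc + 1) | (read(pc + 2) << 8)     # operand comes from bank 7
    return bank, pc

for n in range(8):
    print(n, boot(n))
```

Whichever bank the model powers up in, it ends in bank 7 with the program counter at `START`.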

This approach can be adapted for F6 and F8 as well. The code (feel free to use):

    processor 6502
    include vcs.h
    include macro.h

;FPS = 50 or 60
;PAL = 0 or 1
#if FPS==50             ;PAL
VBLNK   equ 48
LINES   equ 228
OVERSCN equ 36
#else                   ;NTSC
VBLNK   equ 40
LINES   equ 192
OVERSCN equ 30
#endif

VBLNK64 equ VBLNK*19/16-1   ;value to set TIM64 to for VBLNK
OVERS64 equ OVERSCN*19/16   ;ditto for overscan

JMPBank equ $1FEE

;18 byte bootstrap macro
;Includes JMPBank routine and JMP to Start in Bank 7
    MAC END_SEGMENT
.BANK   SET {1}
    echo "Bank",.BANK,":", (JMPBank - *), "free"

    org JMPBank + (.BANK * 4096)
    rorg JMPBank
;Jump to fnptr in bank X
;Example usage:
;   SET_POINTER fnptr Address
;   ldx #Bank
;   jmp JMPBank
;
;$1FEE-$1FF3
    nop $1FF4,X     ;3 B
    jmp (fnptr)     ;3 B
;$1FF4-$1FFB
    .byte 0,0,0,0
    .byte 0,0,0,$4C ;JMP Start (reading the instruction jumps to bank 7, where Start's address is)
;$1FFC-1FFF
    .word $1FFB
    .word $1FFB
;Bank .BANK+1
    org $1000 + ((.BANK + 1) * 4096)
    rorg $1000
    ENDM

;RAM
    SEG.U VARS
    org $80

fnptr   ds  2

    echo "RAM:", ($100 - *), "bytes left"

;ROM
    SEG CODE

;Bank 0
    org $1000
    rorg $1000

Dummy
    .byte $FF       ;Dummy byte else DASM fails to assemble this correctly.
                    ;You can remove this when you have anything in Bank 0 at $1000

    END_SEGMENT 0

    END_SEGMENT 1

    END_SEGMENT 2

    END_SEGMENT 3

    END_SEGMENT 4

    END_SEGMENT 5

    END_SEGMENT 6

Start
    CLEAN_START

MainLoop
    VERTICAL_SYNC
    lda #VBLNK64
    sta TIM64T

WaitForVblankEnd
    lda INTIM
    bmi WaitForVblankEnd_Overflow
    bne WaitForVblankEnd
WaitForVblankEnd_Overflow
    lda #0
    sta VBLANK
;NOTE: Don't set COLUBK before VBLANK has been turned off (above)
;      Otherwise you get ugly colors for the first few lines

    ldx #LINES
Kernel
    sta WSYNC
    stx COLUBK
    dex
    bne Kernel

    lda #OVERS64
    sta TIM64T
    lda #2
    sta VBLANK
WaitForOverscanEnd
    lda INTIM
    bmi WaitForOverscanEnd_Overflow
    bne WaitForOverscanEnd
WaitForOverscanEnd_Overflow
    jmp MainLoop

    echo "Bank",7,":", (JMPBank - *), "free"

    org JMPBank + $7000
    rorg JMPBank
;JMPBank
;Jump to fnptr in bank X
;$1FEE-$1FF3
    nop $1FF4,X     ;3 B
    jmp (fnptr)     ;3 B
;$1FF4-$1FFB
    .byte 0,0,0,0
    .byte 0,0,0,0
;$1FFC-1FFF
    .word Start
    .word Start
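
A note on the 19/16 factor in VBLNK64 and OVERS64: a scanline is 76 CPU cycles and TIM64T ticks once every 64 cycles, and 76/64 reduces to 19/16. The following Python snippet is only a quick check of the values the listing should produce, assuming the assembler evaluates the expression left to right with integer division. The `tim64` helper is a name invented for this check.

```python
CYCLES_PER_LINE = 76   # CPU cycles per scanline
CYCLES_PER_TICK = 64   # TIM64T counts down once per 64 cycles

def tim64(lines, adjust=0):
    # Same shape as "LINES*19/16" in the listing: 76/64 reduces to 19/16.
    return lines * 19 // 16 + adjust

for name, vblnk, overscan in (("NTSC", 40, 30), ("PAL", 48, 36)):
    print(name, "VBLNK64 =", tim64(vblnk, -1), "OVERS64 =", tim64(overscan))
```

Under that assumption the NTSC values come out as 46 and 35, and the PAL values as 56 and 42.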

Edited by Tjoppen

Very clever! I only tested it in Stella. What I found is it works, but depends on the BRK vector in the last bank (which is fine). I have modified your code to show this. If the BRK vector is hit in Bank 7 then the colours will scroll instead of being stationary.

 

 

TestBankTjoppen.zip

 

 

I think this was a very clever approach to use BRK in this fashion! The other BRK vectors in the other banks are still free too. :)

 

I also noticed in your code that you put in a dummy byte to have DASM compile the empty space correctly. I have also run into this bug, and used the same solution (even using a .byte $FF), ha ha


Very clever! I only tested it in Stella. What I found is it works, but depends on the BRK vector in the last bank (which is fine). I have modified your code to show this. If the BRK vector is hit in Bank 7 then the colours will scroll instead of being stationary.

 

You seem to imply that BRK would happen randomly? It's under software control, so you can just do your BRK-reliant stuff in one of the other banks, since their BRK vector is free, as you pointed out.

 

Speaking of BRK, why do people use it? Is it to save ROM, since it's slower but smaller than JSR?

 

I also noticed in your code that you put in a dummy byte to have DASM compile the empty space correctly. I have also run into this bug, and used the same solution (even using a .byte $FF), ha ha

 

Yep, it's probably time to switch to that other, better assembler I read about on here.


You seem to imply that BRK would happen randomly? It's under software control, so you can just do your BRK-reliant stuff in one of the other banks, since their BRK vector is free, as you pointed out.

It looks like in Stella it hits BRK every single time. You can see the return address and processor status being pushed on the stack. Every time Stella opened, it seemed to like starting in Bank 0, with the stack pointer at $FF. On a real 2600 I'm not sure if the stack pointer would always be at $FF, and I think the bank selection would be more random.

 

I tried the rom I modified on a Harmony cart today (single image mode), and BRK was hit 50% of the time, and the other 50% it was the Reset vector. Does anyone else get this result?

 

 

 

Speaking of BRK, why do people use it? Is it to save ROM, since it's slower but smaller than JSR?

 

Yes, for saving bytes. I've used this before in a Genesis controller hack of Kung Fu Master. A JMP was being used to go to an address at the end of the ROM to perform a bankswitch. All bankswitching went to the same place, and it occurred in many places in the ROM. So I used BRK, then used three PLAs to reset the stack pointer (I didn't know where it was), and finally did the bankswitch. This added 16-19 cycles to the old code, which luckily could be spared, but saved 16 much-needed bytes by using BRK instead of JMP.

 

 

Yep, it's probably time to switch to that other, better assembler I read about on here.

 

I don't mind DASM. I've never really had big enough trouble with it to want to switch to another assembler. The biggest hitch for me is that all of my disassemblies, and 99% of the rest out there on the web, are meant to assemble with DASM.


Little off topic, but I just had a thought. If you adjusted the stack pointer to one of the color registers you could do three color updates in 7 cycles. And you could also use SEI or CLI, CLV, etc to adjust the color provided by the status register.

 

We know:

 

COLUP0 = $06
COLUP1 = $07
COLUPF = $08
COLUBK = $09
CTRLPF = $0A

 

 

Something like:

    STA COLUBK
    STX COLUPF
    STY COLUP1
    SAX COLUP0
    BRK ; with SP pointing at COLUBK or COLUPF

 

Would do 7 color updates in 19 cycles. However, since CTRLPF is also beside the color registers you might be able to do some amazing color changes by playing with the score bit, and incorporating the ball.
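
To spell out the counting, here is a small Python sketch of what BRK's three pushes would hit when the stack pointer sits on a color register. It relies on the documented 6502 behaviour that BRK pushes the high byte of the return address (BRK address + 2), then the low byte, then the status with the B flag and bit 5 set, and on stack addresses $0100-$017F landing on the TIA. The `brk_writes` helper and the example address $F000 are invented for illustration.

```python
# TIA write addresses quoted in the post above.
TIA = {0x06: "COLUP0", 0x07: "COLUP1", 0x08: "COLUPF", 0x09: "COLUBK", 0x0A: "CTRLPF"}

def brk_writes(sp, brk_addr, status):
    """Registers hit by BRK's three pushes, in order, with the byte written."""
    ret = (brk_addr + 2) & 0xFFFF                    # BRK pushes its own address + 2
    pushed = [ret >> 8, ret & 0xFF, status | 0x30]   # PCH, PCL, P with B and bit 5 set
    return [(TIA[(sp - i) & 0xFF], value) for i, value in enumerate(pushed)]

# Cycle counts: zero-page stores take 3 cycles each, BRK takes 7.
CYCLES = {"STA": 3, "STX": 3, "STY": 3, "SAX": 3, "BRK": 7}
sequence = ["STA", "STX", "STY", "SAX", "BRK"]
total_cycles = sum(CYCLES[op] for op in sequence)
total_writes = 4 + len(brk_writes(0x09, 0xF000, 0x00))

print(total_cycles, total_writes)
print(brk_writes(0x09, 0xF000, 0x00))
```

With the stack pointer at COLUBK the pushes land on COLUBK, COLUPF and COLUP1 in that order. With it at COLUPF they land on COLUPF, COLUP1 and COLUP0.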

Grr I've been saving that one for ages - I've just posted a very early WIP of the game I'm planning to use it in at: http://www.atariage....ng-on-for-ages/

 

Here's an image showing how I'm planning to use it:

post-19935-0-68215300-1331515676_thumb.png

Are you using BRK there? Mind posting some code?

 

 

Screenshot looks beautiful, too. :)

Edited by Omegamatrix

I've only actually got the bit that uses BRK coded on paper - but it's to get 6 colours in 12 pixels. It works like this:

 

P1|P1|PF|PF|PF|PF|P0|P0|M1|M1|P0|P0
--|--|--|--*--|--|--*--|--|--*--|--

 

The * marks where BRK stores to COLUPF, COLUP1 and COLUP0 in succession.

 

Edit: Obviously you have to control the address BRK is called from and set up the processor status appropriately.

 

My notes have 5F, 5D, 5B, 59, 57, 55 for the colours - so it must get called from $5B57 with V,B,and I flags set for P=$54

 

Edit again: Which means I have the colours in the reverse order in the mockup above :) - I think it's probably the mockup that's wrong but my notes aren't very well organised :)

Edited by eshu

Little off topic, but I just had a thought. If you adjusted the stack pointer to one of the color registers you could do three color updates in 7 cycles. And you could also use SEI or CLI, CLV, etc to adjust the color provided by the status register.

I never thought of that - I'll have to cook up an effect based on it for Revision :)


It looks like in Stella it hits BRK every single time. You can see the return address and processor status being pushed on the stack. Every time Stella opened, it seemed to like starting in Bank 0, with the stack pointer at $FF. On a real 2600 I'm not sure if the stack pointer would always be at $FF, and I think the bank selection would be more random.

 

I tried the rom I modified on a Harmony cart today (single image mode), and BRK was hit 50% of the time, and the other 50% it was the Reset vector. Does anyone else get this result?

 

I would be interested in test ROMs and/or definitive results on this. Changing Stella to be more random is easy; I just want to make sure it reflects the actual machine.


I would be interested in test ROMs and/or definitive results on this. Changing Stella to be more random is easy; I just want to make sure it reflects the actual machine.

Note that some demos like Tricade seem to rely on Stella booting in a specific bank and Harmony in the other (hence the two ROMs, AFAICT). In other words, you might want to disable such randomness for certain specific ROMs. If I were you I'd spider the VCS prods on pouet to automatically add all their MD5s to an exemption list (the F8 ones at least).


Note that some demos like Tricade seem to rely on Stella booting in a specific bank and Harmony in the other (hence the two ROMs, AFAICT). In other words, you might want to disable such randomness for certain specific ROMs. If I were you I'd spider the VCS prods on pouet to automatically add all their MD5s to an exemption list (the F8 ones at least).

 

Right, but I was hoping someone else would do that for me, and forward the ROM MD5s :)

 

I'm actually content to do whatever Harmony is doing, so there's only one type of behaviour to consider.


  • 2 weeks later...

Note that some demos like Tricade seem to rely on Stella booting in a specific bank and Harmony in the other (hence the two ROMs, AFAICT). In other words, you might want to disable such randomness for certain specific ROMs. If I were you I'd spider the VCS prods on pouet to automatically add all their MD5s to an exemption list (the F8 ones at least).

 

I investigated this. The tricade and doctor ROMs come in two kinds: the "emulator" and "Real" versions. BOTH versions rely on the ROMs starting up in bank 1. If you start up in bank 0, the demo starts halfway through. I diff'd the two "doctor" ROMs ("emulator" and "real") and the only difference is which address is read in bank 1 to get into bank 0. The "emulator" version reads FFF8, and the "Real" version reads FFF9. Kinda interesting, because the "Real" version should not work at all: it's already in bank 1, so the read just selects bank 1 again and then it crashes. That 1 byte is the only difference between the two ROMs.

 

When the demo switches to bank 0, it also starts executing garbage code before the actual demo code runs. I guess they got it running and didn't investigate why it ran.


In general, it's not a good idea to put code in bankswitch hotspot locations unless you know the hardware on which it will run.

 

There are at least three kinds of hardware I can think of:

 

1. If the hardware is edge-triggered, it will switch before any values are fetched.

2. If the hardware is level-triggered, it will read the value first, then switch.

3. The hardware might ignore the contents (that is, the ROM will not be enabled, and it will not return a proper value at all.)

 

There are further complications. If there are multiple, sequential fetches in hotspots, type 1 will usually switch to the first bank in the sequence and read subsequent data from that bank, and type 2 will switch to the last, reading data from the current bank until the switch.

 

There is one exception: If the contents of hotspots are the same in all banks and you are reading only one byte, types 1-2 will work the same. This doesn't help you with type 3, though.

 

A trick that works in any case is a BRK at $1FF3, the RESET vectors set to $1FF3, and an IRQ vector pointing to Start in bank 1. A BRK does a dummy fetch of $1FF4 (discarding the value), so it will always start in bank 1 automatically regardless of the underlying hardware.
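
The BRK trick can be checked against all three hardware types with a rough Python model. This is only a sketch under stated assumptions: the `boot` helper, the `START` address and the hardware labels are invented here, the floating bus is modelled as a random byte, and the bank selected by $1FF4 is index 0 in the model (the first bank, called bank 1 above).

```python
import random

N_BANKS = 8
START = 0x1234                      # hypothetical address of Start
BRK_AT = 0x1FF3                     # last byte before the F4 hotspots

def make_banks():
    banks = []
    for _ in range(N_BANKS):
        banks.append({BRK_AT: 0x00,                    # BRK opcode in every bank
                      0x1FFC: BRK_AT & 0xFF,           # RESET vector -> $1FF3
                      0x1FFD: BRK_AT >> 8})
    banks[0][0x1FFE] = START & 0xFF                    # IRQ/BRK vector -> Start,
    banks[0][0x1FFF] = START >> 8                      # only needed in the first bank
    return banks

def boot(power_on_bank, hardware):
    banks, bank = make_banks(), power_on_bank

    def read(addr):
        nonlocal bank
        hot = 0x1FF4 <= addr <= 0x1FFB
        old = bank
        if hot:
            bank = addr - 0x1FF4                       # every type ends up switched
        if hot and hardware == "rom_disabled":
            return random.randrange(256)               # floating bus: arbitrary byte
        source = old if hardware == "read_first" else bank
        return banks[source].get(addr, 0xFF)

    pc = read(0x1FFC) | (read(0x1FFD) << 8)            # reset vector
    assert read(pc) == 0x00                            # BRK opcode at $1FF3
    read(pc + 1)                                       # dummy fetch of $1FF4, discarded
    # The three stack pushes go to $01xx and never touch the cart.
    return bank, read(0x1FFE) | (read(0x1FFF) << 8)

for hw in ("switch_first", "read_first", "rom_disabled"):
    print(hw, {boot(b, hw) for b in range(N_BANKS)})
```

Because the byte read from the hotspot is thrown away, the model reaches `START` in the first bank for every combination of power-on bank and hardware type.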


Just a followup - why is there a $4C at $1FFB in all but the last bank?

 

As said, the only way code in hotspots works as expected in most hardware is if there is just one byte at a time (no sequential bytes in hotspots) and all banks have the same data. If there is also a $4C in the last bank, this should correct the issue and it would work on most hardware.
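
To see why the byte at $1FFB in the last bank matters, here is a rough Python model of the boot on the first two hardware types (switch before the read, or read and then switch). It is a sketch, not a description of any real cart: `boot`, `make_banks` and the `START` address are made up for illustration, and only the opcode fetch at $1FFB is treated as a hotspot access.

```python
START = 0x1000      # hypothetical address of Start in bank 7

def make_banks(last_bank_byte):
    lo, hi = START & 0xFF, START >> 8
    jump_stub = {0x1FFB: 0x4C, 0x1FFC: 0xFB, 0x1FFD: 0x1F, 0x1FFE: 0xFB, 0x1FFF: 0x1F}
    last = {0x1FFB: last_bank_byte, 0x1FFC: lo, 0x1FFD: hi, 0x1FFE: lo, 0x1FFF: hi}
    return [dict(jump_stub) for _ in range(7)] + [last]

def boot(power_on_bank, switch_first, last_bank_byte):
    banks, bank = make_banks(last_bank_byte), power_on_bank

    def read(addr):
        nonlocal bank
        old = bank
        if 0x1FF4 <= addr <= 0x1FFB:          # F4 hotspots select banks 0-7
            bank = addr - 0x1FF4
        return banks[bank if switch_first else old][addr]

    pc = read(0x1FFC) | (read(0x1FFD) << 8)   # reset vector
    if pc != 0x1FFB:
        return "reset vector", bank, pc       # booted straight into bank 7
    opcode = read(pc)                         # hotspot read: selects bank 7
    if opcode == 0x4C:                        # JMP abs, operand at $1FFC-$1FFD
        return "JMP", bank, read(0x1FFC) | (read(0x1FFD) << 8)
    if opcode == 0x00:                        # BRK ends up at the IRQ/BRK vector
        return "BRK", bank, read(0x1FFE) | (read(0x1FFF) << 8)
    raise ValueError("unexpected opcode")

for last_byte in (0x00, 0x4C):
    for switch_first in (True, False):
        print(hex(last_byte), switch_first, boot(0, switch_first, last_byte))
```

In this model, a $00 in the last bank makes switch-first hardware execute a BRK and arrive at Start through the BRK vector, while a $4C there makes both types take the JMP.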


3. The hardware might ignore the contents (that is, the ROM will not be enabled, and it will not return a proper value at all.)

Unlikely, since this would require extra chip-enable logic compared to just outputting whatever is in ROM at that position.

A trick that works in any case is a BRK at $1FF3, the RESET vectors set to $1FF3, and an IRQ vector pointing to Start in bank 1. A BRK does a dummy fetch of $1FF4 (discarding the value), so it will always start in bank 1 automatically regardless of the underlying hardware.

Ah, that's rather clever.

Just a followup - why is there a $4C at $1FFB in all but the last bank?

 

As said, the only way code in hotspots works as expected in most hardware is if there is just one byte at a time (no sequential bytes in hotspots) and all banks have the same data. If there is also a $4C in the last bank, this should correct the issue and it would work on most hardware.

You're right, the last bank should have $4C too - I simply reasoned that it might not be required, but your argument is compelling. Works fine on the Harmony though.

I'm actually seeing some kind of strange problem with the F4 ROM I'm fiddling with atm. It's probably just a "normal" bug unrelated to the bankswitching code, but you never know..


3. The hardware might ignore the contents (that is, the ROM will not be enabled, and it will not return a proper value at all.)

Unlikely, since this would require extra chip-enable logic compared to just outputting whatever is in ROM at that position.

F4 with conventional hardware already has a chip enable (inverted A12, usually) and it would only require extra logic inside the PLD, but generally this sort of logic is "free" so it would be possible. The reason why it might be done is because some legacy games write to bankswitch hotspots and disabling the ROM would avoid the output contention.

 

In an early version of Harmony bankswitching, I floated the bus because of this very reason but eventually decided against doing so because some homebrews do put data in hotspots. I figured the chance of damage was low.


  • 2 weeks later...

Little off topic, but I just had a thought. If you adjusted the stack pointer to one of the color registers you could do three color updates in 7 cycles. And you could also use SEI or CLI, CLV, etc to adjust the color provided by the status register.

I never thought of that - I'll have to cook up an effect based on it for Revision :)

 

I've played around with BRK after this came up. It's interesting because it made me think of things in different ways. For example, an inline JMP makes it really easy to change the program counter's high byte:

 

;current address $F3xx
    JMP .next-$2000 ; go to the next instruction in code as if there was no jump at all
.next:
;address is now $D3xx

 

You can also jump into RAM to get $00xx as the high address for a black color, or into any of the other RIOT RAM mirrors such as $20xx, $40xx, etc. Running in RIOT RAM limits the low address thrown on the stack by BRK ($82-$FF, $00, $01), but it seems a fair trade-off.
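
The mirror trick works because the 2600's CPU only brings out 13 address lines, while the program counter that BRK pushes is still 16 bits wide. Here is a small Python sketch of that, with `bus_address` and `brk_high_byte` as made-up helper names and $F310 and $2082 as arbitrary example addresses:

```python
def bus_address(pc):
    return pc & 0x1FFF              # only A0-A12 reach the bus

def brk_high_byte(pc):
    return ((pc + 2) >> 8) & 0xFF   # high byte BRK would push from this PC

before = 0xF310
after = before - 0x2000             # what "JMP .next-$2000" does to the PC

print(hex(bus_address(before)), hex(bus_address(after)))      # same ROM byte
print(hex(brk_high_byte(before)), hex(brk_high_byte(after)))  # different high bytes
print(hex(bus_address(0x2082)))                                # a RAM mirror
```

Both addresses decode to the same ROM location, but a BRK executed there would push $F3 in one case and $D3 in the other.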

 

I realized now that I don't have to use SEC or CLC as the bit 0 is never used in the color registers. PLP might be a better option.

 

;SP at some RIOT ram location
    LDX #COLUBK
    PLP
    TXS ; doesn't affect status register
    BRK

 

 

Anyhow I made a small demo that is really just some playing around. In the demo I'm trying to keep the background color the same while doing color updates to pixels as quick as I can. I got 5 color updates in 5 pixels, but couldn't find a way to do 6 in 6 or more.

 

 

TestColors.zip
