Anyone think Ballblazer is possible on the 2600?

GroovyBee · December 14, 2011

The most time consuming part is the calculation of the 2d coordinates. That uses about 400 cycles.

Maybe this is a good candidate for speed optimisation?

roland p · December 14, 2011

Maybe, I have to study it a bit more. I'm already glad it works.

;CALC_2d
;
;input:  OBJECT_X3d OBJECT_Y3d
;output: OBJECT_X2d OBJECT_Y2d OBJECT_SIZE

;First calc Y2d:
;Y2d = (128 + <Y3d >> 1) >> >Y3d
;
CALC_2d
LDA #0	  ;2
STA OBJECT_Y2d	;3

LDX OBJECT_Y3d + 1   ;3
CPX #11	  ;2
BCC NOT_TOO_FAR	;2/3

LDA #-5	  ;2 object is too far away.
STA OBJECT_Y2d + 1   ;3
LDA #34
STA OBJECT_SIZE
RTS	   ;6
NOT_TOO_FAR
LDA DIV_JUMP_TABLE_HIGH,X ;4
PHA	   ;3
LDA DIV_JUMP_TABLE_LOW,X ;4
PHA	   ;3
LDY #4	  ;OBJECT_SIZE

LDA OBJECT_Y3d	;3
LSR	   ;2
EOR #$FF	 ;2  a = 255 - (<Y3d >> 1)

RTS	   ;6 jump to DIV_...

DIV_FAR	   ;$0000...$0080
SEC
SBC SUBSTRACTION_TABLE,X
STA OBJECT_Y2d
LDA #0
STA OBJECT_Y2d + 1
LDA OBJECT_SIZES,X
STA OBJECT_SIZE
LDA CORRECTION_TABLE,X
STA CORRECTION

JMP CALC_X2d
DIV_5	   ;$0080...$0100
STA OBJECT_Y2d
LDA #0
STA OBJECT_Y2d + 1

LDA #5
STA OBJECT_SIZE
LDA #14
STA CORRECTION

JMP CALC_X2d

DIV_4
LSR	  ;2  128...255  >> 7  = $0100...$0200
ROR OBJECT_Y2d   ;5
DIV_3
LSR	  ;2  128...255  >> 6  = $0200...$0400
ROR OBJECT_Y2d   ;5
DIV_2
LSR	  ;2  128...255  >> 5  = $0400...$0800
ROR OBJECT_Y2d   ;5
DIV_1
LSR	  ;2  128...255  >> 4  = $0800...$1000
ROR OBJECT_Y2d   ;5
DIV_0
LSR	  ;2  128...255  >> 3  = $1000...$2000
ROR OBJECT_Y2d   ;5
LSR	  ;2  128...255  >> 2  = $2000...$4000
ROR OBJECT_Y2d   ;5
LSR	  ;2  $8000...$FF00  >> 1 = $4000...$8000
ROR OBJECT_Y2d   ;5
STA OBJECT_Y2d + 1  ;3

LDA OBJECT_Y2d
SEC
SBC #$80
LDA OBJECT_Y2d + 1
SBC #0
CLC
ADC #5
STA OBJECT_SIZE

LDA #14
STA CORRECTION

;
;X2d = X3d * (Y2d * 3 + correction)
;
CALC_X2d
LDX OBJECT_Y2d + 1  ;3
LDA OBJECT_Y2d   ;3
ASL	  ;2
BCC NO_ADD	;2/3
INX	  ;2
NO_ADD
CLC	  ;2
ADC OBJECT_Y2d   ;3
STA TEMP	;3 TEMP = <OBJECT_Y2d * 3

STA FAC_LOW_RESULT_LOW_PLUS	 ;set zp addresses
STA FAC_LOW_RESULT_HIGH_PLUS
EOR #$ff
STA FAC_LOW_RESULT_LOW_MINUS
STA FAC_LOW_RESULT_HIGH_MINUS

TXA	  ;2
ADC OBJECT_Y2d + 1  ;3
ADC OBJECT_Y2d + 1  ;3
ADC CORRECTION   ;3
STA TEMP + 1   ;3 TEMP + 1 = >OBJECT_Y2d * 3 + CORRECTION

STA FAC_HIGH_RESULT_LOW_PLUS	 ;set zp addresses
STA FAC_HIGH_RESULT_HIGH_PLUS
EOR #$ff
STA FAC_HIGH_RESULT_LOW_MINUS
STA FAC_HIGH_RESULT_HIGH_MINUS

;        AB    (TEMP+1, TEMP)
;        CD *  (OBJECT_X3d+1, OBJECT_X3d)
;    ------
;        HL    (B*D)  byte TEMP   * byte OBJECT_X3d   = AAaa
;       HL     (A*D)  byte TEMP+1 * byte OBJECT_X3d   = BBbb
;       HL     (B*C)  byte TEMP   * byte OBJECT_X3d+1 = CCcc
;      HL      (A*C)  byte TEMP+1 * byte OBJECT_X3d+1 = DDdd
;
;Sum of the partial products, each shifted to its byte position:
;      AAaa
;    BBbb
;    CCcc
;  DDdd      +
;
;TEMP * OBJECT_X3d   = AAaa
LDY OBJECT_X3d
SEC
; LDA (FAC_LOW_RESULT_LOW_PLUS),y	;Lowest byte of result not needed
; SBC (FAC_LOW_RESULT_LOW_MINUS),y
; STA PRODUCT		
LDA (FAC_LOW_RESULT_HIGH_PLUS),y
SBC (FAC_LOW_RESULT_HIGH_MINUS),y
STA AA

;TEMP + 1 * OBJECT_X3d  = BBbb
SEC
LDA (FAC_HIGH_RESULT_LOW_PLUS),y
SBC (FAC_HIGH_RESULT_LOW_MINUS),y
STA bb
LDA (FAC_HIGH_RESULT_HIGH_PLUS),y
SBC (FAC_HIGH_RESULT_HIGH_MINUS),y
STA BB

LDY OBJECT_X3d + 1
;TEMP * OBJECT_X3d + 1  = CCcc
SEC
LDA (FAC_LOW_RESULT_LOW_PLUS),y
SBC (FAC_LOW_RESULT_LOW_MINUS),y
STA cc
LDA (FAC_LOW_RESULT_HIGH_PLUS),y
SBC (FAC_LOW_RESULT_HIGH_MINUS),y
STA CC

;TEMP + 1 * OBJECT_X3d + 1 = DDdd
SEC
LDA (FAC_HIGH_RESULT_LOW_PLUS),y
SBC (FAC_HIGH_RESULT_LOW_MINUS),y
STA dd
LDA (FAC_HIGH_RESULT_HIGH_PLUS),y
SBC (FAC_HIGH_RESULT_HIGH_MINUS),y
STA PRODUCT + 3

CLC
LDA AA
ADC bb
STA PRODUCT+1
LDA BB
ADC CC
STA PRODUCT+2
BCC SKIP1
INC PRODUCT+3   ;carry out of byte 2
CLC
SKIP1
LDA cc
ADC PRODUCT+1
STA PRODUCT+1
LDA dd
ADC PRODUCT+2
STA PRODUCT+2
BCC SKIP2
INC PRODUCT+3   ;carry out of byte 2
SKIP2
;Take care of signed OBJECT_X3d
LDA OBJECT_X3d + 1
BPL NOT_NEG
SEC
LDA PRODUCT+2
SBC TEMP+0
STA PRODUCT+2
LDA PRODUCT+3
SBC TEMP+1
STA PRODUCT+3
NOT_NEG
CLC	   ;MAYBE THIS PART CAN BE REMOVED
LDA PRODUCT+1
ADC #$80
STA OBJECT_X2d

LDA PRODUCT+2
ADC #0
STA OBJECT_X2d + 1

LDA PRODUCT+3
ADC #0
STA OBJECT_X2d + 2

RTS

I already see I could replace "PRODUCT" (result of multiplication) with OBJECT_X2d.

I remember I was pretty exhausted when I finished this one...

FAC_LOW_RESULT_LOW_PLUS etc. are pointers to tables with squares. I got this from an online C64 magazine.
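
For readers unfamiliar with the "tables with squares" trick, it is usually called quarter-square multiplication. The following is a small Python model rather than 6502 code. It assumes the tables hold floor(n*n/4) and that the routine relies on the identity a*b = f(a+b) - f(a-b); the thread does not show the tables, so treat that as an assumption. The names QUARTER_SQUARES, mul8 and mul16 are made up for this sketch.

```python
# Quarter-square table: f(n) = floor(n*n / 4) for n = 0..510 (assumed layout).
QUARTER_SQUARES = [(n * n) // 4 for n in range(511)]


def mul8(a, b):
    """Unsigned 8x8 -> 16-bit product: two table lookups and one subtraction."""
    assert 0 <= a <= 255 and 0 <= b <= 255
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]


def mul16(x, y):
    """Unsigned 16x16 -> 32-bit product from four 8x8 partial products."""
    x_lo, x_hi = x & 0xFF, x >> 8
    y_lo, y_hi = y & 0xFF, y >> 8
    return (mul8(x_lo, y_lo)             # AAaa, byte position 0
            + (mul8(x_hi, y_lo) << 8)    # BBbb, byte position 1
            + (mul8(x_lo, y_hi) << 8)    # CCcc, byte position 1
            + (mul8(x_hi, y_hi) << 16))  # DDdd, byte position 2
```

The identity is exact because a+b and a-b always have the same parity, so the rounding in the two table entries cancels. The listing skips the lowest result byte, which this model does not imitate.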

Edited December 14, 2011 by roland p

GroovyBee · December 15, 2011

Could you provide the theory behind the code?

roland p · December 15, 2011

I'm not that good at explaining things, but I'll try.

This routine calculates the screen coordinates (X2d & Y2d) and an OBJECT_SIZE from the 3d coordinates of an object (X3d, Y3d). These coordinates are relative to the player. X3d=0 would indicate the object is exactly in front of or behind the player.

This is all very 'pseudo 3d'.

Calculation of the y2d coordinate (this is the weirdest part):

The y2d calculation only needs the y3d coordinate. The y3d coordinate is a 16-bit value. The y2d value is also 16-bit: the low byte is used for precision when calculating x2d, and the high byte indicates the scanline (0 is the horizon, 23 is the last line of the checkerboard kernel).

It first takes the low byte of y3d, inverts it, divides it by 2 and adds $80 (the code does the equivalent: LSR, then EOR #$FF).

In other words, the 0...$FF range becomes the $FF...$80 range.

An object at y3d = $0000 would be displayed at scanline $FF. This value is too big to display, so I always LSR it 3 times to make it smaller, and the range becomes $1F...$10 ($1F = 31; the checkerboard is 24 scanlines high).

For objects further away, it uses the high byte of y3d to LSR this value even more, so farther objects get a lower value.

Some examples of y3d to y2d coordinates:

y3d              y2d
$0000...$00FF    $1F00...$1000
$0100...$01FF    $0F00...$0800
$0200...$02FF    $0700...$0400
$0300...$03FF    $0300...$0200
$0400...$04FF    $0100...$0000

If an object is further away, it stops LSR'ing and just returns an OBJECT_SIZE value to indicate that the object should be drawn smaller.
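
The y2d steps described above can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not the 6502 routine: it assumes the jump table sends high byte n of y3d to the entry point with 3 + n shifts (the table itself is not shown in the thread), and the names calc_y2d, scanline, NEAREST_SHIFTS and MAX_BAND are invented for illustration.

```python
NEAREST_SHIFTS = 3   # every object is shifted right at least 3 times
MAX_BAND = 4         # assumed: y3d high bytes 0..4 take the shifting path


def calc_y2d(y3d):
    """Return the 16-bit y2d (scanline in the high byte) for a 16-bit y3d."""
    lo = y3d & 0xFF
    hi = y3d >> 8
    if not 0 <= hi <= MAX_BAND:
        raise ValueError("object too far away for the shifting path")
    a = 0xFF - (lo >> 1)                      # 0...$FF becomes $FF...$80
    return (a << 8) >> (NEAREST_SHIFTS + hi)  # 16-bit LSR/ROR chain


def scanline(y2d):
    """The high byte of y2d is the scanline below the horizon."""
    return y2d >> 8
```

For example, calc_y2d(0x0000) gives $1FE0 (scanline 31) and calc_y2d(0x00FF) gives $1000 (scanline 16); the table above lists the ranges rounded to whole high bytes.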

Calculation of the x2d coordinate:

This is done with the formula X2d = X3d * (Y2d * 3 + correction)

It multiplies y2d by 3 because the diagonals of the checkerboard grow 3 pixels each scanline.

The correction is added because otherwise all values at scanline 0 would result in x2d = 0. The correction is the width of a tile at the horizon. For objects beyond the horizon, this value becomes smaller.
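
A hedged Python model of the x2d formula follows. It assumes, as the listing suggests, that the correction is added to the high byte of Y2d * 3, that X3d is a signed 16-bit value, and that the rounded upper 16 bits of the product are what end up in OBJECT_X2d+1 and OBJECT_X2d+2. The function name calc_x2d and the default correction of 14 (the value the listing loads for near objects) are choices made for this sketch, and the small precision loss from dropping the lowest product byte is ignored.

```python
def calc_x2d(x3d, y2d, correction=14):
    """x3d: signed 16-bit, y2d: 16-bit with the scanline in the high byte."""
    # TEMP in the listing: 3 pixels of growth per scanline plus the
    # tile width at the horizon, kept as an 8.8 fixed-point factor.
    factor = y2d * 3 + (correction << 8)
    product = x3d * factor
    # Adding $80 at byte 1 rounds the result before the low bytes are dropped.
    return (product + 0x8000) >> 16
```

With x3d = $0100, the result is 14 at the horizon (y2d = $0000) and 17 one scanline lower (y2d = $0100), which shows both the correction and the growth of 3 pixels per scanline.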

Thanks for reading

GroovyBee · December 15, 2011

Looks like you could get some of this into tables if you have the ROM space.

roland p · December 15, 2011

It uses 2kB for the multiplication.

I already reduced the precision of the multiplication so it's already getting faster

I hope the lsr/ror'ing can be optimised some more.

GroovyBee · December 15, 2011

In the formula :- X2d = X3d * (Y2d * 3 + correction)

Can "Y2d*3 + correction" not be simplified to the addition/subtraction of a 16 bit variable every time you update Y2d? That'd save the multiplication and add.

roland p · December 18, 2011

But in the routine above, y2d is updated (RORed) 7 times worst case.

I'm thinking of dropping the framerate to 30fps. So with the screen running at 60fps, one frame will be dedicated to game logic, the other frame will be dedicated to on screen calculations.

+Gemintronic · December 19, 2011

Full disclosure: I have no idea what I'm talking about.

That being said, instead of halving the framerate, how about one frame be a "guess" at the next rendering values needed thus saving time for logic? I bet the 2600 would be overburdened just coming up with the predicted values needed for the next frame though.. ugh.

roland p · December 20, 2011

I've now changed the game logic to this (you probably triggered the idea):

even frames: calculate all game logic, acceleration, friction, collisions, add speed to current position of drone.

odd frames: skip game logic, only add speed to current position of drone. Calculate positions etc. of objects for display.

That way, the drone moves every 1/60s and you still get 60fps movement of the checkerboard, with 30fps movement of the other objects.
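
As a rough illustration of the even/odd split described above, here is a tiny Python sketch. The functions game_logic and display_calc are placeholders, and the dictionary fields are hypothetical; only the scheduling is meant to mirror the post.

```python
def game_logic(drone):
    # Placeholder for acceleration, friction and collisions (hypothetical).
    drone["speed"] += drone["accel"]


def display_calc(drone, display):
    # Placeholder for the pseudo-3d screen position calculation.
    display["drone_x"] = drone["pos"]


def run_frame(frame, drone, display):
    if frame % 2 == 0:
        game_logic(drone)             # even frame: full game logic
    drone["pos"] += drone["speed"]    # every frame: move the drone
    if frame % 2 == 1:
        display_calc(drone, display)  # odd frame: display positions


drone = {"pos": 0, "speed": 0, "accel": 1}
display = {"drone_x": 0}
for frame in range(4):
    run_frame(frame, drone, display)
```

The position changes on every frame, while the speed and the displayed object positions each change on every second frame.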

Edited December 20, 2011 by roland p

roland p · December 24, 2011

I've now added 2 goalbeams. I optimised the gamelogic a bit (including more border-collision-detection now) so it fits in the overscan area.

The pseudo-3d-calculations are not 'optimised' now, optimisation made it look less smooth so I want to wait with that.

Sprites look a bit screwed up when moving too much to the left/right, that's because it takes a few days to correct it and I'm lazy.

All is still running at 60fps, about 1250 cycles left in screenblanking area.

ballblazer_20111224.bin

Edited December 24, 2011 by roland p

+Stephen · December 25, 2011

Still looking great! I hope you can keep the 60Hz screen, but if not, totally understandable. It's hard to believe this is running on a stock 2600!

enthusi · January 5, 2012

Ah, VERY nice and smooth.

However, I would drop 60 FPS anytime if that allows for better progress/gameplay etc ;-)

But it appears to be very far already - splendid!

Best of luck with that gemstone of yours.

Really looking forward to it.

enthusi

roland p · January 10, 2012

Thanks!

- I've now dropped the sprites and the game-logic to 30fps. But I now interpolate the playfield so it still runs in that smooth 60Hz!

- I also updated the pseudo-3d routines. The movement of the horizontal lines of the checkerboard was too linear. I've corrected this with a table that makes the movement more parabolic. It now has more of the smoothness of the 7800 version.

- Rotofoil in the lower viewport is now displayed correctly.

- More goalbeams!

- I moved some of the sprite-position code to the vblank area. So less spare-cycles there, but more screen real estate.

- masking of rotofoil not 100% finished yet...

ballblazer_20120110.bin

enthusi · January 11, 2012

Weee ;-)

Keep it coming hehe.

Let me/us know when you need tests or bug reports.

From my own experience, they'd sure be annoying right now.

Cheers,

enthusi

roland p · January 11, 2012

At this moment, they aren't really needed because I know there are a lot of bugs in it. Reports/critique about gameplay/speed/etc. are welcome.

I now want to put the 'plasmorb' in it. I considered using the Playfield registers, but that's probably way too time consuming. At this moment, I have 30 (possibly more) spare cycles left in the sky kernel, where the ball goes, so I'll probably use the ball sprite for the plasmorb. It will be a bit smaller than the plasmorb in the original Ballblazer, unless I use flickering, but it's nicer not to have flicker...

enthusi · January 11, 2012

Though the plasma(!) orb could do with some flicker, I'd guess as well that a smaller ball would be nice. Atari-gfx is rather clean and well colored, so the ball would not easily be missed during gameplay.

A certain flicker might give a nice effect, however. Maybe some slight horizontal jitter left/right with huge overlap at 60 Hz?

Ed Fries · January 11, 2012

This is looking amazing. Please keep working on it!

wvoutlaw2k · April 19, 2012

Wow!

I just tried this in Stella on my MacBook. Looks great! Plus, it seems like it could be played with the Trak-Ball.

Godzilla · April 19, 2012

I can't wait to see where this goes, it really is impressive imho

roland p · April 20, 2012

Thanks for the comments! I'll pick up the project again soon, I took a sort of break.

RevEng · April 20, 2012

Take the break you need and then a little more; coding burnout isn't pretty, and we can be patient.

But please do eventually return, because this is a masterpiece and it would be a shame for it to not be finished.

Keatah · April 20, 2012

Agreed! Take the summer off (or winter), and come back fresh with new optimization ideas.

Godzilla · November 28, 2012

bump

roland p · January 6, 2013

Sorry, not much of an update at the moment. I'm trying to pick up the project again and wrap my head around it.

Last thing I was working on is a sort of time management system. Sounds fancy, but it works more or less like this:

Frame 1: Calculate game logic (process speed/friction/collisions etc.)

Frame 2: Calculate Screen positions of rotofoil 1/2

Frame 3: Calculate Screen positions of goalbeams

Frame 4: Calculate Screen positions of plasmorb

So everything is updated every 4 frames, which will be choppy, so I'm now experimenting with delta values. When a rotofoil is at position 0 (which is calculated at frame 2), and the next time (at frame 6) it is at position 40, I calculate a delta value (pos2 - pos1)/4 = (40-0)/4 = 10. So at every frame I can interpolate the onscreen position by adding 10 every time. Of course, this consumes a lot of memory since I need to have precise values (at least 2 bytes for every 2 rotofoils, 4 goalbeams, 2 plasmorbs, 2 playfields x 2 axes = 48 bytes)...
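
The delta idea can be sketched in Python as follows. This is a model under assumptions, not the game's code: positions are kept as 8.8 fixed point (one of several ways to get the "precise values" mentioned above), and the class name Interpolated and its methods are invented for the example.

```python
FRAMES_PER_UPDATE = 4  # each object gets a fresh position every 4 frames


class Interpolated:
    """One axis of one object, stored as 8.8 fixed point (2 bytes)."""

    def __init__(self, pos=0):
        self.pos = pos << 8
        self.delta = 0

    def set_target(self, target):
        # (pos2 - pos1) / 4; on a 6502 this would be two arithmetic shifts.
        self.delta = ((target << 8) - self.pos) // FRAMES_PER_UPDATE

    def step(self):
        # Called once per frame: add the delta, return the whole-pixel part.
        self.pos += self.delta
        return self.pos >> 8


rotofoil_x = Interpolated(0)
rotofoil_x.set_target(40)
path = [rotofoil_x.step() for _ in range(FRAMES_PER_UPDATE)]
```

Here path comes out as [10, 20, 30, 40], matching the example in the post. The fractional byte matters for small moves: going from 40 to 42 gives a delta of half a pixel per frame.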

Also, the game logic has to fit in one frame. The game logic will mostly consist of collision detection: rotofoils vs wall, and rotofoils vs each other. And it is possible that, in one frame, rotofoil a collides with rotofoil b, rotofoil b hits a wall, bounces back and hits rotofoil a again. This has to be checked for both axes.

Of course, this is all a top-down approach, which isn't exactly considered best practice. In real life (when creating web applications, which I do for a living) I would create the logic together with a simplistic view (GUI), make sure the application does what it needs to do, and make it pretty afterwards.

So if I would make ballblazer this way, I would create a simple 2d playfield by just using the playfield pixels, and use square dots for rotofoils. And afterwards, make it pretty and try to do it in 3d...
