
Anyone think Ballblazer is possible on the 2600?


767 replies to this topic

#626  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Thu Sep 2, 2010 7:39 AM

Cybergoth, on Thu Sep 2, 2010 7:25 AM, said:

roland p, on Thu Sep 2, 2010 7:06 AM, said:

- horizontal (X) position of the sprite. Now that's a multiplication routine that uses squares.

Maybe try them one after another. It seems that the x position only depends on the vertical and horizontal distance between the two players?
Yes.

First I calculate an X2d position for Z-high = 0 (Z-low can be 0...255). That's a position for the ship on the first row of tiles in front of you. I then LSR this value Z-high times: the higher Z-high is, the closer X2d gets to 0 (the center of the screen).
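A minimal Python model of the shrinking step just described, assuming X2d is handled as an unsigned 16-bit magnitude (the sign is assumed to be dealt with separately) and that the high byte of Z3d is the tile row. `lsr16` and `project_x` are hypothetical names, not roland p's code.

```python
# Hypothetical model: x2d_row0 is the unsigned 16-bit X2d magnitude for
# Z-high = 0; the sign is assumed to be handled separately.
def lsr16(value: int, count: int) -> int:
    """Shift a 16-bit value right `count` times (one LSR/ROR pair per step)."""
    value &= 0xFFFF
    for _ in range(count):
        value >>= 1
    return value


def project_x(x2d_row0: int, z3d: int) -> int:
    """Move the row-0 position towards the screen center once per tile of depth."""
    z_high = (z3d >> 8) & 0xFF
    return lsr16(x2d_row0, z_high)


print(hex(project_x(0x1200, 0x0000)))  # same tile row: unchanged
print(hex(project_x(0x1200, 0x0300)))  # three tiles further: halved three times
```

On a 6502 each step costs an LSR on the high byte plus a ROR on the low byte, so the cost grows linearly with Z-high.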

#627  

    Quadrunner

  • 8,107 posts
  • Joined: 14-May 01
  • This is Sparta!
  • Location:Bavaria

Posted Thu Sep 2, 2010 7:58 AM

roland p, on Thu Sep 2, 2010 7:39 AM, said:

First I calculate an X2d position for Z-high = 0 (Z-low can be 0...255). That's a position for the ship on the first row of tiles in front of you. I then LSR this value Z-high times: the higher Z-high is, the closer X2d gets to 0 (the center of the screen).

So that's a pretty fast task already I assume, costing maybe a scanline to compute?

#628  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Thu Sep 2, 2010 8:02 AM

There is also multiplication involved, and the values are 16 bits wide. But it's a good start to go back to the code and optimise this further, probably by making the multiplication less accurate but faster.

#629  

    Quadrunner

  • 8,107 posts
  • Joined: 14-May 01
  • This is Sparta!
  • Location:Bavaria

Posted Thu Sep 2, 2010 8:10 AM

It would be cool if you could post this snippet tonight. It sounds interesting enough to have a look :)

#630  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Thu Sep 2, 2010 1:55 PM

Cybergoth, on Thu Sep 2, 2010 8:10 AM, said:

It would be cool if you could post this snippet tonight. It sounds interesting enough to have a look :)

Hi,

I need some more time to understand my own code :D

I think it does TEMP * TEMP2, where TEMP is 8-bit and TEMP2 is 16-bit. I see X2d_FRACT, which could be skipped for increased speed.

Here it is, sorry for the mess:

	;
	;TEMP * TEMP2 (holding X3d), 24-bit result in X2d_FRACT, X2d, X2d+1
	;quarter squares: a*b = g(a+b) - g(|a-b|), g(x) = x*x/4
	;
	
	LDA #>SQUARES_H
	STA ZP1+1
	
	LDA TEMP		;number1
	STA ZP1
	LDA TEMP2			;number2
	
	TAY
	SEC
	SBC TEMP
	
	BCS .CONT
	EOR #$FF
	CLC
	ADC #$01
.CONT	
	TAX
	LDA (ZP1),Y	;g(Y+A)
	SEC
	SBC SQUARES_H,X	;-g(Y-A)
	
	STA X2d ;RESULT + 1
	
	
	
	
	
	LDA #>SQUARES_L
	STA ZP1+1
	
	LDA (ZP1),Y	;g(Y+A)
	SEC
	SBC SQUARES_L,X	;-g(Y-A)
	
	STA X2d_FRACT
	BCS .CONT2
	DEC X2d		;RESULT + 1
.CONT2


	; CALC >NUMBER2 * NUMBER1

	LDA #>SQUARES_H
	STA ZP1+1

	LDA TEMP2 + 1
	
	TAY

	SEC
	SBC TEMP
	
	BCS .CONT3
	EOR #$FF
	CLC
	ADC #$01
.CONT3
	TAX
	LDA (ZP1),Y	;g(Y+A)
	SEC
	SBC SQUARES_H,X	;-g(Y-A)

	STA X2d+1		;RESULT+2
	
	
	
	
	
	
	
	LDA #>SQUARES_L
	STA ZP1+1
	
	LDA (ZP1),Y	;g(Y+A)
	SEC
	SBC SQUARES_L,X	;-g(Y-A)
	
	BCS .CONT4
	DEC X2d+1	;RESULT + 2
.CONT4
	CLC
	ADC X2d		;RESULT + 1
	STA X2d		;RESULT + 1
	
	BCC .CONT5
	INC X2d+1	;RESULT + 2
.CONT5	
	
	
	; X3d negative? Then negate the 24-bit result
	LDA X3d+1
	BPL .NO_NEG2
	LDA #0
	SEC
	SBC X2d_FRACT
	STA X2d_FRACT
	LDA #0
	SBC X2d
	STA X2d
	LDA #0
	SBC X2d + 1
	STA X2d + 1
	
.NO_NEG2	
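The comments `g(Y+A)` and `-g(Y-A)` point to quarter-square multiplication: a*b = g(a+b) - g(|a-b|) with g(x) = floor(x*x/4), which is exact because a+b and a-b always have the same parity. Below is a Python model of the routine, assuming SQUARES_L/SQUARES_H hold the low and high bytes of g(x) for x = 0..510 and that TEMP2 holds the magnitude of X3d (suggested by the sign fix-up at `.NO_NEG2`, but not confirmed in the thread).

```python
# Assumed table contents: g(x) = floor(x*x/4) for x = 0..510, split into bytes.
QUARTER_SQUARES = [x * x // 4 for x in range(511)]
SQUARES_L = [v & 0xFF for v in QUARTER_SQUARES]
SQUARES_H = [v >> 8 for v in QUARTER_SQUARES]


def mul8x8(a: int, b: int) -> int:
    """a*b = g(a+b) - g(|a-b|); low bytes first, borrow into the high bytes."""
    s, d = a + b, abs(a - b)          # the (ZP1),Y index and the X index
    low = SQUARES_L[s] - SQUARES_L[d]
    borrow = 1 if low < 0 else 0      # the BCS / DEC X2d step
    high = SQUARES_H[s] - SQUARES_H[d] - borrow
    return (high << 8) | (low & 0xFF)


def mul8x16(temp: int, temp2: int) -> int:
    """24-bit TEMP * TEMP2 built from two 8x8 products, like the routine above."""
    low_product = mul8x8(temp, temp2 & 0xFF)   # -> X2d_FRACT, X2d
    high_product = mul8x8(temp, temp2 >> 8)    # -> X2d, X2d+1
    return (low_product + (high_product << 8)) & 0xFFFFFF


def negate24(value: int) -> int:
    """The sign fix-up before .NO_NEG2: 0 - value over three bytes."""
    return (-value) & 0xFFFFFF


result = mul8x16(100, 0x0234)
print(result & 0xFF, (result >> 8) & 0xFF, result >> 16)  # X2d_FRACT, X2d, X2d+1
```

The table index a+b can reach 510, which is why the (ZP1),Y access is used for the sum: it can run past the end of the first page of the table.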

This probably doesn't make any sense. I think the math behind the 3d>2d was something like:

	;X2d = ((128 - <Z3d >>> 2) * X3d) * 3/4 >>> >Z3d + 21*X3d
	;y2d = (128 - <Z3d >>> 2) >>> >Z3d

The 3/4 part compensates for the first diagonals on the board, which aren't 45 degrees. The 21*X3d is there because tiles at the horizon are 21 pixels wide.
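A sketch that evaluates the two comment formulas, under assumed notation (the thread later asks what `>>>` means, so this is a guess): `<v` is the low byte, `>v` the high byte, `v >>> n` shifts right n times (as in "I'll LSR this value Z-high times"), `128 - <Z3d >>> 2` is read as `128 - (<Z3d >>> 2)`, 3/4 truncates, and X3d is non-negative. Function names are hypothetical.

```python
def lo(value: int) -> int:
    return value & 0xFF            # <value


def hi(value: int) -> int:
    return (value >> 8) & 0xFF     # >value


def project(x3d: int, z3d: int):
    """Evaluate the X2d / Y2d comment formulas under the assumed notation."""
    width = 128 - (lo(z3d) >> 2)                        # 65..128
    x2d = (((width * x3d) * 3 // 4) >> hi(z3d)) + 21 * x3d
    y2d = width >> hi(z3d)
    return x2d, y2d


print(project(0x0100, 0x0000))   # one tile to the side, nearest row
print(project(0x0100, 0x0100))   # same, one tile further away
```

The explicit parentheses matter in Python, where `+` binds tighter than `>>`.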

#631  

    Thrust, Jammed, SWOOPS!

  • 16,625 posts
  • Joined: 25-April 01
  • Always left from right here!
  • Location:Düsseldorf, Germany

Posted Thu Sep 2, 2010 2:12 PM

Looks optimizeable. How many cycles do you need?

#632  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Thu Sep 2, 2010 2:24 PM

I haven't counted the cycles yet. I'll continue tomorrow.

#633  

    Quadrunner

  • 8,107 posts
  • Joined: 14-May 01
  • This is Sparta!
  • Location:Bavaria

Posted Thu Sep 2, 2010 2:57 PM

roland p, on Thu Sep 2, 2010 1:55 PM, said:

I think the math behind the 3d>2d was something like:

	;X2d = ((128 - <Z3d >>> 2) * X3d) * 3/4 >>> >Z3d + 21*X3d
	;y2d = (128 - <Z3d >>> 2) >>> >Z3d

The 3/4 part compensates for the first diagonals on the board, which aren't 45 degrees. The 21*X3d is there because tiles at the horizon are 21 pixels wide.

X * 3/4 should be trivial, something like:

LDA X
LSR
STA tmp
LSR
CLC
ADC tmp
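A quick Python model of the shift-and-add above, assuming an 8-bit input: x/2 + x/4 never exceeds 190, so the ADC cannot overflow, and because both halves are truncated the result is at most one below floor(3x/4).

```python
def three_quarters(x: int) -> int:
    """x/2 + x/4 for an 8-bit x, as LSR / STA tmp / LSR / CLC / ADC tmp does."""
    half = (x & 0xFF) >> 1
    quarter = half >> 1
    return half + quarter          # at most 127 + 63 = 190, so no carry out


print([three_quarters(x) for x in (4, 7, 100, 255)])
```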

What is the value range of X3d?

#634  

    Thrust, Jammed, SWOOPS!

  • 16,625 posts
  • Joined: 25-April 01
  • Always left from right here!
  • Location:Düsseldorf, Germany

Posted Thu Sep 2, 2010 3:23 PM

Optimization for 1st part, saves 10 cycles. Untested!
        ;
        ;TEMP * X3d result in X2d
        ;          
        LDA #>SQUARES_L     ; 2
        STA ZP1+1           ; 3
        LDA TEMP            ; 3         number1
        STA ZP1             ; 3 = 11
        LDA TEMP2           ; 3         number2        
        TAY                 ; 2
        SEC                 ; 2
        SBC TEMP            ; 3 
        BCS .CONT           ; 2/3
        EOR #$FF            ; 2
        ADC #$01            ; 2
        SEC                 ; 2
.CONT                       ;   = 15.5
        TAX                 ; 2
        LDA (ZP1),Y         ; 5          g(Y+A)
        SBC SQUARES_L,X     ; 4         -g(Y-A)
        STA X2d_FRACT       ; 3 = 14 

        LDA #>SQUARES_H     ; 2
        STA ZP1+1           ; 3  
        LDA (ZP1),Y         ; 5          g(Y+A)
        SBC SQUARES_H,X     ; 4         -g(Y-A)         
        STA X2d             ; 3 = 17    RESULT + 1

.CONT2

You could save more if you can use two zero-page pointers, one for SQUARES_H and one for SQUARES_L.
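The optimization works because LDA and STA leave the carry flag alone, so the borrow from the low-byte SBC flows straight into the high-byte SBC and the BCS/DEC pair disappears. The Python sketch below models 6502 SBC semantics to check that; the table contents (floor(x*x/4)) are assumed, and the helper names are hypothetical.

```python
QUARTER_SQUARES = [x * x // 4 for x in range(511)]   # assumed table contents
SQ_L = [v & 0xFF for v in QUARTER_SQUARES]
SQ_H = [v >> 8 for v in QUARTER_SQUARES]


def sbc(a: int, m: int, carry: int):
    """6502 SBC: A - M - (1 - C); C = 1 afterwards means 'no borrow'."""
    diff = a - m - (1 - carry)
    return diff & 0xFF, 1 if diff >= 0 else 0


def abs_diff(y: int, temp: int) -> int:
    """SEC / SBC TEMP / BCS .CONT / EOR #$FF / ADC #$01 (C is 0 on that path)."""
    diff, carry = sbc(y, temp, 1)
    if not carry:
        diff = ((diff ^ 0xFF) + 1) & 0xFF
    return diff


def mul8x8_tj(temp: int, y: int) -> int:
    """Low byte first; the borrow survives LDA/STA into the high-byte SBC."""
    x = abs_diff(y, temp)
    low, carry = sbc(SQ_L[temp + y], SQ_L[x], 1)     # C = 1 at .CONT on both paths
    high, _ = sbc(SQ_H[temp + y], SQ_H[x], carry)    # no SEC: reuse the borrow
    return (high << 8) | low
```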

Edited by Thomas Jentzsch, Thu Sep 2, 2010 3:25 PM.


#635  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Fri Sep 3, 2010 1:09 AM

Cybergoth, on Thu Sep 2, 2010 2:57 PM, said:

roland p, on Thu Sep 2, 2010 1:55 PM, said:

I think the math behind the 3d>2d was something like:

	;X2d = ((128 - <Z3d >>> 2) * X3d) * 3/4 >>> >Z3d + 21*X3d
	;y2d = (128 - <Z3d >>> 2) >>> >Z3d

The 3/4 part compensates for the first diagonals on the board, which aren't 45 degrees. The 21*X3d is there because tiles at the horizon are 21 pixels wide.

X * 3/4 should be trivial, something like:

LDA X
LSR
STA tmp
LSR
CLC
ADC tmp

What is the value range of X3d?
I found the 3/4 snippet; it looks okay:

Quote

LSR
STA TEMP
LSR
ADC TEMP
STA TEMP
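Note that this version has no CLC before the ADC, so the carry shifted out by the second LSR (bit 1 of the input) gets added in. That acts as a rounding term: the result stays within 0.75 of the exact 3x/4. A Python model, assuming an 8-bit input:

```python
def three_quarters_carry(x: int) -> int:
    """LSR / STA TEMP / LSR / ADC TEMP with no CLC: the carry is bit 1 of x."""
    x &= 0xFF
    half = x >> 1
    carry = half & 1          # bit shifted out by the second LSR
    quarter = half >> 1
    return half + quarter + carry


print([three_quarters_carry(x) for x in (2, 3, 100, 255)])
```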
I wonder if the square tables for the multiplication could be adjusted so that the multiplication outputs values that are 3/4 of what they would otherwise be.
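Since 3/4 * a * b = (3/16) * ((a+b)^2 - (a-b)^2), the tables could in principle store floor(3x^2/16) instead of floor(x^2/4). Unlike the unscaled case, the two rounding errors no longer cancel, so the result can be off by one. A sketch with a hypothetical table name, checking the error exhaustively for 8-bit inputs:

```python
# Hypothetical table: g(x) = floor(3*x*x/16), x = 0..510 (max 48768, 16 bits).
SCALED_SQUARES = [3 * x * x // 16 for x in range(511)]


def mul_three_quarters(a: int, b: int) -> int:
    """Approximate 3/4 * a * b with the usual g(a+b) - g(|a-b|) lookups."""
    return SCALED_SQUARES[a + b] - SCALED_SQUARES[abs(a - b)]


worst = max(abs(mul_three_quarters(a, b) - 0.75 * a * b)
            for a in range(256) for b in range(256))
print(worst)   # stays below 1
```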

X3d and Z3d are 16 bits wide. I use a fractional value too, but only for smoother physics.

#636  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Fri Sep 3, 2010 1:15 AM

Thomas Jentzsch, on Thu Sep 2, 2010 3:23 PM, said:

Optimization for 1st part, saves 10 cycles. Untested!
Thanks!

I'll try to run it in the simulator later today.

Attached are the files create_squares.vbs and squares.h
squares.h can be created from the command line with: cscript create_squares.vbs (Windows only)
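The attached create_squares.vbs is not reproduced here, so this is only a cross-platform sketch of what such a generator might emit, assuming floor(x*x/4) tables of 511 entries and DASM-style `ALIGN`/`.byte` output (the exact format of squares.h is an assumption). Page alignment matters because the routine builds its pointer from `>SQUARES_H` (or `>SQUARES_L`) plus TEMP alone.

```python
def squares_tables(size: int = 511):
    """Low and high bytes of floor(x*x/4) for x = 0..size-1."""
    values = [x * x // 4 for x in range(size)]
    return [v & 0xFF for v in values], [v >> 8 for v in values]


def emit(label: str, data, per_line: int = 16) -> str:
    """Format a table as a label followed by .byte lines."""
    lines = [label]
    for i in range(0, len(data), per_line):
        chunk = ", ".join(f"${v:02X}" for v in data[i:i + per_line])
        lines.append(f"\t.byte {chunk}")
    return "\n".join(lines)


if __name__ == "__main__":
    low, high = squares_tables()
    print("\tALIGN 256")
    print(emit("SQUARES_L", low))
    print("\tALIGN 256")
    print(emit("SQUARES_H", high))
```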


#637  

    Quadrunner

  • 8,107 posts
  • Joined: 14-May 01
  • This is Sparta!
  • Location:Bavaria

Posted Fri Sep 3, 2010 3:01 AM

roland p, on Fri Sep 3, 2010 1:09 AM, said:

X3d and Z3d are 16 bits wide. I use a fractional value too, but only for smoother physics.

I'm still questioning all the math used here. Instead of starting to optimize the existing code (which TJ can do much better than me ;)), I'm trying to figure out whether the whole system could be simplified first.

E.g. it seems the problem at hand is already 2D to begin with: when looking at the playfield from a bird's-eye view, every object on the playfield should only need an x and a y position. As I understand it, you already have these values and they're 16-bit.

So what are the dimensions of the playfield you're using here? Something like 1024*2048?

And what is the maximum distance the player can see other objects? Over the complete playfield or some lesser value?

#638  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Fri Sep 3, 2010 4:10 AM

Cybergoth, on Fri Sep 3, 2010 3:01 AM, said:

I'm still questioning all the math used here. Instead of starting to optimize the existing code (which TJ can do much better than me ;)), I'm trying to figure out whether the whole system could be simplified first.

E.g. it seems the problem at hand is already 2D to begin with: when looking at the playfield from a bird's-eye view, every object on the playfield should only need an x and a y position. As I understand it, you already have these values and they're 16-bit.

So what are the dimensions of the playfield you're using here? Something like 1024*2048?

The playfield is now 51 tiles deep and 21 tiles wide. The high byte of X3d and Z3d is the tile; the low byte is the position within the tile.

Cybergoth, on Fri Sep 3, 2010 3:01 AM, said:

And what is the maximum distance the player can see other objects? Over the complete playfield or some lesser value?
In the playfield I can see 5 tiles deep. In the original Ballblazer, objects are still visible after they pass the horizon. I guess they drop a scanline every time they move one tile further away. So an object that is 10 scanlines high at the horizon would disappear after another 10 tiles, which means the player can see an object up to 15 tiles away.

#639  

    Quadrunner

  • 8,107 posts
  • Joined: 14-May 01
  • This is Sparta!
  • Location:Bavaria

Posted Fri Sep 3, 2010 4:30 AM

roland p, on Fri Sep 3, 2010 4:10 AM, said:

The playfield is now 51 tiles deep and 21 tiles wide. The high byte of X3d and Z3d is the tile; the low byte is the position within the tile.

Okay, got that. What are the dimensions of a tile? 21 x 21 I assume?

roland p, on Fri Sep 3, 2010 4:10 AM, said:

In the playfield I can see 5 tiles deep. In the original Ballblazer, objects are still visible after they pass the horizon. I guess they drop a scanline every time they move one tile further away. So an object that is 10 scanlines high at the horizon would disappear after another 10 tiles, which means the player can see an object up to 15 tiles away.

So you only need to display an object when the Z3d difference between the object and the player is <= 15 tiles.
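A minimal sketch of that visibility test, assuming the 15-tile limit estimated above and comparing only whole tiles (the high byte of the Z3d difference). The constant and function names are hypothetical.

```python
MAX_VISIBLE_TILES = 15   # estimate from the discussion above


def is_visible(player_z3d: int, object_z3d: int) -> bool:
    """True when the object is at most 15 whole tiles away in depth."""
    distance_tiles = abs(object_z3d - player_z3d) >> 8
    return distance_tiles <= MAX_VISIBLE_TILES


print(is_visible(0x0500, 0x1400), is_visible(0x0000, 0x1000))
```

Skipping the projection entirely for far-away objects saves the whole multiplication on those frames.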

#640  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Fri Sep 3, 2010 4:46 AM

Cybergoth, on Fri Sep 3, 2010 4:30 AM, said:

roland p, on Fri Sep 3, 2010 4:10 AM, said:

The playfield is now 51 tiles deep and 21 tiles wide. The high byte of X3d and Z3d is the tile; the low byte is the position within the tile.

Okay, got that. What are the dimensions of a tile? 21 x 21 I assume?

In memory a tile is 256x256.
On screen a tile is 21 pixels wide at the horizon and grows by 3 pixels (color clocks) per scanline. The tile row at the horizon is 1 scanline high, the one below it 2 scanlines, then 4, 8, 16, etc.
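The geometry just described can be tabulated per tile row. A sketch, assuming row 0 is the horizon row, widths are measured at the first scanline of each row, and 5 rows are visible (names are hypothetical):

```python
def tile_rows(rows: int = 5):
    """(row, first scanline below the horizon, height, tile width) per row."""
    result = []
    scanline = 0
    for row in range(rows):
        height = 1 << row                 # 1, 2, 4, 8, 16 scanlines
        width = 21 + 3 * scanline         # 21 at the horizon, +3 per scanline
        result.append((row, scanline, height, width))
        scanline += height
    return result


for entry in tile_rows():
    print(entry)
```

With these assumptions the five rows cover 31 scanlines, and each row starts at scanline 2^row - 1.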

#641  

    Quadrunner

  • 8,107 posts
  • Joined: 14-May 01
  • This is Sparta!
  • Location:Bavaria

Posted Fri Sep 3, 2010 5:20 AM

roland p, on Fri Sep 3, 2010 4:46 AM, said:

Cybergoth, on Fri Sep 3, 2010 4:30 AM, said:

roland p, on Fri Sep 3, 2010 4:10 AM, said:

The playfield is now 51 tiles deep and 21 tiles wide. The high byte of X3d and Z3d is the tile; the low byte is the position within the tile.

Okay, got that. What are the dimensions of a tile? 21 x 21 I assume?

In memory a tile is 256x256.
On screen a tile is 21 pixels wide at the horizon and grows by 3 pixels (color clocks) per scanline. The tile row at the horizon is 1 scanline high, the one below it 2 scanlines, then 4, 8, 16, etc.

So coming back to your formula:

 ;X2d = ((128 - <Z3d >>> 2) * X3d) * 3/4 >>> >Z3d + 21*X3d

Do I understand correctly that the left part of that sum determines the position within a tile and the right part determines the overall offset of that tile? It kinda seems as if you're blowing the values up first, only to shrink them back down to the screen :)

If you know that the enemy is e.g. in tile 30, the actual screen value seems to come down to:

Starting position of tile 30 (if visible) on the correct scanline + offset within tile

And you should already have means to get that starting position cheap, since that's precisely where you execute the PF color change for the grid, no?

#642  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Fri Sep 3, 2010 8:54 AM

Cybergoth, on Fri Sep 3, 2010 5:20 AM, said:

So coming back to your formula:

 ;X2d = ((128 - <Z3d >>> 2) * X3d) * 3/4 >>> >Z3d + 21*X3d

Do I understand correctly that the left part of that sum determines the position within a tile and the right part determines the overall offset of that tile? It kinda seems as if you're blowing the values up first, only to shrink them back down to the screen :)

That's right, but it works :)

You could see (128 - <Z3d >>> 2) as width of a tile at distance Z3d.

The shrinking-back part (>>> >Z3d) could also be applied to the (128 - <Z3d >>> 2) part, which is an 8-bit value. I think the accuracy will go down then, but it is faster and may still be good enough. The shifted value can be reused for calculating Y2d too.
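A sketch comparing the two shift placements, assuming `>>> >Z3d` means shifting right by the high byte of Z3d and reading the tile width as 128 - (<Z3d >>> 2). Shifting the 8-bit width first drops its fractional bits before the multiply, so the result is never larger and is smaller by less than X3d (in the same units). Function names are hypothetical.

```python
def width_at(z3d: int) -> int:
    return 128 - ((z3d & 0xFF) >> 2)          # (128 - <Z3d >>> 2)


def shift_after(x3d: int, z3d: int) -> int:
    """Multiply first, then shift the 16-bit-plus product."""
    return (width_at(z3d) * x3d) >> (z3d >> 8)


def shift_before(x3d: int, z3d: int) -> int:
    """Shift the 8-bit width first; this value could also give Y2d."""
    return (width_at(z3d) >> (z3d >> 8)) * x3d


print(shift_after(0x0380, 0x0240), shift_before(0x0380, 0x0240))
```

Whether an error of up to X3d units is visible on screen depends on how X2d is scaled afterwards, which the sketch does not model.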

Cybergoth, on Fri Sep 3, 2010 5:20 AM, said:

If you know that the enemy is e.g. in tile 30, the actual screen value seems to come down to:

Starting position of tile 30 (if visible) on the correct scanline + offset within tile

And you should already have means to get that starting position cheap, since that's precisely where you execute the PF color change for the grid, no?
I don't think the drawing code can do this; besides, I have to know the position before the drawing code starts.

#643  

    Quadrunner

  • 8,107 posts
  • Joined: 14-May 01
  • This is Sparta!
  • Location:Bavaria

Posted Fri Sep 3, 2010 9:34 AM

roland p, on Fri Sep 3, 2010 8:54 AM, said:

I don't think the drawing code can do this; besides, I have to know the position before the drawing code starts.

But you could possibly run similar code just for one particular line, before the display kernel? In the worst case, determining the starting position should then take a whole scanline's worth of cycles.

BTW: I assume 21*X3d is actually 21*>X3d? That could be two 21-byte lookup tables instead of a multiplication then.
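If only the tile number were needed, 21 * >X3d could indeed come from two small tables. Since the playfield is 21 tiles wide, >X3d is assumed to be 0..20, and 21 * 20 = 420 needs a low and a high byte. A sketch with hypothetical table names:

```python
TILES_WIDE = 21
MUL21_LO = [(21 * tile) & 0xFF for tile in range(TILES_WIDE)]   # 21 bytes
MUL21_HI = [(21 * tile) >> 8 for tile in range(TILES_WIDE)]     # 21 bytes


def mul21_tile(tile: int) -> int:
    """21 * >X3d from two lookups instead of a multiplication."""
    return (MUL21_HI[tile] << 8) | MUL21_LO[tile]


print(mul21_tile(20))
```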

#644  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Fri Sep 3, 2010 12:38 PM

Cybergoth, on Fri Sep 3, 2010 9:34 AM, said:

roland p, on Fri Sep 3, 2010 8:54 AM, said:

I don't think the drawing code can do this; besides, I have to know the position before the drawing code starts.

But you could possibly run similar code just for one particular line, before the display kernel? In the worst case, determining the starting position should then take a whole scanline's worth of cycles.

I don't see how the checkerboard kernel could be used for determination of sprite positions.

Cybergoth, on Fri Sep 3, 2010 9:34 AM, said:

BTW: I assume 21*X3d is actually 21*>X3d? That could be two 21-byte lookup tables instead of a multiplication then.
I checked it: 21*X3d is correct as written; it uses the full 16-bit value of X3d. But this can be optimised with tables too, I guess.
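Besides tables, a constant multiply by 21 on the full 16-bit X3d can be done with shifts and adds, since 21 = 16 + 4 + 1. A sketch, assuming X3d stays below 21 * 256 (the playfield is 21 tiles wide), so the product needs at most 17 bits, i.e. a third result byte on the 6502:

```python
def mul21(x3d: int) -> int:
    """21*x as 16*x + 4*x + x (ASL/ROL shifts and 16-bit adds on a 6502)."""
    x = x3d & 0xFFFF
    return (x << 4) + (x << 2) + x


print(mul21(21 * 256 - 1))
```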

#645  

    Quadrunner

  • 8,107 posts
  • Joined: 14-May 01
  • This is Sparta!
  • Location:Bavaria

Posted Fri Sep 3, 2010 1:59 PM

roland p, on Fri Sep 3, 2010 12:38 PM, said:

Cybergoth, on Fri Sep 3, 2010 9:34 AM, said:

roland p, on Fri Sep 3, 2010 8:54 AM, said:

I don't think the drawing code can do this; besides, I have to know the position before the drawing code starts.

But you could possibly run similar code just for one particular line, before the display kernel? In the worst case, determining the starting position should then take a whole scanline's worth of cycles.

I don't see how the checkerboard kernel could be used for determination of sprite positions.

Not that kernel, just similar code. I added a picture to illustrate what I mean:

Attached Image: bb.gif

A: Some offset.
B: 3 times the tilewidth of that line.
C: Some offset.

Now, the screen position of the rotofoil (also ball/goal) should just be A+B+C, where B is also just a few additions (up to 7 in the worst case, I think).

BTW, for figuring out the tile width: make a table that maps the y position to the tile width.
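A sketch of the A+B+C idea, assuming a tile-width table indexed by scanlines below the horizon (21 + 3 per scanline, from the figures earlier in the thread), B computed by repeated 8-bit additions, and C taken as the in-tile fraction scaled by the tile width. The picture is not reproduced here, so A is simply passed in; all names are hypothetical.

```python
HORIZON_WIDTH = 21
# Hypothetical table: scanlines below the horizon -> tile width in color clocks.
TILE_WIDTH = [HORIZON_WIDTH + 3 * line for line in range(32)]


def screen_x(offset_a: int, line: int, tiles: int, fraction: int) -> int:
    """A (offset) + B (whole tiles, by repeated adds) + C (offset within the tile)."""
    width = TILE_WIDTH[line]
    x = offset_a
    for _ in range(tiles):            # B: one addition per whole tile
        x += width
    x += (fraction * width) >> 8      # C: assumed fraction-of-a-tile offset
    return x


print(screen_x(10, 0, 3, 0), screen_x(0, 2, 1, 128))
```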

#646  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Sat Sep 4, 2010 2:50 PM

Cybergoth, on Fri Sep 3, 2010 1:59 PM, said:

A: Some offset.
B: 3 times the tilewidth of that line.
C: Some offset.

I think this still boils down to tile-width * X3d, where 'B: 3 times the tilewidth' is the high byte of X3d times the tile width and 'C: some offset' is the low byte of X3d times the tile width.
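The split described above is exact: (width * X3d) >> 8 equals width * >X3d + ((width * <X3d) >> 8), because the high-byte product is already a whole multiple of 256. The sketch below checks that, assuming the result is taken in whole pixels (>> 8); function names are hypothetical.

```python
def scaled_full(width: int, x3d: int) -> int:
    """(tile width * X3d) >> 8: one 8x16 multiplication."""
    return (width * x3d) >> 8


def scaled_split(width: int, x3d: int) -> int:
    """width * >X3d (B, whole tiles) + (width * <X3d) >> 8 (C, within the tile)."""
    tiles, fraction = x3d >> 8, x3d & 0xFF
    return width * tiles + ((width * fraction) >> 8)


print(scaled_full(27, 0x0380), scaled_split(27, 0x0380))
```

So B only needs additions (or a small table), while C still needs one 8x8 product or a table per width.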

I've found the version of the multiplication routine that runs in the 6502 simulator. I started to put Thomas' optimizations into it, but haven't finished yet (I'm lazy).


#647  

    Thrust, Jammed, SWOOPS!

  • 16,625 posts
  • Joined: 25-April 01
  • Always left from right here!
  • Location:Düsseldorf, Germany

Posted Sat Sep 4, 2010 2:58 PM

roland p, on Sat Sep 4, 2010 2:50 PM, said:

I've found the version of the multiplication routine that runs in the 6502 simulator. I started to put Thomas' optimizations into it, but haven't finished yet (I'm lazy).
If a few cycles fewer are enough to solve your problem, then OK. Otherwise Manuel is right: first try to optimize the algorithm, then optimize the code.

#648  

    Stargunner

  • 1,361 posts
  • Joined: 14-September 07
  • RLA
  • Location:The Netherlands

Posted Sat Sep 4, 2010 3:42 PM

((128 - <Z3d >>> 2) * X3d) * 3/4 >>> >Z3d + 21*X3d
could also be optimised to

(((128 - <Z3d >>> 2) >>> >Z3d) * 3/4 + 21) * X3d
The terms to the left of '* X3d' are all 8-bit values.
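A sketch comparing the original and rearranged formulas, under the same assumed notation (low/high byte, `>>>` as a right shift by the high byte of Z3d, truncating 3/4). Because the rearranged form truncates the 8-bit factor before multiplying, it is never larger than the original and falls short by less than 7/4 of X3d; whether that gap matters on screen is a judgment call. Function names are hypothetical.

```python
def width_at(z3d: int) -> int:
    return 128 - ((z3d & 0xFF) >> 2)                # (128 - <Z3d >>> 2)


def original(x3d: int, z3d: int) -> int:
    n = (z3d >> 8) & 0xFF
    return (((width_at(z3d) * x3d) * 3 // 4) >> n) + 21 * x3d


def rearranged(x3d: int, z3d: int) -> int:
    n = (z3d >> 8) & 0xFF
    factor = ((width_at(z3d) >> n) * 3 // 4) + 21  # at most 117: fits 8 bits
    return factor * x3d                             # one 8x16 multiplication


gaps = [original(x, z) - rearranged(x, z)
        for z in range(0, 0x0600, 5) for x in range(1, 21 * 256, 97)]
print(min(gaps), max(gaps))
```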

#649  

    Quadrunner

  • 8,107 posts
  • Joined: 14-May 01
  • This is Sparta!
  • Location:Bavaria

Posted Sun Sep 5, 2010 12:30 AM

roland p, on Sat Sep 4, 2010 2:50 PM, said:

Cybergoth, on Fri Sep 3, 2010 1:59 PM, said:

A: Some offset.
B: 3 times the tilewidth of that line.
C: Some offset.

I think this still boils down to tile-width * X3d where 'B: 3 times the tilewidth' is the high byte of X3d times tile-width. 'C: some offset' is the low byte of X3d times tile-width.

The end result should be exactly the same; that was the intention. But I thought 2-9 8-bit additions would run much faster than all the other code with the 16-bit multiplication :)

roland p, on Sat Sep 4, 2010 3:42 PM, said:

((128 - <Z3d >>> 2) * X3d) * 3/4 >>> >Z3d + 21*X3d
could also be optimised to

(((128 - <Z3d >>> 2) >>> >Z3d) * 3/4 + 21) * X3d
The terms to the left of '* X3d' are all 8-bit values.

I saw that at the beginning, but then thought you were missing < or > in front of X3d. And by the time you corrected me, I had forgotten about it again ;)

#650  

    Thrust, Jammed, SWOOPS!

  • 16,625 posts
  • Joined: 25-April 01
  • Always left from right here!
  • Location:Düsseldorf, Germany

Posted Sun Sep 5, 2010 1:20 AM

Sorry, but what do you mean with >>>? Divide by 256?

Edited by Thomas Jentzsch, Sun Sep 5, 2010 1:25 AM.





