GroovyBee, posted December 14, 2011:

The most time consuming part is the calculation of the 2d coordinates. That uses about 400 cycles. Maybe this is a good candidate for speed optimisation?
roland p, posted December 14, 2011 (edited December 14, 2011):

GroovyBee wrote: "The most time consuming part is the calculation of the 2d coordinates. That uses about 400 cycles. Maybe this is a good candidate for speed optimisation?"

Maybe, I have to study it a bit more. I'm already glad it works.

;CALC_2d
;
;input:  OBJECT_X3d OBJECT_Y3d
;output: OBJECT_X2d OBJECT_Y2d OBJECT_SIZE

;First calc Y2d:
;Y2d = (128 + <Y3d >> 1) >> >Y3d
;
CALC_2d
    LDA #0                      ;2
    STA OBJECT_Y2d              ;3
    LDX OBJECT_Y3d + 1          ;3
    CPX #11                     ;2
    BCC NOT_TOO_FAR             ;2/3
    LDA #-5                     ;2 object is too far away.
    STA OBJECT_Y2d + 1          ;3
    LDA #34
    STA OBJECT_SIZE
    RTS                         ;6
NOT_TOO_FAR
    LDA DIV_JUMP_TABLE_HIGH,X   ;4
    PHA                         ;3
    LDA DIV_JUMP_TABLE_LOW,X    ;4
    PHA                         ;3
    LDY #4                      ;OBJECT_SIZE
    LDA OBJECT_Y3d              ;3
    LSR                         ;2
    EOR #$FF                    ;2 a = 255 - (<Y3d >> 1)
    RTS                         ;6 jump to DIV_...

DIV_FAR ;$0000...$0080
    SEC
    SBC SUBSTRACTION_TABLE,X
    STA OBJECT_Y2d
    LDA #0
    STA OBJECT_Y2d + 1
    LDA OBJECT_SIZES,X
    STA OBJECT_SIZE
    LDA CORRECTION_TABLE,X
    STA CORRECTION
    JMP CALC_X2d
DIV_5 ;$0080...$0100
    STA OBJECT_Y2d
    LDA #0
    STA OBJECT_Y2d + 1
    LDA #5
    STA OBJECT_SIZE
    LDA #14
    STA CORRECTION
    JMP CALC_X2d
DIV_4
    LSR                         ;2 128...255 >> 7 = $0100...$0200
    ROR OBJECT_Y2d              ;5
DIV_3
    LSR                         ;2 128...255 >> 6 = $0200...$0400
    ROR OBJECT_Y2d              ;5
DIV_2
    LSR                         ;2 128...255 >> 5 = $0400...$0800
    ROR OBJECT_Y2d              ;5
DIV_1
    LSR                         ;2 128...255 >> 4 = $0800...$1000
    ROR OBJECT_Y2d              ;5
DIV_0
    LSR                         ;2 128...255 >> 3 = $1000...$2000
    ROR OBJECT_Y2d              ;5
    LSR                         ;2 128...255 >> 2 = $2000...$4000
    ROR OBJECT_Y2d              ;5
    LSR                         ;2 $8000...$FF00 >> 1 = $4000...$8000
    ROR OBJECT_Y2d              ;5
    STA OBJECT_Y2d + 1          ;3
    LDA OBJECT_Y2d
    SEC
    SBC #$80
    LDA OBJECT_Y2d + 1
    SBC #0
    CLC
    ADC #5
    STA OBJECT_SIZE
    LDA #14
    STA CORRECTION
;
;X2d = X3d * (Y2d * 3 + correction)
;
CALC_X2d
    LDX OBJECT_Y2d + 1          ;3
    LDA OBJECT_Y2d              ;3
    ASL                         ;2
    BCC NO_ADD                  ;2/3
    INX                         ;2
NO_ADD
    CLC                         ;2
    ADC OBJECT_Y2d              ;3
    STA TEMP                    ;3 TEMP = <OBJECT_Y2d * 3
    STA FAC_LOW_RESULT_LOW_PLUS ;set zp addresses
    STA FAC_LOW_RESULT_HIGH_PLUS
    EOR #$ff
    STA FAC_LOW_RESULT_LOW_MINUS
    STA FAC_LOW_RESULT_HIGH_MINUS
    TXA                         ;2
    ADC OBJECT_Y2d + 1          ;3
    ADC OBJECT_Y2d + 1          ;3
    ADC CORRECTION              ;3
    STA TEMP + 1                ;3 TEMP + 1 = >OBJECT_Y2d * 3 + CORRECTION
    STA FAC_HIGH_RESULT_LOW_PLUS ;set zp addresses
    STA FAC_HIGH_RESULT_HIGH_PLUS
    EOR #$ff
    STA FAC_HIGH_RESULT_LOW_MINUS
    STA FAC_HIGH_RESULT_HIGH_MINUS

;     AB     (TEMP+1,TEMP)
;     CD *   (OBJECT_X3d+1, OBJECT_X3d)
; ------
;     HL     (B*D) (TEMP * OBJECT_X3d)
;    HL      (A*D) (TEMP + 1 * OBJECT_X3d)
;    HL      (B*C) (TEMP * OBJECT_X3d + 1)
;   HL       (A*C) (TEMP + 1 * OBJECT_X3d + 1)

;TEMP * OBJECT_X3d = AAaa
;TEMP + 1 * OBJECT_X3d = BBbb
;TEMP * OBJECT_X3d + 1 = CCcc
;TEMP + 1 * OBJECT_X3d + 1 = DDdd
;
;    AAaa
;  BBbb
;  CCcc
;DDdd     +
;
;TEMP * OBJECT_X3d = AAaa
    LDY OBJECT_X3d
    SEC
;   LDA (FAC_LOW_RESULT_LOW_PLUS),y   ;Lowest byte of result not needed
;   SBC (FAC_LOW_RESULT_LOW_MINUS),y
;   STA PRODUCT
    LDA (FAC_LOW_RESULT_HIGH_PLUS),y
    SBC (FAC_LOW_RESULT_HIGH_MINUS),y
    STA AA
;TEMP + 1 * OBJECT_X3d = BBbb
    SEC
    LDA (FAC_HIGH_RESULT_LOW_PLUS),y
    SBC (FAC_HIGH_RESULT_LOW_MINUS),y
    STA bb
    LDA (FAC_HIGH_RESULT_HIGH_PLUS),y
    SBC (FAC_HIGH_RESULT_HIGH_MINUS),y
    STA BB
    LDY OBJECT_X3d + 1
;TEMP * OBJECT_X3d + 1 = CCcc
    SEC
    LDA (FAC_LOW_RESULT_LOW_PLUS),y
    SBC (FAC_LOW_RESULT_LOW_MINUS),y
    STA cc
    LDA (FAC_LOW_RESULT_HIGH_PLUS),y
    SBC (FAC_LOW_RESULT_HIGH_MINUS),y
    STA CC
;TEMP + 1 * OBJECT_X3d + 1 = DDdd
    SEC
    LDA (FAC_HIGH_RESULT_LOW_PLUS),y
    SBC (FAC_HIGH_RESULT_LOW_MINUS),y
    STA dd
    LDA (FAC_HIGH_RESULT_HIGH_PLUS),y
    SBC (FAC_HIGH_RESULT_HIGH_MINUS),y
    STA PRODUCT + 3

    clc
    lda AA
    adc bb
    sta PRODUCT+1
    lda BB
    adc CC
    sta PRODUCT+2
    bcc SKIP1
    inc PRODUCT+3
    clc
SKIP1
    lda cc
    adc PRODUCT+1
    sta PRODUCT+1
    lda dd
    adc PRODUCT+2
    sta PRODUCT+2
    bcc SKIP2
    inc PRODUCT+3
SKIP2
;Take care of signed OBJECT_X3d
    LDA OBJECT_X3d + 1
    bpl NOT_NEG
    sec
    lda PRODUCT+2
    sbc TEMP+0
    sta PRODUCT+2
    lda PRODUCT+3
    sbc TEMP+1
    sta PRODUCT+3
NOT_NEG
    CLC                         ;MAYBE THIS PART CAN BE REMOVED
    LDA PRODUCT+1
    ADC #$80
    STA OBJECT_X2d
    LDA PRODUCT+2
    ADC #0
    STA OBJECT_X2d + 1
    LDA PRODUCT+3
    ADC #0
    STA OBJECT_X2d + 2
    RTS

I already see I could replace "PRODUCT" (the result of the multiplication) with OBJECT_X2d. I remember I was pretty exhausted when I finished this one...

FAC_LOW_RESULT_LOW_PLUS etc. are pointers to tables with squares. I got this from some online C64 magazine.
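For anyone following along who has not met the "tables with squares" trick: it looks like the quarter-square method, where a*b = f(a+b) - f(a-b) with f(n) = floor(n*n/4). Below is a minimal Python model of that idea, plus the four-partial-product 16x16 scheme and the sign fix-up from the "Take care of signed OBJECT_X3d" part. This is only a sketch under assumptions: the names `SQUARES`, `mul8`, `mul16` and `mul_signed_x` are made up for illustration, the table layout is not the one the posted routine uses, and unlike the routine this model keeps the lowest product byte.

```python
# Quarter-square multiplication: a*b = f(a+b) - f(a-b), f(n) = n*n // 4.
# Hypothetical single table; the posted routine splits its tables into
# low/high bytes and "plus"/"minus" halves, which is not modelled here.
SQUARES = [n * n // 4 for n in range(511)]  # f(0) .. f(510)


def mul8(a: int, b: int) -> int:
    """Unsigned 8x8 multiply using two table lookups and one subtraction."""
    return SQUARES[a + b] - SQUARES[abs(a - b)]


def mul16(x: int, y: int) -> int:
    """Unsigned 16x16 multiply built from four 8x8 partial products."""
    x_lo, x_hi = x & 0xFF, x >> 8
    y_lo, y_hi = y & 0xFF, y >> 8
    return (
        mul8(x_lo, y_lo)
        + (mul8(x_hi, y_lo) << 8)
        + (mul8(x_lo, y_hi) << 8)
        + (mul8(x_hi, y_hi) << 16)
    )


def mul_signed_x(x3d: int, factor: int) -> int:
    """x3d is 16-bit two's complement, factor is unsigned 16-bit.

    A negative x3d read as unsigned is too large by 65536, so the
    product is too large by factor * 65536; subtract that from the
    upper two bytes, as the posted code does with TEMP.
    """
    product = mul16(x3d & 0xFFFF, factor)
    if x3d & 0x8000:
        product -= factor << 16
    return product & 0xFFFFFFFF
```

The identity is exact because a+b and a-b always have the same parity, so the two floor operations drop the same fraction.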
GroovyBee, posted December 15, 2011:

Could you provide the theory behind the code?
roland p, posted December 15, 2011:

I'm not that good at explaining things but I'll try.

This routine calculates the screen coordinates (X2d & Y2d) and an OBJECT_SIZE for the 3d coordinates of an object (X3d, Y3d). These coordinates are relative to the player. X3d=0 would indicate the object is exactly in front of or behind the player. This is all very 'pseudo 3d'.

Calculation of the y2d coordinate (this is the weirdest):

The y2d calculation only needs the y3d coordinate. The y3d coordinate is a 16-bit value. The y2d value is also 16-bit. The low byte is used for precision when calculating x2d. The high byte indicates the scanline (0 is the horizon, 23 is the last line of the checkerboard kernel).

It first takes the low byte of y3d, divides it by 2 and subtracts the result from $FF. In other words, the 0...$FF range becomes the $FF...$80 range. An object at y3d = $0000 would be displayed at scanline $FF. This value is too big to display, so I always LSR it 3 times to make it smaller, so the range becomes $1F...$10 ($1F = 31, the checkerboard is 24 scanlines high).

For objects further away, it uses the high byte of y3d to LSR this value even more, so farther objects get a lower value.

Some examples of y3d to y2d coordinates:

y3d            y2d
0000...00FF    $1F00...$1000
0100...01FF    $0F00...$0800
0200...02FF    $0700...$0400
0300...03FF    $0300...$0200
0400...04FF    $0100...$0000

If an object is even further away, it stops LSR'ing and just returns an OBJECT_SIZE value to indicate the object should be drawn smaller.

Calculation of the x2d coordinate:

This is done by the formula

X2d = X3d * (Y2d * 3 + correction)

It multiplies y2d by 3 because the diagonals of the checkerboard grow 3 pixels each scanline. The correction is added because otherwise all values at scanline 0 would result in x2d 0. The correction will be the width of a tile at the horizon. For objects behind the horizon, this value will become smaller.

Thanks for reading.
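To make the shifting easier to follow, here is a small Python model of the explanation above. It is a sketch, not the routine itself: the function names are invented, it only covers y3d high bytes 0 to 5 (the pure LSR/ROR cases; the table-driven far range is left out), and the default correction of 14 is simply the constant the posted code loads for near objects.

```python
def y2d_from_y3d(y3d: int) -> int:
    """Model of the 16-bit y2d value for 0 <= y3d <= 0x05FF.

    The low byte of y3d is halved and subtracted from 255, placed in
    the high byte of a 16-bit value, then shifted right 3 times plus
    once more for every step of the y3d high byte.
    """
    lo, hi = y3d & 0xFF, y3d >> 8
    if not 0 <= hi <= 5:
        raise ValueError("only the pure-shift range is modelled")
    start = (255 - (lo >> 1)) << 8  # $FF00 down to $8000
    return start >> (3 + hi)


def scanline(y3d: int) -> int:
    """High byte of y2d: 0 is the horizon."""
    return y2d_from_y3d(y3d) >> 8


def x_factor(y2d: int, correction: int = 14) -> int:
    """Y2d * 3 + correction, with the correction added to the high byte."""
    return (y2d * 3 + (correction << 8)) & 0xFFFF


if __name__ == "__main__":
    for y3d in (0x0000, 0x00FF, 0x0100, 0x01FF, 0x0300, 0x03FF):
        print(f"y3d=${y3d:04X} -> y2d=${y2d_from_y3d(y3d):04X}")
```

Note that `x_factor(0)` is non-zero, which is the point of the correction: without it every object on scanline 0 would end up at x2d 0.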
GroovyBee, posted December 15, 2011:

Looks like you could get some of this into tables if you have the ROM space.
roland p, posted December 15, 2011:

GroovyBee wrote: "Looks like you could get some of this into tables if you have the ROM space."

It uses 2kB for the multiplication. I already reduced the precision of the multiplication, so it's already getting faster. I hope the lsr/ror'ing can be optimised some more.
GroovyBee, posted December 15, 2011:

In the formula

X2d = X3d * (Y2d * 3 + correction)

can "Y2d*3 + correction" not be simplified to the addition/subtraction of a 16 bit variable every time you update Y2d? That'd save the multiplication and add.
roland p, posted December 18, 2011:

GroovyBee wrote: "In the formula X2d = X3d * (Y2d * 3 + correction), can "Y2d*3 + correction" not be simplified to the addition/subtraction of a 16 bit variable every time you update Y2d? That'd save the multiplication and add."

But in the routine above, y2d is updated (RORed) 7 times in the worst case.

I'm thinking of dropping the framerate to 30fps. So with the screen running at 60fps, one frame will be dedicated to game logic and the other frame will be dedicated to on-screen calculations.
+Gemintronic, posted December 19, 2011:

Full disclosure: I have no idea what I'm talking about. That being said, instead of halving the framerate, how about one frame be a "guess" at the next rendering values needed thus saving time for logic? I bet the 2600 would be overburdened just coming up with the predicted values needed for the next frame though.. ugh.
roland p, posted December 20, 2011 (edited December 20, 2011):

+Gemintronic wrote: "Full disclosure: I have no idea what I'm talking about. That being said, instead of halving the framerate, how about one frame be a "guess" at the next rendering values needed thus saving time for logic? I bet the 2600 would be overburdened just coming up with the predicted values needed for the next frame though.. ugh."

I've now changed the game logic to the following (you probably triggered the idea):

even frames: calculate all game logic (acceleration, friction, collisions), add speed to the current position of the drone.
odd frames: skip game logic, only add speed to the current position of the drone. Calculate positions etc. of objects for display.

That way, the drone moves every 1/60s and you still get 60fps movement of the checkerboard, with 30fps movement of the other objects.
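The even/odd split described above could be modelled roughly like this in Python. It is a sketch with invented names (`Drone`, `run_frames`, `accel`); the single `speed += accel` line stands in for all of the real game logic, and the display step is just counted rather than performed.

```python
from dataclasses import dataclass


@dataclass
class Drone:
    position: int = 0
    speed: int = 0


def run_frames(drone: Drone, frames: int, accel: int = 1) -> tuple[int, int]:
    """Run the even/odd frame schedule; return (logic, display) frame counts."""
    logic_frames = 0
    display_frames = 0
    for frame in range(frames):
        if frame % 2 == 0:
            # Even frame: full game logic (placeholder: constant acceleration).
            drone.speed += accel
            logic_frames += 1
        else:
            # Odd frame: no game logic, only display calculations.
            display_frames += 1
        # The drone itself moves on every frame, so scrolling stays at 60fps.
        drone.position += drone.speed
    return logic_frames, display_frames
```

With this schedule the position is updated every frame while the expensive work alternates, which is why the checkerboard can keep moving at 60fps while the other objects update at 30fps.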
roland p, posted December 24, 2011 (edited December 24, 2011):

I've now added 2 goalbeams. I optimised the game logic a bit (it now includes more border collision detection) so it fits in the overscan area. The pseudo-3d calculations are not 'optimised' for now; optimisation made it look less smooth, so I want to wait with that. Sprites look a bit screwed up when moving too far to the left/right; that's because it takes a few days to correct it and I'm lazy.

All is still running at 60fps, with about 1250 cycles left in the screen blanking area.

Attachment: ballblazer_20111224.bin
+Stephen, posted December 25, 2011:

Still looking great! I hope you can keep the 60Hz screen, but if not, totally understandable. It's hard to believe this is running on a stock 2600!
enthusi, posted January 5, 2012:

Ah, VERY nice and smooth. However, I would drop 60 FPS anytime if that allows for better progress/gameplay etc. But it appears to be very far along already - splendid! Best of luck with that gemstone of yours. Really looking forward to it.

enthusi
roland p, posted January 10, 2012:

enthusi wrote: "Ah, VERY nice and smooth. However, I would drop 60 FPS anytime if that allows for better progress/gameplay etc But it appears to be very far already - splendid! Best of luck with that gemstone of yours. Really looking forward to it."

Thanks!

- I've now dropped the sprites and the game logic to 30fps. But I now interpolate the playfield so it still runs at that smooth 60Hz!
- I also updated the pseudo-3d routines. The horizontal lines of the checkerboard had a too linear movement. I've corrected this with a table that makes the movement more parabolic. It now has more of the smoothness of the 7800 version.
- Rotofoil in the lower viewport is now displayed correctly.
- More goalbeams!
- I moved some of the sprite-position code to the vblank area. So fewer spare cycles there, but more screen real estate.
- Masking of the rotofoil is not 100% finished yet...

Attachment: ballblazer_20120110.bin
enthusi, posted January 11, 2012:

Weee! Keep it coming, hehe. Let me/us know when you need tests or bug reports. From my own experience they'd surely be annoying at the moment.

Cheers,
enthusi
roland p, posted January 11, 2012:

enthusi wrote: "Let me/us know when you can need tests or bug-reports. From own experience they'd sure be annoying currently"

At this moment, they aren't really needed because I know there are a lot of bugs in it. Reports/critique about gameplay/speed/etc. are welcome.

I now want to put the 'plasmorb' in it. I considered using the playfield registers but that's probably way too time consuming. At this moment, I have 30 (possibly more) spare cycles left in the sky kernel, where the ball will appear. So I'll probably use the ball sprite for the plasmorb. That means it will be a bit smaller than the plasmorb in the original Ballblazer, unless I use flickering, but it's nicer not to have flicker...
enthusi, posted January 11, 2012:

Though the plasma(!) orb could do with some flicker, I'd guess as well that a smaller ball would be nice. Atari gfx is rather clean and well coloured, so the ball would not easily be missed during gameplay. A certain flicker might give a nice effect, however. Maybe some slight horizontal jitter left/right with huge overlap at 60 Hz?
Ed Fries, posted January 11, 2012:

This is looking amazing. Please keep working on it!
wvoutlaw2k, posted April 19, 2012:

Wow! I just tried this in Stella on my MacBook. Looks great! Plus, it seems like it could be played with the Trak-Ball.
Godzilla, posted April 19, 2012:

i can't wait to see where this goes, it really is impressive imho
roland p, posted April 20, 2012:

Thanks for the comments! I'll pick up the project again soon; I took a sort of break.
RevEng, posted April 20, 2012:

Take the break you need and then a little more; coding burnout isn't pretty, and we can be patient. But please do eventually return, because this is a masterpiece and it would be a shame for it to not be finished.
Keatah, posted April 20, 2012:

Agreed! Take the summer off (or winter), and come back fresh with new optimization ideas.
Godzilla, posted November 28, 2012:

bump
roland p, posted January 6, 2013:

Sorry, not much of an update at the moment. I'm trying to pick up the project again and wrap my head around it.

The last thing I was working on is a sort of time management system. Sounds fancy, but it works more or less like this:

Frame 1: Calculate game logic (process speed/friction/collisions etc.)
Frame 2: Calculate screen positions of rotofoil 1/2
Frame 3: Calculate screen positions of goalbeams
Frame 4: Calculate screen positions of plasmorb

So everything is updated every 4 frames, which will be choppy, so I'm now experimenting with delta values. When a rotofoil is at position 0 (which is calculated at frame 2), and the next time (at frame 6) it is at position 40, I calculate a delta value (pos2 - pos1)/4 = (40-0)/4 = 10. So at every frame I can interpolate the on-screen position by adding 10 every time.

Of course, this consumes a lot of memory since I need to have precise values (at least 2 bytes for every 2 rotofoils, 4 goalbeams, 2 plasmorbs, 2 playfields x 2 axis = 48 bytes)...

Also, the game logic has to fit in one frame. The game logic will mostly consist of collision detection of rotofoils vs wall, and rotofoils vs each other. And it is possible that, in one frame, rotofoil a collides with rotofoil b, rotofoil b hits a wall, bounces back and hits rotofoil a again. This has to be checked for two axes.

Of course, this is all a top-down approach, which isn't exactly considered best practice. In real life (when creating web applications, which I do for a living) I would create the logic together with a simplistic view (GUI) and make sure the application does what it needs to do, and afterwards make it pretty. So if I were to make Ballblazer this way, I would create a simple 2d playfield by just using the playfield pixels, and use square dots for rotofoils. And afterwards, make it pretty and try to do it in 3d...
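The delta idea from this post can be sketched in a few lines of Python. This is an illustration under assumptions, not the game's code: the name `interpolate` is invented, and the 8.8 fixed-point format (8 fraction bits in a 2-byte value) is one plausible reading of "precise values (at least 2 bytes)".

```python
def interpolate(old_pos: int, new_pos: int, steps: int = 4) -> list[int]:
    """Return the on-screen positions for each frame between two samples.

    Positions are held as 8.8 fixed point so that deltas which are not
    whole pixels (for example 10 pixels over 4 frames) do not drift.
    """
    delta = ((new_pos - old_pos) << 8) // steps  # per-frame step, 8.8 format
    fixed = old_pos << 8
    positions = []
    for _ in range(steps + 1):
        positions.append(fixed >> 8)  # integer pixel shown on screen
        fixed += delta
    return positions


if __name__ == "__main__":
    print(interpolate(0, 40))
    print(interpolate(0, 10))
```

For the example in the post, moving from 0 to 40 over 4 frames gives a delta of 10 pixels per frame; the fractional bits only matter when the distance is not a multiple of 4.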