Atari v Commodore

atariksi · December 14, 2008

I claim you CANNOT hit every 8th pixel on the C64 consistently. The C64 is less accurate regardless of where on the display. The 0.985 MHz clock is not accurate enough to do horizontal splits at any arbitrary 8th pixel. I don't think it can even do every 16th pixel.

0.985 = 8 pixels/cycle. Or for NTSC C64: 1.02 MHz = 8 pixels/cycle.

For the A8: 1.79 MHz = 4 pixels/cycle, so really the A8 is more accurate than the C64 in this case.

He does reluctantly accept that it's more accurate, but not for Gr.8 (320*192).

Anyway, here's code for Stable DLI (no WSYNC):

;Example of display list interrupt on atari 800/400/600XL/800XL/XE/XEGS © 2005-2007 KSI
;Compile and boot this as an image disk on your atari computer. If you boot with BASIC cartridge,
ORG_      = 1529
CASINI    = 2            ;for trapping reset vector
VCOUNT    EQU $D40B
COLBK     EQU 53274
WARMSTART = 58484
;HDR_ = 6
;DW 0FFFFh
;DW ORG
;DW LastOffset-1
          DB 0,3
          DW ORG_
          DW StartAdr
          Rts
          Pla
StartAdr: LDA 560        ;LSB of ptr to display list (DL)
          STA 203
          LDA 561        ;MSB of ptr to display list (DL)
          STA 204
          LDY #2         ;intr on 3rd item of DL
NXTDLVAL: LDA (203),Y
          ;CMP #65
          ;BEQ DLISET
          ;EOR #$80
          ORA #$80       ;set DLI bit in ANTIC display list
          STA (203),Y
          ;AND #$70
          ;CMP #64
          ;BNE NOTLMS
          ;INY
          ;INY
NOTLMS:   ;INY
          ;BNE NXTDLVAL
DLISET:   LDA #0
          STA 54286      ;NMIEN (all NMIs off while the vector changes)
          LDA #StableDLI,L
          STA 512        ;VDSLST, LSB of DLI vector
          LDA #StableDLI,H
          STA 513        ;VDSLST, MSB of DLI vector
          LDA #128
          STA 54286      ;NMIEN (enable DLI)
          Lda #0
          Sta 580        ;COLDST
          Lda #1
          Sta 9          ;BOOT?
          Lda #StartAdr,L ;get LSB of StartAdr
          Sta CASINI
          Lda #StartAdr,H ;get MSB of StartAdr
          Sta CASINI+1
;For text mode, subtract 24*40+192*40+32 (DList) = 8672+DLI routine
;27510 - DMA cycles
IdleLoop:
          ;Nop
          Jmp IdleLoop
          ;rts
          Rts
;*** Our display list interrupt subroutine; OS does BIT 54287, BPL, JMP [512] so 4+2+5 = 11 cycles. H/W
;*** vector jump from [65530]. DLI = 25 cycles.
;*** VCount register is not as accurate as POKEY POT counter so we can use the POT(4) to get 8-bit scan-line
;*** count 0..228.
StableDLI: PHA           ;3 cycles
          Lda #39
          Sta 53274      ;COLBK
          ;Nop
          Lda #96
          Sta 53274      ;COLBK
          PLA            ;4 cycles
          RTI            ;6 cycles
LastOffset: ;DW 2E2,2E3,StartAdr

any chance to get binaries instead of the source???

MPDOS does not generate ATRs. I can output an image disk (sector dump) if you can work with that.

Shannon · December 14, 2008

The original poster has not been seen since 2 days after posting this thread. :lol:

Allas · December 15, 2008

29 - H.E.R.O

Atari screenshots

A great classic. The C64 has hi-res walls, but I think the Atari version is the best because of the extra colors of different luminance; those screens are easy on the eyes. Even the 2600 version is great!

C64 screenshots

Goochman · December 15, 2008

C64 HERO is like the CV version - the textured walls just don't cut it and seem out of place with the cartoony character and enemy sprites.

Fun game though, and no reason the C64 version couldn't look the same - seems the programmer just wanted to use the hi-res, but with a bad result IMHO.

atariksi · December 15, 2008

Do you really see a 4x improvement? Using antic D rather than E only saves you an extra 40 cycles every 2 lines?

And Antic draws a full scanline without CPU help. With "other 8Bits" you would have a blank line every second line. Or you would have to copy the previous line, which cost at least 200 cycles where the 6502 in the A8 can do almost 400 cycles for game-calculations...

You can also do a mirror image effect by doing LMS on same memory locations and by updating the top half, the mirrored part automatically gets updated. I did a few scanlines of mirroring in the Curtains demo.
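The LMS mirroring idea can be sketched by generating the display list bytes. This is only an illustration in Python, not the Curtains demo code: the screen address, the display list address and the row count are made-up values, and only the ANTIC instruction encodings (mode E, the LMS bit, JVB, blank lines) are taken as given.

```python
# Hypothetical layout: every address and size here is an illustrative assumption.
SCREEN_BASE = 0x4000      # assumed start of the bitmap data
BYTES_PER_LINE = 40       # ANTIC mode E at normal playfield width
MODE_E = 0x0E             # ANTIC mode E (GR.15) instruction
LMS = 0x40                # Load Memory Scan bit
JVB = 0x41                # jump and wait for vertical blank
BLANK_8 = 0x70            # 8 blank scan lines


def lms_line(row):
    """One mode E line that fetches its data from the given bitmap row."""
    addr = SCREEN_BASE + row * BYTES_PER_LINE
    return [MODE_E | LMS, addr & 0xFF, addr >> 8]


def mirrored_display_list(rows, dl_addr):
    """Top half shows rows 0..rows-1, bottom half shows the same rows reversed."""
    dl = [BLANK_8] * 3
    for row in range(rows):
        dl += lms_line(row)
    for row in reversed(range(rows)):
        dl += lms_line(row)          # same memory, so it mirrors automatically
    dl += [JVB, dl_addr & 0xFF, dl_addr >> 8]
    return dl


dl = mirrored_display_list(rows=4, dl_addr=0x3000)
print(" ".join(f"{b:02X}" for b in dl))
```

Because both halves point at the same bytes, writing to a row in the top half changes its mirror line with no extra CPU work. The cost is three display list bytes per scan line instead of one.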

supercat · December 15, 2008

If your C64 was exactly 1.79/2 MHz, I would accept the 8pixel accuracy. Since it's not evenly divided, you will have problems getting it to trigger off irq at exact point consistently.

On the Apple II, a scan line consists of 64 cycles of 3.5 colorburst cycles each and one of 4.0 colorburst cycles. On the early-model NTSC C64's, a scan line consisted of 64 cycles of 3.5 colorburst cycles each; on the newer ones, it consists of 65 cycles of 3.5 colorburst cycles each (no funny extra half-cycle).

On the Apple II, there is a pixel clock which runs at 2x colorburst, and that is divided by seven to get the CPU clock (with an extra cycle added between scan lines). On the C64, a colorburst/7 signal is generated and that is then multiplied by 16 to yield the pixel clock. The CPU clock is then derived from pixel clock/8.

Note that for optimal color performance if one isn't planning to use artifacting, it's best that the pixel clock not be even remotely close to a multiple of colorburst. On the C64, it's colorburst * 16/7, which is a ways off from 2.0, but still close enough to get some artifacting in some cases. On newer C64's, however, the artifacting is minimized by the fact that the color phase will reverse on alternate scan lines and on alternate frames. The TI-99/4a, Colecovision, and Nintendo all use a pixel frequency which is about 1.5x colorburst, which is as far from an integer multiple as one can get, so color artifacts aren't the slightest bit noticeable on those machines.
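The clock relationships described above can be checked with exact fractions. This is a rough sketch: the NTSC colorburst value of 315/88 MHz is a standard figure that is not stated in the thread, and the variable names are only for illustration.

```python
from fractions import Fraction

# NTSC colorburst, about 3.579545 MHz (assumed standard value, not from the thread)
COLORBURST_HZ = Fraction(315_000_000, 88)

# Apple II as described: pixel clock = 2 x colorburst, CPU clock = pixel clock / 7
apple_pixel = 2 * COLORBURST_HZ
apple_cpu = apple_pixel / 7
apple_line_cb = 64 * Fraction(7, 2) + 4      # 64 cycles of 3.5 plus one of 4.0

# NTSC C64 as described: pixel clock = colorburst / 7 * 16, CPU clock = pixel clock / 8
c64_pixel = COLORBURST_HZ / 7 * 16
c64_cpu = c64_pixel / 8
c64_old_line_cb = 64 * Fraction(7, 2)        # early NTSC machines
c64_new_line_cb = 65 * Fraction(7, 2)        # later NTSC machines

print("Apple II CPU clock (Hz):", float(apple_cpu))
print("Apple II colorburst cycles per line:", apple_line_cb)
print("C64 CPU clock (Hz):", float(c64_cpu))
print("C64 pixels per CPU cycle:", c64_pixel / c64_cpu)
print("C64 pixel clock / colorburst:", c64_pixel / COLORBURST_HZ)
print("C64 colorburst cycles per line (old, new):", c64_old_line_cb, c64_new_line_cb)
```

Two things fall out of the arithmetic: the pixel clock is exactly 8 times the CPU clock by construction, and only the later NTSC C64 line length (227.5 colorburst cycles) has the half cycle that flips the color phase on alternate lines.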

atariksi · December 15, 2008

There's all sorts of combinations you could do.

Using narrow DMA would probably unlock some more.

I did some playing around... once a Player has started to display, you're entirely free to reload its graphic and position register.

So, with precise timing and good coding, you can have A/X/Y loaded with subsequent data that you'll need.

Time the first Store operation such that the last cycle of the instruction coincides around where the Player is starting.

I'm not sure if there's enough time to do another (i.e. a third) load/store of graf/position data. In such a case, the positional layout I have would probably need to change. Maybe we could position the first player such that only 6 bits of it are used, then we might have enough cycles freed up to move more stuff around.

I don't think you can reload the position of the same player until it has finished displaying. The new position immediately takes effect on its clone so no overlapping occurs and first position may not finish displaying. You can shift one player in overlap mode to extend the resolution. When I did the curtains which is two huge 92*240 multicolor sprites (overscanned) in 7-shades, I ran into that issue. I didn't have to deal with reloading Grafn though since the curtain pattern repeats.

Fröhn · December 15, 2008

0.985 = 8 pixels/cycle. Or for NTSC C64: 1.02 MHz = 8 pixels/cycle.

If your C64 was exactly 1.79/2 MHz, I would accept the 8pixel accuracy. Since it's not evenly divided, you will have problems getting it to trigger off irq at exact point consistently.

A8 pixel clock: 1.77 * 4 MHz

C64 pixel clock: 0.985 * 8 MHz

Why don't you get it: one C64 CPU clock tick is EXACTLY 8 pixels.

Rybags · December 15, 2008

I had no problems changing the XPos once the player started displaying. And, we're talking 4x width too.

Easy enough to prove/disprove (which I did). Change the operand so it's changing the Background Colour, and you see the half-cycle after the operation completes.

Then change it back. Change the initial position of the player and you can work out the exact boundary of where positional/graphic changes may take place.

Heaven/TQA · December 15, 2008

I had no problems changing the XPos once the player started displaying. And, we're talking 4x width too.

Easy enough to prove/disprove (which I did). Change the operand so it's changing the Background Colour, and you see the half-cycle after the operation completes.

Then change it back. Change the initial position of the player and you can work out the exact boundary of where positional/graphic changes may take place.

how good is atari800win emulation in such cycle exact things?

Rybags · December 15, 2008

I use the real hardware for such things. Can't trust the emulator 100%

The problem with A800Win+ is that the 4.2 version is graphics accurate to a higher degree, but the sound is buggy.

Still, it's an idea to verify stuff on both, although compromising a graphic method just to be emulator compatible isn't exactly productive.

You can get the real machine to instantly display the PF0-PF3 GTIA "mode" in Antic F/2/3 by toggling GPRIOR early in the scanline before the display starts. Last time I tried the same routine on both real hardware and the emu, it didn't work properly on the emulator.

Edited December 15, 2008 by Rybags

Rybags · December 15, 2008

OK - not the easiest example to work with, but it's from something I'm building ATM.

Green is Player 2, Red is Player 0. The DLI routine (set to run twice here) does cycle exact changes of the position and graphic registers for the 2 players.

First pic is normal DLI, second one I've changed the reposition of the green player to change the BG colour instead.

The instruction finishes on the exact cycle that the green player starts displaying (note that since I'm using the character set as PMBASE, most of the first pixels are blank)

Edited December 15, 2008 by Rybags

Fröhn · December 15, 2008

And here is my test code to show you that the IRQs are NOT stable (includes source code):

test_a8_dli.zip

It has some code in the main loop to avoid pseudo-stable conditions. This main loop modifies some memory similar to what real code would do, the changes are visible as changing characters.

The blue bar shows the DLI.

This proves:

A) IRQs are not stable, but have to wait for end of main-loop opcodes too (just like any other CPU)

B) IRQs in display area have 8 pixel accuracy on display rasterlines

Ok and now to complete this test, let's do the same on C64:

test_c64_irq.zip

And again:

A) IRQs are not stable

B) 8 pixel accuracy

Please note that the blue bar looks the same on both machines. It jitters the same, it has the same 8 pixel steps on the jitter.

Edited December 15, 2008 by Fröhn
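The jitter seen in both tests can be reproduced with a small model. This is a simplified sketch, not either machine's real timing: it assumes the CPU only takes an interrupt at the end of the instruction it is executing, that the request arrives once per frame, and it ignores DMA stalls and WSYNC. The frame and handler cycle counts and the three loop layouts are example values chosen for illustration.

```python
def interrupt_latencies(loop_cycles, frame, handler, frames=50):
    """Return the wait (in CPU cycles) before each interrupt is taken.

    loop_cycles: cycle counts of the instructions in the endless main loop
    frame:       CPU cycles between two interrupt requests
    handler:     cycles spent in the interrupt routine, entry and RTI included
    """
    loop_len = sum(loop_cycles)
    ends = []                 # loop offsets at which an instruction finishes
    total = 0
    for cycles in loop_cycles:
        total += cycles
        ends.append(total)

    waits = []
    pos = 0                   # loop offset at which the main loop resumes
    run = frame               # main-loop cycles that pass before the next request
    for _ in range(frames):
        phase = (pos + run) % loop_len
        wait = min(end - phase for end in ends if end > phase)
        waits.append(wait)
        pos = (phase + wait) % loop_len
        run = frame - wait - handler
    return waits


FRAME, HANDLER = 27510, 60                            # example values
jmp_only = [3]                                        # endless JMP
mixed_20 = [2, 3, 4, 5, 6]                            # 20-cycle loop
aligned_61 = [2, 3, 4, 5, 6] * 2 + [2, 3, 4, 5, 7]    # 61-cycle loop

for name, loop in (("jmp", jmp_only), ("mixed", mixed_20), ("aligned", aligned_61)):
    waits = interrupt_latencies(loop, FRAME, HANDLER)
    print(name, "loop length", sum(loop), "distinct waits", sorted(set(waits)))
```

In this model the wait is constant whenever the loop length divides the frame length minus the handler length, which is why an endless JMP can look stable. A loop whose length does not divide it alternates between waits, and on the C64 each cycle of difference is 8 pixels.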

Rybags · December 15, 2008

Of course they're not stable. In a normal programming situation they never are.

It's not the machine's fault, it's the nature of the 6502. There are no indivisible instructions.

Even on a 68000 machine, there's no guarantees. IIRC, it has some indivisible instructions - in fact a machine needs at least 1 or 2 instructions that can't be interrupted midstream to be able to set/test locks in a multiprocessing environment.

Fröhn · December 15, 2008

Of course they're not stable. In a normal programming situation they never are.

Tell that to atariksi not me

That endless-JMP as main loop is a typical trap where people think their IRQs are stable but they are not.

Even on a 68000 machine, there's no guarantees. IIRC, it has some indivisible instructions

It has only atomic instructions too. Otherwise internal processor states would have to be written to the stack which they obviously are not.

Edited December 15, 2008 by Fröhn

Rybags · December 15, 2008

I'd have to RTFM but I was almost positive the 68K had divisible instructions.

A necessity in many CPUs since you have operations like MOVEM.L that can take a while to execute.

Doesn't matter anyway on the A8 - you have NMIs which are as instant as you can get on a 6502, and WSync which can put you on a known boundary.

atariksi · December 15, 2008

0.985 = 8 pixels/cycle. Or for NTSC C64: 1.02 MHz = 8 pixels/cycle.

If your C64 was exactly 1.79/2 MHz, I would accept the 8pixel accuracy. Since it's not evenly divided, you will have problems getting it to trigger off irq at exact point consistently.

A8 pixel clock: 1.77 * 4 MHz

C64 pixel clock: 0.985 * 8 MHz

Why don't you get it: one C64 CPU clock tick is EXACTLY 8 pixels.

There are at least two things which affect stable raster-- instruction alignment and CPU clocks getting evenly divided into the scanline time and/or frame time.

atariksi · December 15, 2008

I had no problems changing the XPos once the player started displaying. And, we're talking 4x width too.

Easy enough to prove/disprove (which I did). Change the operand so it's changing the Background Colour, and you see the half-cycle after the operation completes.

Then change it back. Change the initial position of the player and you can work out the exact boundary of where positional/graphic changes may take place.

Okay, it won't trigger off new grafn until new HPOS occurs (although color changes take place even in the middle of the player). So if you have Graf0 = 129 @4X and you write new HPOS0 in middle of displaying first HPOS0 the new one takes effect before the full 129 gets written to display.

Rybags · December 15, 2008

That's right. Since the GRAF register is transferred to a shift register when the player is displayed, you're free to just reload it once that's happened, which seems to be right on the cycle when it starts.

I'm developing this routine to use with Gr. 15 (Mode E). Problem is, I was testing in Mode 0.

Kinda forgot that the DMA is a bit different - you get the DList fetch every line in bitmap.

You can get around that if you need all the cycles you want by just disabling DList Instruction DMA. Antic then just repeats the same mode.

Problem is, you then need to re-enable it to get over the 4K boundaries.

atariksi · December 15, 2008

That's right. Since the GRAF register is transferred to a shift register when the player is displayed, you're free to just reload it once that's happened, which seems to be right on the cycle when it starts.

I'm developing this routine to use with Gr. 15 (Mode E). Problem is, I was testing in Mode 0.

Kinda forgot that the DMA is a bit different - you get the DList fetch every line in bitmap.

You can get around that if you need all the cycles you want by just disabling DList Instruction DMA. Antic then just repeats the same mode.

Problem is, you then need to re-enable it to get over the 4K boundaries.

My case was easier as I don't have DMA enabled for P/M. Just load GRAFn initially at the beginning of the kernel and write three kernels -- one for the first 101 scanlines, one for the LMS scan line, and then another for the rest. If you are going to be loading GRAFn, perhaps you can set up that bus-load method that you showed previously where the instruction gets executed and GRAFn gets loaded.

Rybags · December 15, 2008

That's a thought - but in the case of my project it's better to just let Antic do it.

The "bus load" technique is only partially useful though - since you can only have 3 byte instructions, it picks up unwanted data.

Another thing I'm wanting to find out... can we eliminate the refresh cycles during a display line?

Can we trick Antic by having a hires character mode, then disable screen DMA right at the last moment (before the display starts). Will it reinstate those Refresh cycles that would otherwise be skipped, or does it only do the 1 at the end of the display area?

Only partially useful though if it worked - since you get a blank line out of it.

Edited December 15, 2008 by Rybags

atariksi · December 15, 2008

And here is my test code to show you that the IRQs are NOT stable (includes source code):

test_a8_dli.zip

It has some code in the main loop to avoid pseudo-stable conditions. This main loop modifies some memory similar to what real code would do, the changes are visible as changing characters.

The blue bar shows the DLI.

This proves:

A) IRQs are not stable, but have to wait for end of main-loop opcodes too (just like any other CPU)

B) IRQs in display area have 8 pixel accuracy on display rasterlines

Ok and now to complete this test, let's do the same on C64:

test_c64_irq.zip

And again:

A) IRQs are not stable

B) 8 pixel accuracy

Please note that the blue bar looks the same on both machines. It jitters the same, it has the same 8 pixel steps on the jitter.

I never stated that IRQs are stable automatically. On Atari, they can be made stable by properly aligning the cycles used in the background code and interrupt code. The alignment has to be done on a scanline basis and frame basis if interrupt is for horizontal splits and alignment has to be on frame basis otherwise.

You are a scholar for knowing about IRQ instability, but you need the creative step (ingenuity) to go beyond that and make it exact. Just like the C64 has 8 multicolor sprites and the Atari has 2.5 multicolor sprites -- anyone can figure that out from reading books, but it takes that creative step and better designed software to get more than that.

The Atari has 29868 CPU cycles in one frame (NTSC). If you take the simple example of 9 refresh cycles per scanline and only use sprites w/DMA disabled, you get 27510 cycles remaining (29868 - 9*262). If your IRQ takes 60 cycles (including all overhead), you have 27450 cycles remaining. This factors into 5*5*2*3*3*61. So your background code can be 61 cycles, 61*3 cycles, 61*9 cycles, etc. There are at least 36 choices here (the divisors of 27450), since you can always adjust the IRQ routine by adding NOPs and thus changing the number so it has more factors. The Atari is doing things exactly the same way every frame -- there's nothing random. It's just that some people don't bother keeping track of what's happening and so think it's an "unstable" raster.
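The cycle budget above is easy to re-check. A minimal sketch, assuming the NTSC figures of 114 CPU cycles per scan line and 262 lines per frame (which give the 29868 quoted) and counting the 9 refresh cycles once per scan line; the helper names are just for illustration.

```python
CYCLES_PER_LINE = 114     # NTSC Atari, assumed
LINES_PER_FRAME = 262     # NTSC Atari, assumed
REFRESH_PER_LINE = 9
IRQ_CYCLES = 60           # example interrupt cost from the post


def prime_factors(n):
    """Prime factors of n in ascending order, with repeats."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def divisors(n):
    """Every loop length that divides n evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]


frame = CYCLES_PER_LINE * LINES_PER_FRAME
after_refresh = frame - REFRESH_PER_LINE * LINES_PER_FRAME
budget = after_refresh - IRQ_CYCLES

print("cycles per frame:", frame)
print("after refresh:", after_refresh)
print("after the IRQ:", budget, "=", " * ".join(map(str, prime_factors(budget))))
print("usable loop lengths:", len(divisors(budget)))
print("multiples of 61 among them:", [d for d in divisors(budget) if d % 61 == 0])
```

Counting the divisors directly gives 36 candidate loop lengths for this budget, 18 of which are multiples of 61. Padding the interrupt routine with a NOP changes the budget by 2 cycles and gives a different set.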

atariksi · December 15, 2008

If your C64 was exactly 1.79/2 MHz, I would accept the 8pixel accuracy. Since it's not evenly divided, you will have problems getting it to trigger off irq at exact point consistently.

On the Apple II, a scan line consists of 64 cycles of 3.5 colorburst cycles each and one of 4.0 colorburst cycles. On the early-model NTSC C64's, a scan line consisted of 64 cycles of 3.5 colorburst cycles each; on the newer ones, it consists of 65 cycles of 3.5 colorburst cycles each (no funny extra half-cycle).

On the Apple II, there is a pixel clock which runs at 2x colorburst, and that is divided by seven to get the CPU clock (with an extra cycle added between scan lines). On the C64, a colorburst/7 signal is generated and that is then multiplied by 16 to yield the pixel clock. The CPU clock is then derived from pixel clock/8.

Note that for optimal color performance if one isn't planning to use artifacting, it's best that the pixel clock not be even remotely close to a multiple of colorburst. On the C64, it's colorburst * 16/7, which is a ways off from 2.0, but still close enough to get some artifacting in some cases. On newer C64's, however, the artifacting is minimized by the fact that the color phase will reverse on alternate scan lines and on alternate frames. The TI-99/4a, Colecovision, and Nintendo all use a pixel frequency which is about 1.5x colorburst, which is as far from an integer multiple as one can get, so color artifacts aren't the slightest bit noticeable on those machines.

Okay, the C64 is minimizing the artifacting, but CIA IRQ does not know about the video beam-- it's just counting its ticks so it has to divide out evenly into scanline time and/or frame time to get the 8-pixel accuracy.
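Whether a free-running timer stays locked to the beam depends only on how its period relates to the line and frame lengths in CPU cycles. The sketch below assumes a PAL C64 with 63 cycles per line and 312 lines per frame, figures that are not given in the thread. It works with the timer period in cycles and leaves out how a CIA latch value maps to that period.

```python
def beam_positions(period, cycles_per_line, lines_per_frame, count):
    """(line, cycle-in-line) of the beam at each of the first `count` timer ticks."""
    frame = cycles_per_line * lines_per_frame
    positions = []
    for n in range(count):
        t = (n * period) % frame
        positions.append(divmod(t, cycles_per_line))
    return positions


PAL_CYCLES_PER_LINE = 63      # assumed PAL C64 value
PAL_LINES_PER_FRAME = 312     # assumed PAL C64 value
frame_cycles = PAL_CYCLES_PER_LINE * PAL_LINES_PER_FRAME

for period in (frame_cycles, 63, 64):
    pos = beam_positions(period, PAL_CYCLES_PER_LINE, PAL_LINES_PER_FRAME, 5)
    print("period", period, "->", pos)
```

A period equal to the frame length always lands on the same line and cycle, a period of one line keeps the same horizontal position, and a period of 64 drifts one cycle (8 pixels) further right on every tick.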

atariksi · December 15, 2008

That's a thought - but in the case of my project it's better to just let Antic do it.

The "bus load" technique is only partially useful though - since you can only have 3 byte instructions, it picks up unwanted data.

Another thing I'm wanting to find out... can we eliminate the refresh cycles during a display line?

Can we trick Antic by having a hires character mode, then disable screen DMA right at the last moment (before the display starts). Will it reinstate those Refresh cycles that would otherwise be skipped, or does it only do the 1 at the end of the display area?

Only partially useful though if it worked - since you get a blank line out of it.

But does it need those char loads to do the refresh?

With the "bus load" technique, you can leave missiles DMA on and work with just players. I guess it depends on the application. If you could disable each player DMA individually, that would have helped in your case.

atariksi · December 15, 2008

Of course they're not stable. In a normal programming situation they never are.

Tell that to atariksi not me

That endless-JMP as main loop is a typical trap where people think their IRQs are stable but they are not.

Even on a 68000 machine, there's no guarantees. IIRC, it has some indivisible instructions

It has only atomic instructions too. Otherwise internal processor states would have to be written to the stack which they obviously are not.

No, Rybags was correct-- they are NOT stable in a normal programming situation. JMP main loop was to make the program simple. You can write as much code as you want as long as it is aligned. 68000 has more of a problem as each instruction takes a lot more cycles.
