batari, on Fri Jun 17, 2011 5:02 PM, said:
The ARM's clock would have to be derived from the TIA's clock, so that a known number of ARM cycles would elapse every TIA clock. With that taken care of, if one were running at 16x the chroma clock, one could simply have a loop which took about 228*16 cycles per scan line, taking an extra cycle if it saw phi2 in the wrong place, or an extra 48 cycles if, for six or more scan lines in a row, it didn't see a sync pulse in the right place. The ARM would only have to worry about outgoing sync pulses during those parts of the line when it wasn't handling incoming picture data, and would only have to worry about incoming sync pulses during those parts of the line when it wasn't handling any picture data at all.
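To make the resync idea concrete, here is a rough Python model of how long each pass through the line loop would last. The function and constant names are made up for illustration, and treating the two corrections as independent (both can apply to the same line) is my assumption; the real thing would of course be ARM code, not Python.

```python
# Rough model of the per-line resync policy described above.
# All names here are hypothetical; only the numbers come from the post.

CHROMA_CLOCKS_PER_LINE = 228      # TIA colour clocks per scan line
ARM_CYCLES_PER_CHROMA = 16        # ARM clock = 16x chroma clock
NOMINAL_LINE_CYCLES = CHROMA_CLOCKS_PER_LINE * ARM_CYCLES_PER_CHROMA

PHI2_SLIP_CYCLES = 1              # nudge when phi2 is seen in the wrong place
SYNC_SLIP_CYCLES = 48             # bigger nudge when sync has been missing
SYNC_MISS_THRESHOLD = 6           # "six or more scan lines in a row"


def line_length(phi2_misplaced: bool, missed_sync_lines: int) -> int:
    """Return how many ARM cycles the next pass of the line loop should take."""
    cycles = NOMINAL_LINE_CYCLES
    if phi2_misplaced:
        cycles += PHI2_SLIP_CYCLES
    # Assumption: the sync correction is applied independently of the phi2 one.
    if missed_sync_lines >= SYNC_MISS_THRESHOLD:
        cycles += SYNC_SLIP_CYCLES
    return cycles


if __name__ == "__main__":
    print(line_length(False, 0))   # nominal line
    print(line_length(True, 0))    # phi2 seen in the wrong place
    print(line_length(False, 6))   # sync missing for six lines
```

The point is just that the loop free-runs at the nominal length and slips by a small fixed amount until phi2 and sync land where it expects them.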
With an ARM clock of 16*chroma, it might almost be possible to have the ARM do everything, including watching the 6502. It would be tricky, since one would have to watch for TIA writes every 48 clocks, but since that is only once every three pixels it might be possible if one spreads out the work of handling a TIA write. My guess would be that the timing wouldn't be too terrible if one were willing to restrict COLUxx writes to once every three 6507 cycles (a restriction which would work with all "normal" code).
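For anyone who wants the numbers behind "48 clocks", the arithmetic can be sketched like this. The NTSC chroma frequency is rounded to the nearest Hz and PAL machines would differ; the function name and dictionary keys are just for illustration.

```python
# Back-of-the-envelope timing for an ARM clocked at a multiple of the chroma clock.
# Assumes an NTSC console; budget() and its keys are hypothetical names.

NTSC_CHROMA_HZ = 3_579_545        # NTSC colour subcarrier, rounded to 1 Hz
CHROMA_PER_CPU_CYCLE = 3          # the 6507 runs at chroma / 3
CHROMA_PER_LINE = 228             # TIA colour clocks per scan line


def budget(multiplier: int = 16) -> dict:
    """Return the ARM cycle budgets that follow from the clock multiplier."""
    return {
        "arm_hz": NTSC_CHROMA_HZ * multiplier,
        "arm_cycles_per_pixel": multiplier,
        "arm_cycles_per_cpu_cycle": CHROMA_PER_CPU_CYCLE * multiplier,
        "cpu_cycles_per_line": CHROMA_PER_LINE // CHROMA_PER_CPU_CYCLE,
        "arm_cycles_per_line": CHROMA_PER_LINE * multiplier,
    }


if __name__ == "__main__":
    for name, value in budget(16).items():
        print(f"{name}: {value}")
```

So at 16x the ARM gets 48 cycles per 6507 cycle, which is the window in which a TIA write has to be noticed.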
The inner "loop" would probably be a macro that handled three chroma clocks (48 cycles):
; R13 -- I/O port base
; R12 -- Pointer for video data to be output
; R11 -- Pointer for video data coming from TIA
; R10 -- Active color base
; R4 -- Color data being computed for output
; r3 -- Temp data fetched from TIA
; r0-r2 -- Data fetched from buffer
;
; Assume r0-2 are loaded with next three words to be output, and r5 is r0>>16
; Cycle 0
strh r0,[r13,#OUT]
ldr r3,[r13,#IN] ; Top two non-blank bits must be luma bits, and next lower bit must be blank
ldrh r4,[r10,r3,lsr #whatever]
; Cycle 8
strh r5,[r13,#OUT]
lsr r0,r1,#16
sub r3,#Whatever ; Subtract base address of COLUxx
lsls r3,#Whatever ; See if everything is valid for a COLUxx write (will be zero if so)
bleq handle_write
nop ; Show there is enough time to deal with things in branch case
nop
; Cycle 16
strh r1,[r13,#OUT]
ldr r3,[r13,#IN]
ldrh r5,[r10,r3,lsr #whatever]
; Cycle 24
strh r0,[r13,#OUT] ; Value that was in r1, shifted right 16
add r4,r4,r5,asl #16
str r4,[r11,#4]!
lsr r5,r2,#16 ; r5 = r2>>16 for the store at cycle 40 (r0 gets reloaded by the ldmia first)
; 2 spare cycles
; Cycle 32
strh r2,[r13,#OUT]
ldmia r12!,{r0,r1,r2} ; Writeback so r12 advances to the next three words
; Cycle 40
strh r5,[r13,#OUT]
lsr r5,r0,#16
; 5 spare cycles
Including five cycles to handle a test for whether a TIA write cycle is occurring, the code ends up fitting with 7 cycles to spare when using a 16x clock (assuming a fully-unrolled loop for the parts of the line which require simultaneous latching and display of data). A COLUxx write would knock things a little "behind", but on the next 6507 cycle one could skip the check for a COLUxx store, and thus get caught up. Perhaps it would be possible to make things work. It would certainly be very close.
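As a sanity check on the "get caught up" idea, here is a very simplified Python model. It assumes the handler's overrun is the only thing that puts the loop behind, that every following 48-cycle group skips the 5-cycle check until the deficit is gone, and that slack carries over freely between groups. The handler cost is a made-up parameter, since I haven't written handle_write.

```python
# Simplified slack model for the 48-cycle group; names are hypothetical.

SPARE_WITH_CHECK = 7              # spare cycles per group, per the listing
CHECK_COST = 5                    # cycles spent testing for a COLUxx write
SPARE_WITHOUT_CHECK = SPARE_WITH_CHECK + CHECK_COST


def groups_to_catch_up(handler_cycles: int) -> int:
    """Return how many check-free groups are needed to absorb a handler overrun.

    handler_cycles is the hypothetical extra cost of servicing one COLUxx
    write, beyond the five cycles already budgeted for the check.
    """
    deficit = max(0, handler_cycles - SPARE_WITH_CHECK)
    groups = 0
    while deficit > 0:
        deficit -= SPARE_WITHOUT_CHECK
        groups += 1
    return groups


if __name__ == "__main__":
    for cost in (5, 12, 19, 31):
        print(cost, groups_to_catch_up(cost))
```

Under these assumptions, if COLUxx writes are limited to one every three 6507 cycles there are two write-free groups after each write, so an overrun of up to 7 + 2*12 = 31 cycles could be absorbed before the next write has to be checked for.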













