Luma Enhancement Module Development
Started by ClausB, Oct 23 2009 11:21 AM
96 replies to this topic
#26
Posted Mon Oct 26, 2009 3:59 PM
It occurred to me that we don't need an 8x clock, just 4x. Here's why: Our highest resolution mode sends out 8 pixels per Phi2 cycle. Instead of clocking them out of a shift register 8 times per cycle, we can multiplex them out with 8 states per cycle. We can get 8 states by decoding 3 bits - the 1x, 2x, and 4x clocks. So we just need two clock doublers.
This is exactly how GTIA does its hi-res mode (ANTIC mode F). Those pixels come out at 7.2 MHz but GTIA only has a 3.6 MHz clock. It uses half a cycle to display one pixel and the other half to display the next pixel.
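A minimal sketch of the multiplexing idea, for illustration only. It assumes each clock is high during the first half of its own period, that pixel 0 is the most significant bit, and that the bus cycle is the nominal 560 ns mentioned later in the thread. The function names are made up for this example and the real hardware may use a different polarity or bit order.

```python
# Sketch: pick one of 8 pixel bits per Phi2 cycle using the levels of the
# 1x, 2x and 4x clocks as a 3-bit select, so no 8x clock is needed.
# Assumptions: clocks are high for the first half of their period,
# pixel 0 is the MSB, and one bus cycle lasts 560 ns.

def clock_level(t, period, multiple):
    """Level (0/1) of a square wave running at `multiple` times the base rate."""
    sub_period = period / multiple
    return 1 if (t % sub_period) < sub_period / 2 else 0

def mux_select(t, period):
    """Decode the three clock levels into a mux state 0..7."""
    c1 = clock_level(t, period, 1)
    c2 = clock_level(t, period, 2)
    c4 = clock_level(t, period, 4)
    # Each clock starts high, so invert the levels to make the state count up.
    return ((1 - c1) << 2) | ((1 - c2) << 1) | (1 - c4)

def pixels_for_cycle(byte, period=560.0):
    """Sample the mux in the middle of each of the 8 slots of one cycle."""
    slot = period / 8
    out = []
    for n in range(8):
        state = mux_select((n + 0.5) * slot, period)
        out.append((byte >> (7 - state)) & 1)
    return out

print(pixels_for_cycle(0b10110001))
```

The three clock levels step through all eight states once per cycle, which is why two doublers are enough.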
#27
Posted Tue Oct 27, 2009 7:41 AM
ClausB, on Mon Oct 26, 2009 3:59 PM, said:
So we just need two clock doublers.
#28
Posted Tue Oct 27, 2009 9:22 AM
Yep - we could do that. Which IC were you looking at? They seem to have the ability to make the delays anything we want - that can't be a low volume project! Can it? If 70ns is a standard (low volume part) value, that should work. The duty cycle won't quite be 50% but that won't matter so much.
Bob
A very simple clock doubler is merely a delay line and an XOR gate. You delay the input clock by 1/4 cycle and XOR both signals to get twice the frequency and 50% duty. Put two of those in series and you also get the 4x clock we need. So the first doubler needs 140 ns delay and the second needs 70 ns. I have found a triple 70 ns delay line for $9 from DDD. A bit pricey but we likely won't be mass producing!
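The doubler can be checked with a small simulation. This is only a sketch under ideal assumptions: a perfect 50% duty input, exact delays, and a nominal 560 ns Phi2 period sampled on a 1 ns grid. The helper names are hypothetical.

```python
# Sketch of a delay-line + XOR clock doubler, sampled on a 1 ns grid.
# Assumptions: ideal 50% duty input, ideal delays, 560 ns nominal Phi2 period.

def square(t, period):
    """Ideal square wave: high for the first half of each period."""
    return 1 if (t % period) < period // 2 else 0

def doubled(wave, delay):
    """XOR a waveform with a copy of itself delayed by `delay` ns."""
    return lambda t: wave(t) ^ wave(t - delay)

def rising_edges(wave, start, stop):
    """Count 0->1 transitions on integer ns steps in [start, stop)."""
    return sum(1 for t in range(start, stop) if wave(t - 1) == 0 and wave(t) == 1)

PERIOD = 560                        # ns, 1x clock (Phi2)
clk1 = lambda t: square(t, PERIOD)
clk2 = doubled(clk1, PERIOD // 4)   # 140 ns delay gives the 2x clock
clk4 = doubled(clk2, PERIOD // 8)   # 70 ns delay gives the 4x clock

for name, clk in (("1x", clk1), ("2x", clk2), ("4x", clk4)):
    edges = rising_edges(clk, PERIOD, 2 * PERIOD)
    high_ns = sum(clk(t) for t in range(PERIOD, 2 * PERIOD))
    print(name, "rising edges per cycle:", edges, "high time (ns):", high_ns)
```

Changing either delay away from the exact quarter period in this model keeps the frequency but moves the duty cycle away from 50%.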
#29
Posted Tue Oct 27, 2009 3:40 PM
bob1200xl, on Tue Oct 27, 2009 9:22 AM, said:
Yep - we could do that. Which IC were you looking at? They seem to have the ability to make the delays anything we want - that can't be a low volume project! Can it? If 70ns is a standard (low volume part) value, that should work. The duty cycle won't quite be 50% but that won't matter so much.
MOQ is 10 pieces
3D7323Z-70 $8.68 each 1 week to ship
MDU3C-70 $11.55 each 4-6 weeks
This is the part:
http://www.datadelay...eets/3d7323.pdf
The delay tolerance is 2%. The ideal delays are 69.8 ns for NTSC and 70.5 ns for PAL. They differ by less than the tolerance.
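The arithmetic behind those two figures can be sketched as follows. The Phi2 frequencies are an assumption not stated in this thread (roughly 1.7897725 MHz for NTSC machines and 1.7734475 MHz for PAL), and the ideal delay is taken as one eighth of a Phi2 period.

```python
# Worked check of the ideal 1/8-cycle delays against a 70 ns part with 2% tolerance.
# Assumption: Phi2 is about 1.7897725 MHz (NTSC) and 1.7734475 MHz (PAL);
# these figures are not given in the thread.

PHI2_HZ = {"NTSC": 1_789_772.5, "PAL": 1_773_447.5}
NOMINAL_NS = 70.0
TOLERANCE = 0.02

def ideal_delay_ns(phi2_hz, fraction=8):
    """One `fraction`-th of a Phi2 period, in nanoseconds."""
    return 1e9 / phi2_hz / fraction

delays = {name: ideal_delay_ns(hz) for name, hz in PHI2_HZ.items()}
band_ns = NOMINAL_NS * TOLERANCE                     # +/- band around 70 ns
spread_ns = abs(delays["PAL"] - delays["NTSC"])      # NTSC vs PAL difference

for name, d in delays.items():
    inside = abs(d - NOMINAL_NS) <= band_ns
    print(f"{name}: {d:.1f} ns, inside tolerance band: {inside}")
print(f"spread: {spread_ns:.2f} ns, band: +/- {band_ns:.1f} ns")
```

Under those assumptions a single 70 ns part covers both video standards, since each ideal value sits inside the part's own tolerance band.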
#30
Posted Tue Oct 27, 2009 3:45 PM
can't you do this in cpld?
#31
Posted Tue Oct 27, 2009 3:59 PM
candle, on Tue Oct 27, 2009 3:45 PM, said:
can't you do this in cpld?
I researched a bit on the Web and saw some things about the Xilinx Digital Clock Manager core and about Digital Locked Loops, but I could not find enough details to see if such a thing would fit into our smallish CPLD. Do you have details to share?
#32
Posted Tue Oct 27, 2009 4:04 PM
how about wait 70ns statement in VHDL code?
no need for domain synchronisers and PLL inside a cpld, just the routing inside that matters - besides, this would only be available inside fpga, not cpld chip
#34
Posted Tue Oct 27, 2009 4:26 PM
no it doesn't
it is based on timing equations and propagation delays inside cells of cpld
#35
Posted Tue Oct 27, 2009 4:34 PM
I doubt there are enough spare gates in our CPLD to chain up to 140 ns and 70 ns delays. Maybe we should use the larger chip you recommended and devote 90% of it to the clock and 10% to the LEM.
#36
Posted Tue Oct 27, 2009 4:47 PM
i still think that fpga chip is the way to go with this
costs are the same if you consider small fpga and large cpld
#37
Posted Tue Oct 27, 2009 4:56 PM
You might be right. I'm old-school so I'm trying to design hardware, not software-on-a-chip. That's fine for large, complex designs like VBXE, but I don't think LEM needs it. I'll try to keep an open mind, though.
"I can change, if I have to." - Red Green
#38
Posted Tue Oct 27, 2009 4:59 PM
do you know most of cpld chips are in fact small fpga chips with bootloaders?
#39
Posted Tue Oct 27, 2009 7:52 PM
So, they make these custom at 70ns? wow....
Instead of spending $100 on delay lines, how about I just tweak a clock into the circuit and simulate 70ns? I would hate to want a 35ns delay down the road.
We're going to do at least three iterations of the boards, I expect. Maybe more.
Bob
This is the email quote I got yesterday:
MOQ is 10 pieces
3D7323Z-70 $8.68 each 1 week to ship
MDU3C-70 $11.55 each 4-6 weeks
This is the part:
http://www.datadelay...eets/3d7323.pdf
The delay tolerance is 2%. The ideal delays are 69.8 ns for NTSC and 70.5 ns for PAL. They differ by less than the tolerance.
#41
Posted Tue Oct 27, 2009 9:12 PM
you could use 74ls14 chip and r/c circuit to delay the signal
not very controlled maybe, but still better than spending $100 for delay lines
#42
Posted Wed Oct 28, 2009 4:59 PM
I was thinking of a gated oscillator setup, actually. Phi2 would gate a series of clock pulses that would load registers from SRAM, or whatever. It would have to be manually adjusted on the prototypes, while the finished boards could use delay lines that wouldn't need adjusting ('tweaking').
Bob
#43
Posted Wed Oct 28, 2009 8:55 PM
so every 4 pixels would be a bit distorted, but within a controllable range
may be a good idea to use a higher frequency than necessary, and then scale it down with the clock divider
it might reduce pixel skew in those 4 pixel chunks. if the falling edge of phi2 activated the clocking circuit, it would be in phase with phi2 all the time - even if not, a higher frequency to start with would give a smaller skew
#44
Posted Wed Oct 28, 2009 10:07 PM
If I make the initial delay and the data-to-data delay variable, I can adjust the pixels for best fit, can't I?
I'm not sure... it still isn't entirely clear what the sequence is for the process.
*Phi2 clock falls, indicating the start of a new cycle.
*S4 falls, indicating $8000-$9FFF data access. (it had better be ANTIC because that's our only clue)
*After an adjustable delay, (perhaps 0) SRAM is accessed for the first data byte/bits.
*SRAM data is latched into the CPLD data reg.
*Data is clocked out of the register at an adjustable clock rate. (after an adjustable delay?) **when does this happen? do we need two sets of data regs?**
Is that about right?
Would it be worthwhile to have a line counter and start/stop without requiring DLIs? We have the vertical and horizontal sync pulses in the LUMA input. Maybe implement two-line modes?
Bob
#45
Posted Thu Oct 29, 2009 12:08 AM
Thank you for keeping this alive, Claus. My family is more financially stable now, so I am eager and willing to put my money where my mouth is and support this.
#46
Posted Thu Oct 29, 2009 12:49 AM
Couldn't we somehow automate the enable/disable process?
Just reserve an address which, if accessed, will enable the LEM, another for disable.
Since we're talking custom Display Lists anyway, we could have something like a dummy graphics line before the real display.
e.g.
2 x 8 Blank
1 x 7 Blank
LMS $BE00 Mode D - tell the LEM to enable itself. (read to page $BE00 will return zeros, any access to $BE00 enables LEM mode)
LMS $9C40 Mode 2
23 x Mode 2
LMS $BE80 Mode D - tell the LEM to disable itself. (any access to $BE80 shuts off LEM mode)
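A rough model of that enable/disable latch, just to make the behaviour concrete. The trigger addresses are the $BE00 and $BE80 examples from this post and are matched exactly; `LemEnable` and `on_access` are hypothetical names, and the sample address trace is invented for illustration.

```python
# Sketch of the enable/disable idea: a latch set or cleared by bus accesses.
# Assumptions: trigger addresses are the $BE00 / $BE80 examples from the post,
# matched exactly; LemEnable and on_access are hypothetical names.

ENABLE_ADDR = 0xBE00
DISABLE_ADDR = 0xBE80

class LemEnable:
    def __init__(self):
        self.enabled = False      # LEM starts switched off

    def on_access(self, addr):
        """Call for every bus address seen; returns the latch state afterwards."""
        if addr == ENABLE_ADDR:
            self.enabled = True
        elif addr == DISABLE_ADDR:
            self.enabled = False
        return self.enabled

lem = LemEnable()
# Invented trace: ordinary fetch, dummy enable line, screen data, dummy disable line.
addresses = (0x9C40, 0xBE00, 0x9C40, 0x9C68, 0xBE80, 0x9C40)
trace = [lem.on_access(a) for a in addresses]
print(trace)
```

In hardware this would amount to an address decoder driving a set/reset flip-flop, so it costs very little logic.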
#47
Posted Thu Oct 29, 2009 4:06 PM
bob1200xl, on Wed Oct 28, 2009 10:07 PM, said:
I'm not sure... it still isn't entirely clear what the sequence is for the process.
As far as the SRAM goes, the sequence is laid out in the timing diagrams I posted at the top of this thread. At the rising edge of Phi2, 8 bits of SRAM data get clocked into the first data register. 140 ns later, 8 bits from another bank go into the second register. (That's one reason why a 140 ns delay line on Phi2 would be ideal.)
As for the luma output, we must divide each 560 ns bus cycle into 8, 4, or 2 equal parts and select 1, 2, or 4 bits at a time per pixel using a variable width, variable period multiplexer. (A 70 ns delay helps generate the counter to address the mux).
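The multiplexer described above can be sketched like this. It assumes bits are consumed most-significant-first from a flat list and leaves the pairing of slot count and pixel width to the caller, since the thread does not spell out which combinations the real design uses. `mux_schedule` is a hypothetical name.

```python
# Sketch of the variable-width, variable-period multiplexer.
# Assumptions: bits are taken MSB-first from a flat list; the caller picks
# the number of slots and the bits per pixel; 560 ns nominal bus cycle.

CYCLE_NS = 560

def mux_schedule(bits, parts, width, cycle_ns=CYCLE_NS):
    """Return (start_ns, pixel_value) for each of `parts` slots in one cycle."""
    if parts not in (8, 4, 2) or width not in (1, 2, 4):
        raise ValueError("parts must be 8, 4 or 2 and width 1, 2 or 4")
    if len(bits) < parts * width:
        raise ValueError("not enough bits for this mode")
    slot_ns = cycle_ns // parts
    schedule = []
    for n in range(parts):
        value = 0
        for b in bits[n * width:(n + 1) * width]:
            value = (value << 1) | b      # pack `width` bits into one pixel
        schedule.append((n * slot_ns, value))
    return schedule

byte_bits = [1, 0, 1, 1, 0, 0, 0, 1]
print(mux_schedule(byte_bits, 8, 1))   # 70 ns slots, 1 bit per pixel
print(mux_schedule(byte_bits, 4, 2))   # 140 ns slots, 2 bits per pixel
print(mux_schedule(byte_bits, 2, 4))   # 280 ns slots, 4 bits per pixel
```

The slot boundaries fall on multiples of 70 ns in every mode, which is where the 70 ns delay comes in.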
#48
Posted Thu Oct 29, 2009 4:12 PM
Rybags, on Thu Oct 29, 2009 12:49 AM, said:
Couldn't we somehow automate the enable/disable process?
Just reserve an address which, if accessed, will enable the LEM, another for disable.
Since we're talking custom Display Lists anyway, we could have something like a dummy graphics line before the real display.
e.g.
2 x 8 Blank
1 x 7 Blank
LMS $BE00 Mode D - tell the LEM to enable itself. (read to page $BE00 will return zeros, any access to $BE00 enables LEM mode)
LMS $9C40 Mode 2
23 x Mode 2
LMS $BE80 Mode D - tell the LEM to disable itself. (any access to $BE80 shuts off LEM mode)
Page $BE is outside the range we've selected, but $9E would work.
ANTIC mode 2 would not be useful. The design only works with single-line modes which use DMA on every line.
But the clever idea of using ANTIC to enable and disable the luma is worth considering.
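For reference, a quick sketch of the range check, using the $8000-$9FFF window that S4 selects as mentioned earlier in the thread. The function names are hypothetical.

```python
# Sketch: check whether an address lies in the $8000-$9FFF window selected by S4.
# in_s4_window and in_s4_window_bits are hypothetical names.

S4_START, S4_END = 0x8000, 0x9FFF

def in_s4_window(addr):
    """True if a 16-bit address falls inside the S4 range."""
    return S4_START <= addr <= S4_END

def in_s4_window_bits(addr):
    """Same test as a hardware-style decode: top three address bits are 100."""
    return (addr >> 13) == 0b100

for page in (0xBE00, 0x9E00):
    print(f"${page:04X} in S4 window: {in_s4_window(page)}")
```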
#49
Posted Thu Oct 29, 2009 4:33 PM
ClausB, on Thu Oct 29, 2009 4:12 PM, said:
ANTIC mode 2 would not be useful. The design only works with single-line modes which use DMA on every line.
#50
Posted Thu Oct 29, 2009 9:25 PM
The timing diagram shows the SRAM but not the LUMA timing. (does it?) While we are loading one register, are we reading LUMA from the other? In the next/same cycle?
Bob
It's been bouncing around in my head for a year, so it's pretty clear to me:
As far as the SRAM goes, the sequence is laid out in the timing diagrams I posted at the top of this thread. At the rising edge of Phi2, 8 bits of SRAM data get clocked into the first data register. 140 ns later, 8 bits from another bank go into the second register. (That's one reason why a 140 ns delay line on Phi2 would be ideal.)
As for the luma output, we must divide each 560 ns bus cycle into 8, 4, or 2 equal parts and select 1, 2, or 4 bits at a time per pixel using a variable width, variable period multiplexer. (A 70 ns delay helps generate the counter to address the mux).