
Advanced sound techniques: how do they work?


Recommended Posts

This is, perhaps, a rather vague set of questions, so bear with me:

 

I've come to a fairly good understanding of how to use the TIA's sound hardware "by the book". In other words, using bB, I'm able to write my own two-channel songs, and have become reasonably adept at rapidly switching between distortion settings to create the illusion of 3-4 channel textures. All of these songs are synced to 60Hz timing -- so in other words, to quote the name of Eckhard Stolberg's excellent sound editor, I've essentially been writing "Frame Timed Sound Effects".

 

But I have no real understanding of what's involved in manipulating the sound channels at a speed faster than 60Hz. For instance, let's say I wanted to phase two low-frequency waveforms against each other: distortion setting 1, pitch value $1F, on both channels. Is there a way to make the two waves drift slowly and smoothly in and out of phase with one another, say with a two- or three-second periodicity, by somehow introducing minuscule delays to one channel? Is it possible to do this with reasonable results?

 

Or, to take a more advanced example, the voice samples in Quadrun and Open Sesame -- I assume they're somehow generated through rapid toggling of square waves. But how does that work? Does Supercat's 4-voice demo of "The Entertainer" use the same method? I notice that he's able to keep an onscreen display going while playing the music, whereas the other two titles have to blank the screen.

 

What about -- even though it's different hardware -- the music for the Fairchild Channel F, as heard in Pac-Man for that platform? Is the basic principle the same? Would the Odyssey2 be capable of the same things we've heard from the Channel F? I remember hearing about the Apple II's 1-bit speaker, and how people were able to push out multivoice music through that. Are people using special programs to convert pre-existing soundfiles into...is it pulse-width modulation format?

 

I should add that I have only the most rudimentary understanding of ASM, and I get the impression that pushing the hardware in this way really demands some knowledge of ASM. That's on my to-do list for the medium-term future. That being said, I can probably follow simple examples, though I'm mainly interested in just understanding conceptually how these things work.


Sample playback on the 2600 is achieved by setting AUDC0 to 0, which produces a waveform that is always high (it does not oscillate). You then provide the wave pattern by rapidly changing the volume in AUDV0. The quality of the samples you can play will depend on three things:

 

- The resolution of the sample. Atari 2600 samples are 4-bit, which corresponds to the 16 volume levels possible in the AUDV0 register.

- The speed of the CPU. The Atari's 6507 operates at 1.19 million cycles per second. To produce a 41 kHz sample, you would have 1190000 / 41000 ≈ 29 CPU cycles per sample, which might be possible with a tight assembly loop.

- The size of your ROM. A 41 kHz sample would sound great, but it would consume 20.5K bytes of ROM per second (storing two 4-bit samples per byte), which is too much for the relatively small Atari ROMs. So most samples for the Atari are recorded at around 1K to 4K samples per second, depending on how much ROM you're willing to burn.

 

To produce a 4-second sound sample at 1K samples per second, you would need 2K of ROM. Pseudocode to achieve this would look like:

 

 

	LDA #>(SampleData + $700)	; HiByte:LoByte points at the last of the 8 pages (2K) of sample data
	STA HiByte
	LDA #0			; SampleData is assumed to be page-aligned, and LoByte must sit
	STA LoByte		; directly before HiByte in zero page for (LoByte),Y to work

PageLoop
	LDY #255
ByteLoop
	LDA (LoByte),Y		; fetch one byte = two packed 4-bit samples
	STA Temp
;	AND #%00001111		; this actually isn't necessary - the top 4 bits are ignored by AUDV0
	STA AUDV0		; play the low nibble
	JSR Wait

	LDA Temp
	LSR			; shift the high nibble down into the low 4 bits
	LSR
	LSR
	LSR
	STA AUDV0		; play the high nibble
	JSR Wait

	DEY
	CPY #255		; Y wraps from 0 back to 255 after 256 bytes
	BNE ByteLoop

	DEC HiByte		; step back one page
	LDA HiByte
	CMP #>SampleData
	BCS PageLoop		; keep going until all 8 pages have played

I just banged this code out. I can't guarantee it will work, but it should be close.

 

The Wait will burn cycles such that 1/1000 of a second would pass between each sample update. This would use a NOP loop that burns 1190 cycles, minus the cycles consumed for the playback code.
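
Wait could be as simple as a countdown subroutine like the one below (an untested sketch rather than final code; the count of 232 burns roughly 1160 cycles, and you'd trim it, or pad with a NOP or two, so that delay plus playback code comes out to exactly 1190 cycles per sample):

Wait
	LDX #232		; each pass of the loop below costs about 5 cycles
WaitLoop			; (DEX = 2, taken BNE = 3), so this burns roughly
	DEX			; 1160 cycles before counting the JSR/RTS overhead
	BNE WaitLoop
	RTS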

 

This requires exact timing, which is why some games blank the screen when they play a sample. The processor has to focus on playing back the sample at the exact times, so it can't be used to update the screen display.

 

It is possible to play back samples while displaying an image on the screen if you sync up the sample playback with the kernel scanlines. Each scanline uses 76 cycles, and 1190000 / 76 ≈ 15657 scanlines per second. So if you wanted to sync a 1K-samples-per-second playback with a display kernel, you'd have to update AUDV0 about once every 15 scanlines, since 15657 / 1000 ≈ 15.7. (The math would be easier if you updated every 16 scanlines.) You'd also have to update AUDV0 at that same interval during the vertical blank and the overscan, to keep the playback in sync. It requires advanced coding to make this work. It's much easier to just blank the screen during playback.
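
A rough sketch of that kernel fragment (untested, with LineCount, SampleIndex, and SampleData as illustrative names), updating every 16 lines for a rate just under 1 kHz:

	STA WSYNC		; one pass through this fragment per scanline
	DEC LineCount
	BNE SameSample		; only touch AUDV0 every 16th line
	LDA #16
	STA LineCount
	LDY SampleIndex
	LDA SampleData,y	; page-aligned table, one 4-bit value per byte here
	STA AUDV0		; AUDV0 ignores the upper 4 bits anyway
	INC SampleIndex
SameSample
	; ... normal playfield/sprite updates for this scanline go here ...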

 

Regarding your question about manipulating a signal at faster than 60 Hz, this can be achieved without using samples. You just need to alter the sound registers more than once per screen. If you altered the sound register twice per screen (at even intervals), this would result in 120 Hz signal processing. 3 times per screen would be 180 Hz, etc.
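
For example, a kernel could rewrite AUDF0 halfway down the 192 visible lines, with the first write done during the vertical blank (an untested sketch; ToneB is just an illustrative variable):

	LDX #0			; X counts the 192 visible scanlines
KernelLoop
	STA WSYNC
	CPX #96
	BNE NoSoundUpdate
	LDA ToneB		; halfway down the frame, switch to a second pitch,
	STA AUDF0		; giving two AUDF0 updates per frame, i.e. about 120 Hz
NoSoundUpdate
	INX
	CPX #192
	BNE KernelLoop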

 

Though I've never coded for the Channel F, I've read enough to know that the sound system is very primitive, even by Atari 2600 standards. To play a song, you turn the sound register on and off to simulate a square wave. The interval between turning the sound on or off defines the pitch of the square wave. This requires carefully timed code, so you can't generally do anything else while playing music.

Edited by TROGDOR

Or, to take a more advanced example, the voice samples in Quadrun and Open Sesame -- I assume they're somehow generated through rapid toggling of square waves. But how does that work? Does Supercat's 4-voice demo of "The Entertainer" use the same method? I notice that he's able to keep an onscreen display going while playing the music, whereas the other two titles have to blank the screen.

 

Open Sesame can show a display while outputting its "speech". In that sense it's more sophisticated than Quadrun.

 

Pitfall II, Quadrun, and Open Sesame all work by setting AUDC0 to zero and banging audio out of AUDV0. My BTP2 demos and the Stella's Stocking menu use both AUDV0 and AUDV1. Basic speech output at 7.875 kHz would require less than 30 cycles per line pair, so keeping up some sort of display shouldn't be a problem. The BTP2 driver uses 46 cycles per scan line for 15.75 kHz output; showing a good-looking display in the remaining 30 is a major challenge, but in Stella's Stocking I was able to manage it. (The title screens drawn by Nathan et al. would be considered good by 2600 standards even if they didn't have fancy music playing; a new banking scheme was necessary to free up enough cycles for everything to work, but the music in Stella's Stocking is all generated by the 6507.)

 

With regard to using fractional-frame timing to generate phasing effects, that would be theoretically possible, but difficult. The Atari's sound channels each have a five-bit counter which counts twice per scan line and is reset to zero if, at the start of a count cycle, it equals AUDFx. When it resets to zero, it also triggers the AUDCx-based noise-shaper circuit. Note that when changing from a lower frequency to a higher one, if the programmed value goes from being higher than the counter to being lower, the result will be a longer-than-normal wave cycle, as the counter misses the programmed value and runs until it wraps at 32. If one were trying for subtle phasing effects, the audio disruption from that would almost certainly ruin them.

 

It would probably be possible to generate some very nice phasing effects by changing AUDFx at appropriate points in a frame, but one would have to always be aware of what the frequency counters were doing. Probably more complicated than would be worth bothering with.


Supercat, could you post the code for the playback, or a link to the source?

 

I also know there's a program around that will convert 8K wavs to Atari 2600 asm, but I'm still poking around the forums to find it. Otherwise I'll have to reinvent the wheel and write one in Perl.

Edited by TROGDOR

A couple years ago I took an interest in advanced Atari sound techniques. Here's a small program that demonstrates high resolution pitch playback on the Atari using delays to alter the frequency of a square wave. With some polish, this could be used to play near perfect pitch music on the Atari, although it would have to be on a blank screen and in a single voice. Binary and source are included.

 

Unsound.zip


Thanks very much to both of you for your posts. The connection between AUDV0's 4-bit resolution and having 4 bits of depth for sample playback makes perfect sense. For some reason I was fixated on PWM and the 1-bit paradigm, which seems to be what the Channel F is doing. Do you think being able to use the AUDC0 = 0 setting for this was a design decision (since it's otherwise useless, right?), or just a happy accident?

 

Supercat, in your BTP demo, did you have to design a wavetable for every combination of pitch-pairs you used, or is there a way to generate that on the fly? In other words, to get 4-voice textures (which in this case is 2 voices per channel), do you have to pre-generate every conceivable interval between the two voices -- one wavetable for a minor second, one for a major second, and so on up to an octave and beyond? If so, I'd imagine that makes it difficult to have different rhythms in the two voices without a lot of careful planning.

 

How are folks generating their 4-bit samples for 2600 use? Is it a matter of using a program to handle the downsampling and reduction in bit-depth -- from, presumably, a higher-fidelity source recording -- and then extracting the raw data using a hex editor, minus the header? But no, it sounds like one needs more than that, to convert the PCM encoding to...whatever it is that the 2600 needs to see (?).

 

I assume one doesn't need to create a new sample-playback engine afresh for every project, assuming that one keeps the same sample rate and bit depth, and knows how to handle any necessary bank-switching. 4 bits isn't much, but at least it's intelligible.

 

Are there games that use sampled sound effects during gameplay? It seems one could fit a few half-second sounds into a 4k bank, which might be enough for a grunt or two in a karate game, for instance, though I understand that the coding difficulties are formidable.

 

Trogdor, thanks so much for that code sample. That Unsound demo is very cool! The tone sounds clean and relatively free of digital "grit". You mention that it requires a blank screen, but I wonder if certain stock "flashing screen" effects might be possible, so that (for example) one might generate a little victory tune that plays after the completion of a game, right before one gets "A WINNER IS YOU" or whatever congratulatory screen. Certainly, in-tune music on the 2600, even if it's monophonic, is a worthy reward with which to end a game, and would add a lot to titles where a specific piece of music is relevant.

Edited by thegoldenband

How are folks generating their 4-bit samples for 2600 use? Is it a matter of using a program to handle the downsampling and reduction in bit-depth -- from, presumably, a higher-fidelity source recording -- and then extracting the raw data using a hex editor, minus the header? But no, it sounds like one needs more than that, to convert the PCM encoding to...whatever it is that the 2600 needs to see (?).

 

I assume one doesn't need to create a new sample-playback engine afresh for every project, assuming that one keeps the same sample rate and bit depth, and knows how to handle any necessary bank-switching. 4 bits isn't much, but at least it's intelligible.

 

Are there games that use sampled sound effects during gameplay? It seems one could fit a few half-second sounds into a 4k bank, which might be enough for a grunt or two in a karate game, for instance, though I understand that the coding difficulties are formidable.

 

Trogdor, thanks so much for that code sample. That Unsound demo is very cool! The tone sounds clean and relatively free of digital "grit". You mention that it requires a blank screen, but I wonder if certain stock "flashing screen" effects might be possible, so that (for example) one might generate a little victory tune that plays after the completion of a game, right before one gets "A WINNER IS YOU" or whatever congratulatory screen. Certainly, in-tune music on the 2600, even if it's monophonic, is a worthy reward with which to end a game, and would add a lot to titles where a specific piece of music is relevant.

I could swear I found a program that could convert PCM to assembly code. The conversion is fairly simple. Let's say your source is an 8-bit sample at 8 kHz. You would grab each sample and strip off the bottom 4 bits, leaving you with a 4-bit sample. You would then do the same to the next sample. Then you would take the two 4-bit samples and pack them into a single byte for the Atari to use. (You'd want to store 2 samples per byte to make optimal use of the ROM.) If you wanted to downsample from 8 kHz to 4 kHz, you'd just discard every other sample while processing the input data. All this data would then be written to an output text file in reverse order so it could be read in with a descending loop. This could be achieved with about 40 lines of Perl code. If I can't find the original program, I'll try writing one.

 

Playing back sounds during a game is possible, but very tricky. You'd have to have code inside your kernel that would play back a sample every 4 scanlines to play a 4 kHz sample. Even more difficult is the fact that you'd have to do this during the vertical blank and overscan. So you'd have to intersperse these playbacks at the exact cycle intervals throughout all your off-screen game logic. Only the most meticulous coder would be able to pull that off.

 

The unsound code could be enhanced to be a tone generator. In its present state, it loops through 256 different tones. It could be enhanced to produce more tones by adding an extra byte of resolution, so you'd have 1000 or more distinct tones, which would allow for a perfect pitch song. The encoding for the song would be very small, only requiring a few bytes per note.
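
The core of that kind of delay-toggled square wave looks something like this (a simplified sketch rather than the actual Unsound source; Level and DelayHi/DelayLo are just illustrative RAM variables, and DelayHi must be at least 1):

ToneLoop
	LDA Level
	EOR #$0F		; flip the shadow volume between 0 and 15
	STA Level		; (AUDV0 is write-only, so keep a RAM copy)
	STA AUDV0		; AUDC0 is assumed to already be set to 0
	LDX DelayHi		; two-byte busy-wait: the finer the steps in
	LDY DelayLo		; DelayHi:DelayLo, the finer the pitch resolution
DelayLoop
	DEY
	BNE DelayLoop
	DEX
	BNE DelayLoop
	JMP ToneLoop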

 

Another possibility with sample playback is altering the playback speed of the sample. This would allow you to, for example, encode a short guitar wav and then alter the playback speed to produce different pitches. This could be used for a very cool intro song to a game, and could probably be done with a short 256 byte looping sample. It could also be used to make a drum machine on the 2600.
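
A sketch of how that might work (illustrative only -- PhaseLo/PhaseHi, StepLo/StepHi, and GuitarSample are made-up names): step through a page-aligned 256-byte sample with an 8.8 fixed-point phase, so the step value sets the playback pitch and the high byte's natural wraparound loops the sample.

	; run once per output sample period
	LDA PhaseLo
	CLC
	ADC StepLo		; add the 8.8 fixed-point step to the phase
	STA PhaseLo
	LDA PhaseHi
	ADC StepHi		; carry from the fractional byte rolls in here
	STA PhaseHi
	TAY
	LDA GuitarSample,y	; 256-byte page-aligned table, one 4-bit value per byte
	STA AUDV0		; the high byte wraps at 256, so the sample loops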

 

Are you looking for Makebin?

No, Makebin is a different beast. That program converts Starpath wav files directly into binary ROM files. It only expects Starpath data recordings as input. I'm looking for a program that converts arbitrary PCM sound files into 2600 asm text.

Edited by TROGDOR

I could swear I found a program that could convert PCM to assembly code. The conversion is fairly simple. Let's say your source is an 8-bit sample at 8 kHz. You would grab each sample and strip off the bottom 4 bits, leaving you with a 4-bit sample. You would then do the same to the next sample. Then you would take the two 4-bit samples and pack them into a single byte for the Atari to use. (You'd want to store 2 samples per byte to make optimal use of the ROM.) If you wanted to downsample from 8 kHz to 4 kHz, you'd just discard every other sample while processing the input data. All this data would then be written to an output text file in reverse order so it could be read in with a descending loop. This could be achieved with about 40 lines of Perl code. If I can't find the original program, I'll try writing one.

I wrote a program like that to convert the data for my "Stella says" experiments. I posted it together with the sound demo to the Stella mailing list. You can find it in the Stella list archives. It's pretty simple though, and I was using Bruce Tomlin's assembler instead of DASM back then, so it might be better to write your own tool in Perl anyway.


Supercat, in your BTP demo, did you have to design a wavetable for every combination of pitch-pairs you used, or is there a way to generate that on the fly? In other words, to get 4-voice textures (which in this case is 2 voices per channel), do you have to pre-generate every conceivable interval between the two voices -- one wavetable for a minor second, one for a major second, and so on up to an octave and beyond? If so, I'd imagine that makes it difficult to have different rhythms in the two voices without a lot of careful planning.

 

Two wave tables for each pitch from middle C to the B above (one for loud, one for soft). There are also twelve modulus tables (one for each pitch). Each channel mixes two waveforms selected from those, independently selectable for 1/4x, 1/2x, 1x, 2x, or 4x speed; the C may also be played at 8x. The Stay Frosty tune uses the full 5-octave range.

 

Getting a 4-channel wave-table synthesizer to run in 46 cycles/sample was no easy task (note that 6 cycles were spent on STA AUDVx, leaving 40 to do the computations). The code is unrolled in an interleaved 4-line pattern, so generating four samples requires, for each voice, four LDY zp, five LDA (zp),y or ADC (zp),y, and one STA zp (40 cycles in all).
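
In rough outline, one output sample for one channel boils down to something like the fragment below. This is just an illustration of the mixing idea, not the actual unrolled driver: it assumes two page-aligned 256-byte wavetables of 0-7 amplitude values (so the sum always fits in AUDV0's 4 bits), with the per-voice phase bytes advanced elsewhere by different steps to set the two pitches.

	; Phase1/Phase2 are zero-page phase bytes; Wave1/Wave2 are zero-page
	; pointers to page-aligned 256-byte tables of 0-7 amplitude values
	LDY Phase1
	LDA (Wave1),y		; voice A's current amplitude (0-7)
	LDY Phase2
	CLC
	ADC (Wave2),y		; plus voice B's amplitude - the sum fits in 4 bits
	STA AUDV0		; one channel carries both voices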


Some thoughts and questions:

 

Eckhard, thanks for posting the link to your test program. When I downloaded the ROM, I noticed that there's a little glitch in the way that webpage represents the UU-encoding (it replaced an @ sign with the word "at") which required manual repair, so here's the .bin:

 

say.bin

 

Is there any benefit to choosing a sampling rate that's a factor of the 2600's clock speed of 1.19 MHz? (If so, it seems like 7000, 3500, and 1750 Hz would be ideal.) Or is it best approached the way that Eckhard did it, i.e. by multiplying 60Hz * the number of lines in your display kernel (which I understand will normally be 262), and then choosing a sampling rate that evenly factors into that number?

 

I've been conducting some experiments with Logic's Bitcrusher plug-in, which downsamples and bitcrushes any input you provide it. I'm actually shocked at how good 4-bit samples can sound; very little of the original is lost, relatively speaking. On the other hand, sampling rate is a more substantial issue -- going much lower than 4000 Hz, you start losing a lot of intelligibility. So Eckhard's 3930 Hz seems like the perfect compromise, and allows for a bit more than 2 seconds per 4k bank.

 

When using Bitcrusher, I also noticed that the results of downsampling were dramatically improved by using heavy equalization beforehand to suppress the lower frequencies, and accentuate the higher frequencies, so that all the sound energy is dedicated to the frequency range that we use for speech intelligibility -- just as telephones do, filtering out everything below 300Hz and above 3300Hz. Making the soundfile as loud as possible is also key.

 

Eckhard, did you do any pre-treatment of that sort on your speech sample? It's right at the edge of intelligibility, and I wonder if we can get clearer results. Quadrun is reasonably clear if you already know the title of the game, but Open Sesame is pretty close to unintelligible. We also might get better results with the bit reduction through dithering, rather than just truncating.

 

Are there any extant test binaries that use higher sampling rates? I suspect one could get a very clean-sounding sample at 7860 Hz, though it'd use about 4k per second so it'd have to be brief to fit in one bank. I'm just curious what the upper limit of quality sounds like.


Is there any benefit to choosing a sampling rate that's a factor of the 2600's clock speed of 1.19 MHz? (If so, it seems like 7000, 3500, and 1750 Hz would be ideal.) Or is it best approached the way that Eckhard did it, i.e. by multiplying 60Hz * the number of lines in your display kernel (which I understand will normally be 262), and then choosing a sampling rate that evenly factors into that number?

There are 76 cycles in every scanline, so the sample rate I used already is a factor of the 2600 clock speed. The advantage of using a sample rate like this is that exact timing in the 6507 code can be done with STA WSYNCs.

 

Eckhard, did you do any pre-treatment of that sort on your speech sample? It's right at the edge of intelligibility, and I wonder if we can get clearer results. Quadrun is reasonably clear if you already know the title of the game, but Open Sesame is pretty close to unintelligible. We also might get better results with the bit reduction through dithering, rather than just truncating.

When I worked on this demo, I still used a 486 with a simple DOS-based sampling program. The program didn't support any complicated pre-processing of the sample data. You could probably get much clearer results with modern sampling programs.

 

Are there any extant test binaries that use higher sampling rates? I suspect one could get a very clean-sounding sample at 7860 Hz, though it'd use about 4k per second so it'd have to be brief to fit in one bank. I'm just curious what the upper limit of quality sounds like.

I only had a Supercharger for testing the binaries, and 2600 emulators didn't support sample playback back then either, so I didn't do any other demos. If you have a way to test binaries with larger bankswitching schemes, you could easily use 7860 or 15720 Hz.


For instance, let's say I wanted to phase two low-frequency waveforms against each other: distortion setting 1, pitch value $1F, on both channels. Is there a way to make the two waves drift slowly and smoothly in and out of phase with one another, say with a two- or three-second periodicity, by somehow introducing minuscule delays to one channel? Is it possible to do this with reasonable results?

I've actually been able to pull off something along these lines (though less complex), with pretty good results:

 

phaser06.asm.txt
phaser06.bin

Not too bad for someone who only learned ASM today. :cool: Though of course, all of the display code in that file is from Andrew Davie's tutorial!

Edited by thegoldenband

Wow, that is a great effect! I like it. I was trying to think where I've heard that before. There was a similar phased sound at the end of [embedded video]. You'll also hear this effect on [embedded video] when the ship lands. But I don't remember any phased sound effects like that on the Atari.

 

The code looks very simple. It looks like AUDV0 and AUDV1 are inversions of each other, oscillating based on the frame count. What's going on with AUDF0 and AUDF1?

 

I wrote a Perl program to convert wavs to Atari dasm code. You can find it on my blog. There are also a couple of ROMs demonstrating sample playback.


Wow, that is a great effect! I like it.

Thanks very much!

 

I was trying to think where I've heard that before. There was a similar phased sound at the end of [embedded video]. You'll also hear this effect on [embedded video] when the ship lands. But I don't remember any phased sound effects like that on the Atari.

I can't think of any either -- it shows up all over the place on other platforms, but no examples on the 2600 are coming to mind. Of course, since it uses both sound channels, you can't really have anything else going on at the same time!

 

The code looks very simple. It looks like AUDV0 and AUDV1 are inversions of each other, oscillating based on the frame count. What's going on with AUDF0 and AUDF1?

Actually, I think I commented out the inversion code for AUDV0/V1 -- I should've cleaned that up before posting it.

 

Basically, other than the volume swells, the main thing that happens is the brief change in AUDF1 during the overscan period. AUDF0 is tuned to 16 throughout, but AUDF1 goes to 15 for a total of 16 WSYNCs before going back to 16. Frequency #15 is 61.7 Hz, and #16 is 58.3 Hz, so they're both close to the refresh rate, and 16 WSYNCs are much less than 1 full cycle of either waveform. I don't quite have the math to compute the phase change I'm inducing by swapping in the higher frequency, and even if I did, supercat's post earlier makes it sound like the effects would be complicated to predict in any fine-grained way. Honestly, I just used trial and error, and found that I got basically the same result with a varying number of WSYNCs in between frequency changes; I think I remember that if I reduced the number to one or two WSYNCs, things got less interesting.
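
In outline, the overscan portion does something like this (a simplified sketch of the idea, not the exact code from phaser06.asm):

Overscan
	LDA #15
	STA AUDF1		; nudge channel 1 slightly sharp (frequency #15 = 61.7 Hz)
	LDX #16
NudgeLoop
	STA WSYNC		; hold it there for 16 scanlines
	DEX
	BNE NudgeLoop
	LDA #16
	STA AUDF1		; then drop back to match AUDF0 (frequency #16 = 58.3 Hz)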

 

I wrote a Perl program to convert wavs to Atari dasm code. You can find it on my blog. There are also a couple of ROMs demonstrating sample playback.

Excellent, I look forward to checking it out!

Edited by thegoldenband