Alex tests


SpiceWare

2,626 views

Just for grins, did some builds using the Alex voice at 3 different rates: 2000, 3000 and 4000 Hz.

Sample space used for the 12 words:

 

2000 Hz = 7575 bytes

3000 Hz = 11365 bytes

4000 Hz = 15153 bytes
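
As a rough sanity check, the byte counts scale almost exactly with the sample rate, which points to roughly 3.8 seconds of speech in total. The sketch below assumes one stored byte per sample, which is an assumption rather than something stated above; `seconds_of_audio` is only an illustrative helper.

```python
# Sample space reported above for the 12 Alex words at each playback rate.
ALEX_BYTES = {2000: 7575, 3000: 11365, 4000: 15153}


def seconds_of_audio(total_bytes, rate_hz, samples_per_byte=1):
    """Playback time, assuming a fixed number of samples per stored byte."""
    return total_bytes * samples_per_byte / rate_hz


durations = {rate: seconds_of_audio(size, rate) for rate, size in ALEX_BYTES.items()}

for rate, secs in durations.items():
    print(f"{rate} Hz: {secs:.2f} s of speech")
```

If the samples were packed two per byte, passing `samples_per_byte=2` would double these figures.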

 

Edit: added Berzerk samples at 2000 and 3000 Hz

2000 Hz = 8743 bytes

3000 Hz = 13117 bytes

 

Edit 2: added Victoria samples at 2000, 3000 and 4000 Hz.

2000 Hz = 7537 bytes

3000 Hz = 11304 bytes

4000 Hz = 15072 bytes

 

ROMs

alex2000.bin

alex3000.bin

alex4000.bin

berzerk2000.bin

berzerk3000.bin

victoria2000.bin

victoria3000.bin

victoria4000.bin

22 Comments


Tested all the other voices.

Victoria sounds best to me: it's the only voice in which I can understand "chicken fight like a robot", and it's a female voice, which is more creative for the Atari 2600.

Bruce sounds OK, but only on "Intruder Alert".

Zarvox is a good voice for the game, but the speech isn't clear.

I'll test the newer files as soon as possible :)

Link to comment

I didn't download any of the source sounds, since I wanted to see if I could identify the 2600-ified ones. If I didn't already know what most of the samples were supposed to say, I'd have a hard time recognizing any of them. Even then, I had to play through multiple voices multiple times to pick out what they were saying. If I listen to them enough times, I can convince myself I can understand them, but listening to them fresh, I really can't. :(

Link to comment

I added some Victoria tests. Yeah, Zarvox is a neat sound, but at the low sample rate it's not clear enough.

 

I do agree they're hard to understand; that's why I added the new tests, to hear how they sound at a higher frequency. 4K is the max buffer for a pure audio demo (one with fewer words than this demo), but for a game 2K is the max, as the 4K of the Display Data bank also needs to hold graphics and so forth.

These are supposed to be a fallback for those w/out an AtariVox, so maybe a tradeoff would be to simplify the phrases so the samples could be larger: "Kill Intruder", "Kill Chicken" [1], "Intruder Alert, Intruder Alert" [2], and just "Chicken" when leaving the room. AtariVox users would get the traditional phrases.

[1] Used on the screen after you leave a room w/out killing all the robots.

[2] Repeated words don't use any extra space as far as the samples are concerned.

Link to comment

The ROMs crash after the speech; only alex2000 runs OK.

 

But it didn't improve much :(

 

Can you make more tests with Zarvox?

 

Edit: If possible, make them a bit louder.

Link to comment

That's odd - guess that's what I get for only testing them in Stella. The file for alex2000 is dated yesterday, so it still has the 1024 byte buffer. The ones from today are using a 2048 byte buffer. Not sure what could be going on though considering it plays the initial phrase just fine. I'll dig into it tomorrow as it's almost 10pm and I've still not had dinner!

 

The initial playback was much quieter. I used SoX's norm parameter, which normalizes the audio by raising the gain as far as it can while ensuring there's no clipping. The raw_to_dpc program also normalizes it, though I should probably disable that, as it's left over from when I was using Switch instead of SoX. There's not much that can be done with just 4 bits for the waveform.
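
To illustrate why 4 bits leaves so little room, here is a minimal sketch of peak normalization followed by quantization to the 16 levels (0-15) available. It is not how SoX or raw_to_dpc work internally; the function name `normalize_and_quantize` and the float input range are assumptions made for illustration.

```python
def normalize_and_quantize(samples, levels=16):
    """Scale float samples so the peak hits full scale, then map to 0..levels-1."""
    peak = max((abs(s) for s in samples), default=0.0)
    gain = 1.0 / peak if peak > 0 else 1.0  # largest gain that cannot clip
    quantized = []
    for s in samples:
        x = s * gain  # now within [-1.0, 1.0]
        q = round((x + 1.0) / 2.0 * (levels - 1))
        quantized.append(min(levels - 1, max(0, q)))
    return quantized


quiet = [0.25, -0.25, 0.125]  # a quiet recording that peaks at a quarter of full scale
print(normalize_and_quantize(quiet))
```

Without the gain step the same input would occupy only a few of the 16 levels, which is why normalizing first matters so much at this bit depth.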

Link to comment

How about playing back two or three simplified larger sample phrases in succession to form a complete phrase.

 

If the player leaves w/out killing all robots, can it manage playing a sequence of two individual samples?

 

For example:

 

Sample 0 - "Chicken" - followed by Sample 2 - "Fight like a robot!"-

 

Or make the samples larger by splitting up the phrase further:

 

Sample 0 ["Chicken"] + Sample 1 ["Fight..] + Sample 2 [..like a..] + Sample 3 [..robot!]

 

 

 

Another idea: you may want to make most of the samples 1 or 2 words (or up to 3 syllables) at higher sample rates, so you can mix and match them to make complete phrases. The list below is just the samples I would suggest.

Sample 0 ["Chicken!"]
Sample 1 ["Fight]
Sample 2 [..like a..]
Sample 3 [..robot!"]
Sample 4 ["Kill the..]
Sample 5 ["Got the.. ]
Sample 6 [..humanoid!"]
Sample 7 [..intruder!"]
Sample 8 ["The..]
Sample 9 [..must not..]
Sample 10 [..escape!"]
Sample 11 ["Alert!"]
Sample 12 ["Shoot him!"]

 

 

 

After the player kills all robots and leaves the room, the program may sequentially play back four samples: 8, 6, 9 and 10.

["The]+[humanoid]+[must not]+[escape!"]

 

 

 

Here are some phrases using just a few samples while a game is in progress:

Samples 1, 2, 3: ["Fight]+[like a]+[robot!"]
Samples 4, 6: ["Kill the]+[humanoid!"]
Samples 4, 7: ["Kill the]+[intruder!"]
Sample 0: ["Chicken!"]
Sample 12: ["Shoot him!"]

When the player is killed:

Samples 5, 6, 5, 7: ["Got the]+[humanoid!"]+["Got the]+[intruder!"]

 

If you like, you could create one or two new phrases unique to Frantic.

E.g., every time a robot is killed by another robot's fire, it can say:

Samples 3, 9, 1, 3: ["Robot]+[must not]+[fight]+[robot!"]

Maybe this can be used in Berzerk too.
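
The mix-and-match idea above can be sketched as a lookup table: each phrase is just a list of sample indices played back in order. The event names (`room_cleared` and so on) are hypothetical labels, and the word list simply mirrors samples 0-12 suggested above.

```python
# Index in this list = sample number from the suggested list above.
WORDS = [
    "chicken", "fight", "like a", "robot", "kill the", "got the",
    "humanoid", "intruder", "the", "must not", "escape", "alert", "shoot him",
]

# Hypothetical event names mapped to the sample sequences described above.
PHRASES = {
    "room_cleared": [8, 6, 9, 10],
    "taunt": [1, 2, 3],
    "hunt_humanoid": [4, 6],
    "hunt_intruder": [4, 7],
    "player_killed": [5, 6, 5, 7],
    "robot_shot_robot": [3, 9, 1, 3],
}


def phrase_text(event):
    """Spell out the phrase that a sequence of sample indices would speak."""
    return " ".join(WORDS[i] for i in PHRASES[event])


for event in PHRASES:
    print(f"{event}: {phrase_text(event)}")
```

Reusing a sample within a phrase (such as "robot" twice) costs no extra storage, since only the index is repeated.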

 

Link to comment

That's actually what I'm doing - each word is a separate sample so I can maximize the quality of what fits in the playback buffer.

 

Problem is, the better the quality, the larger the sample, and there's only 28K in the DPC+ ROM. Alex at the best quality uses 15K (the 15153 bytes in the blog entry) to store the 12 words. That would only leave 13K for the game logic and data (sound effects, graphics, etc). While the original 2600 version of Berzerk was a 4K game, ARM code takes up quite a bit more space than 6507 code, so that 13K isn't really that much.
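
The budget arithmetic can be checked for all three Alex builds with a few lines. This is only a sketch: it assumes 1K = 1024 bytes, and `bytes_left_for_game` is an illustrative helper rather than anything from the build tools.

```python
KIB = 1024
DPC_PLUS_ROM = 28 * KIB  # the 28K mentioned above, assuming 1K = 1024 bytes

# Speech sizes from the blog entry for the 12 Alex words.
SPEECH_BYTES = {"alex2000": 7575, "alex3000": 11365, "alex4000": 15153}


def bytes_left_for_game(speech_bytes, rom_bytes=DPC_PLUS_ROM):
    """ROM remaining for game logic and data once the speech is stored."""
    return rom_bytes - speech_bytes


for name, size in SPEECH_BYTES.items():
    left = bytes_left_for_game(size)
    print(f"{name}: {left} bytes left (~{left / KIB:.1f}K)")
```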

Link to comment

Hmm.. would it help the quality to store fewer words, like 10 words instead of 12?

Link to comment

The 2048 byte playback buffer is what limits the quality. For Alex4000, humanoid takes up 1894 bytes while intruder takes up 1764. I can't make the samples max out at 2048, as there needs to be something at the end of the sample to trigger "fill in the next word". Digital samples only use 0-15, so I pad out the sample with $F7 (247 decimal) as the trigger. When the 6507 routines see the $F7, they know to call the ARM routine to request the next word.
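
The padding scheme described above can be modelled roughly as follows. This is a Python sketch of the idea only, not the actual 6507 or ARM code; `fill_playback_buffer` and `play_until_marker` are hypothetical names.

```python
BUFFER_SIZE = 2048   # playback buffer size in bytes
END_MARKER = 0xF7    # never a valid sample, since samples only use 0-15


def fill_playback_buffer(word_samples):
    """Copy one word into a fresh buffer and pad the rest with the end marker."""
    if any(not 0 <= s <= 15 for s in word_samples):
        raise ValueError("samples must be 4-bit values (0-15)")
    if len(word_samples) >= BUFFER_SIZE:
        raise ValueError("word must leave room for at least one end marker")
    padding = bytes([END_MARKER]) * (BUFFER_SIZE - len(word_samples))
    return bytes(word_samples) + padding


def play_until_marker(buffer):
    """Return the samples that would be played before the next word is requested."""
    played = []
    for value in buffer:
        if value == END_MARKER:
            break  # on hardware, this is where the next word would be requested
        played.append(value)
    return played


buffer = fill_playback_buffer([1, 2, 3])
print(len(buffer), play_until_marker(buffer))
```

Because the marker lies outside the 0-15 sample range, no separate length field is needed; the cost is that a word can never fill all 2048 bytes.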

 

My idea behind simplifying the phrase is that fewer words take up less ROM, so I'd most likely not have to reduce the quality later when the game code gets written.

Link to comment

So each sample is limited to the 2K buffer, and the number of words has nothing to do with the quality of playback. I think I understand now, hopefully.

So if "humanoid" takes up 1894 bytes, how about shortening the word to "human" instead, and treating the last syllable "oid" as another word? And for intruder, the word "in" next to "truder"?

"The human oid must not escape!"

"Got the human oid. Got the in truder!"

It will still sound OK, and each word will use fewer bytes per sample. Of course, the trade-off is that it increases the number of words stored from 12 to 14, as follows:

Sample 1 ["Chicken!"]
Sample 2 ["Fight]
Sample 3 [..like a..]
Sample 4 [..robot!"]
Sample 5 ["Got.. ]
Sample 6 [..human-]
Sample 7 [oid!"]
Sample 8 [..in-]
Sample 9 [truder!"]
Sample 10 ["The..]
Sample 11 [..must]
Sample 12 [ not..]
Sample 13 [..escape!"]
Sample 14 ["Alert!"]

 

Think this a good idea?

Link to comment

Any space saved for "human" would just be used to store "oid".

 

 

How about just using the word "human"?

 

"The human must not escape." "Got the human. Got the intruder."

 

 

It deviates a little from the original, but the phrase still works and I feel it's a good enough fallback for the AtariVox.

Link to comment

human = 1416

 

478 bytes may come in handy later, but it's not enough savings to add other words. The shortest word "a" takes 836 bytes. Here's all the words I've experimented with, including "oid", and how much space they take up in Alex4000.

 

 

 836 a.vcs
1204 alert.vcs
1271 chicken.vcs
1433 escape.vcs
1144 fight.vcs
 998 get.vcs
1039 got.vcs
1416 human.vcs
1894 humanoid.vcs
1764 intruder.vcs
1133 like.vcs
1115 must.vcs
1003 not.vcs
 945 oid.vcs
1514 robot.vcs
 842 the.vcs
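
With the sizes in a table, the trade-offs are easy to check. The sketch below reproduces the 478 bytes saved by swapping "humanoid" for "human" and shows what splitting a word into parts costs; `swap_saving` and `split_cost` are illustrative helper names, not part of any tool.

```python
# Alex4000 sizes in bytes, copied from the listing above.
ALEX4000_BYTES = {
    "a": 836, "alert": 1204, "chicken": 1271, "escape": 1433,
    "fight": 1144, "get": 998, "got": 1039, "human": 1416,
    "humanoid": 1894, "intruder": 1764, "like": 1133, "must": 1115,
    "not": 1003, "oid": 945, "robot": 1514, "the": 842,
}


def swap_saving(old_word, new_word):
    """Bytes saved by storing new_word instead of old_word."""
    return ALEX4000_BYTES[old_word] - ALEX4000_BYTES[new_word]


def split_cost(whole_word, parts):
    """Extra bytes needed to store a word as separate parts."""
    return sum(ALEX4000_BYTES[p] for p in parts) - ALEX4000_BYTES[whole_word]


print(swap_saving("humanoid", "human"))           # bytes freed by the shorter word
print(split_cost("humanoid", ["human", "oid"]))   # penalty for splitting the word
```

Splitting costs space rather than saving it, presumably because each separately recorded piece carries its own overhead at the start and end.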

Link to comment

The increase of 467 bytes when using "human" + "oid" instead of "humanoid" got me to thinking that with the current simple phrases I can treat "get the" as a single word - it saves 494 bytes! Of course that limits flexibility.

Link to comment

Cool! At least we figured something out by experimenting!

If you can save nearly 500 bytes that way, then maybe combining the words "must" and "not" will help further. (1115 + 1003) - 494 = 1624?

 

And the words "like" and "a" as well. It's worth a try.

 

 

Here's another try. I have 3 combined words in this new list; maybe it's still workable.

Sample 1 ["Alert!"] 1204
Sample 2 ["Chicken!"] 1271
Sample 3 [..escape!"] 1433
Sample 4 ["Got..] 1039
Sample 5 ["Get the..] (998+842) - 494 = 1346
Sample 6 [..humanoid!"] 1894
Sample 7 [..intruder!"] 1764
Sample 8 [..like a ] (1133+836) - X = ?
Sample 9 [ must not] (1115+1003) - X = ?
Sample 10 [..robot!"] 1514
Sample 11 [Fight] 1144

 

Some phrases it can have (the combined words are "Get the", "like a" and "must not"):

"Intruder alert! Intruder alert!"
"Get the humanoid!"
"Get the intruder!"
"Chicken!"
"Fight like a robot!"
"Got humanoid! Got intruder!"
"Humanoid must not escape!"

Robot shot by player's bullet from off a wall:

"Alert! Intruder got robot!"
"Human got robot!"

Robot shot by robot's bullet from off a wall:

"Alert! Robot must not fight robot!"
Link to comment

For now I'm going with the 5 "words" (get-the, intruder, alert, chicken & humanoid) used in "simple phrases" for the AtariVox-less gamers as anything else is going to require additional ROM space that I'm not willing to commit at this time.

 

As it is, even the "simple phrases" may have to be reduced in quality to Alex3000, Alex2000, or even be eliminated as the game would be fun w/out the voices, but the converse would not be true.

Link to comment

I guess the phrases I mentioned above can be considered for the AtariVox version then?

 

I can see 5 simple phrases from the 5 words you're currently going with:

"Get the humanoid"

"Get the intruder"

"Intruder alert! Intruder alert!"

"Alert!"

"Chicken!"

 

That's a good enough alternative for people w/out the AtariVox.

 

True, the game doesn't need the voices to be enjoyable. However, instead of eliminating the voices altogether, you should at the very least keep the two words "Intruder alert!" for when Evil Otto appears, which I think is a real help and part of the gameplay, since it warns the player. I know it's not essential, but it does help the gameplay in a small way. And maybe keep the word "Chicken" too, because it's not a random in-game phrase but a triggered comment to the player, lowering his esteem for actually chickening out of a fight.

Link to comment

Can you use both nibbles of the byte to double your sample rate? You'd need to come up with an EOL byte ($00 is my favorite since it sets the Z flag), then tweak the output so it doesn't appear in the normal data. It also might be possible to put in some kind of simple compression. You'd have to look at the data from the samples and see if there are any patterns. If the MSB of the byte is 1 (i.e. N set) then the byte is some kind of codeword. It could be a simple run-length encoding (bits 6-4 the count-1 and bits 0-3 the value).
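
One way to read the run-length idea is sketched below. The codeword layout (MSB set, bits 6-4 holding count-1, bits 3-0 holding the value) follows the description above; treating every other byte as a single literal sample is an added assumption, and the function names are hypothetical.

```python
def rle_encode(samples):
    """Encode 4-bit samples; runs of 2-8 equal values become one codeword byte."""
    out = bytearray()
    i = 0
    while i < len(samples):
        value = samples[i]
        run = 1
        while run < 8 and i + run < len(samples) and samples[i + run] == value:
            run += 1
        if run == 1:
            out.append(value)  # literal: MSB clear, one sample in the low nibble
        else:
            out.append(0x80 | ((run - 1) << 4) | value)  # codeword: MSB set
        i += run
    return bytes(out)


def rle_decode(data):
    """Expand codewords and literals back into the list of 4-bit samples."""
    samples = []
    for byte in data:
        if byte & 0x80:
            count = ((byte >> 4) & 0x07) + 1
            samples.extend([byte & 0x0F] * count)
        else:
            samples.append(byte & 0x0F)
    return samples


original = [5] * 10 + [3, 4, 4]
encoded = rle_encode(original)
print(len(original), "samples ->", len(encoded), "bytes")
```

How much this helps depends entirely on how often real speech data repeats a value; on noisy samples with few runs it saves nothing over one sample per byte.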

 

Oh, one other suggestion - add a high pass filter to your script, i.e. 20Hz - ?KHz. If nothing else it will pull out any DC offset.
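
For reference, removing a DC offset does not need a full filter design. A minimal sketch is the common one-pole DC blocker, y[n] = x[n] - x[n-1] + r*y[n-1]; it is not the filter SoX implements, and the value of `r` here is an arbitrary assumption that sets how low the cutoff sits.

```python
def dc_block(samples, r=0.995):
    """One-pole DC-blocking high-pass: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


# A signal that is nothing but DC offset decays towards zero after filtering.
offset_only = [1.0] * 5000
filtered = dc_block(offset_only)
print(filtered[0], filtered[-1])
```

Values of `r` closer to 1 give a lower cutoff and leave more of the low end of the voice intact.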

Link to comment

No, there's no time during the kernel to unpack the data, so the 2K playback buffer is what limits the size of the samples.

 

I'm now using both nybbles to store the data, the ARM routine unpacks it into the 2K buffer. Saved a lot of space.
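
The packing itself is simple to sketch. The nybble order (first sample in the high nybble) and the function names are assumptions for illustration; the real ARM routine is not shown in this thread.

```python
def pack_nybbles(samples):
    """Pack 4-bit samples two per byte, first sample in the high nybble."""
    padded = list(samples) + [0] * (len(samples) % 2)  # pad odd lengths with 0
    return bytes((hi << 4) | lo for hi, lo in zip(padded[0::2], padded[1::2]))


def unpack_nybbles(packed, count):
    """Expand packed bytes back to one sample per entry, keeping `count` samples."""
    samples = []
    for byte in packed:
        samples.append(byte >> 4)
        samples.append(byte & 0x0F)
    return samples[:count]


word = [1, 2, 3]
stored = pack_nybbles(word)
print(len(word), "samples stored in", len(stored), "bytes")
print(unpack_nybbles(stored, len(word)))
```

Packed this way, a word stored in ROM takes about half the bytes it occupies once expanded into the playback buffer.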

 

Sure, I'll give the high pass filter a try tonight. Never did audio manipulation like this before, so it's all been a learning experience.

Link to comment

Tried the high pass filter, didn't pan out. Some of the words sounded a little better, others sounded much worse.

Link to comment