Jump to content



3

6502 Killer hacks


141 replies to this topic

#51 djmips OFFLINE  

djmips

    Dragonstomper

  • 591 posts
  • scrolling
  • Location:Seattle

Posted Tue Nov 15, 2005 3:52 AM

supercat, on Tue Nov 8, 2005 10:08 PM, said:

I also like his 'toggle carry' method, though I still find myself wishing the 6502 allowed a direct method to do that.

Yes, that is a nice one. At a quick glance, I thought it trashed A (but it doesn't as bit seven is safely tucked away in the carry and is restored in the ror)

 rol	; Cb into b0  
 eor #$01	; toggle bit
 ror	; b0 into Cb


#52 TROGDOR OFFLINE  

TROGDOR

    Chopper Commander

  • 160 posts
  • Why Yes, I Have Played Atari Today.
  • Location:Strongbadia

Posted Wed Dec 21, 2005 12:49 AM

Here's my all-time favorite hack for the 6502. The "skip 2 bytes" pseudo-instruction, which piggybacks off the BIT instruction. It saves a byte when performing "either/or" branching.

Consider the following example. You want to store a value in either Loc1 or Loc2, depending on whether it's negative. Normally, your code would look like:

LDA MyNumber;2
BPL B1;2
STA Loc1;2
BMI Done;2 (This is effectively a BRA to Done.)
B1
STA Loc2;2
Done
It uses 10 bytes. Now consider this alternative:

LDA MyNumber;2
BPL B1;2
STA Loc1;2
.BYTE $2C;1 (Skips the next 2 bytes, effectively doing a BRA to Done.)
B1
STA Loc2;2
Done
It only uses 9 bytes, and does the same thing. When the processor hits the ".BYTE $2C", it reads this as an Absolute BIT command, which is a 3 byte command. So, it uses the two bytes of the STA Loc2 command as the argument to the BIT command, performs a meaningless BIT test, and then continues to run, without ever executing the STA Loc2 command. The only thing to be careful with this is that the N, V, and Z flags will be corrupted. But the accumulator is not affected. So you've just saved a byte. Also be wary that one extra cycle is used. BIT Absolute is 4 cycles, and a BMI branch is 3 cycles (or 4 if it crosses a page boundary.)

Edited by TROGDOR, Wed Dec 21, 2005 12:53 AM.


#53 supercat OFFLINE  

supercat

    Quadrunner

  • 6,367 posts

Posted Wed Dec 21, 2005 1:16 AM

TROGDOR, on Wed Dec 21, 2005 1:49 AM, said:

The only thing to be careful with this is that the N, V, and Z flags will be corrupted.  But the accumulator is not affected.  So you've just saved a byte.  Also be wary that one extra cycle is used.  BIT Absolute is 4 cycles, and a BMI branch is 3 cycles (or 4 if it crosses a page boundary.)

View Post


Another thing to beware of is that certain memory locations can be affected by reads. In a 4K cart, this isn't apt to be a problem (the only thing affected by a read is the timer-interrupt flag on the RIOT which is cleared by reading INTIM). In a 16K or 32K cart, skipping "INC nn,X" or "ISB nn,x" will trigger a bank switch if 'nn' is $1F, $3F, $5F, $7F, $9F, $BF, $DF, or $FF, but such an instruction isn't too likely to occur. In a 32K cart, skipping "SBC nn,X" will also trigger a bankswitch.

None of those bankswitch hotspots is in practice likely to be a problem. On the other hand, RAM based carts open up a much bigger danger zone. On a Superchip cart, skipping any instruction whose opcode is $00-$7F and whose operand is $10, $30, $50, $70, $90, $B0, $D0, or $F0 will trash a byte of Superchip RAM. On a Supercharger, skipping any two-byte instruction whose operand is one of the above will be, to put it mildly, "interesting".

#54 supercat OFFLINE  

supercat

    Quadrunner

  • 6,367 posts

Posted Wed Dec 21, 2005 1:31 AM

djmips, on Tue Nov 15, 2005 4:52 AM, said:

Yes, that is a nice one. At a quick glance, I thought it trashed A (but it doesn't as bit seven is safely tucked away in the carry and is restored in the ror)

View Post


BTW, if it's desired to copy bit 7 of an address into all bits of the accumulator (so it's 00 or FF):
  lda #$7F
  cmp Address ; Carry clear if >= $80
  adc #$80  ; $00 if carry set; $FF if carry clear


#55 TROGDOR OFFLINE  

TROGDOR

    Chopper Commander

  • 160 posts
  • Why Yes, I Have Played Atari Today.
  • Location:Strongbadia

Posted Wed Dec 21, 2005 2:17 AM

supercat, on Wed Dec 21, 2005 2:16 AM, said:

TROGDOR, on Wed Dec 21, 2005 1:49 AM, said:

The only thing to be careful with this is that the N, V, and Z flags will be corrupted.  But the accumulator is not affected.  So you've just saved a byte.  Also be wary that one extra cycle is used.  BIT Absolute is 4 cycles, and a BMI branch is 3 cycles (or 4 if it crosses a page boundary.)

View Post


Another thing to beware of is that certain memory locations can be affected by reads. In a 4K cart, this isn't apt to be a problem (the only thing affected by a read is the timer-interrupt flag on the RIOT which is cleared by reading INTIM). In a 16K or 32K cart, skipping "INC nn,X" or "ISB nn,x" will trigger a bank switch if 'nn' is $1F, $3F, $5F, $7F, $9F, $BF, $DF, or $FF, but such an instruction isn't too likely to occur. In a 32K cart, skipping "SBC nn,X" will also trigger a bankswitch.

None of those bankswitch hotspots is in practice likely to be a problem. On the other hand, RAM based carts open up a much bigger danger zone. On a Superchip cart, skipping any instruction whose opcode is $00-$7F and whose operand is $10, $30, $50, $70, $90, $B0, $D0, or $F0 will trash a byte of Superchip RAM. On a Supercharger, skipping any two-byte instruction whose operand is one of the above will be, to put it mildly, "interesting".

View Post


Thanks for the heads-up supercat. I want to maintain Supercharger compatibility for all my programs, so it looks like I'll have to ditch my favorite hack. :sad:

#56 batari OFFLINE  

batari

    )66]U('=I;B$*

  • 6,237 posts
  • begin 644 contest

Posted Wed Dec 21, 2005 3:19 AM

TROGDOR said:

supercat, on Wed Dec 21, 2005 2:16 AM, said:

On a Supercharger, skipping any two-byte instruction whose operand is one of the above will be, to put it mildly, "interesting".

View Post

Thanks for the heads-up supercat. I want to maintain Supercharger compatibility for all my programs, so it looks like I'll have to ditch my favorite hack.
No need to abandon the hack unless you are writing actual Supercharger games. If you create a regular 4k binary and run it on a Supercharger, you will have writes disabled anyway so this doesn't apply.

Edited by batari, Wed Dec 21, 2005 3:20 AM.


#57 cd-w OFFLINE  

cd-w

    Stargunner

  • 1,195 posts
  • Juno First!
  • Location:Glasgow, UK

Posted Wed Dec 21, 2005 11:04 AM

batari, on Wed Dec 21, 2005 9:19 AM, said:

TROGDOR said:

supercat, on Wed Dec 21, 2005 2:16 AM, said:

On a Supercharger, skipping any two-byte instruction whose operand is one of the above will be, to put it mildly, "interesting".

View Post

Thanks for the heads-up supercat. I want to maintain Supercharger compatibility for all my programs, so it looks like I'll have to ditch my favorite hack.
No need to abandon the hack unless you are writing actual Supercharger games. If you create a regular 4k binary and run it on a Supercharger, you will have writes disabled anyway so this doesn't apply.

View Post


Ah, I have been wondering for some time why this trick was unreliable when I used it within my PoP code :idea:

Thanks,
Chris

#58 cd-w OFFLINE  

cd-w

    Stargunner

  • 1,195 posts
  • Juno First!
  • Location:Glasgow, UK

Posted Wed Apr 19, 2006 11:20 AM

View Postvdub_bobby, on Mon Jun 13, 2005 4:57 PM, said:

Here's the software collision routine I worked up for Go Fish!
A little different - I use it to check if two boxes overlap. Call it once for X values, then call it again with Y values.
CheckBoundaries
		lda rect1.leftortop
		cmp rect2.leftortop	  [*]
		bmi Check2
		cmp rect2.rightorbottom
		bmi InsideBoundingBox
Check2
		lda rect2.leftortop
		cmp rect1.leftortop	 [*]
		bmi NotInsideBoundingBox
		cmp rect1.rightorbottom
		bpl NotInsideBoundingBox
InsideBoundingBox
		sec
		rts
NotInsideBoundingBox
		clc
		rts

I was looking back through this thread for a collision detection routine for my Juno First game, and the above routine looks like it will be ideal. However, if I am not mistaken, the comparisons marked [*] are the inverse of each other, so I think the code can be simplified as follows:

CheckBoundaries
		lda rect1.leftortop
		cmp rect2.leftortop  
		bmi Check2
		cmp rect2.rightorbottom
		bmi InsideBoundingBox
NotInsideBoundingBox
		clc
		rts
Check2
		lda rect2.leftortop
		cmp rect1.rightorbottom
		bpl NotInsideBoundingBox
InsideBoundingBox
		sec
		rts

Chris

#59 vdub_bobby OFFLINE  

vdub_bobby

    Quadrunner

  • 5,831 posts
  • Boom bam.
  • Location:Seattle, WA

Posted Wed Apr 19, 2006 11:37 AM

:thumbsup:

Don't know how I missed that; that looks nice. :)

#60 cd-w OFFLINE  

cd-w

    Stargunner

  • 1,195 posts
  • Juno First!
  • Location:Glasgow, UK

Posted Wed Apr 19, 2006 1:51 PM

View Postvdub_bobby, on Wed Apr 19, 2006 6:37 PM, said:

:thumbsup:
Don't know how I missed that; that looks nice. :)

Thanks - I haven't actually tried it yet, but it should work unless I am missing something :)

Chris

#61 supercat OFFLINE  

supercat

    Quadrunner

  • 6,367 posts

Posted Wed Apr 19, 2006 4:52 PM

View Postcd-w, on Wed Apr 19, 2006 12:20 PM, said:

I was looking back through this thread for a collision detection routine for my Juno First game, and the above routine looks like it will be ideal. However, if I am not mistaken, the comparisons marked [*] are the inverse of each other, so I think the code can be simplified as follows:

If one is trying to determine whether two objects of fixed size are overlapping, why not just use:
 ; Start with carry clear
  lda xpos1
  sbc xpos2 ; Note will subtract n-1
  sbc #SIZE2-1
  adc #SIZE1+SIZE2-1 ; Carry set if overlap

If SIZE1=4 and SIZE2=8, xpos1=20...
- If xpos2=24, the first SBC yields $FB (no carry) and the second, $F3 (carry set). The ADC generates no carry.
- If xpos2=12, the first SBC yields $07 (carry set) and the second, $00 (carry set). Again, the ADC generates no carry.

- If xpos2=23, the first SBC yields $FC (no carry) and the second, $F4 (carry set). The ADC generates carry.
- If xpos2=13, the first SBC yields $06 (carry set) and the second, $FF (carry clear). Again, the ADC generates carry.

#62 cd-w OFFLINE  

cd-w

    Stargunner

  • 1,195 posts
  • Juno First!
  • Location:Glasgow, UK

Posted Thu Apr 20, 2006 5:02 AM

View Postsupercat, on Wed Apr 19, 2006 11:52 PM, said:

If one is trying to determine whether two objects of fixed size are overlapping, why not just use:
; Start with carry clear
  lda xpos1
  sbc xpos2; Note will subtract n-1
  sbc #SIZE2-1
  adc #SIZE1+SIZE2-1; Carry set if overlap

If SIZE1=4 and SIZE2=8, xpos1=20...
- If xpos2=24, the first SBC yields $FB (no carry) and the second, $F3 (carry set). The ADC generates no carry.
- If xpos2=12, the first SBC yields $07 (carry set) and the second, $00 (carry set). Again, the ADC generates no carry.

- If xpos2=23, the first SBC yields $FC (no carry) and the second, $F4 (carry set). The ADC generates carry.
- If xpos2=13, the first SBC yields $06 (carry set) and the second, $FF (carry clear). Again, the ADC generates carry.

This seems to be an even better approach, and is particularly useful for me as it has a constant cycle count. I wish I could write code like this, but I always seem to have trouble figuring the arithmetic :) Incidentally, I implemented the previous version which I posted above and it works fine.

Chris

Edited by cd-w, Thu Apr 20, 2006 5:03 AM.


#63 batari OFFLINE  

batari

    )66]U('=I;B$*

  • 6,237 posts
  • begin 644 contest

Posted Sun May 21, 2006 3:03 AM

Here's another one for the code wizards out there. I'm trying to write some good code for 7-bit binary->8-bit BCD. Here's two examples I've come up with:

Obvious method (23 bytes):

; input: binary value in A

   ldx #$FF
   sec
count
   sbc #10
   inx
   bcs count
   adc #10
   ora table,x

; output: BCD value in A

table .byte $0,$10,$20,$30,$40,$50,$60,$70,$80,$90




Obfuscated method: 19 bytes

; input: binary value in A

   ldx #8
   ldy #4
repeat
   dey
   bmi noadd
   cmp #$50
   bcc noadd
   adc #$2f
noadd
   cmp #$80
   rol
   dex
   bne repeat

; output: BCD value in A

Can anyone guess how #2 works?

I'm not satisfied with either though. It seems that there should be a better way, in terms of speed and cycles... Maybe something using the "D" flag?

#64 Rybags ONLINE  

Rybags

    Quadrunner

  • 10,323 posts
  • Location:Australia

Posted Sun May 21, 2006 3:45 AM

Quickly written... probably more cycles - table values adjusted since carry will be set for add.

 tay 
 and #3; starting figure (0-7) 
 pha 
 tya 
 lsr a 
 lsr a
 lsr a 
 sta ztemp 
 pla 
 ldx #4
loop:  lsr ztemp 
 bcc noadd 
 adc dectab-1,x 
 noadd:  dex 
 bne loop
 rts 
dectab .byte $63,$31,$15,$07

Another tip: Instead of using branch, load/or, etc in certain conditions, you can just use the Processor Status, e.g. preserve bit 7 of a location.

Lda (screen),y
php
<other stuff>
plp
and #$80
ora newchar
sta (screen),y

I used a similar technique to preserve bit 7 of screen data in a softsprite routine - so that multicolour bitpair 11 still uses the right playfield.

Handily, the N and V flags and bit 7 and 6, so something like a BIT instruction followed by PHP will preserve the bit values for later processing.

#65 batari OFFLINE  

batari

    )66]U('=I;B$*

  • 6,237 posts
  • begin 644 contest

Posted Sun May 21, 2006 3:34 PM

View PostRybags, on Sun May 21, 2006 4:45 AM, said:

Quickly written... probably more cycles - table values adjusted since carry will be set for add.
Actually it might be quicker in terms of cycles, but it is quite a bit longer.

After research, I came up with something really short (17 bytes)
; Binary in A

   sed
   sta temp1
   lda #0
   ldx #8
loop
   asl temp1
   sta temp2
   adc temp2
   dex
   bne loop
   cld

; BCD in A
What's cool about this one is that it actually will do 8-bit binary -> 9-bit BCD, with the 9th bit contained in the carry! Can this be improved any more, though?

Edited by batari, Sun May 21, 2006 3:34 PM.


#66 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • 8,111 posts
  • Location:Baden-Württemberg, Germany

Posted Mon May 22, 2006 6:06 AM

ehm...call me stupid but for what do i need 9th bit bcd? positive/negative range?

#67 Rybags ONLINE  

Rybags

    Quadrunner

  • 10,323 posts
  • Location:Australia

Posted Mon May 22, 2006 6:22 AM

Theoretically, allowing full representation of 0-127, I suppose.

Probably would want some modification to allow for calling and using the carry bit in each initial calculation.

Anyway, stuff like that you only need once in a program and memory usage is mainly a 2600 consideration for stuff like BCD score / timers etc.

#68 djmips OFFLINE  

djmips

    Dragonstomper

  • 591 posts
  • scrolling
  • Location:Seattle

Posted Wed Mar 7, 2007 4:23 PM

This isn't exactly a 6502 Killer hack but I wanted to share this link.

http://graphics.stan...r/bithacks.html

There you will find some useful methods that apply to your 6502 coding.

#69 vdub_bobby OFFLINE  

vdub_bobby

    Quadrunner

  • 5,831 posts
  • Boom bam.
  • Location:Seattle, WA

Posted Fri Mar 9, 2007 12:44 PM

View Postdjmips, on Wed Mar 7, 2007 2:23 PM, said:

This isn't exactly a 6502 Killer hack but I wanted to share this link.

http://graphics.stan...r/bithacks.html

There you will find some useful methods that apply to your 6502 coding.
Some of those are pretty cool.

I thought I'd translate some of them to 6502 assembly:
Counting bits set, Brian Kernighan's way

unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
  v &= v - 1; // clear the least significant bit set
}
Assembly:
;--count set bits in v
;  trashes v, accumulator, result is in c

   lda #0
   sta c
Loop
   lda v
   beq Done
   inc c
   dec v
   and v
   sta v
   bne Loop
Done
Seems like that should be improveable.

#70 vdub_bobby OFFLINE  

vdub_bobby

    Quadrunner

  • 5,831 posts
  • Boom bam.
  • Location:Seattle, WA

Posted Fri Mar 9, 2007 12:56 PM

Here's another:
Swapping values with XOR

#define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))

This is an old trick to exchange the values of the variables a and b without using extra space for a temporary variable.

;--swap a and b

   lda a
   eor b
   sta a
   eor b
   sta b
   eor a
   sta a
That right?

Edited by vdub_bobby, Fri Mar 9, 2007 12:58 PM.


#71 batari OFFLINE  

batari

    )66]U('=I;B$*

  • 6,237 posts
  • begin 644 contest

Posted Fri Mar 9, 2007 3:25 PM

View Postvdub_bobby, on Fri Mar 9, 2007 1:56 PM, said:

Here's another:
Swapping values with XOR

#define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))

This is an old trick to exchange the values of the variables a and b without using extra space for a temporary variable.

;--swap a and b

   lda a
   eor b
   sta a
   eor b
   sta b
   eor a
   sta a
That right?
That does have some academic appeal, but this is how I usually do it:

Quote

lda a
ldy b
sta b
sty a
I suppose if you absolutely can't use RAM or another register, the EOR trick would work.

#72 vdub_bobby OFFLINE  

vdub_bobby

    Quadrunner

  • 5,831 posts
  • Boom bam.
  • Location:Seattle, WA

Posted Fri Mar 9, 2007 3:28 PM

View Postbatari, on Fri Mar 9, 2007 1:25 PM, said:

That does have some academic appeal
That's what I'm going for here! :lol:

Edited by vdub_bobby, Fri Mar 9, 2007 3:28 PM.


#73 Urchlay OFFLINE  

Urchlay

    Stargunner

  • 1,131 posts

Posted Fri Mar 9, 2007 4:16 PM

View Postvdub_bobby, on Fri Mar 9, 2007 1:44 PM, said:

Assembly:
;--count set bits in v
;  trashes v, accumulator, result is in c

   lda #0
   sta c
Loop
   lda v
   beq Done
   inc c
   dec v
   and v
   sta v
   bne Loop
Done
Seems like that should be improveable.

I'll take a stab at it...

Execution time for your version is 11 cycles if v is 0, or 5+n*23-1 cycles if v is not zero (where n is the number of bits set in v). Average time (4 bits set in input) is 96 cycles.

One obvious improvement would be to use the X register instead of memory location c. Use "ldx #0" to initialize (saves 3 cycles, assuming "c" was a zero page address), and "inx" instead of "inc c" (saves 3 cycles per set bit in v). Also the code gets 2 bytes smaller. Of course, if you already had written a lot of code that used this as a subroutine, all that code would have to change (to look for the result in X instead of c). Depending on how the calling code used the result, you might end up losing a lot more than the 2 bytes you've saved.

Another improvement would be to move the Loop label down two lines (below the "beq Done")... You only need to do "lda v : beq Done" once, before the first loop iteration (to catch the special case where the initial value of v is zero). At the end of the loop, you're making this check again (the "bne Loop")... if this branch is taken (back to Loop), it's redundant to check v for zero again (you're guaranteed it's not, since you just took the "bne Loop" branch). This will save you 4 cycles per set bit in v, and it won't break compatibility with (hypothetical) existing code that calls the routine, either. (It will add 4 constant cycles before the loop, though)

Applying both improvements (X register and moving the label) would save 7 cycles per loop iteration, which is a 30% speedup (of just the loop)... not bad.

Code looks like:

;--count set bits in v
;  trashes v, accumulator, result is in X register

   ldx #0
   lda v
   beq Done
Loop
   inx
   dec v
   and v
   sta v
   bne Loop
Done

Timing is 8 cycles if v is 0, or 7+16*n-1 if v is not 0... Average case (4 bits set in input) is 70 cycles, 28% faster than the original, assuming I've gotten all the numbers right (if not, I apologize). Total code size is 15 bytes (3 bytes smaller than the original).

However, there are a couple of other ways to count set bits that are faster at the expense of more code. I was messing around with this a while back, and came up with 5 other solutions (I never came up with the Kernighan method on my own, though). Here are a couple of them:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Count the number of set bits in A
; Unrolled version
; Result in A
; Uses one byte of RAM at tmp (preferably ZP)
; On exit: X and Y untouched, C/Z/N undefined
; Code size: 36 bytes (+1 for RTS = 37)
; Execution time: Constant, 5+(7*8) = 61 cycles (+6 for RTS = 67)
countbits_3:
  sta tmp; 3
  lda #0 ; 2
  rol tmp; 5
  adc #0 ; 2
  rol tmp; 5
  adc #0 ; 2
  rol tmp; 5
  adc #0 ; 2
  rol tmp; 5
  adc #0 ; 2
  rol tmp; 5
  adc #0 ; 2
  rol tmp; 5
  adc #0 ; 2
  rol tmp; 5
  adc #0 ; 2
  rol tmp; 5
  adc #0 ; 2

  rts

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Count the number of set bits in A
; Nybble table version
; Result in A
; Uses one byte of RAM at tmp (preferably ZP)
; Execution time: Constant, 35 cycles (+6 for RTS)
; Code size: 17 bytes table + 19 bytes code = 36 bytes (+1 for RTS = 37)
; This is faster than the fully-unrolled version,
; but it trashes all the registers and flags. Could use PHA/PLA instead of
; TAY/TYA, to preserve the Y register, at the expense of 2 cycles.
nyb_bits: byte 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
countbits_4:
  tay		   ; 2
  and #$0F	  ; 3
  tax		   ; 2
  lda nyb_bits,x; 4
  sta tmp	   ; 3
  tya		   ; 2
  lsr		   ; 2
  lsr		   ; 2
  lsr		   ; 2
  lsr		   ; 2
  tax		   ; 2
  lda nyb_bits,x; 4
  clc		   ; 2
  adc tmp	   ; 3

  rts		   ; 6

Neither of these is all that original. The unrolled version is obvious to any 6502'er, and the nybble-table idea came from some code I saw either here or on the [stella] list (I couldn't find the original code, so I rewrote it from scratch).

Here's one I came up with on my own. It's not all that fast, but I like it because it fits in 11 bytes of code and doesn't use a temp location in RAM. Also, if I added a ROL before the RTS, the accumulator and carry flag would keep their original values, which might simplify things for the calling code.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Count the number of set bits in A
; Result in Y
; Uses no RAM
; On exit:
; Y = result, X = 0, A = undefined, C = undefined, Z = 1, N = 0
; Min execution time (input $00, 0 bits set):
; 4+(2+3+2+3)*8-1 = 83 cycles
; Avg execution time (input $AA, 4 bits set):
; 4+(2+3+2+3)*4+(2+2+2+2+3)*4-1 = 87 cycles
; Max execution time (input $FF, 8 bits set):
; 4+(2+2+2+2+3)*8-1 = 91 cycles
; (add 6 cycles for RTS to all 3 counts)
; Code size: 11 bytes (+1 for RTS = 12)
 
countbits_2:
 ldx #8; 2
 ldy #0; 2
 
cb2loop
  rol		 ; 2
  bcc cb2_noiny; 2/3
  iny		 ; 2
cb2_noiny
  dex		 ; 2
  bne cb2loop ; 3
 
; rol; adding this ROL preserves the original values of A and C (and sets Z and N according to the A value)
  rts; 6 


#74 Urchlay OFFLINE  

Urchlay

    Stargunner

  • 1,131 posts

Posted Fri Mar 9, 2007 4:18 PM

View Postbatari, on Fri Mar 9, 2007 4:25 PM, said:

I suppose if you absolutely can't use RAM or another register, the EOR trick would work.

The EOR trick still uses RAM (locations a and b).

#75 Urchlay OFFLINE  

Urchlay

    Stargunner

  • 1,131 posts

Posted Fri Mar 9, 2007 4:37 PM

A while ago, I wrote this bit of code to calculate even/odd parity. It could probably stand some improvement... I definitely don't consider it a "killer hack": it's coded for clarity more than speed or size, but it does implement something from the bithack page, so I thought someone might be interested.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; calc_parity will calculate even or odd parity for the low 7 bits of A,
; sets the high bit and N flag to the result.

; Input:
; A = data to calculate parity for (bit 7 ignored)
; C = carry set for odd parity, clear for even
; X, Y, other flags: ignored

; Output:
; A = data with correct parity bit in bit 7 (bits 0-6 unchanged)
; N = parity bit (same as A bit 7)
; C = parity bit (same as A bit 7 and N)
; Z = 1 if A is 0, or 0 otherwise
; X = 0
; Y = bits 0-6 of input
; tmp = same as Y

; Memory usage: One byte at tmp. No stack used.
; If desired, the routine could easily be modified to preserve the
; Y register (and use another RAM temp instead).

; Execution time: Constant, 108 cycles, +6 for RTS. Add 10 cycles if tmp is
; not located in zero page.

; Code size: 25 bytes, +1 for RTS (+4 if tmp not zero page)

; Code:
calc_parity: subroutine;{
  and #$7F ; 2; throw away top bit (it will get replaced anyway)
  sta tmp  ; 3; copy in tmp will be repeatedly shifted (destroyed),
  tay	  ; 2; so stash another copy in Y
  lda #0   ; 2; init accumulator:
  rol	  ; 2; A=1 if C was set, otherwise A=0
  ldx #7   ; 2
		   ; =13 (+1 if tmp not ZP)

;  {
.loop	 ; loop executes 7 times...
  lsr tmp  ; 5
  adc #0   ; 2
  dex	  ; 2
  bne .loop; 3
		   ; =12*7-1 = 83, +7 if tmp not ZP
;  }

; A now has count of set bits from bits 0-6 of the original argument,
; plus 1 if C flag was initially set.
; Bit 0 of A will be the new parity bit.
  ror	  ; 2; C = bit 0 of A
  lda #0   ; 2
  ror	  ; 2; bit 7 of A = C, bits 0-6 are 0
  sty tmp  ; 3
  ora tmp  ; 3
		   ; =12 (+2 if tmp not ZP)

  rts	  ; 6
;}

BTW, the comments with the curly braces (lines like "; }") are a Stupid Editor Trick: My editor lets me jump from an opening { to a closing } quickly (very handy for C or Java code), so I've gotten in the habit of using them in 6502 asm code for easy navigation. I use Vim, and I know Emacs has support for this, too. I'd bet that any halfway decent code editor or IDE supports it, too.

I've also got a vim macro that selects the current block of code (from the { to the }), for cut/copy or applying other commands. I just press F2 to highlight... and I've got F1 bound to a macro that comments out the block (or uncomments it, if it's already commented). Who says vi is hard to use? :)




0 user(s) are reading this topic

0 members, 0 guests, 0 anonymous users