
Get the hell from your Jag !


3 replies to this topic

#1 GT Turbo OFFLINE  

GT Turbo

    Moonsweeper

  • 362 posts
  • Location:Alsace, France

Posted Mon Aug 13, 2007 3:52 AM

For people who want to get the hell out of their Jag, have a look here:

http://www.jagware.o...p?showtopic=464

and here:

http://www.jagware.o...p?showtopic=465


SCPCD has timed some operations and posted the measurements. These are real hardware timings, so they can help when optimising Jaguar code.



GT Turbo (Jagware)

#2 Gorf OFFLINE  

Gorf

    River Patroller

  • 4,633 posts

Posted Mon Aug 13, 2007 6:06 PM

Hey Jagware crew... since I really want to see the difference and don't have an analyzer...

...here is a GPU version of the same.

Try this out in main RAM: just align it on a page boundary
and have the 68k start it. Stop the 68k after the start, though.

Let's see the analyzer results.


BLITBASE	.equr 		r14; base of blitter registers

A1FLAGS		EQU		1; register index defines
A1CLIP		EQU		2
A1PIXEL		EQU		3
A1STEP		EQU		4	
A1FSTEP		EQU		5
A1FPIXEL		EQU		6
A1INC		EQU		7
A1FINC		EQU		8
A2BASE		EQU		9
A2FLAGS		EQU		10
A2MASK		EQU		11
A2PIXEL		EQU		12
A2STEP		EQU		13
BCMD		EQU		14
BCOUNT		EQU		15
BSRCDH		EQU		16
BSRCDL		EQU		17
BDSTDH		EQU		18
BDSTDL		EQU		19
BDSTZH		EQU		20
BDSTZL		EQU		21
BSRCZ1H		EQU		22
BSRCZ1L		EQU		23
BSRCZ2H		EQU		24
BSRCZ2L		EQU		25
BPATDH		EQU		26
BPATDL		EQU		27
BIINC		EQU		28
BZINC		EQU		29	
BSTOP		EQU		30
BLITI0 		EQU		31
BLITI1		EQU		32	

BLITBASEHI	.equr 		r15; picks up where the last index register leaves off... most of these are not used here but
				; are good to have for future endeavors... this is the same location as BLIT_I2
BLITI3		EQU		1		
BLITZ0		EQU		2			
BLITZ1		EQU		3	
BLITZ2		EQU		4	
BLITZ3		EQU		5	


	moveq 	#0,r0
	movei	#A1_BASE,BLITBASE
	movei 	#PITCH1|PIXEL32|WID128|XADDPHR,r2
	movei	#source,r3
	movei	#destination,r4
	movei	#$00010400,r5
	movei	#SRCEN|LFU_REPLACE,r6
	movei	#G_CTRL,r7

	store	r2,(BLITBASE+A2FLAGS)
	store	r3,(BLITBASE+A2BASE)
	store	r0,(BLITBASE+A2PIXEL)
	store	r0,(BLITBASE+A2STEP)
	store	r2,(BLITBASE+A1FLAGS)
	store	r4,(BLITBASE)	; A1_BASE sits at offset 0 from BLITBASE, so no index is needed
	store	r0,(BLITBASE+A1PIXEL)
	store	r0,(BLITBASE+A1FPIXEL)
	store	r0,(BLITBASE+A1STEP)
	store	r0,(BLITBASE+A1FSTEP)
	store	r0,(BLITBASE+A1CLIP)
	store	r0,(BLITBASE+A1INC)
	store	r0,(BLITBASE+A1FINC)
	store	r5,(BLITBASE+BCOUNT)
	store	r6,(BLITBASE+BCMD)
	store	r0,(r7)	;done...stop GPU
	nop
	nop
	nop
	nop

Edited by Gorf, Mon Aug 13, 2007 6:32 PM.


#3 SCPCD OFFLINE  

SCPCD

    Star Raider

  • 53 posts
  • Location:France

Posted Wed Aug 15, 2007 10:40 AM

This is a first try; I'll run other tests in the future.

I have made some changes so that this is easier to see on the LA (logic analyzer).

	move.l		#ints,VBL_VECTOR
	move.w		#%1111100000010,INT1;clear all pending int, & enable GPU interrupt

gorf_test:
	move.l		#gpu_code_start,G_PC
	move.l		#1,G_CTRL
	stop		#$2100	; wait for the GPU to stop
	
	bra			gorf_test

ints:
	move.w		#%1111100000010,INT1
	move.w		#0,INT2;68k to normal level
	rte

	.qphrase
gpu_code_start:
	.gpu

BLITBASE	.equr		 r14; base of blitter registers

A1FLAGS		EQU		1; register index defines
A1CLIP		EQU		2
A1PIXEL		EQU		3
A1STEP		EQU		4	
A1FSTEP		EQU		5
A1FPIXEL		EQU		6
A1INC		EQU		7
A1FINC		EQU		8
A2BASE		EQU		9
A2FLAGS		EQU		10
A2MASK		EQU		11
A2PIXEL		EQU		12
A2STEP		EQU		13
BCMD		EQU		14
BCOUNT		EQU		15
BSRCDH		EQU		16
BSRCDL		EQU		17
BDSTDH		EQU		18
BDSTDL		EQU		19
BDSTZH		EQU		20
BDSTZL		EQU		21
BSRCZ1H		EQU		22
BSRCZ1L		EQU		23
BSRCZ2H		EQU		24
BSRCZ2L		EQU		25
BPATDH		EQU		26
BPATDL		EQU		27
BIINC		EQU		28
BZINC		EQU		29	
BSTOP		EQU		30
BLITI0		 EQU		31
BLITI1		EQU		32	

BLITBASEHI	.equr		 r15; picks up where the last index register leaves off... most of these are not used here but
			 ; are good to have for future endeavors... this is the same location as BLIT_I2
BLITI3		EQU		1		
BLITZ0		EQU		2			
BLITZ1		EQU		3	
BLITZ2		EQU		4	
BLITZ3		EQU		5	


	moveq	#0,r0
	moveq	#2,r1;for G_CTRL register : interrupt the 68k
	movei	#A1_BASE,BLITBASE
	movei	#PITCH1|PIXEL32|WID128|XADDPHR,r2
	movei	#source,r3;=somewhere in DRAM phrase aligned
	movei	#destination,r4;=G_RAM+$8000
	movei	#$00010010,r5
	movei	#SRCEN|LFU_REPLACE,r6
	movei	#G_CTRL,r7

	store	r2,(BLITBASE+A2FLAGS)
	store	r3,(BLITBASE+A2BASE)
	store	r0,(BLITBASE+A2PIXEL)
	store	r0,(BLITBASE+A2STEP)
	store	r2,(BLITBASE+A1FLAGS)
	store	r4,(BLITBASE)
	store	r0,(BLITBASE+A1PIXEL)
	store	r0,(BLITBASE+A1FPIXEL)
	store	r0,(BLITBASE+A1STEP)
	store	r0,(BLITBASE+A1FSTEP)
	store	r0,(BLITBASE+A1CLIP)
	store	r0,(BLITBASE+A1INC)
	store	r0,(BLITBASE+A1FINC)
	store	r5,(BLITBASE+BCOUNT)
	store	r6,(BLITBASE+BCMD)
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	store	r1,(r7) ;done...stop GPU and launch a CPU interrupt
	nop
	nop
	.68000
.gpu_code_end:
	.dc.l	0

At the moment I don't know how to trigger easily on your test, so I made it repeat.
But since the 68k is stopped, the only way to restart the test once the blit has finished is to wake the 68k with an interrupt, and there is no blitter interrupt for the 68k.
And I cannot raise the CPU interrupt before the blitter is idle, otherwise the 68k takes priority...

So I added NOPs and reduced the length of the blit for this first try.
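
To make the shortened blit concrete: B_COUNT holds the outer loop count in its high word and the inner loop count in its low word, so the value loaded into r5 can be built from two constants. A small sketch, where OUTER_COUNT and INNER_COUNT are names made up here for illustration (they are not in the listing above), and assuming the assembler accepts C-style shifts in expressions:

```asm
; Illustrative names only, not part of the original test
OUTER_COUNT	EQU	1		; one line
INNER_COUNT	EQU	$10		; 16 pixels; at PIXEL32 that is 64 bytes

	movei	#(OUTER_COUNT<<16)|INNER_COUNT,r5	; same value as $00010010
```

Gorf's original $00010400 is the same single line with an inner count of $400, that is 1024 pixels or 4096 bytes at 32 bits per pixel.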

This is the result:
gorf.jpg

At A: the first 2 GPU instructions.
We have 13 cycles for 2 consecutive moveq.
12 cycles for each "32-bit GPU instruction" read, up to the "store" instructions.
Then each pair of "store rn,(rn+x)" takes 14 cycles.

When the blitter starts, we can see GPU instruction fetches interleaved with blitter accesses and, more interestingly, each GPU instruction read from DRAM takes less time!
The time between two "32-bit GPU instruction" reads is not constant but seems to be about 8 cycles.
The time between two blitter accesses is 8 cycles.


I'll post updates in the future; right now I have other things to do ;)

Edited by SCPCD, Wed Aug 15, 2007 10:43 AM.


#4 Gorf OFFLINE  

Gorf

    River Patroller

  • 4,633 posts

Posted Wed Aug 15, 2007 2:19 PM

SCPCD, on Wed Aug 15, 2007 12:40 PM, said:

At the moment I don't know how to trigger easily on your test, so I made it repeat.
But since the 68k is stopped, the only way to restart the test once the blit has finished is to wake the 68k with an interrupt, and there is no blitter interrupt for the 68k.
And I cannot raise the CPU interrupt before the blitter is idle, otherwise the 68k takes priority...

Oh... yeah... forgot about the blitter running... :P You can have the GPU wait for the blitter to finish,
then use the GPU interrupt to wake the 68k.
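
A minimal sketch of that idea, not a tested routine. It assumes the register setup from SCPCD's listing is still in place (BLITBASE = A1_BASE, r1 = 2, r7 = G_CTRL) and uses r8 as a scratch register chosen here. It relies on bit 0 of the value read back from B_CMD being the blitter idle flag. It is written as ordinary GPU code; running it from main RAM means the jump needs extra care that this sketch does not address.

```asm
; Sketch only: would replace the block of NOPs after the write to BCMD
.wait_blit:
	load	(BLITBASE+BCMD),r8	; reading B_CMD returns the blitter status
	btst	#0,r8			; bit 0 set = blitter idle
	jr	EQ,.wait_blit		; bit still clear: keep polling
	nop				; the instruction after a jr is always executed

	store	r1,(r7)			; stop the GPU and interrupt the 68k
	nop
	nop
```

With this in place the blit length would no longer have to be matched by hand to a fixed number of NOPs.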

SCPCD, on Wed Aug 15, 2007 12:40 PM, said:

When the blitter starts, we can see GPU instruction fetches interleaved with blitter accesses and, more interestingly, each GPU instruction read from DRAM takes less time!
The time between two "32-bit GPU instruction" reads is not constant but seems to be about 8 cycles.
The time between two blitter accesses is 8 cycles.

I'm not surprised by the faster DRAM instruction reads. I think the pipeline is the issue out in
main. It seems to pipeline at 64 bits instead of its internal 32 bits... I'm guessing this, BTW.
It makes sense when you consider how main code jumps work.



