A 2-line (or 3-line, or 4-line, etc.) kernel is a lot like a 1-line kernel, except that each pass through the loop takes more cycles and draws more than one scan line. Whether or not you need to strobe WSYNC at all depends on how many cycles the loop takes and how precisely timed it needs to be. Thus...
If your loop takes fewer than 76 cycles-- or really, fewer than 73 cycles before you add the WSYNC-- then include 1 WSYNC to fill out the line and it will be a 1-line kernel. Don't forget to factor in the cycles needed for decrementing or incrementing your loop counter and testing it, plus the cycles needed to loop back, plus the 3 cycles for strobing WSYNC. Also, keep in mind that a branch takes 2 cycles if it isn't taken, 3 cycles if it's taken, or 4 cycles if it's taken and the target address is on a different 256-byte page from the instruction that follows the branch.
If your loop takes exactly 76 cycles-- including the overhead cycles for updating the loop counter, checking it, and branching-- then you can omit the WSYNC. This is still a 1-line kernel.
If your loop takes more than 76 cycles but fewer than 152 cycles (or fewer than 149 cycles before you add the last WSYNC), then include at least 1 WSYNC and it will be a 2-line kernel. You may not need 2 WSYNCs if you don't care much about the timing on the second line. But if the instructions on both lines need to be lined up so they always execute at specific points on each line, then include 2 WSYNCs.
If your loop takes exactly 152 cycles-- excluding any WSYNCs but including all the extra overhead stuff-- then you don't need any WSYNCs.
And so on for a 3-line kernel, 4-line kernel, etc.
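To make the cycle counting concrete, here's a rough sketch of a bare-bones 1-line kernel with the cycles tallied in the comments. The `graphics` table (192 bytes of player data, read back to front) is a made-up name for illustration, 192 lines is an assumption, and the counts assume no page crossings:

```asm
        ldx #192            ; one pass per scan line (192 lines assumed)
loop
        sta WSYNC           ; 3 cycles - wait for the start of the next line
        lda graphics-1,x    ; 4 cycles - hypothetical table (+1 if the read crosses a page)
        sta GRP0            ; 3 cycles
        dex                 ; 2 cycles - update the loop counter
        bne loop            ; 3 cycles when taken (2 if not, 4 across a page)
                            ; total: 15 cycles per pass
```

That's only 15 cycles, so there's plenty of room left on the line and 1 WSYNC does the job. To drop the WSYNC instead, the other 12 cycles would have to be padded out with exactly 64 more cycles of work so each pass comes to 76.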
Of course, the number of times you execute the loop will depend on how many lines it draws-- like 192 times for a 1-line kernel, but only 96 times for a 2-line kernel, or 64 times for a 3-line kernel, etc.
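One way to keep that arithmetic visible is to let the assembler do the division. This is just a sketch in DASM syntax; KERNEL_LINES and LINES_PER_PASS are made-up names, and 192 assumes the usual NTSC visible height:

```asm
KERNEL_LINES    = 192       ; visible scan lines to draw (assumed)
LINES_PER_PASS  = 2         ; 1, 2, 3... for a 1-line, 2-line, 3-line kernel

        ldx #KERNEL_LINES/LINES_PER_PASS    ; 96 passes for a 2-line kernel
```

Changing LINES_PER_PASS to 3 would give 64 passes, and so on, as long as the result still fits in an 8-bit register.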
If you do go with a 2-line kernel, or 3-line kernel, etc., you might need to use a separate variable or register as an index for the graphics.
For example, a simple 2-line kernel might look something like this:
        ldy #0                      ; index for the graphics
        ldx #96                     ; counter for the loop
loop
        lda (graphics_vector),y
        sta GRP0
        ; do some more stuff
        iny                         ; increment the graphics index
        sta WSYNC                   ; finish off the first line
        lda (graphics_vector),y
        sta GRP0
        ; do some more stuff
        iny                         ; increment the graphics index
        sta WSYNC                   ; finish off the second line
        dex                         ; decrement the loop counter
        bne loop                    ; not done yet? then loop again
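Counting cycles from the start of each scan line shows where the instructions in that loop actually land. This is only a tally of the loop above, assuming no page crossings and ignoring the very first pass (which starts wherever the setup code leaves off):

```asm
loop                                ; line 1 resumes after the WSYNC at the bottom:
                                    ;   dex (2) + bne (3) have already used 5 cycles
        lda (graphics_vector),y     ; 5 cycles - done 10 cycles into line 1
        sta GRP0                    ; 3 cycles - done 13 cycles into line 1
        ; do some more stuff
        iny
        sta WSYNC                   ; line 2 starts right after this
        lda (graphics_vector),y     ; 5 cycles - done 5 cycles into line 2
        sta GRP0                    ; 3 cycles - done 8 cycles into line 2
        ; do some more stuff
        iny
        sta WSYNC
        dex                         ; 2 cycles - these run at the start of the
        bne loop                    ; 3 cycles   next line 1, not on line 2
```

So everything on the first line runs 5 cycles (15 color clocks) later than the matching instruction on the second line.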
Since updating the loop counter, checking it, and looping back take time, you might want to put the first WSYNC at the beginning of the loop so the instructions for the first line are always lined up exactly the same as the instructions for the second line...
        ldy #0                      ; index for the graphics
        ldx #96                     ; counter for the loop
loop
        sta WSYNC                   ; finish the current line and start a new line
        lda (graphics_vector),y
        sta GRP0
        ; do some more stuff
        iny                         ; increment the graphics index
        sta WSYNC                   ; now finish this line and start another new line
        lda (graphics_vector),y
        sta GRP0
        ; do some more stuff
        iny                         ; increment the graphics index
        dex                         ; decrement the loop counter
        bne loop                    ; not done yet? then loop again
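And for the case mentioned earlier where the timing on the second line doesn't matter much, a 2-line kernel can get by with a single WSYNC. In this sketch the 40 NOPs are only a stand-in for 80 cycles of real work (REPEAT/REPEND is DASM syntax), and the cycle counts assume no page crossings:

```asm
        ldy #0                      ; index for the graphics
        ldx #96                     ; counter for the loop
loop
        sta WSYNC                   ; 3 - the only WSYNC: lines up the first line
        lda (graphics_vector),y     ; 5
        sta GRP0                    ; 3
        iny                         ; 2
        REPEAT 40                   ; stand-in for 80 cycles of real work,
        nop                         ;   which runs past the end of the first
        REPEND                      ;   line and into the second
        lda (graphics_vector),y     ; 5
        sta GRP0                    ; 3 - lands wherever the code above leaves it
        iny                         ; 2
        dex                         ; 2
        bne loop                    ; 3 - total 108 cycles, so 2 lines per pass
```

Here the second GRP0 write finishes 22 cycles into the second line, but that position shifts any time the code ahead of it changes, which is the trade-off for skipping the second WSYNC.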