I have an idea for ya, since your design is similar to my Castlevania work. The technique I'm using to create the many different screens could be adapted to horizontal scrolling, and it could save you more space than it saves me. What I've been doing is loading a list of index values into RAM. The part of the kernel that handles the playfield uses Y as a scanline counter to walk through that list, and each index value it reads goes into X to select the playfield data. The playfield portion of the kernel then just loads from one page for each section of the playfield:
page A is the color of the scanline
page B is PF1 left side
page C is PF2 left side
page D is PF2 right side
page E is PF1 right side
With all of these loading from the same index, everything lines up fine, for a total of 256 individual scanline patterns. The index list loaded into RAM handles the overall screen image. Once the playfield data has been written, load 0 into the accumulator, set the playfield color to black, and blank out PF1 and PF2. Changing the playfield color to black at the right time makes PF0, which is already 1111, show as black while the data for the next scanline is blanked, and PF0 stays black while that next scanline handles nothing but sprites. I also have several cycles left during the PF section for other things, and the Y value in mine decrements every 2 scanlines instead of every scanline.
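To make the "index list in RAM" part concrete, here is a rough sketch of the setup step that runs outside the kernel. It assumes the list lives at $80 (where the kernel's ldx reads it) and that each screen has its own index list in ROM. IndexList, ScreenIndex and ROWS are made-up names, and the row count is only an example.

```
; Sketch only: IndexList, ScreenIndex and ROWS are hypothetical names.
IndexList = $80             ; RAM list the kernel walks with Y
ROWS      = 40              ; assumed number of rows, pick what fits your RAM

LoadScreen
        ldx #ROWS-1
CopyRow
        lda ScreenIndex,x   ; index byte for this row, from ROM
        sta IndexList,x     ; the kernel later loads X from this list
        dex
        bpl CopyRow         ; loop until X wraps below 0
        rts
```

Switching to a different screen is then just a matter of copying a different ROM list into the same RAM area.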
This is what I did:
;scanline start
;11 cycles free
ldx $80,y ;4
lda $fa00,x ;4*
sta PF1 ;3
lda $fb00,x ;4*
sta COLUPF ;3
lda $fc00,x ;4*
sta PF2 ;3
;7 cycles free
lda $fd00,x ;4*
sta PF2 ;3 begin on cycle 48
lda $fe00,x ;4*
sta PF1 ;3
dey ;2
;5 cycles free
; blank out playfield
lda #0 ;2
sta PF1 ;3
sta COLUPF ;3
sta PF2 ;3
;1 cycle free
;scanline end
As you can see, I haven't filled the free-cycle areas with any code yet, but the whole thing accounts for all 76 cycles of a scanline and sets up the next scanline for sprites.
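If you want to check the 76, here is the tally written out as comments. The grouping is just for illustration, and I'm assuming the asterisks mark the extra cycle an indexed load takes when it crosses a page, which can't happen as long as every table starts on a page boundary (an address ending in 00).

```
; 11  free
; 25  left side:   ldx (4) + PF1 (4+3) + color (4+3) + PF2 (4+3)
;  7  free
; 16  right side:  PF2 (4+3) + PF1 (4+3) + dey (2)
;  5  free
; 11  blanking:    lda #0 (2) + three stores (3+3+3)
;  1  free
; --
; 76  total = one full scanline
```

That leaves 24 free cycles per scanline (11 + 7 + 5 + 1) for other work.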
One way you could get your scrolling is to simplify the detail and pre-render all the horizontal movements you want, and have them fill up a few pages. I said ya could do it and take up less data than mine, so here's how. All you would need is at least 2 pages of data to get quite a few variations of PF data. The key to your type, though, is having 2 index values stored in RAM (as compared to my 1 per instance) which reference different parts of the screen (left and right sides). The entire height of the ground-type object you want to draw can reference those same 2 index values, since it doesn't have to change from top to bottom unless you want it to, so you won't need anywhere near the huge list in RAM that mine does. One difference would be your PF color loading. I have mine built into the same system for simplicity, but you may want to do that differently to get more detail on each scanline (same visual data, different color). Movement, both vertical and horizontal, would then just be a matter of changing out your index values in RAM on every frame where something needs to move.
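Here is a rough sketch of what the two-index version might look like. It is not tested on hardware. LeftIdx, RightIdx and the four table names are made up for illustration, the free-cycle padding that puts the right-side writes at the correct point in the scanline is left out, and it assumes each pre-rendered scroll step sits at the next offset in the tables.

```
; Sketch only: every name here is hypothetical, timing padding omitted.
LeftIdx  = $80              ; index for the left half of the screen
RightIdx = $81              ; index for the right half of the screen

; per scanline, inside the kernel
        ldx LeftIdx         ;3
        lda PFLeft1,x       ;4
        sta PF1             ;3
        lda PFLeft2,x       ;4
        sta PF2             ;3
        ldx RightIdx        ;3
        lda PFRight2,x      ;4
        sta PF2             ;3  must land after the left PF2 has been drawn
        lda PFRight1,x      ;4
        sta PF1             ;3

; once per frame, outside the kernel, when the screen should scroll
ScrollStep
        inc LeftIdx         ; next pre-rendered position, left half
        inc RightIdx        ; next pre-rendered position, right half
        rts
```

Because the same two bytes serve every scanline of the object, a scroll step costs two RAM writes instead of rewriting a whole list.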
So this way, all of your movement is handled simply by changing the set of index values you store in RAM. I will be using Superchip RAM in mine, and I'd advise it for a game like yours as well.
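In case it helps, this is roughly how the index list would be accessed in Superchip RAM. It assumes the usual Superchip layout, where the 128 extra bytes are written through $F000-$F07F and read back through $F080-$F0FF. SC_WRITE and SC_READ are made-up names.

```
; Sketch only: SC_WRITE and SC_READ are hypothetical names,
; the addresses assume the standard Superchip mapping.
SC_WRITE = $F000            ; write port for the 128 bytes
SC_READ  = $F080            ; read port for the same 128 bytes

        sta SC_WRITE,x      ; setup code stores an index byte via the write port

        ldx SC_READ,y       ;4  kernel reads it back via the read port
```

The kernel load stays at 4 cycles as long as Y is $7F or less, since the read never leaves the $F0 page. Keep in mind that nothing else in ROM can live at $F000-$F0FF when the Superchip is mapped there.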