Better way to dynamically update tile data on Commodore 64

I'm planning to use software sprites in multicolor char mode for my new C64 project. My idea is to superimpose 'bullet' sprite data onto tile data.

I think I can keep the tileset data at address 'TILESET' and the sprite data at address 'SPRITE'. I can then combine the two to prepare a bullet char with a dynamically calculated background and store it at address 'SUPERIMPOSED'.

I wrote the following code and counted cycles to check whether this is feasible, and I think it is not. The loop eats up 219 cycles, nearly four raster lines. And that doesn't include the other calculations required before the loop, such as computing the target addresses.

If I want 16 bullets on the screen, that will take 64 raster lines, or 8 character rows. So I've become suspicious. Is this the correct approach, or is there a more optimized way to do the same job?

                         cycles
                        ---------
    ldy #$07             4 x1 = 4   
-   LDA TILESET,x       3 x8 = 24
    AND SPRITE,x        4 x8 = 32 
    STA SUPERIMPOSED,x  5 x8 = 40
    dey                 2 x8 = 16
    cpy                 4 x8 = 32
    bne -               3 x8-1 = 71 
                        ----------
                        219 Cycle

I'm considering using a repeating pattern in the background, so that I can reuse the same bullet tile without recalculating it.

Marinelli answered 27/9, 2015 at 9:24 Comment(4)
Unroll the loop to get rid of the overhead at the cost of some code increase. Also, you seem to be using x for indexing but y for the loop? Charissa
If your sprite is more than one pixel in size and you want to be able to position it at any pixel, you need a lot more code, including the ability to superimpose it onto 2 or 4 characters (tiles). If the sprite is just one pixel you can simplify the code a bit. Paxon
@RossRidge I haven't finalized the sprites yet, so I'm not sure about the size, but it will be something around 4x4. Marinelli
If you need pixel-granular positioning of 4x4 sprites, I think you're either going to have to use bitmap mode (which will be pretty expensive as well, since you still have to update as many bytes) or hardware sprites. You can have more than 8 sprites on screen at a time by moving them as the screen is being scanned out. You're still limited to 8 sprites on a given scan line. You could also alternate between two sets of 8 sprites every frame. This will cause them to flicker, but the effect may not be too bad. (And well, flickering was something your namesake was infamous for.) Paxon

As Jester suggests, as a first optimisation just repeat the lda, and, sta and dey eight times. Eliminate the cpy and bne. That'll save 103 cycles immediately. Even if you want to keep the formal loop, notice that dey sets the zero flag so you don't need the cpy.
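
If you do keep the formal loop, a minimal sketch might look like the following. It assumes y serves as both counter and index (the question's code mixes x and y), that the three labels already point at the 8 bytes of the character being processed, and that the addresses are placeholders. Counting down with bpl lets the flags set by dey end the loop. The cycle counts in the comments assume no page crossings, and the syntax is generic, so adjust it for your assembler.

```asm
; Hypothetical addresses, for illustration only
TILESET      = $3000
SPRITE       = $3100
SUPERIMPOSED = $3200

        ldy #$07            ; 2   start at the last row
loop:   lda TILESET,y       ; 4
        and SPRITE,y        ; 4
        sta SUPERIMPOSED,y  ; 5
        dey                 ; 2   sets N and Z, so no cpy is needed
        bpl loop            ; 3 when taken, 2 on exit
```

By that count the loop costs 2 + 8*15 + 7*3 + 2 = 145 cycles. Unrolling removes the branch cost as well.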

As a second optimisation, consider a compiled sprite. Instead of reading from SPRITE,x, you'd have those values coded directly into your routine as immediate operands, making a distinct routine for each sprite. That'd cut another 16 cycles (2 per byte).

That being said, your lda would be 4 cycles in an aligned table, not 3. So there are 8 you haven't accounted for. Meaning that unrolled plus specialised to your sprite = 102 cycles (having omitted the final dey).
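
To make the 102-cycle figure concrete, here is a sketch of an unrolled routine specialised to one sprite. The routine name, addresses and mask values are made up for illustration, and y is assumed to be the index register. Each row costs 4 + 2 + 5 + 2 = 13 cycles and the final dey is dropped, giving 8*13 - 2 = 102 (not counting the rts).

```asm
; Hypothetical addresses and mask values, for illustration only
TILESET      = $3000
SUPERIMPOSED = $3200

; Entry: y = offset of the tile's last row within the tables
plot_bullet:
        lda TILESET,y       ; 4   row 7
        and #$ff            ; 2   mask baked into the code
        sta SUPERIMPOSED,y  ; 5
        dey                 ; 2
        lda TILESET,y       ; row 6
        and #$ff
        sta SUPERIMPOSED,y
        dey
        lda TILESET,y       ; row 5
        and #$c3
        sta SUPERIMPOSED,y
        dey
        lda TILESET,y       ; row 4
        and #$c3
        sta SUPERIMPOSED,y
        dey
        lda TILESET,y       ; row 3
        and #$c3
        sta SUPERIMPOSED,y
        dey
        lda TILESET,y       ; row 2
        and #$c3
        sta SUPERIMPOSED,y
        dey
        lda TILESET,y       ; row 1
        and #$ff
        sta SUPERIMPOSED,y
        dey
        lda TILESET,y       ; row 0, no dey needed after the last store
        and #$ff
        sta SUPERIMPOSED,y
        rts
```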

Without knowing the C64 architecture and/or what the rest of your code does: if whatever consumes SUPERIMPOSED can read it from the stack page, consider writing output to the stack rather than via indexed addressing. Just load the stack pointer (S) with an appropriate seed value and store new results via pha. That'll save two cycles per store at the cost of 12 additional cycles of setup and restore.

Following on from that thought, if you had freedom in how these tables look then consider switching their format — instead of one table that holds all eight bytes of TILESET, use eight tables, each of which holds one byte of it. That'd remove the need to adjust y in the loop; just use a different target table in each unrolled iteration.

Supposing both TILESET and SUPERIMPOSED can be eight tables that gets you down to:

    LDA TILESET1, x
    AND #<value>
    STA SUPERIMPOSED1, x    ; * 8

    [... LDA TILESET2, x ...]

... which is a total of 88 cycles. If SUPERIMPOSED is linear but in the stack page then:

    TSX
    TXA
    LDX #newdest
    TXS
    TAX                ; adds 10

    LDA TILESET1, y
    AND #<value>
    PHA                ; * 8

    [... LDA TILESET2, y ...]

    TXS                ; adds 2

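... which is 84 cycles. Bear in mind that pha writes to descending addresses, so push the rows in reverse order if the output must end up in ascending order, and that an interrupt arriving while S is redirected will also push onto the output area.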

Late addition:

If you're willing to premultiply the index in x by 8, effectively reducing your indexable range to 32 tiles, then you can proceed filling a linear output array without adjusting y, as per:

    LDA TILESET, x
    AND #<value1>
    STA SUPERIMPOSED, x

    LDA TILESET+1, x
    AND #<value2>
    STA SUPERIMPOSED+1, x

    ... etc ...

So you'd still need eight copies of that routine, each with different table base addresses, to be able to hit all 256 output tiles. Supposing you have 20 sprites, that makes a total of 20*8 = 160 copies of your sprite plotting routine, each of which is likely to be of the order of 100 bytes, so you're spending about 16 KB.

If your game is much heavier on one kind of sprite than on others — e.g. it's usually two or three spaceships shooting thousands of bullets at each other — then obviously you can optimise very selectively and keep that total footprint down.
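
For completeness, the premultiplied index mentioned in the late addition could be prepared along these lines before calling one of those routines. TILE_INDEX is a hypothetical zero-page variable holding a tile number from 0 to 31, and its address here is only an example.

```asm
TILE_INDEX = $fb        ; hypothetical zero-page location

        lda TILE_INDEX  ; 3   tile number, 0..31
        asl             ; 2   * 2
        asl             ; 2   * 4
        asl             ; 2   * 8, still fits in 8 bits for 0..31
        tax             ; 2   x = byte offset of the tile's first row
```

That is 11 cycles of setup per plot. Some assemblers want the accumulator form written as `asl a`.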

Personalty answered 27/9, 2015 at 13:24 Comment(0)
