Exercise in self modifying memory copy routine, 6502 ASM

Below is my self modifying routine for memory copy on Commodore 64.

I wrote char codes and repeat counts in a table and filled the screen RAM with this routine.

I'm looking for suggestions for optimization. My priority is memory in this case.

memCopy:    
  sourceAddress=*+1 ; mark self-modifying address
fetchNewData:
  lda data_table    ; read char value into A
  ldx data_table+1  ; read repeat value into x
  inc sourceAddress     
  inc sourceAddress 

  cpx #00           ; if X=0 
  beq end           ; finish copying

  destination=*+1
- sta SCREEN_RAM
  inc destination
  dex
  bne -

  jmp fetchNewData

end:
  rts   

; data format:  <char>,<number of repeats>,[<char>,<number of repeats>,...],00,00

data_table: 
!by 01,03,02,02,......,00,00
Larcenous answered 10/2, 2016 at 13:36 Comment(3)
Upvote for asking about optimizing C64-based 6502 assembly in 2016. Cheers. Made my day to read about it. - Patisserie
I see a flaw in the whole idea of incrementing the low byte of the instruction's address while ignoring the high byte. This way you can fill only 256 bytes of SCREEN_RAM, and only if it is 256-byte aligned at the start. - Thundercloud
Hi wizofwor, you might be interested to know that on Area 51 we've got a proposal going specifically for Retro Computing: area51.stackexchange.com/proposals/94441/… - Nagging

A correct increment of the instruction's address should look like this:

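address=*+1                     ; low byte of the LDA operand below
    lda self_modifying_address
    inc address+0               ; increment low byte of the operand
    bne *+5                     ; no wrap to $00: skip the 3-byte INC below
    inc address+1               ; low byte wrapped: carry into high byte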

This probably negates all the memory savings of the self-modifying code.
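
As a sketch of what this would mean for the fill loop in the question (reusing its destination label and SCREEN_RAM symbol, ACME syntax, untested on real hardware), the store address could be incremented as a full 16-bit value:

```asm
  destination=*+1       ; low byte of the STA operand below
- sta SCREEN_RAM        ; store char at the current screen address
  inc destination       ; bump low byte of the operand
  bne +                 ; low byte did not wrap to $00: high byte is fine
  inc destination+1     ; low byte wrapped $FF -> $00: carry into high byte
+ dex                   ; one repeat done
  bne -                 ; loop until X reaches 0
```

The extra bne and inc cost 5 bytes (2 + 3), which is the price of being able to cross a 256-byte page boundary.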

I suggest another approach that self-modifies instruction addresses only where absolutely necessary and also stores variables in the instruction operands.

.loop
fetch_ptr=*+1
    ldx #0
    lda filler_bytes,x ;have two tables, first contains only filler bytes,
    ldy repeat_bytes,x ;second only repeat counts
    beq .exit
    inc fetch_ptr      ;this way you save 1 increment

fill_ptr=*+1
    ldx #0
.fill
    sta SCREEN_RAM,x
    inx
    bne +
    inc .fill+2 ;only self-modify high byte of address in the instruction
+   dey
    bne .fill

    stx fill_ptr

    jmp .loop
.exit
    rts


filler_bytes !byte 1,2,3,4,5,4,3,2,1
repeat_bytes !byte 4,4,5,5,6,6,5,5,4,0
Thundercloud answered 11/2, 2016 at 5:32 Comment(2)
I didn't quite understand the inc address+0, inc address+1 thing. Is this to handle overflow? - Larcenous
Exactly. The address here is a 16-bit (2-byte) value stored as consecutive bytes in little-endian order: first the low part of the address (address+0), then the high part (address+1). Therefore you increment a 16-bit value like this: inc low_part : bne *+5 : inc high_part. - Thundercloud

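Replace JMP fetchNewData with BEQ fetchNewData (the zero flag is always set when the inner loop exits). Move the INC sourceAddress instructions after BEQ end; the CPX #0 after LDX is then no longer needed, because LDX already sets the zero flag. That saves 3 bytes.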

Serotine answered 10/2, 2016 at 13:40 Comment(0)

In addition to i486's suggestions, if data_table is limited to 128 char/repeat pairs (including the terminating 0,0), you can save a couple of bytes (and lots of cycles) by avoiding the self-modifying LDA and using the Y register as an index instead.

I've shown all that below. You could also save another byte (remove one INY) by putting the data_table values into two separate tables.

Alternatively you might be able to use Y to index SCREEN_RAM, but I'm not a C64 guy ...

  ldy #0
fetchNewData:
  lda data_table,y  ; read char value into A
  iny               ; [could remove if two tables]
  ldx data_table,y  ; read repeat value into x
  beq end           ; finish copying [x=0]
  iny

  destination=*+1
- sta SCREEN_RAM
  inc destination
  dex
  bne -
  beq fetchNewData

end:
  rts   

; data format:  <char>,<number of repeats>,[<char>,<number of repeats>,...],00,00

data_table: 
!by 01,03,02,02,......,00,00
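
The two-table variant mentioned above might look like the following sketch. The names char_table and count_table and the sample data are illustrative assumptions, not from the question. Like the code above, it only increments the low byte of the store address, so it stays within one 256-byte page.

```asm
  ldy #0
fetchNewData:
  lda char_table,y    ; read char value into A
  ldx count_table,y   ; read repeat value into X (sets Z when it is 0)
  beq end             ; a zero count terminates the data
  iny                 ; a single INY now advances both tables

  destination=*+1
- sta SCREEN_RAM
  inc destination
  dex
  bne -
  beq fetchNewData    ; always taken: Z is set when DEX reaches 0

end:
  rts

; hypothetical sample data: 3 x char 1, then 2 x char 2
char_table:
!by 01,02
count_table:
!by 03,02,00          ; trailing 00 is the terminator
```

Only count_table needs a terminator here, and since Y indexes both tables directly, they can describe up to 255 runs plus the terminating zero count.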
Chavira answered 11/2, 2016 at 2:13 Comment(0)
