Sidreloc
Voyage, travel, and change of place impart vigor.
— Seneca the Younger, ca 4 BC – 65 AD
Sidreloc is a tool for relocating SID tunes any number of whole pages, using a novel but remarkably simple algorithm. It can also relocate all zero-page variables used by the tune.
Sidreloc is capable of relocating 91% of HVSC #56 out of the box. Many of the remaining tunes can be relocated by tweaking the settings to fit them.
Download
- sidreloc-1.0 (Source code, tar/gzip, 19.6 kB)
Here's the manual page for sidreloc.
While the tool is SID specific, the fundamental algorithm could easily be adapted to relocate all sorts of 6510/6502 machine code that meets a couple of requirements (see "Observations" below). Sidreloc is primarily a linux program, but it's open source (MIT license) and you are encouraged to port it to different operating systems. Good patches will be merged into the official version.
Background
SID tunes are music files for the Commodore 64 home computer. They are typically 3-4 KB large, and consist of a small header followed by executable 6510 machine code. The header contains a 16-bit load address; the code must be loaded at this address, and then executed. This can be done in an emulator running on a modern computer (e.g. sidplay) or on a real C64.
Of the over 40000 SID tunes found in the excellent HVSC, many have been ripped from games or demos, and the original source code is long lost. Some tunes were written in assembly language, and then linked to run at a particular address. Others were programmed in machine code monitors, where addresses are hardcoded from the start. Yet others were made using interactive tools such as trackers, but then exported into executable code linked to run at some given load address.
The following charts show a breakdown of all SID tunes in the HVSC based on load address and size.
(Note: BASIC tunes have been omitted here and in the following.)
Unfortunately, it is sometimes difficult to combine an existing SID tune with other code, such as accompanying a new demo effect with a historical SID tune. Demo effects often rely on bank switching and other address-related trickery, and need to occupy certain memory regions. On the other hand, SID tunes are just programs, and should be able to execute from any place, if it weren't for the irritating little fact that they're already linked.
The problem
Once linked, it is generally very difficult to relocate (move) a piece of machine code. This is especially true for 6510/6502 code. To illustrate, here are two small routines that implement the same functionality: Copying a 2 KB chunk of memory from the current program into the area $3800-$3fff. This is something you might do in order to set up a font for a C64 program.
Self-modifying version:
* = $1000 ; This informs the assembler that we'll be loaded at $1000 ldy #8 ; 8 iterations of the outer loop ldx #0 ; 256 iterations of the inner loop loop lda font,x ; Copy one byte... sta $3800,x ; ...into the right place. inx bne loop dey beq done inc loop+2 ; For each iteration of the outer loop, modify the inc loop+5 ; load and store addresses. jmp loop done rts font ; 2 kb of data follows...
If you are unfamiliar with 6510 code, this might seem utterly terrifying, but it's standard practice. The index registers (X and Y) are 8 bits wide, and there's no obvious way to loop efficiently over a large array of bytes without resorting to self-modifying code.
This is what we get if we run the code through an assembler and a linker, and then a disassembler:
1000 a0 08 ldy #$08 1002 a2 00 ldx #$00 1004 bd 1a 10 lda $101a,x 1007 9d 00 38 sta $3800,x 100a e8 inx 100b d0 f7 bne $1004 100d 88 dey 100e f0 09 beq $1019 1010 ee 06 10 inc $1006 1013 ee 09 10 inc $1009 1016 4c 04 10 jmp $1004 1019 60 rts 101a ...
The branches are relative, but the jmp uses an absolute address which depends on where in memory the program is located. To relocate this code, we must change the operand of the jmp instruction. When relocating a whole number of pages, we only need to change the most significant byte (MSB) of the jmp address, i.e. the $10 byte at $1018. For similar reasons, we need to modify the bytes at $1006, $1012 and $1015. However, we should not change the byte at $1009, because the font should still be copied into $3800.
Note, by the way, that all the changed bytes happen to have the value $10. This is because all of the code fits into one page of memory. If the code were bigger, the font data would be pushed into the next page, and the byte at address $1006 would be $11, but we'd still need to change it.
So far, all of the addressing modes were based on literal 16-bit addresses. But now consider the following code:
Explicit pointer version:
* = $1000 lda #<font ; a0/a1 keep track of the reading position. sta $a0 lda #>font sta $a1 lda #$00 ; a2/a3 keep track of the writing position. sta $a2 lda #$38 sta $a3 ldy #0 ; Y will remain zero throughout the routine. loop lda ($a0),y ; Copy a byte. sta ($a2),y lda $a0 ; Increment the reading pointer. clc adc #$01 sta $a0 lda $a1 adc #$00 sta $a1 lda #$01 ; Increment the writing pointer. clc adc $a2 sta $a2 lda #$00 adc $a3 sta $a3 lda $a0 ; Check if we're done. cmp #<fontend bne loop lda $a1 cmp #>fontend bne loop rts font ; 2 kb of data follows... fontend
This routine is not particularly fast, but at least it's not self-modifying. However, it turns out to be much more tricky to relocate.
This is what the disassembled code looks like:
1000 a9 3d lda #$3d 1002 85 a0 sta $a0 1004 a9 10 lda #$10 1006 85 a1 sta $a1 1008 a9 00 lda #$00 100a 85 a2 sta $a2 100c a9 38 lda #$38 100e 85 a3 sta $a3 1010 a0 00 ldy #$00 1012 b1 a0 lda ($a0),y 1014 91 a2 sta ($a2),y 1016 a5 a0 lda $a0 1018 18 clc 1019 69 01 adc #$01 101b 85 a0 sta $a0 101d a5 a1 lda $a1 101f 69 00 adc #$00 1021 85 a1 sta $a1 1023 a9 01 lda #$01 1025 18 clc 1026 65 a2 adc $a2 1028 85 a2 sta $a2 102a a9 00 lda #$00 102c 65 a3 adc $a3 102e 85 a3 sta $a3 1030 a5 a0 lda $a0 1032 c9 3d cmp #$3d 1034 d0 dc bne l1012 1036 a5 a1 lda $a1 1038 c9 18 cmp #$18 103a d0 d6 bne l1012 103c 60 rts 103d ...
To relocate this, our algorithm needs to be able to figure out that the operand of the second lda (the byte at $1005) should be relocated, because it will be stored into $a1 and later end up as the MSB of an address, as part of an indirect addressing mode. But indirect addressing could just as well be performed on e.g. addresses that have been computed as the sum of a relocatable base address and a non-relocatable offset, as in the subsequent iterations of the loop. Furthermore, the operand of one of the cmp instructions (the $18 byte at $1039) must also be relocated, otherwise the loop won't terminate properly.
This second example also illustrates another peculiarity of 6510 code: Zero-page addresses are not only fast, but also required for indirect addressing. Hence SID tunes tend to use at least a couple of them. On the C64, the four bytes at $fb-$fe are documented as "Free Zero Page space for User Programs", so they get used a lot, increasing the probability of a collision if we try to combine an existing SID tune with some other code.
Thus we also need to be able to relocate zero-page accesses. Like regular pointers, zero-page addresses may be computed and stored into future instruction operands as part of self-modifying code.
So far, we have seen that pointers may appear inside 16-bit operands, but also in separate 8-bit operands spread out all over the code. Finally, pointers may appear in data. SID tunes typically contain tables of pointers to data structures called patterns. These pointers are either stored as regular words in little-endian order, or split into one LSB table and one MSB table.
Observations
The approach taken in sidreloc hinges on the following crucial observations:
SID tunes have a well-defined API: We're supposed to load the tune into memory, call an initialisation routine once, then call a playroutine regularly at some rate (e.g. the video frame rate). Some SID tunes also install a secondary timer interrupt which is called very rapidly in order to play samples ("digis").
All of this can be emulated, and we can make reasonable assumptions about how many calls we need to make to these routines in order to exercise all of the code.
In practice, it is enough to be able to relocate the tune a whole number of pages in either direction. Thus we only need to be concerned with address MSBs.
The MSB of almost every pointer that is computed in a 6510 program is the sum of:
- one or more program bytes (bytes that appear verbatim somewhere in the linked machine code), of which exactly one is the MSB of an address, and
- possibly some offset that is independent of the program location (such as a computed index into a table).
The output of a relocation algorithm is a set of relocation flags, one for each program byte, expressing whether or not the relocation offset (in pages) should be added to this particular byte.
(The same principle applies to zero-page relocation, but some extra book-keeping is needed; check the source code for details.)
It is not possible to write a perfect algorithm that works for every SID tune, because a playroutine could theoretically e.g. read out the current program counter and store it into a critical SID register like $d404. Being able to relocate most SID tunes is good enough.
We can easily check whether the algorithm worked for a particular SID tune by emulating the original code and the relocated code side by side, and comparing the contents of the SID registers after each call to the playroutine.
The last two observations taken together also imply that we don't have to be conservative in what we do; if we're unsure about something, we make an educated guess that is correct in most cases, and later check if it was true for this particular tune.
The algorithm
The trick is to emulate a 6510 processor and a 64 KB memory, but inside every memory cell (including the registers) we not only track the current 8-bit value, but also the list of program bytes which contributed to this value. Whenever an effective address is computed inside the emulated processor, we check whether this address is in the relocation range (e.g. the extent of the program), and if so, we note down the fact that exactly one of these program bytes must be relocated.
When a byte is copied e.g. from memory into a register, not only is the value copied, but the list of contributing program bytes is copied along with it. When a sum is computed (adc instruction), the result is not only the computed sum, but also the concatenation of the lists of program bytes contributing to both operands. When other computations are performed, the resulting value is associated with an empty list of contributors.
Furthermore, every time a comparison is made (using cmp or eor), we note down that the two lists of program bytes contributing to the operands must have equal relocation status: Either they're both independent of the program location, or they're both dependent of the program location. In the latter case this implies that each list must contain exactly one program byte which is relocated.
Having visited all parts of the code, we thus end up with a collection of facts (equations) relating the relocation flags of program bytes to each other. Then we use a constraint solver to find a solution to this set of equations. Flags of unknown status (corresponding to program bytes which do not appear in any equation) are treated as "don't relocate".
Zero-page
For zero-page relocation, we also need to keep track of which program bytes contribute to which zero-page addresses. We need a bitfield for this, since a program byte often points to more than one zero-page variable: The operand of lda ($a0),y contributes to the effective addresses $a0 and $a1.
We still get equations describing lists of program bytes of which exactly one should be relocated.
Once we have a solution to the set of equations, we visit each program byte that was flagged for relocation. For regular pointer MSBs, we add the main relocation offset. For zero-page addresses, we add the relocation offset for those zero-page addresses to which the program byte contributed. For this to work, some zero-page addresses must of course move together.
Example
When sidreloc is run with the verbose option (-v), it prints a map of all the program bytes. The character R indicates that the relocation offset (difference in pages between the old and new code locations) has been added to this byte. The character Z indicates that some zero-page relocation offset has been added to this byte.
Here's the output you get when relocating Zoids to page $42, with a free zero-page region starting at $80. If you are familiar with Hubbard's playroutine, you'll immediately recognise the pattern pointers MSB table (a large stretch of consecutive R bytes) and the subtune tables (one "===RRR" chunk per subtune, consisting of three track pointers). The first third of the file is code, and the rest is mostly data.
Zoids, Rob Hubbard, 1986 Martech, $1000-$1a7f, 3 subtunes Relocating from $1000-$5aff to $4200-$8cff Analysing subtune 1 Analysing subtune 2 Analysing subtune 3 Program map: 1000, 4200: ==R==R==R==R========R====R==R==R==R=====R==R===?==========?====? 1040, 4240: ==R==R====R====R==R==R==R===R==R====R=Z==R=Z==R====R...==R==R=Z= 1080, 4280: ?===?====R==R====R==R==R==R...===R=Z==R=Z=?==R==R=?==R=Z==R==R=? 10c0, 42c0: ==R=?===R=?===?=====R====R==R====Z====R==R==R==R==Z==R====R==R== 1100, 4300: R==R=====R==R=====R==R==R==R==R==R======R==R==R==R=====R=====R== 1140, 4340: ===R=====R=====R==R==R==R==R=Z=?======R==R==R==R==R=?====R====R= 1180, 4380: ?====?========R======R==R==R==R==R==R==R====R=?=?===?==R==R===== 11c0, 43c0: R==R==R==R==R===R==R====R==R==R==R==R==R=?=?====R======R==R==R== 1200, 4400: R==R==R==R==R==R=====R=====R=?====R==R==R==R==R=====R==R====R=?= 1240, 4440: =R=.==R==R=?==R==R====R===R===R=?=?==?====R==R===R==R===R=?=?==? 1280, 4480: ====R==R==R===R======R=====R==R==R===?==R==R=?=====R==R==R=====R 12c0, 44c0: =?==R=====R===R==R==R=====R=?==R=====R=?====R====R====R=?==?==R= 1300, 4500: =R====R==R=====R=?====R====?=====R=?====R=?======R======R=?====R 1340, 4540: =.==R==R=====R=?====R=?====R==?==R==R====R==R==R==R=====R======= 1380, 4580: =R=........................??............????..????????..????... 13c0, 45c0: .????..????????????????????????????????????????????????????????? 1400, 4600: ?????????????????????????????????????????????............??....? 1440, 4640: ?..===???????????????????????????????????..?..?????????????????? 1480, 4680: ???????????????????????????????????????????????????????????????? 14c0, 46c0: ??????????????????????........????????................??????===R 1500, 4700: RR===RRR===RRR===============================RRRRRRRRRRRRRRRRRRR 1540, 4740: RRRRRRRRRRRR================================================?=== 1580, 4780: ================================================================ 15c0, 47c0: ================?=============================================== 1600, 4800: =====================================?==.==.==?==.====.====????? 1640, 4840: ???????????????????????????????????????????????????????????????? 1680, 4880: ???????????????????????????????????????????????????????????????? 16c0, 48c0: ???????????????????????????????????????????????????????????????? 1700, 4900: ???????????????????????????????????????????????????????????????? 1740, 4940: ???????????????????????????????????????????????????????????????? 1780, 4980: ???????????????????????????????????????????????????????????????? 17c0, 49c0: ???????????????????????????????????????????????????????????????? 1800, 4a00: ???????????????????????????????????????????????????????????????? 1840, 4a40: ???????????????????????????????????????????????????????????????? 1880, 4a80: ???????????????????????????????????????????????????????????????? 18c0, 4ac0: ???????????????????????????????????????????????????????????????? 1900, 4b00: ???????????????????????????????????????????????????????????????? 1940, 4b40: ???????????????????????????????????????????????????????????????? 1980, 4b80: ???????????????????????????????????????????????????????????????? 19c0, 4bc0: ???????????????????????????????????????????????????????????????? 1a00, 4c00: ?????????????????????????????????????????????????=====R====R===R 1a40, 4c40: ==R===?===?=============?====?==R==?==R=........................ MSB relocations (R): 233 Zero-page relocations (Z): 9 Static bytes (=): 969 Status undetermined (?): 1349 Unused bytes (.): 128 Old zero-page addresses: fb fc fd fe New zero-page addresses: 80 81 82 83 Verifying relocated subtune 1 Verifying relocated subtune 2 Verifying relocated subtune 3 Bad pitches: 0, 0% Bad pulse widths: 0, 0% Relocation successful. Largest unused region: $4d00-$9fff
Limitations
There are a couple of situations where the algorithm fails.
Unrelated memory
This concerns code which accesses memory completely unrelated to the program location. For instance, some SID tunes decrunch data into $e000-$ffff. The relocated tune works great, but it will also access this memory; hence, it can't coexist with demo effects or other tunes which happen to use the same area of memory for data. Sidreloc prints warnings about out-of-bounds memory accesses.
In most cases, such a situation can be resolved by relocating in multiple steps. For instance, suppose a SID tune loads into pages $10-$1e, and also makes use of RAM at pages $90-$93. You want to relocate this tune so that it only affects memory pages in the $e0-$ff range. This is done in two steps: First, you specify -r 10-1e -p 81 in order to create an intermediate SID file where the code is located close to the data area. Then, you specify -r 81-93 -p e0 to move both code and data to the desired location.
Compressed code
The algorithm assumes that addresses are computed from raw program bytes, as they appear in the linked code. If code is read from a compressed stream of bits, there is no way we can add an offset to a set of program bytes and somehow end up with a compressed bit stream which happens to decrunch to relocated code.
Subtraction
Subtraction (sbc instruction) can be used in a couple of different ways:
Subtract an offset from an address, producing an address.
Subtract two addresses, producing a difference.
Subtract two numbers, producing a difference.
At the time when we encounter the sbc instruction, we do not know which case to pick. The current implementation makes the assumption that case 2 never occurs (the choice among cases 1 and 3 can be postponed), since this behaviour seems to be good enough in practice.
Data table overflow
Most SID tunes contain a table that maps note numbers to frequencies, for somewhere around 96 notes (8 octaves). Occasionally the playroutine ends up using a note number which is too high. This might be used for sound effects where the pitch doesn't really matter, or during periods when the voice is playing at zero volume anyway. However, if an address is stored after the table, the MSB of this address might get picked up and written to a SID register. When the code is relocated, the pitch changes. For this reason, sidreloc will verify a relocated tune even if some pitches are wrong, but the amount of bad pitches will be reported.
A similar problem concerns pulse widths. Some playroutines store pulse width information in an instrument data structure, and if an invalid instrument is used, a pointer MSB might get picked up and used as a pulse width (or pulse width delta). In most cases this doesn't affect the perceived quality of the music, because if the composer desired a specific pulse-width (e.g. a non-modulating square wave), then it would've been very improbable for him or her to accidentally achieve this with an invalid instrument number.
Often, pulse width is treated as an integrator, meaning that a delta value is added in each call to the playroutine. When relocation results in a different delta value, this will make all future pulse width values wrong. Hence, the reported amount of bad pulse width values is often very high, even though the music sounds just like the original.
No playroutine
Some SID tunes take over the entire C64, and never return from the initialisation routine. These tunes tend to use up all CPU cycles and nearly all RAM. It should be possible to apply the relocation algorithm here too, by extending the CPU emulation to also include the CIA timers and the raster position register ($d012), and sample the SID registers at some reasonable cycle-interval. However, that would be of limited practical use, because such extreme SID tunes would be very hard to combine with other code anyway.
Evaluation
A script was used to run sidreloc against all (non-BASIC) tunes in HVSC #56, with all printouts redirected to a log file. Other scripts were then used to compile statistics from the log file.
Hints
If you're trying to relocate a SID tune, and the tool fails with the default settings, here are some things you could try:
If the verification fails with a lot of bad pitches or pulse widths, try the -f option to force sidreloc to output the file anyway. In many cases there is no audible difference.
If you get warnings about memory accesses outside the relocation range, but very close to it, then you should extend the relocation range using the -r option.
Try the -k option if you can live without zero-page relocation. This reduces the number of equations, potentially removing the one which was causing trouble.
If -k works, but you need to relocate some of the zero-page accesses, then you might get away with adjusting the range of free zero-page addresses (using the -z option) so that some of the addresses end up at their original location, while others are moved. In particular, avoid relocating zero-page addresses which could also be interpreted as single-byte opcodes (e.g. $a8 and $c8).
Posted Monday 28-May-2012 08:13
Discuss this page
Disclaimer: I am not responsible for what people (other than myself) write in the forums. Please report any abuse, such as insults, slander, spam and illegal material, and I will take appropriate actions. Don't feed the trolls.
Jag tar inget ansvar för det som skrivs i forumet, förutom mina egna inlägg. Vänligen rapportera alla inlägg som bryter mot reglerna, så ska jag se vad jag kan göra. Som regelbrott räknas till exempel förolämpningar, förtal, spam och olagligt material. Mata inte trålarna.
Mon 28-May-2012 14:42
Mon 4-Jun-2012 17:14
However, I'm trying to relocate a song from $1000 to $c000. It's only like 9 pages but the tool bugs out on me:
sidreloc -p c0 -v tune_at_1000.sid tune_at_c000.sid
blablabla, $1000-$190b, 1 subtunes
Relocating from $1000-$59ff to $c000-$09ff
sidreloc: Neither the source nor the destination relocation range may overlap with the zero-page.
In the source I spotted the snippet below, but I don't get why you always add 64 pages. Some kind of safety zone?
for(i = 0; i < 64 && reloc_end != 0xcfff && reloc_end != 0xffff; i++) {
reloc_end += 0x100;
}
Regards
P-a Bäckström
Mon 4-Jun-2012 17:17
More regards
P-a Bäckström
Tue 5-Jun-2012 20:32
The extended end address poses a few problems though. The extended area isn't allowed to overlap zp so relocating to high addresses (under kernal) usually fails.
Note also that your particular example of Zoids is already relocated in HVSC (as are probably many others). The original player is at $c500.
/tlr
Tue 5-Jun-2012 20:34
/tlr
Fri 26-Oct-2012 23:03
Wed 8-May-2013 18:40
I mean, most of the $1000-$1003 tunes were made in a limited set of music programs, were they not? And $1800 tunes, there is only one culprit, Future Composer, whose format should be well understood, well enough to construct one's own relocatable replay routine (if using Future Composer's own relocation capabilities is not desired). That should cover 90% of tunes out there, should it not?
Wed 8-May-2013 18:44
So maybe a little "shim" routine which would act as the runtime linker for dynamic objects, ld.so.1?
Hmmm, that might be a cool next project for me to implement, hmmmmm...
Tue 24-Nov-2015 02:05
Sun 21-May-2017 02:58
Stranger in a Strange Land. 2800-3985, inic 2848, play 2821, Player routine: Music Assembler (Voice Tracker 4), 4486 byte. sidreloc -p 60 -r 28-39 -k -v, relocation failed, too many mismatching pitches, bad pitches 11635, 4%, zeropage 250-255.
Sun 21-May-2017 03:16
Linus Åkesson
Mon 29-May-2017 14:32
In short, because it compares the register values, not the sound. I haven't looked into this case in particular, but if the tune writes location-dependent values into the pitch registers when a voice has already faded to silence, the above would be the result.
Why would a tune write location-dependent values into the pitch registers? I have seen this happen when e.g. a drum sound contains an out-of-bounds reference past the end of an arpeggio table, reading instead from an unrelated table of pointers that happens to be located after it. As the tune is relocated, the pointers change, and different pitch values get written.
Tue 30-May-2017 15:51
lft wrote:
In short, because it compares the register values, not the sound. I haven't looked into this case in particular, but if the tune writes location-dependent values into the pitch registers when a voice has already faded to silence, the above would be the result.
Why would a tune write location-dependent values into the pitch registers? I have seen this happen when e.g. a drum sound contains an out-of-bounds reference past the end of an arpeggio table, reading instead from an unrelated table of pointers that happens to be located after it. As the tune is relocated, the pointers change, and different pitch values get written.
When do you fix the relocator program?
The program works uncertainly. Error indicate, but no real error, etc.
Tue 30-May-2017 17:06
The program works uncertainly. Error indicate, but no real error, etc.
I don't regard this as an error. It is useful to know when different register values are written, and one can easily override this behaviour with -f, like you did.
That being said, the program is open source, so you can easily amend or modify it yourself.
Thu 18-Apr-2019 11:13
Tue 17-Mar-2020 22:02
Thu 26-Jan-2023 18:59
Someone? Anyone? :) +1
Tue 9-Jul-2024 03:56