How does it work?
The SID theme search engine consists of a large database of themes. The database was automatically generated from the HVSC (High Voltage SID Collection); this took several days.
This is the general idea of the melody extraction program: a 6510 emulator is invoked once for each SID, subtune and voice. The init routine is called with the subtune number, and then the play routine is called 70000 times (allowing a maximum song length of approximately 23 minutes for a 50 Hz SID). If the play routine has not returned after 200000 CPU cycles, the emulation stops. Every time the play routine returns, the control register of the current voice is examined. If the gate bit was turned off during this call to the play routine, the 16-bit value in the frequency control register is converted into a note name and emitted.
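The per-voice step described above can be sketched roughly as follows. This is not the original extraction program: the PAL clock value, the nearest-semitone rounding, the lowercase note spelling and the function names are all assumptions made for illustration. The only parts taken from the SID chip itself are the gate bit (bit 0 of the control register) and the relation between the frequency register and the output frequency.

```python
import math

PAL_CLOCK_HZ = 985248   # assumption: PAL C64 system clock
GATE_BIT = 0x01         # bit 0 of a SID voice control register
NOTE_NAMES = ["c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b"]


def freq_reg_to_note(reg):
    """Map a 16-bit frequency register value to the nearest note name.

    Octaves are discarded here; that is an assumption of this sketch.
    """
    if reg <= 0:
        return None
    hz = reg * PAL_CLOCK_HZ / 16777216           # SID: Fout = Fn * Fclk / 2^24
    midi = round(69 + 12 * math.log2(hz / 440))  # 69 is the A at 440 Hz
    return NOTE_NAMES[midi % 12]


def extract_notes(frames):
    """frames: one (control, frequency) register pair per play-routine call.

    A note is emitted whenever the gate bit goes from set to clear
    between two consecutive calls.
    """
    notes = []
    prev_ctrl = 0
    for ctrl, freq in frames:
        if (prev_ctrl & GATE_BIT) and not (ctrl & GATE_BIT):
            note = freq_reg_to_note(freq)
            if note is not None:
                notes.append(note)
        prev_ctrl = ctrl
    return notes


# The 23-minute figure: 70000 calls at 50 calls per second.
max_minutes = 70000 / 50 / 60
```

For example, a voice that holds one frequency with the gate on and then clears the gate yields exactly one note, however many play calls the note lasts.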
This way we obtain three sequences of note names for each SID and subtune. But this database turns out to be huge (207 MB). Therefore, each of the note sequences is scanned for repetition. If the SID appears to restart (from the very beginning, or from some other place), then all iterations except the first one are removed from the database. Additionally, if a voice plays no more than 4 notes (e.g. if we're looking at a sound effect subtune), then those notes are removed from the database as well; it's not like anyone's going to try to find them. =) After these optimizations, the database is about 32 MB.
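One way to picture the two pruning steps is the brute-force sketch below. It is only an illustration under assumptions: the actual repetition scan is not described in detail, the names strip_repeats and prune_voice are made up, and the min_period parameter (which keeps a run of one held note from counting as a "loop") is an invented knob. The search is far too slow for a whole collection, but it shows the idea: find the earliest point from which the rest of the sequence repeats with a fixed period, and keep only the first iteration.

```python
def strip_repeats(notes, min_period=4):
    """Keep the intro plus the first iteration of a trailing loop.

    A loop is accepted only if at least two full iterations are present.
    """
    n = len(notes)
    for start in range(n):
        for period in range(min_period, (n - start) // 2 + 1):
            # Periodic from `start` on: every note equals the one `period` later.
            if all(notes[i] == notes[i + period] for i in range(start, n - period)):
                return notes[:start + period]
    return notes


def prune_voice(notes, min_notes=5):
    """Drop voices with 4 notes or fewer, then remove repeated iterations."""
    if len(notes) < min_notes:
        return []
    return strip_repeats(notes)
```

A tune that restarts from the very beginning is the special case where the loop starts at position 0; a tune with an intro followed by a loop keeps the intro and one pass through the loop.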
Note that my emulation environment isn't perfect; the illegal opcodes aren't implemented (that's the easy part), and several SIDs don't work at all for various mysterious reasons. The database would have been a couple of megs bigger if all SIDs had been correctly emulated.
Each search query is transposed into all twelve keys and converted to a regular expression. The actual pattern matching is then handled by a Postgres server.
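A minimal sketch of the transposition step, assuming that queries and stored sequences are plain strings of lowercase note letters with an optional "#" for sharps and no octave information. That format, the function name query_to_regex, and the use of Python's re module in place of the Postgres server are all assumptions; wildcard handling is left out because its syntax isn't described here.

```python
import re

NOTE_NAMES = ["c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b"]
# Try the sharp spellings first so "c#" is read as one note, not "c" then "#".
TOKEN = re.compile(r"c#|d#|f#|g#|a#|[a-g]")


def query_to_regex(query):
    """Build one pattern that matches the query in any of the twelve keys."""
    tokens = TOKEN.findall(query.lower())
    alternatives = []
    for shift in range(12):
        shifted = [NOTE_NAMES[(NOTE_NAMES.index(t) + shift) % 12] for t in tokens]
        # A natural note must not be followed by "#", otherwise a query "c"
        # would also match the first character of a stored "c#".
        parts = [n if n.endswith("#") else n + "(?!#)" for n in shifted]
        alternatives.append("".join(parts))
    return "(?:" + "|".join(alternatives) + ")"


pattern = query_to_regex("cde")
found = re.search(pattern, "xxgabyy") is not None  # "gab" is "cde" moved up 7 semitones
```

Because all twelve transpositions go into one alternation, the database only has to be scanned once per query rather than once per key.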
Here are some general tips and tricks for searching the SID database:
- A glissando is usually considered to be one note, because a note is only emitted when the gate bit is reset. Try searching for the first note in the glissando, or the last one. Wildcards will also work, of course.
- Try removing the first note. ("defgfgeedefefg" won't match the Parallax melody, for instance, but "efgfgeedefefg" will. Interesting...)
- Sometimes a melody is split into two voices, each voice playing every other note. Try searching for every other note in your theme. (For instance, you might consider "daeafaeadagafaead" to be the initial theme of Giana Sisters, subtune 5, but it will only match a couple of remixes, and not Hülsbeck's SID. "defedgfed", on the other hand, will find it.)
- Arpeggios may match if you're lucky, but in most cases they won't. The apparent melody of Spellbound, for instance, is too deeply embedded in the arpeggios to be searchable.
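Two of the tips above (dropping the first note, and taking every other note) are mechanical enough to sketch in a few lines. The helper below is hypothetical and not part of the search engine; it assumes a query made of single-letter note names without sharps, like the examples in the tips.

```python
def query_variants(notes):
    """Return alternative queries worth trying when the original finds nothing."""
    return {
        "original": notes,
        "without_first": notes[1:],  # the first note may be missing in the database
        "even_notes": notes[0::2],   # melody split across two voices: 1st, 3rd, 5th, ...
        "odd_notes": notes[1::2],    # ... and the other voice: 2nd, 4th, 6th, ...
    }


giana = query_variants("daeafaeadagafaead")
parallax = query_variants("defgfgeedefefg")
```

Here giana["even_notes"] is "defedgfed" and parallax["without_first"] is "efgfgeedefefg", the two queries that the tips report as successful.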