A 9-bit pitch technique

This article describes a new approach to representing pitch in C64 playroutines. The intended audience is people with a general understanding of C64 coding and SID playroutine design, although readers outside this somewhat narrow demographic may still find the article interesting.

Background

The SID chip in the C64 has, for each of the three voices, a 16-bit register that controls the oscillator frequency. This 16-bit value gets added into a 24-bit accumulator on every clock cycle, and the top bits of the accumulator are used as an index into the current waveform.
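
The relationship between a register value and the resulting frequency can be sketched in C as follows. This is only an illustration: the function names are made up for this example, and the PAL clock rate is the one used in the listings in this article.

```c
#include <stdint.h>

#define SID_CLOCK_HZ 985248.0   /* PAL clock rate, as used in this article */

/* Output frequency in Hz for a 16-bit frequency register value.
   The value is added to a 24-bit accumulator once per clock cycle,
   so one waveform period takes 2^24 / reg cycles. */
double sid_reg_to_hz(uint16_t reg) {
        return reg * SID_CLOCK_HZ / 16777216.0;
}

/* Nearest register value for a desired frequency in Hz (hypothetical helper). */
uint16_t sid_hz_to_reg(double hz) {
        return (uint16_t)(hz / SID_CLOCK_HZ * 16777216.0 + 0.5);
}
```

With these numbers, one register step is about 0.06 Hz, and the highest register value corresponds to just under 3.85 kHz.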

Traditionally, for speed and simplicity, playroutines work with integer notes, for instance in the range 0-95. A table is then used to translate each note into its corresponding 16-bit frequency value.

Such a lookup table can be prepared as follows, here with A440 at note number 60:

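#include <math.h>

#define SID_CLOCK_HZ 985248

int freq[96];   /* 16-bit SID frequency value for each note */

void compute_freq() {
        int i;
        double semitone = pow(2, 1.0/12);       /* frequency ratio of one semitone */

        for(i = 0; i < 96; i++) {
                /* note 60 is A440; scale from Hz to the 24-bit accumulator range */
                freq[i] = round(440 * pow(semitone, i - 60) / SID_CLOCK_HZ * 0x1000000);
        }
}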

After the lookup, effects such as vibrato and glide are computed by adding a 16-bit offset to the frequency value. This approach is problematic, because the glide or vibrato will then affect the perceived pitch differently in different parts of the frequency range. Composers have to compensate for this, e.g. by selecting a greater vibrato depth for high-pitched notes.

In particular, suppose you have a sound patch (or instrument) that involves a rapid arpeggio between two notes that are an octave apart. You are currently playing a sustained sound that arpeggiates between C-4 and C-5, and you wish to glide an octave down, so as to end up with a sound that arpeggiates between C-3 and C-4. This will not work using the above approach, because the same glide offset would have to be added to (or subtracted from, as it were) the frequency of each note. C-5 has twice the frequency of C-4, so once you reach the point where the upper note has become a C-4, the lower note will be at 0 Hz.

To get rid of these problems, we could instead perform all computations on linear pitch values, perhaps using 8.8 fixed-point math. In this model, effects are applied to the 16-bit pitch value, and in a final step the pitch is converted to the corresponding frequency by looking up—in the same table as above—the values corresponding to the closest note below and above the desired pitch, and then performing a linear interpolation between the two frequencies based on the fractional part of the pitch. This creates a piece-wise linear approximation of an exponential curve, which coincides with the ideal curve at the semitones.

However, since the CPU used in the C64 lacks a multiplication instruction, the interpolation step described above is costly in terms of clock cycles.

This article describes a novel way of working with pitch values that is highly efficient in practice.

Pitch as a 9-bit quantity

Let me start with a somewhat controversial proposition:

Two bits of fractional pitch are sufficient.

That is to say, we can represent the usual range of 96 semitones with a fixed-point number in 7.2-bit form or, equivalently, a 9-bit integer expressing the pitch in microtones, where four microtones correspond to one semitone.
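
In C terms, the representation might be modelled like this. The helper names are hypothetical and only serve to show how the semitone and the two fractional bits share one 9-bit number.

```c
#include <stdint.h>

#define MICROTONES_PER_SEMITONE 4

/* Pack a semitone (0-95) and a quarter-semitone fraction (0-3)
   into a 9-bit pitch value (0-383). */
uint16_t make_pitch(uint8_t semitone, uint8_t quarter) {
        return (uint16_t)(semitone * MICROTONES_PER_SEMITONE + quarter);
}

/* The upper seven bits are the semitone, the lower two the fraction. */
uint8_t pitch_semitone(uint16_t pitch) { return (uint8_t)(pitch >> 2); }
uint8_t pitch_quarter(uint16_t pitch)  { return (uint8_t)(pitch & 3); }
```

One microtone is then a quarter of a semitone, i.e. 25 cents.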

The claim is controversial, because if we listen closely to a slow glide over a large interval, let's say an octave, the quantisation is audible. But in my experience, enough precision is retained for glide and vibrato effects to sound good. You can evaluate this yourself, because the technique is used in Scene Spirit. Hopefully you'll agree that the glides and vibratos sound quite normal.

The next step is to generate a large table that supplies a frequency value for each of the 96 * 4 = 384 microtones. Values from this table are then copied verbatim into the SID registers.

The code for precomputing the table might look like this:

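#include <math.h>

#define SID_CLOCK_HZ 985248

int freq[96 * 4];       /* one entry per microtone */

void compute_freq() {
        int i;
        double microtone = pow(2, 1.0/48);      /* frequency ratio of a quarter semitone */

        for(i = 0; i < 96 * 4; i++) {
                /* A440 is at semitone 60, i.e. microtone 60 * 4 */
                freq[i] = round(440 * pow(microtone, i - 60 * 4) / SID_CLOCK_HZ * 0x1000000);
        }
}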

But wait a minute. How can 9-bit math be efficient on an 8-bit processor?

Here is another proposition:

Pitch may be conveniently represented as a sum.

This is less controversial. For instance, a playroutine might compute pitch as the sum of the current note, track transpose, instrument transpose, detune, arpeggio, vibrato and glide. We can group these terms into two 8-bit quantities, which are added together at the very last step to produce a 9-bit sum.

Each subsum is of course limited to an 8-bit range (64 semitones). The question is then, how can we expose this constraint in a natural way to the composer?
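
The reason this is cheap on an 8-bit CPU is that the ninth bit of the sum simply ends up in the carry flag. Here is a small C model of that, assuming the carry is clear before the addition; the type and function names are invented for the illustration.

```c
#include <stdint.h>

/* Result of an 8-bit addition as the 6502 sees it. */
typedef struct {
        uint8_t a;      /* low eight bits, left in the accumulator */
        uint8_t carry;  /* ninth bit, left in the carry flag */
} add_result;

/* Model of clc + adc on two 8-bit microtone quantities. */
add_result add8(uint8_t base, uint8_t offset) {
        uint16_t sum = (uint16_t)(base + offset);
        add_result r = { (uint8_t)(sum & 0xff), (uint8_t)(sum >> 8) };
        return r;
}

/* Reassemble the 9-bit pitch from accumulator and carry. */
uint16_t pitch9(add_result r) {
        return (uint16_t)((r.carry << 8) | r.a);
}
```

For example, adding $f0 and $70 leaves $60 in the accumulator with the carry set, which together represent the 9-bit value $160.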

A table of pitch offsets

In the playroutine and tracker that I developed for Scene Spirit, there is a notion of a current instrument and a current arpeggio for each voice. The instrument is responsible for maintaining the current base pitch, which is one of these 8-bit quantities. For leads, it is computed relative to the note being played, and for e.g. some drums it is rapidly decremented from a high to a low value, producing a glide effect. Independently, the current arpeggio provides the offset pitch, which is the other 8-bit quantity. The offset pitch is looked up in a global offset table based on the current arpeggio position for the voice.

The offset table is quite versatile: The default arpeggio consists of a fixed value of $70. This effectively selects a window of pitches (semitones 28-91) that are available for use by tracks and instruments. Whenever the bass instrument plays, an arpeggio with a fixed value of $10 is selected instead, moving the window two octaves down, effectively reconfiguring the voice to use another set of pitches (semitones 4-67).
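
The window arithmetic can be checked with a few lines of C. The function name is made up for this sketch, and it assumes that the base pitch may take any 8-bit value.

```c
#include <stdint.h>

/* First and last semitone reachable by an 8-bit base pitch when the
   offset pitch is held at a fixed number of microtones. */
void pitch_window(uint8_t offset, int *first, int *last) {
        *first = offset / 4;            /* base pitch = 0 */
        *last = (offset + 255) / 4;     /* base pitch = 255 */
}
```

With an offset of $70 this gives semitones 28-91, and with $10 it gives semitones 4-67.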

To create a major chord, we would place the following sequence of offsets in the table: $70, $5c, $50, $40. The playroutine would then move the current arpeggio position through each of these table entries in turn.

And since we are working with microtones, a vibrato can be regarded as a special kind of arpeggio. The offset sequence would then be something like $70, $71, $72, $72, $71, $70, $6f, $6e, $6e, $6f.

Finally, the playroutine supports one-shot arpeggios (meaning that they stop at the final value, rather than repeat), so we can also use the offset table to create short glides up to a target note. For instance, to approach a note from one semitone below, we'd just select a one-shot arpeggio with the values $6c, $6d, $6e, $6f, $70.
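
To illustrate the difference between looping and one-shot arpeggios, here is a minimal C sketch. It is not how the actual playroutine is organised: the struct and its fields are assumptions made for the example, whereas the real thing uses a global offset table and a per-voice position.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
        const uint8_t *offsets; /* offset pitches in microtones */
        size_t length;
        int one_shot;           /* nonzero: hold the final value instead of looping */
        size_t pos;             /* current arpeggio position */
} arpeggio;

/* Return the current offset pitch and advance the position by one step. */
uint8_t arp_step(arpeggio *arp) {
        uint8_t value = arp->offsets[arp->pos];

        if (arp->pos + 1 < arp->length)
                arp->pos++;
        else if (!arp->one_shot)
                arp->pos = 0;   /* looping arpeggio: wrap around */
        return value;
}
```

Stepping through the major chord $70, $5c, $50, $40 as a looping arpeggio wraps back to $70 on the fifth step, while the one-shot glide $6c, $6d, $6e, $6f, $70 stays at $70 once it gets there.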

Converting from pitch to frequency

The summing of the two 8-bit quantities is performed very efficiently as part of the final table lookup, shown here for voice 1:

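voice1_arppos = * + 1
        ldx     arptable        ; x = offset pitch at the current arpeggio position
voice1_base1 = * + 1
        lda     freq_lsb,x      ; low byte of the operand holds the base pitch
        sta     $d400           ; voice 1 frequency, low byte
voice1_base2 = * + 1
        lda     freq_msb,x      ; same base pitch, written a second time
        sta     $d401           ; voice 1 frequency, high byte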

The base pitch is kept in the least significant byte of the instruction operand of each lda instruction, so it must be written twice whenever it changes. The current arpeggio position is kept in the least significant byte of the ldx operand. All three tables must be page-aligned.

Compared to the interpolation-based technique outlined above, this way of computing a frequency value from a linear pitch is extremely fast, with a worst case execution time of 22 clock cycles. It is used in Scene Spirit, along with many other tricks, to achieve a total worst case time of 14 rasterlines for the entire playroutine.

Of course, the microtone frequency table is quite big. With 384 entries of 16 bits each, it occupies a full three pages (with an inconvenient half-page gap in the middle). This should be compared to the traditional 96-entry table, which occupies 3/4 of a page. Is there any way around this?

Folding the table

As a first step, consider what happens if we skip every other entry in the frequency table. We also split the conversion procedure into two cases, based on the least significant bit of the 9-bit pitch value. When the least significant bit is zero, we perform a regular lookup based on the remaining eight bits. If it is one, we compute the average of the two table entries surrounding our desired pitch. This is effectively linear interpolation with a single-bit parameter.

Here is a straightforward way of doing it:

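        lda     base_pitch
        clc
        adc     offset_pitch    ; 9-bit sum, top bit in carry
        ror                     ; a = upper eight bits, carry = lowest bit
        tax
        bcc     fractional_0

fractional_1
        lda     freq_lsb,x      ; add the two surrounding entries...
        clc
        adc     freq_lsb+1,x
        sta     temp
        lda     freq_msb,x
        adc     freq_msb+1,x
        ror                     ; ...and halve the 17-bit sum
        sta     $d401
        lda     temp
        ror
        sta     $d400
        jmp     done

fractional_0
        lda     freq_lsb,x      ; exact table hit
        sta     $d400
        lda     freq_msb,x
        sta     $d401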
done

But we can do better. Notice how we are computing the average as:

(freq[i] + freq[i + 1]) / 2

Analytically, this is equivalent to:

freq[i] / 2 + freq[i + 1] / 2

Sure, due to rounding we lose one bit of precision, but this is negligible compared to the error we are introducing by making a piece-wise linear approximation in the first place.

Next, observe that the halved values we want are actually stored in the table, exactly one octave (24 entries) earlier!

freq[i - 24] + freq[i + 1 - 24]

In practice, we have to extend the table with an extra octave towards the bottom, and adjust our lookups accordingly. Then we get:

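        lda     base_pitch
        clc
        adc     offset_pitch
        ror                     ; a = upper eight bits, carry = lowest bit
        tax
        bcc     fractional_0

fractional_1
        lda     freq_lsb,x      ; entries one octave down are already halved
        clc
        adc     freq_lsb+1,x
        sta     $d400
        lda     freq_msb,x
        adc     freq_msb+1,x
        sta     $d401
        jmp     done

fractional_0
        lda     freq_lsb+24,x   ; skip the extra octave at the bottom
        sta     $d400
        lda     freq_msb+24,x
        sta     $d401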
done

Finally, we can extend this trick to consider both of the fractional bits. The linear interpolation is a weighted sum according to the following table:

Bits    First weight    Second weight
00      100%            0%
01      75%             25%
10      50%             50%
11      25%             75%

We have seen that we can scale a frequency by 50% by going back one octave. Clearly, we can scale by 25% by going back two octaves. But what about 75%?

From music theory we know that a perfect fifth corresponds to a frequency ratio of 3:2 = 1.5. In equal temperament, this is then approximated by seven semitones, with an actual ratio of 1.4983. But this means that if we go seven semitones up, and then one octave down (for a net movement of five entries down the table), we should obtain a frequency that is very close to 3:4, or 75%, of the original entry.

The code grows a bit, because we now have four cases. The table also needs to be extended with a further octave to accommodate the 25% lookup. But essentially, we're back to the traditional frequency table with one entry per semitone. Here's the final routine:

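        lda     base_pitch
        clc
        adc     offset_pitch
        ror                     ; carry = fractional bit 0
        bcc     fractional_x0

fractional_x1
        lsr                     ; carry = fractional bit 1
        tax                     ; x = semitone number
        bcc     fractional_01

fractional_11
        lda     freq_lsb,x      ; 25% of the lower semitone
        clc
        adc     freq_lsb+19+1,x ; 75% of the upper semitone
        sta     $d400
        lda     freq_msb,x
        adc     freq_msb+19+1,x
        sta     $d401
        jmp     done

fractional_01
        lda     freq_lsb+19,x   ; 75% of the lower semitone
        ;clc                    ; carry is already clear here
        adc     freq_lsb+1,x    ; 25% of the upper semitone
        sta     $d400
        lda     freq_msb+19,x
        adc     freq_msb+1,x
        sta     $d401
        jmp     done

fractional_x0
        lsr                     ; carry = fractional bit 1
        tax                     ; x = semitone number
        bcc     fractional_00

fractional_10
        lda     freq_lsb+12,x   ; 50% of the lower semitone
        clc
        adc     freq_lsb+12+1,x ; 50% of the upper semitone
        sta     $d400
        lda     freq_msb+12,x
        adc     freq_msb+12+1,x
        sta     $d401
        jmp     done

fractional_00
        lda     freq_lsb+24,x   ; 100% of the lower semitone
        sta     $d400
        lda     freq_msb+24,x
        sta     $d401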
done

And here's the code for generating the table. Note that we have to include an extra entry at the end, to support interpolation at the top of the range. With a total of 121 entries, this table occupies slightly less than one page of RAM.

#include <math.h>

#define SID_CLOCK_HZ 985248

int freq[24 + 96 + 1];  /* two extra octaves below, one extra entry on top */

void compute_freq() {
        int i;
        double semitone = pow(2, 1.0/12);

        for(i = 0; i <= 24 + 96; i++) {
                freq[i] = round(440 * pow(semitone, i - 24 - 60) / SID_CLOCK_HZ * 0x1000000);
        }
}

The worst case execution time for the above routine, if we pretend that base_pitch and offset_pitch are encoded in immediate operands like in the original code snippet, is 46 cycles.
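
The cycle count can be tallied per path with a few lines of C. The sketch assumes immediate operands for the two pitch bytes, and that neither the indexed loads nor the branches cross a page boundary; the names are made up for the example.

```c
/* 6502 cycle costs under the assumptions stated above. */
enum {
        CYC_IMM = 2,            /* lda #, adc # */
        CYC_IMPLIED = 2,        /* clc, ror, lsr, tax */
        CYC_LOAD_ABS_X = 4,     /* lda abs,x and adc abs,x */
        CYC_STA_ABS = 4,
        CYC_JMP = 3,
        CYC_BRANCH_TAKEN = 3,
        CYC_BRANCH_NOT_TAKEN = 2
};

/* Cycles for one path through the four-case routine, given the two
   fractional bits of the pitch. */
int path_cycles(int bit0, int bit1) {
        int c = CYC_IMM + CYC_IMPLIED + CYC_IMM + CYC_IMPLIED;  /* lda, clc, adc, ror */

        c += bit0 ? CYC_BRANCH_NOT_TAKEN : CYC_BRANCH_TAKEN;    /* first bcc */
        c += 2 * CYC_IMPLIED;                                   /* lsr, tax */
        c += bit1 ? CYC_BRANCH_NOT_TAKEN : CYC_BRANCH_TAKEN;    /* second bcc */
        if (!bit0 && !bit1)                                     /* 00: plain lookup */
                return c + 2 * (CYC_LOAD_ABS_X + CYC_STA_ABS);
        c += 4 * CYC_LOAD_ABS_X + 2 * CYC_STA_ABS + CYC_JMP;    /* two-term sum */
        if (bit1)
                c += CYC_IMPLIED;                               /* clc, carry is set */
        return c;
}
```

This gives 34 cycles for the 00 case, 44 for 01, 45 for 11 and 46 for 10, so the 10 case is the worst one.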

Given that we have three voices, the net change in worst case execution time is (46 - 22) * 3 = 72 cycles, a little over one rasterline. But that's a fair price for replacing a three-page table with one that fits in a single page.

Example

This C64 program (source) performs a glide across the full 96-semitone range using the code above, at a rate of one microtone per video frame. Note how the perceived glide rate remains constant.

Concluding remarks

In this article, we have seen that it is feasible to work with 9-bit linear pitch values in a C64 playroutine, by representing them as two 8-bit values that are added together in a final step, right before the conversion to a 16-bit frequency.

We have seen how the final conversion could be made very quick with the help of a large table, and how this table can be reduced to a more typical size at the cost of one additional rasterline of execution time. By considering both of these implementations, the playroutine coder is free to decide whether maximum speed or minimum size is preferable for a given application. Crucially, this decision may be postponed until after a piece of music has been composed, when the total rastertime and song size are known.

One could even write a playroutine in which the code for the final conversion step may be dynamically loaded and switched while a song is playing. This would allow the same song to accompany both memory-hungry and cycle-hungry parts in the same trackmo, even if the exact time of each transition is decided at runtime.

Posted Tuesday 31-Mar-2015 00:44

Discuss this page

Disclaimer: I am not responsible for what people (other than myself) write in the forums. Please report any abuse, such as insults, slander, spam and illegal material, and I will take appropriate actions. Don't feed the trolls.

Anonymous
Wed 1-Apr-2015 07:35
So you're discarding some of the pitch information to get more pitch range? jackychanmindexplosion.jpg
lft
Linus Åkesson
Wed 1-Apr-2015 10:00
No, the total range is the same. I'm discarding some pitch information in order to be able to work with linear pitch values with high performance.
jaymzjulian
jaymz julian
Mon 13-Apr-2015 09:35
I did actually implement a player which used 9bit pitch and linear slide/vibrato a few years ago, using a single octave table at c-4, and asl/lsr to get to the target octave - but a maximum of 4 rotations per note, with a hardcoded set of shifts and a jump table to pick the octave. The worst case was around the same for the shift - around 28 cycles/channel. Of course, I then fucked it up by using volume envelopes with multiplies, and channel multiplexing, and blowing out the player to around 80 rasterlines, however.