
Safe VSP

I contributed this one-filer to the C64 Demo compo at Datastorm 2013. It ended up in 7th place, which I consider quite good for a technical proof of concept.


One of the tricks you can do on the C64 involves manipulating the video chip into reading the graphics data at an offset from where it's usually located. This allows you to scroll the display horizontally, and the trick is called VSP for Variable Screen Position. However, some machines crash when you attempt this, and the reason for that has always been a mystery. Not anymore.

Some say this forum thread reads like a thriller. Zer0-X managed to capture a VSP crash using a logic analyser and posted 15 MB of data. A year later I started looking into it and discovered the root cause. The proposed workaround is not very practical, but it supports my hypothesis: people have tried it on their crash-prone machines, and so far none of them has crashed.

A technical explanation appears in the demo as a 10-minute scroller, but the same text is provided below, for your convenience.

However, I found this an excellent opportunity to compose a 10-minute SID epic heavily inspired by Martin Galway's Parallax. In particular, I borrowed its musical structure, which I affectionately think of as starter, main course, dessert: something a bit weird, followed by something substantial and straightforward, followed by a sweet melodic part. The three courses are quite distinct, but complement each other. My tune is called Sideways in reference to both Parallax and the VSP trick.

Safe VSP has a csdb page and a pouët page, and was featured on Hacker News.

Technical lowdown:

The dreaded VSP crash is caused by a metastability condition in the DRAM. Some have speculated that it has to do with refresh cycles, but hopefully the detailed explanation in this scroller will crush that myth once and for all.

But first, this is how the machine behaves from a programmer's point of view. Let us call memory locations whose hexadecimal address ends in 7 or f fragile. Sometimes when VSP is performed, several fragile memory cells are randomly corrupted according to the following rule: Each bit in a fragile memory cell might be changed into the corresponding bit of another fragile cell within the same page.
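The corruption rule can be made concrete with a small simulation. This is only a sketch of the programmer-visible behaviour, not a model of the hardware: the names is_fragile and corrupt_page are invented for illustration, and picking one random donor cell and one random bit mask per victim is a simplifying assumption.

```python
import random

def is_fragile(addr: int) -> bool:
    """True for addresses whose low three bits are all set ($x7 or $xf)."""
    return addr & 0x07 == 0x07

def corrupt_page(memory: bytearray, page: int, rng: random.Random) -> None:
    """Apply one hypothetical corruption event to a 256-byte page, in place."""
    base = page << 8
    fragile = [base + off for off in range(256) if is_fragile(base + off)]
    # Remember the values before the event so that every donor bit
    # comes from the original contents of the page.
    snapshot = {addr: memory[addr] for addr in fragile}
    for victim in fragile:
        donor = rng.choice(fragile)   # assumption: one donor per victim
        mask = rng.getrandbits(8)     # assumption: random subset of bits
        kept = snapshot[victim] & ~mask & 0xFF
        taken = snapshot[donor] & mask
        memory[victim] = kept | taken
```

Only the 32 fragile bytes of the chosen page are ever written. Every other location keeps its value, which is what makes the workarounds below possible.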

This specific behaviour can be exploited in several ways: One approach is to ensure that every fragile byte in a page is identical. If the page contains code, for instance, corruption is avoided if all the fragile bytes are $ea (nop). Similarly, in font definitions, the bottom line of each character could be blank.
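As a rough illustration of the first approach, the check below tests whether all fragile bytes in a page are identical. The function names are hypothetical. In a font stored as 8 bytes per character, the fragile offsets happen to be the bottom line of each character, which is why blank bottom lines are enough.

```python
def fragile_offsets() -> list:
    """Offsets within a 256-byte page that end in 7 or f."""
    return [off for off in range(256) if off & 0x07 == 0x07]

def page_is_immune(page_bytes: bytes) -> bool:
    """True if all 32 fragile bytes in the page hold the same value.

    Swapping bits between identical bytes changes nothing, so such a
    page cannot be visibly corrupted under the rule described above.
    """
    if len(page_bytes) != 256:
        raise ValueError("a page is exactly 256 bytes")
    return len({page_bytes[off] for off in fragile_offsets()}) == 1

# Example: 32 characters of 8 bytes each, bottom line left blank.
font = bytearray(256)
for char in range(32):
    for row in range(7):              # rows 0..6 carry pixels
        font[char * 8 + row] = 0xFF   # row 7 (the fragile byte) stays 0
print(page_is_immune(bytes(font)))
```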

Another technique is to simply avoid all fragile memory locations. The undocumented opcode $80 (nop immediate) can be used to skip them. Data structures can be designed to have gaps in the critical places.

This latter technique is used in this demo, including the music player of course. Data that cannot have gaps, i.e. graphics, is continuously restored from safe copies elsewhere in memory. You can use shift lock to disable this repair, and eventually you should see garbage accumulating on the screen. And yet the code will keep running.
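A data structure with gaps in the critical places could be addressed as in the sketch below. This is not taken from the demo's source. The function physical_address and the 8-byte alignment requirement are assumptions made to keep the arithmetic simple: every aligned group of eight bytes stores seven payload bytes and leaves the eighth, fragile one unused.

```python
def physical_address(base: int, index: int) -> int:
    """Map a logical byte index to an address that is never fragile.

    base must be a multiple of 8 (an assumption of this sketch).
    Offsets 0..6 of each 8-byte group are used, offset 7 is skipped.
    """
    if base & 0x07:
        raise ValueError("base must be 8-byte aligned")
    if index < 0:
        raise ValueError("index must be non-negative")
    group, offset = divmod(index, 7)
    return base + group * 8 + offset

# The seventh logical byte lands on $1006, the eighth jumps to $1008.
print(hex(physical_address(0x1000, 6)), hex(physical_address(0x1000, 7)))
```

The cost is one wasted byte in eight, plus a slightly more awkward index calculation on the 6502.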

Thus, for the first time, the VSP crash has been tamed.

Now for the explanation. The C64 accesses memory twice in every clock cycle. Each memory access begins with the LSB of the address (also known as the row address) being placed on an internal bus connected to the DRAM chips. As soon as the row address is stable, the row address strobe (RAS) signal is given. Each DRAM chip now latches the row address into a register, and this register controls a multiplexer which connects the selected memory row to a set of wires called sense lines. Each sense line connects to a single bit of memory.

The sense lines have been precharged to a voltage in between logical zero and logical one. The charge stored in the memory cell affects the sense line towards a slightly lower or higher voltage depending on the bit value. A feedback amplifier senses the voltage difference and exaggerates it, so that the sense line reaches the proper voltage representing either zero or one. Because the memory cell is connected (through the multiplexer) to the sense line, the amplified charge will also flow back and refresh the memory cell. Hence, a memory row is refreshed whenever it is opened.

VSP is achieved by triggering a badline condition during idle mode in the visible part of a rasterline. When this happens, the VIC chip gets confused about what memory address to access during the half-cycle following the write to $d011. It sets the internal bus lines to 11111111 in preparation for an idle fetch, but suddenly changes its mind and tries to read from an address with an LSB of 00000111.

Now, since electrical lines can't change voltage instantaneously, there is a brief moment when each of the changing bits (bits 3 through 7) is neither a valid one nor a valid zero. But because the VIC chip changes the address at an abnormal time, there is now a risk that the RAS signal, which is generated independently by another part of the VIC chip, is sent while one or more bus lines are within the undefined voltage range.

When an undefined voltage is latched into a register, the register enters a metastable state, which means that its output will flicker rapidly between zero and one several times before settling. This has catastrophic consequences for a DRAM: The row multiplexer will connect several different memory rows, one at a time, to the same sense lines. But as soon as some charge has moved from a memory cell to the sense line, the amplifier will pull it all the way to a one or a zero. If, at this point, another memory row is connected, then the charge will travel from the sense line into this other memory cell. In short, one memory cell gets refreshed with the bit value of a different memory cell.
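The mechanism can be caricatured in a few lines of code. This is a deliberately crude sketch for a single column of the DRAM, with invented names. It assumes that the first connected row fully decides the sense line level, and that every row connected afterwards is simply overwritten with that level.

```python
from typing import Dict, List

def open_rows(cells: Dict[int, int], row_sequence: List[int]) -> None:
    """Simulate one access in which the row latch visits row_sequence.

    cells maps a row address to the bit stored in one column.
    A normal access has a single entry in row_sequence.
    """
    sense = None
    for row in row_sequence:
        if sense is None:
            # The first cell tips the precharged sense line and the
            # amplifier pulls it all the way to a zero or a one.
            sense = cells[row]
        # The amplified level flows back into whichever cell is connected.
        cells[row] = sense

cells = {0x07: 1, 0xFF: 0}
open_rows(cells, [0x07])              # normal access: plain refresh
open_rows(cells, [0x07, 0xFF, 0x07])  # metastable latch flickers
print(cells)                          # row $ff now holds the bit from row $07
```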

Note that because the bus lines change from $ff to $07, only memory rows with an address ending in three ones are at risk of being opened simultaneously. This explains why corruption can only occur in memory locations ending in 7 or f.
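To see which rows are at risk, one can enumerate every row address that agrees with both $ff and $07 on the lines that do not change. The helper below is a sketch with a made-up name, and it assumes that each changing line may independently latch as either zero or one.

```python
def rows_at_risk(old: int, new: int) -> list:
    """Row addresses that could be opened while the bus moves from old to new."""
    stable_mask = ~(old ^ new) & 0xFF   # lines with the same level before and after
    stable_bits = old & stable_mask     # the level those lines are held at
    return [row for row in range(256) if row & stable_mask == stable_bits]

rows = rows_at_risk(0xFF, 0x07)
print(len(rows), [f"{row:02x}" for row in rows[:4]])
```

For the transition from $ff to $07, the stable lines are bits 0 through 2, all held at one. That leaves 32 candidate rows, and each of them ends in 7 or f.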

Finally, this phenomenon hinges on the exact timing of the RAS signal at the nanosecond level, and on many machines the critical situation simply doesn't occur. The timing (and thus the probability of a crash) depends on factors such as temperature, VIC revision, parasitic capacitance and resistance of the traces on the motherboard, power supply ripple and interference with other parts of the machine such as the phase of the colour carrier with respect to the dotclock. The latter is assigned randomly at power-on, by the way, which could be the reason why a power-cycle sometimes helps.

This is lft signing off.

Posted Wednesday 20-Mar-2013 23:23


Disclaimer: I am not responsible for what people (other than myself) write in the forums. Please report any abuse, such as insults, slander, spam and illegal material, and I will take appropriate actions. Don't feed the trolls.


Anonymous
Sun 16-Jun-2013 12:16
Bravo Linus! You're such a genius, please don't stop to discover! Greetings from France!
Anonymous
Sat 24-Aug-2013 07:14
It would seem like you could "fix" this (or at least, push the machine so that it didn't happen any more) by putting a few extra pF of capacitance on pin 18 of the VIC.
Anonymous
Sun 25-Jan-2015 01:14
Wonderful tune, it is meditative, yet it rocks. Someone remix it into a full hour song please.
Anonymous
Tue 24-Jan-2017 16:22
Can any byte in ram matching the $0007 pattern be changed?
I guess what is banked out would not suffer?
Anonymous
Sat 18-Feb-2017 19:18
Hi,
Please could you explain why you can say this:
"the phase of the colour carrier with respect to the dotclock. The latter is assigned randomly at power-on".
I believe the MOS-8701 works with fixed delays to generate the dot clock, so it should come up with always the same phase with respect to the color clock.
On the boards with no MOS-8701, there's a classic PLL circuit that should come up with always the same phase relationship too.
What am I missing? Thanks (iz8dwf at amsat dot org)
lft
Linus Åkesson
Thu 23-Feb-2017 18:16
> Please could you explain why you can say this:
> "the phase of the colour carrier with respect to the dotclock. The latter is assigned randomly at power-on".
> I believe the MOS-8701 works with fixed delays to generate the dot clock, so it should come up with always the same phase with respect to the color clock.

Hi!

The MOS-8701 produces a 7.88 MHz dotclock and a 4.43 MHz colour carrier from the same internal signal. Their ratio is exactly 16:9, which means that 16 hi-res pixels on the screen correspond to nine complete cycles of the colour signal.

It follows that for each hi-res pixel, the phase of the colour carrier is advanced by 9/16 revolutions, which is 202.5 degrees. If it starts at 0 degrees, then after 8 hi-res pixels it will be at 180 degrees. Hence the familiar red/green vertical banding that repeats after 16 pixels; red and green are 180 degrees apart in YUV.
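The phase arithmetic is easy to check with exact fractions. This is just a worked calculation based on the 16:9 ratio stated above. The function name and the start parameter are made up for the example.

```python
from fractions import Fraction

DOT_CYCLES = 16      # hi-res pixels per repeat
CARRIER_CYCLES = 9   # colour carrier cycles in the same time

def carrier_phase_degrees(pixel: int, start=0) -> Fraction:
    """Phase of the colour carrier at a given hi-res pixel, in degrees."""
    turns = Fraction(start) / 360 + Fraction(CARRIER_CYCLES, DOT_CYCLES) * pixel
    return (turns % 1) * 360

for pixel in (1, 8, 16):
    print(pixel, float(carrier_phase_degrees(pixel)))
```

Pixel 1 gives 202.5 degrees, pixel 8 gives 180 degrees, and pixel 16 is back at 0 degrees.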

Now, at which pixel is the colour carrier at 0 degrees?

This depends on the timing relationship between the 8701 and the VIC chip. The 8701 has a reset pin, but it doesn't seem to be connected in the C64. The VIC doesn't even have a reset pin.

So, during power-on, there will be a brief period before the 8701 is outputting a stable signal, and during this period the VIC state machine may or may not respond properly. Meanwhile, the internal clock-divide counters of the 8701 might not even start from zero.

That is where the random assignment happens.
lft
Linus Åkesson
Thu 23-Feb-2017 18:18
> Can any byte in ram matching the $0007 pattern be changed?
> I guess what is banked out would not suffer?

Yes, and often several bytes at the same time.

Banking doesn't affect anything, I'm afraid, because the Row Select procedure (LSB) is carried out regardless of what the MSB will be. The corruption happens inside the RAM chips themselves.