Nine Explained
In this video and accompanying article I reveal how the seemingly impossible effect in Nine was pulled off.
The original Commodore 64 demo, in case you missed it, is available here.
And this is how it was done:
Going deeper
In the explanation video I rush past a number of things in order to get to the point quickly. The following is an attempt to clarify and provide more in-depth technical information for interested readers.
At the end I've also included some philosophical musings that didn't make it into the video.
Sprite DMA timing
I mention sprite DMA timing a couple of times, and the need to stabilize it. What does that mean?
DMA stands for Direct Memory Access. On every rasterline, up to eight sprites may be enabled, and the video chip fetches the pixels for these sprites by momentarily pausing the CPU in order to access memory directly. In a word, the VIC chip steals clock cycles from the CPU. Depending on which sprites are active, different cycles are stolen.
A rasterline on a PAL C64 comprises 63 clock cycles in total, and sixteen of them are at risk of being stolen for sprite fetches. They are coloured blue in the following diagram (the numbers inside the boxes indicate Sprite 0 through 7):
[Diagram: the 63 cycles of a PAL rasterline, with the fetch cycles for Sprites 0 through 7 marked in blue]
Hence the number of clock cycles left for the CPU on any given rasterline varies as the sprites move up and down on the screen.
Due to a design quirk, the 6502 CPU cannot be paused instantly: The VIC needs to send a signal three cycles ahead of time, during which the CPU can still execute memory write cycles, but will halt as soon as it tries to read.
Thus, for instance, if Sprites 2 and 6 are enabled, the cycles marked blue in the following diagram are unavailable to the CPU, but the green cycles are available for writing only:
[Diagram: rasterline cycles with Sprites 2 and 6 enabled; sprite fetches in blue, write-only cycles in green]
Each CPU instruction can involve a combination of read and write cycles, so these write-only cycles can easily throw off carefully timed code. That is why I mention towards the end of the video that there's a special case when write cycles in the rastercode coincide with the start of sprite DMA. But for the following discussion, let's pretend that the CPU is only performing read cycles.
Once the CPU has been paused, the VIC chip has access to the bus for as long as it wants; there's no need for another three-cycle warning signal. In order to fetch pixels for Sprite 0 and Sprite 1, for instance, it is sufficient to steal a total of seven cycles: three for the warning period, and two for each sprite.
[Diagram: rasterline cycles with Sprites 0 and 1 enabled]
Now suppose Sprite 0 and Sprite 2 are enabled but Sprite 1 isn't. The VIC chip can't release the bus in between the sprite fetches, because there wouldn't be enough time for another three-cycle warning period. The in-between cycles are coloured yellow in the following diagram, where 54 clock cycles per rasterline remain for the CPU:
[Diagram: rasterline cycles with Sprites 0 and 2 enabled; the in-between cycles in yellow]
In other words, on any particular rasterline where Sprite 0 and Sprite 2 are both enabled, cycles 60 and 61 are stolen whether Sprite 1 is enabled or not. In this way we can have stable timing with a moving sprite, as long as it's flanked by two sprites that are constantly on.
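The rules described so far can be put into a small model. The following Python sketch is only an illustration, not code from the demo. It assumes the usual PAL numbering in which cycles run from 1 to 63 and Sprite 0 is fetched in cycles 58 and 59, with each following sprite two cycles later (wrapping into the next rasterline). The names `classify_cycles` and `count` are made up for this example.

```python
CYCLES_PER_LINE = 63      # PAL
FIRST_FETCH_CYCLE = 58    # assumption: Sprite 0 is fetched in cycles 58-59
WARNING = 3               # the VIC announces a fetch three cycles ahead


def classify_cycles(enabled):
    """Map each cycle 1..63 to 'free', 'write-only' or 'stolen'."""
    # Work in positions relative to Sprite 0's first fetch cycle, so that
    # the wrap into the next rasterline needs no special handling.
    bus_claimed = set()
    for n in enabled:
        first = 2 * n
        bus_claimed.update(range(first - WARNING, first + 2))

    states = {c: "free" for c in range(1, CYCLES_PER_LINE + 1)}
    run_length = 0
    for pos in range(-WARNING, 16):
        if pos not in bus_claimed:
            run_length = 0
            continue
        run_length += 1
        # Only the first three cycles of an unbroken run are a warning
        # period; after that the CPU is halted until the bus is released.
        state = "write-only" if run_length <= WARNING else "stolen"
        cycle = (FIRST_FETCH_CYCLE - 1 + pos) % CYCLES_PER_LINE + 1
        states[cycle] = state
    return states


def count(states, kind):
    return sum(1 for s in states.values() if s == kind)


both = classify_cycles({0, 2})
print(both[60], both[61])                               # stolen stolen
print(count(both, "free"))                              # 54
print(63 - count(classify_cycles({0, 1}), "free"))      # 7
```

With the CPU assumed to perform read cycles only, everything that is not "free" is lost, which gives the 54 remaining cycles mentioned above.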
The trick doesn't work for Sprites 0 and 7 because they don't have two neighbours. The best we can achieve is three freely moving sprites, e.g. number 1, 3, and 5, protected by four sprites that are always on, in this case number 0, 2, 4, and 6:
[Diagram: rasterline cycles with Sprites 0, 2, 4, and 6 always on, protecting Sprites 1, 3, and 5]
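The claim about protected sprites can be checked exhaustively. This is a minimal Python sketch under the same assumptions as the text: each enabled sprite claims the bus for a three-cycle warning period plus two fetch cycles, and the fetch slots of consecutive sprites are adjacent. Positions are counted relative to Sprite 0's first fetch cycle, and all names are hypothetical.

```python
from itertools import combinations


def bus_claimed(enabled):
    """Positions where the VIC holds the bus or has announced a fetch."""
    claimed = set()
    for n in enabled:
        claimed.update(range(2 * n - 3, 2 * n + 2))
    return frozenset(claimed)


def timing_variants(guards, movers):
    """Distinct timing patterns over all on/off combinations of movers."""
    variants = set()
    for k in range(len(movers) + 1):
        for subset in combinations(movers, k):
            variants.add(bus_claimed(set(guards) | set(subset)))
    return variants


GUARDS = {0, 2, 4, 6}

print(len(timing_variants(GUARDS, [1, 3, 5])))      # 1
print(len(timing_variants(GUARDS, [1, 3, 5, 7])))   # 2
```

A single variant means the CPU sees the same timing whichever of Sprites 1, 3, and 5 happen to be on. Adding Sprite 7 to the moving set produces a second variant, because nothing follows it to keep the bus claimed.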
Sprite priorities
One thing that I left out entirely in the explanation video is the issue of sprite priorities. The VIC chip allows you to position sprites in front of or behind the character graphics; in this demo they're always in front. But there's also a fixed priority order between the sprites themselves: Lower-numbered sprites appear in front of higher-numbered sprites.
In the part where the side borders are open, for instance, the four static sprite columns are rendered using the odd-numbered sprites, while the three freely-moving digits are mapped to Sprites 2, 4, and 6. But the freely-moving digits must go in front of the columns! How is that possible?
Note first of all that none of the swirling spirits actually overlap the wizard in this part of the demo. He can safely be rendered using Sprites 1 and 3. The black borders, on the other hand, are assigned to Sprites 5 and 7. Therefore, freely-moving Sprite 6 would go behind one of the black columns—it happens to be the one on the right-hand side—so the multiplexer needs to ensure that this hardware sprite is only used for digits that appear towards the left-hand side of the screen.
In other parts of the demo, as the swirling spirits rotate I have to constantly adjust the mapping between digits and hardware sprites in order to maintain a consistent front-to-back order.
And here's a subtle difficulty worth mentioning: As discussed in the video, when the circle flattens and the sprites end up covering the same rasterlines, I switch to a routine that displays one of the digits using character graphics. But in fact, I have to perform the switch a little bit earlier, when the sprites are still vertically separated. Consider the following situation:
[Image: the situation discussed below, involving the digits 9, 2, and 4]
Here, the top and bottom digits (9 and 4) do not cover the same rasterlines, so normal multiplexing should work in principle. But the “4” is conceptually closer to the camera than the “2”, and the “2” is closer than the “9”. So we would have to pick one of the middle hardware sprites for the “2”, one that can have other digits both in front of and behind itself.
As the nearly-flat circle spins around, it's not trivial to assign hardware sprites to digits in a way that preserves the desired priority order, as this involves checking which digits would overlap based on their horizontal position on the screen. I'm pretty sure that a valid mapping can always be found when the circle is this wide, but in the end, switching to the character-based routine earlier was a much cleaner solution.
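To make the constraint concrete, here is a brute-force Python sketch of the mapping problem. It is not the routine used in the demo, and it simplifies heavily: digits are treated as 24×21 bounding boxes, two digits that share a rasterline may not share a hardware sprite, and where two boxes overlap the nearer digit needs the lower sprite number. The coordinates, depth values, and function names are invented for illustration.

```python
from itertools import product

SPRITE_W, SPRITE_H = 24, 21   # assumed unexpanded sprite size


def ranges_overlap(a, b, size):
    return a < b + size and b < a + size


def pair_ok(a, b, sprite_a, sprite_b):
    if not ranges_overlap(a["y"], b["y"], SPRITE_H):
        return True               # vertically separated: sharing is fine
    if sprite_a == sprite_b:
        return False              # same rasterlines need different sprites
    if not ranges_overlap(a["x"], b["x"], SPRITE_W):
        return True               # side by side: priority is irrelevant
    a_is_nearer = a["depth"] < b["depth"]
    return (sprite_a < sprite_b) == a_is_nearer


def assign_sprites(digits, hw_sprites):
    """Return a digit -> hardware sprite mapping, or None if none works."""
    names = list(digits)
    for choice in product(hw_sprites, repeat=len(names)):
        mapping = dict(zip(names, choice))
        if all(pair_ok(digits[p], digits[q], mapping[p], mapping[q])
               for i, p in enumerate(names) for q in names[i + 1:]):
            return mapping
    return None


# Smaller depth means closer to the camera.
digits = {
    "9": {"x": 100, "y": 50, "depth": 2},
    "2": {"x": 110, "y": 60, "depth": 1},
    "4": {"x": 120, "y": 75, "depth": 0},
}

print(assign_sprites(digits, [2, 4, 6]))   # {'9': 6, '2': 4, '4': 2}
print(assign_sprites(digits, [2, 4]))      # None
```

In this example the “9” and the “4” are vertically separated, yet they cannot share a hardware sprite, because that sprite would have to be both in front of and behind the one showing the “2”.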
The crunched sprites
During the upper-border part, four sprites are extended vertically using sprite crunch to stabilize the timing. I've written about advanced sprite crunch before, but this time the use case is more straightforward: The sprites are crunched once, just after they appear on the screen, to upset the internal address counters to an unaligned value. As the sprites are displayed, the counters are incremented by three on each rasterline, but the misalignment makes the counter wrap around twice before reaching the end of the sprite.
This results in triple-height sprites, without any need for costly updates to the sprite Y-position registers during the timing-critical part of the code. However, if the sprites were to be left entirely unsupervised, they would repeat the same graphics three times (staggered by one byte at each wraparound). There is thus a need to update the sprite pointers as we go.
[Image: the crunched sprites]
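To see why a misaligned counter stretches the sprite, consider this deliberately simplified Python model. It assumes only what is described above: a 6-bit counter that advances by three on each rasterline, with the sprite ending when the counter lands exactly on 63. The starting value 2 is an arbitrary stand-in for whatever unaligned value the crunch actually produces, and the function name is made up.

```python
def lines_until_done(start):
    """Count rasterlines until a 6-bit counter stepping by 3 lands on 63.

    Returns (lines, wraps), where wraps is the number of times the
    counter overflowed past 63 along the way.
    """
    counter, lines, wraps = start, 0, 0
    while counter != 63:
        nxt = counter + 3
        if nxt > 63:
            wraps += 1
        counter = nxt & 63    # the counter is only six bits wide
        lines += 1
    return lines, wraps


print(lines_until_done(0))   # (21, 0)
print(lines_until_done(2))   # (63, 2)
```

An aligned counter reaches 63 after the normal 21 lines. Starting from 2, it misses 63 twice (going 62 to 1, then 61 to 0) and only hits it on the third pass, for three times the height. The shifting remainder also matches the one-byte stagger at each wraparound.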
The rastercode of “The Device” is already quite cramped, and there is no chance of updating four sprite pointers separately. Instead we use the fact that the sprite pointers are stored in a part of the video memory normally used for character graphics. The location of this memory is controlled by banking register $d018. By preparing different sets of sprite pointers in different memory banks, we can update all pointers with a single register write.
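As a small worked example of the addressing involved, the Python sketch below computes where the VIC looks for its eight sprite pointers for a given value of $d018. It relies on the standard VIC-II layout: the upper four bits of $d018 select a 1 KB block within the current 16 KB VIC bank, and the pointers occupy the last eight bytes of that block. The register values shown are illustrative and not taken from the demo.

```python
def sprite_pointer_base(d018, vic_bank_base=0x0000):
    """Address of the pointer for Sprite 0, given a $d018 value."""
    screen = vic_bank_base + ((d018 >> 4) & 0x0F) * 0x0400
    return screen + 0x03F8


for value in (0x15, 0x25, 0x35):
    print(f"$d018 = ${value:02x} -> pointers at ${sprite_pointer_base(value):04x}")
```

Each value selects a different block, so three prepared sets of eight pointers can be swapped with one store to $d018 per switch.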
We still have to squeeze that single write into the cramped rastercode somehow, and the opportunity for this is baked into the idle-graphics pattern. Notice how the pattern doesn't change on every rasterline; some lines are duplicates. Hence there are some lines where we don't need to update the ghostbyte from the previous line—and there we can update the banking register instead.
Specifically, we update the sprite pointers once near the top of the device, and once near the bottom of it. This divides the crunched sprites into three sections, and ensures that a unique row of pixels is fetched for each rasterline. But the idle-graphics pattern moves up and down during the effect, whereas the crunched sprites remain stationary. In other words, the seams—the places where sprite pixels are suddenly fetched from a different buffer—also move.
So as the CPU is preparing pixel data for the crunched sprites, it needs to switch buffers at just the right time depending on the current vertical position of the device. Large stretches of data can be copied as chunks, but between the chunks are sudden changes in target address: Twice for the timed buffer-switches, and twice when the address counter wraps around modulo 64. To make this efficient, I use a precomputed table of chunks for each possible device Y-position. Since the timing is identical for the four sprites, some cycles can be saved by decoding each table entry once, and then performing the copying for all four target buffers in one go.
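A chunk table of this kind could be built along the following lines. This Python sketch is hypothetical and not the demo's actual data format: it walks the rasterlines of one crunched sprite, tracks a 6-bit address counter that advances by three per line, and starts a new chunk whenever the target buffer changes or the counter wraps. The starting counter value, the sprite height, and the seam lines are made-up parameters.

```python
def build_chunks(start, seams, total_lines):
    """Group rasterlines into runs that can be copied in one go."""
    chunks = []
    counter, buffer = start, 0
    for line in range(total_lines):
        if line in seams:
            buffer += 1           # timed pointer switch: new target buffer
        prev = chunks[-1] if chunks else None
        contiguous = (prev is not None
                      and prev["buffer"] == buffer
                      and prev["offset"] + 3 * prev["lines"] == counter)
        if contiguous:
            prev["lines"] += 1
        else:
            chunks.append({"first_line": line, "buffer": buffer,
                           "offset": counter, "lines": 1})
        counter = (counter + 3) & 63
    return chunks


# One table entry; a full table would hold one such list per Y-position,
# with the seam lines depending on where the device currently is.
for chunk in build_chunks(start=2, seams={10, 50}, total_lines=63):
    print(chunk)
```

With these parameters the result is five chunks: the initial run plus four breaks, two from the buffer switches and two from the counter wrapping around.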
Horizontal alignment: The nitty-gritty bits
The horizontal scroll position of text characters (and therefore also the ghostbyte) is controlled using three bits in VIC register $d016. Position 0 is far to the left, and position 7 is far to the right. However, when you split the background by updating VIC register $d021 in the middle of a rasterline, the change is actually delayed by one pixel to line up with horizontal scroll position 1.
On the newer version of the VIC chip, the 8565, colour splits are always accompanied by a tiny grey dot (tiny as in half a pixel wide). This is annoying. When splits line up vertically, the dot is naturally extended into a thin vertical stripe. This would completely shatter the illusion of Nine—the grey dots and stripes have to be hidden!
Recall that the background splits are covered by the gaps between digits in the sprite graphics, and it seems reasonable to cover the grey dots in the same way. The gaps in the sprite graphics are aligned with the text characters (ghostbytes), but not necessarily with the processor clock cycles. Seven out of eight times, they cover the grey stripes just fine.
But for horizontal scroll position 0, the delayed colour change and the grey stripe end up just to the right of the covered gap. In this case, we need to perform the split one cycle earlier relative to the sprites. At first glance, that may seem like it could be handled by jumping to the rastercode one cycle earlier.
However, as you'll recall from the explanation video, there's more going on in the timed code than just background splits. We have to switch video mode at just the right moment to effect a ghostbyte change that's aligned with the left edge of the first digit. That part of the code can't run a cycle earlier, or we'd expose the ghostbyte pattern a whole character too early.
Therefore we need two versions of the timed code, with a one-cycle difference in the relative timing of the background splits and the change of video mode. When the device is at the troublesome horizontal scroll position, we jump to the alternative version of the code.
But there's more: Different versions of the VIC chip have a one-cycle difference in the pixel pipeline delay. Therefore, we need to detect what version of the VIC chip is used (this is done at the start of the demo with the help of sprite collision detection), and adjust the relative timing by one more cycle as the case may be. Again this calls for two versions of the code.
These single-cycle adjustments add up or cancel out, however you want to put it: We end up needing a total of three versions of the timed code. Here are the relevant passages side by side:
; X is $70

nop               ; 13    bit 0             ; 13    bit 0             ; 13
sta $d011         ; 15    sta $d011         ; 16    sta $d011-$70,x   ; 16
nop               ; 19    nop               ; 20
sta $d021-$70,x   ; 21    sta $d021         ; 22    sta $d021-$70,x   ; 21
sty $d021         ; 26    sty $d021         ; 26    sty $d021         ; 26
stx $d021         ; 30    stx $d021         ; 30    stx $d021         ; 30
stx $d011         ; 34    stx $d011         ; 34    stx $d011         ; 34
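The cycle arithmetic behind the three versions can be checked with a short Python sketch. It assumes that the number in each comment is the cycle in which that instruction starts, and it uses the standard 6502 timings: 2 cycles for nop, 3 for a zero-page bit, 4 for an absolute store, 5 for an absolute,x store, with the write happening in the last cycle of a store. The data layout and function names are invented for this example.

```python
NOP = ("nop", 2, None)
BIT_ZP = ("bit 0", 3, None)


def store(register, indexed=False):
    """An absolute store (4 cycles) or absolute,x store (5 cycles)."""
    return ("store", 5 if indexed else 4, register)


TAIL = [store("$d021"), store("$d021"), store("$d011")]
VERSIONS = [
    [NOP, store("$d011"), NOP, store("$d021", indexed=True)] + TAIL,
    [BIT_ZP, store("$d011"), NOP, store("$d021")] + TAIL,
    [BIT_ZP, store("$d011", indexed=True), store("$d021", indexed=True)] + TAIL,
]


def start_cycles(program, first=13):
    starts, cycle = [], first
    for _, length, _ in program:
        starts.append(cycle)
        cycle += length
    return starts


def first_write(program, register, first=13):
    cycle = first
    for _, length, target in program:
        cycle += length
        if target == register:
            return cycle - 1      # a store writes in its final cycle
    return None


for program in VERSIONS:
    print(start_cycles(program),
          first_write(program, "$d011"),
          first_write(program, "$d021"))
# [13, 15, 19, 21, 26, 30, 34] 18 25
# [13, 16, 20, 22, 26, 30, 34] 19 25
# [13, 16, 21, 26, 30, 34] 20 25
```

Under this reading, the first write to $d021 lands on the same cycle in all three versions, while the write to $d011 moves by one cycle from one version to the next.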
Astute readers may have noticed that I'm using one of the invalid video modes ($70) rather than plain ECM ($50) in the above code, despite the claim in the video that invalid video modes are not the answer. Mode $70 behaves like ECM as far as the ghostbyte is concerned, and that's crucial to making the effect work. But when we select a horizontal scroll position other than zero, a gap appears along the left edge of the screen, before the first ghostbyte, where the current background colour shines through. Depending on the current X-position of the device, the background colour may have already been set to the colour of the first digit. The invalid video mode serves to cover that gap.
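For readers who don't have the register layout memorised, this small Python sketch decodes the two $d011 values mentioned above. It uses the standard VIC-II bit assignments (bit 6 is ECM, bit 5 is bitmap mode, bit 4 is display enable) and the rule that ECM combined with bitmap or multicolour mode is an invalid mode. The multicolour bit lives in $d016, so it is passed in separately here; the function name is made up.

```python
def decode_d011(value, multicolour=False):
    """Decode the mode bits of a $d011 value."""
    ecm = bool(value & 0x40)
    bitmap = bool(value & 0x20)
    display_enabled = bool(value & 0x10)
    return {
        "ecm": ecm,
        "bitmap": bitmap,
        "display_enabled": display_enabled,
        "invalid": ecm and (bitmap or multicolour),
    }


print(decode_d011(0x50))
# {'ecm': True, 'bitmap': False, 'display_enabled': True, 'invalid': False}
print(decode_d011(0x70))
# {'ecm': True, 'bitmap': True, 'display_enabled': True, 'invalid': True}
```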
Related work
I've described how a combination of timed code, expanded sprites, and idle graphics is at the heart of the impossible-looking effect. This relies in part on an established technique called ghostbyte shine-through that has been used for scrolltexts as far back as 1992. Trident has written an excellent document about it.
Ghostbyte shine-through has been refined and developed over the years, notably in the 2024 demo The Ghost by Fairlight and Genesis Project (YouTube).
Final musings
In my opinion, good magic subverts Occam's razor: We all like simple explanations, and Occam's razor is the principle that when you have two competing explanations for the same phenomenon, the simpler explanation is the better one. The canonical example is that the Earth revolves around the sun instead of the other way around.
But this is only true when the two theories have equal explanatory power. If a simple theory only explains 99% of what you see, then it is not as good as a complicated theory that covers everything. But we are still drawn to the simple theory.
So the art of magic (and impossible demo-effect making) is to create a complicated reality that's easier to categorise as a simple thing with a bit of magic, than a complicated thing without magic. Our mind will cling to the simple theory as a working hypothesis and refuse to look beyond it, and that makes us vulnerable to deception—which, in magic, is a good thing!
Posted Friday 28-Feb-2025 09:10