Nine Explained

In this video and accompanying article I reveal how the seemingly impossible effect in Nine was pulled off.

The original Commodore 64 demo, in case you missed it, is available here.

And this is how it was done:

Going deeper

In the explanation video I rush past a number of things in order to get to the point quickly. The following is an attempt to clarify and provide more in-depth technical information for interested readers.

At the end I've also included some philosophical musings that didn't make it into the video.

Sprite DMA timing

I mention sprite DMA timing a couple of times, and the need to stabilize it. What does that mean?

DMA stands for Direct Memory Access. On every rasterline, up to eight sprites may be enabled, and the video chip fetches the pixels for these sprites by momentarily pausing the CPU in order to access memory directly. In short, the VIC chip steals clock cycles from the CPU. Depending on which sprites are active, different cycles are stolen.

A rasterline on a PAL C64 comprises 63 clock cycles in total, and sixteen of them are at risk of being stolen for sprite fetches. They are coloured blue in the following diagram (the numbers inside the boxes indicate Sprite 0 through 7):

Hence the number of clock cycles left for the CPU on any given rasterline varies as the sprites move up and down on the screen.

Due to a design quirk, the 6502 CPU cannot be paused instantly: The VIC needs to send a signal three cycles ahead of time, during which the CPU can still execute memory write cycles, but will halt as soon as it tries to read.

Thus, for instance, if Sprites 2 and 6 are enabled, the cycles marked blue in the following diagram are unavailable to the CPU, but the green cycles are available for writing only:

Each CPU instruction can involve a combination of read and write cycles, so these write-only cycles can easily throw off carefully timed code. That is why I mention towards the end of the video that there's a special case when write cycles in the rastercode coincide with the start of sprite DMA. But for the following discussion, let's pretend that the CPU is only performing read cycles.

Once the CPU has been paused, the VIC chip has access to the bus for as long as it wants; there's no need for another three-cycle warning signal. In order to fetch pixels for Sprite 0 and Sprite 1, for instance, it is sufficient to steal a total of seven cycles:

Now suppose Sprite 0 and Sprite 2 are enabled but Sprite 1 isn't. The VIC chip can't release the bus in between the sprite fetches, because there wouldn't be enough time for another three-cycle warning period. The in-between cycles are coloured yellow in the following diagram, where 54 clock cycles per rasterline remain for the CPU:

In other words, on any particular rasterline where Sprite 0 and Sprite 2 are both enabled, cycles 60 and 61 are stolen whether Sprite 1 is enabled or not. In this way we can have stable timing with a moving sprite, as long as it's flanked by two sprites that are constantly on.

The trick doesn't work for Sprites 0 and 7 because they don't have two neighbours. The best we can achieve is three freely moving sprites, e.g. number 1, 3, and 5, protected by four sprites that are always on, in this case number 0, 2, 4, and 6:
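
The cycle accounting in this section can be reproduced with a small model. The following Python sketch is only an illustration, not code from the demo. It assumes that every sprite owns a fixed two-cycle slot, with Sprite 0 at cycles 58 and 59 (consistent with Sprite 1 occupying cycles 60 and 61), that the slots wrap around into the next rasterline, and that the CPU performs only read cycles, so the three warning cycles count as lost. All function names are made up for this example.

```python
CYCLES_PER_LINE = 63     # PAL C64
FIRST_FETCH_CYCLE = 58   # assumption: Sprite 0 is fetched in cycles 58-59
WARNING_CYCLES = 3       # the CPU must be warned three cycles ahead


def wrap(cycle):
    """Map a cycle number onto the range 1..63."""
    return (cycle - 1) % CYCLES_PER_LINE + 1


def fetch_cycles(sprite):
    """The two cycles in which the pixels of one sprite are fetched."""
    first = FIRST_FETCH_CYCLE + 2 * sprite
    return {wrap(first), wrap(first + 1)}


def stolen_cycles(enabled):
    """Cycles in which a CPU that only reads cannot run."""
    stolen = set()
    for sprite in enabled:
        first = FIRST_FETCH_CYCLE + 2 * sprite
        # Three warning cycles followed by the two fetch cycles.
        for cycle in range(first - WARNING_CYCLES, first + 2):
            stolen.add(wrap(cycle))
    return stolen


def cpu_cycles_left(enabled):
    return CYCLES_PER_LINE - len(stolen_cycles(enabled))


print(cpu_cycles_left({0, 1}))   # 56: seven cycles stolen
print(cpu_cycles_left({0, 2}))   # 54: nine cycles stolen

# A sprite flanked by two enabled neighbours does not change the timing.
guards = {0, 2, 4, 6}
print(stolen_cycles(guards) == stolen_cycles(guards | {1, 3, 5}))   # True
```

In this model the warning window of each guard sprite overlaps the slot of the sprite before it, which is why the flanked sprites can be switched on and off without altering the set of stolen cycles.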

Sprite priorities

One thing that I left out entirely in the explanation video is the issue of sprite priorities. The VIC chip allows you to position sprites in front of or behind the character graphics; in this demo they're always in front. But there's also a fixed priority order between the sprites themselves: Lower-numbered sprites appear in front of higher-numbered sprites.

In the part where the side borders are open, for instance, the four static sprite columns are rendered using the odd-numbered sprites, while the three freely-moving digits are mapped to Sprites 2, 4, and 6. But the freely-moving digits must go in front of the columns! How is that possible?

Note first of all that none of the swirling spirits actually overlap the wizard in this part of the demo. He can safely be rendered using Sprites 1 and 3. The black borders, on the other hand, are assigned to Sprites 5 and 7. Therefore, freely-moving Sprite 6 would go behind one of the black columns—it happens to be the one on the right-hand side—so the multiplexer needs to ensure that this hardware sprite is only used for digits that appear towards the left-hand side of the screen.

In other parts of the demo, as the swirling spirits rotate I have to constantly adjust the mapping between digits and hardware sprites in order to maintain a consistent front-to-back order.

And here's a subtle difficulty worth mentioning: As discussed in the video, when the circle flattens and the sprites end up covering the same rasterlines, I switch to a routine that displays one of the digits using character graphics. But in fact, I have to perform the switch a little bit earlier, when the sprites are still vertically separated. Consider the following situation:

Here, the top and bottom digits (9 and 4) do not cover the same rasterlines, so normal multiplexing should work in principle. But the “4” is conceptually closer to the camera than the “2”, and the “2” is closer than the “9”. So we would have to pick one of the middle hardware sprites for the “2”, one that can have other digits both in front of and behind itself.

As the nearly-flat circle spins around, it's not trivial to assign hardware sprites to digits in a way that preserves the desired priority order, as this involves checking which digits would overlap based on their horizontal position on the screen. I'm pretty sure that a valid mapping can always be found when the circle is this wide, but in the end, switching to the character-based routine earlier was a much cleaner solution.
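
To illustrate why the assignment is not trivial, here is a brute-force sketch in Python. It is not the multiplexer used in the demo. The `Digit` type, the coordinates, and the unexpanded 24×21 sprite size are assumptions chosen for the example. Two rules are checked: digits that share rasterlines need different hardware sprites, and where two digits overlap on screen, the nearer one needs the lower sprite number.

```python
from itertools import product
from typing import NamedTuple

WIDTH, HEIGHT = 24, 21   # assumed: unexpanded sprite size in pixels


class Digit(NamedTuple):
    name: str
    depth: int   # smaller means closer to the camera
    x: int
    y: int


def share_lines(a, b):
    return abs(a.y - b.y) < HEIGHT


def overlap(a, b):
    return share_lines(a, b) and abs(a.x - b.x) < WIDTH


def find_mapping(digits, sprites=(2, 4, 6)):
    """Return {digit name: hardware sprite}, or None if no mapping is valid."""
    for choice in product(sprites, repeat=len(digits)):
        valid = True
        for i in range(len(digits)):
            for j in range(i + 1, len(digits)):
                a, b = digits[i], digits[j]
                if share_lines(a, b) and choice[i] == choice[j]:
                    valid = False   # one sprite cannot show both digits
                if overlap(a, b):
                    near, far = (i, j) if a.depth < b.depth else (j, i)
                    if choice[near] >= choice[far]:
                        valid = False   # nearer digit would end up behind
        if valid:
            return {d.name: s for d, s in zip(digits, choice)}
    return None


# The situation described above: "9" on top, "4" at the bottom, "2" between.
digits = [Digit("9", 2, 100, 100), Digit("2", 1, 104, 115), Digit("4", 0, 108, 130)]
print(find_mapping(digits))                   # {'9': 6, '2': 4, '4': 2}
print(find_mapping(digits, sprites=(2, 4)))   # None
```

With three free sprites the only valid mapping puts the “2” on the middle one. An exhaustive search like this would be far too slow on the real machine, which is one way to see why falling back to the character-based routine is attractive.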

The crunched sprites

During the upper-border part, four sprites are extended vertically using sprite crunch to stabilize the timing. I've written about advanced sprite crunch before, but this time the use case is more straightforward: The sprites are crunched once, just after they appear on the screen, to upset the internal address counters to an unaligned value. As the sprites are displayed, the counters are incremented by three on each rasterline, but the misalignment makes the counter wrap around twice before reaching the end of the sprite.

This results in triple-height sprites, without any need for costly updates to the sprite Y-position registers during the timing-critical part of the code. However, if the sprites were to be left entirely unsupervised, they would repeat the same graphics three times (staggered by one byte at each wraparound). There is thus a need to update the sprite pointers as we go.
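
The effect of a misaligned counter can be checked with a simplified model. This Python sketch assumes a 6-bit counter that advances by three after every row, and that the sprite ends when the counter reads exactly 63. The starting value 2 is an illustrative choice; the actual value left behind by the crunch is not stated here.

```python
def displayed_rows(start):
    """Return (rows, wraps) for a sprite whose 6-bit counter starts at `start`."""
    counter = start & 63
    rows = 0
    wraps = 0
    while True:
        rows += 1            # one row of pixels is displayed
        counter += 3         # three bytes consumed per row
        if counter == 63:    # assumed end-of-sprite condition
            return rows, wraps
        if counter > 63:     # the counter is only six bits wide
            counter -= 64
            wraps += 1


print(displayed_rows(0))   # (21, 0): a normal sprite
print(displayed_rows(2))   # (63, 2): triple height, two wraparounds
```

Starting from 2, the counter passes through 2, 5, ... 62, then 1, 4, ... 61, then 0, 3, ... 60, which matches the one-byte stagger at each wraparound.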

The rastercode of “The Device” is already quite cramped, and there is no chance of updating four sprite pointers separately. Instead we use the fact that the sprite pointers are stored in a part of the video memory normally used for character graphics. The location of this memory is controlled by banking register $d018. By preparing different sets of sprite pointers in different memory banks, we can update all pointers with a single register write.
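
As a sketch of the idea, the following Python model uses the standard VIC-II layout: the upper four bits of $d018 select a 1 KB video matrix within the current 16 KB VIC bank, and the eight sprite pointers sit in its last eight bytes. The choice of matrices, pointer values, and sprite numbers below is arbitrary and only serves the illustration.

```python
POINTER_OFFSET = 0x3F8   # sprite pointers: last 8 bytes of the video matrix


def video_matrix_base(d018):
    """Address of the video matrix within the 16 KB VIC bank."""
    return ((d018 >> 4) & 0x0F) * 0x400


def sprite_pointers(memory, d018, sprites):
    base = video_matrix_base(d018) + POINTER_OFFSET
    return [memory[base + n] for n in sprites]


memory = bytearray(0x4000)   # one VIC bank
crunched = (0, 1, 2, 3)      # hypothetical choice of four sprites

# Prepare three pointer sets ahead of time, one per video matrix.
for matrix in (1, 2, 3):
    for n in crunched:
        address = matrix * 0x400 + POINTER_OFFSET + n
        memory[address] = 0x80 + 4 * matrix + n   # arbitrary pointer values

# A single write to $d018 now selects a whole set at once.
for d018 in (0x10, 0x20, 0x30):
    print(hex(d018), sprite_pointers(memory, d018, crunched))
```

Note that $d018 also selects the character generator location in its lower bits, so in this model those bits would have to be kept the same in every value written.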

We still have to squeeze that single write into the cramped rastercode somehow, and the opportunity for this is baked into the idle-graphics pattern. Notice how the pattern doesn't change on every rasterline; some lines are duplicates. Hence there are some lines where we don't need to update the ghostbyte from the previous line—and there we can update the banking register instead.

Specifically, we update the sprite pointers once near the top of the device, and once near the bottom of it. This divides the crunched sprites into three sections, and ensures that a unique row of pixels is fetched for each rasterline. But the idle-graphics pattern moves up and down during the effect, whereas the crunched sprites remain stationary. In other words, the seams—the places where sprite pixels are suddenly fetched from a different buffer—also move.

So as the CPU is preparing pixel data for the crunched sprites, it needs to switch buffers at just the right time depending on the current vertical position of the device. Large stretches of data can be copied as chunks, but between the chunks are sudden changes in target address: Twice for the timed buffer-switches, and twice when the address counter wraps around modulo 64. To make this efficient, I use a precomputed table of chunks for each possible device Y-position. Since the timing is identical for the four sprites, some cycles can be saved by decoding each table entry once, and then performing the copying for all four target buffers in one go.
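
A chunk table of this kind could be generated along the following lines. This is a hedged Python sketch, not the demo's actual tool: it assumes 63 displayed rows, a 6-bit counter that starts at 2 and advances by three per row, and hypothetical row numbers for the two buffer switches. It tracks only the offset at which each row starts.

```python
def build_chunks(start, switch_rows, total_rows=63):
    """Split the rows into chunks of (first_row, row_count, buffer, offset).

    Within a chunk, consecutive rows go to the same buffer and to
    addresses that increase by three per row.
    """
    chunks = []
    counter = start & 63
    buffer_index = 0
    for row in range(total_rows):
        if row in switch_rows:
            buffer_index += 1   # the pointers change on this row
        if chunks:
            first, count, buf, offset = chunks[-1]
            if buf == buffer_index and offset + 3 * count == counter:
                chunks[-1] = (first, count + 1, buf, offset)
                counter = (counter + 3) & 63
                continue
        chunks.append((row, 1, buffer_index, counter))
        counter = (counter + 3) & 63
    return chunks


# Hypothetical relation between device Y-position and the switch rows.
table = {y: build_chunks(2, {y + 10, y + 50}) for y in range(8)}

for chunk in table[0]:
    print(chunk)
```

For `table[0]` the rows fall into five chunks, separated by the two buffer switches and the two wraparounds. Decoding one such entry and then copying into all four sprite buffers corresponds to the saving described above.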

Horizontal alignment: The nitty-gritty bits

The horizontal scroll position of text characters (and therefore also the ghostbyte) is controlled using three bits in VIC register $d016. Position 0 is far to the left, and position 7 is far to the right. However, when you split the background by updating VIC register $d021 in the middle of a rasterline, the change is actually delayed by one pixel to line up with horizontal scroll position 1.

On the newer version of the VIC chip, the 8565, colour splits are always accompanied by a tiny grey dot (tiny as in half a pixel wide). This is annoying. When splits line up vertically, the dot is naturally extended into a thin vertical stripe. This would completely shatter the illusion of Nine—the grey dots and stripes have to be hidden!

Recall that the background splits are covered by the gaps between digits in the sprite graphics, and it seems reasonable to cover the grey dots in the same way. The gaps in the sprite graphics are aligned with the text characters (ghostbytes), but not necessarily with the processor clock cycles. Seven out of eight times, they cover the grey stripes just fine.

But for horizontal scroll position 0, the delayed colour change and the grey stripe end up just to the right of the covered gap. In this case, we need to perform the split one cycle earlier relative to the sprites. At first glance, that may seem like it could be handled by jumping to the rastercode one cycle earlier.

However, as you'll recall from the explanation video, there's more going on in the timed code than just background splits. We have to switch video mode at just the right moment to effect a ghostbyte change that's aligned with the left edge of the first digit. That part of the code can't run a cycle earlier, or we'd expose the ghostbyte pattern a whole character too early.

Therefore we need two versions of the timed code, with a one-cycle difference in the relative timing of the background splits and the change of video mode. When the device is at the troublesome horizontal scroll position, we jump to the alternative version of the code.

But there's more: Different versions of the VIC chip have a one-cycle difference in the pixel pipeline delay. Therefore, we need to detect what version of the VIC chip is used (this is done at the start of the demo with the help of sprite collision detection), and adjust the relative timing by one more cycle as the case may be. Again this calls for two versions of the code.

These single-cycle adjustments add up or cancel out, however you want to put it: We end up needing a total of three versions of the timed code. Here are the relevant passages side by side:

; X is $70

nop             ; 13     bit 0       ; 13     bit 0           ; 13
sta $d011       ; 15     sta $d011   ; 16     sta $d011-$70,x ; 16
nop             ; 19     nop         ; 20  
sta $d021-$70,x ; 21     sta $d021   ; 22     sta $d021-$70,x ; 21
sty $d021       ; 26     sty $d021   ; 26     sty $d021       ; 26
stx $d021       ; 30     stx $d021   ; 30     stx $d021       ; 30
stx $d011       ; 34     stx $d011   ; 34     stx $d011       ; 34
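
The difference between the three versions becomes clearer when the actual write cycles are worked out. The Python sketch below assumes that the numbers in the comments are the cycles on which each instruction starts, and uses standard 6502 timings: two cycles for nop, three for a zero-page bit, four for an absolute store, five for an absolute,X store, with the write happening in the last cycle of the store. The names `LEFT`, `MIDDLE`, and `RIGHT` simply refer to the three columns.

```python
LENGTH = {"nop": 2, "bit_zp": 3, "store_abs": 4, "store_abs_x": 5}

# Each entry is (instruction kind, register written or None).
LEFT = [("nop", None), ("store_abs", "$d011"), ("nop", None),
        ("store_abs_x", "$d021"), ("store_abs", "$d021"),
        ("store_abs", "$d021"), ("store_abs", "$d011")]
MIDDLE = [("bit_zp", None), ("store_abs", "$d011"), ("nop", None),
          ("store_abs", "$d021"), ("store_abs", "$d021"),
          ("store_abs", "$d021"), ("store_abs", "$d011")]
RIGHT = [("bit_zp", None), ("store_abs_x", "$d011"),
         ("store_abs_x", "$d021"), ("store_abs", "$d021"),
         ("store_abs", "$d021"), ("store_abs", "$d011")]


def write_cycles(version, first_cycle=13):
    """Return a list of (register, cycle of the write) for one version."""
    writes = []
    cycle = first_cycle
    for kind, register in version:
        length = LENGTH[kind]
        if register is not None:
            writes.append((register, cycle + length - 1))
        cycle += length
    return writes


for name, version in (("left", LEFT), ("middle", MIDDLE), ("right", RIGHT)):
    print(name, write_cycles(version))
```

Under these assumptions the first write to $d011 lands on cycle 18, 19, or 20, while every other write (the three $d021 splits on cycles 25, 29, and 33, and the final $d011 write on cycle 37) stays put in all three versions.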

Astute readers may have noticed that I'm using one of the invalid video modes ($70) rather than plain ECM ($50) in the above code, despite the claim in the video that invalid video modes are not the answer. Mode $70 behaves like ECM as far as the ghostbyte is concerned, and that's crucial to making the effect work. But when we select a horizontal scroll position other than zero, a gap appears along the left edge of the screen, before the first ghostbyte, where the current background colour shines through. Depending on the current X-position of the device, the background colour may have already been set to the colour of the first digit. The invalid video mode serves to cover that gap.
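
For reference, the two values differ in a single bit. The Python sketch below decodes the mode bits of $d011 according to the standard VIC-II register layout (bit 6 is ECM, bit 5 is BMM, bit 4 is DEN) and treats a mode as invalid when ECM is combined with BMM or with the MCM bit of $d016. The helper names are made up for this example.

```python
ECM = 0x40   # extended colour mode, $d011 bit 6
BMM = 0x20   # bitmap mode, $d011 bit 5
DEN = 0x10   # display enable, $d011 bit 4
MCM = 0x10   # multicolour mode, $d016 bit 4


def decode_d011(value):
    return {
        "ecm": bool(value & ECM),
        "bmm": bool(value & BMM),
        "den": bool(value & DEN),
    }


def is_invalid_mode(d011, d016=0):
    """ECM together with BMM or MCM is one of the invalid modes."""
    ecm = bool(d011 & ECM)
    return ecm and (bool(d011 & BMM) or bool(d016 & MCM))


print(decode_d011(0x50), is_invalid_mode(0x50))
print(decode_d011(0x70), is_invalid_mode(0x70))
```

So $50 is ECM with the display enabled, and $70 is the same value with the bitmap bit set as well.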

Related work

I've described how a combination of timed code, expanded sprites, and idle graphics is at the heart of the impossible-looking effect. This relies in part on an established technique called ghostbyte shine-through that has been used for scrolltexts as far back as 1992. Trident has written an excellent document about it.

Ghostbyte shine-through has been refined and developed over the years, notably in the 2024 demo The Ghost by Fairlight and Genesis Project (YouTube).

Final musings

In my opinion, good magic subverts Occam's razor: We all like simple explanations, and Occam's razor is the principle that when you have two competing explanations for the same phenomenon, the simpler explanation is objectively better. The canonical example is that the Earth revolves around the sun instead of the other way around.

But this is only true when the two theories have equal explanatory power. If a simple theory only explains 99% of what you see, then it is not as good as a complicated theory that covers everything. But we are still drawn to the simple theory.

So the art of magic (and impossible demo-effect making) is to create a complicated reality that's easier to categorise as a simple thing with a bit of magic, than a complicated thing without magic. Our mind will cling to the simple theory as a working hypothesis and refuse to look beyond it, and that makes us vulnerable to deception—which, in magic, is a good thing!

Posted Friday 28-Feb-2025 09:10

Discuss this page

Disclaimer: I am not responsible for what people (other than myself) write in the forums. Please report any abuse, such as insults, slander, spam and illegal material, and I will take appropriate actions. Don't feed the trolls.

Anonymous
Fri 28-Feb-2025 09:47

Wow, this is like a magic trick in reverse! A thing that at 1st glance seems simple but once explained seems impossible!

Anonymous
Fri 28-Feb-2025 18:57

The balloon sprites in the explanation video at "https://youtu.be/MXxSPgt_7Z4?t=106" are an in-joke: they appear as a tutorial example in a Commodore 64 manual.

Anonymous
Sat 1-Mar-2025 16:10

Linus, where you say "sufficient to steal a total of five cycles" it looks like 7 cycles. Typo or am I just confused?

lft
Linus Åkesson
Sat 1-Mar-2025 21:30

Linus, where you say "sufficient to steal a total of five cycles" it looks like 7 cycles. Typo or am I just confused?

You're right! My error. I've fixed it now.

Stein
Wed 12-Mar-2025 15:44

Fantastic! I have only one question: Did you actually create a cross-stitch embroidery of the idle pattern for this write-up?

AtoMick-u235
Mick Bartram (UK)
Sun 13-Apr-2025 13:46

Hey LFT, can you explain? I was just thinking of how you timed it: the numbers leaving/entering the hat are spot on with the beat of the tune, and so is the general animation.

You must have used the note data as some kind of reference?!

lft
Linus Åkesson
Thu 17-Apr-2025 12:34

AtoMick-u235 wrote:

Well, yes and no. The song position does indeed trigger certain transitions in time with the music. But that isn't the whole explanation:

Normally on 8-bit systems you work with tables of 256 entries, for instance to represent sine curves. In this demo I chose something that was divisible by nine instead: My sine tables are 252 entries long, which is 9x28. To get a reasonable speed for the big rotating circle, I ended up going through the table two steps per video frame, which gives one whole revolution per 126 video frames. Thus, a digit moves to the "next position" in 14 frames.

To guarantee a good music sync, it was then natural to use tempo 7 as a basis for the music. This means, simplified, that new notes can be played every 7 frames. Since the music is in 9/8 time in groups of three notes per beat, that gives 21 frames per beat. In beats per minute, that's 60*50.12/21 = 143.2 bpm.

However, when the digits appear from the hat, each new digit must fill the slot immediately after the previous digit, and by that time the great circle has rotated 10/9 of a revolution. That is 252*10/9 = 280 frames later than the previous appearance. In tempo 7, that's 280/7 = 40 steps in the music. And so, for that part of the music, I went for a theme in 5/8 time. You can hear that the ostinato is a repeating motif of 20 quick notes; the motif repeats twice per digit, giving exactly 40 notes per digit.

I hope that gives a bit more insight into the process!

lft
Linus Åkesson
Thu 17-Apr-2025 12:38

Stein wrote:

Fantastic! I have only one question: Did you actually create a cross-stitch embroidery of the idle pattern for this write-up?

The thought did cross my mind, but I was lazy. =) I took a photo of an existing embroidery, cut out "on" and "off" squares and made them tileable in gimp, and then I wrote a program to create a mosaic from a given picture.

Anonymous
Fri 25-Jul-2025 10:22

FMode:
Did you start coding with the "9 sprites moving in the upper area" part?

lft
Linus Åkesson
Mon 28-Jul-2025 17:46