But with a fraction of the CPU resources. The Arduino Nano's Cortex-M33 is overclocked to 135 MHz, while the GBA's ARM7TDMI runs at a mere 16.78 MHz.
The ARM7TDMI takes 1-4 cycles to perform a simple 32-bit x 32-bit multiply, depending on the value of the multiplier. I believe the Cortex-M33 takes just 1 cycle to do the same. The ARM7TDMI has no divide instruction and, critically, no FPU, which Quake requires.
The GBA has only 32 kB of zero-wait-state RAM (AKA internal working RAM), versus 276 kB on the Arduino Nano.
The GBA's 256 kB RAM block (external working RAM) has a massive 6-cycle access time when loading a 32-bit value.
It's a true miracle that someone managed to get even 1/3 of the resolution on such weak hardware!
I think the article says the same. The GBA port is impressive.
I guess an FPU wouldn't even be required at a horizontal resolution of 120 pixels.
The CM33 does even more in a single cycle: for instance, two 16-bit multiplications, with the products added together and accumulated.
Still, it is the first time the "full" Quake has been ported to run in less than 300 kB.