If you assume that A * 10 isn't going to overflow (i.e. A <= 25), then each ASL A shifts a 0 into the carry flag, so there's no need for CLC. Then, instead of using the undocumented RRA opcode, you can just do:
sta $00    ; save the original A
asl a      ; A * 2
asl a      ; A * 4, carry still clear
adc $00    ; A * 4 + A = A * 5
asl a      ; A * 10
This is also 7 bytes, but is faster since adc $00 is 3 cycles, vs rra $00 being 5 cycles.
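For anyone who wants to check the carry assumption without firing up an emulator, here is a rough Python model of the sequence above. The helper names (asl, adc, times_ten) are made up for illustration, and it only models the accumulator and the carry flag in binary mode, not the other flags or decimal mode.

```python
def asl(a):
    """Model of 6502 ASL A: shift left by one, old bit 7 goes to carry."""
    return (a << 1) & 0xFF, (a >> 7) & 1


def adc(a, operand, carry):
    """Model of 6502 ADC (binary mode): A + operand + carry, 8-bit result plus carry out."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def times_ten(a):
    """Model of: sta $00 / asl a / asl a / adc $00 / asl a. Returns (A, carry)."""
    zp = a                        # sta $00
    a, carry = asl(a)             # asl a   -> A * 2
    a, carry = asl(a)             # asl a   -> A * 4
    a, carry = adc(a, zp, carry)  # adc $00 -> A * 5 if carry was clear
    a, carry = asl(a)             # asl a   -> A * 10
    return a, carry


# Every input whose product fits in 8 bits gives the right answer with carry clear.
for value in range(26):
    assert times_ten(value) == (10 * value, 0)
```

The incoming carry never matters, because the first ASL overwrites it before the ADC reads it.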
The A = max(A, X) example is certainly interesting, but not very useful since it loops through the code twice (very slow) and assumes that $8a is available. The much faster obvious version only adds one byte:
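stx $00    ; stash X where CMP can reach it
cmp $00    ; carry set if A >= X (unsigned)
bcs done   ; A is already the max
txa        ; otherwise A = X
done: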
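The same kind of sanity check works for the five-line version above. This is only a sketch: max_a_x is a hypothetical name, and it models nothing but the carry result of CMP, which is an unsigned comparison, so this is an unsigned max.

```python
def max_a_x(a, x):
    """Model of: stx $00 / cmp $00 / bcs done / txa / done:. Returns the final A."""
    zp = x                         # stx $00
    carry = 1 if a >= zp else 0    # cmp $00 sets carry when A >= operand (unsigned)
    if not carry:                  # bcs done skips the txa when carry is set
        a = x                      # txa
    return a


# Exhaustive check over all 8-bit register values.
for a_val in range(256):
    for x_val in range(256):
        assert max_a_x(a_val, x_val) == max(a_val, x_val)
```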
Interesting and fun read - we are well into the terrain of what was completely impossible to do back then. Now I can't wait to see a faster AppleSoft ROM ;-)
That’s incredibly clever and a fun read. Well done!
I imagine lots of demo coders glancing back and forth between that writeup and their own carefully hand-tuned assembly.
Great
Reminds me a bit of https://pubby.games/codegen.html, just that its approach seems way more refined and useful.