If you want to experiment with hardcoding small programs into transformer weights, maybe try ALTA: https://arxiv.org/abs/2410.18077v2
I'm less interested in turning programs into transformers and more interested in turning programs into subnetworks within large language models.
The blog post brings that up as a research direction but never actually elaborates on it, and the interface between the two is a hard problem.
I'll check out the link though, thanks.