I find myself wanting genetic algorithms applied to developing and improving these structures...
But then, I always want genetic algorithms to show up in any discussion about neural networks...
I've been messing around with GAs recently, especially indirect encoding methods. This paper seems to support perspectives I've come across while researching: in particular, that you can decompose weight matrices into spectral patterns - similar to JPEG compression - and search in the compressed space.
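A minimal sketch of how I picture that, assuming a 2-D DCT as the spectral transform (the paper's actual encoding may well differ, and the shapes and k here are made up):

    import numpy as np
    from scipy.fft import dctn, idctn

    rng = np.random.default_rng(0)

    def encode(weights, k):
        # Keep only the k x k low-frequency DCT coefficients -
        # the same trick JPEG uses to discard fine detail.
        return dctn(weights, norm='ortho')[:k, :k]

    def decode(coeffs, shape):
        # Zero-pad back to full size and invert the transform.
        full = np.zeros(shape)
        full[:coeffs.shape[0], :coeffs.shape[1]] = coeffs
        return idctn(full, norm='ortho')

    def mutate(coeffs, sigma=0.01):
        # The search happens in compressed space: perturb a few
        # hundred coefficients instead of tens of thousands of weights.
        return coeffs + rng.normal(0.0, sigma, coeffs.shape)

    W = np.random.randn(256, 256)   # stand-in for a real weight matrix
    genome = encode(W, k=16)        # 256 genes instead of 65536
    child_W = decode(mutate(genome), W.shape)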
Something I've been wondering about recently: would it be possible to encode a known-good model - some massive pretrained thing - and use that as a starting point for further mutations?
As some other comments in this thread have suggested, that would mean we could distill the weight patterns of things like attention and convolution instead of having to discover them by mutation - making use of the many PhD-hours it took to develop those patterns and using them as a springboard. If papers like this are to be believed, more advanced mechanisms might then be discoverable.
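Concretely (and very hand-wavily), that would mean seeding the population with the encoding of a pretrained layer rather than random noise - reusing the encode/mutate helpers from the sketch above; pop_size and sigma are numbers I pulled out of the air:

    def seed_population(pretrained_W, k=16, pop_size=50, sigma=0.001):
        # Start the search at a known-good point: the unmodified
        # elite plus small perturbations around it.
        elite = encode(pretrained_W, k)
        return [elite] + [mutate(elite, sigma) for _ in range(pop_size - 1)]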
I got crazy obsessed with EvoLisa¹ back in the day, and although that algorithm has nothing in common with the ones used to train an LLM, I can't help feeling they're similar.
¹ https://www.rogeralsing.com/2008/12/07/genetic-programming-e...
That would be an excellent use of GAs and all the other 'not based on training a network' methods, now that we have a target and can evaluate against it!
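One way to read "we have a target": score candidates by how closely they reproduce the reference model's outputs on a batch of probe inputs. A toy sketch - target_fn stands in for the pretrained model, and nothing here is from the paper:

    import numpy as np

    def fitness(candidate_W, target_fn, probes):
        # Higher is better: negative mean squared error between the
        # candidate's outputs and the target model's outputs.
        preds = probes @ candidate_W
        return -np.mean((preds - target_fn(probes)) ** 2)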
I'm the same but with vector quantization.
I have a real soft spot for the genetic algorithm as a result of reading Levy's "Artificial Life" when I was a kid. Given my poor math education, the analogy to biological life is more approachable than neural networks. I can grok crossover and mutation pretty easily. Backpropagation is too much for my little brain to handle.
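For what it's worth, the grokkable core really does fit in a few lines. A toy version of single-point crossover and point mutation on bit-string genomes (names and the mutation rate are made up):

    import random

    def crossover(a, b):
        # Single-point crossover: cut both parents at the same
        # spot and swap the tails.
        point = random.randrange(1, len(a))
        return a[:point] + b[point:], b[:point] + a[point:]

    def mutate(genome, rate=0.01):
        # Flip each bit independently with small probability.
        return [bit ^ 1 if random.random() < rate else bit for bit in genome]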