Hacker News

nickpsecurity · 05/15/2025

Non-expert here who likes reading lots of this kind of research. I have a few questions.

1. Why does it need a zeroth-order optimizer?

2. Most GAs I've seen use thousands of solutions, sometimes ten thousand or more. What leads you to use 60,000 calls per iteration?

3. How do you use populations and "islands"? I never studied islands.

4. You said the smaller models are often better for "shorter" code. That makes sense. I've seen people extend a model's context with additional training passes. Do you think it would help to similarly shrink a larger model to a smaller context instead of using the small models?


Replies

aseg · 05/15/2025

Happy to answer them!

1. Because we only have black-box access to the LLM, and the evaluation function might not be differentiable.
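
Roughly, in Python (a minimal sketch assuming a generic black-box scorer, not our actual code), a zeroth-order step only ever calls the objective and never differentiates it:

    # Minimal sketch: a zeroth-order step needs only evaluations of
    # the black box, never its gradient. evaluate() stands in for any
    # non-differentiable scorer (e.g., an LLM-judged fitness).
    def zeroth_order_step(candidates, evaluate):
        ranked = sorted(candidates, key=evaluate, reverse=True)
        # Keep the better half as parents for the next round.
        return ranked[: max(1, len(ranked) // 2)]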

2. We're trying to search over the space of all programs in a programming language. To cover enough of this huge search space, we need (1) a large number of programs in each population, (2) a large number of populations, and (3) a large number of update steps for each population.
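
As a back-of-envelope illustration (all numbers below are made up for the example, not our actual settings), the budget multiplies out fast:

    # Illustrative numbers only, not the paper's configuration.
    num_islands = 10   # independent populations
    pop_size    = 100  # candidate programs per island
    steps       = 60   # update steps per iteration
    print(num_islands * pop_size * steps)  # 60,000 evaluation calls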

3. I have a couple of graphics motivating, conceptually, what an island/population looks like: https://trishullab.github.io/lasr-web/. This paper might also be useful: https://arxiv.org/abs/2305.01582
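
In code, the island idea is roughly the following (a conceptual sketch, not LaSR's implementation; evaluate and mutate are placeholder callables): islands evolve independently, and champions occasionally migrate between them so good candidates spread without collapsing global diversity.

    # Conceptual island-model sketch; not code from our repo.
    def evolve(pop, evaluate, mutate):
        # One generation: mutate everyone, keep the fitter half.
        children = [mutate(p) for p in pop]
        ranked = sorted(pop + children, key=evaluate, reverse=True)
        return ranked[: len(pop)]

    def island_search(islands, evaluate, mutate, gens, migrate_every=5):
        for g in range(gens):
            islands = [evolve(pop, evaluate, mutate) for pop in islands]
            if g % migrate_every == 0:
                # Ring migration: each island's champion replaces the
                # weakest member of the next island.
                champs = [pop[0] for pop in islands]
                for i in range(len(islands)):
                    islands[i][-1] = champs[i - 1]
        return islands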

4. This is an interesting question. I believe so. However, my observations were derived from a non-Turing-complete language (mathematical equations). There might be other ways of enforcing succinctness pressure.
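
For example, one hypothetical alternative is to fold a length penalty straight into the fitness function, so longer candidates have to earn their extra length with better raw scores:

    # Hypothetical succinctness pressure; alpha is illustrative.
    def penalized_fitness(raw_score, program_text, alpha=0.01):
        return raw_score - alpha * len(program_text)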