Does anyone know what kind of RL environments they are talking about? They mention they used 15k env...

mynti • today at 11:27 AM • 2 replies • view on HN

Does anyone know what kind of RL environments they are talking about? They mention they used 15k environments. I can think of a couple hundred maybe that make sense to me, but what is filling that large number?

Replies

robkop • today at 12:40 PM

Rumours say you do something like:

  Download every github repo
    -> Classify if it could be used as an env, and what types
      -> Issues and PRs are great for coding rl envs
      -> If the software has a UI, awesome, UI env
      -> If the software is a game, awesome, game env
      -> If the software has xyz, awesome, ...
    -> Do more detailed run checks, 
      -> Can it build
      -> Is it complex and/or distinct enough
      -> Can you verify if it reached some generated goal
      -> Can generated goals even be achieved
      -> Maybe some human review - maybe not
    -> Generate goals
      -> For a coding env you can imagine you may have a LLM introduce a new bug and can see that test cases now fail. Goal for model is now to fix it
    ... Do the rest of the normal RL env stuff

➕ show 1 reply

yorwba • today at 12:23 PM

Every interactive system is a potential RL environment. Every CLI, every TUI, every GUI, every API. If you can programmatically take actions to get a result, and the actions are cheap, and the quality of the result can be measured automatically, you can set up an RL training loop and see whether the results get better over time.

➕ show 1 reply

alt Hacker News

Replies