But is the model aware of the training? Unless you hook the model up to an MCP server, or something similar, and have it analyze the RL changes, it will not know if it has changed or not. Even if it is real-time RL, it is not aware of the previous state.