They memorize the answers not the process to arrive at answers

fsmv • last Monday at 6:51 PM • 2 replies • view on HN

Replies

EternalFury • last Monday at 8:59 PM

They learn the value of specific actions in specific contexts based on the rewards they received during their play time. Specific actions and specific contexts are not transferable for various reasons. John quoted that varying frame rates and variable latency between action and effect really confuse the models.

➕ show 1 reply

IshKebab • last Monday at 7:00 PM

This has been disproven so many times... They clearly do both. You can trivially prove this yourself.

➕ show 1 reply

alt Hacker News

Replies