logoalt Hacker News

fsmvlast Monday at 6:51 PM2 repliesview on HN

They memorize the answers not the process to arrive at answers


Replies

EternalFurylast Monday at 8:59 PM

They learn the value of specific actions in specific contexts based on the rewards they received during their play time. Specific actions and specific contexts are not transferable for various reasons. John quoted that varying frame rates and variable latency between action and effect really confuse the models.

show 1 reply
IshKebablast Monday at 7:00 PM

This has been disproven so many times... They clearly do both. You can trivially prove this yourself.

show 1 reply