Hacker News

sethops1 · yesterday at 11:13 PM · 4 replies

> Testing GPT-5, Claude, Gemini, Grok, and DeepSeek with $100K each over 8 months of backtested trading

So the results are meaningless - these LLMs have the advantage of foresight over historical data.


Replies

PTRFRLL · yesterday at 11:14 PM

> We were cautious to only run after each model’s training cutoff dates for the LLM models. That way we could be sure models couldn’t have memorized market outcomes.

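For reference, here is a minimal sketch of what that kind of check could look like, assuming hypothetical cutoff dates and a validation helper of my own naming; none of the dates or names below come from the article:

```python
from datetime import date

# Hypothetical training cutoff dates; illustrative only, not the article's values.
TRAINING_CUTOFFS = {
    "gpt-5": date(2024, 10, 1),
    "claude": date(2024, 11, 1),
    "gemini": date(2024, 11, 1),
    "grok": date(2024, 12, 1),
    "deepseek": date(2024, 10, 1),
}

def validate_backtest_window(start: date, end: date) -> None:
    """Refuse to run a backtest whose window overlaps any model's training data."""
    latest_cutoff = max(TRAINING_CUTOFFS.values())
    if start <= latest_cutoff:
        raise ValueError(
            f"Backtest starts {start}, but the latest model cutoff is {latest_cutoff}; "
            "the models could have memorized these market outcomes."
        )
    if end <= start:
        raise ValueError("Backtest window is empty.")

# Example: an 8-month window that begins after every cutoff passes the check.
validate_backtest_window(date(2025, 1, 2), date(2025, 9, 1))
```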
itake · yesterday at 11:14 PM

> We time segmented the APIs to make sure that the simulation isn’t leaking the future into the model’s context.

I wish they could explain what this actually means.
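One plausible reading, and this is an assumption rather than anything the article confirms: each data API the agent calls is wrapped so it only returns records timestamped at or before the simulated "now", so prices and news from later dates never enter the model's context. A rough sketch of such a wrapper, with all names hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Bar:
    timestamp: datetime
    close: float

class TimeSegmentedFeed:
    """Wraps a historical dataset so queries never see past the simulation clock."""

    def __init__(self, bars: list[Bar]):
        self._bars = sorted(bars, key=lambda b: b.timestamp)
        self._now: datetime | None = None

    def advance_to(self, now: datetime) -> None:
        # The backtest loop moves the clock forward one step at a time.
        self._now = now

    def history(self, limit: int = 100) -> list[Bar]:
        # Only bars at or before the simulated "now" are visible to the model.
        if self._now is None:
            raise RuntimeError("advance_to() must be called before history()")
        visible = [b for b in self._bars if b.timestamp <= self._now]
        return visible[-limit:]

# Usage idea: the prompt for each simulated trading day is built only from
# feed.history(), so nothing after the simulated date leaks into the context.
```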

joegibbs · yesterday at 11:23 PM

That's only a problem if they're trained on data more recent than 8 months ago.

CPLX · yesterday at 11:14 PM

Not sure how sound the analysis is, but they apparently did think of that.
