> Testing GPT-5, Claude, Gemini, Grok, and DeepSeek with $100K each over 8 months of backtested trading
So the results are meaningless - if the backtest period falls inside the models' training data, these LLMs effectively have foresight over the market outcomes.
> We time segmented the APIs to make sure that the simulation isn’t leaking the future into the model’s context.
I wish they would explain what this actually means.
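My guess is it means every data feed is gated on the simulated clock, so a query at simulated time t can only ever return records timestamped at or before t. A rough sketch of that idea (the class and the data are made up, not from the article):

```python
from datetime import datetime, timezone

class TimeSegmentedFeed:
    """Hypothetical sketch: gate a data source on the simulation clock
    so the model's context can never contain the future."""

    def __init__(self, records):
        # records: list of (timestamp, payload), sorted by timestamp
        self.records = sorted(records, key=lambda r: r[0])

    def query(self, sim_now: datetime):
        # Return only records at or before the simulated "now".
        return [payload for ts, payload in self.records if ts <= sim_now]

# Dummy data for illustration only.
feed = TimeSegmentedFeed([
    (datetime(2024, 1, 2, tzinfo=timezone.utc), {"AAPL": 100.0}),
    (datetime(2024, 6, 3, tzinfo=timezone.utc), {"AAPL": 120.0}),
])

# At a simulated date in March, the June record is invisible to the model.
context = feed.query(datetime(2024, 3, 1, tzinfo=timezone.utc))
```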
That's only a problem if they were trained on data more recent than the start of the 8-month backtest window.
Not sure how sound the analysis is, but they apparently did think of that:
> We were cautious to only run after each model’s training cutoff dates for the LLM models. That way we could be sure models couldn’t have memorized market outcomes.
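Whether they got the dates right is another question, but the gate itself is just a date comparison, something like this (the cutoff dates below are made-up placeholders, not the models' actual cutoffs):

```python
from datetime import date

# Illustrative cutoff dates only -- the real ones vary by model and version.
TRAINING_CUTOFFS = {
    "gpt-5": date(2024, 10, 1),
    "claude": date(2024, 11, 1),
}

def backtest_window_is_unseen(model: str, backtest_start: date) -> bool:
    # The entire backtest must start after the model's training cutoff,
    # otherwise it may have memorized the market outcomes.
    return backtest_start > TRAINING_CUTOFFS[model]

assert backtest_window_is_unseen("gpt-5", date(2025, 1, 1))
```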