Hacker News

bcrosby95 yesterday at 11:20 PM

> Grok ended up performing the best while DeepSeek came close to second. Almost all the models had a tech-heavy portfolio which led them to do well. Gemini ended up in last place since it was the only one that had a large portfolio of non-tech stocks.

I'm not an investor or researcher, but this triggers my spidey sense... it seems to imply they aren't measuring what they think they are.


Replies

IgorPartola yesterday at 11:28 PM

Yeah I mean if you generally believe the tech sector is going to do well because it has been doing well you will beat the overall market. The problem is that you don’t know if and when there might be a correction. But since there is this one segment of the overall market that has this steady upwards trend and it hasn’t had a large crash, then yeah any pattern seeking system will identify “hey this line keeps going up!” Would it have the nuance to know when a crash is coming if none of the data you test it on has a crash?

It would almost be more interesting to specifically train the model on half the available market data, then test it on the other half. But here it’s like they added a big free loot box to the game and then said “oh wow the player found really good gear that is better than the rest!”

Edit: from what I casually remember, a hedge fund can beat the market for 2-4 years, but at 10 years and up their chances of beating the market go to very close to zero. Since LLMs have not been around for that long, it is going to be difficult to test this without somehow segmenting the data.
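A minimal sketch of the train-on-one-half, test-on-the-other idea, in Python. Everything here is an assumption for illustration: the tickers and prices are made up, and "training" is reduced to picking whichever ticker did best in the first half. It is not what the study did; it only shows why the split has to be by time and why the second half is the number that matters.

```python
def split_in_time(series, fraction=0.5):
    """Split a price series chronologically into (train, test)."""
    cut = int(len(series) * fraction)
    return series[:cut], series[cut:]

def total_return(prices):
    """Return from the first price to the last price in the list."""
    return prices[-1] / prices[0] - 1.0

def pick_best_in_sample(price_history):
    """Toy 'training': choose the ticker with the best first-half return."""
    train = {t: split_in_time(p)[0] for t, p in price_history.items()}
    return max(train, key=lambda t: total_return(train[t]))

# Made-up prices, for illustration only.
PRICES = {
    "TECH_A": [100, 110, 125, 140, 135, 120, 110, 105],
    "UTIL_A": [100, 101, 102, 103, 104, 106, 108, 110],
}

choice = pick_best_in_sample(PRICES)
_, test = split_in_time(PRICES[choice])
print(choice, round(total_return(test), 3))
```

With these toy numbers, the "line keeps going up" pick looks great in the first half and loses money in the half it never saw.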

olliepro yesterday at 11:24 PM

A sounder approach would have been to do a Monte Carlo simulation where you have 100 portfolios from each model and look at the average performance.
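Roughly what that could look like, as a sketch in Python. The `random_picker` function is a hypothetical stand-in for "ask the model for a portfolio once", and the per-ticker returns are made-up numbers, not data from the study. The point is only the shape of the experiment: many runs per model, then the mean and the spread rather than a single outcome.

```python
import random
import statistics

# Made-up period returns per ticker, for illustration only.
PERIOD_RETURNS = {
    "TECH_A": 0.30,
    "TECH_B": 0.22,
    "BANK_A": 0.05,
    "UTIL_A": -0.02,
    "RETAIL_A": 0.08,
}

def random_picker(rng, k=3):
    """Hypothetical stand-in for one model run: picks k tickers at random."""
    return rng.sample(sorted(PERIOD_RETURNS), k)

def portfolio_return(tickers):
    """Equal-weighted return of the chosen tickers."""
    return statistics.mean(PERIOD_RETURNS[t] for t in tickers)

def monte_carlo(picker, runs=100, seed=0):
    """Build `runs` portfolios with `picker`; return (mean, stdev) of returns."""
    rng = random.Random(seed)
    results = [portfolio_return(picker(rng)) for _ in range(runs)]
    return statistics.mean(results), statistics.stdev(results)

mean_ret, spread = monte_carlo(random_picker)
print(f"mean={mean_ret:.3f} stdev={spread:.3f}")
```

If the spread across runs of one model is as large as the gap between models, a single-portfolio ranking says very little.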

culi today at 1:42 AM

I'd like to see this study replicated during a bear market

tclancy today at 1:24 AM

I mean, run the experiment during a different trend in the market and the results would probably be wildly different. This feels like chartists [1] but lazier.

[1] https://www.investopedia.com/terms/c/chartist.asp

etchalon yesterday at 11:23 PM

I don't feel like they measured anything. They just confirmed that tech stocks in the US did pretty well.

monksy yesterday at 11:48 PM

They're not measuring performance in the context of when things happen and in the time that they are. I think it's only showing recent performance and popularity. To actually evaluate how these do, you need to be able to correct the model and retrain it for different time periods, and then measure how it would do. Then you'll get better information from the backtesting.
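One common way to do "retrain per period, then measure" is a walk-forward backtest. Here is a rough Python sketch under loud assumptions: the prices are invented, and "retraining" is just re-picking the ticker with the best return in each training window. A real setup would retrain or re-prompt the actual model at each step.

```python
def walk_forward(prices_by_ticker, train_len, test_len):
    """Return a list of (start, chosen_ticker, out_of_sample_return),
    one entry per rolling window."""
    n = min(len(p) for p in prices_by_ticker.values())
    results = []
    start = 0
    while start + train_len + test_len <= n:
        train_end = start + train_len
        test_end = train_end + test_len
        # Toy "retrain": re-pick the best performer in this training window.
        chosen = max(
            prices_by_ticker,
            key=lambda t: prices_by_ticker[t][train_end - 1]
            / prices_by_ticker[t][start],
        )
        p = prices_by_ticker[chosen]
        # Hold from the last training price to the last test price.
        results.append((start, chosen, p[test_end - 1] / p[train_end - 1] - 1.0))
        start += test_len
    return results

# Made-up prices, for illustration only.
PRICES = {
    "TECH_A": [100, 110, 125, 140, 135, 120, 110, 105, 100, 98],
    "UTIL_A": [100, 101, 102, 103, 104, 106, 108, 110, 112, 114],
}

windows = walk_forward(PRICES, train_len=3, test_len=2)
for start, ticker, ret in windows:
    print(start, ticker, round(ret, 3))
```

Each window is scored only on prices that come after its training data, so a strategy that merely rode one trend shows up as soon as the trend changes.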

seanmcdirmid today at 1:14 AM

We had this discussion in previous posts about congressional leaders who had the risk appetite to go tech heavy and therefore outperformed normal congress critters.

Going heavy on tech can be rewarding, but you are taking on more risk of losing big in a tech crash. We all know that, and if you don't have that money to play riskier moves, it's not really a move you can take.

Long term it is less of a win if a tech bubble builds and pops before you can exit (and you can't wait it out to re-inflate).

KPGv2 today at 3:37 AM

Also studying for eight months is not useful. Loads of traders do this well for eight months and then do shit for the next five years. And tellingly, they didn't beat the S&P 500. They invested in something else that beat the S&P 500. And the one that didn't invest in that something did worse than the S&P 500.

What this tells me is they were lucky to have picked something that would beat the market for now.