If you work with model architectures and read papers, how could you not know there's a flood of new ideas? Only a few yield interesting results, though.
I kind of wonder if libraries like pytorch have hurt experimental development. There are so many basic concepts no one thinks about anymore because they just use the out-of-the-box solutions. And maybe those solutions are great and those parts are "solved", but I am not sure. How many models are using someone else's tokenizer, or someone else's strapped-on vision model, just to check a box in the model card?
That's been the normal way of the human world.
When the foundation layer at a given moment doesn't yield an ROI on intellectual exploration - say, because you can overcompensate with VC-funded raw compute and make more progress elsewhere - few(er) will go there.
But inevitably, as other domains reach diminishing returns, bright minds will look around for where significant gains can be found for their effort.
And so the next generation of PyTorch, or of foundational technologies in general, will evolve.
The people who don't think about such things probably wouldn't develop experimentally sans pytorch either.
Yeah, and even then, it's been like ~2-3 years since the last rather major architectural improvement - major enough for a lot of people to actually hear about it and use it daily. I think some people lose perspective on how short a time frame 3 years is.
But yes, there's a ton of interesting and useful stuff (beyond datasets and data-related improvements) going on right now, and I'm not even talking about LLMs. I don't do anything related to LLMs, and even then I still see tons of new stuff popping up regularly.
It's the opposite.
Frameworks like PyTorch are really flexible. You can implement any architecture, and if that's not enough, you can learn CUDA.
Keras is the opposite; it's probably more like what you describe.
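To illustrate that flexibility: a new architectural idea in PyTorch is usually just a custom nn.Module, plain Python end to end. A minimal toy sketch (the gated residual block below is purely illustrative, not any published design):

    # Hypothetical toy layer, only to show that a custom architecture idea
    # is ordinary Python in PyTorch; not any particular published design.
    import torch
    import torch.nn as nn

    class GatedResidualBlock(nn.Module):
        def __init__(self, dim: int, hidden: int):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(dim, hidden),
                nn.GELU(),
                nn.Linear(hidden, dim),
            )
            self.gate = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Element-wise gate in (0, 1) decides how much of the MLP output to mix in.
            g = torch.sigmoid(self.gate(x))
            return x + g * self.mlp(x)

    block = GatedResidualBlock(dim=64, hidden=256)
    x = torch.randn(8, 64)      # batch of 8 feature vectors
    print(block(x).shape)       # torch.Size([8, 64])

Autograd handles the backward pass for you, and if the op you need really isn't there, torch.utils.cpp_extension lets you drop down to custom CUDA kernels.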
The hardware's (GPU's) architectural limitations may slow research more than PyTorch does. See "The Hardware Lottery": https://hardwarelottery.github.io/